
Teradata’s Unified Data Architecture is a powerful combination of Teradata, Aster, and Hadoop in a single platform. Viewpoint has always provided monitoring and management of Teradata systems, and it added support for monitoring Aster in Viewpoint 14.01. To complete Viewpoint’s coverage of the systems in Teradata’s Unified Data Architecture, Viewpoint 14.10 adds support for monitoring Hadoop running in this architecture.
The biggest technical challenge Viewpoint faced when monitoring a Hadoop system was how to reliably and easily collect the necessary data from Hadoop. The different components of Hadoop expose their data in a variety of ways, including Ganglia, Nagios, JMX, and some really ugly web interfaces. There are two primary issues with using these existing technologies for Hadoop monitoring: parsing the data from each interface, and locating and connecting to these interfaces on each Hadoop node. Each of these technologies exposes its data in a different format, and it would take significant development time to properly parse the data from each source. Locating and communicating with the nodes is a challenge of its own. Just to collect data from the namenode and jobtracker, the location of these services would have to be configured or discovered, and failover would have to be accounted for. Expanding the monitoring solution to collect data from every node poses both connectivity and security issues. Surely there must be a better way!
Luckily, Apache Ambari addresses all of these technical challenges by providing a collection of RESTful APIs from which a plethora of Hadoop monitoring data can be obtained. Ambari handles the work of collecting the monitoring data from a variety of the monitoring technologies mentioned above, then aggregates this data and exposes it through those APIs. The APIs can all be accessed by making web service calls against a central node in the Hadoop cluster. All data is provided in JSON format, so it can easily be parsed by just about any programming language.
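To make the shape of such a web service call concrete, here is a minimal sketch that uses only the JDK (Java 8 or later). The class and method names are hypothetical, the host `ambari.example.com` and port 8080 are assumptions for illustration, and the URL layout simply mirrors the `/api/v1/clusters/...` path used in the sample code in this article. It is not how Viewpoint itself makes the call.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Scanner;

/** Hypothetical helper; the names here are illustrative, not part of Viewpoint or Ambari. */
final class AmbariRequestSketch {

    /** Builds the URL of one service component under /api/v1/clusters/{cluster}. */
    static String componentUrl(String hostAndPort, String cluster, String service, String component) {
        return "http://" + hostAndPort + "/api/v1/clusters/" + cluster
                + "/services/" + service + "/components/" + component;
    }

    /** Builds the value of an HTTP Basic Authorization header. */
    static String basicAuthHeader(String user, String password) {
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    /** Performs the GET and returns the raw JSON body as a string. */
    static String fetchJson(String url, String user, String password) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestProperty("Authorization", basicAuthHeader(user, password));
        connection.setConnectTimeout(30000);
        connection.setReadTimeout(120000);
        try (InputStream in = connection.getInputStream();
             Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            // The \A delimiter makes the scanner return the whole stream as one token.
            return scanner.hasNext() ? scanner.next() : "";
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Host, port, cluster name, and credentials are placeholders.
        String url = componentUrl("ambari.example.com:8080", "clustername", "MAPREDUCE", "JOBTRACKER");
        System.out.println(url);
        System.out.println(basicAuthHeader("admin", "admin")); // prints "Basic YWRtaW46YWRtaW4="
        // Uncomment when a reachable Ambari server is available:
        // System.out.println(fetchJson(url, "admin", "admin"));
    }
}
```

Because every request goes to the same central node, the only things that vary between calls are the path and the fields read from the JSON that comes back.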
Since Viewpoint is written in Java and uses the Spring Framework quite extensively, Spring’s RestTemplate class was a natural choice for calling the RESTful APIs and parsing the results into Java model objects. Here is some sample code to demonstrate the collection of the number of running MapReduce jobs, map tasks, and reduce tasks from Ambari.
    package com.teradata.viewpoint.ambari;

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.commons.codec.binary.Base64;
    import org.codehaus.jackson.annotate.JsonProperty;
    import org.codehaus.jackson.map.DeserializationConfig;
    import org.springframework.http.MediaType;
    import org.springframework.http.client.SimpleClientHttpRequestFactory;
    import org.springframework.http.converter.HttpMessageConverter;
    import org.springframework.http.converter.json.MappingJacksonHttpMessageConverter;
    import org.springframework.web.client.RestTemplate;

    public class AmbariClient {
        private final String host;
        private final String clusterName;
        private final String user;
        private final String password;
        private final RestTemplate restTemplate;

        public AmbariClient(String host, String clusterName, String user, String password) {
            this.host = host;
            this.clusterName = clusterName;
            this.user = user;
            this.password = password;

            // Accept both text/plain and application/json responses as JSON.
            List<MediaType> supportedMediaTypes = new ArrayList<MediaType>();
            MediaType plainTextType = new MediaType("text", "plain");
            MediaType jsonType = new MediaType("application", "json");
            supportedMediaTypes.add(plainTextType);
            supportedMediaTypes.add(jsonType);

            // Ignore JSON properties that the model classes do not declare.
            MappingJacksonHttpMessageConverter mappingJacksonHttpMessageConverter =
                    new MappingJacksonHttpMessageConverter();
            mappingJacksonHttpMessageConverter.setSupportedMediaTypes(supportedMediaTypes);
            mappingJacksonHttpMessageConverter.getObjectMapper().configure(
                    DeserializationConfig.Feature.FAIL_ON_UNKNOWN_PROPERTIES, false);

            List<HttpMessageConverter<?>> messageConverters = new ArrayList<HttpMessageConverter<?>>();
            messageConverters.add(mappingJacksonHttpMessageConverter);

            restTemplate = new RestTemplate();
            restTemplate.setMessageConverters(messageConverters);
        }

        public <T> T getAmbariHadoopObject(String url, Class<T> clazz) {
            // Add HTTP Basic authentication and timeouts to every connection.
            SimpleClientHttpRequestFactory requestFactory = new SimpleClientHttpRequestFactory() {
                @Override
                protected void prepareConnection(HttpURLConnection connection, String httpMethod)
                        throws IOException {
                    super.prepareConnection(connection, httpMethod);
                    String authorisation = user + ":" + password;
                    String encodedAuthorisation =
                            Base64.encodeBase64String(authorisation.getBytes("UTF-8"));
                    connection.setRequestProperty("Authorization", "Basic " + encodedAuthorisation);
                    connection.setConnectTimeout(30000);
                    connection.setReadTimeout(120000);
                }
            };
            restTemplate.setRequestFactory(requestFactory);

            String fullUrl = "http://" + host + "/api/v1/clusters/" + clusterName + url;
            return restTemplate.getForObject(fullUrl, clazz);
        }

        /**
         * Model class to hold the data from the JSON response. The nested classes
         * are static so that Jackson can instantiate them without an enclosing instance.
         */
        private static final class JobTrackerData {
            public static class Metrics {
                public static class MapReduce {
                    public static class JobTracker {
                        @JsonProperty("jobs_running")
                        private Integer jobsRunning;

                        @JsonProperty("running_maps")
                        private Integer runningMaps;

                        @JsonProperty("running_reduces")
                        private Integer runningReduces;
                    }

                    @JsonProperty("jobtracker")
                    private JobTracker jobTracker;
                }

                @JsonProperty("mapred")
                private MapReduce mapReduce;
            }

            @JsonProperty("metrics")
            private Metrics metrics;
        }

        public static void main(String[] args) {
            AmbariClient client = new AmbariClient("ambari.teradata.com", "clustername", "admin", "admin");
            JobTrackerData data = client.getAmbariHadoopObject(
                    "/services/MAPREDUCE/components/JOBTRACKER", JobTrackerData.class);

            System.out.println("Jobs running: " + data.metrics.mapReduce.jobTracker.jobsRunning);
            System.out.println("Map tasks running: " + data.metrics.mapReduce.jobTracker.runningMaps);
            System.out.println("Reduce tasks running: " + data.metrics.mapReduce.jobTracker.runningReduces);
        }
    }
Following Viewpoint’s standard data collection practices, all of the data collected from Ambari is stored in the Viewpoint database. The data is collected from Ambari every minute by default, so the database holds a view of the state of the Hadoop system over the course of an hour, day, or week. This historical data is used to generate a variety of charts in the Viewpoint web portal, and it also powers Rewind, which lets users go back and see exactly what was occurring on the Hadoop cluster at a specific point in time.
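The idea behind looking up the state of the cluster at a past point in time can be sketched with a sorted map of timestamped samples. This is only an in-memory illustration under stated assumptions: the class names are hypothetical, the sample values are invented, and Viewpoint's actual database schema and Rewind implementation are not described in this article.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for the per-minute history; not Viewpoint's schema. */
final class RewindSketch {

    /** One collected sample holding the three JobTracker counters from the example above. */
    static final class Sample {
        final int jobsRunning;
        final int runningMaps;
        final int runningReduces;

        Sample(int jobsRunning, int runningMaps, int runningReduces) {
            this.jobsRunning = jobsRunning;
            this.runningMaps = runningMaps;
            this.runningReduces = runningReduces;
        }
    }

    // Samples keyed by collection time in epoch milliseconds, kept sorted by key.
    private final TreeMap<Long, Sample> history = new TreeMap<>();

    /** Stores one sample under the time at which it was collected. */
    void record(long collectedAtMillis, Sample sample) {
        history.put(collectedAtMillis, sample);
    }

    /** Returns the latest sample collected at or before the given time, or null if none. */
    Sample rewindTo(long pointInTimeMillis) {
        Map.Entry<Long, Sample> entry = history.floorEntry(pointInTimeMillis);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        RewindSketch store = new RewindSketch();
        long minute = 60000L;

        // Invented values, one sample per minute as with the default collection rate.
        store.record(0 * minute, new Sample(2, 40, 8));
        store.record(1 * minute, new Sample(3, 55, 12));
        store.record(2 * minute, new Sample(1, 10, 4));

        // 90 seconds in, the one-minute sample is the latest one available.
        Sample sample = store.rewindTo(90000L);
        System.out.println("Jobs running at t=90s: " + sample.jobsRunning); // prints 3
    }
}
```

With one sample per minute, a lookup for any moment resolves to the sample taken at most a minute earlier, which is why the collection interval bounds how precisely a past state can be reconstructed.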
By using Ambari for monitoring of a Hadoop cluster, Viewpoint was able to deliver a comprehensive Hadoop monitoring solution in a relatively short amount of time. Viewpoint’s Java and web developers were able to focus on the tasks at which they excel: getting the data from the source system (Ambari) and displaying it in Viewpoint’s portlets. No time was wasted trying to get up to speed on Ganglia, JMX, or many of the details of Hadoop’s inner workings. Ambari was a critical piece of technology to help Viewpoint roll out this solution and enhance Viewpoint’s support of Teradata’s Unified Data Architecture.