Correct Answer : Get Lastest Questions and Answer : Explanation: YARN is more general than MR and it should be possible to run other computing models like BSP besides MR. Prior to YARN, it required a separate cluster for MR, BSP and others. Now they can coexist in a single cluster, which leads to higher usage of the cluster. Here are some of the applications ported to YARN.
From a MapReduce perspective in legacy MR there are separate slots for Map and Reduce tasks, but in YARN there is no fixed purpose of a container. The same container can be used for a Map task, Reduce task, Hama BSP Task or something else. This leads to better utilization.
Also, it makes it possible to run different versions of Hadoop in the same cluster which is not possible with legacy MR, which makes is easy from a maintenance point.
Question : Which of the following main responsibility of JobTracker has been separated in YARN?
Correct Answer : Get Lastest Questions and Answer : Explanation: : In Hadoop 2, MapReduce is split into two components: The cluster resource management capabilities have become YARN, while the MapReduce-specific capabilities remain MapReduce. In the former MR1 architecture, the cluster was managed by a service called the JobTracker. TaskTracker services lived on each node and would launch tasks on behalf of jobs. The JobTracker would serve information about completed jobs. In MR2, the functions of the JobTracker are divided into three services. The ResourceManager is a persistent YARN service that receives and runs applications (a MapReduce job is an application) on the cluster. It contains the scheduler, which, as in MR1, is pluggable.
The MapReduce-specific capabilities of the JobTracker have moved into the MapReduce Application Master, one of which is started to manage each MapReduce job and terminated when the job completes. The JobTracker s function of serving information about completed jobs has been moved to the JobHistoryServer. The TaskTracker has been replaced with the NodeManager, a YARN service that manages resources and deployment on a node. NodeManager is responsible for launching containers, each of which can house a map or reduce task.
Because MR1 functionality has been split into two components in Hadoop 2, MapReduce cluster configuration options have been split into YARN configuration options, which go in yarn-site.xml; and MapReduce configuration options, which go in mapred-site.xml. Many have been given new names to reflect the shift. As JobTrackers and TaskTrackers no longer exist in MR2, all configuration options pertaining to them no longer exist, although many have corresponding options for the ResourceManager, NodeManager, and JobHistoryServer. We ll follow up with a full translation table in a future post.
Question : Which all are the responsibilities of ResourceManager in MRv?
Correct Answer : Get Lastest Questions and Answer : Explanation: ResourceManager is the central authority that manages resources and schedules applications running atop of YARN. The Scheduler is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc. The Scheduler is pure scheduler in the sense that it performs no monitoring or tracking of status for the application. Also, it offers no guarantees about restarting failed tasks either due to application failure or hardware failures. The Scheduler performs its scheduling function based the resource requirements of the applications; it does so based on the abstract notion of a resource Container which incorporates elements such as memory, cpu, disk, network etc.
The Scheduler has a pluggable policy which is responsible for partitioning the cluster resources among the various queues, applications etc. The current schedulers such as the CapacityScheduler and the FairScheduler would be some examples of plug-ins.
Question : You have Sqoop to import the EVENT table from the database, then write a Hadoop streaming job in Python to scrub the data, and use Hive to write the new data into the Hive EVENT table. How would you automate this data pipeline? 1. Using first Sqoop job and then remaining Part using MapReduce job chaining. 2. Define the Sqoop job, the MapReduce job, and the Hive job as an Oozie workflow job, and define an Oozie coordinator job to run the workflow job daily. 3. Access Mostly Uused Products by 50000+ Subscribers 4. Define the Sqoop job, the MapReduce job, and the Hive job as an Oozie workflow job, and define an Zookeeper coordinator job to run the workflow job daily.
Ans :2 Exp : In Oozie, scheduling is the function of an Oozie coordinator job. Oozie does not allow you to schedule workflow jobs Oozie coordinator jobs cannot aggregate tasks or define workflows; coordinator jobs are simple schedules of previously defined worksflows. You must therefore assemble the various tasks into a single workflow job and then use a coordinator job to execute the workflow job.
Question : QuickTechie Inc has a log file which is tab-delimited text file. File contains two columns username and loginid You want use an InputFormat that returns the username as the key and the loginid as the value. Which of the following is the most appropriate InputFormat should you use?