Question : What describes the relationship between MapReduce and Hive? 1. Hive provides additional capabilities that allow certain types of data manipulation not possible with MapReduce. 2. Hive programs rely on MapReduce but are extensible, allowing developers to do special-purpose processing not provided by MapReduce. 3. Access Mostly Uused Products by 50000+ Subscribers 4. Hive provides no additional capabilities to MapReduce. Hive programs are executed as MapReduce jobs via the Hive interpreter.
Correct Answer : Get Lastest Questions and Answer : Hive is a framework that translates queries written in Hive QL into jobs that are executed by the MapReduce framework. Hive does not provide any functionality that isn't provided by MapReduce, but it makes some types of data operations significantly easier to perform.
Question : What is HIVE? 1. HIVE is part of the Apache Hadoop project that enables in-memory analysis of real-time streams of data 2. Hive is a way to add data from local file system to HDFS 3. Access Mostly Uused Products by 50000+ Subscribers 4. Hive is a part of the Apache Hadoop project that provides SQL like interface for data processing
Hive is a project initially developed by facebook specifically for people with very strong SQL skills and not very strong Java skills who want to query data in Hadoop
Question : Which statement is true about apache Hadoop ?
2. Cloudant indexing is flexible and powerful, and includes real-time MapReduce, Apache Lucene-based full-text search, advanced Geospatial, and declaritive Cloudant Query. 3. Access Mostly Uused Products by 50000+ Subscribers
4. 1 and 2 5. 1,2 and 3
Question : What occurs when you run a Hadoop job, specifying an output directory job output which already exists in HDFS? 1. An error will occur immediately, because the output directory must not already exist when a MapReduce job commences. 2. An error will occur after the Mappers have completed but before any Reducers begin to run, because the output path must not exist when the Reducers commence. 3. Access Mostly Uused Products by 50000+ Subscribers 4. The job will run successfully. Output from the Reducers will overwrite the contents of the existing directory. Ans : 1 Exp : When a job is run, one of the first things done on the client is a check to ensure that the output directory does not already exist. If it does, the client will immediately terminate. The job will not be submitted to the cluster; the check takes place on the client.
Question : Select the correct statement for OOzie workflow 1. OOzie workflow runs on a server which is typically outside of Hadoop Cluster 2. OOzie workflow definition are submitted via HTTP. 3. Access Mostly Uused Products by 50000+ Subscribers 4. All 1,2 and 3 are correct 5. Only 1 and 3 are correct
Ans : 4 Exp : Check the URL http://oozie.apache.org/docs/3.1.3-incubating/WorkflowFunctionalSpec.html
Question : What is the result when you execute: hadoop jar SampleJar.jar MyClass on a client machine? 1. SampleJar.jar is placed in a temporary directory in HDFS 2. An error will occur, because you have not provided input and output directories 3. Access Mostly Uused Products by 50000+ Subscribers 4. SampleJar.jar is sent directly to the JobTracker
Ans : 1 Exp : When a job is submitted to the cluster, it is placed in a temporary directory in HDFS and the JobTracker is notified of that location. The configuration for the job is serialized to an XML file, which is also placed in a directory in HDFS. Some jobs require you to specify the input and output directories on the command line, but this is not a Hadoop requirement.
Question : During a MapReduce v(MRv) job submission, there are a number of steps between the ResourceManager receiving the job submission and the map tasks running on different nodes.
1. Order the following steps according to the flow of job submission in a YARN cluster: 2. The ResourceManager application manager asks a NodeManager to launch the ApplicationMaster 3. Access Mostly Uused Products by 50000+ Subscribers 4. The ApplicationMaster sends a request to the assigned NodeManagers to run the map tasks 5. The ResourceManager scheduler allocates a container for the ApplicationMaster 6. The ApplicationMaster determines the number of map tasks based on the input splits 7. The ResourceManager scheduler makes a decision where to run the map tasks based on the memory requirements and data locality 1. 3,4,6,5,1,2 2. 2,3,4,5,6,1 3. Access Mostly Uused Products by 50000+ Subscribers 4. 3,4,6,5,1,2 5. 6,5,3,4,1,2
Ans : 2 Exp : Link : http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/YARN.html MapReduce has undergone a complete overhaul in hadoop-0.23 and we now have, what we call, MapReduce 2.0 (MRv2) or YARN.
The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs. The ResourceManager and per-node slave, the NodeManager (NM), form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system. The per-application ApplicationMaster is, in effect, a framework specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks. The ResourceManager has two main components: Scheduler and ApplicationsManager. The Scheduler is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc. The Scheduler is pure scheduler in the sense that it performs no monitoring or tracking of status for the application. Also, it offers no guarantees about restarting failed tasks either due to application failure or hardware failures. The Scheduler performs its scheduling function based the resource requirements of the applications; it does so based on the abstract notion of a resource Container which incorporates elements such as memory, cpu, disk, network etc. In the first version, only memory is supported. The Scheduler has a pluggable policy plug-in, which is responsible for partitioning the cluster resources among the various queues, applications etc. The current Map-Reduce schedulers such as the CapacityScheduler and the FairScheduler would be some examples of the plug-in. The CapacityScheduler supports hierarchical queues to allow for more predictable sharing of cluster resources The ApplicationsManager is responsible for accepting job-submissions, negotiating the first container for executing the application specific ApplicationMaster and provides the service for restarting the ApplicationMaster container on failure. The NodeManager is the per-machine framework agent who is responsible for containers, monitoring their resource usage (cpu, memory, disk, network) and reporting the same to the ResourceManager/Scheduler. The per-application ApplicationMaster has the responsibility of negotiating appropriate resource containers from the Scheduler, tracking their status and monitoring for progress. MRV2 maintains API compatibility with previous stable release (hadoop-0.20.205). This means that all Map-Reduce jobs should still run unchanged on top of MRv2 with just a recompile.
Question : You decide to create a cluster which runs HDFS in High Availability mode with automatic failover, using Quorum-based Storage. Which service keeps track of which NameNode is active at any given moment? 1. YARN ResourceManager
Ans : 4 Exp : When the first NameNode is started, it connects to ZooKeeper and registers itself as the Active NameNode. The next NameNode then sees that information and sets itself up in Standby mode (in fact, the ZooKeeper Failover Controller is the software responsible for the actual communication with ZooKeeper). Clients never connect to ZooKeeper to discover anything about the NameNodes. In an HDFS HA scenario, ZooKeeper is not used to keep track of filesystem changes. That is the job of the Quorum Journal Manager daemons.
Question : Which two daemons typically run on each slave node in a Hadoop cluster running MapReduce v (MRv) on YARN?
4. After a period of time, the ApplicationMaster will mark the map task attempt as failed and ask the NodeManager to terminate the container for the Map task
Ans : 3 Exp : First job will successfully run and second one will fail, because, if (output directory already exist then it will not run and throws exception, complaining output directory already exist.
Question : Which of the following model can help you to visually represent that how an organization delivers value to its customers or beneficiaries ? 1. Business Model