Question : Which statement is true with respect to MapReduce 2 (MRv2) or YARN?
1. It is the newer version of MapReduce; using it, the performance of data processing can be increased.
2. The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons.
3. …
4. All of the above
5. Only 2 and 3 are correct
Ans : 5
Exp : MapReduce has undergone a complete overhaul in hadoop-0.23, resulting in what we now call MapReduce 2.0 (MRv2) or YARN. The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and a per-application ApplicationMaster (AM). An application is either a single job in the classical sense of MapReduce jobs or a DAG of jobs.
You can also refer to the Advanced Hadoop YARN training by HadoopExam.com
Question :
Which statement is true about the ApplicationsManager?
1. It is responsible for accepting job submissions.
2. It negotiates the first container for executing the application-specific ApplicationMaster and provides the service for restarting the ApplicationMaster container on failure.
3. …
4. All of the above
5. 1 and 2 are correct
Ans : 5
Exp : The ApplicationsManager is responsible for accepting job submissions, negotiating the first container for executing the application-specific ApplicationMaster, and providing the service for restarting the ApplicationMaster container on failure.
Question :
Which tool is used to list all the blocks of a file?
Ans : 3
Exp : The hdfs fsck utility reports on the health of the filesystem and, when run with the -files and -blocks options, lists all the blocks that make up a file.
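The block listing described above can be sketched as a single command against a running cluster; the path below is a hypothetical example, not one from the question:

```shell
# List the file, its blocks, and the DataNodes holding each replica.
# /user/alice/data.txt is a placeholder path for illustration.
hdfs fsck /user/alice/data.txt -files -blocks -locations
```

This requires a running HDFS cluster, so it is shown as a command fragment rather than a runnable script.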
Question : Which tool is best suited to import a portion of a relational database every day as files into HDFS, and to generate Java classes to interact with that imported data?
Exp : Sqoop ("SQL-to-Hadoop") is a straightforward command-line tool with the following capabilities:
* Imports individual tables or entire databases to files in HDFS
* Generates Java classes to allow you to interact with your imported data
* Provides the ability to import from SQL databases straight into your Hive data warehouse
Data Movement Between Hadoop and Relational Databases Data can be moved between Hadoop and a relational database as a bulk data transfer, or relational tables can be accessed from within a MapReduce map function. Note:
* Cloudera's Distribution for Hadoop provides a bulk data transfer tool (i.e., Sqoop) that imports individual tables or entire databases into HDFS files. The tool also generates Java classes that support interaction with the imported data. Sqoop supports all relational databases over JDBC, and Quest Software provides a connector (i.e., OraOop) that has been optimized for access to data residing in Oracle databases.
Question : Given no tables in Hive, which command will import the entire contents of the LOGIN table from the database into a Hive table called LOGIN that uses commas (,) to separate the fields in the data files?
1. hive import --connect jdbc:mysql://dbhost/db --table LOGIN --terminated-by ',' --hive-import
2. hive import --connect jdbc:mysql://dbhost/db --table LOGIN --fields-terminated-by ',' --hive-import
3. …
4. sqoop import --connect jdbc:mysql://dbhost/db --table LOGIN --fields-terminated-by ',' --hive-import
Ans : 4
Exp : A Sqoop import to a Hive table requires the import command, followed by the --table option to specify the database table name and the --hive-import option. If --hive-table is not specified, the Hive table will have the same name as the imported database table. If --hive-overwrite is specified, an existing Hive table will be overwritten. If the --fields-terminated-by option is set, it controls the character used to separate the fields in the Hive table's data files.
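The correct command reads more easily when split across lines; dbhost and db are the placeholder connection details from the question:

```shell
# Import the LOGIN table into a Hive table of the same name,
# using commas as the field terminator in the data files.
# Add --hive-overwrite to replace the Hive table if it already exists.
sqoop import \
  --connect jdbc:mysql://dbhost/db \
  --table LOGIN \
  --fields-terminated-by ',' \
  --hive-import
```

This requires a reachable database and Hadoop cluster, so it is a command fragment rather than a runnable script.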
Watch Hadoop Professional Training, Module 22, by www.HadoopExam.com: http://hadoopexam.com/index.html/#hadoop-training
Question : Which two daemons typically run on each slave node in a Hadoop cluster running MapReduce v2 (MRv2) on YARN?
Explanation: Each slave node in a cluster configured to run MapReduce v2 (MRv2) on YARN typically runs a DataNode daemon (for HDFS functions) and NodeManager daemon (for YARN functions). The NodeManager handles communication with the ResourceManager, oversees application container lifecycles, monitors CPU and memory resource use of the containers, tracks the node health, and handles log management. It also makes available a number of auxiliary services to YARN applications.
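A quick way to verify this on a live slave node is the JDK's jps tool, which lists the running Java daemon processes; per the explanation above, DataNode and NodeManager should appear among them:

```shell
# List the Java daemon processes on this node; on a healthy MRv2 slave,
# the output should include DataNode (HDFS) and NodeManager (YARN).
jps
```

This only produces meaningful output on a node that is part of a running cluster.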
Question : How does the Hadoop framework determine the number of Mappers required for a MapReduce job on a cluster running MapReduce v2 (MRv2) on YARN?
1. The number of Mappers is equal to the number of InputSplits calculated by the client submitting the job
2. The ApplicationMaster chooses the number based on the number of available nodes
Ans : 1
Exp : Each Mapper task processes a single InputSplit. The client calculates the InputSplits before submitting the job to the cluster. The developer may specify how the input split is calculated, with a single HDFS block being the most common split. This is true for both MapReduce v1 (MRv1) and YARN MapReduce implementations.
With YARN, each mapper will be run in a container which consists of a specific amount of CPU and memory resources. The ApplicationMaster requests a container for each mapper. The ResourceManager schedules the resources and instructs the ApplicationMaster of available NodeManagers where the container may be launched.
With MRv1, each TaskTracker (slave node) is configured to handle a maximum number of concurrent map tasks. The JobTracker (master node) assigns a TaskTracker a specific InputSplit to process as a single map task.
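The split arithmetic described above can be sketched in a few lines; the 1 GB file size is a hypothetical example, and the 128 MB split size assumes the common one-block-per-split case:

```shell
# One mapper per InputSplit; splits default to one HDFS block.
FILE_SIZE=$((1024 * 1024 * 1024))   # hypothetical 1 GB input file
SPLIT_SIZE=$((128 * 1024 * 1024))   # split size = one 128 MB block
# Ceiling division gives the number of InputSplits, hence mappers.
NUM_MAPPERS=$(( (FILE_SIZE + SPLIT_SIZE - 1) / SPLIT_SIZE ))
echo "Mappers: $NUM_MAPPERS"
```

For this file the client would schedule 8 map tasks, one per 128 MB split.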
Question : In a Sqoop job, assume $PREVIOUSREFRESH contains a date/time string for the last time the import was run, e.g., '-- ::'. Which of the following import command control arguments prevents a repeating Sqoop job from downloading the entire EVENT table every day?
1. --incremental lastmodified --refresh-column lastmodified --last-value "$PREVIOUSREFRESH"
2. --incremental lastmodified --check-column lastmodified --last-time "$PREVIOUSREFRESH"
3. …
4. --incremental lastmodified --check-column lastmodified --last-value "$PREVIOUSREFRESH"
Ans : 4
Exp : Sqoop incremental imports use --check-column to name the column examined when deciding which rows to import, and --last-value to supply the previous upper bound; in lastmodified mode, only rows whose check column holds a timestamp newer than the last value are imported. --refresh-column and --last-time are not real Sqoop options.
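Sqoop's real incremental-import options are --incremental, --check-column, and --last-value; a full command sketch using them (the connection details are placeholders):

```shell
# Incremental import: fetch only rows whose 'lastmodified' column is
# newer than the previous run's timestamp. dbhost/db are placeholders.
sqoop import \
  --connect jdbc:mysql://dbhost/db \
  --table EVENT \
  --incremental lastmodified \
  --check-column lastmodified \
  --last-value "$PREVIOUSREFRESH"
```

This requires a reachable database and Hadoop cluster, so it is a command fragment rather than a runnable script.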
Question : Which best describes the primary function of Flume?
1. Flume is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with an infrastructure consisting of sources and sinks for importing and evaluating large data sets
2. Flume acts as a Hadoop filesystem for log files
3. …
4. Flume provides a query language for Hadoop similar to SQL
5. Flume is a distributed service for collecting and moving large amounts of data into HDFS as it is produced from streaming data flows