You have user profile records in an OLTP database that you want to join with web server logs that you have already ingested into HDFS. What is the best way to acquire the user profile data for use in HDFS?
A. Ingest with Hadoop Streaming
B. Ingest with Apache Flume
C. Ingest using Hive's LOAD DATA command
D. Ingest using Sqoop
E. Ingest using Pig's LOAD command
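For reference, Sqoop (option D) is the tool designed for importing relational data into HDFS. Below is a minimal sketch of invoking Sqoop 1 programmatically through its org.apache.sqoop.Sqoop.runTool entry point; the JDBC URL, table name, and target directory are hypothetical, and in practice the equivalent "sqoop import ..." command line is more common.

import org.apache.sqoop.Sqoop;

// Sketch: pull a hypothetical OLTP table into HDFS with Sqoop 1.
public class UserProfileImport {
    public static void main(String[] args) {
        String[] sqoopArgs = {
            "import",
            "--connect", "jdbc:mysql://db.example.com/webapp", // hypothetical OLTP source
            "--table", "user_profiles",                        // hypothetical table
            "--target-dir", "/data/user_profiles",             // HDFS destination
            "--num-mappers", "4"
        };
        int exitCode = Sqoop.runTool(sqoopArgs);               // returns 0 on success
        System.exit(exitCode);
    }
}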
Explanation: Components of the MapReduce job flow: a MapReduce job on YARN involves the following components.
- A client node, which submits the MapReduce job.
- The YARN ResourceManager, which allocates cluster resources to jobs.
- The YARN NodeManagers, which launch and monitor the tasks of jobs.
- The MapReduce Application Master, which coordinates the tasks running in the MapReduce job.
The Application Master and the MapReduce tasks run in containers that are scheduled by the ResourceManager and managed by the NodeManagers. HDFS is used for sharing job files among these components.
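A minimal sketch of the client-side driver described above: only this code runs on the client node, and once waitForCompletion(true) submits the job, the ResourceManager, NodeManagers, and Application Master take over. The identity Mapper and Reducer are placeholders; input and output paths come from the command line.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DriverSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "driver-sketch");
        job.setJarByClass(DriverSketch.class);
        job.setMapperClass(Mapper.class);    // identity mapper placeholder
        job.setReducerClass(Reducer.class);  // identity reducer placeholder
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Blocks until the job completes, printing progress to the console.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}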
Question: A developer has submitted a YARN job by calling the submitApplication() method on the ResourceManager. Select the correct order of the steps that follow.
1. The container will be managed by the NodeManager after job submission.
2. The ResourceManager triggers its Scheduler sub-component, which allocates containers for MapReduce job execution.
3. [option text not available in the source]
Explanation: Job startup: The call to Job.waitForCompletion() in the main driver class is where all the execution starts. The driver is the only piece of code that runs on the local client machine, and this call starts the communication with the ResourceManager. First, the client retrieves a new job ID (application ID) from the ResourceManager. The client node then copies the job resources specified via the -files, -archives, and -libjars command-line arguments, as well as the job JAR file, to HDFS. Finally, the job is submitted by calling the submitApplication() method on the ResourceManager. The ResourceManager triggers its Scheduler sub-component, which allocates a container for MapReduce job execution. The ResourceManager then starts the Application Master in the container provided by the Scheduler; from then onwards, this container is managed by a NodeManager.
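A sketch of how a driver typically picks up the -files, -archives, and -libjars arguments mentioned above: when the driver is written against Hadoop's Tool interface and launched via ToolRunner, the built-in GenericOptionsParser interprets those options before run() is called, so the listed resources are shipped to HDFS as part of job submission. The class name and paths below are illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ToolDriverSketch extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects any generic options parsed by ToolRunner.
        Job job = Job.getInstance(getConf(), "tool-driver-sketch");
        job.setJarByClass(ToolDriverSketch.class);
        job.setMapperClass(Mapper.class);    // identity mapper placeholder
        job.setReducerClass(Reducer.class);  // identity reducer placeholder
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // e.g. hadoop jar app.jar ToolDriverSketch -files lookup.txt in out
        System.exit(ToolRunner.run(new Configuration(), new ToolDriverSketch(), args));
    }
}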
1. Iterate over the DistributedCache instance in the Mapper and add all the cached file paths to an array.
2. There is a direct method available: DistributedCache.getAllFilePath().
3. [option text not available in the source]
4. All of the above
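For context, a sketch of collecting cached file paths in a Mapper using the Hadoop 2 API, where context.getCacheFiles() returns all distributed-cache URIs in one call (the older DistributedCache class is deprecated in Hadoop 2). The file the driver would register via job.addCacheFile(...) is hypothetical.

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
    private URI[] cachedFiles;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Collect all cached file URIs into an array for later lookup.
        cachedFiles = context.getCacheFiles();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... use the cached files, e.g. for a map-side join ...
    }
}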
1. create table table_name ( id int, date date, name string ) partitioned by (date string)
2. create table table_name ( id int, date date, name string ) partitioned by (string)
3. [option text not available in the source]
4. Only 2 and 3 are correct
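For reference, in Hive a partition column is declared only in the PARTITIONED BY clause and must not also appear in the main column list. A minimal sketch of valid partitioned-table DDL issued through the Hive JDBC driver; the connection URL, table, and column names are hypothetical.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreatePartitionedTable {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default"); // hypothetical HiveServer2 URL
             Statement stmt = conn.createStatement()) {
            // event_date is declared only as a partition column, not in the schema.
            stmt.execute(
                "CREATE TABLE user_events (id INT, name STRING) " +
                "PARTITIONED BY (event_date STRING)");
        }
    }
}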
1. The above CTAS statement creates the target table new_key_value_store with the schema (new_key DOUBLE, key_value_pair STRING) derived from the results of the SELECT statement.
2. If the SELECT statement does not specify column aliases, the column names will be automatically assigned to _col0, _col1, and _col2.
3. [option text not available in the source]
4. 1 and 2 are correct
5. All of 1, 2, and 3 are correct
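The CTAS statement the question refers to is not reproduced in the source. Below is an illustrative CTAS of the same shape, consistent with the schema described in option 1, issued through the hypothetical Hive JDBC connection used above; table and column names are assumptions, not the original statement.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CtasExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default"); // hypothetical HiveServer2 URL
             Statement stmt = conn.createStatement()) {
            // The target schema (new_key DOUBLE, key_value_pair STRING) is
            // derived from the SELECT list; CTAS declares no columns itself.
            stmt.execute(
                "CREATE TABLE new_key_value_store AS " +
                "SELECT (key % 1024) new_key, concat(key, value) key_value_pair " +
                "FROM key_value_store");
        }
    }
}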