The Secondary NameNode is not a backup of the NameNode; it takes care of some housekeeping tasks for the NameNode, periodically merging the edit log into the fsimage (checkpointing) so that the edit log does not grow without bound.
Question : You have a MapReduce job that depends on two external JDBC JARs, ojdbc6.jar and openJdbc6.jar. Which of the following commands will correctly include these external JARs in the running job's classpath?
1. hadoop jar job.jar HadoopExam -cp ojdbc6.jar,openJdbc6.jar
2. hadoop jar job.jar HadoopExam -libjars ojdbc6.jar,openJdbc6.jar
4. hadoop jar job.jar HadoopExam -libjars ojdbc6.jar openJdbc6.jar
Ans : 2
Exp : The syntax for executing a job and including JARs in the job's classpath is: hadoop jar <job jar> <main class> -libjars <jar1>[,<jar2>,...]. The JAR names must be comma-separated, so option 4 is wrong, and -cp is not a Hadoop generic option, so option 1 is wrong.
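Note that -libjars is interpreted by Hadoop's GenericOptionsParser, so it only takes effect when the driver is run through ToolRunner. As a minimal sketch (the mapper/reducer wiring is omitted, and the class name HadoopExam is taken from the command above), a driver that makes -libjars work could look like this:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class HadoopExam extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            // getConf() already reflects generic options such as -libjars.
            Job job = Job.getInstance(getConf(), "HadoopExam");
            job.setJarByClass(HadoopExam.class);
            // ... set mapper, reducer, input and output paths here ...
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            // ToolRunner invokes GenericOptionsParser, which strips -libjars
            // from args and adds the jars to the job's classpath; without it
            // the option is silently ignored.
            System.exit(ToolRunner.run(new Configuration(), new HadoopExam(), args));
        }
    }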
Question : You have used Sqoop to import the EVENT table from the database, then written a Hadoop streaming job in Python to scrub the data, and used Hive to write the new data into the Hive EVENT table. How would you automate this data pipeline?
1. Run the Sqoop job first, then implement the remaining part using MapReduce job chaining.
2. Define the Sqoop job, the MapReduce job, and the Hive job as an Oozie workflow job, and define an Oozie coordinator job to run the workflow job daily.
4. Define the Sqoop job, the MapReduce job, and the Hive job as an Oozie workflow job, and define a ZooKeeper coordinator job to run the workflow job daily.
Ans : 2
Exp : In Oozie, scheduling is the function of an Oozie coordinator job; Oozie does not allow you to schedule workflow jobs directly. Oozie coordinator jobs cannot aggregate tasks or define workflows; a coordinator job is simply a schedule for a previously defined workflow. You must therefore assemble the various tasks into a single workflow job and then use a coordinator job to execute that workflow job.
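As a hedged illustration, once the workflow and coordinator definitions are deployed to HDFS, the coordinator job can be started through the Oozie Java client; the Oozie server URL, HDFS path, and user name below are assumptions, not values from the question:

    import java.util.Properties;
    import org.apache.oozie.client.OozieClient;

    public class SubmitEventPipeline {
        public static void main(String[] args) throws Exception {
            // Hypothetical Oozie server URL.
            OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");
            Properties conf = oozie.createConfiguration();
            // Hypothetical HDFS directory holding the coordinator.xml that
            // schedules the workflow (Sqoop -> streaming scrub -> Hive) daily.
            conf.setProperty(OozieClient.COORDINATOR_APP_PATH,
                    "hdfs://namenode/apps/event-pipeline");
            conf.setProperty("user.name", "etl");
            // run() submits and starts the coordinator job.
            String jobId = oozie.run(conf);
            System.out.println("Started coordinator job " + jobId);
        }
    }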
Question : QuickTechie Inc has a log file which is a tab-delimited text file. The file contains two columns, username and loginid. You want to use an InputFormat that returns the username as the key and the loginid as the value. Which is the most appropriate InputFormat to use?
Ans : KeyValueTextInputFormat
Exp : KeyValueTextInputFormat is an InputFormat for plain text files. Files are broken into lines; either linefeed or carriage return is used to signal end of line. Each line is divided into key and value parts by a separator byte; if no such byte exists, the key is the entire line and the value is empty. The default separator is the tab character. In the new API (org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat), a separator other than the default tab can be specified through the configuration property mapreduce.input.keyvaluelinerecordreader.key.value.separator.
Sample Input :
one,first line
two,second line
Output Required :
Key : one, Value : first line
Key : two, Value : second line
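A minimal driver sketch for this case, assuming the new-API KeyValueTextInputFormat; since the sample input above is comma-separated rather than tab-separated, the separator property is set explicitly (the class name LoginParser and the argument paths are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class LoginParser {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Override the default tab separator with a comma.
            conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");
            Job job = Job.getInstance(conf, "login-parser");
            job.setJarByClass(LoginParser.class);
            // Each record reaches the mapper as (Text key, Text value):
            // key = text before the separator, value = text after it.
            job.setInputFormatClass(KeyValueTextInputFormat.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }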
Question : In the QuickTechie Inc Hadoop cluster you have defined the block size as [block size] MB. The input file contains [file size] MB of valid input data and is loaded into HDFS. How many map tasks should run, without considering any MapTask failure during the execution of this job?
Exp : One map task runs per input split; with the default FileInputFormat the split size equals the HDFS block size, so a single file yields ceil(file size / block size) map tasks.
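The actual figures are missing from the question, but the rule can be shown with hypothetical numbers (a 64 MB block size and a 200 MB file, which are assumptions, not values from the source):

    public class SplitCount {
        public static void main(String[] args) {
            // Hypothetical values; the question's real figures are missing.
            long blockSizeMb = 64;
            long fileSizeMb = 200;
            // One map task per input split; with the default FileInputFormat
            // the split size equals the HDFS block size, so the count is the
            // ceiling of fileSize / blockSize.
            long mapTasks = (fileSizeMb + blockSizeMb - 1) / blockSizeMb;
            System.out.println(mapTasks + " map tasks"); // prints "4 map tasks"
        }
    }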