
Cloudera Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : Select the correct option ?
1. NameNode is the bottleneck for reading the file in HDFS
2. The NameNode is used to determine all the blocks of a file
3. Access Mostly Uused Products by 50000+ Subscribers
4. All of the above

Correct Answer : Get Latest Questions and Answers :

Explanation : When a client application wants to read a file:
- It communicates with the NameNode to determine which blocks make up the file, and which DataNodes hold those blocks
- It then communicates directly with the DataNodes to read the data
- The NameNode is therefore not a bottleneck.




Question :

Which is the correct option for accessing a file stored in HDFS?

1. Application can read and write files in HDFS using JAVA API
2. There is a command-line option to access the files
3. Access Mostly Uused Products by 50000+ Subscribers
4. 1,2 and 3 are correct
5. 1 and 2 are correct

Correct Answer : Get Latest Questions and Answers :

Applications can access files in HDFS using the Java API. Typically, files are created in the local filesystem and
then moved into HDFS. There is also the command-line tool hadoop fs, which is used to access files in HDFS.
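As a sketch of the command-line route (this assumes a working Hadoop installation; the directory and file names here are hypothetical):

```shell
# List files in an HDFS directory
hadoop fs -ls /user/hadoopexam

# Print the contents of a file stored in HDFS
hadoop fs -cat /user/hadoopexam/sample.txt

# Copy a file from HDFS back to the local filesystem
hadoop fs -get /user/hadoopexam/sample.txt ./sample.txt
```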




Question : Which is the correct command to copy files from the local filesystem to HDFS?
1. hadoop fs -copy pappu.txt pappu.txt
2. hadoop fs -copyFromPath pappu.txt pappu.txt
3. Access Mostly Uused Products by 50000+ Subscribers
4. None of the above

Correct Answer : Get Latest Questions and Answers :
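For reference, neither -copy nor -copyFromPath is a real hadoop fs subcommand; the standard ways to copy a local file into HDFS are -put and -copyFromLocal (the HDFS destination path below is hypothetical):

```shell
# Copy pappu.txt from the local filesystem into HDFS
hadoop fs -put pappu.txt /user/hadoopexam/pappu.txt

# Equivalent for this case: -copyFromLocal only accepts local sources
hadoop fs -copyFromLocal pappu.txt /user/hadoopexam/pappu.txt
```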


Related Questions


Question : The Apache Hive data warehouse software facilitates querying and managing large datasets residing in
distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a
SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce
programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient
to express this logic in HiveQL. Select the correct statement regarding Hive from the options below?
1. Hive comes with no additional capabilities to MapReduce. Hive programs are executed as MapReduce jobs via the Hive interpreter as well as some logic in memory.
2. Hive comes with additional capabilities to MapReduce. Hive programs are executed as MapReduce jobs via the Hive interpreter.
3. Access Mostly Uused Products by 50000+ Subscribers
4. Hive comes with no additional capabilities to MapReduce. Hive programs are executed as MapReduce jobs via the Hive interpreter.


Question : You've written a MapReduce job based on the HadoopExam website's log file named MAIN.PROFILE.log, resulting in an extremely
large amount of output data. Which of the following cluster resources will your job stress?
1. network I/O and disk I/O
2. network I/O and RAM
3. Access Mostly Uused Products by 50000+ Subscribers
4. RAM , network I/O and disk I/O


Question : You have written a Mapper which invokes the following five calls to the OutputCollector.collect method:

output.collect(new Text("Flag"), new Text("Rahul"));
output.collect(new Text("Shirt"), new Text("Yakul"));
output.collect(new Text("Shoe"), new Text("Rahul"));
output.collect(new Text("Flag"), new Text("Gemini"));
output.collect(new Text("Socks"), new Text("Yakul"));

How many times will the Reducer's reduce() method be invoked?

1. 5
2. 4
3. Access Mostly Uused Products by 50000+ Subscribers
4. 7
5. 8
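The count hinges on the shuffle: after map output is sorted and grouped, reduce() runs once per distinct key. A minimal plain-Java simulation of that grouping (this is not Hadoop itself; ShuffleDemo and groupByKey are illustrative names):

```java
import java.util.*;

// Simulates the shuffle/sort phase of MapReduce: intermediate
// (key, value) pairs are grouped by key, and reduce() is then
// invoked once per distinct key.
public class ShuffleDemo {
    static Map<String, List<String>> groupByKey(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : pairs) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // The five collect() calls from the question above
        List<String[]> output = Arrays.asList(
            new String[]{"Flag", "Rahul"},
            new String[]{"Shirt", "Yakul"},
            new String[]{"Shoe", "Rahul"},
            new String[]{"Flag", "Gemini"},
            new String[]{"Socks", "Yakul"});
        Map<String, List<String>> grouped = groupByKey(output);
        // Distinct keys: Flag, Shirt, Shoe, Socks
        System.out.println("reduce() invocations: " + grouped.size()); // prints 4
    }
}
```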


Question : ___________ is an optimization technique where a computer system performs some task that may not be actually needed. The main idea is to
do work before it is known whether that work will be needed at all, so as to prevent a delay that would have to be incurred by doing the work after it
is known whether it is needed. If it turns out the work was not needed after all, the results are ignored. The Hadoop framework also provides a
mechanism to handle machine issues such as faulty configuration or hardware failure. The JobTracker detects that one or a number of
machines are performing poorly and starts more copies of a map or reduce task. This behaviour is known as ________________

1. Task Execution
2. Job Execution
3. Access Mostly Uused Products by 50000+ Subscribers
4. Speculative Execution


Question :
You are working in the HadoopExam consultancy team and have written MapReduce and Pig jobs. Which of the following is a correct statement?

1. Pig comes with additional capabilities to MapReduce. Pig programs are executed as MapReduce jobs via the Pig interpreter.
2. Pig comes with no additional capabilities to MapReduce. Pig programs are executed as MapReduce jobs via the Pig interpreter.
3. Access Mostly Uused Products by 50000+ Subscribers
4. Pig comes with additional capabilities to MapReduce. Pig programs are executed as MapReduce jobs via the Pig interpreter.


Question : Every day HadoopExam gains a good number of subscribers, but the file created from this information is
smaller than 64MB, and the same 64MB is configured as the block size on the cluster.
You are running a job that will process this file as a single input split on a cluster which has no other jobs currently running,
and with all settings at their default values. Each node has an equal number of open Map slots.
On which node will Hadoop first attempt to run the Map task?

1. The node containing the first TaskTracker to heartbeat into the JobTracker, regardless of the location of the input split
2. The node containing the first JobTracker to heartbeat into the Namenode, regardless of the location of the input split
3. Access Mostly Uused Products by 50000+ Subscribers
4. The node containing nearest location of the input split