
MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : Let's assume you have the following files in the HDFS directory called merge.
Test1.txt
hadoopexam.com Hadoop Training 1

Test2.txt
www.hadoopexam.com Hadoop YARN Training

Test3.txt
http://hadoopexam.com Amazon WebService Training

Now you run the following command
hadoop fs -getmerge merge/ output1.txt
What is the correct statement?


  : Let's assume you have the following files in the HDFS directory called merge.
1. It will create a new file called output1.txt in the local file system, with the merged content from all three files
2. It will create a new file called output1.txt in the HDFS file system, with the merged content from all three files
3. Access Mostly Uused Products by 50000+ Subscribers
4. This command will succeed but will not merge the files, because what to do with the newline character is not defined.

Correct Answer : Get Latest Questions and Answer :

getmerge
Usage: hadoop fs -getmerge <src> <localdst> [addnl]
Takes a source directory and a destination file as input and concatenates the files in src into the destination local file. Optionally, addnl can be set to enable
adding a newline character at the end of each file.
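For reference, a rough Java sketch of what getmerge does is shown below, using the HDFS FileSystem API; the class name and paths are illustrative, not the actual shell implementation:

// Rough sketch of what "hadoop fs -getmerge merge/ output1.txt" does
// (MergeSketch and the paths are illustrative only).
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class MergeSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);            // source: HDFS
        FileSystem local = FileSystem.getLocal(conf);      // destination: local filesystem

        OutputStream out = local.create(new Path("output1.txt"));
        for (FileStatus status : hdfs.listStatus(new Path("merge"))) {
            // Append the content of each file in the HDFS directory to the local file.
            IOUtils.copyBytes(hdfs.open(status.getPath()), out, 4096, false);
        }
        out.close();
    }
}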





Question : Let's assume you have the following files in the HDFS directory called merge.
Test1.txt
hadoopexam.com Hadoop Training 1
Test2.txt
www.hadoopexam.com Hadoop YARN Training
Test3.txt
http://hadoopexam.com Amazon WebService Training
Now you run the following command
hadoop fs -getmerge merge/ output1.txt
What is the content of the output1.txt file?

  : Let's assume you have the following files in the HDFS directory called merge.
1.
hadoopexam.com Hadoop Training 1
www.hadoopexam.com Hadoop YARN Training
http://hadoopexam.com Amazon WebService Training

2.
www.hadoopexam.com Hadoop YARN Training
hadoopexam.com Hadoop Training 1
http://hadoopexam.com Amazon WebService Training

3. Access Mostly Uused Products by 50000+ Subscribers
It could be any random order
4.
www.hadoopexam.com Hadoop YARN Traininghadoopexam.com Hadoop Training 1http://hadoopexam.com Amazon WebService Training

Correct Answer : Get Latest Questions and Answer :

getmerge
Usage: hadoop fs -getmerge <src> <localdst> [addnl]
Takes a source directory and a destination file as input and concatenates the files in src into the destination local file. Optionally, addnl can be set to enable
adding a newline character at the end of each file.






Question : What happens when a TaskTracker does not send back a heartbeat to the JobTracker?


 : What happens when a TaskTracker does not send back a heartbeat to the JobTracker?
1. JobTracker re-tries 3 times to submit the same Job

2. Job Tracker marks Task Tracker as down and won't schedule subsequent tasks.

3. Access Mostly Uused Products by 50000+ Subscribers

4. 2 and 3

5. 1,2,3


Correct Answer : Get Latest Questions and Answer :
Explanation: When a client application submits a job, the JobTracker talks to the NameNode to determine the location of the data.
The JobTracker locates TaskTracker nodes with available slots at or near the data.
The JobTracker submits the work to the chosen TaskTracker nodes.
The TaskTracker nodes are monitored through the heartbeats they send; if a TaskTracker does not send heartbeat signals for a few minutes,
it is deemed to have failed and its work is rescheduled on a different TaskTracker.
When the work is completed, the JobTracker updates its status.

The chain is
JobTracker ---> TaskTracker ----> Task. If a task fails, the failure is reported to the TaskTracker, which in turn updates the JobTracker. The task is then started on
another node to avoid repeated failure.



Related Questions


Question :Is there a map input format?

  :Is there a map input format?
1. Yes, but only in Hadoop 0.22+
2. Yes, there is a special format for map files
3. Access Mostly Uused Products by 50000+ Subscribers
4. Both 2 and 3 are correct answers.




Question : What is the most important feature of MapReduce?


  : What is the most important feature of MapReduce?
1. Ability to store large amounts of data
2. Ability to process data on a cluster of machines without copying all the data over
3. Access Mostly Uused Products by 50000+ Subscribers
4. Ability to process large amounts of data in parallel




Question :What is the role of the namenode (MRv)?
  :What is the role of the namenode (MRv)?
1. Namenode splits big files into smaller blocks and sends them to different datanodes
2. Namenode is responsible for assigning names to each slave node so that they can be identified by the clients
3. Access Mostly Uused Products by 50000+ Subscribers
4. Both 2 and 3 are valid answers





Question : What happens on the namenode when a client tries to read a data file?


  : What happens on the namenode when a client tries to read a data file?
1. The namenode will look up the information about the file in the edit file and then retrieve the remaining
information from the filesystem memory snapshot
2. The namenode is not involved in retrieving the data file, since the data is stored on datanodes
3. Access Mostly Uused Products by 50000+ Subscribers
4. None of these answers are correct
Ans : 1
Exp : Since the namenode needs to support a large number of clients, the primary namenode only sends back the location information for the data. The datanode itself is
responsible for the retrieval.
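A minimal read sketch against the FileSystem API (the path below is hypothetical) illustrates this: the client simply opens the file, the namenode is consulted for block locations behind the scenes, and the bytes are streamed from the datanodes.

// Reading an HDFS file: the namenode supplies block locations,
// the datanodes supply the actual bytes (the path is hypothetical).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataInputStream in = fs.open(new Path("/user/training/sample.txt"));
        IOUtils.copyBytes(in, System.out, 4096, true);   // bytes are streamed from the datanodes
    }
}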





Question :What mode(s) can Hadoop code be run in?


  : What mode(s) can Hadoop code be run in?
1. Hadoop can be deployed in distributed mode only
2. Hadoop can be deployed in stand-alone mode or distributed mode
3. Access Mostly Uused Products by 50000+ Subscribers
4. None of these are applicable modes for Hadoop
Ans: 3
Exp : Hadoop was specifically designed to be deployed on a multi-node cluster. However, it can also be deployed on a single machine, as a single process, for testing purposes.




Question :What is the role of the secondary namenode?


  : What is the role of the secondary namenode?
1. Secondary namenode is a backup namenode that will serve requests if primary namenode goes down
2. Secondary namenode performs CPU intensive operation of combining edit logs and current filesystem snapshots
3. Access Mostly Uused Products by 50000+ Subscribers
4. There is no secondary namenode
Ans : 2
Exp : The secondary namenode was separated out as a process due to its CPU-intensive operations and the additional requirements for metadata back-up.






Question :What are the side effects of not running a secondary name node?

  : What are the side effects of not running a secondary namenode?
1. The cluster performance will degrade over time since edit log will grow bigger and bigger
2. The primary namenode will become overloaded and response time will be slower.
3. Access Mostly Uused Products by 50000+ Subscribers
4. The only possible impact is that, when there is an outage, a failover to the secondary namenode will not occur. This is a rare occurrence.
Ans :1
Exp : If the secondary namenode is not running at all, the edit log will grow significantly and it will slow the system down. Also, the system will go into safemode for an
extended time since the namenode needs to combine the edit log and the current filesystem checkpoint image.





Question : What happens if a datanode loses network connection for a few minutes?


  : What happens if a datanode loses network connection for a few minutes?
1. The namenode will detect that a datanode is not responsive and will start replication of the data from remaining replicas. When datanode comes back online, administrator will need to manually delete the extra replicas
2. All data will be lost on that node. The administrator has to make sure the proper data distribution between nodes
3. Access Mostly Uused Products by 50000+ Subscribers
4. The namenode will detect that a datanode is not responsive and will start replication of the data from remaining replicas. When datanode comes back online, the extra replicas will be deleted

Ans : 4
Exp : The replication factor is actively maintained by the namenode. The namenode monitors the status of all datanodes and keeps track of which blocks are located on each node. The
moment a datanode becomes unavailable, it triggers replication of the data from the existing replicas. However, if the datanode comes back up, the over-replicated data will be
deleted. Note: the data might be deleted from the original datanode.




Question : What happens if one of the datanodes has a much slower CPU? How will it affect the performance of the cluster?
  : What happens if one of the datanodes has a much slower CPU? How will it affect the performance of the cluster?
1. The task execution will be as fast as the slowest worker.
However, if speculative execution is enabled, the slowest worker will not have such a big impact
2. The slowest worker will significantly impact job execution time. It will slow everything down
3. Access Mostly Uused Products by 50000+ Subscribers
4. It depends on the level of priority assigned to the task. All high priority tasks are executed in parallel twice. A slower datanode would therefore be bypassed. If task is not high priority, however, performance will be affected.
Ans : 1
Exp : Hadoop was specifically designed to work with commodity hardware. Speculative execution helps to offset slow workers: multiple instances of the same task are
created, the job tracker takes into consideration the first result to finish, and the remaining instances of the task are killed.
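As a minimal sketch (the job details are hypothetical), speculative execution can be toggled per job through the MRv1 JobConf:

// Toggling speculative execution on an MRv1 job (a minimal sketch; job details are hypothetical).
import org.apache.hadoop.mapred.JobConf;

public class SpeculativeConf {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.setJobName("hypothetical-job");
        conf.setMapSpeculativeExecution(true);      // allow duplicate map attempts on slow nodes
        conf.setReduceSpeculativeExecution(false);  // but run each reduce attempt only once
    }
}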


Question : If you have a file of 128 MB and the replication factor is set to 3,
how many blocks can you find on the cluster that will correspond to that file (assuming the default Apache configuration)?
  : If you have a file of 128 MB and the replication factor is set to 3, how many blocks can you find on the cluster that will correspond to that file?
1. 3
2. 6
3. Access Mostly Uused Products by 50000+ Subscribers
4. 12
Ans : 2
Exp : Based on the configuration settings, the file will be divided into multiple blocks according to the default block size of 64 MB: 128 MB / 64 MB = 2 blocks. Each block will be replicated
according to the replication factor setting (default 3): 2 * 3 = 6 blocks.



Question : What is the replication factor?
  : What is the replication factor?
1. Replication factor controls how many times the namenode replicates its metadata
2. Replication factor creates multiple copies of the same file to be served to clients
3. Access Mostly Uused Products by 50000+ Subscribers
4. None of these answers are correct.
Ans : 3
Exp : Data is replicated in the Hadoop cluster based on the replication factor. A high replication factor helps guarantee data availability in the event of failure.



Question : What daemons run on Master nodes?
  : What daemons run on Master nodes?
1. NameNode, DataNode, JobTracker and TaskTracker
2. NameNode, DataNode and JobTracker
3. Access Mostly Uused Products by 50000+ Subscribers
4. NameNode, Secondary NameNode, JobTracker, TaskTracker and DataNode
Ans : 3
Exp : Hadoop is comprised of five separate daemons, and each of these daemons runs in its own JVM. NameNode, Secondary NameNode and JobTracker run on master nodes. DataNode and
TaskTracker run on each slave node.




Question : What is the role of the jobtracker in a Hadoop cluster?


  : What is the role of the jobtracker in a Hadoop cluster?
1. The jobtracker is responsible for scheduling tasks on slave nodes, collecting results, retrying failed tasks
2. The job tracker is responsible for job monitoring on the cluster, sending alerts to master nodes
3. Access Mostly Uused Products by 50000+ Subscribers
4. Both 1 and 3 are valid answers
Ans :1
Exp : The job tracker is the main component of MapReduce execution. It controls the division of the job into smaller tasks, submits tasks to individual tasktrackers, tracks
the progress of the jobs and reports results back to the calling code.


Question : How does the Hadoop cluster tolerate datanode failures?


  : How does the Hadoop cluster tolerate datanode failures?
1. Failures are anticipated. When they occur, the jobs are re-executed.
2. Datanodes talk to each other and figure out what needs to be re-replicated if one of the nodes goes down
3. Access Mostly Uused Products by 50000+ Subscribers
4. Since Hadoop is designed to run on commodity hardware, datanode failures are expected. The namenode keeps track of all available datanodes and actively
maintains the replication factor on all data.
Ans : 4
Exp : The namenode actively tracks the status of all datanodes and acts immediately if the datanodes become non-responsive. The namenode is the central "brain" of the HDFS and
starts replication of the data the moment a disconnect is detected.



Question : In the Reducer, the MapReduce API has an iterator over writable values. Calling the next() method

  : In the Reducer, the MapReduce API has an iterator over writable values. Calling the next() method
1. Returns a reference to a different Writable object each time.
2. Returns a reference to a writable object from an object pool.
3. Access Mostly Uused Products by 50000+ Subscribers
4. Returns a reference to the same writable object if the next value is the same as the previous value, or a new writable object otherwise.
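In practice the old-API iterator reuses Writable instances, so a reducer that keeps values beyond the current iteration must copy them; a minimal sketch (the class name is illustrative):

// The MapReduce framework reuses the same Writable instance on each call to
// values.next(), so copy any value you need to keep beyond the current iteration.
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class CopyValuesReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        List<IntWritable> kept = new ArrayList<IntWritable>();
        while (values.hasNext()) {
            // new IntWritable(...) makes a defensive copy; keeping the iterator's
            // object directly would leave every list entry pointing at one instance.
            kept.add(new IntWritable(values.next().get()));
        }
        output.collect(key, new IntWritable(kept.size()));
    }
}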



Question : When is the reduce method called in a MapReduce job?

  : When is the reduce method called in a MapReduce job?
1. Reducers start copying intermediate key-value pairs from each mapper as soon as it completes. The reduce method is called only after
all intermediate data has been copied and sorted.
2. Reduce methods and map methods all start at the beginning of a job, in order to provide optimal performance for map-only or reduce-only jobs.
3. Access Mostly Uused Products by 50000+ Subscribers
as the intermediate key-value pairs start to arrive.
4. It depends on the configuration which is done in JobConf object
Ans :1


Question : What types of algorithms are difficult to express in MapReduce?
  : What types of algorithms are difficult to express in MapReduce?
1. Large scale graph algorithms
2. When data needs to be shared among the nodes
3. Access Mostly Uused Products by 50000+ Subscribers
4. For text Analysis on large web data
Ans : 2
Exp : Hadoop was designed to run on a large number of machines that don't share memory or disks, following the shared-nothing model. All processing is done in self-contained units
within the cluster, communicating over a common network but sharing no computing resources. The software breaks large datasets into smaller pieces and spreads them across the different
servers. You run a job by querying each of the servers in the cluster, which compile the data and deliver it back to you, leveraging each server's processing power.


Question : You have written a Mapper which invokes the following calls to OutputCollector.collect():

output.collect(new Text("Square"), new Text("Red"));
output.collect(new Text("Circle"), new Text("Yellow"));
output.collect(new Text("Square"), new Text("Yellow"));
output.collect(new Text("Trangle"), new Text("Red"));
output.collect(new Text("square"), new Text("Green"));

How many times is the reduce method going to be called?

  : How many times is the reduce method going to be called?
1. 2
2. 3
3. Access Mostly Uused Products by 50000+ Subscribers
4. 5


Ans : 4
Exp : Here the lowercase 's' in "square" is used intentionally. Text keys are case-sensitive, so there are four distinct keys (Square, Circle, Trangle, square) and the reduce method is called four times.





Question : While writing a combiner which takes Text keys and IntWritable values as input and emits a Text key and IntWritable values, which interface needs to be implemented?
1. Reducer (Text, IntWritable, Text, IntWritable)
2. Mapper (Text, IntWritable, Text, IntWritable)
3. Combiner (Text, IntWritable, Text, IntWritable)
4. Reducer (Text, IntWritable, Text, Text)

  : While writing a combiner which takes Text keys and IntWritable values as input and emits a Text key and IntWritable values, which interface needs to be implemented?
1. 1
2. 2
3. Access Mostly Uused Products by 50000+ Subscribers
4. 4
Ans :1
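A minimal sketch of such a combiner against the old (org.apache.hadoop.mapred) API, summing the IntWritable counts per Text key; the class name and job wiring are illustrative:

// A combiner is written against the Reducer interface: here it sums
// IntWritable counts per Text key (illustrative WordCount-style example).
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class SumCombiner extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}

// Wiring it into a job:
//   JobConf conf = new JobConf();
//   conf.setCombinerClass(SumCombiner.class);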


Question : There is a Hadoop cluster with DataNodes, each with a single TB hard drive, and it utilizes all disk capacity for HDFS,
with no space for MapReduce. With the replication setting as 3, what is the storage capacity of the full cluster?



  : There is a Hadoop cluster with DataNodes, each with a single TB hard drive; with the replication setting as 3, what is the storage capacity of the full cluster?
1. 10 TB
2. 20 TB
3. Access Mostly Uused Products by 50000+ Subscribers
4. It can be between 10 TB to 30 TB
Ans : 3


Question :

Which of the following is correct for the TextInputFormat class?


  : Which of the following is correct for the TextInputFormat class?
1. An InputFormat for plain text files. Files are broken into lines.
Either linefeed or carriage-return is used to signal end of line. Keys are the position in the file, and values are the line of text.
2. It can also work for binary file
3. Access Mostly Uused Products by 50000+ Subscribers
4. It is unable to find the end of the line.
Ans :1
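As a minimal illustration (the class name and job wiring are hypothetical), a mapper fed by TextInputFormat receives the byte offset of each line as a LongWritable key and the line itself as a Text value:

// With TextInputFormat the mapper's input key is the line's byte offset
// (LongWritable) and the input value is the line itself (Text).
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

public class LineMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, LongWritable> output, Reporter reporter)
            throws IOException {
        output.collect(line, offset);   // emit each line with the offset it was found at
    }

    public static JobConf configure(JobConf conf) {
        conf.setInputFormat(TextInputFormat.class);   // the default input format
        conf.setMapperClass(LineMapper.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);
        return conf;
    }
}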


Question : What does the reducer emit?


  : What does the reducer emit?
1. Final key-value pairs as configured in the JobConf object
2. One final key-value pair per input key
3. Access Mostly Uused Products by 50000+ Subscribers
4. None of the above
Ans : 2


Question : The Hadoop framework provides a mechanism for coping with machine issues such as faulty configuration or impending hardware failure. MapReduce detects
that one or a number of machines are performing poorly and starts more copies of a map or reduce task. All the tasks run simultaneously
and the one that finishes first is used. Which term describes this behavior?


  : Which term describes this behavior?
1. Partitioning
2. Combining
3. Access Mostly Uused Products by 50000+ Subscribers
4. Speculative Execution
Ans : 4



Question : By using the hadoop fs -put command to write a MB file using a MB block size, can another user read the already-written blocks while the file is still half written?
  : While the file is half written, can another user read the already-written blocks?
1. It will throw an exception
2. The file blocks which are already written would be accessible
3. Access Mostly Uused Products by 50000+ Subscribers
4. Until the whole file is copied, nothing is accessible.
Ans :4
Exp : While writing a file of 528 MB size using the following command
hadoop fs -put tragedies_big4 /user/training/shakespeare/
we tried to read the file using the following command, and the output is below.
[hadoopexam@localhost ~]$ hadoop fs -cat /user/training/shakespeare/tragedies_big4
cat: "/user/training/shakespeare/tragedies_big4": No such file or directory
The same result was returned on every attempt until the put command finished; only then were we able to "cat" this file.


Question : What happens when keys and values are passed to the reducers during the standard sort and shuffle phase of MapReduce?

  : What happens when keys and values are passed to the reducers during the standard sort and shuffle phase of MapReduce?
1. Keys are presented to the reducer in sorted order.
2. Keys and Values both are presented in sorted order.
3. Access Mostly Uused Products by 50000+ Subscribers
4. Keys and values both are presented in Random Order.
Ans : 1



Question : In which scenario is MapReduce not suitable?


  : In which scenario is MapReduce not suitable?
1. Text mining on unstructured documents
2. Analyzing web documents
3. Access Mostly Uused Products by 50000+ Subscribers
4. for a large computation of financial risk modeling and performance analysis.
Ans : 3



Question : What is a BloomFilter?
  : What is a BloomFilter?
1. It is a data structure
2. A bloom filter is a compact representation of a set that supports only containment (membership) queries.
3. Access Mostly Uused Products by 50000+ Subscribers
4. All of the above
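Hadoop ships a Bloom filter implementation in the org.apache.hadoop.util.bloom package; the sketch below (vector size and hash count are arbitrary example values) shows the membership-query behaviour:

// Hadoop's built-in Bloom filter: a compact set representation that can only
// answer "might contain" / "definitely does not contain" queries.
// (vector size and hash count below are arbitrary example values)
import org.apache.hadoop.util.bloom.BloomFilter;
import org.apache.hadoop.util.bloom.Key;
import org.apache.hadoop.util.hash.Hash;

public class BloomSketch {
    public static void main(String[] args) {
        BloomFilter filter = new BloomFilter(1024, 5, Hash.MURMUR_HASH);
        filter.add(new Key("hadoopexam.com".getBytes()));

        // true (possibly a false positive); false means definitely absent
        System.out.println(filter.membershipTest(new Key("hadoopexam.com".getBytes())));
        System.out.println(filter.membershipTest(new Key("example.org".getBytes())));
    }
}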





Question : What is the default replication factor in HDFS?
  : What is the default replication factor in HDFS?
1. 1
2. 2
3. Access Mostly Uused Products by 50000+ Subscribers
4. 4