Question : All keys used for intermediate output from mappers must: 1. Implement a splittable compression algorithm. 2. Be a subclass of FileInputFormat. 3. Implement WritableComparable. 4. Override isSplitable. 5. Implement a comparator for speedy sorting.
Correct Answer : 3. Explanation: The MapReduce framework operates exclusively on <key, value> pairs; that is, the framework views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job, conceivably of different types. The key and value classes have to be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework.
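As a minimal sketch of such a key (the class name and fields are illustrative, not from the source), a custom type implementing WritableComparable satisfies both requirements: it can be serialized between the map and reduce phases, and the framework can sort it:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// A custom key type usable as intermediate mapper output: it is
// serializable (Writable) and sortable (Comparable), as the framework requires.
public class IpTimestampKey implements WritableComparable<IpTimestampKey> {
    private String ip = "";     // hypothetical fields for illustration
    private long timestamp;

    public IpTimestampKey() {}  // Hadoop needs a no-arg constructor for deserialization

    public IpTimestampKey(String ip, long timestamp) {
        this.ip = ip;
        this.timestamp = timestamp;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(ip);
        out.writeLong(timestamp);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        ip = in.readUTF();
        timestamp = in.readLong();
    }

    @Override
    public int compareTo(IpTimestampKey other) {
        int cmp = ip.compareTo(other.ip);
        return (cmp != 0) ? cmp : Long.compare(timestamp, other.timestamp);
    }

    // hashCode must be consistent with equals so that the default
    // HashPartitioner routes equal keys to the same reducer.
    @Override
    public int hashCode() {
        return ip.hashCode() * 31 + Long.hashCode(timestamp);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof IpTimestampKey)) return false;
        IpTimestampKey k = (IpTimestampKey) o;
        return ip.equals(k.ip) && timestamp == k.timestamp;
    }
}
```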
Question : Which Hadoop component is responsible for managing the distributed file system metadata?
Correct Answer : The NameNode. Explanation: Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine.
This impacted the total availability of the HDFS cluster in two major ways: In the case of an unplanned event such as a machine crash, the cluster would be unavailable until an operator restarted the NameNode. Planned maintenance events such as software or hardware upgrades on the NameNode machine would result in windows of cluster downtime.
The HDFS High Availability feature addresses the above problems by providing the option of running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. This allows a fast failover to a new NameNode in the case that a machine crashes, or a graceful administrator-initiated failover for the purpose of planned maintenance.
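As a sketch of how a client rides out a failover under such an Active/Passive pair (the nameservice "mycluster", the NameNode IDs "nn1"/"nn2", and the host names are illustrative placeholders; in a real deployment these properties live in hdfs-site.xml rather than in code):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HaClientSketch {
    public static void main(String[] args) throws Exception {
        // Set programmatically here only to keep the sketch self-contained.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://mycluster");
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2.example.com:8020");
        // The failover proxy provider lets the client discover which
        // NameNode is currently active and retry against the other one.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // The client addresses the logical nameservice, not a single host,
        // so a NameNode failover is transparent to it.
        FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
        System.out.println("Connected to: " + fs.getUri());
        fs.close();
    }
}
```

A graceful, administrator-initiated failover of the kind described above can then be triggered with `hdfs haadmin -failover nn1 nn2`.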
Question : You need to move a file titled "weblogs" into HDFS. When you try to copy the file, the copy fails. You know you have ample space on your DataNodes. Which action should you take to relieve this situation and store more files in HDFS?
1. Increase the block size on all current files in HDFS. 2. Increase the block size on your remaining files. 3. […] 4. Increase the amount of memory for the NameNode. 5. Increase the number of disks (or size) for the NameNode.
Correct Answer : 4. Explanation: NameNode: the core metadata server of Hadoop. This is the most critical piece of the system and, prior to HDFS High Availability, there could only be one of these. It stores both the file system image and the file system journal. The NameNode keeps all of the filesystem layout information (files, blocks, directories, permissions, etc.) and the block locations. The filesystem layout is persisted on disk and the block locations are kept solely in memory. When a client opens a file, the NameNode tells the client the locations of all the blocks in the file; the client then no longer needs to communicate with the NameNode for data transfer.
NameNode sizing: we recommend at least 8GB of RAM (the minimum is 2GB), preferably 16GB or more. A rough rule of thumb is 1GB of heap per 100TB of raw disk space; the actual requirement is closer to 1GB per million objects (files, directories, and blocks). The CPU requirement is any modern multi-core server CPU; typically the NameNode will only use 2-5% of your CPU. As this is a single point of failure, the most important requirement is reliable hardware rather than high-performance hardware. We suggest a node with redundant power supplies and at least 2 hard drives.
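By the rule of thumb above, a namespace of 10 million files, directories, and blocks needs roughly 10GB of NameNode heap; once that heap is exhausted, new files cannot be created even though the DataNode disks have free space, which is why adding NameNode memory is the remedy. For reference, a minimal sketch of the copy itself via the Java FileSystem API (the paths are illustrative placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyWeblogs {
    public static void main(String[] args) throws Exception {
        // Reads fs.defaultFS from the classpath configuration (core-site.xml).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Local source and HDFS destination; both paths are illustrative.
        Path src = new Path("/var/log/weblogs");
        Path dst = new Path("/user/hadoop/weblogs");

        // Equivalent to: hdfs dfs -put /var/log/weblogs /user/hadoop/weblogs
        fs.copyFromLocalFile(src, dst);
        System.out.println("Copied " + src + " to " + dst);
        fs.close();
    }
}
```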
Question : How does Hadoop process large volumes of data? 1. Hadoop uses a lot of machines in parallel. This optimizes data processing. 2. Hadoop was specifically designed to process large amounts of data by taking advantage of MPP hardware. 3. Hadoop ships the code to the data instead of sending the data to the code. 4. Hadoop uses sophisticated caching techniques on the NameNode to speed processing of data.
Question : What are sequence files and why are they important? 1. Sequence files are binary format files that are compressed and are splittable. They are often used in high-performance MapReduce jobs. 2. Sequence files are a type of file in the Hadoop framework that allows data to be sorted. 3. […] 4. All of the above.
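A minimal sketch of writing and then reading a block-compressed SequenceFile (the output path and record contents are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("/tmp/example.seq");  // illustrative path

        // Write a few key/value records; block compression keeps the file
        // compact while leaving it splittable for MapReduce.
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class),
                SequenceFile.Writer.compression(SequenceFile.CompressionType.BLOCK))) {
            for (int i = 0; i < 3; i++) {
                writer.append(new IntWritable(i), new Text("record-" + i));
            }
        }

        // Read the records back in order.
        try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                SequenceFile.Reader.file(path))) {
            IntWritable key = new IntWritable();
            Text value = new Text();
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        }
    }
}
```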