
MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : Which method of the FileSystem object is used for reading a file in HDFS?
1. open()
2. access()
3. …
4. None of the above

Correct Answer : 1

open() opens an FSDataInputStream at the indicated Path.
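
For illustration, a minimal sketch of reading an HDFS file via FileSystem.open(); the path /user/hadoop/sample.txt is a made-up example:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsRead {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // open() returns an FSDataInputStream positioned at the start of the file
            try (FSDataInputStream in = fs.open(new Path("/user/hadoop/sample.txt"));
                 BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }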




Question : How many methods does the Writable interface define?
1. 1
2. 2
3. …
4. 4

Correct Answer : 2

A serializable object which implements a simple, efficient serialization protocol, based on DataInput and DataOutput.

Any key or value type in the Hadoop Map-Reduce framework implements this interface. Writable defines exactly two methods: write(DataOutput out) and readFields(DataInput in).

Implementations typically also provide a static read(DataInput) method which constructs a new instance, calls readFields(DataInput) and returns the instance.
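
As a hedged illustration of those two methods, a minimal custom Writable (the YearTemperature type is invented for this example):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    public class YearTemperature implements Writable {
        private int year;
        private float temperature;

        // Method 1 of 2 declared by Writable: serialize the fields
        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(year);
            out.writeFloat(temperature);
        }

        // Method 2 of 2 declared by Writable: deserialize the fields
        @Override
        public void readFields(DataInput in) throws IOException {
            year = in.readInt();
            temperature = in.readFloat();
        }

        // Conventional static factory described in the Javadoc excerpt above
        public static YearTemperature read(DataInput in) throws IOException {
            YearTemperature w = new YearTemperature();
            w.readFields(in);
            return w;
        }
    }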






Question : Which of the following are features of Apache Hadoop?
1. Data Integration
2. Data Processing
3. …
4. All of the above

Correct Answer : …
Apache Hadoop is a framework for processing huge data volumes. It creates multiple child tasks to process data in parallel, and the creation and destruction of these child tasks is monitored by Hadoop itself.




Related Questions


Question : Which statement is true with respect to MapReduce 2.0 (MRv2) or YARN?
1. It is the newer version of MapReduce; using it, the performance of data processing can be increased.
2. The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker,
resource management and job scheduling or monitoring, into separate daemons.
3. …
4. All of the above
5. Only 2 and 3 are correct
Ans : 5
Exp : MapReduce has undergone a complete overhaul in hadoop-0.23 and we now have, what we call, MapReduce 2.0 (MRv2) or YARN.
The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker,
resource management and job scheduling or monitoring, into separate daemons. The idea is to have a global ResourceManager (RM)
and per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs.



Question : Which statement is true about the ApplicationsManager?

1. It is responsible for accepting job submissions
2. It negotiates the first container for executing the application-specific ApplicationMaster
and provides the service for restarting the ApplicationMaster container on failure.
3. …
4. All of the above
5. 1 and 2 are correct
Ans : 5
Exp : The ApplicationsManager is responsible for accepting job submissions, negotiating the first container for executing the application-specific ApplicationMaster, and providing the service for restarting the ApplicationMaster container on failure.



Question : Which tool is used to list all the blocks of a file?

1. hadoop fs
2. hadoop fsck
3. …
4. Not Possible
Ans : 2
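Exp : hadoop fsck reports the blocks of a file, e.g. hadoop fsck /user/hadoop/sample.txt -files -blocks -locations (the path is a made-up example). The same information is reachable programmatically; a minimal sketch:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListBlocks {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus status = fs.getFileStatus(new Path("/user/hadoop/sample.txt")); // made-up path
            // One BlockLocation per block, including the hosts that store its replicas
            for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println(block);
            }
        }
    }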






Question : Which two daemons typically run on each slave node in a Hadoop cluster running MapReduce v2 (MRv2) on YARN?

1. TaskTracker

2. Secondary NameNode

3. NodeManager

4. DataNode

5. ZooKeeper

6. JobTracker

7. NameNode

8. JournalNode


1. 1,2
2. 2,3
3. 3,4
4. 5,6
5. 7,8
Ans : 3
Exp : In MRv2/YARN, each slave (worker) node typically runs a NodeManager for compute alongside a DataNode for storage.


Question : How does the Hadoop framework determine the number of Mappers required for a MapReduce job on a cluster running MapReduce v2 (MRv2) on YARN?
1. The number of Mappers is equal to the number of InputSplits calculated by the client submitting the job
2. The ApplicationMaster chooses the number based on the number of available nodes

3. …
4. NodeManager where the job's HDFS blocks reside
5. The developer specifies the number in the job configuration
Ans : 1
Exp : The number of map tasks equals the number of InputSplits, which are calculated by the client when the job is submitted.




Question :

You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25
KB. Because your Hadoop cluster isn't optimized for storing and processing many small files, you
decide to do the following:
1. Group the individual images into a set of larger files
2. Use the set of larger files as input for a MapReduce job that processes them directly with
Python using Hadoop streaming
Which data serialization system gives you the flexibility to do this?

A. CSV
B. XML
C. HTML
D. Avro
E. Sequence Files
F. JSON

1. A,B
2. C,D
3. …
4. D,E
5. C,E


Question : Map the following in the case of YARN

1. YARN Resource Manager
2. YARN Node Managers
3. MapReduce Application Master

a. which launch and monitor the tasks of jobs
b. allocates the cluster resources to jobs
c. which coordinates the tasks running in the MapReduce job

1. 1-a, 2-b, 3-c
2. 1-b, 2-a, 3-c
3. …
4. 1-a, 2-c, 3-b
Ans : 2
Exp : The Resource Manager allocates the cluster resources to jobs, the Node Managers launch and monitor the tasks of jobs, and the MapReduce Application Master coordinates the tasks running in the MapReduce job.


Question :
At line number 4 you replace the line with
"this.conf = new Configuration(otherConf)"
where otherConf is an object of the Configuration class. What happens?
1. A new configuration with the same settings cloned from another.
2. It will give runtime error
3. …
Ans : 1
Exp : A new configuration with the same settings cloned from another.

Configuration() - A new configuration.
Configuration(boolean loadDefaults) - A new configuration where the behavior of reading from the default resources can be turned off.
Configuration(Configuration other) - A new configuration with the same settings cloned from another.
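
A minimal sketch of the copy constructor in action (the property name my.custom.key is invented):

    import org.apache.hadoop.conf.Configuration;

    public class ConfClone {
        public static void main(String[] args) {
            Configuration otherConf = new Configuration();
            otherConf.set("my.custom.key", "42"); // hypothetical property

            // Copy constructor: clones all settings from otherConf
            Configuration conf = new Configuration(otherConf);
            System.out.println(conf.get("my.custom.key")); // prints 42

            // Later changes to the clone do not affect the original
            conf.set("my.custom.key", "43");
            System.out.println(otherConf.get("my.custom.key")); // still 42
        }
    }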


Question : Suppose that your job's input is a (huge) set of word tokens and their number of occurrences (word count), and you want to sort them by number of occurrences. Which one of the following classes will help you get a globally sorted file?
1. Combiner
2. Partitioner
3. …
4. By Default all the files are sorted.

Ans : 2
Exp : It is possible to produce a set of sorted files that, if concatenated, would form a globally sorted file. The secret to doing this is to use a partitioner that respects the total order of the output. For example, if we had four partitions, we could put keys for temperatures less than -10 C in the first partition, those between -10 C and 0 C in the second, those between 0 C and 10 C in the third, and those over 10 C in the fourth.
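
Hadoop ships TotalOrderPartitioner for exactly this purpose; as a hand-rolled sketch of the idea, a partitioner that respects the total order of temperature keys (assuming IntWritable keys and a job configured with four reducers) might look like:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class TemperatureRangePartitioner extends Partitioner<IntWritable, IntWritable> {
        @Override
        public int getPartition(IntWritable key, IntWritable value, int numPartitions) {
            // Assumes numPartitions == 4; boundaries follow the example above,
            // so concatenating part-r-00000..00003 yields a globally sorted file
            int t = key.get();
            if (t < -10) return 0;  // below -10 C
            if (t < 0)   return 1;  // -10 C up to 0 C
            if (t < 10)  return 2;  // 0 C up to 10 C
            return 3;               // 10 C and above
        }
    }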



Question :
Which of the following could be replaced safely at line number 9?
1. Job job = new Job();
2. Job job = new Job(conf);
3. Job job = new Job(conf, "JobName");
4. You cannot replace this line with either 1 or 2
Ans : 3
Exp : Both 1 and 2 would compile and run; however, without conf the custom configuration is ignored, and the second argument supplies a custom job name. If you don't provide one, the framework assigns a default job name.
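
A minimal sketch of the three variants (Job.getInstance is the non-deprecated replacement for the Job constructors; the job name "wordcount" is a made-up example):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class JobSetup {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job a = Job.getInstance();                   // default configuration, default job name
            Job b = Job.getInstance(conf);               // custom configuration, default job name
            Job c = Job.getInstance(conf, "wordcount");  // custom configuration and custom job name
        }
    }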


Question :
If we are processing input data from a database, then at line 10 which of the following is the correct InputFormat for reading from a DB?
1. DataBaseInputFormat
2. DBMSInputFormat
3. DBInputFormat
4. Not Supported
Ans : 3
Exp : The DBInputFormat is an InputFormat class that allows you to read data from a database. An InputFormat is Hadoop's formalization of a data source; it can mean files formatted in a particular way, data read from a database, etc. DBInputFormat provides a simple method of scanning entire tables from a database, as well as the means to read from arbitrary SQL queries performed against the database. Most queries are supported, subject to a few limitations.
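
A hedged sketch of wiring DBInputFormat into a job; the record type, JDBC driver, connection details, table and column names are all invented for the example:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
    import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
    import org.apache.hadoop.mapreduce.lib.db.DBWritable;

    public class DbJobSetup {
        // Minimal record type: DBInputFormat values must implement Writable and DBWritable
        public static class MyRecord implements Writable, DBWritable {
            int id;
            public void write(DataOutput out) throws IOException { out.writeInt(id); }
            public void readFields(DataInput in) throws IOException { id = in.readInt(); }
            public void write(PreparedStatement st) throws SQLException { st.setInt(1, id); }
            public void readFields(ResultSet rs) throws SQLException { id = rs.getInt(1); }
        }

        static void configure(Job job) throws Exception {
            DBConfiguration.configureDB(job.getConfiguration(),
                    "com.mysql.jdbc.Driver",                         // hypothetical JDBC driver
                    "jdbc:mysql://dbhost/mydb", "user", "password"); // hypothetical connection details
            job.setInputFormatClass(DBInputFormat.class);
            // Read column "id" from the hypothetical table "employees", ordered by id
            DBInputFormat.setInput(job, MyRecord.class, "employees", null, "id", "id");
        }
    }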


Question :
At line number 13 you set the number of reducers to 1 and set the Reducer class to IdentityReducer. Which of the following statements is correct?
1. In both cases the behavior is the same
2. With 0 reducers, the reduce step will be skipped and the mapper output will be the final output
3. With 1 reducer and IdentityReducer, the map output is shuffled and sorted but written out without aggregation
4. 1 and 3 both are correct
5. 2 and 3 both are correct

Ans : 5
Exp : If you do not need the map results sorted, you set 0 reducers and the job is called map-only. If you need the map results sorted but do not need any aggregation, you choose the identity reducer. In the third case, where you do need aggregation, you need your own reducer.
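
A minimal sketch of the two non-aggregating configurations (in the new mapreduce API the base Reducer class is itself the identity reducer):

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Reducer;

    public class ReducerModes {
        static void mapOnly(Job job) {
            // Case 1: no sorting needed -- skip the reduce phase entirely
            job.setNumReduceTasks(0);
        }

        static void sortWithoutAggregation(Job job) {
            // Case 2: shuffle and sort, but pass values through unchanged;
            // the base Reducer class acts as the identity reducer
            job.setNumReduceTasks(1);
            job.setReducerClass(Reducer.class);
        }
    }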



Question : When you are implementing a secondary sort (sorting based on values), the following output is produced as the key part of the Mapper:

2001 24.3
2002 25.5
2003 23.6
2004 29.4
2001 21.3
2002 24.5
2003 25.6
2004 26.4

.
.
2014 21.0

Now you want the output for the same year to go to the same reduce call. Which of the following will help you do so?

1. CustomPartitioner
2. Group Comparator
3. …
4. By Implementing Custom WritableComparator
5. Using a single Reducer

Ans : 2
Exp : The map output key is the year plus the temperature, to achieve sorting. Unless you define a grouping comparator that uses only the year part of the map output key, you cannot make all records of the same year go to the same reduce method call. By partitioning on the year you'll get all the data for a year in the same reducer, and the comparator will then effectively sort the data for each year by the temperature.
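
A hedged sketch of such a grouping comparator, with an invented composite key (wired into the driver via job.setGroupingComparatorClass(YearTempKey.YearGroupingComparator.class)):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;

    // Invented composite key: map keys must implement WritableComparable
    public class YearTempKey implements WritableComparable<YearTempKey> {
        int year;
        float temp;

        public void write(DataOutput out) throws IOException { out.writeInt(year); out.writeFloat(temp); }
        public void readFields(DataInput in) throws IOException { year = in.readInt(); temp = in.readFloat(); }

        // Sort comparator: year first, then temperature
        public int compareTo(YearTempKey o) {
            int c = Integer.compare(year, o.year);
            return c != 0 ? c : Float.compare(temp, o.temp);
        }

        // Grouping comparator: considers the year only, so every record of one
        // year reaches the same reduce() call regardless of temperature
        public static class YearGroupingComparator extends WritableComparator {
            protected YearGroupingComparator() { super(YearTempKey.class, true); }

            @Override
            public int compare(WritableComparable a, WritableComparable b) {
                return Integer.compare(((YearTempKey) a).year, ((YearTempKey) b).year);
            }
        }
    }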




Question :
What is the use of job.setJarByClass(MapReduceJob.class) at line number 16?
1. This method sets the jar file in which each node will look for the Mapper and Reducer classes
2. This is used to define which is the Driver class
3. …
4. 1 and 2 both are correct
Ans : 1
Exp : This method sets the jar file in which each node will look for the Mapper and Reducer classes. It does not create a jar from the given class; rather, it identifies the jar containing the given class. And yes, that jar file is "executed" (really, the Mapper and Reducer in that jar file are executed) for the MapReduce job.



Question :
At line number 18, if the path "/out" already exists in HDFS, then

1. Hadoop will delete this directory, create a new empty directory, and after processing put all output in this directory
2. It will write new data into the existing directory without deleting the existing data in this directory
3. Hadoop will throw an exception and the job will not run
4. It will overwrite the existing content with new content
Ans : 3
Exp : It will throw an exception, because Hadoop checks the input and output specifications before running any new job. This avoids already existing data being overwritten.
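
A common driver-side workaround is to delete the output directory before submitting the job; a minimal sketch (destructive, so use with care):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CleanOutput {
        // Deletes /out if present so FileOutputFormat's existence check passes
        static void deleteIfExists(Configuration conf) throws Exception {
            FileSystem fs = FileSystem.get(conf);
            Path out = new Path("/out");
            if (fs.exists(out)) {
                fs.delete(out, true); // true = recursive
            }
        }
    }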


Question :
If you remove both line 10 and line 11 from this code, then what happens?
1. It will throw a compile-time error
2. Program will run successfully but the output file will not be created
3. Program will run successfully and produce the output as usual
Ans : 3
Exp : As both are the default input and output formats, the program will run without any issue.
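
Presumably lines 10 and 11 set TextInputFormat and TextOutputFormat (the listing is not shown here); since these are the framework defaults anyway, the two calls below are redundant:

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class DefaultFormats {
        static void configure(Job job) {
            // Redundant: TextInputFormat and TextOutputFormat are already
            // the defaults, so removing these lines changes nothing
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
        }
    }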


Question :
If you replace line 19,
return job.waitForCompletion(true) ? 1 : 0;
with
job.submit();
then which is the correct statement?
1. In both cases the MapReduce job will run successfully
2. With waitForCompletion, the job is submitted to the cluster and the call waits for it to finish
3. …
4. All of the above are correct
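
Both calls submit the job: waitForCompletion(true) blocks until the job finishes, printing progress because of the true argument, while submit() returns immediately and leaves monitoring to the caller. A minimal sketch:

    import org.apache.hadoop.mapreduce.Job;

    public class SubmitModes {
        static int runBlocking(Job job) throws Exception {
            // Submits and waits; "true" prints progress to the console
            return job.waitForCompletion(true) ? 1 : 0;
        }

        static void runAsync(Job job) throws Exception {
            job.submit(); // returns immediately
            // The caller can poll job.isComplete() / job.isSuccessful() later
        }
    }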


Question :
In the above method, at line 1, if you replace
context.write(new Text(testCount.toString()), NullWritable.get());
with
context.write(testCount.toString(), NullWritable.get());
what would happen?
1. It would not work, because String is not directly supported
2. It would work, but it will not give good performance
3. …
4. Code will not compile at all after this change
Ans : 2
Exp : The Text class stores text using standard UTF-8 encoding. It provides methods to serialize, deserialize, and compare texts at the byte level. The type of the length is integer and is serialized using zero-compressed format. In addition, it provides methods for string traversal without converting the byte array to a string. It also includes utilities for serializing/deserializing a string, encoding/decoding a string, checking if a byte array contains valid UTF-8 code, and calculating the length of an encoded string.
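
A minimal sketch of Text in use; note that a single instance can be reused via set(), which is one reason Writable types are friendlier to the framework than plain Strings:

    import org.apache.hadoop.io.Text;

    public class TextDemo {
        public static void main(String[] args) {
            // Text stores its contents as UTF-8 bytes and is reusable:
            // one instance can be re-set instead of allocating new objects
            Text t = new Text();
            t.set("hello");
            System.out.println(t.getLength()); // 5 (byte length of the UTF-8 encoding)
            t.set("world");
            System.out.println(t);             // world
        }
    }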


Question : Which is the correct statement when you poorly define the Partitioner?
1. It has a direct impact on the overall performance of your job and can reduce the performance of the overall job
2. A poorly designed partitioning function will not evenly distribute the values over the reducers
3. …
4. Both 1 and 2 are correct
5. All 1, 2 and 3 are correct

Ans : 4
Exp : First, it has a direct impact on the overall performance of your job: a poorly designed partitioning function will not evenly distribute the load over the reducers, potentially losing much of the benefit of the map/reduce distributed infrastructure.


Question :
In the above code, if we replace LongWritable with Long, what would happen? The input to this job comes from a file.

1. Code will run, but not produce result as expected
2. Code will not run as key has to be WritableComparable
3. …
4. It will throw java.lang.ClassCastException
Ans : 4
Exp : The key class of a mapper that maps text files is always LongWritable, because it contains the byte offset of the current line, and this could easily overflow an integer.
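
A minimal sketch of the expected mapper signature when the input comes from text files (the class name and output types are invented for the example):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // With the default TextInputFormat the input key is the byte offset of
    // each line, so the mapper's input key type must be LongWritable, not Long
    public class LineMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(line, offset); // echo each line with its byte offset
        }
    }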


Question : Select the correct statement regarding the Reducer


1. The number of reducers is defined as part of the job configuration
2. All values of the same key can be processed by multiple reducers.
3. …
4. 1,2 and 3 are correct
5. 1 and 3 are correct