
MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : Match the following in the case of YARN

1. YARN Resource Manager
2. YARN Node Managers
3. MapReduce Application Master

a. launches and monitors the tasks of jobs
b. allocates the cluster resources to jobs
c. coordinates the tasks running in the MapReduce job

1. 1-a, 2-b, 3-c
2. 1-b, 2-a, 3-c
3. Access Mostly Uused Products by 50000+ Subscribers
4. 1-a, 2-c, 3-b

Correct Answer : 2


Explanation: Components of the MapReduce job flow:
A MapReduce job flow on YARN involves the components below.
A client node, which submits the MapReduce job.
The YARN Resource Manager, which allocates the cluster resources to jobs.
The YARN Node Managers, which launch and monitor the tasks of jobs.
The MapReduce Application Master, which coordinates the tasks running in the MapReduce job. The application master and the MapReduce tasks run in containers that are scheduled by the
resource manager and managed by the node managers.
The HDFS file system is used for sharing job files between the above entities.

You can also refer to the Advanced Hadoop YARN Training by HadoopExam.com.




Question :
At line number 4 you replace the statement with
this.conf = new Configuration(otherConf);
where otherConf is an object
of the Configuration class.
What is the result?
1. A new configuration with the same settings cloned from another.
2. It will give a runtime error
3. Access Mostly Uused Products by 50000+ Subscribers
Ans : 1
Exp : A new configuration with the same settings cloned from another.

Configuration()
A new configuration.
Configuration(boolean loadDefaults)
A new configuration where the behavior of reading from the default resources can be turned off.
Configuration(Configuration other)
A new configuration with the same settings cloned from another.
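For illustration, here is a minimal runnable sketch of the copy constructor in action (the property name custom.key is made up for this example):

import org.apache.hadoop.conf.Configuration;

public class ConfCloneDemo {
    public static void main(String[] args) {
        Configuration otherConf = new Configuration();  // loads the default resources
        otherConf.set("custom.key", "custom-value");    // made-up property for the demo

        // Copy constructor: a new configuration with the same settings cloned from otherConf
        Configuration conf = new Configuration(otherConf);
        System.out.println(conf.get("custom.key"));     // prints "custom-value"
    }
}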


Question : Suppose that your job's input is a (huge) set of word tokens and their number of occurrences (word count),
and that you want to sort them by number of occurrences. Which one of the following classes will help you to get a globally sorted file?
1. Combiner
2. Partitioner
3. Access Mostly Uused Products by 50000+ Subscribers
4. By default, all the files are sorted.

Ans : 2
Exp : It is possible to produce a set of sorted files that, if concatenated,
would form a globally sorted file. The secret to doing this is to use a partitioner that
respects the total order of the output. For example, if we had four partitions, we could put keys
for temperatures less than -10 C in the first partition, those between -10 C and 0 C in the second,
those between 0 C and 10 C in the third, and those over 10 C in the fourth.
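As a concrete illustration, here is a minimal sketch of such a total-order-respecting partitioner for the temperature example (the IntWritable key type and the fixed four-partition layout are assumptions of this sketch; Hadoop also ships a generic TotalOrderPartitioner for this purpose):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes keys so that partition boundaries respect the total order of temperatures.
public class TemperatureRangePartitioner extends Partitioner<IntWritable, Text> {
    @Override
    public int getPartition(IntWritable temperature, Text value, int numPartitions) {
        int t = temperature.get();
        if (t < -10) return 0;   // below -10 C
        if (t < 0)   return 1;   // -10 C to 0 C
        if (t < 10)  return 2;   // 0 C to 10 C
        return 3;                // over 10 C (assumes numPartitions == 4)
    }
}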



Question :
Which of the following could
safely replace line number 9?
1. Job job = new Job();
2. Job job = new Job(conf);
3. Both 1 and 2 can be used
4. You cannot replace this line with either 1 or 2
Ans : 3
Exp : Both 1 and 2 are correct. However, omitting conf means the custom configuration is ignored; the second argument of the Job constructor supplies a custom job name.
If you don't provide it, the framework uses a default job name.
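A short sketch of the constructor variants being discussed (the property and job name are made up; these constructors are from the era of this question and were later deprecated in favor of Job.getInstance):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobConstructorDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("some.custom.property", "value");        // made-up custom setting

        Job defaults = new Job();                         // custom conf ignored, default job name
        Job withConf = new Job(conf);                     // picks up the custom configuration
        Job named    = new Job(conf, "my-mapreduce-job"); // custom conf plus a custom job name
    }
}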


Question :
If we are processing
input data from a database,
then at line 10 which of
the following is the correct
InputFormat for reading from a DB?
1. DataBaseInputFormat
2. DBMSInputFormat
3. DBInputFormat
4. Not Supported
Ans : 3
Exp : The DBInputFormat is an InputFormat class that allows you to read data from a database.
An InputFormat is Hadoop's formalization of a data source; it can mean files formatted in a particular way,
data read from a database, etc. DBInputFormat provides a simple method of scanning entire tables from a database,
as well as the means to read from arbitrary SQL queries performed against the database.
Most queries are supported, subject to a few limitations.
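A hedged sketch of wiring DBInputFormat into a driver; the JDBC URL, table and column names are hypothetical, and MyRecord stands for a class implementing DBWritable:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;

public class DbInputDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical JDBC driver, URL and credentials
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
            "jdbc:mysql://dbhost/mydb", "user", "password");

        Job job = new Job(conf, "db-import");
        job.setJarByClass(DbInputDriver.class);
        job.setInputFormatClass(DBInputFormat.class);
        // Scan the hypothetical "employees" table, ordered by id;
        // MyRecord is a hypothetical DBWritable implementation
        DBInputFormat.setInput(job, MyRecord.class, "employees",
            null /* conditions */, "id" /* orderBy */, "id", "name");
    }
}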


Question :
At line number 13 you either set the number of reducers to 0,
or you set the Reducer class to IdentityReducer. Then
which of the following
statements is correct?
1. In both cases the behavior is the same
2. With 0 reducers, the reduce step will be skipped and the mapper output will be the final output
3. With IdentityReducer, the map output is sorted and written out unchanged
4. 1 and 3 both are correct
5. 2 and 3 both are correct

Ans : 5
Exp : If you do not need sorting of the map results, you set 0 reducers and the job is called map-only.
If you need to sort the map results but do not need any aggregation, you choose the identity reducer.
There is a third case: if we do need aggregation, we need a real reducer.
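A sketch of the two settings being contrasted; note that in the org.apache.hadoop.mapreduce API the base Reducer class itself behaves as an identity reducer:

// Case 1: map-only job; shuffle and sort are skipped, map output is the final output
job.setNumReduceTasks(0);

// Case 2: identity reduce; map output is shuffled and sorted, then written out unchanged
job.setReducerClass(Reducer.class);
job.setNumReduceTasks(1);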



Question : When you are implementing a secondary sort (sorting based on values), the following output is produced as the key part of the mapper:

2001 24.3
2002 25.5
2003 23.6
2004 29.4
2001 21.3
2002 24.5
2003 25.6
2004 26.4

.
.
2014 21.0

Now you want the output for the same year to go to the same reducer. Which of the following will help you to do so?

1. CustomPartitioner
2. Group Comparator
3. Access Mostly Uused Products by 50000+ Subscribers
4. By implementing a custom WritableComparator
5. By using a single reducer

Ans : 2
Exp : The map output key is year plus temperature, to achieve the sorting.
Unless you define a grouping comparator that uses only the year part of the map output key,
you cannot make all records of the same year go to the same reduce method call.
By partitioning on the year you'll get all the data for
a year in the same reducer,
so the sort comparator will effectively sort the data for each year by the temperature.
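A minimal sketch of such a grouping comparator, assuming a hypothetical composite key class YearTemperaturePair with a getYear() accessor:

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Groups composite keys by year only, so one reduce call sees a whole year's records.
public class YearGroupingComparator extends WritableComparator {
    protected YearGroupingComparator() {
        super(YearTemperaturePair.class, true);  // hypothetical composite key class
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        YearTemperaturePair left  = (YearTemperaturePair) a;
        YearTemperaturePair right = (YearTemperaturePair) b;
        return Integer.compare(left.getYear(), right.getYear());
    }
}

// In the driver:
// job.setGroupingComparatorClass(YearGroupingComparator.class);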




Question :
What is the use of
job.setJarByClass(MapReduceJob.class)
at line number 16?
1. This method sets the jar file in which each node will look for the Mapper and Reducer classes
2. This is used to define which is the Driver class
3. Access Mostly Uused Products by 50000+ Subscribers
4. 1 and 2 both are correct
Ans : 1
Exp : This method sets the jar file in which each node will look for the Mapper and Reducer classes.
It does not create a jar from the given class; rather, it identifies the jar containing the given class.
And yes, that jar file is "executed" (really, the Mapper and Reducer in that jar file are executed) for the MapReduce job.



Question :
At line number 18, if the path "/out"
already exists in HDFS, then what happens?
1. Hadoop will delete this directory, create a new empty directory, and after processing put all the output in this directory
2. It will write the new data into the existing directory without deleting the existing data in this directory
3. It will throw an exception
4. It will overwrite the existing content with new content
Ans : 3
Exp : It will throw an exception, because Hadoop checks the input and output specifications before running any new job.
This avoids already existing data being overwritten.
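If you do intend to overwrite, a common workaround is to delete the output path in the driver before submitting the job; a sketch, assuming the /out path from the question (conf and job are assumed to exist already):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver fragment
Path out = new Path("/out");
FileSystem fs = FileSystem.get(conf);
if (fs.exists(out)) {
    fs.delete(out, true);   // recursive delete of the previous output
}
FileOutputFormat.setOutputPath(job, out);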


Question :
If you remove both
lines 10 and 11 from
this code, then what happens?
1. It will throw compile time error
2. Program will run successfully but Output file will not be created
3. Program will run without any issue
Ans : 3
Exp : As both are the default input and output formats, the program will run without any issue.
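Presumably lines 10 and 11 looked something like the fragment below; since these match the framework defaults, removing them changes nothing:

// Explicitly setting what the framework already uses by default
job.setInputFormatClass(TextInputFormat.class);     // TextInputFormat is the default InputFormat
job.setOutputFormatClass(TextOutputFormat.class);   // TextOutputFormat is the default OutputFormat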


Question :
If you replace line 19,
return job.waitForCompletion(true) ? 1 : 0;
with
job.submit();
then which statement is correct?
1. In both cases the MapReduce job will run successfully
2. With waitForCompletion, the job is submitted to the cluster and the call waits for it to finish
3. Access Mostly Uused Products by 50000+ Subscribers
4. All of the above are correct

Correct Answer : 4

Explanation:
waitForCompletion submits the job to the cluster and waits for it to finish.
submit submits the job to the cluster and returns immediately.
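A sketch contrasting the two calls (driver fragment; job is assumed to exist):

// Blocking: submit the job and wait; the boolean argument prints progress to the console
boolean success = job.waitForCompletion(true);
System.exit(success ? 0 : 1);

// Non-blocking: submit the job and return immediately; poll for completion yourself
job.submit();
while (!job.isComplete()) {   // isComplete() queries the cluster for the job status
    Thread.sleep(5000);
}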





Question :
In the above method, at line 1, if you replace
context.write(new Text(testCount.toString()), NullWritable.get());
with
context.write(testCount.toString(), NullWritable.get());
what would happen?
1. It would not work, because String is not directly supported
2. It would work, but it will not give good performance
3. Access Mostly Uused Products by 50000+ Subscribers
4. Code will not compile at all after this change
Ans : 2
Exp : The Text class stores text using standard UTF-8 encoding. It provides methods to serialize, deserialize,
and compare texts at the byte level. The type of the length is integer and it is serialized using the zero-compressed format.
In addition, it provides methods for string traversal without converting the byte array to a string.
It also includes utilities for serializing/deserializing a string, coding/decoding a string,
checking if a byte array contains valid UTF-8 code, and calculating the length of an encoded string.
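A related performance note, as a sketch: because Text is mutable, a single instance can be reused across calls instead of allocating a new Text per record (testCount is taken from the question's code):

// Reducer fragment: reuse one Text object across reduce() calls
private final Text outKey = new Text();

public void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
    outKey.set(testCount.toString());          // testCount as in the question's code
    context.write(outKey, NullWritable.get());
}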


Question : Which is the correct statement when you define the Partitioner poorly?
1. It has a direct impact on the overall performance of your job and can reduce the performance of the overall job
2. A poorly designed partitioning function will not evenly distribute the values over the reducers
3. Access Mostly Uused Products by 50000+ Subscribers
4. Both 1 and 2 are correct
5. All 1, 2 and 3 are correct

Ans : 4
Exp : First, it has a direct impact on the overall performance of your job: a poorly designed partitioning function will not evenly distribute
the load over the reducers, potentially losing all the benefit of the map/reduce distributed infrastructure.
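For contrast, the default HashPartitioner distributes keys evenly as long as hashCode() is well behaved; its core logic is essentially:

import org.apache.hadoop.mapreduce.Partitioner;

public class HashPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        // Masking the sign bit keeps the result non-negative
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}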


Question :
In the above code, if we replace
LongWritable with Long, then what
would happen? The input to this
job comes from a file.
1. Code will run, but will not produce the expected result
2. Code will not run, as the key has to be a WritableComparable
3. Access Mostly Uused Products by 50000+ Subscribers
4. It will throw java.lang.ClassCastException
Ans : 4
Exp : The key class of a mapper that reads text files is always LongWritable.
That is because it contains the byte offset of the current line, and this could easily overflow an integer.
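The corresponding mapper signature, as a sketch; the output types and trivial body are just for illustration:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// For text files the input key is always the LongWritable byte offset of the line
public class LineMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        context.write(line, offset);   // trivial body, just to show the types
    }
}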


Question : Select the correct statement regarding the reducer.
1. The number of reducers is defined as part of the job configuration
2. All values of the same key can be processed by multiple reducers
3. Access Mostly Uused Products by 50000+ Subscribers
4. 1, 2 and 3 are correct
5. 1 and 3 are correct

Correct Answer : Get Latest Questions and Answers




Related Questions


Question : Which of the following are correct options to pass different types of files to a MapReduce job?


1. hadoop jar --files

2. hadoop jar --libjars

3. Access Mostly Uused Products by 50000+ Subscribers

4. 1,3

5. 1,2,3



Question : How can you use the Java API to submit a Distributed Cache file to a job in a Driver class?

1. DistributedCache.addCacheFile()

2. DistributedCache.addCacheArchive()

3. Access Mostly Uused Products by 50000+ Subscribers

4. 1,2

5. 1,2,3


Question : When you use HBase as the source and sink for your MapReduce job, which statement is true?


1. Data are split based on region, and a map task will be launched for each region's data.

2. After the map tasks, partitions will be created and every occurrence of a key will be in the same partition. However, a partition can have multiple keys.

3. Access Mostly Uused Products by 50000+ Subscribers

4. 1,2

5. 1,2,3



Question : You have a MapReduce job which uses HBase as source and sink. It reads some stock market data: PRICE, DIVIDEND and VOLUME (all three are stored in
different column families). However, the data come from various vendors like Bloomberg, Reuters and Markit. Using the MapReduce job we filter out the most accurate data,
mark them as valid records and save them back in the same table with an updated flag value. The table name is "MARKET_DATA". You have written the following Driver code,
and you want to process data for the DIVIDEND column family.

Scan scan = new Scan();
scan.setMaxVersions();
scan.addFamily(Bytes.toBytes("AAAAA"));
XXXXX.initTableMapperJob(YYYYY, scan, CustomMapper.class, Text.class, LongWritable.class, job);
XXXXX.initTableReducerJob(YYYYY, CustomReducer.class, job);


Please put the proper class name and required values in place of AAAAA, XXXXX and YYYYY.

1. AAAAA->"DIVIDEND", XXXXX-> TableMapReduceUtil , YYYYY->"MARKET_DATA"
2. XXXXX->"DIVIDEND", AAAAA-> TableMapReduceUtil , YYYYY->"MARKET_DATA"
3. Access Mostly Uused Products by 50000+ Subscribers


Question : When you use the mapred API to run your job, select the statement which is true.


1. JobClient.submitJob() is an asynchronous call

2. JobClient.runJob() is a synchronous call

3. Access Mostly Uused Products by 50000+ Subscribers

4. 1,2

5. 2,3



Question : Which of the following is an ideal way to chain multiple jobs, where the chain also includes a non-MapReduce job?

1. JobControl

2. Oozie workflow

3. Access Mostly Uused Products by 50000+ Subscribers

4. Streaming