
MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : Which of the following can you use in MRv1 to monitor jobs?
1. JobTracker or TaskTracker Web UIs

2. The metrics database (available through the MCS)


4. 1,2
5. 1,2,3


Correct Answer :
Explanation: Using the MCS to Monitor MRv1 jobs

You can use the MCS to show granular job and task information in a cluster.

To display this level of information in the MCS, you must configure the metrics database. The MCS can be used to display the metrics only for MRv1 jobs. To
view metrics for MRv2 (YARN) jobs, you can use the YARN Resource Manager WebUI.

The first time you log in to the MCS, you'll need to specify the URL for the metrics database (database-server:3306), the username, the password, and the name of the database (metrics).






Question : In the MCS, what information can you track about a job?
1. The time the job started executing

2. Percentage of map tasks executed

3. Percentage of reduce tasks executed

4. 2,3

5. 1,2,3

Correct Answer : 5
Explanation: In the MCS you can monitor all of the following:

Status of job (green indicates success and red indicates failure)
Name of job
User that submitted the job
The time the job started executing
Percentage of map tasks executed
Percentage of reduce tasks executed
The total duration between the time the job started executing and when it finished
The job id
The time the job was submitted (same or some time before the job started)
The time the job completed

You can dig into the details of a task by clicking the task id or primary attempt from the previous screen. The details of the task are displayed as follows:

status of the task
task attempt id
type of task (map or reduce)
progress (0-100%)
start time of the task
finish time of the task
time the shuffle phase ended
time the sort phase ended
duration of the task
node the task executed on
A link to the log file for the task
You can dig further into the task details by clicking the task attempt id.

You can display the log file associated with a given task by clicking the log link (from previous screen). The details in the log file include the following:

Standard out generated from this task
Standard error generated from this task
Syslog log file entries generated by this task
Profile output generated by this task
Debug script output generated by this task
Note that debug scripts are optional and must be configured to run.





Question : Which of the following problems can be solved using MapReduce?
1. Summarizing data

2. Filtering Data

3. Organizing Data

4. 2,3

5. 1,2,3


Correct Answer : 5
Explanation: All of the following problems can be solved with Hadoop:
1. Modeling true risk
2. Customer churn analysis
3. Recommendation
4. Ad targeting
5. PoS transaction analysis
6. Analyzing network data to predict failure
7. Threat analysis
8. Trade surveillance engine
9. Search quality
10. Data "sandbox"

To solve some of the above problems, you also have to perform the operations below:
A. Summarizing data
B. Filtering Data
C. Organizing Data



Related Questions


Question : Which statement is true with respect to MapReduce 2.0 (MRv2) or YARN?
1. It is the newer version of MapReduce; using it, the performance of data processing can be increased.
2. The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker,
resource management and job scheduling or monitoring, into separate daemons.
3. The idea is to have a global ResourceManager (RM) and a per-application ApplicationMaster (AM).
4. All of the above
5. Only 2 and 3 are correct
Ans : 5
Exp : MapReduce has undergone a complete overhaul in hadoop-0.23 and we now have, what we call, MapReduce 2.0 (MRv2) or YARN.
The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker,
resource management and job scheduling or monitoring, into separate daemons. The idea is to have a global ResourceManager (RM)
and per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs.



Question : Which statement is true about the ApplicationsManager?
1. It is responsible for accepting job submissions
2. It negotiates the first container for executing the application-specific ApplicationMaster
and provides the service for restarting the ApplicationMaster container on failure
4. All of the above
5. 1 and 2 are correct
Ans : 5
Exp : The ApplicationsManager is responsible for accepting job submissions, negotiating the first container
for executing the application-specific ApplicationMaster, and providing the service for restarting the
ApplicationMaster container on failure.



Question : Which tool is used to list all the blocks of a file?
1. hadoop fs
2. hadoop fsck
4. Not Possible
Ans : 2
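
For reference, a typical invocation that lists a file's blocks and their locations looks like this (the path is illustrative; -files, -blocks, and -locations are standard fsck options):

hadoop fsck /user/joe/data.txt -files -blocks -locations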






Question : Which two daemons typically run on each slave node in a Hadoop cluster running MapReduce v2 (MRv2) on YARN?

1. TaskTracker

2. Secondary NameNode

3. NodeManager

4. DataNode

5. ZooKeeper

6. JobTracker

7. NameNode

8. JournalNode


1. 1,2
2. 2,3
3. 3,4
4. 5,6
5. 7,8


Question : How does the Hadoop framework determine the number of Mappers required for a MapReduce job on a cluster running MapReduce v2 (MRv2) on YARN?
1. The number of Mappers is equal to the number of InputSplits calculated by the client submitting the job
2. The ApplicationMaster chooses the number based on the number of available nodes

4. NodeManager where the job's HDFS blocks reside
5. The developer specifies the number in the job configuration
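
For background, the number of map tasks is driven by the InputSplits that the InputFormat computes on the client; you can influence it indirectly through the split-size settings. A minimal sketch, assuming a Job object named job is already configured (the 64 MB figure is just an example):

import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// One map task is created per InputSplit. Capping the split size
// increases the number of splits (and therefore mappers) for large files.
FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);  // 64 MB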




Question :

You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25
KB. Because your Hadoop cluster isn't optimized for storing and processing many small files, you
decide to take the following actions:
1. Group the individual images into a set of larger files
2. Use the set of larger files as input for a MapReduce job that processes them directly with
Python using Hadoop streaming
Which data serialization system gives you the flexibility to do this?

A. CSV
B. XML
C. HTML
D. Avro
E. Sequence Files
F. JSON

1. A,B
2. C,D
4. D,E
5. C,E
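
As background for this question, one common way to group many small images into larger files is a SequenceFile keyed by file name, which a streaming job can then consume. A minimal sketch, where the output path, the file name, and the empty byte array standing in for real image bytes are all illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ImagePacker {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path out = new Path("/images/packed.seq");   // illustrative output path
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(out),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            // A real packer would loop over the JPEGs and read each one's
            // bytes from disk; here the bytes are a placeholder.
            byte[] jpegBytes = new byte[0];
            writer.append(new Text("img-000001.jpg"), new BytesWritable(jpegBytes));
        }
    }
}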


Question : Map the following in the case of YARN

1. YARN Resource Manager
2. YARN Node Managers
3. MapReduce Application Master

a. launches and monitors the tasks of jobs
b. allocates the cluster resources to jobs
c. coordinates the tasks running in the MapReduce job

1. 1-a, 2-b, 3-c
2. 1-b, 2-a, 3-c
4. 1-a, 2-c, 3-b


Question :
At line number 4 you replace the existing statement with
"this.conf = new Configuration(otherConf)"
where otherConf is an object of the Configuration class. What happens?
1. A new configuration with the same settings cloned from another.
2. It will give a runtime error
Ans : 1
Exp : A new configuration with the same settings cloned from another. The Configuration class provides three constructors:

Configuration() - a new configuration.
Configuration(boolean loadDefaults) - a new configuration where the behavior of reading from the default resources can be turned off.
Configuration(Configuration other) - a new configuration with the same settings cloned from another.
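
A minimal sketch of the cloning constructor in action (the property name is hypothetical):

import org.apache.hadoop.conf.Configuration;

Configuration otherConf = new Configuration();
otherConf.set("my.custom.key", "value");          // hypothetical property

// The clone starts with the same settings as otherConf; subsequent
// changes to one object do not affect the other.
Configuration conf = new Configuration(otherConf);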


Question : Suppose that your job's input is a (huge) set of word tokens and their number of occurrences (word count),
and that you want to sort them by number of occurrences. Which one of the following classes will help you to get a globally sorted file?
1. Combiner
2. Partitioner
4. By default, all the files are sorted.

Ans : 2
Exp : It is possible to produce a set of sorted files that, if concatenated,
would form a globally sorted file. The secret to doing this is to use a partitioner that
respects the total order of the output. For example, if we had four partitions, we could put keys
for temperatures less than -10 C in the first partition, those between -10 C and 0 C in the second,
those between 0 C and 10 C in the third, and those over 10 C in the fourth.
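
A minimal sketch of such a total-order partitioner for the four ranges above, assuming for illustration that the map output key is a DoubleWritable temperature and the value is Text (both assumptions); it must be run with exactly four reducers. Note that Hadoop also ships a TotalOrderPartitioner that picks split points by sampling.

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class TemperatureRangePartitioner extends Partitioner<DoubleWritable, Text> {
    @Override
    public int getPartition(DoubleWritable key, Text value, int numPartitions) {
        double t = key.get();
        if (t < -10.0) return 0;   // below -10 C
        if (t < 0.0)   return 1;   // [-10 C, 0 C)
        if (t < 10.0)  return 2;   // [0 C, 10 C)
        return 3;                  // 10 C and above
    }
}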



Question :
Which of the following could safely replace line number 9?
1. Job job = new Job();
2. Job job = new Job(conf);
3. Both 1 and 2 can be used
4. You can not change this line from either 1 or 2
Ans : 3
Exp : Both 1 and 2 are correct; however, not having conf means the custom configuration is ignored, and the second argument supplies a custom job name.
If you don't provide it, the framework's default job name is used.
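
A short sketch of the constructor variants discussed here (the job name is illustrative; Job.getInstance is the newer factory equivalent):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
// Uses the custom configuration and sets an explicit job name:
Job job = new Job(conf, "my-job");
// Newer API equivalent:
// Job job = Job.getInstance(conf, "my-job");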


Question :
If we are processing input data from a database, then at line 10 which of the following is the correct InputFormat for reading from a DB?
1. DataBaseInputFormat
2. DBMSInputFormat
3. DBInputFormat
4. Not Supported
Ans : 3
Exp : The DBInputFormat is an InputFormat class that allows you to read data from a database.
An InputFormat is Hadoop's formalization of a data source; it can mean files formatted in a particular way,
data read from a database, etc. DBInputFormat provides a simple method of scanning entire tables from a database,
as well as the means to read from arbitrary SQL queries performed against the database.
Most queries are supported, subject to a few limitations.
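
A minimal sketch of wiring DBInputFormat into a job; the JDBC driver, connection URL, credentials, table and column names, and the MyRecord class (which would implement DBWritable and Writable) are all illustrative:

import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;

// Assuming a Job object named job is already configured:
DBConfiguration.configureDB(job.getConfiguration(),
        "com.mysql.jdbc.Driver",               // JDBC driver class
        "jdbc:mysql://db-host:3306/mydb",      // connection URL
        "user", "password");
job.setInputFormatClass(DBInputFormat.class);
DBInputFormat.setInput(job, MyRecord.class,
        "employees",                           // table name
        null,                                  // optional WHERE conditions
        "id",                                  // ORDER BY column
        "id", "name", "salary");               // columns to read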


Question :
At line number 13 you set the number of reducers to 1 and set the Reducer class to IdentityReducer. Which of the following statements is correct?
1. In both cases the behavior is the same
2. With 0 reducers, the reduce step will be skipped and the mapper output will be the final output
3. With IdentityReducer and 1 reducer, the map output will be sorted but not aggregated
4. 1 and 3 both are correct
5. 2 and 3 both are correct

Ans : 5
Exp : If you do not need sorting of the map results, you set 0 reducers, and the job is called map-only.
If you need to sort the map results but do not need any aggregation, you choose the identity reducer.
There is a third case: we do need aggregation, and in that case we need a real reducer.
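
A short sketch of the two configurations being contrasted, assuming a Job object named job:

import org.apache.hadoop.mapreduce.Reducer;

// Map-only: the reduce step (and the shuffle/sort) is skipped and the
// mapper output is the final output.
job.setNumReduceTasks(0);

// Identity reduce: map output is shuffled and sorted, then written out
// unchanged. In the new API the base Reducer class is itself an
// identity reducer.
job.setNumReduceTasks(1);
job.setReducerClass(Reducer.class);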



Question : When you are implementing a secondary sort (sorting based on values), the following output is produced as the key part of the mapper:

2001 24.3
2002 25.5
2003 23.6
2004 29.4
2001 21.3
2002 24.5
2003 25.6
2004 26.4

.
.
2014 21.0

Now you want output for the same year to go to the same reduce() call. Which of the following will help you do so?
1. CustomPartitioner
2. Group Comparator
4. By Implementing Custom WritableComparator
5. Or using Single Reducer

Ans : 2
Exp : The map output key is year plus temperature, to achieve sorting.
Unless you define a grouping comparator that uses only the year part of the map output key,
you cannot make all records of the same year go to the same reduce method call.
Partitioning on the year ensures all the data for a year lands in the same reducer,
and the sort comparator will effectively sort the data for each year by temperature.



Question :
What is the use of job.setJarByClass(MapReduceJob.class) at line number 16?
1. This method sets the jar file in which each node will look for the Mapper and Reducer classes
2. This is used to define which is the Driver class
4. 1 and 2 both are correct
Ans : 1
Exp : This method sets the jar file in which each node will look for the Mapper and Reducer classes.
It does not create a jar from the given class. Rather, it identifies the jar containing the given class.
And yes, that jar file is "executed" (really, the Mapper and Reducer in that jar file are executed) for the MapReduce job.



Question :
At line number 18, if the path "/out" already exists in HDFS, then what happens?
1. Hadoop will delete this directory, create a new empty directory, and after processing put all output in this directory
2. It will write new data in the existing directory and will not delete the existing data in this directory
3. It will throw an exception
4. It will overwrite the existing content with new content
Ans : 3
Exp : It will throw an exception, because Hadoop checks the input and output specifications before running any new job.
This avoids already existing data being overwritten.
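
A common companion pattern in driver code is to remove the output directory before submitting, so a re-run does not fail the output check; use it with care, since it deletes previous results. A sketch, assuming conf is the job's Configuration:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Path out = new Path("/out");
FileSystem fs = FileSystem.get(conf);
if (fs.exists(out)) {
    fs.delete(out, true);   // true = recursive delete
}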


Question :
If you remove both lines 10 and 11 from this code, then what happens?
1. It will throw a compile-time error
2. The program will run successfully, but the output file will not be created
3. The program will run without any issue
Ans : 3
Exp : As both are the default input and output formats, the program will run without any issue.
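
For illustration, setting them explicitly has the same effect as omitting them, since both are the framework defaults (assuming a Job object named job):

import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);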


Question :
If you replace line 19,
return job.waitForCompletion(true) ? 1 : 0;
with
job.submit();
then which statement is correct?
1. In both cases the MapReduce job will run successfully
2. With waitForCompletion, the job is submitted to the cluster and the call waits for it to finish
4. All of the above are correct
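
For reference, a sketch contrasting the two calls (assuming a configured Job named job):

// submit() hands the job to the cluster and returns immediately;
// the driver must check for completion itself if it cares.
job.submit();

// waitForCompletion(true) submits the job, blocks until it finishes,
// prints progress to the console, and returns true on success.
boolean success = job.waitForCompletion(true);
System.exit(success ? 0 : 1);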


Question :
In the above method, at line 1, if you replace
context.write(new Text(testCount.toString()), NullWritable.get());
with
context.write(testCount.toString(), NullWritable.get());
what would happen?
1. It would not work, because String is not directly supported
2. It would work, but it will not give good performance
4. Code will not compile at all after this change
Ans : 2
Exp : The Text class stores text using standard UTF-8 encoding. It provides methods to serialize, deserialize,
and compare texts at the byte level. The length is stored as an integer and is serialized using the zero-compressed format.
In addition, it provides methods for string traversal without converting the byte array to a string.
It also includes utilities for serializing/deserializing a string, encoding/decoding a string,
checking if a byte array contains valid UTF-8 code, and calculating the length of an encoded string.
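
For performance-sensitive mappers, the usual idiom is to reuse a single Text instance rather than allocating one per record. A sketch, assuming this sits inside a Mapper whose output key type is Text:

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;

private final Text outKey = new Text();   // reused across map() calls

// Inside map():
outKey.set(testCount.toString());
context.write(outKey, NullWritable.get());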


Question : Which is the correct statement when you poorly define the Partitioner?
1. It has a direct impact on the overall performance of your job and can reduce the performance of the overall job
2. A poorly designed partitioning function will not evenly distribute the values over the reducers
4. Both 1 and 2 are correct
5. All 1, 2 and 3 are correct

Ans : 4
Exp : First, it has a direct impact on the overall performance of your job: a poorly designed partitioning function will not evenly distribute
the load over the reducers, potentially losing all the benefit of the map/reduce distributed infrastructure.
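
For comparison, the default HashPartitioner distributes keys essentially like the sketch below; a custom partitioner should aim for a similarly even spread:

import org.apache.hadoop.mapreduce.Partitioner;

public class HashLikePartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        // Mask off the sign bit, then spread keys evenly by hash.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}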


Question :
In the above code, if we replace LongWritable with Long, then what would happen? (The input to this job comes from a file.)
1. The code will run, but not produce the expected result
2. The code will not run, as the key has to be WritableComparable
4. It will throw java.lang.ClassCastException
Ans : 4
Exp : The key class of a mapper that maps text files is always LongWritable. That is because it contains the byte offset of the current
line, and this could easily overflow an integer.
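
Consequently, a mapper over text input declares LongWritable (not Long) as its input key type. A sketch (the class name and output types are illustrative):

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// key = byte offset of the line, value = the line itself.
public class LineMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
}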


Question : Select the correct statement regarding reducers

1. The number of reducers is defined as part of the job configuration
2. All values of the same key can be processed by multiple reducers.
4. 1,2 and 3 are correct
5. 1 and 3 are correct