Cloudera Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)

Question : What is the most common problem with map-side joins?

1. The most common problem with map-side joins is introducing a high level of code complexity.
This complexity has several downsides: increased risk of bugs and performance degradation.
Developers are cautioned to rarely use map-side joins.
2. The most common problem with map-side joins is a lack of available map slots, since map-side joins require a lot of mappers.
4. The most common problem with map-side joins is not clearly specifying the primary index in the join.
This can lead to very slow performance on large datasets.

Explanation: A map-side join typically loads one dataset (the smaller one) into memory and joins the other against it by key. As a result, the size of that dataset is limited by the available memory. If it exceeds the available memory, an out-of-memory error occurs.
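
The memory limit described above can be illustrated with a minimal sketch in plain Java. This is not Hadoop's Mapper API: the class name `MapSideJoinSketch`, its methods, and the "key,value" record format are all hypothetical and chosen only for illustration. The small dataset is loaded into a hash map, which is the part that has to fit in memory, while the large dataset is streamed past it one record at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration in plain Java; not Hadoop's Mapper API.
// Assumes every record is a well-formed "key,value" line.
class MapSideJoinSketch {

    // Builds the in-memory lookup table from the small dataset.
    // This table is what must fit in the mapper's heap.
    static Map<String, String> loadSmallTable(List<String> lines) {
        Map<String, String> table = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",", 2);
            table.put(parts[0], parts[1]);
        }
        return table;
    }

    // Streams the large dataset record by record and joins each record
    // against the in-memory table (inner join: unmatched records are dropped).
    static List<String> join(Map<String, String> smallTable, List<String> largeRecords) {
        List<String> joined = new ArrayList<>();
        for (String record : largeRecords) {
            String[] parts = record.split(",", 2);
            String match = smallTable.get(parts[0]);
            if (match != null) {
                joined.add(parts[0] + "," + parts[1] + "," + match);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> users = loadSmallTable(Arrays.asList("1,alice", "2,bob"));
        List<String> orders = Arrays.asList("1,book", "2,pen", "3,lamp");
        System.out.println(join(users, orders)); // prints [1,book,alice, 2,pen,bob]
    }
}
```

The heap cost grows with the size of the small table only; the large dataset never needs to be held in memory at once.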

Question : Will settings made through the Java API override values in configuration files?

1. No. The configuration settings in the configuration files take precedence
2. Yes. The configuration settings made through the Java API take precedence
4. Only global configuration settings are captured in configuration files on the namenode.
Only a very few job parameters can be set using the Java API

Explanation: The developer has full control over job settings on a Hadoop cluster. Values set through the Java API override those in the configuration files, except for properties that a configuration file declares final.
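
The precedence rule can be modeled with a small plain-Java sketch. It is not Hadoop's `Configuration` class: the class `LayeredConfigSketch`, its `addLayer` method, and the `example.*` property names are hypothetical. The sketch assumes that settings are applied in layers, that a later layer (such as values set from job code) overrides an earlier one (such as values read from a file), and that a key declared final by an earlier layer can no longer be changed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of layered configuration; not Hadoop's Configuration class.
class LayeredConfigSketch {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> finalKeys = new HashSet<>();

    // Applies one layer of settings. Later layers override earlier ones,
    // except for keys that an earlier layer declared final.
    void addLayer(Map<String, String> layer, Set<String> layerFinalKeys) {
        for (Map.Entry<String, String> entry : layer.entrySet()) {
            if (!finalKeys.contains(entry.getKey())) {
                values.put(entry.getKey(), entry.getValue());
            }
        }
        finalKeys.addAll(layerFinalKeys);
    }

    String get(String key) {
        return values.get(key);
    }

    public static void main(String[] args) {
        LayeredConfigSketch config = new LayeredConfigSketch();

        // Layer 1: values as they might appear in a configuration file.
        Map<String, String> fileLayer = new HashMap<>();
        fileLayer.put("example.reducers", "1");
        fileLayer.put("example.replication", "3");
        config.addLayer(fileLayer, Collections.singleton("example.replication"));

        // Layer 2: values set from job code.
        Map<String, String> jobLayer = new HashMap<>();
        jobLayer.put("example.reducers", "8");
        jobLayer.put("example.replication", "1");
        config.addLayer(jobLayer, Collections.emptySet());

        System.out.println(config.get("example.reducers"));    // prints 8
        System.out.println(config.get("example.replication")); // prints 3
    }
}
```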

Question : What is distributed cache?

1. The distributed cache is a special component on the namenode that caches frequently used data for faster client response.
It is used during the reduce step
2. The distributed cache is a special component on the datanode that caches frequently used data
for faster client response. It is used during the map step
4. The distributed cache is a component that allows developers to deploy jars for Map-Reduce processing.

Explanation: The distributed cache is Hadoop's answer to the problem of deploying third-party libraries. It allows libraries to be deployed to all datanodes.

Related Questions

Question : Hadoop daemons can share a JVM

1. True
2. False

Question : In a cluster, can a single node run all the daemons?

1. Yes
2. No

Question : Which daemons are considered masters?

1. NameNode
2. Secondary NameNode
3. Job Tracker
4. 1,2 and 3 are correct
5. 1 and 3 are correct

Question : Which nodes are considered slave nodes?

1. Secondary NameNode
2. DataNode
3. TaskTracker
4. 1,2 and 3 are correct
5. 2 and 3 are correct

Question : Which daemon stores the file data blocks?

1. NameNode
2. TaskTracker
3. DataNode
4. Secondary Data Node

Question : When a client submits a job, its configuration information is packaged into an XML file

1. True
2. False