Question: Which YARN daemon or service monitors and controls per-application resource usage (e.g., memory, CPU)?
1. ApplicationMaster
2. NodeManager
3. …
4. ResourceManager
Explanation: An important new concept in YARN is the ApplicationMaster. The ApplicationMaster is, in effect, an instance of a framework-specific library and is responsible for negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the containers and their resource consumption. It has the responsibility of negotiating for appropriate resource containers from the ResourceManager, tracking their status, and monitoring progress.

The ApplicationMaster design enables YARN to offer the following important new features:

Scale: The ApplicationMaster provides much of the job-oriented functionality of the JobTracker so that the entire system can scale more dramatically. Simulations have shown that jobs may scale to 10,000-node clusters composed of modern hardware without significant issue. As a pure scheduler, the ResourceManager does not, for example, have to provide fault tolerance for resources across the cluster. By shifting fault tolerance to the ApplicationMaster instance, control becomes local rather than global. Furthermore, because an instance of an ApplicationMaster is made available per application, the ApplicationMaster itself is rarely a bottleneck in the cluster.

Openness: Moving all application framework-specific code into the ApplicationMaster generalizes the system so that it can now support multiple frameworks such as MapReduce, MPI, and graph processing.

These features were the result of some key YARN design decisions:

Move all complexity (to the extent possible) to the ApplicationMaster, while providing sufficient functionality to allow application framework authors sufficient flexibility and power.

Because it is essentially user code, do not trust the ApplicationMaster(s). In other words, no ApplicationMaster is a privileged service. The YARN system (ResourceManager and NodeManager) has to protect itself from faulty or malicious ApplicationMaster(s) and resources granted to them at all costs.

In reality, every application has its own instance of an ApplicationMaster. However, it is completely feasible to implement an ApplicationMaster to manage a set of applications (e.g., an ApplicationMaster for Pig or Hive to manage a set of MapReduce jobs). Furthermore, this concept has been stretched to manage long-running services, which manage their own applications (e.g., launching HBase in YARN via a special HBaseAppMaster).
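The negotiation-and-monitoring role described above is exposed to framework authors through YARN's AMRMClient library. The following is a minimal, hedged sketch (not a complete ApplicationMaster) of registering with the ResourceManager, requesting a container, and observing container status through the allocate heartbeat. The class and method names are from the Hadoop YARN client API; the host name, container size, and priority are illustrative assumptions.

```java
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SimpleAppMaster {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();

    // The ApplicationMaster talks to the ResourceManager through AMRMClient.
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();

    // Register this ApplicationMaster with the ResourceManager
    // (host, RPC port, and tracking URL are illustrative values).
    rmClient.registerApplicationMaster("appmaster-host", 0, "");

    // Ask for one container of 1024 MB and 1 vcore (illustrative sizes).
    Resource capability = Resource.newInstance(1024, 1);
    Priority priority = Priority.newInstance(0);
    rmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));

    // Heartbeat to the ResourceManager: granted containers come back here,
    // and completed-container statuses let the AM monitor progress.
    AllocateResponse response = rmClient.allocate(0.1f);
    for (Container c : response.getAllocatedContainers()) {
      System.out.println("Granted container " + c.getId() + " on " + c.getNodeId());
    }
    for (ContainerStatus s : response.getCompletedContainersStatuses()) {
      System.out.println("Container finished with exit status " + s.getExitStatus());
    }

    // Tell the ResourceManager the application is done.
    rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
    rmClient.stop();
  }
}
```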
Question: Which YARN process runs as a container of a submitted job and is responsible for resource requests?
1. ApplicationManager
2. JobTracker
3. ApplicationMaster
4. JobHistoryServer
5. ResourceManager
Explanation: A Container is a collection of physical resources on a single node, such as memory (RAM), CPU cores, and disks. There can be multiple Containers on a single Node (or a single large one). Every node in the system is considered to be composed of multiple Containers of minimum memory size (512 MB or 1 GB, for example). The Application Master can request any Container as a multiple of the minimum memory size.
A Container thus represents a resource (memory, CPU) on a single node in a given cluster. A Container is supervised by the Node Manager and scheduled by the Resource Manager.
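As a concrete illustration of "a multiple of the minimum memory size": if the minimum allocation is 1024 MB and an ApplicationMaster asks for 1500 MB, the scheduler rounds the request up to the next multiple. The sketch below uses assumed example values (1024 MB minimum, 1500 MB request); they are not defaults stated in the text.

```java
// Illustrative only: how a memory request is rounded up to a multiple of the
// configured minimum container size (yarn.scheduler.minimum-allocation-mb).
public class ContainerRounding {
  public static void main(String[] args) {
    int minAllocMb = 1024;   // assumed minimum container size
    int requestedMb = 1500;  // assumed ApplicationMaster request

    // Round the request up to the next multiple of the minimum.
    int grantedMb = ((requestedMb + minAllocMb - 1) / minAllocMb) * minAllocMb;

    System.out.println(grantedMb); // prints 2048, i.e. 2 x 1024 MB
  }
}
```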
Each application starts out as an Application Master, which is itself a Container (often referred to as container-0). Once started, the Application Master must negotiate with the Resource Manager for more Containers. Container requests (and releases) can take place in a dynamic manner at run-time. For instance, a MapReduce job may request a certain number of mapper Containers and, as they finish, release them and request that more reducer Containers be started.
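That dynamic request-and-release behavior is driven through the same AMRMClient heartbeat: the ApplicationMaster adds new ContainerRequests as it needs more workers and lets completed containers go. The following is a hedged sketch of such a loop; the method name, container counts, and container sizes are assumptions, and container launch (via NMClient) is omitted.

```java
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class DynamicContainerRequests {

  // 'rmClient' is assumed to be an already-registered AMRMClient,
  // as in the earlier sketch.
  static void runMapThenReduce(AMRMClient<ContainerRequest> rmClient,
                               int numMappers) throws Exception {
    Resource mapSize = Resource.newInstance(1024, 1);     // assumed mapper size
    Resource reduceSize = Resource.newInstance(2048, 1);  // assumed reducer size
    Priority priority = Priority.newInstance(0);

    // Ask for the mapper containers up front.
    for (int i = 0; i < numMappers; i++) {
      rmClient.addContainerRequest(new ContainerRequest(mapSize, null, null, priority));
    }

    int finishedMappers = 0;
    while (finishedMappers < numMappers) {
      // Periodic heartbeat: completed containers are reported back here.
      AllocateResponse response = rmClient.allocate(0.5f);
      for (ContainerStatus status : response.getCompletedContainersStatuses()) {
        finishedMappers++;
        // A mapper finished, so its resources are free; ask for a reducer
        // container in its place. Launching containers and distinguishing
        // mappers from reducers are omitted from this sketch.
        rmClient.addContainerRequest(new ContainerRequest(reduceSize, null, null, priority));
      }
      Thread.sleep(1000);
    }
  }
}
```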
Question: Which scheduler would you deploy to ensure that your cluster allows short jobs to finish within a reasonable time without starving long-running jobs?
1. Completely Fair Scheduler (CFS)
2. Capacity Scheduler
3. Fair Scheduler
4. FIFO Scheduler
Explanation: The Fair Scheduler organizes apps further into queues and shares resources fairly between these queues. By default, all users share a single queue, named default. If an app specifically lists a queue in a container resource request, the request is submitted to that queue. It is also possible to assign queues based on the user name included with the request through configuration. Within each queue, a scheduling policy is used to share resources between the running apps. The default is memory-based fair sharing, but FIFO and multi-resource with Dominant Resource Fairness can also be configured. Queues can be arranged in a hierarchy to divide resources and configured with weights to share the cluster in specific proportions.

mapred.fairscheduler.sizebasedweight: Take into account job sizes in calculating their weights for fair sharing. By default, weights are based only on job priorities. Setting this flag to true will make them based on the size of the job (number of tasks needed) as well, though not linearly (the weight will be proportional to the log of the number of tasks needed). This lets larger jobs get larger fair shares while still providing enough of a share to small jobs to let them finish fast. Boolean value, default: false.
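The sizebasedweight behavior quoted above (priority-based weights by default, plus a logarithmic size factor when the flag is enabled) can be illustrated with a small sketch. This is not the Fair Scheduler's actual source; the function name and the choice of log base 2 are assumptions made only to match the description.

```java
public class FairShareWeight {

  // Illustrative weight calculation matching the description above:
  // by default the weight comes only from the job's priority factor;
  // with sizebasedweight enabled it also grows with the log of the
  // number of tasks the job needs, so big jobs get bigger shares,
  // but not linearly bigger.
  static double jobWeight(int tasksNeeded, double priorityFactor, boolean sizeBasedWeight) {
    double weight = 1.0;
    if (sizeBasedWeight) {
      weight = Math.log1p(tasksNeeded) / Math.log(2);  // log2(1 + tasks)
    }
    return weight * priorityFactor;
  }

  public static void main(String[] args) {
    // A 10-task job vs. a 10,000-task job at the same priority: the large
    // job gets a larger share, but only about 4x larger, not 1000x.
    System.out.println(jobWeight(10, 1.0, true));     // ~3.46
    System.out.println(jobWeight(10000, 1.0, true));  // ~13.3
  }
}
```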