Question : Suppose cluster resources become scarce and the scheduler decides to reclaim some of the resources that were given to a running application. What happens? 1. The scheduler stops working 2. All jobs hang for some time 3. All jobs are stopped and need to be restarted 4. The ResourceManager symmetrically requests back resources from the running application
Correct Answer : 4 As previously described, the ResourceManager is the master that arbitrates all the available cluster resources, thereby helping manage the distributed applications running on the YARN system. It works together with the per-node NodeManagers and the per-application ApplicationMasters. In YARN, the ResourceManager is primarily limited to scheduling; that is, it allocates available resources in the system among the competing applications but does not concern itself with per-application state management. The scheduler handles only an overall resource profile for each application, ignoring local optimizations and internal application flow. In fact, YARN completely departs from the static assignment of map and reduce slots because it treats the cluster as a resource pool. Because of this clear separation of responsibilities, coupled with the modularity described previously, the ResourceManager is able to address the important design requirements of scalability and support for alternative programming paradigms. In contrast to many other workflow schedulers, the ResourceManager also has the ability to symmetrically request back resources from a running application. This situation typically happens when cluster resources become scarce and the scheduler decides to reclaim some (but not all) of the resources that were given to an application.
In YARN, ResourceRequests can be strict or negotiable. This feature provides ApplicationMasters with a great deal of flexibility in how they fulfill reclamation requests: for example, by picking containers to reclaim that are less crucial to the computation, by checkpointing the state of a task, or by migrating the computation to other running containers. Overall, this scheme allows applications to preserve work, in contrast to platforms that kill containers to satisfy resource constraints. If the application is noncollaborative, the ResourceManager can, after waiting a certain amount of time, obtain the needed resources by instructing the NodeManagers to forcibly terminate containers.
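To make this concrete, below is a minimal Java sketch of how an ApplicationMaster might inspect the preemption message attached to an allocate response. The YARN types used (AMRMClient, AllocateResponse, PreemptionMessage) are real API classes, but the helper methods checkpointAndRelease and releaseLeastCriticalContainers are hypothetical placeholders for application-specific logic.

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.PreemptionContainer;
import org.apache.hadoop.yarn.api.records.PreemptionContract;
import org.apache.hadoop.yarn.api.records.PreemptionMessage;
import org.apache.hadoop.yarn.client.api.AMRMClient;

public class PreemptionAwareHeartbeat {
  // Called from the AM's heartbeat loop; rmClient is an initialized AMRMClient.
  static void heartbeat(AMRMClient<AMRMClient.ContainerRequest> rmClient,
                        float progress) throws Exception {
    AllocateResponse response = rmClient.allocate(progress);
    PreemptionMessage pm = response.getPreemptionMessage();
    if (pm == null) {
      return; // the scheduler is not reclaiming anything this round
    }
    // Strict contract: these containers will be forcibly terminated soon,
    // so checkpoint their state while there is still time.
    if (pm.getStrictContract() != null) {
      for (PreemptionContainer c : pm.getStrictContract().getContainers()) {
        checkpointAndRelease(c.getId());
      }
    }
    // Negotiable contract: the AM may choose which resources to give back,
    // e.g., containers that are least crucial to the computation.
    if (pm.getContract() != null) {
      releaseLeastCriticalContainers(pm.getContract());
    }
  }

  static void checkpointAndRelease(ContainerId id) {
    // Hypothetical: persist task state, then stop the container.
  }

  static void releaseLeastCriticalContainers(PreemptionContract contract) {
    // Hypothetical: pick victims, migrate their work, then release them.
  }
}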
ResourceManager failures remain significant events affecting cluster availability. As of this writing, the ResourceManager will restart running ApplicationMasters as it recovers its state. If the framework supports restart capabilities (and most will, for routine fault tolerance), the platform will automatically restore users' pipelines. In contrast to the Hadoop 1.0 JobTracker, it is important to mention the tasks for which the ResourceManager is not responsible. It does not track application execution flow or handle task fault tolerance, nor does it provide access to the application status (a servlet that is now part of the ApplicationMaster) or track previously executed jobs, a responsibility now delegated to the JobHistoryService (a daemon running on a separate node). This is consistent with the view that the ResourceManager should handle only live resource scheduling, and it helps YARN's central components scale better than the Hadoop 1.0 JobTracker.
Question :
The ____________ supports a number of features such as weights on queues (heavier queues get more containers), minimum shares, maximum shares, and FIFO policy within queues, but the basic idea is to share the resources as uniformly as possible.
1. Fair Scheduler 2. Capacity Scheduler 3. FIFO Scheduler 4. Both 1 and 2 5. Both 2 and 3
Correct Answer : 1
The Fair scheduler is a third pluggable scheduler for Hadoop that provides another way to share large clusters. Fair scheduling is a method of assigning resources to applications such that all applications get, on average, an equal share of resources over time.
In Hadoop version 1, the Fair scheduler uses the term "pool" to refer to a queue. Starting with the YARN Fair scheduler, the term "queue" is used instead of "pool." To provide backward compatibility with the original Fair scheduler, "queue" elements can also be named "pool" elements.
In the Fair scheduler model, every application belongs to a queue. YARN containers are given to the queue with the least amount of allocated resources. Within the queue, the application that has the fewest resources is assigned the container. By default, all users share a single queue, called "default." If an application specifically lists a queue in a container resource request, the request is submitted to that queue. It is also possible to configure the Fair scheduler to assign queues based on the user name included with the request. The Fair scheduler supports a number of features such as weights on queues (heavier queues get more containers), minimum shares, maximum shares, and FIFO policy within queues, but the basic idea is to share the resources as uniformly as possible.
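As a concrete illustration of the queue model, the following minimal Java sketch submits an application to a named queue through the YARN client API. The queue name "analytics" and the application name are illustrative assumptions; if no queue is set, the application lands in the "default" queue (or a queue derived from the user name, if the Fair scheduler is configured that way).

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class QueueSubmitSketch {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
    ctx.setApplicationName("queue-demo");  // illustrative name
    ctx.setQueue("analytics");             // hypothetical Fair scheduler queue
    // A real submission also needs launch commands and environment settings.
    ctx.setAMContainerSpec(Records.newRecord(ContainerLaunchContext.class));
    ctx.setResource(Resource.newInstance(1024, 1)); // 1 GB, 1 vcore for the AM

    yarnClient.submitApplication(ctx);
    yarnClient.stop();
  }
}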
Question : In your cluster, the ResourceManager is configured with the Fair Scheduler, and on average Hadoop runs several jobs in parallel every hour. Currently only a single job is running. How much of the cluster's resource capacity can this single running job use?
1. 1/100 of the cluster's resources 2. 20% of the cluster capacity 3. It may use the full capacity of the cluster 4. It cannot be determined
Correct Answer : 3 Under the Fair scheduler, when a single application is running, that application may request the entire cluster (if needed). If additional applications are submitted, resources that are free are assigned "fairly" to the new applications so that each application gets roughly the same amount of resources. The Fair scheduler also applies the notion of preemption, whereby containers can be requested back from the ApplicationMaster. Depending on the configuration and application design, preemption and subsequent assignment can be either friendly or forceful.
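For reference, a hedged configuration sketch follows, showing how one might programmatically select the Fair scheduler and enable its preemption support. The property yarn.scheduler.fair.preemption matches the Fair scheduler documentation; in practice these settings usually live in yarn-site.xml, and per-queue preemption timeouts are tuned in the scheduler's allocation file.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class FairSchedulerConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();

    // Plug in the Fair scheduler as the ResourceManager's scheduler.
    conf.set(YarnConfiguration.RM_SCHEDULER,
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler");

    // Allow the scheduler to preempt containers from applications that are
    // over their fair share when other queues are starved.
    conf.setBoolean("yarn.scheduler.fair.preemption", true);

    System.out.println("Scheduler class: " + conf.get(YarnConfiguration.RM_SCHEDULER));
  }
}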
1. By loading both social and expense data into the current Enterprise Data Warehouse, then running analytics
2. By loading the social data into BigInsights for exploration, then moving the resulting data to the Enterprise Data Warehouse and merging it with the expense data for analytics