Question : What determines where blocks are written into HDFS by client applications?
1. The client queries the NameNode, which returns information on which DataNodes to use, and the client writes to those DataNodes. 2. The client writes immediately to DataNodes based on the cluster's rack locality settings.
Explanation: The NameNode will return a list of DataNodes to which the client should write. The contents of the file are never sent to the NameNode.
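For reference, below is a minimal sketch (Java, Hadoop FileSystem API) of a client write; the NameNode URI and file path are hypothetical examples. The client contacts the NameNode only for block placement, then streams the data directly to the chosen DataNodes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS points at the NameNode; the client asks it which DataNodes to write to.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/tmp/example.txt"))) {
            // These bytes are streamed to the DataNodes the NameNode selected;
            // the file contents never pass through the NameNode itself.
            out.writeUTF("hello hdfs");
        }
    }
}
```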
Question : How does the NameNode know which DataNodes are currently available on a cluster? 1. DataNodes are listed in the dfs.hosts file. The NameNode uses that as the definitive list of available DataNodes. 2. DataNodes heartbeat in to the master on a regular basis.
Explanation: DataNodes heartbeat in to the master every three seconds. When a DataNode heartbeats in to the NameNode for the first time, the NameNode marks it as available. DataNodes can be listed in a file pointed to by the dfs.hosts property, but this only lists the names of possible DataNodes. It is not a definitive list of those which are available but rather a list of the only machines which may be used as DataNodes if they begin to heartbeat.
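As a rough illustration, the sketch below (Java, using Hadoop's Configuration) shows the two settings involved; the whitelist file path is a hypothetical example. dfs.hosts only whitelists machines that may serve as DataNodes, while availability is determined by heartbeats.

```java
import org.apache.hadoop.conf.Configuration;

public class DataNodeMembershipConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Whitelist file: one hostname per line. Hosts not listed here are refused,
        // but being listed does not mean the DataNode is currently available.
        conf.set("dfs.hosts", "/etc/hadoop/conf/dfs.hosts.allow");

        // Heartbeat interval in seconds (3 by default). A DataNode is marked
        // available only once it begins heartbeating in to the NameNode.
        conf.setLong("dfs.heartbeat.interval", 3L);

        System.out.println("dfs.hosts = " + conf.get("dfs.hosts"));
        System.out.println("dfs.heartbeat.interval = " + conf.getLong("dfs.heartbeat.interval", 3L));
    }
}
```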
Question : How does the HDFS architecture provide data reliability? 1. Storing multiple replicas of data blocks on different DataNodes.
Explanation: HDFS provides reliability by splitting a file into multiple blocks and replicating each block on multiple different machines (3 by default). Although it is possible to use RAID on DataNodes, this is not a recommended configuration, as it reduces the amount of raw disk which can be used for data storage and is not necessary.
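A minimal sketch (Java) of the replication settings this relies on; the file path is a hypothetical example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cluster-wide default replication factor for newly written files.
        conf.setInt("dfs.replication", 3);

        try (FileSystem fs = FileSystem.get(conf)) {
            // Replication can also be changed per file after it is written;
            // HDFS adds or removes block replicas to match the new factor.
            fs.setReplication(new Path("/tmp/example.txt"), (short) 3);
        }
    }
}
```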
Question : The ApplicationReport received from the ResourceManager consists of which of the following? 1. General application information: ApplicationId, the queue to which the application was submitted, the user who submitted the application and the start time for the application. 2. ApplicationMaster details: the host on which the ApplicationMaster is running, the RPC port (if any) on which it is listening for requests from clients and a token that the client needs to communicate with the ApplicationMaster. 3. Application tracking information: if the application supports some form of progress tracking, it can set a tracking URL which is available via ApplicationReport#getTrackingUrl and which a client can look at to monitor progress. 4. ApplicationStatus: the state of the application as seen by the ResourceManager is available via ApplicationReport#getYarnApplicationState. If the YarnApplicationState is FINISHED, the client should refer to ApplicationReport#getFinalApplicationStatus to check for the actual success/failure of the application task itself. In case of failures, ApplicationReport#getDiagnostics may be useful to shed some more light on the failure. 5. All of the above
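As an illustrative sketch (Java, YARN client API), all of these fields can be read from the ApplicationReport obtained through YarnClient. It assumes a recent Hadoop release where ApplicationId.fromString is available, and the application id string is a made-up example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ApplicationReportExample {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new Configuration());
        yarnClient.start();

        ApplicationId appId = ApplicationId.fromString("application_1234567890123_0001");
        ApplicationReport report = yarnClient.getApplicationReport(appId);

        // 1. General application information
        System.out.println(report.getQueue() + " " + report.getUser() + " " + report.getStartTime());
        // 2. ApplicationMaster details
        System.out.println(report.getHost() + ":" + report.getRpcPort());
        // 3. Application tracking information
        System.out.println(report.getTrackingUrl());
        // 4. ApplicationStatus
        if (report.getYarnApplicationState() == YarnApplicationState.FINISHED) {
            System.out.println(report.getFinalApplicationStatus());
            System.out.println(report.getDiagnostics());
        }

        yarnClient.stop();
    }
}
```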
Question : Select the correct statement(s) about the ApplicationMaster. 1. The ApplicationMaster is the actual owner of the job. It will be launched by the ResourceManager and will be provided, via the client, with all the necessary information and resources about the job that it has been tasked to oversee and complete. 2. As the ApplicationMaster is launched within a container that may (and likely will) be sharing a physical host with other containers, given the multi-tenancy nature, amongst other issues, it cannot make any assumptions about things like pre-configured ports that it can listen on. 3. When the ApplicationMaster starts up, several parameters are made available to it via the environment. These include the ContainerId for the ApplicationMaster container, the application submission time and details about the NodeManager host running the ApplicationMaster. Refer to ApplicationConstants for parameter names. 4. 1 and 2 5. 1, 2 and 3
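A minimal sketch (Java) of an ApplicationMaster reading these parameters from its environment at startup, assuming a recent Hadoop release where ContainerId.fromString is available:

```java
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;

public class ApplicationMasterEnv {
    public static void main(String[] args) {
        // ContainerId of the container the ApplicationMaster itself runs in.
        String containerIdStr =
                System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name());
        ContainerId containerId = ContainerId.fromString(containerIdStr);
        ApplicationAttemptId attemptId = containerId.getApplicationAttemptId();

        // Application submission time.
        String submitTime = System.getenv(ApplicationConstants.APP_SUBMIT_TIME_ENV);

        // Details of the NodeManager host running the ApplicationMaster.
        String nmHost = System.getenv(ApplicationConstants.Environment.NM_HOST.name());
        String nmPort = System.getenv(ApplicationConstants.Environment.NM_PORT.name());

        System.out.println(attemptId + " submitted at " + submitTime
                + ", running on " + nmHost + ":" + nmPort);
    }
}
```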
Question : Select the option(s) which is/are correct. 1. YARN takes into account all of the available compute resources on each machine in the cluster. 2. Based on the available resources, YARN negotiates resource requests from applications (such as MapReduce) running in the cluster. 3. YARN then provides processing capacity to each application by allocating Containers. 4. 1 and 3 5. 1, 2 and 3
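For context, the sketch below (Java, Hadoop Configuration) shows the kind of settings through which YARN learns each node's compute resources and bounds the Containers it allocates; the values are illustrative only, not recommendations.

```java
import org.apache.hadoop.conf.Configuration;

public class YarnResourceConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Compute resources this NodeManager advertises to the ResourceManager.
        conf.setInt("yarn.nodemanager.resource.memory-mb", 8192);
        conf.setInt("yarn.nodemanager.resource.cpu-vcores", 8);

        // Bounds the scheduler uses when negotiating Container allocations.
        conf.setInt("yarn.scheduler.minimum-allocation-mb", 1024);
        conf.setInt("yarn.scheduler.maximum-allocation-mb", 8192);

        System.out.println("NodeManager memory (MB): "
                + conf.getInt("yarn.nodemanager.resource.memory-mb", -1));
    }
}
```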