IBM Certified Data Architect - Big Data Certification Questions and Answers (Dumps and Practice Questions)

Question : The ApplicationReport received from the ResourceManager consists of:

1. General application information: ApplicationId, queue to which the application was submitted,
user who submitted the application and the start time for the application.
2. ApplicationMaster details: the host on which the ApplicationMaster is running, the rpc port (if any)
on which it is listening for requests from clients and a token that the client needs to communicate with the ApplicationMaster.
3. Application tracking information: If the application supports some form of progress tracking,
it can set a tracking url which is available via ApplicationReport#getTrackingUrl that a client can look at to monitor progress.
4. ApplicationStatus: The state of the application as seen by the ResourceManager is available via ApplicationReport#getYarnApplicationState.
If the YarnApplicationState is set to FINISHED, the client should refer to ApplicationReport#getFinalApplicationStatus to check for the actual
success/failure of the application task itself. In case of failures, ApplicationReport#getDiagnostics may be useful to shed some more light on the failure.
5. All of the above

Correct Answer : 5

Explanation: The ApplicationReport received from the ResourceManager consists of the following:

General application information: ApplicationId, queue to which the application was submitted, user who submitted the application and the start time for the application.
ApplicationMaster details: the host on which the ApplicationMaster is running, the rpc port (if any) on which it is listening for requests from clients and a token that the client
needs to communicate with the ApplicationMaster.
Application tracking information: If the application supports some form of progress tracking, it can set a tracking url which is available via ApplicationReport#getTrackingUrl that
a client can look at to monitor progress.
ApplicationStatus: The state of the application as seen by the ResourceManager is available via ApplicationReport#getYarnApplicationState. If the YarnApplicationState is set to
FINISHED, the client should refer to ApplicationReport#getFinalApplicationStatus to check for the actual success/failure of the application task itself. In case of failures,
ApplicationReport#getDiagnostics may be useful to shed some more light on the failure.
If the ApplicationMaster supports it, a client can directly query the ApplicationMaster itself for progress updates via the host:rpcport information obtained from the
ApplicationReport. It can also use the tracking url obtained from the report if available.
In certain situations, for example if the application is taking too long, the client may wish to kill the application. The ClientRMProtocol supports the
forceKillApplication call, which allows a client to send a kill signal to the ApplicationMaster via the ResourceManager. An ApplicationMaster, if so designed, may also support an
abort call via its rpc layer that a client may be able to leverage.
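
The status check described above can be sketched in plain Java. This is a minimal illustration, not the Hadoop API: AppState, FinalStatus and ReportView are hypothetical stand-ins for YarnApplicationState, FinalApplicationStatus and ApplicationReport, trimmed to a few values so the example compiles without Hadoop on the classpath.

```java
// Hypothetical stand-ins for the YARN enums named in the text.
// The real enums have more values than are modelled here.
enum AppState { ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }
enum FinalStatus { UNDEFINED, SUCCEEDED, FAILED, KILLED }

// Hypothetical, trimmed-down view of an ApplicationReport.
final class ReportView {
    final AppState state;
    final FinalStatus finalStatus;
    final String diagnostics;

    ReportView(AppState state, FinalStatus finalStatus, String diagnostics) {
        this.state = state;
        this.finalStatus = finalStatus;
        this.diagnostics = diagnostics;
    }
}

class ReportInterpreter {
    // FINISHED only means the application ended as far as the ResourceManager
    // is concerned; the final status says whether the task itself succeeded.
    static String summarize(ReportView r) {
        switch (r.state) {
            case FINISHED:
                if (r.finalStatus == FinalStatus.SUCCEEDED) {
                    return "SUCCEEDED";
                }
                return "FINISHED but " + r.finalStatus + ": " + r.diagnostics;
            case FAILED:
            case KILLED:
                return r.state + ": " + r.diagnostics;
            default:
                return "IN PROGRESS (" + r.state + ")";
        }
    }
}
```

A client polling for completion would typically call something like summarize on each fresh report and stop once the state is FINISHED, FAILED or KILLED.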

Question :

Select the correct statements about YARN.

1. The ApplicationMaster is the actual owner of the job. It is launched by the ResourceManager
and, via the client, is provided with all the necessary information and resources about the job that it has been tasked to oversee and complete.
2. As the ApplicationMaster is launched within a container that may (likely will) be sharing a physical
host with other containers, given the multi-tenancy nature, amongst other issues, it cannot make any assumptions about things like
pre-configured ports that it can listen on.
3. When the ApplicationMaster starts up, several parameters are made available to it via the environment.
These include the ContainerId for the ApplicationMaster container, the application submission time and details about
the NodeManager host running the ApplicationMaster. See ApplicationConstants for parameter names.
4. 1 and 2
5. 1,2 and 3

Correct Answer : 5

Explanation: The ApplicationMaster is the actual owner of the job. It is launched by the ResourceManager and, via the client, is provided with all the necessary information and
resources about the job that it has been tasked to oversee and complete.
As the ApplicationMaster is launched within a container that may (likely will) be sharing a physical host with other containers, given the multi-tenancy nature, amongst other
issues, it cannot make any assumptions about things like pre-configured ports that it can listen on.
When the ApplicationMaster starts up, several parameters are made available to it via the environment. These include the ContainerId for the ApplicationMaster container, the
application submission time and details about the NodeManager host running the ApplicationMaster. See ApplicationConstants for parameter names.
All interactions with the ResourceManager require an ApplicationAttemptId (there can be multiple attempts per application in case of failures). The ApplicationAttemptId can be
obtained from the ApplicationMaster's ContainerId. There are helper APIs to convert the value obtained from the environment into objects.
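
To make the relationship between a ContainerId and its ApplicationAttemptId concrete, here is a minimal sketch in plain Java. It assumes the textual layout container_<clusterTimestamp>_<appSeq>_<attemptSeq>_<containerSeq> and the matching appattempt_ form; that layout, the class name ContainerIdSketch and the method attemptIdFrom are illustrative assumptions. Real code should use the Hadoop helper APIs mentioned above rather than string parsing.

```java
class ContainerIdSketch {
    // Derives an attempt id string from a container id string.
    // Assumed layout: container_<clusterTimestamp>_<appSeq>_<attemptSeq>_<containerSeq>
    // Any other layout is rejected rather than guessed at.
    static String attemptIdFrom(String containerId) {
        String[] parts = containerId.split("_");
        if (parts.length != 5 || !parts[0].equals("container")) {
            throw new IllegalArgumentException("Unexpected container id: " + containerId);
        }
        long clusterTimestamp = Long.parseLong(parts[1]);
        int appSeq = Integer.parseInt(parts[2]);
        int attemptSeq = Integer.parseInt(parts[3]);
        // The container sequence number (parts[4]) is not part of the attempt id.
        return String.format("appattempt_%d_%04d_%06d", clusterTimestamp, appSeq, attemptSeq);
    }
}
```

The point of the sketch is only that the attempt is fully determined by the container the ApplicationMaster runs in, which is why no separate attempt parameter is needed in the environment.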

After an ApplicationMaster has initialized itself completely, it needs to register with the ResourceManager via AMRMProtocol#registerApplicationMaster. The ApplicationMaster always
communicates via the Scheduler interface of the ResourceManager.

Question :

The ApplicationMaster has to emit __________ to the ResourceManager to keep it informed that the ApplicationMaster is alive and still running.

1. heartbeats
2. messages
3. events
4. Async Messages

Correct Answer : 1

Explanation: The ApplicationMaster has to emit heartbeats to the ResourceManager to keep it informed that the ApplicationMaster is alive and still running. The timeout expiry interval at
the ResourceManager is defined by a config setting accessible via YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS with the default being defined by
YarnConfiguration.DEFAULT_RM_AM_EXPIRY_INTERVAL_MS. The AMRMProtocol#allocate calls to the ResourceManager count as heartbeats, as allocate also supports sending progress update
information. Therefore, an allocate call that requests no containers and carries updated progress information, if any, is a valid way of making heartbeat calls to the ResourceManager.
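
The expiry rule described above can be sketched as a small liveness monitor. This is an illustration of the idea only, not the ResourceManager's implementation: LivenessMonitorSketch and its methods are hypothetical names, and time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

class LivenessMonitorSketch {
    // Plays the role of the expiry interval configured on the ResourceManager.
    private final long expiryIntervalMs;
    // Time of the most recent heartbeat per application attempt.
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();

    LivenessMonitorSketch(long expiryIntervalMs) {
        this.expiryIntervalMs = expiryIntervalMs;
    }

    // Called for every heartbeat; in YARN an allocate call plays this role,
    // even when it requests no containers.
    void heartbeat(String attemptId, long nowMs) {
        lastHeartbeatMs.put(attemptId, nowMs);
    }

    // An attempt is treated as expired if it never sent a heartbeat or if
    // more than the expiry interval has passed since the last one.
    boolean isExpired(String attemptId, long nowMs) {
        Long last = lastHeartbeatMs.get(attemptId);
        return last == null || nowMs - last > expiryIntervalMs;
    }
}
```

An ApplicationMaster therefore has to heartbeat at an interval comfortably shorter than the expiry interval, otherwise a single delayed call could cause it to be considered dead.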

Related Questions

Question : You are working in an organization that provides data storage solutions for many companies and governments. There are various types of data. Which of the
following approaches can help reduce the impact of this data variety on performance and capacity?

1. Define a data catalog in a traditional data warehouse

2. Create different solutions to handle every kind of data

3. Store a wide range of data formats on the same platform

4. Define a comprehensive taxonomy and constantly review

Question : Which of the following is a browser-based visualization tool?

1. BigR

2. BigSheets

3. Analytics Workbench

4. Watson Explorer

Question : You are working in a financial risk analytics company that has past years of historical data stored in a Teradata data warehouse system, and the underlying
storage is very costly. Hence, you decide to move the historical data to Hadoop on commodity hardware, while ongoing data stays in Teradata and a federation method is used to
access both sets of data. Which of the following Big Data value propositions fits this use case?

1. IBM Logical Data Warehouse and IBM Big SQL
2. Enterprise Data Warehouse
3. Pure Data for Analytics
4. InfoSphere Information Server

Question : Which of the following statements is TRUE regarding IaaS vs PaaS?

1. Performance and scalability requirements are a critical factor in deciding between Platform as a Service and Infrastructure as a Service deployment models
2. In PaaS, you get root access to the operating system
3. Web applications with very high transaction volumes are good candidates for Platform as a Service
4. In an Infrastructure as a Service deployment, the cloud provider provides security patching, monitoring and failover capabilities

Question : You are an enterprise architect at ARINIKA Inc. Periodically there is a big spike of new data amounting to multiple TB. Regulations say that data
older than one year must be archived and data older than 3 years must be removed. Which of the following is the best low-cost solution?

1. Estimate the peak volume over a 3 year period and set up a Hadoop system with commodity HW and storage to accommodate that volume.

2. Estimate the peak volume over a 3 year period and set up a Hadoop system with NAS to accommodate the expected volume

3. Use Cloud elasticity capabilities to handle the peak and valley data volume

4. Use SAN storage with compression to handle the peak and valley data volume

Question : SAN or NAS should not be used to set up HDFS

1. True
2. False