
IBM Certified Data Architect - Big Data Certification Questions and Answers (Dumps and Practice Questions)



Question : My container is being killed by the NodeManager. Why?
1. This is likely due to high memory usage exceeding your requested container memory size.
2. You have exceeded physical memory limits
3. You have exceeded virtual memory limits
4. 1 and 2
5. 1,2 and 3

Correct Answer : 5

Explanation: This is likely due to high memory usage exceeding your requested container memory size. There are a number of reasons that can cause this. First, look at the process tree
that the NodeManager dumps when it kills your container. The two things you're interested in are physical memory and virtual memory. If you have exceeded physical memory limits,
your app is using too much physical memory. If you're running a Java app, you can use hprof (-agentlib:hprof) to look at what is taking up space in the heap. If you have exceeded
virtual memory, you may need to increase the value of the cluster-wide configuration variable yarn.nodemanager.vmem-pmem-ratio.
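
For reference, a minimal sketch of where these limits are tuned, assuming a standard yarn-site.xml; the property names are stock YARN settings, and the values shown are the shipped defaults, not tuning recommendations:

<!-- yarn-site.xml (illustrative: these are the default values) -->
<property>
  <!-- Virtual memory allowed per unit of physical memory requested by a container -->
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>
<property>
  <!-- Set to false to disable the NodeManager's virtual memory check entirely -->
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>true</value>
</property>

Raising the ratio (or disabling the vmem check) stops the NodeManager from killing containers on virtual memory alone; physical memory overruns still require a larger container request or a smaller heap.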




Question : I have written a Hadoop MapReduce job that uses a native library. Which is the best way to include the native libraries?

1. Setting -Djava.library.path on the command line while launching a container
2. use LD_LIBRARY_PATH
3. Setting -Dnative.library.path on the command line while launching a container
4. By Adding the Jar's in the Hadoop Job Jar

Correct Answer : 2

Explanation: Setting -Djava.library.path on the command line while launching a container can cause native libraries used by Hadoop not to be loaded correctly, and can result in
errors. It is cleaner to use LD_LIBRARY_PATH instead.
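
As a minimal sketch, assuming the standard mapreduce.map.env and mapreduce.reduce.env properties (the library path below is a placeholder), LD_LIBRARY_PATH can be injected into the task containers through the job configuration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class NativeLibJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Inject LD_LIBRARY_PATH into the map and reduce task containers so that
        // the dynamic linker, not java.library.path, resolves the native libraries.
        conf.set("mapreduce.map.env", "LD_LIBRARY_PATH=/opt/native/libs");    // placeholder path
        conf.set("mapreduce.reduce.env", "LD_LIBRARY_PATH=/opt/native/libs");
        Job job = Job.getInstance(conf, "native-lib-job");
        // ... set mapper, reducer and input/output paths as usual, then:
        // System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}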





Question : Select the correct flow for submitting a YARN application.


1. ApplicationMaster needs to register itself with the ResourceManager
2. The client communicates with the ResourceManager using the 'ClientRMProtocol' to first acquire a new 'ApplicationId'
3. Client submits an 'Application' to the YARN Resource Manager
4. The YARN ResourceManager will then launch the ApplicationMaster (as specified) on an allocated container
5. ApplicationMaster has to signal the ResourceManager of its completion
6. ApplicationMaster communicates with the NodeManager using ContainerManager
7. ApplicationMaster can then request for and receive containers

1. 7,2,1,4,3,6,5
2. 2,3,4,1,7,5,6
3. 3,2,4,1,7,6,5
4. 3,4,2,7,1,6,5

Correct Answer : 3

Explanation: The general concept is that an 'Application Submission Client' submits an 'Application' to the YARN Resource Manager. The client communicates with the
ResourceManager using the 'ClientRMProtocol' to first acquire a new 'ApplicationId' if needed via ClientRMProtocol#getNewApplication and then submit the 'Application' to be run via
ClientRMProtocol#submitApplication. As part of the ClientRMProtocol#submitApplication call, the client needs to provide sufficient information to the ResourceManager to 'launch'
the application's first container i.e. the ApplicationMaster. You need to provide information such as the details about the local files/jars that need to be available for your
application to run, the actual command that needs to be executed (with the necessary command line arguments), any Unix environment settings (optional), etc. Effectively, you need
to describe the Unix process(es) that needs to be launched for your ApplicationMaster.

The YARN ResourceManager will then launch the ApplicationMaster (as specified) on an allocated container. The ApplicationMaster is then expected to communicate with the
ResourceManager using the 'AMRMProtocol'. Firstly, the ApplicationMaster needs to register itself with the ResourceManager. To complete the task assigned to it, the
ApplicationMaster can then request for and receive containers via AMRMProtocol#allocate. After a container is allocated to it, the ApplicationMaster communicates with the
NodeManager using ContainerManager#startContainer to launch the container for its task. As part of launching this container, the ApplicationMaster has to specify the
ContainerLaunchContext which, similar to the ApplicationSubmissionContext, has the launch information such as command line specification, environment, etc. Once the task is
completed, the ApplicationMaster has to signal the ResourceManager of its completion via the AMRMProtocol#finishApplicationMaster.

Meanwhile, the client can monitor the application's status by querying the ResourceManager or by directly querying the ApplicationMaster if it supports such a service. If needed,
it can also kill the application via ClientRMProtocol#forceKillApplication.
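
As a hedged sketch of the client side of this flow: current Hadoop releases wrap the protocol calls above in the YarnClient API (ClientRMProtocol was later renamed ApplicationClientProtocol). The ApplicationMaster class and the resource sizes below are placeholders, not part of any real application:

import java.util.Collections;
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SubmitYarnApp {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Acquire a new ApplicationId from the ResourceManager
        // (ClientRMProtocol#getNewApplication in the explanation above).
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("demo-app");

        // Describe the ApplicationMaster's container: the command to execute,
        // plus (not shown here) local resources and environment settings.
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList(
            "$JAVA_HOME/bin/java com.example.MyAppMaster"   // hypothetical AM class
            + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
            + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"));
        appContext.setAMContainerSpec(amContainer);

        Resource capability = Records.newRecord(Resource.class);
        capability.setMemorySize(1024);   // MB for the AM container; placeholder sizing
        capability.setVirtualCores(1);
        appContext.setResource(capability);

        // Submit the application (ClientRMProtocol#submitApplication above);
        // the ResourceManager then launches the AM on an allocated container.
        ApplicationId appId = yarnClient.submitApplication(appContext);
        System.out.println("Submitted application " + appId);
    }
}

From here the ApplicationMaster itself would register with the ResourceManager, request containers, and signal completion, as the explanation describes.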






Related Questions


Question : You are working with a social networking company that has a huge volume of unstructured and semi-structured data, such as video files, audio files, images, HTML and
PDF documents. Which of the following technologies will be helpful for storing that type of data so that it remains online and highly available?

1. Spark

2. RDBMS

3. Hadoop

4. IBM Netezza


Question : You are working as a Big Data Solution Architect. You implemented a Hadoop-based solution in your organization, and it has been working fine for years. Now the
user base is suddenly increasing, and new requirements are coming in for using Flume, Spark Streaming, etc. How would you tackle this? Storage is not an issue with the
cluster.

1. Increase number of data nodes in Hadoop Cluster

2. Start using Oozie- workflow for your existing and new jobs

3. Implement Yarn to decouple MapReduce and resource management

4. Integrate an RDBMS-based system to manage new resources in the Hadoop cluster


Question : Which of the following is NOT the result of collaboration among various
organizations to drive innovation and standardization across big data technologies?

1. Cloudera Enterprise

2. Hortonworks Data Platform

3. IBM BigInsights

4. Pivotal Big Data Suite


Question : You are working with a data scientist who has to provide deep analytics from existing historical data, so that new incoming data can be scored against it. However,
you also have to provide a better solution for new data that will be received in real time with low latency. What combination of solutions will work for both of you?

1. SPSS Modeler with InfoSphere Streams

2. InfoSphere DataStage with InfoSphere Streams

3. Hive with MapReduce

4. Pig with YARN

5. BigInsights Platform


Question : Big R utilizes the Big SQL query engine for processing
1. True
2. False


Question : The goal of Big R is to enable the use of R as a query language for big data. Big R hides many of the complexities pertaining to the underlying Hadoop/MapReduce
framework. Using classes such as bigr.frame, bigr.vector and bigr.list, a user is presented with an API that is heavily inspired by R's foundational API on data.frames, vectors
and lists.


1. True
2. False