Question : You are working with a social networking company that has a huge volume of unstructured and semi-structured data such as video files, audio files, images, HTML pages and PDFs. Which of the following technologies would be suitable for storing this data and keeping it available online in a highly available manner?
1. Spark
2. RDBMS
3. Hadoop
4. IBM Netezza
Correct Answer : 3 Explanation: Hadoop's HDFS can store very large volumes of files in any format, including unstructured and semi-structured data, and it keeps that data highly available by replicating each block across multiple DataNodes.
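For illustration, here is a minimal sketch of storing such a file in HDFS with the Java FileSystem API; the NameNode URI (hdfs://namenode:8020), the local source file and the target path are assumed placeholder values, not part of the question.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsStoreExample {
    public static void main(String[] args) throws Exception {
        // Assumed NameNode address; normally taken from core-site.xml (fs.defaultFS)
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        try (FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf)) {
            // Copy an unstructured file (e.g. a video) from local disk into HDFS
            Path target = new Path("/data/media/sample-video.mp4");
            fs.copyFromLocalFile(new Path("sample-video.mp4"), target);

            // Keep 3 replicas of every block so the file stays readable
            // even if individual DataNodes fail (high availability of the data)
            fs.setReplication(target, (short) 3);
        }
    }
}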
Question : You are working as a Big Data Solution Architect. You have implemented a Hadoop-based solution in your organization, and it has been working fine for the last few years. Now the user base is suddenly increasing, and new requirements are coming in for tools such as Flume and Spark Streaming. How would you evolve this solution? Storage is not an issue with the cluster.
1. Increase the number of DataNodes in the Hadoop cluster
2. Start using Oozie workflows for your existing and new jobs
3. Implement YARN to decouple MapReduce and resource management
4. Integrate an RDBMS-based system to manage new resources in the Hadoop cluster
Correct Answer : 3 Explanation: As the architectural center of Hadoop, YARN enhances a Hadoop compute cluster in the following ways:
Multi-tenancy : YARN allows multiple access engines (either open-source or proprietary) to use Hadoop as the common standard for batch, interactive and real-time engines that can simultaneously access the same data set.
Multi-tenant data processing improves an enterprise's return on its Hadoop investments.
Cluster utilization : YARN's dynamic allocation of cluster resources improves utilization over the more static MapReduce rules used in early versions of Hadoop.
Scalability : Data center processing power continues to expand rapidly. YARN's ResourceManager focuses exclusively on scheduling and keeps pace as clusters expand to thousands of nodes managing petabytes of data.
Compatibility : Existing MapReduce applications developed for Hadoop 1 can run on YARN without any disruption to processes that already work.
The given problem is primarily one of cluster utilization: implementing YARN decouples resource management from MapReduce, so new engines such as Spark Streaming can share the same cluster and data alongside the existing jobs.
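To make the cluster-utilization point concrete, the sketch below asks the ResourceManager for per-node capacity and current usage through the YarnClient API. It assumes the hadoop-yarn-client dependency is available and that yarn-site.xml is on the classpath; it is an illustration, not part of the original explanation.

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClusterUtilizationReport {
    public static void main(String[] args) throws Exception {
        YarnClient client = YarnClient.createYarnClient();
        client.init(new YarnConfiguration()); // reads yarn-site.xml from the classpath
        client.start();

        // One line per live NodeManager: total capacity versus resources in use,
        // i.e. the utilization that YARN's dynamic allocation is meant to improve
        for (NodeReport node : client.getNodeReports(NodeState.RUNNING)) {
            System.out.println(node.getNodeId()
                    + " capacity=" + node.getCapability()
                    + " used=" + node.getUsed());
        }
        client.stop();
    }
}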
Question : Which of the following is NOT a collaboration of various organizations to drive innovation and standardization across big data technologies?
1. Fair Scheduler allows assigning guaranteed minimum shares to queues
2. If a queue does not need its full guaranteed share, the excess will not be split between other running apps
3. It is also possible to limit the number of running apps per user and per queue
4. 1 and 3
5. 1, 2 and 3
1. Open a remote terminal to the node running the ApplicationMaster and kill the JVM.
2. yarn application -kill "application_id"
3. Use CTRL-C from the terminal where the MapReduce job was started.
4. hadoop datanode -rollback
5. rmadmin -refreshQueues
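Among these options, yarn application -kill "application_id" (option 2) is the documented command-line way to stop a running YARN application. As a hedged illustration, the sketch below does the same thing programmatically through the YarnClient API; the application id is a placeholder, and ApplicationId.fromString requires Hadoop 2.8 or later.

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class KillYarnApplication {
    public static void main(String[] args) throws Exception {
        // Placeholder id; in practice pass the real id, e.g. taken from `yarn application -list`
        ApplicationId appId = ApplicationId.fromString("application_1700000000000_0001");

        YarnClient client = YarnClient.createYarnClient();
        client.init(new YarnConfiguration());
        client.start();
        client.killApplication(appId); // equivalent to: yarn application -kill <application_id>
        client.stop();
    }
}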