IBM Certified Data Architect - Big Data Certification Questions and Answers (Dumps and Practice Questions)

Question : You are working with a data scientist who has to provide deep analytics on existing historical data so that new incoming data can be scored against it.
You also have to provide a solution for new data that is received in real time and must be processed with low latency. Which combination of solutions will
work for both of you?

1. SPSS Modeler with InfoSphere Streams

2. InfoSphere DataStage with InfoSphere Streams

3. Hive with MapReduce

4. Pig with YARN

5. BigInsights Platform

Correct Answer : 1
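Explanation: SPSS Modeler is used to build the predictive model from the historical data, and InfoSphere Streams applies that model to data as it arrives in real time.
IBM SPSS Modeler is a data mining and text analytics software application from IBM. It is used to build predictive models and conduct other analytic
tasks. It has a visual interface that lets users apply statistical and data mining algorithms without programming.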

IBM InfoSphere DataStage is an ETL tool and part of the IBM Information Platforms Solutions suite and IBM InfoSphere. It uses a graphical notation to construct data integration
solutions and is available in various versions such as the Server Edition, the Enterprise Edition, and the MVS Edition.

InfoSphere Streams radically extends the state-of-the-art in big data processing; it's a high-performance computing platform that allows users to develop and reuse applications to
rapidly ingest, analyze, and correlate information as it arrives from thousands of real-time sources.
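
The split between the two products can be illustrated with a minimal Python sketch: a model is fitted once on historical data (the SPSS Modeler role) and then applied to each new event as it arrives (the InfoSphere Streams role). This is only an illustration of the pattern. It uses neither product's API, and the function names and the toy z-score "model" are assumptions made for the example.

```python
from statistics import mean, stdev


def train_model(history):
    """Offline step: fit a trivial model (mean and standard deviation) on historical values."""
    return {"mean": mean(history), "stdev": stdev(history)}


def score(model, value):
    """Online step: z-score of a single new value against the trained model."""
    return (value - model["mean"]) / model["stdev"]


def score_stream(model, events):
    """Score events one at a time as they arrive, without waiting for a batch."""
    for value in events:
        yield value, score(model, value)


# Hypothetical historical data and two newly arriving events.
history = [10.0, 12.0, 11.0, 13.0, 9.0]
model = train_model(history)
scored = list(score_stream(model, iter([11.0, 20.0])))
for value, z in scored:
    print(f"value={value} z-score={z:.2f}")
```

The expensive step (training) runs once over the history, while the per-event step is a cheap arithmetic operation, which is what keeps scoring latency low.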

Question : Big R uses the Big SQL query engine for processing.

1. True
2. False

Correct Answer : 1
Explanation: To use Big R, one needs to take the following steps:
(1) Install the "bigr" package on the client. Big R requires several prerequisite packages including rJava, data.table, and base64enc.
(2) To enable function pushdown via the "Apply" functions, each node of a BigInsights cluster needs to have the R interpreter installed on it. In addition, each node also needs
the same "bigr" package installed.
(3) Ensure that the Big SQL server is running on the BigInsights cluster. As Big R statements are executed, they are transparently converted into corresponding SQL and Jaql
statements, and these are executed by the Big SQL server.
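
Step (3) describes client-side statements being turned into SQL that a server executes. A minimal Python sketch of that general idea follows. It is not Big R's API or its actual translator: the LazyFrame class, the "airline" table, and the column names are all hypothetical.

```python
class LazyFrame:
    """Hypothetical stand-in for a remote table; records operations instead of running them."""

    def __init__(self, table, columns=None, predicates=None):
        self.table = table
        self.columns = list(columns) if columns else ["*"]
        self.predicates = list(predicates) if predicates else []

    def select(self, *columns):
        # Return a new frame with a narrower projection; no data is touched.
        return LazyFrame(self.table, columns, self.predicates)

    def where(self, predicate):
        # Return a new frame with one more filter condition.
        return LazyFrame(self.table, self.columns, self.predicates + [predicate])

    def to_sql(self):
        # Only here are the recorded operations turned into a SQL string,
        # which a server-side engine would then execute.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(self.predicates)
        return sql


flights = LazyFrame("airline")
query = flights.select("carrier", "dep_delay").where("dep_delay > 15").to_sql()
print(query)
```

The client object holds only a description of the query, so the data stays on the cluster until a result is requested.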

Question : One goal of Big R is to enable the use of R as a query language for big data: Big R hides many of the complexities pertaining to the underlying Hadoop / MapReduce
framework. Using classes such as bigr.frame, bigr.vector and bigr.list, a user is presented with an API that is heavily inspired by R's foundational API on data.frames, vectors
and frames.

1. True
2. False

Correct Answer : 1
Explanation: Using Big R, an R user can explore, transform, and analyze big data hosted in a BigInsights cluster using familiar R syntax and paradigms. All of these
capabilities are accessible from a standard R client. The goals of Big R are two-fold:
(1) Enable the use of R as a query language for big data: Big R hides many of the complexities pertaining to the underlying Hadoop / MapReduce framework. Using classes such as
bigr.frame, bigr.vector and bigr.list, a user is presented with an API that is heavily inspired by R's foundational API on data.frames, vectors and frames.
(2) Enable the pushdown of R functions such that they run right on the data: Via mechanisms such as groupApply, rowApply and tableApply, user-written functions composed in R can
be shipped to the cluster. BigInsights transparently parallelizes execution of these functions and provides consolidated results back to the user. Almost any R code, including most
packages available on open-source repositories such as CRAN, can be run using this mechanism.
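
The pushdown idea in goal (2) follows the split-apply-combine pattern: partition the data by a key, run a user-written function on each partition in parallel, and consolidate the results. The Python sketch below illustrates that pattern only. It is not Big R's groupApply; the function names and sample rows are hypothetical, and a local thread pool stands in for the cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_apply(rows, key, func, workers=4):
    """Split rows by `key`, apply `func` to each group in parallel, combine the results."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    keys = list(groups)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with keys.
        results = pool.map(func, (groups[k] for k in keys))
        return dict(zip(keys, results))


def mean_delay(group):
    """User-written function that is shipped to each group."""
    return sum(r["delay"] for r in group) / len(group)


rows = [
    {"carrier": "AA", "delay": 10},
    {"carrier": "AA", "delay": 20},
    {"carrier": "UA", "delay": 5},
]
result = group_apply(rows, "carrier", mean_delay)
print(result)
```

The user function sees only one group at a time, which is what allows the groups to be processed independently and in parallel.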

Related Questions

Question : You have a cluster of nodes in the Geneva datacenter, and one node appears to be running slower than the others
even though all nodes have the same hardware configuration. You suspect a RAM failure in that system.
Which commands may be used to view the memory seen by the system?

1. free

2. df

4. dmidecode

5. lsram

6. jps

7. memusage

1. 1,4,5
2. 1,2,4
4. 1,4,6
5. 1,4,7
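
On Linux, the free command reports figures taken from /proc/meminfo. As a hedged illustration of what "memory seen by the system" means, the Python sketch below parses text in that format. The parse_meminfo function and the sample values are made up for the example, and the sketch does not replace hardware-level tools such as dmidecode.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> numeric value (kB for sized fields)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info


# Hypothetical sample in the format the Linux kernel exposes.
sample = """MemTotal:       16384000 kB
MemFree:         1024000 kB
MemAvailable:    8192000 kB
"""
info = parse_meminfo(sample)
total_gib = info["MemTotal"] / (1024 * 1024)
print(f"Total memory seen by the OS: {total_gib:.2f} GiB")
```

On a real Linux node the same function could be fed the contents of /proc/meminfo. A total well below the installed RAM would be consistent with a failed memory module.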

Question : You are running a Hadoop cluster with a single NameNode, called HadoopExam, and a number of DataNodes,
and you wish to change the configuration of all DataNodes. What must you do?

1. You must modify the configuration files on your NameNode where the master configuration files reside for all DataNodes.
2. You must restart all 100 DataNode daemons to apply the changes.

4. You must restart the NameNode daemon to apply the changes to the cluster.

Question : What describes the relationship between MapReduce and Hive?

1. Hive provides additional capabilities that allow certain types of data manipulation not possible with MapReduce.
2. Hive programs rely on MapReduce but are extensible, allowing developers to do special-purpose processing not provided by MapReduce.
4. Hive provides no additional capabilities to MapReduce. Hive programs are executed as MapReduce jobs via the Hive interpreter.
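
To make the relationship concrete, a query such as SELECT dept, COUNT(*) FROM employees GROUP BY dept corresponds to a map step, a shuffle, and a reduce step. The Python sketch below walks through those three phases in memory. The table and column names are hypothetical, and the sketch does not show how Hive's own planner generates jobs.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit (key, 1) for every input record."""
    for rec in records:
        yield rec["dept"], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values for each key, giving COUNT(*) per group."""
    return {key: sum(values) for key, values in grouped.items()}


records = [{"dept": "sales"}, {"dept": "ops"}, {"dept": "sales"}]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)
```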

Question : What is Hive?

1. Hive is part of the Apache Hadoop project that enables in-memory analysis of real-time streams of data
2. Hive is a way to add data from local file system to HDFS
4. Hive is a part of the Apache Hadoop project that provides SQL like interface for data processing

Question : Which statement is true about Apache Hadoop?

1. HDFS performs best with a modest number of large files
2. Random writes to a file are not allowed
4. All of the above