
IBM Certified Data Architect - Big Data Certification Questions and Answers (Dumps and Practice Questions)



Question : You can visualize workbook data in a map or a chart.
1. True
2. False

Correct Answer : 1 (True)
Explanation: Workbooks
Workbooks contain a set of data from one or more master or child workbooks. You can create a workbook to save a particular set of data results and then tailor the format, content,
and structure of those results to refine and explore only the data that is pertinent to your business questions. You can visualize workbook data in a map or a chart.





Question : Master workbooks are created using the results generated by applying analytical functions to the data.
1. True
2. False

Correct Answer : 2 (False)
Explanation: Master workbooks
Master workbooks are the initial collection of data that is created from raw data stored in the distributed file system or catalog tables. Data in master workbooks is read-only;
however, you can manipulate copies of the original data by creating editable child workbooks from the master. You can visualize the data in a master workbook through a map or a
chart. But if you want to further explore the data in the master workbook, you create new workbooks from the master workbook.





Question : A parent workbook can be a master workbook, or it can be a derivation of a master from which a child is created.

1. True
2. False

Correct Answer : 1 (True)
Explanation: Parent workbooks
A parent workbook is the workbook from which a child workbook is created. A parent workbook can be a master workbook, or it can be a derivation of a master from which a child is
created.

Master workbooks
Master workbooks are the initial collection of data that is created from raw data stored in the distributed file system or catalog tables. Data in master workbooks is read-only;
however, you can manipulate copies of the original data by creating editable child workbooks from the master. You can visualize the data in a master workbook through a map or a
chart. But if you want to further explore the data in the master workbook, you create new workbooks from the master workbook.


Child workbooks
A child workbook is a workbook that is created from a separate master or child workbook. A child workbook is always created initially from one parent workbook, but additional
workbooks may also be loaded into it by using the Load sheet. Any workbook that is loaded into a child workbook through the Load sheet may be considered a parent of that
workbook, meaning that child workbooks may have one or more parents.

Related workbooks
A related workbook relationship is created between all parent workbooks and their children and between all child workbooks and their parents.

Workbook privacy settings
You can specify one of two workbook privacy settings at the bottom of a workbook when it is in collapsed (Normal) mode. To toggle between normal and full-screen mode, click the
icon in the upper right corner of the window.

Private (the default)
Only the creator of a workbook can see that workbook.

Shared
All users can see that workbook. When a workbook is shared, all users can see it in read-only state. However, a shared workbook can be copied, at which point it is owned by that
user, who may edit it or assign it a private or shared privacy setting.



Related Questions


Question : What determines where blocks are written into HDFS by client applications?

1. The client queries the NameNode, which returns information on which DataNodes to use and the client writes to those DataNodes
2. The client writes immediately to DataNodes based on the cluster's rack locality settings

3. …

4. The client writes immediately to DataNodes at random





Question : How does the NameNode know which DataNodes are currently available on a cluster?
1. DataNodes are listed in the dfs.hosts file. The NameNode uses that as the definitive list of available DataNodes.
2. DataNodes heartbeat in to the master on a regular basis.

3. …
4. The NameNode broadcasts a heartbeat on the network on a regular basis, and DataNodes respond.


Question : How does the HDFS architecture provide data reliability?
1. Storing multiple replicas of data blocks on different DataNodes.

2. Reliance on SAN devices as a DataNode interface.
3. …

4. DataNodes make copies of their data blocks, and put them on different local disks.




Question : What is HBase?
1. HBase is a separate set of Java APIs for the Hadoop cluster
2. HBase is a part of the Apache Hadoop project that provides an interface for scanning large amounts of data using the Hadoop infrastructure
3. …
4. HBase is a part of the Apache Hadoop project that provides a SQL-like interface for data processing.



Question : What is the role of the namenode?
1. Namenode splits big files into smaller blocks and sends them to different datanodes
2. Namenode is responsible for assigning names to each slave node so that they can be identified by the clients
3. …
4. Both 2 and 3 are valid answers





Question : What happens if a datanode loses network connection for a few minutes?

1. The namenode will detect that a datanode is not responsive and will start replication of the data from the remaining replicas. When the datanode comes back
online, the administrator will need to manually delete the extra replicas
2. All data will be lost on that node. The administrator has to ensure proper data distribution between nodes
3. …
4. The namenode will detect that a datanode is not responsive and will start replication of the data from the remaining replicas. When the datanode comes back online, the
extra replicas will be deleted

Ans : 4
Exp : The replication factor is actively maintained by the namenode. The namenode monitors the status of all datanodes and keeps track of which blocks are located on each node.
The moment a datanode becomes unavailable, the namenode triggers replication of its blocks from the existing replicas. If the datanode comes back up, the over-replicated data
will be deleted. Note: the data might be deleted from the original datanode.






Question : What happens if one of the datanodes has a much slower CPU? How will it affect the performance of the cluster?

1. The task execution will be as fast as the slowest worker.
However, if speculative execution is enabled, the slowest worker will not have such a big impact
2. The slowest worker will significantly impact job execution time. It will slow everything down
3. …
4. It depends on the level of priority assigned to the task. All high-priority tasks are executed in parallel twice. A slower datanode would therefore be
bypassed. If the task is not high priority, however, performance will be affected.
Ans : 1
Exp : Hadoop was specifically designed to work with commodity hardware. Speculative execution helps to offset slow workers: multiple instances of the same task are created, the
JobTracker takes the first result into consideration, and the other instances of the task are killed.


Question :

If you have a 128 MB file and the replication factor is set to 3, how many blocks will you find on the cluster corresponding to
that file (assuming the default Apache Hadoop configuration)?


1. 3
2. 6
3. …
4. 12
Ans : 2
Exp : Based on the configuration settings, the file will be divided into multiple blocks according to the default block size of 64 MB: 128 MB / 64 MB = 2. Each block will be
replicated according to the replication factor setting (default 3): 2 * 3 = 6.



Question : What is replication factor?

1. Replication factor controls how many times the namenode replicates its metadata
2. Replication factor creates multiple copies of the same file to be served to clients
3. Replication factor controls how many copies of each data block are stored on the cluster
4. None of these answers are correct.
Ans : 3
Exp : Data is replicated in the Hadoop cluster based on the replication factor. A high replication factor guarantees data availability in the event of failure.
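A minimal sketch of how the replication factor can be set, either as the default for new files via dfs.replication or per file through the FileSystem API (the path is a
hypothetical placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplication {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3");                  // default for newly created files
        FileSystem fs = FileSystem.get(conf);
        // Raise the replication factor of an existing file; the namenode then
        // schedules extra copies (or deletes copies if the factor is lowered).
        fs.setReplication(new Path("/user/hadoopexam/data.txt"), (short) 5);
    }
}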



Question :

How does the Hadoop cluster tolerate datanode failures?


1. Failures are anticipated. When they occur, the jobs are re-executed.
2. Datanodes talk to each other and figure out what needs to be re-replicated if one of the nodes goes down
3. …
4. Since Hadoop is designed to run on commodity hardware, datanode failures are expected. The namenode keeps track of all available datanodes and actively
maintains the replication factor on all data.
Ans : 4
Exp : The namenode actively tracks the status of all datanodes and acts immediately if a datanode becomes non-responsive. The namenode is the central "brain" of HDFS and
starts replication of the data the moment a disconnect is detected.




Question :

Which of the following tools defines a SQL-like language?


1. Pig
2. Hive
3. …
4. Flume
Ans : 2
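Hive defines HiveQL, a SQL-like language that is compiled into Hadoop jobs. A minimal sketch of issuing a HiveQL query from Java over the standard Hive JDBC driver; the
connection URL, credentials, and the weblogs table are hypothetical placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");  // HiveServer2 JDBC driver
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "user", "");
        Statement stmt = con.createStatement();
        // HiveQL: familiar SQL syntax executed as distributed Hadoop jobs
        ResultSet rs = stmt.executeQuery(
                "SELECT page, COUNT(*) AS hits FROM weblogs GROUP BY page");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }
        con.close();
    }
}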



Question : As a client of HadoopExam, you are able to access the Hadoop cluster of HadoopExam Inc. Once your application validates
its identity and is granted access to a file in the cluster, what is the remainder of the read path back to the client?
1. The NameNode gives the client the block IDs and a list of DataNodes on which those blocks are found, and the application reads the blocks directly from the DataNodes.
2. The NameNode maps the read request against the block locations in its stored metadata, and reads those blocks from the DataNodes. The client application then reads
the blocks from the NameNode.
3. …
4. The NameNode directs the client to the DataNode closest to the client according to Hadoop's rack topology. The client application then reads the blocks from that single DataNode.