IBM Certified Data Architect - Big Data Certification Questions and Answers (Dumps and Practice Questions)

Question : The NameNode uses a file in its _______ to store the EditLog.

1. Any HDFS Block
2. metastore
4. local hdfs block

Explanation:

The HDFS namespace is stored by the NameNode. The NameNode uses a transaction log called the EditLog to persistently record every change that occurs to file system metadata. For
example, creating a new file in HDFS causes the NameNode to insert a record into the EditLog indicating this. Similarly, changing the replication factor of a file causes a new
record to be inserted into the EditLog. The NameNode uses a file in its local host OS file system to store the EditLog. The entire file system namespace, including the mapping of
blocks to files and file system properties, is stored in a file called the FsImage. The FsImage is also stored as a file in the NameNode's local file system.

The NameNode keeps an image of the entire file system namespace and file Blockmap in memory. This key metadata item is designed to be compact, such that a NameNode with 4 GB of
RAM is plenty to support a huge number of files and directories. When the NameNode starts up, it reads the FsImage and EditLog from disk, applies all the transactions from the
EditLog to the in-memory representation of the FsImage, and flushes out this new version into a new FsImage on disk. It can then truncate the old EditLog because its transactions
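have been applied to the persistent FsImage. This process is called a checkpoint. In early HDFS releases, a checkpoint occurred only when the NameNode started up. Later releases also
checkpoint periodically, triggered after a configured time interval (dfs.namenode.checkpoint.period) or a configured number of accumulated transactions (dfs.namenode.checkpoint.txns).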
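
The checkpoint sequence described above (load the FsImage, replay the EditLog, write a new FsImage, truncate the EditLog) can be sketched as a toy model. This is only an illustration under stated assumptions: the JSON file formats, the operation names and the helper functions log_edit, apply_edit and checkpoint are hypothetical and are not the real NameNode on-disk formats or APIs.

```python
import json
import os
import tempfile


def log_edit(editlog_path, op, path, **fields):
    """Append one metadata change to the edit log as a JSON line (assumed format)."""
    record = {"op": op, "path": path, **fields}
    with open(editlog_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")


def apply_edit(namespace, record):
    """Apply a single logged transaction to the in-memory namespace."""
    if record["op"] == "create":
        namespace[record["path"]] = {"replication": record["replication"]}
    elif record["op"] == "set_replication":
        namespace[record["path"]]["replication"] = record["replication"]
    elif record["op"] == "delete":
        namespace.pop(record["path"], None)


def checkpoint(fsimage_path, editlog_path):
    """Merge the edit log into the image, persist it, and truncate the log."""
    namespace = {}
    if os.path.exists(fsimage_path):
        with open(fsimage_path, encoding="utf-8") as image:
            namespace = json.load(image)
    if os.path.exists(editlog_path):
        with open(editlog_path, encoding="utf-8") as log:
            for line in log:
                apply_edit(namespace, json.loads(line))
    # Write the new image to a temporary file first, then swap it in.
    tmp_path = fsimage_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as image:
        json.dump(namespace, image)
    os.replace(tmp_path, fsimage_path)
    # The logged transactions are now in the image, so the log can be emptied.
    open(editlog_path, "w").close()
    return namespace


def demo():
    """Log two changes, run a checkpoint, and return the result and log size."""
    with tempfile.TemporaryDirectory() as workdir:
        fsimage = os.path.join(workdir, "fsimage.json")
        editlog = os.path.join(workdir, "edits.log")
        log_edit(editlog, "create", "/data/a.txt", replication=3)
        log_edit(editlog, "set_replication", "/data/a.txt", replication=2)
        namespace = checkpoint(fsimage, editlog)
        return namespace, os.path.getsize(editlog)
```

Calling demo() records a file creation and a replication change, then checkpoints them; afterwards the image holds the merged state and the edit log is empty.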

The DataNode stores HDFS data in files in its local file system. The DataNode has no knowledge about HDFS files. It stores each block of HDFS data in a separate file in its local
file system. The DataNode does not create all files in the same directory. Instead, it uses a heuristic to determine the optimal number of files per directory and creates
subdirectories appropriately. It is not optimal to create all local files in the same directory because the local file system might not be able to efficiently support a huge
number of files in a single directory. When a DataNode starts up, it scans through its local file system, generates a list of all HDFS data blocks that correspond to each of these
local files and sends this report to the NameNode: this is the Blockreport.
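
The startup scan that produces a Blockreport can be sketched roughly as follows. This is a minimal illustration, not the DataNode's actual code: the blk_<id> file-name pattern, the report layout and the function build_block_report are assumptions made for the example.

```python
import os
import re
import tempfile

# Assumed naming convention for this sketch: block files are called blk_<id>.
BLOCK_FILE = re.compile(r"^blk_(\d+)$")


def build_block_report(data_dir):
    """Walk the storage directory tree and list every block file found."""
    report = []
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        for name in filenames:
            match = BLOCK_FILE.match(name)
            if match:
                full_path = os.path.join(dirpath, name)
                report.append({
                    "block_id": int(match.group(1)),
                    "length": os.path.getsize(full_path),
                })
    # Sort so the report does not depend on directory traversal order.
    return sorted(report, key=lambda entry: entry["block_id"])


def demo():
    """Create two block files in separate subdirectories and report them."""
    with tempfile.TemporaryDirectory() as data_dir:
        blocks = [("subdir0", 1001, b"abc"), ("subdir1", 1002, b"hello")]
        for subdir, block_id, payload in blocks:
            os.makedirs(os.path.join(data_dir, subdir), exist_ok=True)
            with open(os.path.join(data_dir, subdir, f"blk_{block_id}"), "wb") as f:
                f.write(payload)
        # A file that is not a block is ignored by the scan.
        with open(os.path.join(data_dir, "VERSION"), "w", encoding="utf-8") as f:
            f.write("not a block")
        return build_block_report(data_dir)
```

The blocks are spread over subdirectories on purpose, mirroring the point that the DataNode does not keep all block files in one directory.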

Question : Select the correct option

1. When a file is deleted by a user or an application, it is immediately removed from HDFS
2. When a file is deleted by a user or an application, it is not immediately removed from HDFS. Instead, HDFS first renames it to a file in the /trash directory.
4. 1,2
5. 2,3

Explanation: File Deletes and Undeletes

When a file is deleted by a user or an application, it is not immediately removed from HDFS. Instead, HDFS first renames it to a file in the /trash directory. The file can be
restored quickly as long as it remains in /trash. A file remains in /trash for a configurable amount of time. After the expiry of its life in /trash, the NameNode deletes the file
from the HDFS namespace. The deletion of a file causes the blocks associated with the file to be freed. Note that there could be an appreciable time delay between the time a file
is deleted by a user and the time of the corresponding increase in free space in HDFS.

A user can undelete a file after deleting it as long as it remains in the /trash directory. To undelete a file, the user navigates to the /trash directory and retrieves the file.
The /trash directory contains only the latest copy of the file that was deleted. The /trash directory is just like any other directory with
one special feature: HDFS applies specified policies to automatically delete files from this directory. In the original design described here, the default policy is to delete files
from /trash that are more than 6 hours old. Current Hadoop releases keep trash in a per-user .Trash directory and control retention with the fs.trash.interval setting.
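
The delete, undelete and expiry behaviour described above can be modelled in a few lines. This is a toy in-memory sketch only: the ToyNamespace class, its method names and the use of plain seconds for time are hypothetical and do not correspond to any HDFS API.

```python
class ToyNamespace:
    """In-memory stand-in for the HDFS namespace; times are plain seconds."""

    TRASH = "/trash"

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.files = {}       # path -> content
        self.deleted_at = {}  # trash path -> deletion time

    def create(self, path, content):
        self.files[path] = content

    def delete(self, path, now):
        """Rename the file into /trash instead of removing it."""
        trash_path = self.TRASH + path
        # Overwriting an older entry keeps only the latest deleted copy.
        self.files[trash_path] = self.files.pop(path)
        self.deleted_at[trash_path] = now

    def undelete(self, path):
        """Move the file back out of /trash to its original path."""
        trash_path = self.TRASH + path
        self.files[path] = self.files.pop(trash_path)
        del self.deleted_at[trash_path]

    def expire_trash(self, now):
        """Permanently remove trash entries older than the retention period."""
        for trash_path, deleted in list(self.deleted_at.items()):
            if now - deleted > self.retention_seconds:
                del self.files[trash_path]
                del self.deleted_at[trash_path]
```

With a retention of 6 * 3600 seconds, a file deleted at time 0 can still be restored at any point before expire_trash runs with a time later than 21600; only then does its space become reclaimable.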

Decrease Replication Factor

When the replication factor of a file is reduced, the NameNode selects excess replicas that can be deleted. The next Heartbeat transfers this information to the DataNode. The
DataNode then removes the corresponding blocks and the corresponding free space appears in the cluster. Once again, there might be a time delay between the completion of the
setReplication API call and the appearance of free space in the cluster.
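
The flow above (select excess replicas, then tell each DataNode what to remove on its next Heartbeat) can be sketched as a toy model. The function below is a hypothetical illustration: it naively drops the last replicas in each list, whereas the real NameNode applies its own selection policy.

```python
def set_replication(block_locations, new_factor):
    """Trim each block's replica list to new_factor and return pending deletions.

    block_locations maps a block id to the list of DataNode names holding a
    replica. The result maps a DataNode name to the block ids it should delete,
    which in this toy model is what the next heartbeat reply would carry.
    """
    pending = {}
    for block_id, nodes in block_locations.items():
        excess = nodes[new_factor:]  # naive choice: drop the last replicas
        del nodes[new_factor:]
        for node in excess:
            pending.setdefault(node, []).append(block_id)
    return pending
```

For example, reducing {"blk_1": ["dn1", "dn2", "dn3"]} to a factor of 2 leaves the replicas on dn1 and dn2 and queues blk_1 for deletion on dn3. The space is freed only once dn3 acts on that instruction, which is the delay the explanation mentions.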

Question : You have data already stored in HDFS and are considering using HBase. Which additional feature does HBase provide to HDFS?

1. Random writes
2. Fault tolerance
4. Batch processing
5. 2,3

Explanation: Apache HBase provides random, realtime read/write access to your data. HDFS does not allow random writes. HDFS is built for scalability, fault tolerance, and batch processing.

Related Questions

Question : You are working for a VoIP solution provider that has around a billion customers and does not want to lose them. To analyze data about
each user, the company is creating a solution in which a complete profile of each user must be built and the user's social media data must also be available. Which of the following
will help you accomplish this task?

1. Hadoop

2. Apache Spark

4. Cloudant

5. Storm

Question : You are working as a Solution Architect at a healthcare software consulting firm. To comply with regulations for the patient records you store as part of a Big Data
solution, you have to consider:

- Where patient data will be stored
- How the data is related
- Privacy policy for patient data
- Security of patient data

Under which of the following do all of these fall?

1. PCI DSS

2. IT Data Storage Regulations

4. HIPAA Requirements

Question : Consider the following two systems provided by IBM:

IBM Netezza: a line of high-performance data warehouse appliances and advanced analytics applications for uses including enterprise data warehousing, business
intelligence, predictive analytics and business continuity planning.
SPSS Modeler: a data mining and text analytics software application from IBM. It is used to build predictive models and conduct other analytic tasks. It has a visual interface
which allows users to leverage statistical and data mining algorithms without programming.

Which of the following can help integrate the two systems?

1. You need to create an ODBC system data source name

2. You need to create a JDBC system data source name

4. It is not possible.

5. You have to use the RESTful API exposed by both systems

Question : Service level agreements can contain numerous service performance metrics with corresponding service level objectives. A common case in IT service management is a
call center or service desk. Metrics commonly agreed to in these cases include

A. Abandonment Rate: Percentage of calls abandoned while waiting to be answered.

Question : Which of the following correctly applies to NoSQL databases such as HBase and Cloudant?

1. It does not permit the use of SQL

2. It is not limited to relational database technology

4. It does not permit UPDATE

Question : Select the statements that apply correctly to workbooks in BigSheets.

A. Workbooks can have one or more sheets.
B. By default, the first sheet in your workbook is named the Results sheet.
C. When you save and run the workbook, the data in a Child Workbook is the output for that workbook.
D. When you add sheets to workbooks, saving the sheets runs the individual data for the sheet but not for the full workbook.

1. A,B
2. B,C
4. A,D
5. A,C