IBM Certified Data Architect - Big Data Certification Questions and Answers (Dumps and Practice Questions)

Question : Performing a mathematical operation on a Big R vector variable will automatically loop through each item in the vector

1. True
2. False

Explanation:
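For context: in base R, arithmetic operators on a vector are applied element by element, with no explicit loop in user code. The Python sketch below mimics that behaviour with a hypothetical `Vector` class. It is an analogy only and does not use or represent the Big R API.

```python
from dataclasses import dataclass
from numbers import Number


@dataclass(frozen=True)
class Vector:
    """Hypothetical R-style vector: arithmetic is applied element-wise."""
    items: tuple

    def _apply(self, other, op):
        # Scalar operand: apply op to every element (like R reusing a length-1 value).
        if isinstance(other, Number):
            return Vector(tuple(op(x, other) for x in self.items))
        # Vector operand: pair elements position by position (equal lengths assumed).
        if len(other.items) != len(self.items):
            raise ValueError("vectors must have the same length in this sketch")
        return Vector(tuple(op(x, y) for x, y in zip(self.items, other.items)))

    def __add__(self, other):
        return self._apply(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._apply(other, lambda a, b: a * b)


v = Vector((1, 2, 3))
print((v * 10).items)  # (10, 20, 30)
print((v + v).items)   # (2, 4, 6)
```

The caller writes `v * 10` once; the per-element iteration happens inside the overloaded operator.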

Question : You are working with an online e-commerce company where you need to generate recommendations for each user based on their interests. Which of the following tools will be
most useful?

1. Spark

2. Hadoop

4. Cloudant

Explanation: Spark has a well-developed machine learning library (MLlib), which you can use to build a recommendation engine.
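To show what a recommendation engine computes, here is a minimal sketch in plain Python. Note the swap: Spark MLlib's recommender is based on ALS matrix factorization, whereas this sketch uses simpler item-based collaborative filtering with cosine similarity. The ratings data and the function names `item_vectors`, `cosine`, and `recommend` are hypothetical, and this is not Spark code; at e-commerce scale you would use Spark's distributed implementation instead.

```python
import math

# Hypothetical ratings: user -> {item: rating}. Illustrative data only.
ratings = {
    "alice": {"laptop": 5, "mouse": 4, "monitor": 1},
    "bob":   {"laptop": 4, "mouse": 5, "keyboard": 4},
    "carol": {"monitor": 5, "keyboard": 2, "mouse": 1},
}


def item_vectors(ratings):
    """Invert user->item ratings into item->{user: rating}."""
    items = {}
    for user, prefs in ratings.items():
        for item, score in prefs.items():
            items.setdefault(item, {})[user] = score
    return items


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors keyed by user."""
    common = a.keys() & b.keys()
    dot = sum(a[u] * b[u] for u in common)
    norm = math.sqrt(sum(x * x for x in a.values())) * math.sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0


def recommend(user, ratings, top_n=2):
    """Score unseen items by similarity-weighted ratings of items the user already rated."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, cand_vec in items.items():
        if candidate in seen:
            continue
        weights = [(cosine(cand_vec, items[i]), r) for i, r in seen.items()]
        total = sum(w for w, _ in weights)
        if total > 0:
            scores[candidate] = sum(w * r for w, r in weights) / total
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("alice", ratings))  # ['keyboard']
```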

Question : Which of the following is a correct statement for IBM Business Data Model?

1. Enterprise-wide and applies to the industry, independently of line-of-business considerations

2. Independent of organizational or technological considerations, providing a stable basis for business modeling

4. 1 and 2
5. 1, 2, and 3

Explanation: The Business Data Model (BDM) is a conceptual data model that specifies the third-normal-form data structures required to represent the concepts
defined in the business terms. BDM does not contain technical information, such as primary keys, foreign keys, or technical attributes for history support. BDM provides an
enterprise-wide, generic, and flexible data representation for the design of operational or informational systems, serving as an overall reference point for business and IT.
The main characteristics of BDM:
Enterprise-wide and applies to the industry, independently of line-of-business considerations
Understood by business and IT professionals, providing a powerful and precise means of communication, and helping to bridge the innate gap between business and IT perspectives
Independent of organizational or technological considerations, providing a stable basis for business modeling
Provides a flexible view of the business that can be customized according to specific requirements
Provides a strong starting point for the analysis and design of operational or informational systems that may use design models such as:
Operational data store (ODS)
Data warehouse model
Service model
Component design model
Mapped to the upstream business term concepts it represents
Downstream design models are mapped to BDM

Related Questions

Question : You are working as the chief data scientist at Arinika Inc., a market research company.
You have a team of data scientists who know Python and machine learning. Which tool is best suited to your team's existing skill sets, so that the learning curve
can be reduced?

1. BigInsights

2. Big R

4. Spark

Question : You are working as the Chief Data Architect at a media company where B viewers watch its programs (news, movies, etc.) every day. You have a well-established setup
that continuously receives customer behaviour data, such as viewing habits and peak usage. Your company has various advertisers, so you need to segment customer data combined
with public data, such as voter registration records, so that more accurately targeted campaigns for specific demographics can be launched. Which technologies will be
required in this scenario?

A. InfoSphere Streams
B. BigInsights
C. PureData for Analytics
D. SPSS
E. Spark
F. Big R

1. A,B
2. C,D
4. A,B,E
5. A,C,F

Question : You want to store data so that it takes less space and can be processed efficiently by the Hadoop framework, so you decide to use the Parquet data
format. You also want to compress this Parquet data. Which is the default codec for compressing Parquet data?

1. Gzip

2. Snappy

4. LZO

Question : You need to set up a distributed storage system that can process very large
data sets, and you want to leverage the Open Data Platform (ODP) Core.
Which one of the following would you use?

1. Apache Spark

2. IBM GPFS

4. HDFS

Question : In traditional SAN-based storage, RAID (redundant array of independent disks) was used to provide high availability for the data.

About RAID : RAID is a data storage virtualization technology that combines multiple physical disk drive components into a single logical unit for the purposes of data redundancy,
performance improvement, or both.

Similar protection has been introduced in the Hadoop Distributed File System (HDFS). It is known as:

1. NameNode

2. Secondary NameNode

4. Resource Manager

5. Replication
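The RAID definition in this question mentions data redundancy. Parity-based RAID levels (RAID 5, for example) achieve it by storing the XOR of the data blocks, so that any single failed disk can be rebuilt from the survivors; RAID 1 instead mirrors whole disks. The sketch below illustrates only the XOR parity idea. It is not a real RAID implementation: striping, parity rotation, and disk I/O are omitted, and the disk contents are hypothetical.

```python
def xor_blocks(blocks):
    """Return the byte-wise XOR of equal-length byte blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(result):
            raise ValueError("all blocks must have the same length")
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# Three hypothetical data "disks", each holding one equal-length stripe unit.
disks = [b"DATA", b"MORE", b"BITS"]
parity = xor_blocks(disks)  # stored on a separate parity disk

# Simulate losing disk 1: XOR of the surviving data disks and the parity recovers it.
survivors = [disks[0], disks[2], parity]
recovered = xor_blocks(survivors)
print(recovered)  # b'MORE'
```

This works because XOR is its own inverse: XOR-ing the parity with every surviving block cancels them out and leaves only the missing block.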

Question : You are working with a storage company that is helping a market research company with access to billions of records. The research company is
looking for a solution where it can store these billions of records temporarily (for example, 90 days, until its analysis finishes) and also run analytics on that
data. Which IBM solution would you recommend?

1. BigInsights

2. Spark

4. PureData System for Analytics

5. SPSS