Question : You are working as the chief data scientist at Arinika Inc, a market research company. You have a team of data scientists who know Python and Machine Learning. Which tool can your team use with their existing skill sets, so that the learning curve is reduced? 1. Big Insight
Correct Answer : Apache Spark. Explanation: Apache Spark is the open standard for flexible in-memory data processing that enables batch, real-time, and advanced analytics on the Apache Hadoop platform.
Simple, yet rich, APIs for Java, Scala, and Python open up data for interactive discovery and iterative development of applications. Through shared common code, data scientists and developers can increase productivity with rapid prototyping for batch and streaming applications, using the language and third-party tools on which they already rely.
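As an illustration of this rapid prototyping in Python, here is a minimal PySpark sketch; the file name and column name are hypothetical, not taken from the question:

from pyspark.sql import SparkSession

# Start a local Spark session (assumes PySpark is installed)
spark = SparkSession.builder.appName("rapid-prototype").getOrCreate()

# Load a hypothetical CSV of survey responses into a DataFrame
df = spark.read.csv("survey_responses.csv", header=True, inferSchema=True)

# Interactive discovery: aggregate and inspect in a few lines of Python
df.groupBy("region").count().show()

spark.stop()

The same DataFrame code runs unchanged in batch jobs or interactive shells, which is the productivity point the explanation is making.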
Question : You are working as Chief Data Architect at a media company where, every day, billions of viewers watch its media programs (news, movies, etc.). You have a well-established setup through which you continuously receive customer behaviour data, such as viewing habits and peak usage. Your company has various advertisers, so you need to segment the customer data together with public data, such as voter registration records, so that more accurately targeted campaigns for specific demographics can be launched. Which technologies will be required in such a scenario?
Correct Answer : SPSS Statistics and IBM PureData System for Analytics. Explanation: SPSS Statistics has several statistical algorithms for creating segmentation:
TwoStep, K-Means, Hierarchical, Tree, Discriminant, and Nearest Neighbor.
These are the most commonly used clustering and classification algorithms. You can also add a neural network to that list, but in SPSS Statistics that algorithm is listed separately. Each of these algorithms has strengths and weaknesses, depending on the amount of data you have, the type or characteristics of the variables, and your end purpose in classifying the data. This explanation concentrates on two of the algorithms: K-Means and Tree (Tree in this case is more broadly known as Decision Trees).
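As a minimal sketch of K-Means segmentation in Python (using scikit-learn rather than SPSS; the viewer features and segment count are hypothetical):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical viewer features: [hours watched per week, peak-hour share]
X = np.array([[2.0, 0.10], [3.5, 0.20], [20.0, 0.80],
              [18.0, 0.90], [1.0, 0.05], [22.0, 0.85]])

# Scale features so neither one dominates the distance metric
X_scaled = StandardScaler().fit_transform(X)

# Cluster viewers into two segments (e.g. casual vs. heavy viewers)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
print(kmeans.labels_)  # segment assignment for each viewer

In practice the number of clusters is itself a modelling decision, which is exactly the kind of strength/weakness trade-off the explanation refers to.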
IBM PureData System for Analytics is a purpose-built, standards-based data warehouse and analytics appliance that integrates database, server, storage and analytics into an easy-to-manage system. It is designed for high-speed analysis of big data volumes, scaling into the petabytes.
Question : You have decided to store data so that it takes less space and is efficiently processed by the Hadoop framework. Hence, you decided to go with the Parquet data format. You still want to compress this Parquet data; which is the default codec for compressing Parquet data? 1. Gzip
Correct Answer : Snappy. Explanation: The supported compression types for Parquet are UNCOMPRESSED, GZIP, and SNAPPY, and Snappy is the default.
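As a sketch, assuming a PySpark environment, the Parquet codec can also be set explicitly when writing (the DataFrame contents and output path are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-compression").getOrCreate()

# Snappy is already the default codec for Parquet; setting it makes the choice explicit
spark.conf.set("spark.sql.parquet.compression.codec", "snappy")

# Hypothetical DataFrame written out as Snappy-compressed Parquet
df = spark.createDataFrame([(1, "news"), (2, "movies")], ["id", "genre"])
df.write.mode("overwrite").parquet("/tmp/viewership_parquet")

spark.stop()

Snappy trades a slightly larger file size than Gzip for much faster compression and decompression, which suits Hadoop-scale processing.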