IBM Certified Data Architect - Big Data Certification Questions and Answers (Dumps and Practice Questions)

Question : You want to build a Big Data solution using open source products. It has the following requirements:

- Text search over existing data
- Infrastructure monitoring

Which of the following components can be used?

A. HBase
B. Lucene
C. Nagios
D. Oozie
E. Spark

1. A,B
2. B,C
4. D,E
5. A,E

Correct Answer : 2
Explanation: Nagios is a widely used open source server monitoring tool. It supports both agent-based and agentless monitoring, which makes server monitoring flexible. More than 5000 addons for monitoring servers are available from the community at the Nagios Exchange.

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform applications. Apache Lucene is an open source project available for free download.
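
To make the idea of full-text search concrete, here is a minimal sketch of an inverted index, the general data structure that search libraries of this kind are built around. It is plain Python and does not use or imitate Lucene's Java API; the function names and the sample documents are assumptions made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, not taken from any real system.
docs = {
    1: "Nagios monitors servers and network infrastructure",
    2: "Lucene provides full-text search over indexed documents",
    3: "Oozie schedules Hadoop workflow jobs",
}
index = build_index(docs)
print(sorted(search(index, "text search")))  # [2]
```

A real search library adds much more on top of this (analysis, scoring, on-disk segments), so treat the sketch only as a picture of the lookup step.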

Question : A large global enterprise customer has a Big Data environment set up on Hadoop.
After a year in operation they are now looking to extend access to multiple
functions that will need different views into different aspects/portions of the data.
As you consider these requirements, which of the following statements is TRUE
and also applies to the scenario?

1. Hadoop does not support multi tenancy but can easily scale to support this by replicating data to new clusters with commodity hardware.

2. Hadoop can support multi tenancy but only if YARN is used, so if not already used, the customer will need to upgrade to a YARN supported version.

4. Hadoop can support multi tenancy by using a distributed file system for storage, allowing all nodes to access the data.

Explanation: Multitenant, Multi-modal

To borrow from object-oriented programming terminology, multitenancy is an overloaded term. It means different things to different people depending on their orientation and context. To say a solution is multitenant is not helpful unless we are specific about the meaning. Some interpretations of multitenancy in Big Data environments are:
- Support for multiple concurrent Hadoop jobs
- Support for multiple lines of business on a shared infrastructure
- Support for multiple application workloads of different types (Hadoop and non-Hadoop)
- Provisions for security isolation between tenants
- Contract-oriented service level guarantees for tenants
- Support for multiple versions of applications and application frameworks concurrently

Yet another resource negotiator

YARN is well named. While it is an important technology, the world is not suffering from a shortage of resource managers. Some Hadoop providers (including IBM) support YARN while others support Apache Mesos. In addition, there is a plethora of general purpose batch workload managers that support Hadoop as yet another workload pattern (YAWP: you heard it here first!) on their own scheduling and resource management products. This includes IBM Platform LSF, where a freely available Hadoop Connector for LSF enables existing Platform Computing customers to support Hadoop MapReduce applications natively on existing HPC clusters. Also, many distributed applications embed their own proprietary solutions for workload management in clustered environments or support one or more commercial solutions. In short, workload and resource management for distributed applications is a big topic.

YARN comes with built-in multitenancy support. To see what multitenancy means, consider a housing society with multiple apartments. Different families live in different apartments with security and privacy, but they all share the society's common areas, such as the gate, garden, play area, and other amenities. Their apartments also share common walls. YARN follows the same concept: the applications that run in the cluster share the cluster resources in a multitenant way. They share cluster processing capacity, cluster storage capacity, data access security, and so on. Multitenancy is achieved in the cluster by separating applications into multiple business units, for example, different queues and users for different types of applications.
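
The queue idea can be pictured with a toy model. The sketch below is not YARN code and does not use any YARN API; it only assumes that each queue is given a percentage of cluster memory and that a request is granted when it fits inside that guaranteed share. The queue names and sizes are hypothetical, and real schedulers can also lend idle capacity between queues, which this sketch ignores.

```python
class Queue:
    """A tenant queue with a guaranteed slice of cluster memory (toy model)."""

    def __init__(self, name, guaranteed_gb):
        self.name = name
        self.guaranteed_gb = guaranteed_gb
        self.used_gb = 0.0

    def try_allocate(self, requested_gb):
        """Grant the request only if it fits in this queue's guaranteed share."""
        if self.used_gb + requested_gb > self.guaranteed_gb:
            return False
        self.used_gb += requested_gb
        return True


def make_queues(total_gb, shares_pct):
    """Split total cluster memory among queues by percentage share."""
    if sum(shares_pct.values()) != 100:
        raise ValueError("queue shares must add up to 100 percent")
    return {
        name: Queue(name, total_gb * pct / 100)
        for name, pct in shares_pct.items()
    }


# Hypothetical business units sharing one 1000 GB cluster.
queues = make_queues(1000, {"finance": 50, "marketing": 30, "adhoc": 20})
print(queues["finance"].try_allocate(400))  # True: 400 GB fits in the 500 GB share
print(queues["finance"].try_allocate(200))  # False: 600 GB would exceed 500 GB
print(queues["adhoc"].try_allocate(200))    # True: exactly fills the 200 GB share
```

Each tenant sees only its own queue, yet all queues draw on the same physical cluster, which is the shared-amenities picture from the analogy above.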

Security and privacy can be achieved by configuring Linux and HDFS permissions to separate files and directories to create tenant boundaries. This can be achieved by integrating
with LDAP or Active Directory. Security is used to enforce the tenant application boundaries, and this can be integrated...

Question : What term applies to the data elements in InfoSphere Streams?

1. Tuples

2. Operators

4. Composite operators

Correct Answer : 1
Explanation: tuple : An individual piece of data in a stream that is represented as a set of attributes and data values. Typically, the data values in a tuple
represent a single observation of data, such as a stock ticker quote or a temperature reading from an individual sensor.

operator : A program that processes tuples in an incoming stream and produces an output stream as a result. An operator can have any number of input ports and any number of output
ports.

sink operator : An operator that sends information as a stream to an external system, such as a dashboard, web server, mail server, or a database.

composite operator : An operator that is implemented in the Streams Processing Language (SPL) that encapsulates a subgraph of a data flow graph that can be parameterized to make
it reusable in multiple streams processing applications.
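
The four terms above can be illustrated with a small sketch. It is written in Python with generators rather than in SPL, and it does not use any InfoSphere Streams API; the function names, the attribute names, and the sensor readings are all assumptions invented for this example.

```python
def sensor_source(readings):
    """Source: emit one tuple (a dict of attributes and values) per observation."""
    for sensor_id, celsius in readings:
        yield {"sensor_id": sensor_id, "celsius": celsius}


def to_fahrenheit(stream):
    """Operator: read tuples from an input stream and produce an output stream."""
    for tup in stream:
        yield {**tup, "fahrenheit": tup["celsius"] * 9 / 5 + 32}


def hot_only(stream, threshold_f):
    """Operator: pass through only tuples above the temperature threshold."""
    for tup in stream:
        if tup["fahrenheit"] > threshold_f:
            yield tup


def alert_pipeline(stream, threshold_f):
    """Composite operator: a reusable, parameterized subgraph of two operators."""
    return hot_only(to_fahrenheit(stream), threshold_f)


def list_sink(stream):
    """Sink operator: hand tuples to an external system (here, a plain list)."""
    return list(stream)


# Hypothetical temperature readings: (sensor id, degrees Celsius).
readings = [("s1", 20.0), ("s2", 40.0), ("s3", 35.0)]
alerts = list_sink(alert_pipeline(sensor_source(readings), threshold_f=100.0))
print([t["sensor_id"] for t in alerts])  # ['s2']
```

Each dict plays the role of a tuple, each generator function plays the role of an operator, and `alert_pipeline` shows why a composite operator is convenient: the same subgraph can be reused with a different threshold.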

Related Questions

Question : Which of the following types of data are well supported on the IBM Big Data platform?

1. Semi-structured

2. Unstructured

4. 1 and 3

5. 1,2 and 3

Question : A big data solution typically comprises these logical layers:

A. Big data sources
B. Data massaging and store layer
C. Analysis layer
D. Consumption layer

1. A,B,C
2. B,C,D
4. A,B,C,D

Question : Which layer can do the following?

An image might need to be converted so it can be stored in a Hadoop Distributed File System (HDFS) store or a Relational Database Management System (RDBMS) warehouse for further

1. Big data sources

2. Data massaging and store layer

4. Consumption layer

Question : The analysis layer reads the data digested by the data massaging and store layer. In some cases, the analysis layer accesses the data directly from the data source.
Designing the analysis layer requires careful forethought and planning. Decisions must be made with regard to how to manage the tasks to:

A. Produce the desired analytics
B. Derive insight from the data
C. Find the entities required
D. Locate the data sources that can provide data for these entities
E. Understand what algorithms and tools are required to perform the analytics.

1. A,B,C
2. C,D,E
4. A,B,C,D
5. A,B,C,D,E

Question : Visualization applications, human beings, business processes, and services belong to which logical layer of a Big Data solution?

1. Big data sources

2. Data massaging and store layer

4. Consumption layer

Question : You are working at Arinika INC and need to identify all the characteristics of Big Data. Which of the following cannot be a characteristic of Big Data?

1. Data frequency and size

2. Software

4. Processing methodology