IBM Certified Data Architect - Big Data Certification Questions and Answers (Dumps and Practice Questions)

Question : Which of the following can be achieved, using IBM SPSS tool

1. Statistical analysis and reporting

2. Predictive modeling and data mining

3. Access Mostly Uused Products by 50000+ Subscribers

4. Only 1 and 2

5. 1,2,3

Correct Answer : Get Lastest Questions and Answer :
Explanation: What is IBM SPSS?
What if you could get deeper, more meaningful insights from your data and predict what is likely to happen next? IBM SPSS predictive analytics software offers advanced techniques
in an easy-to-use package to help you find new opportunities, improve efficiency and minimize risk.

Statistical analysis and reporting : Address the entire analytical process: planning, data collection, analysis, reporting, and deployment.
Predictive modeling and data mining : Use powerful model-building, evaluation, and automation capabilities.
Decision management and deployment : Activate your analytics with advanced model management and analytic decision management on prem, on cloud or as hybrid.
Big data analytics : Analyze big data to gain predictive insights and build effective business strategies.

Question : You are building an application, where you will be using existing data in your system and also you are getting real time data from one of the websites web server logs.
Which of the following tool will help you to get real time data output by combing both the existing data and real-time log data

1. BigSQL

2. Apche Spark

3. Access Mostly Uused Products by 50000+ Subscribers

4. Infosphere Stream

Correct Answer : Get Lastest Questions and Answer :
Explanation: Part of IBMs big data platform, IBM InfoSphere Streams allows you to capture and act on all of your business data... all of the time... just in time.
InfoSphere Streams radically extends the state-of-the-art in big data processing; its a high-performance computing platform that allows users to develop and reuse applications
to rapidly ingest, analyze, and correlate information as it arrives from thousands of real-time sources. Users are able to:
Continuously analyze massive volumes of data at rates up to petabytes per day.
Perform complex analytics of heterogeneous data types including text, images, audio, voice, VoIP, video, web traffic, email, GPS data, financial transaction data, satellite data,
and sensors.
Leverage sub-millisecond latencies to react to events and trends as they are unfolding, while it is still possible to improve business outcomes.
Adapt to rapidly changing data forms and types.
Easily visualize data with out-of-the-box ability to route all streaming records into a single operator and display them in a variety of ways on an html dashboard.
Seamlessly deploy applications on any size computer cluster. Meet current reaction time and scalability requirements with the flexibility to evolve with future changes in data
volumes and business rules.
Fuse a broad range of traditional and non-traditional data with support for complex data types such as XML
Meet current reaction time and scalability requirements with the flexibility to evolve with future changes in data volumes and business rules.
Quickly and easily develop new applications, using drag and drop operators, that can be mapped to a variety of hardware configurations, and adapted with shifting priorities.
Perform real-time and look-ahead analysis of regularly-generated data, using digital filtering, pattern/correlation analysis and decomposition.
Conduct geospatial analysis, calculating Calculate distances and directions between points on the globe.
Automate options trading and equity trading analytics.
Analyze telephone call records to perform fraud detection in real-time.
Provide security and information confidentiality for shared information.

Question : A bigr.frame is an R object that mimics Rs own data.frame. However, unlike R, a bigr.frame does not load that data in memory as that would be impractical. The
data stays in HDFS. However, you will still be able to explore this data using the Big R API.

1. True
2. False

Correct Answer : Get Lastest Questions and Answer :
Explanation: A bigr.frame is an R object that mimics Rs own data.frame. However, unlike R, a bigr.frame does not load that data in memory as that would be
impractical. The data stays in HDFS. However, you will still be able to explore this data using the Big R API.

Related Questions

Question : Select the correct statement regarding Capacity Scheduler

1. The Capacity scheduler permits sharing a cluster while giving each user or group certain minimum capacity guarantees.
2. The Capacity scheduler currently supports memory-intensive applications, where an application can optionally specify higher memory resource requirements than the default.
3. The Capacity scheduler works best when the workloads are not known
4. 1 and 3
5. 1 and 2

Question :

Select the correct statement which applies to container

1. a container is a collection of physical resources such as RAM, CPU cores, and disks on a single node.
2. There can be only one container on a single node
3. A container is supervised by the NodeManager and scheduled by the ResourceManager
4. 1 and 2
5. 1 and 3

Question : Select the correct statement which applies to Node Manager

1. On start-up, the NodeManager registers with the ResourceManager
2. Its primary goal is to manage only the containers (On the node) assigned to it by the ResourceManager
3. The NodeManager is YARNs per-node "worker" agent, taking care of the individual compute nodes in a Hadoop cluster.
4. 1 and 2
5. 1 and 3

Question : In the YARN design, Map-Reduce is just one

1. Resource Manager
2. Application
3. Container
4. None of the above

Question : Select the correct statement for HDFS in Hadoop .

1. NameNode federation significantly improves the scalability and performance of HDFS by introducing the ability to deploy multiple NameNodes for a single cluster.
2. built-in high availability for the NameNode via a new feature called the Quorum Journal Manager (QJM). QJM-based HA features an active NameNode and a standby NameNode
3. The standby NameNode can become active either by a manual process or automatically
4. 1 and 3
5. 1,2 and 3

Question : Select the correct statement which applies to "Fair Scheduler"

1. Fair scheduling is a method of assigning resources to applications such that all apps get, on average, an equal share of resources over time
2. By default, the Fair Scheduler bases scheduling fairness decisions only on CPU
3. It can be configured to schedule with both memory and CPU
4. 1 and 3
5. 1 2 and 3