
IBM Certified Data Architect - Big Data Certification Questions and Answers (Dumps and Practice Questions)



Question : You have been processing a huge volume of customer sales data for Arinika Retail Solutions.
However, you also need to process geospatial data, which is produced in GeoJSON format. Which of the following would provide you with the desired output in GeoJSON format?
1. Big SQL

2. BigSheets

3.

4. Text Analytics

5. Apache Pig

Correct Answer : 1 (Big SQL)
Explanation: Similar to Apache Hive, Big SQL also supports the use of a custom SerDe to handle different data formats. To create tables that use a custom SerDe, add the CREATE TABLE clause ROW FORMAT SERDE 'serde.class.name'. JSON makes it easy to represent complex data structures, since its conventions are similar to those of most programming languages, and it has several other advantages, including being lightweight and easy to parse.
Please note that a user can define a custom SerDe for specific requirements by implementing the Apache Hive SerDe interface "org.apache.hadoop.hive.serde2.SerDe". To work with Big SQL, the custom SerDe Java class must be included in the Big SQL classpath; refer to the InfoSphere BigInsights Information Center for details (see Related topics).
To demonstrate the use of a JSON SerDe, we create a table WISHLIST for our shopping cart scenario that contains a CUSTOMER_ID and an array of ITEM_ID values holding the items a customer wants to keep in the wish list. The SerDe used in the example can be downloaded from GitHub (see Related topics).
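
To make the WISHLIST example concrete, here is a minimal sketch of the CREATE TABLE statement described above, written in Hive-style DDL as accepted by Big SQL. The SerDe class name shown is the commonly used open-source Hive JSON SerDe and is only a stand-in for whichever SerDe JAR you actually download and place on the Big SQL classpath.

    -- Hive-style DDL sketch for Big SQL: a WISHLIST table backed by a JSON SerDe.
    -- The SerDe class below is an example; substitute the class from the JAR you
    -- add to the Big SQL classpath.
    CREATE TABLE WISHLIST (
      CUSTOMER_ID INT,
      ITEM_ID     ARRAY<STRING>
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    STORED AS TEXTFILE;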





Question : Which of the following is correct with regard to IBM GPFS?
1. File clones can be created from a regular file or a file in a snapshot using the mmclone command.

2. Use the mmclone command to display status for specified files.

3.

4. A and B

5. A,B and C

Correct Answer :
Explanation: Creating and managing file clones : A file clone is a writable snapshot of an individual file. File clones can be used to provision virtual machines by
creating a virtual disk for each machine by cloning a common base image. A related usage is to clone the virtual disk image of an individual machine as part of taking a snapshot
of the machine state.

Cloning a file is similar to creating a copy of a file, but the creation process is faster and more space efficient because no additional disk space is consumed until the clone or
the original file is modified. Multiple clones of the same file can be created with no additional space overhead. You can also create clones of clones.

Management of file clones in a GPFS file system includes the following tasks (a command sketch follows this list):

Creating file clones : File clones can be created from a regular file or a file in a snapshot using the mmclone command.

Listing file clones : Use the mmclone command to display status for specified files.

Deleting file clones : There is no explicit GPFS command available for deleting file clones. File clones can be deleted using a regular delete (rm) command. Clone parent files
cannot be deleted until all file clone copies of the parent have been deleted and all open file handles to them have been closed.

Splitting file clones from clone parents : Use the mmclone command to split a file clone from a clone parent.

File clones and disk space management : File clones have several considerations that are related to disk space management.

File clones and snapshots : When a snapshot is created and a file clone is subsequently updated, the previous state of the file clone will be saved in the snapshot.

File clones and policy files : Policy files can be created to examine clone attributes.
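
As a concrete illustration of the tasks above, the following is a minimal sketch of typical mmclone usage; the file names are hypothetical, and the exact options should be verified against the GPFS documentation for your release.

    # Create a read-only clone parent (snapshot) from a regular file,
    # then create a writable file clone from it (hypothetical file names).
    mmclone snap base_image.img base_image.snap
    mmclone copy base_image.snap vm01_disk.img

    # Display clone status for the specified files.
    mmclone show base_image.snap vm01_disk.img

    # Split a file clone from its clone parent.
    mmclone split vm01_disk.img

    # There is no dedicated GPFS delete command; remove a clone with a regular rm.
    rm vm01_disk.img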






Question : You are working with an email marketing company that already has terabytes of email data and expects gigabytes of new data to be added every day. A typical query
can involve pulling in 20 GB of data. Querying this data has always been an issue, so a proper solution is expected. Which of the following can solve this requirement?
1. Set up a Hadoop system

2. Utilize de-duplication and compression technology

3.

4. Create range partitions and proper indexing of data and store in DB2


Correct Answer : 1 (Set up a Hadoop system)
Explanation: Data that also contains metadata (data about data) is generally classified as structured or semi-structured data. Relational databases (which contain table schemas), XML files (which contain tags), simple tables with columns, and so on are examples of structured data.

Now consider data such as blog content, comments, email messages, text documents (say, the legal policies of a company), audio files, video files, or images, which together constitute about 80 to 90% of all data available for analysis. These forms of data do not follow any specific structure, nor do they contain information about their content. They are all classified as unstructured data.

Hadoop is suitable for analyzing unstructured data.

Hadoop provides a distributed storage and distributed processing framework, which is essential for unstructured data analysis owing to its size and complexity.

Hadoop is designed to support Big Data: data that is too big for traditional database technologies to accommodate. Unstructured data is big, really big, in most cases.

Data in HDFS is stored as files. Hadoop does not enforce a schema or structure on the data to be stored.

Hadoop also has applications such as Sqoop, Hive, and HBase to import data from and export data to other popular traditional and non-traditional databases. This allows Hadoop to be used to structure unstructured data and then export the semi-structured or structured result into traditional databases for further analysis.
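
As a hypothetical illustration of that import/export path, a Sqoop invocation might look like the following; the connection URL, credentials, table name, and target directory are placeholders, not values from the scenario.

    # Hypothetical example: import a DB2 table into HDFS with Sqoop.
    # Connection URL, credentials, table name, and target directory are placeholders.
    sqoop import \
      --connect jdbc:db2://db2host:50000/SALESDB \
      --username dbuser -P \
      --table EMAIL_EVENTS \
      --target-dir /data/email_events \
      -m 4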

Hadoop is a very powerful platform for writing customized code. Analyzing unstructured data typically involves complex algorithms, and programmers can implement algorithms of any complexity while exploiting the benefits of the Hadoop framework for efficiency and reliability. This gives users the flexibility to understand the data at a raw level and program whatever algorithm is appropriate.

Because Hadoop is an open-source project, numerous applications specific to video/audio processing, image analysis, and text analytics have been developed in the market; Pivotal and Pythian, to mention a few.



Related Questions


Question : Operational modeling elements represent the parts of the application and describe how they communicate with each other. Some of these elements include components
(pieces of the application itself), nodes (pieces of infrastructure that can run the application), locations (security zones or physical places), and connections (communication
links between elements). In the operational model, two levels have been documented very well; one of them is the theoretical level, and the other one will be ...

1. Regional

2. Physical

3.

4. Logical


Question : At Arinika Inc, there are big web server farms that continuously generate data in log files, and your data science team wants to analyze these logs. Which of the
following is recommended?
1. Apache Pig and Hive

2. Apache Spark

3.

4. IBM InfoSphere Streams and BigInsights



Question : You are working in a financial organization that has a strict policy of keeping data for a mandated minimum number of years. Hence, as the solution architect, you will
be asked to put data retention and archival in place. What are the requirements for data retention and archival?

1. A format and storage repository for archived data

2. Public cloud

3.

4. Solid-state technology


Question : The Annotation Query Language (AQL) is the easiest and most flexible tool to pull structured output from which of the following?
1. Hive data structures

2. Unstructured text

3.

4. JDBC connected relational data marts


Question : You have been storing data in the IBM NoSQL solution known as IBM Cloudant, and you want to pre-create some functions as views so that they can be used later
to fetch data, e.g., the average sale price for a product ID. Which language will you use to write views for Cloudant?
1. Go

2. Java

3.

4. Python

5. Scala



Question : Cloudant is a graph database?
1. True
2. False