Question : You have been processing huge volumes of customer sales data at Arinika Retail Solutions. However, you also need to process geospatial data, which is produced in the GeoJSON format. Which of the following would provide the desired output in GeoJSON format? 1. Big SQL
Correct Answer : 1. Big SQL. Explanation: Similar to Apache Hive, Big SQL supports the use of a custom SerDe to handle different data formats. To create a table that uses a custom SerDe, add the clause ROW FORMAT SERDE 'serde.class.name' to the CREATE TABLE statement. Complex data structures are easier to represent in JSON because the conventions used for JSON data resemble those of most programming languages. The JSON format has several other advantages, including being lightweight and easy to parse. Note that a user can define a custom SerDe for specific requirements by implementing the Apache Hive SerDe interface, org.apache.hadoop.hive.serde2.SerDe. To work with Big SQL, the custom SerDe Java class must be included in the Big SQL classpath; refer to the InfoSphere BigInsights Information Center for details (see Related topics). To demonstrate the use of a JSON SerDe, we create a table WISHLIST for our shopping-cart scenario, containing a CUSTOMER_ID and an array of ITEM_ID values holding the items a customer has placed on a wish list. The SerDe used in the example can be downloaded from GitHub (see Related topics).
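A minimal sketch of what the WISHLIST definition described above might look like. The SerDe class name below is a placeholder for whichever JSON SerDe JAR has been added to the Big SQL classpath, and the column types are assumptions, not taken from the original example:

```sql
-- Hypothetical sketch: 'org.openx.data.jsonserde.JsonSerDe' stands in for
-- whatever custom JSON SerDe class is on the Big SQL classpath.
CREATE TABLE WISHLIST (
    CUSTOMER_ID INT,
    ITEM_ID     ARRAY<STRING>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
```

With this definition, each JSON document maps to one row, and the ITEM_ID array holds the items on that customer's wish list.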
Question : Which of the following is correct with regard to IBM GPFS? 1. File clones can be created from a regular file or a file in a snapshot using the mmclone command.
2. Use the mmclone command to display status for specified files.
Correct Answer : Explanation: Creating and managing file clones: A file clone is a writable snapshot of an individual file. File clones can be used to provision virtual machines by creating a virtual disk for each machine from a clone of a common base image. A related use is cloning the virtual disk image of an individual machine as part of taking a snapshot of the machine's state.
Cloning a file is similar to creating a copy of a file, but the creation process is faster and more space efficient because no additional disk space is consumed until the clone or the original file is modified. Multiple clones of the same file can be created with no additional space overhead. You can also create clones of clones.
Management of file clones in a GPFS file system includes:
Creating file clones : File clones can be created from a regular file or a file in a snapshot using the mmclone command.
Listing file clones : Use the mmclone command to display status for specified files.
Deleting file clones : There is no explicit GPFS command available for deleting file clones. File clones can be deleted using a regular delete (rm) command. Clone parent files cannot be deleted until all file clone copies of the parent have been deleted and all open file handles to them have been closed.
Splitting file clones from clone parents : Use the mmclone command to split a file clone from a clone parent.
File clones and disk space management : File clones have several considerations that are related to disk space management.
File clones and snapshots : When a snapshot is created and a file clone is subsequently updated, the previous state of the file clone will be saved in the snapshot.
File clones and policy files : Policy files can be created to examine clone attributes.
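The management tasks listed above correspond to mmclone subcommands. Below is a hedged command sketch for the virtual-machine provisioning scenario; the file names are illustrative, and these commands only run on a live GPFS file system:

```
# Turn an existing base image into a read-only clone parent.
mmclone snap base.img base.parent
# Create writable file clones of the parent, one per virtual machine;
# no extra disk space is used until a clone or the parent is modified.
mmclone copy base.parent vm01.img
mmclone copy base.parent vm02.img
# Display clone status for the specified files.
mmclone show base.parent vm01.img vm02.img
# Split a clone from its parent so the parent can eventually be deleted.
mmclone split vm01.img
```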
Question : You are working with an email marketing company that already has terabytes of email data and expects gigabytes more to be added every day. A typical query can involve pulling in 20 GB of data. Querying this data has always been an issue, so a proper solution is needed. Which of the following can solve this requirement? 1. Set up a Hadoop system
2. Utilize de-duplication and compression technology
4. Create range partitions and proper indexing of data and store in DB2
Correct Answer : 1. Set up a Hadoop system. Explanation: Data that also contains metadata (data about data) is generally classified as structured or semi-structured. Relational databases, which have table schemas, XML files, which contain tags, and simple tables with columns are all examples of structured data.
Now consider data like blog content, comments, email messages, text documents (say, the legal policies of a company), audio files, video files, or images, which together constitute about 80 to 90% of all data available for analysis. These forms of data do not follow any specific structure, nor do they contain information about their own content. They are all classified as unstructured data.
Hadoop is well suited to analyzing unstructured data:
Hadoop provides a distributed storage and distributed processing framework, which is essential for unstructured data analysis owing to its size and complexity.
Hadoop is designed to support Big Data: data that is too big for traditional database technologies to accommodate. Unstructured data is big, really big, in most cases.
Data in HDFS is stored as files. Hadoop does not enforce a schema or structure on the data being stored.
Hadoop also has tools such as Sqoop, Hive, and HBase to import data from, and export data to, other popular traditional and non-traditional databases. This allows Hadoop to be used to add structure to unstructured data and then export the semi-structured or structured result into traditional databases for further analysis.
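As a hedged illustration of that import/export round trip, a Sqoop session might look like the following. The connection URL, credentials, table names, and HDFS paths are all hypothetical:

```
# Pull a DB2 table into HDFS for Hadoop-side processing.
sqoop import --connect jdbc:db2://dbhost:50000/SALESDB \
    --username dbuser --password-file /user/dbuser/.pw \
    --table EMAIL_EVENTS --target-dir /data/email_events
# After processing, push the structured results back into DB2.
sqoop export --connect jdbc:db2://dbhost:50000/SALESDB \
    --username dbuser --password-file /user/dbuser/.pw \
    --table EMAIL_SUMMARY --export-dir /data/email_summary
```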
Hadoop is a powerful framework for writing customized code. Analyzing unstructured data typically involves complex algorithms. Programmers can implement algorithms of any complexity while exploiting the Hadoop framework for efficiency and reliability. This gives users the flexibility to understand the data at a raw level and to program whatever algorithm is appropriate.
Because Hadoop is an open-source project, numerous applications for video/audio processing, image analysis, and text analytics have been developed around it in the market; Pivotal and Pythian, to mention a few vendors.