Question : You are working with a data scientist who has to provide deep analytics from existing historical data so that incoming new data can be scored against it. You also have to provide a solution for new data that is received in real time with low latency. Which combination of solutions will work for both of you?
1. SPSS Modeler with InfoSphere Streams
2. InfoSphere DataStage with InfoSphere Streams
3. Hive with MapReduce
4. Pig with YARN
5. BigInsights Platform
Correct Answer : 1 Explanation: IBM SPSS Modeler is a data mining and text analytics software application from IBM. It is used to build predictive models and conduct other analytic tasks. It has a visual interface which allows users to leverage statistical and data mining algorithms without programming.
IBM InfoSphere DataStage is an ETL tool and part of the IBM Information Platforms Solutions suite and IBM InfoSphere. It uses a graphical notation to construct data integration solutions and is available in various versions such as the Server Edition, the Enterprise Edition, and the MVS Edition.
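InfoSphere Streams is a high-performance computing platform for big data processing that allows users to develop and reuse applications to rapidly ingest, analyze, and correlate information as it arrives from thousands of real-time sources. Together the two products cover both needs: SPSS Modeler builds the predictive model from historical data, and InfoSphere Streams applies it to new data as it arrives with low latency.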
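The split between offline model building and low-latency scoring can be sketched in plain Python. This is only a conceptual illustration and does not use the SPSS Modeler or InfoSphere Streams APIs: the `Model` type, the `train` and `score_stream` functions, and the z-score "model" are all hypothetical stand-ins chosen to keep the example small.

```python
import statistics
from typing import Iterable, Iterator, NamedTuple, Tuple


class Model(NamedTuple):
    """Hypothetical model: just the mean and spread of the history."""
    mean: float
    stdev: float


def train(history: Iterable[float]) -> Model:
    """Batch step: fit the model once on historical data."""
    values = list(history)
    return Model(statistics.mean(values), statistics.pstdev(values))


def score_stream(model: Model, stream: Iterable[float]) -> Iterator[Tuple[float, float]]:
    """Streaming step: score each record as it arrives, one at a time."""
    for value in stream:
        # z-score: how far the new value sits from the historical mean
        z = (value - model.mean) / model.stdev if model.stdev else 0.0
        yield value, z


# Historical data is analyzed once, offline.
history = [10.0, 12.0, 11.0, 13.0, 14.0]
model = train(history)

# New records are scored lazily as the iterator yields them.
scores = list(score_stream(model, iter([12.0, 20.0])))
```

The point of the design is that the expensive analysis happens once in `train`, while `score_stream` does only constant work per incoming record.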
Question : Big R utilizes the Big SQL query engine for processing. 1. True 2. False
Correct Answer : 1 Explanation: To use Big R, one needs to take the following steps: (1) Install the "bigr" package on the client. Big R requires several prerequisite packages including rJava, data.table, and base64enc. (2) To enable function pushdown via the "Apply" functions, each node of a BigInsights cluster needs to have the R interpreter installed on it. In addition, each node also needs the same "bigr" package installed. (3) Ensure that the Big SQL server is running on the BigInsights cluster. As Big R statements are executed, they are transparently converted into corresponding SQL and JaQL statements, and these are executed by the Big SQL server.
Question : One goal of Big R is to enable the use of R as a query language for big data: Big R hides many of the complexities pertaining to the underlying Hadoop / MapReduce framework. Using classes such as bigr.frame, bigr.vector and bigr.list, a user is presented with an API that is heavily inspired by R's foundational API on data.frames, vectors and frames.
1. True 2. False
Correct Answer : 1 Explanation: Using Big R, an R user can explore, transform, and analyze big data hosted in a BigInsights cluster using familiar R syntax and paradigms. All of these capabilities are accessible from a standard R client. The goals of Big R are two-fold: (1) Enable the use of R as a query language for big data: Big R hides many of the complexities pertaining to the underlying Hadoop / MapReduce framework. Using classes such as bigr.frame, bigr.vector and bigr.list, a user is presented with an API that is heavily inspired by R's foundational API on data.frames, vectors and frames. (2) Enable the pushdown of R functions such that they run right on the data: Via mechanisms such as groupApply, rowApply and tableApply, user-written functions composed in R can be shipped to the cluster. BigInsights transparently parallelizes execution of these functions and provides consolidated results back to the user. Almost any R code, including most packages available on open-source repositories such as CRAN, can be run using this mechanism.
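The pushdown idea behind groupApply (split the data by a key, run a user-written function on each group in parallel, return consolidated results) can be sketched as follows. This is a Python analogy, not Big R's actual API: `group_apply`, `mean_delay`, and the sample `flights` rows are hypothetical, and a local thread pool stands in for the cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Row = Tuple[str, float]


def group_apply(
    rows: Iterable[Row],
    key: Callable[[Row], Hashable],
    func: Callable[[List[Row]], float],
    workers: int = 4,
) -> Dict[Hashable, float]:
    """Split rows by key, run func on each group in parallel, merge results."""
    groups: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    keys = list(groups)
    # Each group is handed to a worker; pool.map keeps results in key order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(func, (groups[k] for k in keys)))
    return dict(zip(keys, results))


def mean_delay(rows: List[Row]) -> float:
    """User-written function that is 'shipped' to each group."""
    return sum(delay for _, delay in rows) / len(rows)


# Hypothetical sample data: (carrier, delay in minutes).
flights = [("AA", 10.0), ("UA", 30.0), ("AA", 20.0), ("UA", 50.0)]
result = group_apply(flights, key=lambda r: r[0], func=mean_delay)
```

The caller only writes `mean_delay`; the partitioning, parallel execution, and merging of results are handled by `group_apply`, which mirrors the division of labor the explanation describes.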
1. You must modify the configuration files on your NameNode where the master configuration files reside for all DataNodes. 2. You must restart all 100 DataNode daemons to apply the changes.
Question : What describes the relationship between MapReduce and Hive? 1. Hive provides additional capabilities that allow certain types of data manipulation not possible with MapReduce. 2. Hive programs rely on MapReduce but are extensible, allowing developers to do special-purpose processing not provided by MapReduce. 4. Hive provides no additional capabilities to MapReduce. Hive programs are executed as MapReduce jobs via the Hive interpreter.
Question : What is Hive? 1. Hive is part of the Apache Hadoop project that enables in-memory analysis of real-time streams of data 2. Hive is a way to add data from the local file system to HDFS 4. Hive is a part of the Apache Hadoop project that provides a SQL-like interface for data processing
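To make the link between a SQL-like query and MapReduce concrete, the sketch below shows how a query such as SELECT dept, COUNT(*) FROM employees GROUP BY dept could be expressed as map, shuffle, and reduce phases. It is a simplified single-process illustration, not Hive's actual query compiler; the `employees` table, its rows, and the function names are assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: Tuple[str, str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (dept, 1) for every input row."""
    dept, _name = record
    yield dept, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: collect all values that share the same key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: COUNT(*) for one key is the sum of the emitted ones."""
    return key, sum(values)


# Hypothetical rows of an employees(dept, name) table.
rows = [("sales", "ann"), ("ops", "bob"), ("sales", "cy")]

mapped = chain.from_iterable(map_phase(r) for r in rows)
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
```

The GROUP BY column becomes the mapper's output key, and the aggregate function becomes the reducer.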