Question : Your colleague, who is new to Hadoop, approaches you with a question. They want to know how best to access their data. This colleague has a strong background in data flow languages and programming. Which query interface would you recommend?
Correct Answer : Apache Pig

Exp: Apache Pig consists of a data flow language, Pig Latin, and an environment in which to execute Pig code. The main benefit of using Pig is that it harnesses the power of MapReduce in a distributed system while simplifying the development and execution of MapReduce jobs. In most cases it is transparent to the user that a MapReduce job is running in the background when Pig commands are executed. This abstraction layer on top of Hadoop simplifies the development of code against data in HDFS and makes MapReduce accessible to a larger audience.

With Apache Hadoop and Pig installed, the basics of using Pig are to enter the Pig execution environment by typing pig at the command prompt and then to enter a sequence of Pig instruction lines at the grunt prompt. These instructions are translated, behind the scenes, into one or more MapReduce jobs, so Pig simplifies the coding of a MapReduce job and enables the user to quickly develop, test, and debug Pig code. A MapReduce job is initiated only after a STORE command is processed; up to that point, Pig builds an execution plan but does not yet initiate MapReduce processing.

Pig provides several common data manipulations, such as inner and outer joins between two or more files (tables), as would be expected in a typical relational database. Writing these joins explicitly in MapReduce on Hadoop would be quite involved and complex. Pig also provides GROUP BY functionality similar to the GROUP BY offered in SQL.
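As a minimal sketch of such a grunt session (the file names, fields, and aliases below are hypothetical, chosen only to illustrate a join, a GROUP BY, and the point that MapReduce starts at STORE):

grunt> users  = LOAD 'users.txt'  USING PigStorage(',') AS (id:int, name:chararray);
grunt> orders = LOAD 'orders.txt' USING PigStorage(',') AS (uid:int, amount:double);
grunt> joined = JOIN users BY id, orders BY uid;   -- inner join between two files
grunt> byname = GROUP joined BY users::name;       -- SQL-like GROUP BY
grunt> totals = FOREACH byname GENERATE group, SUM(joined.orders::amount);
grunt> STORE totals INTO 'totals_out';             -- only now does a MapReduce job launch

Everything up to the STORE line only extends the execution plan; the STORE (or a DUMP) triggers the actual MapReduce processing.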
Question : The web analytics team uses Hadoop to process access logs. They now want to correlate this data with structured user data residing in a production single-instance JDBC database. They collaborate with the production team to import the data into Hadoop. Which tool should they use?
1. Chukwa
2. Sqoop
3. …
4. Flume
Correct Answer : 2 (Sqoop)

Exp: Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.
Sqoop automates most of this process, relying on the database to describe the schema for the data to be imported. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.
The Sqoop user guide describes how to get started moving data between databases and Hadoop and provides reference information for the Sqoop command-line tool suite. It is intended for system and application programmers, system administrators, database administrators, data analysts, and data engineers.
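For illustration, an import of the user table into HDFS might look like the following (the host, database, table, and directory names here are hypothetical; Sqoop reads the table's schema from the database and runs the transfer as parallel map tasks):

sqoop import \
  --connect jdbc:mysql://dbhost/webapp \
  --username analytics -P \
  --table users \
  --target-dir /data/users \
  --num-mappers 4

The --num-mappers flag controls how many parallel map tasks carry out the transfer, which is where the parallelism and fault tolerance mentioned above come from.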
Question : What does the R code z <- f[1:10, ] do?
Correct Answer : It assigns to z the first 10 rows of f, keeping all columns.

Exp: R operates on named data structures. The simplest such structure is the numeric vector, which is a single entity consisting of an ordered collection of numbers. To set up a vector named x, say, consisting of five numbers, namely 10.4, 5.6, 3.1, 6.4 and 21.7, use the R command
> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
This is an assignment statement using the function c(), which in this context can take an arbitrary number of vector arguments and whose value is a vector obtained by concatenating its arguments end to end.
A number occurring by itself in an expression is taken as a vector of length one.
Notice that the assignment operator '<-' consists of the two characters '<' ("less than") and '-' ("minus") occurring strictly side by side, and it 'points' to the object receiving the value of the expression. In most contexts the '=' operator can be used as an alternative.
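For example, the same assignment written with '=' is equivalent here:

> x = c(10.4, 5.6, 3.1, 6.4, 21.7)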
Assignment can also be made using the function assign(). An equivalent way of making the same assignment as above is with:
> assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))
The usual operator, <-, can be thought of as a syntactic short-cut to this.
Assignments can also be made in the other direction, using the obvious change in the assignment operator. So the same assignment could be made using
> c(10.4, 5.6, 3.1, 6.4, 21.7) -> x
If an expression is used as a complete command, the value is printed and lost. So now if we were to use the command
> 1/x
the reciprocals of the five values would be printed at the terminal (and the value of x, of course, unchanged).
The further assignment
> y <- c(x, 0, x)
would create a vector y with 11 entries consisting of two copies of x with a zero in the middle place.
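Returning to the question itself: for a two-dimensional object such as a matrix or data frame, the index form f[1:10, ] selects rows 1 through 10, and the empty field after the comma means "all columns". A short sketch (the data frame f here is hypothetical, built only to demonstrate the subsetting):

> f <- data.frame(id = 1:25, score = rnorm(25))
> z <- f[1:10, ]   # rows 1 to 10, every column
> dim(z)
[1] 10  2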