Question : Your colleague, who is new to Hadoop, approaches you with a question. They want to know how best to access their data. This colleague has a strong background in data flow languages and programming. Which query interface would you recommend?
Correct Answer : Apache Pig

Exp: Apache Pig consists of a data flow language, Pig Latin, and an environment in which to execute Pig code. The main benefit of using Pig is that it harnesses the power of MapReduce in a distributed system while simplifying the development and execution of MapReduce jobs. In most cases it is transparent to the user that a MapReduce job is running in the background when Pig commands are executed. This abstraction layer on top of Hadoop simplifies the development of code against data in HDFS and makes MapReduce accessible to a larger audience.

With Apache Hadoop and Pig installed, the basics of using Pig are to enter the Pig execution environment by typing pig at the command prompt and then to enter a sequence of Pig instruction lines at the grunt prompt. These instructions are translated, behind the scenes, into one or more MapReduce jobs, so Pig simplifies the coding of a MapReduce job and enables the user to quickly develop, test, and debug Pig code. A MapReduce job is initiated only after a STORE command is processed; up to that point, Pig builds an execution plan but does not yet initiate MapReduce processing.

Pig provides several common data manipulations, such as inner and outer joins between two or more files (tables), as would be expected in a typical relational database. Writing these joins explicitly in MapReduce on Hadoop would be quite involved and complex. Pig also provides GROUP BY functionality similar to the GROUP BY offered in SQL.
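As a minimal sketch of such a grunt session (the file names, fields, and aliases below are hypothetical, chosen only to illustrate a join, a GROUP BY, and the point that MapReduce starts at STORE):

grunt> users  = LOAD 'users.txt'  USING PigStorage(',') AS (id:int, name:chararray);
grunt> orders = LOAD 'orders.txt' USING PigStorage(',') AS (uid:int, amount:double);
grunt> joined = JOIN users BY id, orders BY uid;   -- inner join between two files
grunt> byname = GROUP joined BY users::name;       -- SQL-like GROUP BY
grunt> totals = FOREACH byname GENERATE group, SUM(joined.orders::amount);
grunt> STORE totals INTO 'totals_out';             -- only now does a MapReduce job launch

Everything up to the STORE line only extends the execution plan; the STORE (or a DUMP) triggers the actual MapReduce processing.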
Question : The web analytics team uses Hadoop to process access logs. They now want to correlate this data with structured user data residing in a production single-instance JDBC database. They collaborate with the production team to import the data into Hadoop. Which tool should they use?
1. Chukwa
2. Sqoop
3. …
4. Flume
Correct Answer : 2 (Sqoop)

Exp: Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.
Sqoop automates most of this process, relying on the database to describe the schema for the data to be imported. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.
The Sqoop user guide describes how to get started moving data between databases and Hadoop and provides reference information for the Sqoop command-line tool suite. It is intended for system and application programmers, system administrators, database administrators, data analysts, and data engineers.
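For illustration, an import of the user table into HDFS might look like the following (the host, database, table, and directory names here are hypothetical; Sqoop reads the table's schema from the database and runs the transfer as parallel map tasks):

sqoop import \
  --connect jdbc:mysql://dbhost/webapp \
  --username analytics -P \
  --table users \
  --target-dir /data/users \
  --num-mappers 4

The --num-mappers flag controls how many parallel map tasks carry out the transfer, which is where the parallelism and fault tolerance mentioned above come from.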
Question : What does the R code z <- f[1:10, ] do?
Correct Answer : It assigns to z the first 10 rows of f, keeping all columns.

Exp: R operates on named data structures. The simplest such structure is the numeric vector, which is a single entity consisting of an ordered collection of numbers. To set up a vector named x, say, consisting of five numbers, namely 10.4, 5.6, 3.1, 6.4 and 21.7, use the R command
> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
This is an assignment statement using the function c(), which in this context can take an arbitrary number of vector arguments and whose value is a vector obtained by concatenating its arguments end to end.
A number occurring by itself in an expression is taken as a vector of length one.
Notice that the assignment operator '<-' consists of the two characters '<' ("less than") and '-' ("minus") occurring strictly side by side, and it 'points' to the object receiving the value of the expression. In most contexts the '=' operator can be used as an alternative.
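For example, the same assignment written with '=' is equivalent here:

> x = c(10.4, 5.6, 3.1, 6.4, 21.7)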
Assignment can also be made using the function assign(). An equivalent way of making the same assignment as above is with:
> assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))
The usual operator, <-, can be thought of as a syntactic short-cut to this.
Assignments can also be made in the other direction, using the obvious change in the assignment operator. So the same assignment could be made using
> c(10.4, 5.6, 3.1, 6.4, 21.7) -> x
If an expression is used as a complete command, the value is printed and lost. So now if we were to use the command
> 1/x
the reciprocals of the five values would be printed at the terminal (and the value of x, of course, unchanged).
The further assignment
> y <- c(x, 0, x)
would create a vector y with 11 entries consisting of two copies of x with a zero in the middle place.
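Returning to the question itself: for a two-dimensional object such as a matrix or data frame, the index form f[1:10, ] selects rows 1 through 10, and the empty field after the comma means "all columns". A short sketch (the data frame f here is hypothetical, built only to demonstrate the subsetting):

> f <- data.frame(id = 1:25, score = rnorm(25))
> z <- f[1:10, ]   # rows 1 to 10, every column
> dim(z)
[1] 10  2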