Explanation: Apache Pig consists of a data flow language, Pig Latin, and an environment to execute the Pig code. The main benefit of using Pig is to utilize the power of MapReduce in a distributed system while simplifying the tasks of developing and executing MapReduce jobs. In most cases the user does not even need to be aware that a MapReduce job is running in the background when Pig commands are executed. This abstraction layer on top of Hadoop simplifies the development of code against data in HDFS and makes MapReduce accessible to a larger audience. With Apache Hadoop and Pig already installed, the basics of using Pig consist of entering the Pig execution environment by typing pig at the command prompt and then entering a sequence of Pig instruction lines at the grunt prompt.
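As a small, hypothetical illustration of such a grunt session (the HDFS path, delimiter, and field names below are invented for the example and are not part of the question), a few Pig Latin statements might look like this:

$ pig
grunt> -- load tab-delimited records from HDFS into a relation
grunt> logs = LOAD '/data/weblogs' USING PigStorage('\t') AS (user:chararray, bytes:long);
grunt> by_user = GROUP logs BY user;
grunt> totals = FOREACH by_user GENERATE group AS user, SUM(logs.bytes) AS total_bytes;
grunt> -- DUMP triggers the underlying MapReduce job(s) and prints the result
grunt> DUMP totals;

Each statement defines a relation; Pig only compiles and runs the corresponding MapReduce jobs when an output operation such as DUMP or STORE is reached.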
Question: A call center for a large electronics company handles an average of , support calls a day. The head of the call center would like to optimize the staffing of the call center during the rollout of a new product, due to recent customer complaints about long wait times. You have been asked to create a model to optimize call center costs and customer wait times. The goals for this project include:
1. Relative to the release of a product, how does the call volume change over time?
2. How to best optimize staffing based on the call volume for the newly released product, relative to old products.
3.
4. Determine the frequency of calls by both product type and customer language.
Which goals are suitable to be completed with MapReduce?
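Goal 4, counting calls by product type and customer language, is a classic aggregation that maps naturally onto MapReduce. Purely as an illustrative sketch (the input path, delimiter, and field names are assumptions, not given in the question), the count could be expressed in Pig Latin, which Pig translates into MapReduce jobs:

grunt> calls = LOAD '/data/call_logs' USING PigStorage(',') AS (call_id:chararray, product:chararray, language:chararray, wait_secs:int);
grunt> -- group on the (product, language) pair and count the calls in each group
grunt> by_key = GROUP calls BY (product, language);
grunt> freq = FOREACH by_key GENERATE FLATTEN(group) AS (product, language), COUNT(calls) AS num_calls;
grunt> DUMP freq;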
Question: Consider the example of an analysis for fraud detection on credit card usage. You need to ensure that higher-risk transactions, which may indicate fraudulent credit card activity, are retained in your data for analysis and not dropped as outliers during pre-processing. What should your approach be for loading data into the analytic sandbox for this analysis?
Explanation: Phase 2, Data Preparation: Phase 2 requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project. The team needs to execute extract, load, and transform (ELT) or extract, transform, and load (ETL) processes to get data into the sandbox; ELT and ETL are sometimes referred to together as ETLT. Data should be transformed in the ETLT process so the team can work with it and analyze it. In this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition the data.
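For the fraud-detection case above, an ELT-style load is one way to keep the high-risk transactions intact: all raw records are copied into the sandbox first, and transformations (including any outlier handling) are applied afterwards, so pre-processing never discards the very transactions the analysis needs. As a rough sketch only, with invented paths and field names, such a load in Pig Latin might look like this:

grunt> -- ELT-style load: copy the raw transactions into the sandbox untouched,
grunt> -- so no outlier filtering happens before the team can analyze them
grunt> raw_txns = LOAD '/warehouse/card_transactions' USING PigStorage(',') AS (txn_id:chararray, account_id:chararray, amount:double, merchant:chararray, ts:chararray);
grunt> STORE raw_txns INTO '/sandbox/fraud_project/transactions_raw' USING PigStorage(',');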
1. Computationally inexpensive, easy to implement, knowledge representation easy to interpret
2. May have low accuracy
3.
4. Only 1 and 3 are correct
5. All of 1, 2, and 3 are correct