Dell EMC Data Science and BigData Certification Questions and Answers

Question : Which of the following statements are correct with regard to the factor data type in R?
A. Factors can be used to represent categorical data.
B. Factors can be ordered or unordered.
C. Factors are integers.
D. Factors can hold any new, undefined value.
E. Factors are characters.

1. A,B,C
2. B,C,D
3. C,D,E
4. A,D,E
5. A,C,E

Correct Answer : 1
Explanation: R has a special data type called a factor. Factors represent categorical data and can be ordered or unordered. This data type is important for statistical analysis and plotting.
Factors are stored internally as integers with labels associated with them. They look like characters but are integers under the hood, so be careful when treating them as strings.
A factor can hold only a predefined set of values, known as levels. By default, R sorts the levels alphabetically. One example is sex, which has two possible values: female and male.
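
The storage scheme described above (sorted levels plus integer codes) can be sketched in a few lines. This is plain Python that mimics the idea, not R's actual implementation; `make_factor` is a hypothetical helper name, and the 1-based codes are chosen to match R's convention.

```python
# Sketch of how a factor stores data: sorted levels plus integer codes.
# This mimics the idea in plain Python; it is not R's implementation.

def make_factor(values):
    """Return (levels, codes): levels sorted alphabetically, codes 1-based."""
    levels = sorted(set(values))
    index = {label: i + 1 for i, label in enumerate(levels)}
    codes = [index[v] for v in values]
    return levels, codes

sex = ["male", "female", "female", "male"]
levels, codes = make_factor(sex)

print(levels)  # ['female', 'male']
print(codes)   # [2, 1, 1, 2]

# Decoding maps each integer code back to its label.
decoded = [levels[c - 1] for c in codes]
print(decoded == sex)  # True
```

The codes are what is actually stored, which is why treating a factor as a plain string (or as a plain number) without going through its levels can give surprising results.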

Question : You are working as a data scientist for a company that sells car tyres in a country. Initially you were given a data set with almost , rows. To apply your analytics you also need location
information, and you are provided with 25,000 records of location information containing 150 unique cities. Which of the following R data structures is the best fit for
this column?

1. List
2. Array
3. Vector
4. Factor

Correct Answer : 4
Explanation: The location column holds 25,000 records but only 150 unique cities, so it is categorical data with a fixed set of values. A factor is the best fit: each city becomes a level, and each record stores
an integer code that refers to its level. Factors can be ordered or unordered, and R sorts the levels alphabetically by default. Because factors look like characters but are integers under the hood, be careful when treating them as strings.
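
To make the numbers in this question concrete, here is a minimal sketch in plain Python. The data is synthetic: the city names `city_000` to `city_149` are hypothetical stand-ins, since the real records are not available, and the encoding only imitates how a factor would store the column.

```python
import random

# Synthetic stand-in for the location column: 25,000 records drawn from
# 150 hypothetical city names (the real data is not available here).
random.seed(0)
cities = [f"city_{i:03d}" for i in range(150)]
# Start with one record per city so all 150 appear, then fill the rest at random.
records = cities + [random.choice(cities) for _ in range(25000 - 150)]

# Factor-style encoding: one sorted list of levels, one integer code per record.
levels = sorted(set(records))
code_of = {city: i + 1 for i, city in enumerate(levels)}
codes = [code_of[city] for city in records]

print(len(records))            # 25000
print(len(levels))             # 150
print(min(codes), max(codes))  # 1 150
```

Each of the 25,000 records is reduced to a small integer, and the 150 labels are stored only once.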

Question : Which of the following are examples of qualitative data?
A. Labels
B. Softness of a cloth
C. Interval
D. Ratio

1. A,B
2. B,C
3. C,D
4. A,D
5. B,D

Correct Answer : 1
Explanation: Qualitative data is information about qualities, that is, information that cannot actually be measured. Some examples of qualitative data are the softness of your skin, the grace with which you run,
and the color of your eyes. However, try telling Photoshop you can't measure color with numbers.

Here's a quick look at the difference between qualitative and quantitative data.

- The age of your car. (Quantitative.)
- The number of hairs on your knuckle. (Quantitative.)
- The softness of a cat. (Qualitative.)
- The color of the sky. (Qualitative.)
- The number of pennies in your pocket. (Quantitative.)

Related Questions

Question : In the MapReduce framework, what is the purpose of the Map Function?

1. It processes the input and generates key-value pairs
2. It collects the output of the Reduce function
4. It breaks the input into smaller components and distributes to other nodes in the cluster
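
For reference, in the MapReduce model a map function reads input records and emits intermediate key-value pairs, which the framework then groups by key for the reduce step. A minimal word-count sketch in plain Python follows; it is not Hadoop API code, and `map_function` is a hypothetical name used only for illustration.

```python
def map_function(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield (word, 1)

lines = ["the quick brown fox", "the lazy dog"]
pairs = [pair for line in lines for pair in map_function(line)]
print(pairs)
# [('the', 1), ('quick', 1), ('brown', 1), ('fox', 1), ('the', 1), ('lazy', 1), ('dog', 1)]
```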

Question : While having a discussion with your colleague, this person mentions that they want to perform k-means
clustering on text file data stored in HDFS.
Which tool would you recommend to this colleague?

1. Sqoop
2. Scribe
4. Mahout

Question : What describes a true limitation of the logistic regression method?

1. It does not handle redundant variables well.
2. It does not handle missing values well.
4. It does not have explanatory values.

Question : You have completed your model and are handing it off to be deployed in production. What should
you deliver to the production team, along with your commented code?

1. The production team needs to understand how your model will interact with the processes they
already support. Give them documentation on expected model inputs and outputs, and guidance
on error-handling.
2. The production team are technical, and they need to understand how the processes that they
support work, so give them the same presentation that you prepared for the analysts.
3. The production team supports the processes that run the organization, and they need context
to understand how your model interacts with the processes they already support. Give them the
same presentation that you prepared for the project sponsor.
4. The production team supports the processes that run the organization, and they need context
to understand how your model interacts with the processes they already support. Give them the
executive summary.

Question : Which method is used to solve for coefficients b0, b1, ..., bn in your linear regression model:
Y = b0 + b1x1 + b2x2 + ... + bnxn

1. Apriori Algorithm
2. Ridge and Lasso
4. Integer programming
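
For a model of this form, the coefficients are most commonly estimated by ordinary least squares, which picks the values that minimize the sum of squared residuals. The sketch below covers only the single-predictor case (Y = b0 + b1x1) using the closed-form solution; `fit_simple_ols` is a hypothetical helper name, and the data points are made up so that they lie exactly on a line.

```python
def fit_simple_ols(xs, ys):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Made-up points that lie exactly on y = 2 + 3x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0, 5.0, 8.0, 11.0]
b0, b1 = fit_simple_ols(xs, ys)
print(b0, b1)  # 2.0 3.0
```

With more than one predictor the same principle applies, but the estimates are usually obtained by solving the normal equations or with a numerical library rather than by hand.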

Question : You submit a MapReduce job to a Hadoop cluster and notice that although the job was
successfully submitted, it is not completing. What should you do?

1. Ensure that the NameNode is running
2. Ensure that the JobTracker is running
4. Ensure that a DataNode is running