Dell EMC Data Science and BigData Certification Questions and Answers

Question : What does the R code nv <- v[v < 1000] do?

1. Selects the values in vector v that are less than 1000 and assigns them to the vector nv
2. Sets nv to TRUE or FALSE depending on whether all elements of vector v are less than 1000
4. Selects values of vector v less than 1000, modifies v, and makes a copy to nv

Explanation: R operates on named data structures. The simplest such structure is the numeric vector, which is a single entity consisting of an ordered collection of numbers. To set up a vector named x,
say, consisting of five numbers, namely 10.4, 5.6, 3.1, 6.4 and 21.7, use the R command

> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
This is an assignment statement using the function c(), which in this context can take an arbitrary number of vector arguments and whose value is a vector obtained by concatenating its arguments end to end.

A number occurring by itself in an expression is taken as a vector of length one.

Notice that the assignment operator ('<-') consists of the two characters '<' ("less than") and '-' ("minus") occurring strictly side-by-side, and that it 'points' to the object receiving the value of
the expression. In most contexts the '=' operator can be used as an alternative.

Assignment can also be made using the function assign(). An equivalent way of making the same assignment as above is with:

> assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))
The usual operator, <-, can be thought of as a syntactic short-cut to this.

Assignments can also be made in the other direction, using the obvious change in the assignment operator. So the same assignment could be made using

> c(10.4, 5.6, 3.1, 6.4, 21.7) -> x
If an expression is used as a complete command, the value is printed and lost. So now if we were to use the command

> 1/x
the reciprocals of the five values would be printed at the terminal (and the value of x, of course, unchanged).

The further assignment

> y <- c(x, 0, x)
would create a vector y with 11 entries consisting of two copies of x with a zero in the middle place.
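
As a minimal sketch of the indexing expression the question asks about, the R snippet below uses a made-up vector v (its values are assumptions chosen for illustration, not part of the question). The comparison v < 1000 produces a logical vector, and indexing v with it selects the matching elements while leaving v itself untouched. The last lines rebuild the vector y from the explanation above.

```r
# Illustrative values only; the question does not specify the contents of v.
v <- c(250, 1500, 999, 1000, 42)

# v < 1000 is evaluated element by element and gives a logical vector.
mask <- v < 1000          # TRUE FALSE TRUE FALSE TRUE

# Indexing with a logical vector keeps the elements where it is TRUE.
nv <- v[mask]             # equivalent to nv <- v[v < 1000]

print(nv)                 # the elements 250, 999 and 42
print(v)                  # v still holds all five original values

# The vector y from the explanation: two copies of x around a zero.
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
y <- c(x, 0, x)
print(length(y))          # 11 entries
```

Note that 1000 itself is excluded, because the comparison is strictly "less than".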

Question : For which class of problem is MapReduce most suitable?

1. Minimal result data
2. Simple marginalization tasks
4. Non-overlapping queries

Explanation: MapReduce suits problems that are huge but not hard. The travelling salesman problem depends crucially on the distance between any given pair of cities, so while it can be broken down into many parts, the partial results
cannot be recombined so that the globally optimal solution emerges (well, probably not; if you know a way, please apply for your Fields Medal now).

On the other hand, counting word frequencies in a gigantic corpus is trivially partitionable and trivially recombinable (you just add up the vectors computed for the segments of the corpus), so MapReduce is the
obvious solution.

In practice, more problems tend to be easily recombinable than not, so the decision whether to parallelize a task or not has more to do with how huge the task is, and less with how hard it is.
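
The word-count case can be sketched in plain R to show why it partitions and recombines so easily. This is a single-machine illustration of the map and reduce steps, not a Hadoop job; the tiny corpus, the two-way split, and the helper names map_count and merge_counts are all assumptions made for the example.

```r
# A tiny made-up corpus, split into two segments as a cluster might do.
corpus <- c("the cat sat", "the dog sat", "the cat ran", "a dog ran")
segments <- split(corpus, c(1, 1, 2, 2))

# Map step: count the words in one segment independently of the others.
map_count <- function(lines) {
  words <- unlist(strsplit(lines, " ", fixed = TRUE))
  table(words)
}

# Reduce step: add two partial count vectors, matching entries by word.
merge_counts <- function(a, b) {
  keys <- union(names(a), names(b))
  total <- setNames(numeric(length(keys)), keys)
  total[names(a)] <- total[names(a)] + as.numeric(a)
  total[names(b)] <- total[names(b)] + as.numeric(b)
  total
}

partial <- lapply(segments, map_count)   # one count table per segment
counts <- Reduce(merge_counts, partial)  # combined word frequencies
print(counts)
```

Because addition is associative and commutative, the partial counts can be merged in any order, which is what makes the recombination trivial.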

Question : Which activity is performed in the Operationalize phase of the Data Analytics Lifecycle?

1. Define the process to maintain the model
2. Try different analytical techniques
4. Transform existing variables

Explanation: Operationalize
In the final phase, the team communicates the benefits of the project more broadly and
sets up a pilot project to deploy the work in a controlled way before broadening the work
to a full enterprise or ecosystem of users. In Phase 4, the team scored the model in the
analytics sandbox.

Related Questions

Question : Refer to the exhibit.
An analyst is searching a corpus of documents
for the topic "solid state disk". In the exhibit, Table A provides the inverse document frequency for
each term across the corpus. Table B provides each term's frequency in four documents selected
from the corpus. Which of the four documents is most relevant to the analyst's search?

1. Document A
2. Document C
4. Document D
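
The exhibit's tables are not reproduced here, so the following is only a sketch of the usual scoring approach under assumed numbers: each document's relevance is taken as the sum, over the query terms, of term frequency times inverse document frequency. The IDF values, the term counts, and the document names doc1 to doc3 are hypothetical and do not correspond to the exam's Documents A to D.

```r
# Hypothetical inverse document frequencies for the three query terms.
idf <- c(solid = 0.5, state = 1.0, disk = 2.0)

# Hypothetical term frequencies: one row per document, one column per term.
tf <- rbind(
  doc1 = c(solid = 4, state = 1, disk = 0),
  doc2 = c(solid = 1, state = 2, disk = 3),
  doc3 = c(solid = 0, state = 5, disk = 1)
)

# TF-IDF score per document: sum over terms of tf * idf.
scores <- as.vector(tf %*% idf[colnames(tf)])
names(scores) <- rownames(tf)

print(scores)                     # doc1 = 3, doc2 = 8.5, doc3 = 7
best <- names(which.max(scores))
print(best)                       # "doc2" for these made-up numbers
```

With these numbers the rare term "disk" dominates the ranking, even though doc1 contains the most query-term occurrences overall.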

Question : Refer to the exhibit.
The exhibit provides a decision tree for predicting whether someone is a good or bad credit risk.
What would be the assigned probability, p(good), of a single male with no known savings?

1. 0.83
2. 0
4. 0.6

Question : Refer to the exhibit.
The exhibit shows four graphs labeled Fig A through Fig D. Which figure represents the
entropy function relative to a Boolean classification, as given by the formula shown in the
exhibit?

1. A
2. B
4. D
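
The exhibit's formula is not reproduced here; assuming it is the standard entropy of a Boolean classification, H(p) = -p log2(p) - (1 - p) log2(1 - p), where p is the proportion of positive examples, its shape can be checked with a short R sketch. The function name binary_entropy is an illustrative choice.

```r
# Entropy of a two-class (Boolean) split, with p the share of one class.
binary_entropy <- function(p) {
  # By the usual convention, 0 * log2(0) is treated as 0.
  term <- function(q) ifelse(q == 0, 0, -q * log2(q))
  term(p) + term(1 - p)
}

p <- c(0, 0.25, 0.5, 0.75, 1)
h <- binary_entropy(p)
print(round(h, 4))   # 0 at both ends, 1 at p = 0.5, symmetric in between
```

The curve is therefore zero at p = 0 and p = 1, symmetric about p = 0.5, and peaks at 1 bit when the two classes are equally likely.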

Question : Refer to the exhibit.
You ran a linear regression, and the final output is shown in the exhibit.
Based only on the information in the exhibit and an acceptable confidence level of 95%, how
would you interpret the interaction of variable D with the dependent variable?

1. In this model, Variable D is not significantly interacting with the dependent variable
2. For every 1 unit increase in variable D, holding all other variables constant, we can expect the
dependent variable to increase by 10.23 units
3. [...] dependent variable to be multiplied by 10.23 units
4. Variable D is more significant than variables A, B, and C.
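
The regression output from the exhibit is not available, so the sketch below uses R's built-in mtcars data set as a stand-in to show how such output is typically read: each coefficient's p-value is compared against 0.05 (the 95% confidence level), and only then is the estimate interpreted as the expected change in the dependent variable per unit increase of that predictor, holding the others constant. The choice of model (mpg on wt and hp) is an assumption made purely for illustration.

```r
# Stand-in model on a built-in data set; not the model from the exhibit.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- coef(summary(fit))
print(round(coefs, 4))

# A predictor is significant at the 95% level when its p-value is below 0.05.
alpha <- 0.05
significant <- coefs[, "Pr(>|t|)"] < alpha
print(significant)
```

A coefficient whose p-value exceeds 0.05 should not be interpreted as a reliable effect, however large the estimate looks.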

Question : Refer to the exhibit.
The graph represents an ROC space with four classifiers labelled A through D. Which point in the
graph represents a perfect classification?

1. Q
2. P
4. R
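
In ROC space the x-axis is the false positive rate and the y-axis is the true positive rate, so a perfect classifier sits at the point (0, 1). The following sketch computes that point from confusion-matrix counts; the counts and the helper name roc_point are hypothetical, since the exhibit's graph is not reproduced here.

```r
# Convert confusion-matrix counts into a point in ROC space.
roc_point <- function(tp, fp, tn, fn) {
  c(fpr = fp / (fp + tn),   # false positive rate (x-axis)
    tpr = tp / (tp + fn))   # true positive rate (y-axis)
}

# No false positives and no false negatives: the top-left corner.
perfect <- roc_point(tp = 50, fp = 0, tn = 50, fn = 0)
print(perfect)       # fpr = 0, tpr = 1

# A classifier no better than a coin flip lies on the diagonal.
coin_flip <- roc_point(tp = 25, fp = 25, tn = 25, fn = 25)
print(coin_flip)     # fpr = 0.5, tpr = 0.5
```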

Question : Refer to the exhibit.
Consider the training data set shown in the exhibit. What are the classification (Y = 0 or 1) and the
probability of the classification for the tuple
X(1, 0, 0)
using a Naive Bayes classifier?

1. Classification Y = 1, Probability = 4/54
2. Classification Y = 0, Probability = 4/54
4. Classification Y = 1, Probability = 1/54
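
The training data from the exhibit is not reproduced here, so the sketch below runs the Naive Bayes calculation on a small hypothetical data set instead; its numbers are assumptions and will not match the answer options. For each class, the score is the class prior multiplied by the conditional probability of each attribute value in the tuple, and the class with the larger score wins. The helper name nb_score is illustrative.

```r
# Hypothetical training set: three binary attributes and a binary class Y.
train <- data.frame(
  X1 = c(1, 1, 1, 0, 0, 0, 1, 0),
  X2 = c(0, 0, 1, 0, 1, 1, 0, 1),
  X3 = c(0, 1, 0, 1, 1, 0, 1, 0),
  Y  = c(1, 1, 1, 1, 0, 0, 0, 0)
)

# The tuple to classify.
query <- c(X1 = 1, X2 = 0, X3 = 0)

# Unnormalized Naive Bayes score: P(Y = y) * product of P(Xi = xi | Y = y).
nb_score <- function(y) {
  rows <- train[train$Y == y, ]
  prior <- nrow(rows) / nrow(train)
  likelihoods <- sapply(names(query),
                        function(f) mean(rows[[f]] == query[[f]]))
  prior * prod(likelihoods)
}

scores <- c("0" = nb_score(0), "1" = nb_score(1))
print(scores)                          # 1/64 for Y = 0, 9/64 for Y = 1
predicted <- names(which.max(scores))
print(predicted)                       # "1" for this made-up data
```

These scores are joint probabilities rather than normalized posteriors; dividing each by their sum would give values that add up to 1.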