Dell EMC Data Science Associate Certification Questions and Answers (Dumps and Practice Questions)



Question : What does the R code nv <- v[v < 1000] do?

1. Selects the values in vector v that are less than 1000 and assigns them to the vector nv
2. Sets nv to TRUE or FALSE depending on whether all elements of vector v are less than 1000
3. Removes elements of vector v less than 1000 and assigns the elements >= 1000 to nv
4. Selects values of vector v less than 1000, modifies v, and makes a copy to nv

Correct Answer : 1
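Explanation: The expression v < 1000 evaluates to a logical vector, and indexing v with that logical vector returns only the elements for which the condition is TRUE; v itself is not modified. As background, R operates on named data structures. The simplest such structure is the numeric vector, which is a single entity consisting of an ordered collection of numbers. To set up a vector named x, say, consisting of five numbers, namely 10.4, 5.6, 3.1, 6.4 and 21.7, use the R command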

> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
This is an assignment statement using the function c(), which in this context can take an arbitrary number of vector arguments and whose value is a vector obtained by concatenating its arguments end to end.

A number occurring by itself in an expression is taken as a vector of length one.

Notice that the assignment operator ('<-') consists of the two characters '<' ("less than") and '-' ("minus") occurring strictly side-by-side, and that it 'points' to the object receiving the value of the expression. In most contexts the '=' operator can be used as an alternative.

Assignment can also be made using the function assign(). An equivalent way of making the same assignment as above is with:

> assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))
The usual operator, <-, can be thought of as a syntactic short-cut to this.

Assignments can also be made in the other direction, using the obvious change in the assignment operator. So the same assignment could be made using

> c(10.4, 5.6, 3.1, 6.4, 21.7) -> x
If an expression is used as a complete command, the value is printed and lost. So now if we were to use the command

> 1/x
the reciprocals of the five values would be printed at the terminal (and the value of x would, of course, be unchanged).

The further assignment

> y <- c(x, 0, x)
would create a vector y with 11 entries consisting of two copies of x with a zero in the middle place.
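
To tie this back to the question, the following is a minimal sketch of logical subsetting. The vector v and its values are made up for illustration (they do not come from the exam), and the threshold 1000 is taken from the answer options.

```r
# Hypothetical example data; not part of the original question.
v <- c(250, 1000, 1500, 999, 42)

# v < 1000 yields a logical vector with one TRUE/FALSE per element of v.
mask <- v < 1000

# Indexing with a logical vector keeps only the elements where it is TRUE.
nv <- v[mask]   # equivalent to nv <- v[v < 1000]

print(mask)
print(nv)
print(v)        # v itself is left unchanged by the subsetting
```

Because the comparison is strict, an element equal to 1000 is not selected, and since nothing is assigned back to v, options 3 and 4 can be ruled out.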








Question : For which class of problem is MapReduce most suitable?

1. Minimal result data
2. Simple marginalization tasks
3. Embarrassingly parallel
4. Non-overlapping queries



Correct Answer : 3
Explanation: MapReduce suits problems that are huge but not hard, that is, work that splits into independent pieces whose partial results are easy to recombine. The travelling salesman problem depends crucially on the distance between any given pair of cities, so while it can be broken down into many parts, the partial results cannot be recombined so that the globally optimal solution emerges (well, probably not; if you know a way, please apply for your Fields Medal now).

On the other hand, counting frequencies of words in a gigantic corpus is trivially partitionable, and trivially recombinable (you just add up the vectors computed for the segments of the corpus), so map-reduce is the obvious solution.

In practice, more problems tend to be easily recombinable than not, so the decision whether to parallelize a task or not has more to do with how huge the task is, and less with how hard it is.
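
The word-count case described above can be sketched in a few lines of base R. This is only an illustration on a single machine, not a distributed MapReduce job: the corpus segments and the helper names count_words and merge_counts are hypothetical, and the "map" step is a plain lapply() that a real system would spread across workers.

```r
# Hypothetical corpus, already split into independent segments.
segments <- list(
  "the cat sat on the mat",
  "the dog sat",
  "a cat and a dog"
)

# Map step: count the words in one segment, independently of the others.
count_words <- function(segment) {
  words <- strsplit(tolower(segment), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]   # drop empty strings from leading separators
  table(words)
}

# Reduce step: merge two partial results by adding the counts per word.
merge_counts <- function(a, b) {
  all_words <- union(names(a), names(b))
  total <- setNames(numeric(length(all_words)), all_words)
  total[names(a)] <- total[names(a)] + as.numeric(a)
  total[names(b)] <- total[names(b)] + as.numeric(b)
  total
}

partial <- lapply(segments, count_words)      # each call is independent
word_counts <- Reduce(merge_counts, partial)  # recombine by simple addition
print(word_counts[order(names(word_counts))])
```

No call to count_words needs anything from another segment, and merge_counts gives the same totals whatever order the partial results arrive in. Those two properties are what make the task embarrassingly parallel.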







Question : Which activity is performed in the Operationalize phase of the Data Analytics Lifecycle?


1. Define the process to maintain the model
2. Try different analytical techniques
3. Try different variables
4. Transform existing variables



Correct Answer : 1

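Explanation: In the final phase, Operationalize, the team communicates the benefits of the project more broadly and sets up a pilot project to deploy the work in a controlled way before broadening it to a full enterprise or ecosystem of users. In Phase 4, the team scored the model in the analytics sandbox. Defining the process to maintain the model once it is deployed belongs to this final phase, whereas trying different techniques or variables and transforming existing variables belong to earlier phases.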




Related Questions


Question : You are analyzing a time series and want to determine its stationarity. You also want to determine
the order of autoregressive models.
How are the autocorrelation functions used?

1. PACF as an indication of stationarity, and ACF for the correlation between X_t and X_{t-k} not
explained by their mutual correlation with X_1 through X_{k-1}.
2. ACF as an indication of stationarity, and PACF to determine the correlation of X_1 through X_{k-1}.
4. ACF as an indication of stationarity, and PACF for the correlation between X_t and X_{t-k} not
explained by their mutual correlation with X_1 through X_{k-1}.



Question : Which word or phrase completes the statement? A spreadsheet is to a data island as a centralized
database for reporting is to a ________?

1. Data Repository
2. Analytic Sandbox
4. Data Warehouse


Question : Which R data structure allows elements to have different data types?

1. Matrix
2. Vector
4. Array


Question : Which key role for a successful analytic project can consult and advise the project team on the
value of end results and how these will be used on a day-to-day basis?

1. Business User
2. Project Manager
4. Business Intelligence Analyst




Question : A disk drive manufacturer has a defect rate of less than .% with % confidence. A quality
assurance team samples 1000 disk drives and finds 14 defective units. Which action should the
team recommend?

1. A larger sample size should be taken to determine if the plant is functioning properly
2. A smaller sample size should be taken to determine if the plant is functioning properly
4. The manufacturing process should be inspected for problems.



Question : What is required in a presentation for project sponsors?

1. Data warehouse design changes
2. Line by line review of the developed code
4. Detailed statistical basis for the modeling approach used in the project