
Dell EMC Data Science Associate Certification Questions and Answers (Dumps and Practice Questions)



Question : A Data Scientist is assigned to build a model from a reporting data warehouse. The warehouse
contains data collected from many sources and transformed through a complex, multi-stage ETL
process. What is a concern the data scientist should have about the data?


1. It is too processed
2. It is not structured
3. It is not normalized
4. It is too centralized





Correct Answer : 1
Explanation: Prior to conducting data analysis, the required data must be collected and processed to
extract the useful information. The degree of initial processing and data preparation
depends on the volume of data, as well as how straightforward it is to understand the
structure of the data. Highly processed data may lose some important information.





Question : Which word or phrase completes the statement? Emphasis color is to standard color as _______ .


1. Main message is to key findings
2. Frequent item set is to item
3. Main message is to context
4. Pie chart is to proportions



Correct Answer : 3
Explanation: Our brains are compelled to find meaning, whether it is intended or not. Because the eyes are attracted to bright and high-contrast colors, viewers will derive meaning from something that stands out. When you use color for emphasis, it's like shouting that this object or element has the greatest value. At the Lynda.com site, the bright yellow is used to prominently display their most important message.





Question : Which data asset is an example of semi-structured data?

1. XML data file
2. Database table
3. Webserver log
4. News article



Correct Answer : 1
Explanation: An XML data file is semi-structured: its tags label each value, so the file describes its own structure without requiring a fixed table schema. The notes below summarize the concept.

5.3. Semi-Structured Data

idea predates XML but not HTML
data is available electronically in
  database systems
  file systems, e.g., bibliographic data, Web data
  data exchange formats, e.g., EDI, scientific data
attempt to reconcile database and document "worlds"
semi-structured data
  organised in semantic entities
  similar entities are grouped together
  entities in same group may not have same attributes
  order of attributes not necessarily important
  not all attributes may be required
  size of same attributes in a group may differ
  type of same attributes in a group may differ

5.4. Example of Semi-Structured Data

name: Peter Wood
email: ptw@dcs.bbk.ac.uk, p.wood@bbk.ac.uk

name:
  first name: Mark
  last name: Levene
email: mark@dcs.bbk.ac.uk

name: Alex Poulovassilis
affiliation: Birkbeck

5.5. Semi-Structured Data Models

based on labelled graphs rather than labelled trees
used for data exchange among, and integration of, heterogeneous data sources
schema information is in the edge labels
  sometimes called schemaless or self-describing
data stored at the leaves
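
As a rough illustration of the example in 5.4, the three records could be written as XML and inspected with Python's standard xml.etree.ElementTree module. The element names used here (people, person, first, last) are assumptions chosen for this sketch, not part of the original notes. The point is only that entities in the same group can carry different fields.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical XML encoding of the three records from section 5.4.
DOC = """
<people>
  <person>
    <name>Peter Wood</name>
    <email>ptw@dcs.bbk.ac.uk</email>
    <email>p.wood@bbk.ac.uk</email>
  </person>
  <person>
    <name><first>Mark</first><last>Levene</last></name>
    <email>mark@dcs.bbk.ac.uk</email>
  </person>
  <person>
    <name>Alex Poulovassilis</name>
    <affiliation>Birkbeck</affiliation>
  </person>
</people>
"""

def field_counts(xml_text):
    """Return, for each person, how many times each child element appears."""
    root = ET.fromstring(xml_text.strip())
    return [
        dict(Counter(child.tag for child in person))
        for person in root.findall("person")
    ]

records = field_counts(DOC)
for record in records:
    print(record)
```

Each person is a valid member of the same group even though one has two email elements, one has a nested name, and one has no email at all.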


Related Questions


Question : On analyzing your time series data you suspect that the data represented as
y_1, y_2, y_3, ..., y_{n-1}, y_n
may have a trend component that is quadratic in nature. Which pattern of data will indicate that
the trend in the time series data is quadratic in nature?


1. (y_4 - y_2) - (y_3 - y_1) = ... = (y_n - y_{n-2}) - (y_{n-1} - y_{n-3})
2. ((y_2 - y_1) / y_1) * 100% = ... = ((y_n - y_{n-1}) / y_{n-1}) * 100%
3. (y_2 - y_1) = (y_3 - y_2) = ... = (y_n - y_{n-1})
4. (y_3 - y_2) - (y_2 - y_1) = ... = (y_n - y_{n-1}) - (y_{n-1} - y_{n-2})
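
The difference patterns in these options can be checked numerically. For a linear trend the first differences y_t - y_{t-1} are constant, while for an exactly quadratic trend it is the second differences, (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}), that are constant. The sketch below uses a made-up series, y_t = 2t^2 + 3t + 1, chosen only for illustration.

```python
def differences(series):
    """Return the successive differences y[t] - y[t-1] of a sequence."""
    return [b - a for a, b in zip(series, series[1:])]

# Hypothetical series following an exact quadratic trend y_t = 2t^2 + 3t + 1.
quadratic = [2 * t * t + 3 * t + 1 for t in range(1, 9)]

first = differences(quadratic)   # grows by a fixed step, so it is not constant
second = differences(first)      # constant for an exactly quadratic trend

print(first)
print(second)
```

Real data will contain noise, so in practice the second differences would be roughly, not exactly, constant.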


Question : Which analytical method is considered unsupervised?

1. Naive Bayesian classifier

2. Decision tree
3. Linear regression
4. K-means clustering



Question : You have used k-means clustering to classify behavior of , customers for a retail store.
You decide to use household income, age, gender and yearly purchase amount as measures. You
have chosen to use 8 clusters and notice that 2 clusters only have 3 customers assigned. What
should you do?

1. Decrease the number of measures used
2. Increase the number of clusters
3. Decrease the number of clusters
4. Identify additional measures to add to the analysis


Question : What does R code nv <- v[v < 1000] do?

1. Selects the values in vector v that are less than 1000 and assigns them to the vector nv
2. Sets nv to TRUE or FALSE depending on whether all elements of vector v are less than 1000
3. Removes elements of vector v less than 1000 and assigns the elements >= 1000 to nv
4. Selects values of vector v less than 1000, modifies v, and makes a copy to nv
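
In R, a comparison such as v < 1000 produces a logical vector, and indexing v with it keeps the elements where the comparison is TRUE without changing v itself. A loose Python analog of that behavior, using the threshold of 1000 from the answer options and made-up input values, might look like this. It is a sketch of the idea, not R code.

```python
v = [250, 1500, 999, 1000, 42]   # hypothetical input values

# Keep only the elements below 1000; v itself is not modified.
nv = [x for x in v if x < 1000]

print(nv)
print(v)
```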


Question : For which class of problem is MapReduce most suitable?

1. Minimal result data
2. Simple marginalization tasks
3. Embarrassingly parallel
4. Non-overlapping queries




Question : Which activity is performed in the Operationalize phase of the Data Analytics Lifecycle?


1. Define the process to maintain the model
2. Try different analytical techniques
3. Try different variables
4. Transform existing variables