Dell EMC Data Science Associate Certification Questions and Answers (Dumps and Practice Questions)

Question : A Data Scientist is assigned to build a model from a reporting data warehouse. The warehouse
contains data collected from many sources and transformed through a complex, multi-stage ETL
process. What is a concern the data scientist should have about the data?

1. It is too processed
2. It is not structured
3. It is not normalized
4. It is too centralized

Correct Answer : 1
Explanation: Prior to conducting data analysis, the required data must be collected and processed to
extract the useful information. The degree of initial processing and data preparation
depends on the volume of data, as well as how straightforward it is to understand the
structure of the data. Highly processed data, such as the output of a multi-stage ETL process,
may lose some important information along the way.

Question : Which word or phrase completes the statement? Emphasis color is to standard color as _______ .

1. Main message is to key findings
2. Frequent item set is to item
3. Main message is to context
4. Pie chart is to proportions

Correct Answer : 3
Explanation: Our brains are compelled to find meaning, whether it is intended or not. Because the eyes are attracted to bright and high-contrast colors, viewers will derive meaning from something that stands out. When you use color for emphasis, it's like shouting that this object or element has the greatest value. At the Lynda.com site, bright yellow is used to prominently display the most important message. By analogy, emphasis color stands out against standard color just as the main message stands out against its context.

Question : Which data asset is an example of semi-structured data?

1. XML data file
2. Database table
3. Webserver log
4. News article

Correct Answer : 1
Explanation:

Semi-Structured Data
- idea predates XML but not HTML
- data is available electronically in
  - database systems
  - file systems, e.g., bibliographic data, Web data
  - data exchange formats, e.g., EDI, scientific data
- attempt to reconcile database and document "worlds"
- semi-structured data
  - organised in semantic entities
  - similar entities are grouped together
  - entities in same group may not have same attributes
  - order of attributes not necessarily important
  - not all attributes may be required
  - size of same attributes in a group may differ
  - type of same attributes in a group may differ

Example of Semi-Structured Data

name: Peter Wood
email: ptw@dcs.bbk.ac.uk, p.wood@bbk.ac.uk

name:
  first name: Mark
  last name: Levene
email: mark@dcs.bbk.ac.uk

name: Alex Poulovassilis
affiliation: Birkbeck

Semi-Structured Data Models
- based on labelled graphs rather than labelled trees
- used for data exchange among, and integration of, heterogeneous data sources
- schema information is in the edge labels
- sometimes called schemaless or self-describing
- data stored at the leaves
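
The three example records in the explanation above can be sketched in Python to make the "same group, different attributes" point concrete. This is only an illustrative rendering: the list-of-dicts layout and the names `records`, `attribute_names`, `all_attributes`, `shared_attributes`, and `name_types` are assumptions, not part of the original notes.

```python
# Hypothetical in-memory rendering of the three example records.
records = [
    {"name": "Peter Wood",
     "email": ["ptw@dcs.bbk.ac.uk", "p.wood@bbk.ac.uk"]},    # two e-mail values
    {"name": {"first name": "Mark", "last name": "Levene"},   # nested name
     "email": "mark@dcs.bbk.ac.uk"},
    {"name": "Alex Poulovassilis",
     "affiliation": "Birkbeck"},                              # no e-mail at all
]


def attribute_names(record):
    """Return the set of top-level attribute labels of one record."""
    return set(record)


# Entities in the same group need not share attributes or attribute types.
all_attributes = set().union(*(attribute_names(r) for r in records))
shared_attributes = set.intersection(*(attribute_names(r) for r in records))
name_types = {type(r["name"]).__name__ for r in records}

print(sorted(all_attributes))
print(sorted(shared_attributes))
print(sorted(name_types))
```

Only `name` is common to all three records, and even that attribute is a plain string in two records and a nested structure in the third, which is what a fixed relational schema would not allow.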

Related Questions

Question : On analyzing your time series data you suspect that the data represented as
y_1, y_2, y_3, ... , y_{n-1}, y_n
may have a trend component that is quadratic in nature. Which pattern of data will indicate that
the trend in the time series data is quadratic in nature?

1. (y_4 - y_2) - (y_3 - y_1) = ... = (y_n - y_{n-2}) - (y_{n-1} - y_{n-3})
2. ((y_2 - y_1) / y_1) * 100% = ... = ((y_n - y_{n-1}) / y_{n-1}) * 100%
3. (y_2 - y_1) = (y_3 - y_2) = ... = (y_n - y_{n-1})
4. (y_3 - y_2) - (y_2 - y_1) = ... = (y_n - y_{n-1}) - (y_{n-1} - y_{n-2})
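
A quick way to reason about the patterns above is to compute successive differences on a series whose trend is known. The sketch below uses a made-up quadratic series; the function name `differences` and the coefficients are assumptions chosen for illustration only.

```python
def differences(values):
    """Return the list of successive differences y[t+1] - y[t]."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical series following y_t = 2*t**2 + 3*t + 1 for t = 1..8.
series = [2 * t**2 + 3 * t + 1 for t in range(1, 9)]

first = differences(series)   # changes from step to step for a quadratic trend
second = differences(first)   # stays constant for a quadratic trend

print(first)
print(second)
```

For a purely linear series the first differences would already be constant. For a quadratic one, it is the differences of the differences that settle to a single value.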

Question : Which analytical method is considered unsupervised?

1. Naive Bayesian classifier
2. Decision tree
3. Linear regression
4. K-means clustering

Question : You have used k-means clustering to classify the behavior of customers for a retail store.
You decide to use household income, age, gender and yearly purchase amount as measures. You
have chosen to use 8 clusters and notice that 2 clusters only have 3 customers assigned. What
should you do?

1. Decrease the number of measures used
2. Increase the number of clusters
3. Decrease the number of clusters
4. Identify additional measures to add to the analysis
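
The situation described in this question is usually spotted by counting how many observations land in each cluster. The following is a minimal sketch, assuming the cluster assignments are already available as a list of integer labels; the helper `small_clusters`, the `min_size` threshold, and the sample labels are all hypothetical.

```python
from collections import Counter


def small_clusters(labels, min_size):
    """Return {cluster_id: size} for clusters with fewer than min_size members."""
    sizes = Counter(labels)
    return {cid: n for cid, n in sizes.items() if n < min_size}


# Hypothetical assignments: clusters 0-5 hold 50 customers each, 6 and 7 hold 3.
labels = [c for c in range(6) for _ in range(50)] + [6] * 3 + [7] * 3

print(small_clusters(labels, min_size=10))
```

Clusters with only a handful of members are commonly read as a sign that the chosen number of clusters is larger than the data supports, so one reasonable next step is to rerun the analysis with a smaller k and compare the results.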

Question : What does R code nv <- v[v < 1000] do?

1. Selects the values in vector v that are less than 1000 and assigns them to the vector nv
2. Sets nv to TRUE or FALSE depending on whether all elements of vector v are less than 1000
3. Removes elements of vector v less than 1000 and assigns the elements >= 1000 to nv
4. Selects values of vector v less than 1000, modifies v, and makes a copy to nv
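
The R expression combines an element-wise comparison with logical indexing. As a rough Python analogue (not R itself, and with made-up sample values), the same two steps can be written out explicitly:

```python
v = [250, 1500, 999, 1000, 42]   # hypothetical sample values

# Element-wise comparison, analogous to the logical vector `v < 1000` in R.
mask = [x < 1000 for x in v]

# Keep only the elements where the mask is True, analogous to `v[v < 1000]`.
nv = [x for x, keep in zip(v, mask) if keep]

print(mask)
print(nv)
print(v)   # the original sequence is left unchanged
```

The comparison yields one TRUE/FALSE value per element rather than a single value, and the subsetting builds a new vector without modifying `v`.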

Question : For which class of problem is MapReduce most suitable?

1. Minimal result data
2. Simple marginalization tasks
3. Embarrassingly parallel
4. Non-overlapping queries
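
To see why independence between work items matters for MapReduce, here is a single-process word-count sketch of the map, shuffle, and reduce steps. It is an illustration only, not a Hadoop API: the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample documents are assumptions.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit (word, 1) pairs for one document; needs no other document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, reduce(lambda a, b: a + b, values)


# Hypothetical input documents.
documents = ["big data big models", "data science", "big science"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())

print(counts)
```

Each call to `map_phase` touches exactly one document and each call to `reduce_phase` touches exactly one key, so in a real cluster these calls could run on different machines without coordinating with one another.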

Question : Which activity is performed in the Operationalize phase of the Data Analytics Lifecycle?

1. Define the process to maintain the model
2. Try different analytical techniques
3. Try different variables
4. Transform existing variables