Dell EMC Data Science and BigData Certification Questions and Answers

Question : In which phase of the data analytics lifecycle do Data Scientists spend the most time in a project?

1. Discovery
2. Data Preparation
4. Communicate Results

Explanation: Phase 1-Discovery: In Phase 1, the team learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past from
which they can learn. The team assesses the resources available to support the project in terms of people, technology, time, and data. Important activities in this phase include framing the business problem as an
analytics challenge that can be addressed in subsequent phases and formulating initial hypotheses (IHs) to test and begin learning the data.
Phase 2-Data preparation: Phase 2 requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project. The team needs to execute extract, load, and
transform (ELT) or extract, transform and load (ETL) to get data into the sandbox. The ELT and ETL are sometimes abbreviated as ETLT. Data should be transformed in the ETLT process so the team can work with it and
analyze it. In this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition the data.
Phase 3-Model planning: Phase 3 is model planning, where the team determines the methods, techniques, and workflow it intends to follow for the subsequent model building phase. The team explores the data to learn
about the relationships between variables and subsequently selects key variables and the most suitable models.
Phase 4-Model building: In Phase 4, the team develops datasets for testing, training, and production
purposes. In addition, in this phase the team builds and executes models based on the work done in the model planning phase. The team also considers whether its existing tools will suffice for running the models, or
if it will need a more robust environment for executing models and workflows (for example, fast hardware and parallel processing, if applicable).
Phase 5-Communicate results: In Phase 5, the team, in collaboration with major stakeholders, determines if the results of the project are a success or a failure based on the criteria developed in Phase 1. The team
should identify key findings, quantify the business value, and develop a narrative to summarize and convey findings to stakeholders.
Phase 6-Operationalize: In Phase 6, the team delivers final reports, briefings, code, and technical documents. In addition, the team may run a pilot project to implement the models in a production environment.

Question : You are testing two new weight-gain formulas for puppies. The test gives the results:
Control group: 1% weight gain
Formula A: 3% weight gain
Formula B: 4% weight gain
A one-way ANOVA returns a p-value = 0.027
What can you conclude?

1. Formula A and Formula B are about equally effective at promoting weight gain.
2. Formula A and Formula B are both effective at promoting weight gain.
4. Either Formula A or Formula B is effective at promoting weight gain.

Explanation: A One-Way ANOVA (Analysis of Variance) is a statistical technique by which we can test if three or more means are equal. It tests if the value of a single variable differs significantly among
three or more levels of a factor.

We can say we have a framework for one-way ANOVA when we have a single factor with three or more levels and multiple observations at each level.

In this kind of layout, we can calculate the mean of the observations within each level of our factor.

The concepts of factor, levels and multiple observations at each level can be best understood by an example.

Factor and Levels - An Example

Let us suppose that the Human Resources Department of a company desires to know if occupational stress varies according to age.

The variable of interest is therefore occupational stress as measured by a scale.

The factor being studied is age. There is just one factor (age) and hence a situation appropriate for one-way ANOVA.

Further suppose that the employees have been classified into three groups (levels):

less than 40
40 to 55
above 55

These three groups are the levels of the factor age, so there are three levels here. With this design, we have multiple observations in the form of Occupational Stress scores from a number of employees belonging to
each of the three levels of the factor age. We want to know whether all the levels (that is, the age groups) have equal stress on average.

Non-significance of the test statistic (F-statistic) associated with this technique would imply that age has no effect on stress experienced by employees in their respective occupations. On the other hand,
significance would imply that stress afflicts different age groups differently.
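
The F-statistic mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a full ANOVA routine: the function name `one_way_anova_f` and the stress scores for the three age groups are hypothetical values made up for the example, and the sketch stops at the F-statistic without computing a p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F-statistic for a list of groups of numbers."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k = len(groups)          # number of levels of the factor
    n = len(all_values)      # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)   # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)     # n - k degrees of freedom
    return ms_between / ms_within


# Hypothetical occupational stress scores for the three age levels.
less_than_40 = [4, 5, 6]
from_40_to_55 = [6, 7, 8]
above_55 = [8, 9, 10]

f_stat = one_way_anova_f([less_than_40, from_40_to_55, above_55])
print(f_stat)
```

The F-statistic is compared against the F distribution with (k - 1, n - k) degrees of freedom to obtain a p-value; a library routine such as `scipy.stats.f_oneway` does this in one call. A significant result only indicates that at least one group mean differs from the others. It does not say which one.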

Question : Data visualization is used in the final presentation of an analytics project. For what else is this
technique commonly used?

1. Data exploration
2. Descriptive statistics
4. Model selection

Explanation: Data exploration is an informative search that data consumers use to build an accurate analysis from the information gathered. Data is often gathered in large volumes and in a non-rigid, loosely controlled
manner. For a meaningful analysis, this unorganized mass of data needs to be narrowed down. Data exploration is used to examine the data, and the information it contains, so that further analysis can follow.

Data often converges in a central warehouse called a data warehouse. This data can come from various sources using various formats. Relevant data is needed for tasks such as statistical reporting, trend spotting and
pattern spotting. Data exploration is the process of gathering such relevant data. There are two main methodologies or techniques used to retrieve relevant data from large, unorganized pools. They are the manual and
automatic methods. The manual method is another name for data exploration, while the automatic method is also known as data mining.

Some people believe these terms are synonymous, while others see a technical difference between them. Data mining generally refers to gathering relevant data from large databases. Data exploration, on the other hand,
generally refers to a data user being able to find his or her way through large amounts of data in order to gather necessary information.

Related Questions

Question : Which word or phrase completes the statement?
Business Intelligence is to ad-hoc reporting and dashboards as Data Science is to
______________ .

1. Alerts and Queries
2. Structured Data and Data Sources
4. Sales and profit reporting

Question : What is a property of window functions in SQL commands?

1. They can be used to calculate moving averages over various intervals.
2. They group rows into a single output row.
4. They don't require ordering of data within a window.
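
As an illustration of what a window function computes, the sketch below runs a three-row moving average through Python's built-in `sqlite3` module. The table `daily_sales`, its columns, and the values are hypothetical, and the example assumes the bundled SQLite library is version 3.25 or later, which is when window functions were added.

```python
import sqlite3

# In-memory database with a small hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0), (5, 50.0)],
)

# AVG(...) OVER (...) is evaluated per row over a sliding frame:
# the current row plus up to two preceding rows, ordered by day.
rows = conn.execute(
    """
    SELECT day,
           amount,
           AVG(amount) OVER (
               ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_avg
    FROM daily_sales
    ORDER BY day
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

Each of the five input rows produces its own output row, which is the contrast with `GROUP BY`, where rows are collapsed into one output row per group.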

Question : You are attempting to find the Euclidean distance between two centroids:
Centroid A's coordinates: (X = 2, Y = 4)
Centroid B's coordinates (X = 8, Y = 10)
Which formula finds the correct Euclidean distance?

1. ((2-8)^2+(4-10)^2) or 72
2. SQRT(((2-8) x 2) + ((4-10) x 2)) or 12.17
4. SQRT((2-8)^2+(4-10)^2) or 8.49
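
The arithmetic behind these options can be checked with a short Python sketch. The Euclidean distance between two points is the square root of the sum of the squared coordinate differences; the variable names below are illustrative only.

```python
import math

# Centroid coordinates from the question.
centroid_a = (2, 4)
centroid_b = (8, 10)

# Coordinate differences along each axis.
dx = centroid_a[0] - centroid_b[0]
dy = centroid_a[1] - centroid_b[1]

# Sum of squared differences, then its square root.
squared_distance = dx ** 2 + dy ** 2
distance = math.sqrt(squared_distance)

print(squared_distance, round(distance, 2))  # 72 8.49
```

The value 72 is the squared distance, so the square root is still needed to obtain the Euclidean distance itself.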

Question : In data visualization, which type of chart is recommended to represent frequency data?

1. Q-Q chart
2. Scatterplot
4. Line chart

Question : Which activity might be performed in the Operationalize phase of the Data Analytics Lifecycle?

1. Try different analytical techniques
2. Try different variables
4. Transform existing variables

Question : Refer to the exhibit.
You are asked to write a report on how specific variables impact your client's sales using a data
set provided to you by the client. The data includes 15 variables that the client views as directly
related to sales, and you are restricted to these variables only.
After a preliminary analysis of the data, the following findings were made:
1. Multicollinearity is not an issue among the variables
2. Only three variables (A, B, and C) have significant correlation with sales
You build a linear regression model on the dependent variable of sales with the independent
variables of A, B, and C. The results of the regression are seen in the exhibit.
Which interpretation is supported by the analysis?

1. Variables A, B, and C are significantly impacting sales and are effectively estimating sales
2. Due to the R^2 of 0.10, the model is not valid - the linear regression should be re-run with all 15
variables forced into the model to increase the R^2
4. Due to the R^2 of 0.10, the model is not valid - a different analytical model should be attempted