
Dell EMC Data Science Associate Certification Questions and Answers (Dumps and Practice Questions)



Question : In which lifecycle stage are test and training data sets created?


1. Model planning
2. Discovery
4. Data preparation



Explanation: Discovery: In Phase 1, the team learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past from which they can learn. The team assesses the resources available to support the project in terms of people, technology, time, and data. Important activities in this phase include framing the business problem as an analytics challenge that can be addressed in subsequent phases and formulating initial hypotheses (IHs) to test and begin learning the data.
Data preparation: Phase 2 requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project. The team needs to execute extract, load, and transform (ELT) or extract, transform, and load (ETL) to get data into the sandbox. ELT and ETL are sometimes abbreviated together as ETLT. Data should be transformed in the ETLT process so the team can work with it and analyze it. In this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition the data.
Model planning: Phase 3 is model planning, where the team determines the methods, techniques, and workflow it intends to follow for the subsequent model building phase. The team explores the data to learn about the relationships between variables and subsequently selects key variables and the most suitable models.

Model building: In Phase 4, the team develops datasets for testing, training, and production purposes. In addition, in this phase the team builds and executes models based on the work done in the model planning phase. The team also considers whether its existing tools will suffice for running the models, or if it will need a more robust environment for executing models and workflows (for example, fast hardware and parallel processing, if applicable).
Communicate results: In Phase 5, the team, in collaboration with major stakeholders, determines if the results of the project are a success or a failure based on the criteria developed in Phase 1. The team should identify key findings, quantify the business value, and develop a narrative to summarize and convey findings to stakeholders.
Operationalize: In Phase 6, the team delivers final reports, briefings, code, and technical documents. In addition, the team may run a pilot project to implement the models in a production environment.
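The dataset development described for Phase 4 can be sketched in Python. This is only a minimal illustration, not a procedure prescribed by the lifecycle: the function name `split_train_test`, the 80/20 ratio, the fixed seed, and the placeholder records are all assumptions made for the example.

```python
import random


def split_train_test(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for a repeatable split
    n_test = int(round(len(shuffled) * test_fraction))
    test = shuffled[:n_test]               # held out for evaluating the model
    train = shuffled[n_test:]              # used for fitting the model
    return train, test


# Hypothetical stand-in for the conditioned data from the analytic sandbox.
records = list(range(100))
train, test = split_train_test(records)
print(len(train), len(test))
```

Shuffling before splitting keeps the two sets from inheriting any ordering in the source data, and holding the test set out of model fitting is what makes it usable for judging the model later.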




Question : When creating a presentation for a technical audience, what is the main objective?

1. Show that you met the project goals
2. Show how you met the project goals
4. Show the technique to be used in the production environment



Explanation: Using visualization for data exploration is different from presenting results to stakeholders. Not every type of plot is suitable for all audiences. Most of the plots presented earlier try to detail the data as clearly as possible for data scientists to identify structures and relationships. These graphs are more technical in nature and are better suited to technical audiences such as data scientists. Nontechnical stakeholders, however, generally prefer simple, clear graphics that focus on the message rather than the data.

When presenting to a technical audience such as data scientists and analysts, focus on how the work was done. Discuss how the team accomplished the goals and the choices it made in selecting models or analyzing the data. Share analytical methods and decision-making processes so other analysts can learn from them for future projects. Describe methods, techniques, and technologies used, as this technical audience will be interested in learning about these details and considering whether the approach makes sense in this case and whether it can be extended to other, similar projects. Plan to provide specifics related to model accuracy and speed, such as how well the model will perform in a production environment.







Question : Your company has different sales teams. Each team's sales manager has developed incentive
offers to increase the size of each sales transaction. Any sales manager whose incentive program
can be shown to increase the size of the average sales transaction will receive a bonus.
Data are available for the number and average sale amount for transactions offering one of the
incentives as well as transactions offering no incentive.
The VP of Sales has asked you to determine analytically if any of the incentive programs has
resulted in a demonstrable increase in the average sale amount. Which analytical technique would
be appropriate in this situation?



1. One-way ANOVA
2. Multi-way ANOVA
4. Wilcoxon Rank Sum Test



Explanation: The results of a one-way ANOVA can be considered reliable as long as the following assumptions are met:

Response variable residuals are normally distributed (or approximately normally distributed).
Samples are independent.
Variances of populations are equal.
Responses for a given group are independent and identically distributed normal random variables (not a simple random sample (SRS)).
ANOVA is a relatively robust procedure with respect to violations of the normality assumption.[2] If data are ordinal, a non-parametric alternative to this test should be used such as Kruskal-Wallis one-way analysis of variance.
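As a rough sketch of what a one-way ANOVA computes, the F statistic compares the variation between group means with the variation inside the groups. The function name `one_way_anova_f` and the sale amounts below are hypothetical, invented only for illustration; they are not data from the question.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical sale amounts: one group without an incentive, two with one.
no_incentive = [100, 102, 98, 101, 99]
incentive_a = [110, 112, 108, 111, 109]
incentive_b = [101, 99, 100, 102, 98]

f_stat, df_between, df_within = one_way_anova_f([no_incentive, incentive_a, incentive_b])
print(f_stat, df_between, df_within)
```

A large F value suggests that at least one group mean differs from the others. Turning F into a p-value requires the F distribution; in practice a library routine such as `scipy.stats.f_oneway` returns both the statistic and the p-value.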




Related Questions


Question : Which word or phrase completes the statement?
Business Intelligence is to ad-hoc reporting and dashboards as Data Science is to
______________ .

1. Alerts and Queries
2. Structured Data and Data Sources
4. Sales and profit reporting




Question : What is a property of window functions in SQL commands?

1. They can be used to calculate moving averages over various intervals.
2. They group rows into a single output row.
4. They don't require ordering of data within a window.
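For readers unfamiliar with window functions, the sketch below runs one through Python's built-in `sqlite3` module. It assumes the bundled SQLite is version 3.25 or later (the first release with window function support); the table `daily_sales` and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0), (5, 50.0)],
)

# AVG(...) OVER (...) is evaluated per row over a sliding frame of up to
# three rows (the current row and the two before it, ordered by day).
rows = conn.execute(
    """
    SELECT day,
           amount,
           AVG(amount) OVER (
               ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_avg
    FROM daily_sales
    ORDER BY day
    """
).fetchall()
conn.close()

for day, amount, moving_avg in rows:
    print(day, amount, moving_avg)
```

The query returns one output row per input row, which is the key difference from `GROUP BY` aggregation, and the frame is defined relative to the `ORDER BY` inside the `OVER` clause.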




Question : You are attempting to find the Euclidean distance between two centroids:
Centroid A's coordinates: (X = 2, Y = 4)
Centroid B's coordinates (X = 8, Y = 10)
Which formula finds the correct Euclidean distance?

1. ((2-8)^2 + (4-10)^2) or 72
2. SQRT(((2-8) x 2) + ((4-10) x 2)) or 12.17
4. SQRT((2-8)^2 + (4-10)^2) or 8.49
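The arithmetic can be checked with a few lines of Python. Euclidean distance is the square root of the sum of squared coordinate differences; the helper name `euclidean_distance` is an assumption made for this sketch, while the coordinates are the ones given in the question.

```python
import math


def euclidean_distance(p, q):
    """Square root of the sum of squared coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


centroid_a = (2, 4)
centroid_b = (8, 10)

squared_sum = (2 - 8) ** 2 + (4 - 10) ** 2   # 36 + 36
distance = euclidean_distance(centroid_a, centroid_b)
print(squared_sum, round(distance, 2))
```

Squaring each difference (rather than multiplying it by 2) is what keeps every term non-negative before the square root is taken.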




Question : In data visualization, which type of chart is recommended to represent frequency data?

1. Q-Q chart
2. Scatterplot
4. Line chart




Question : Which activity might be performed in the Operationalize phase of the Data Analytics Lifecycle?

1. Try different analytical techniques
2. Try different variables
4. Transform existing variables




Question : Refer to the exhibit.
You are asked to write a report on how specific variables impact your client's sales using a data
set provided to you by the client. The data includes 15 variables that the client views as directly
related to sales, and you are restricted to these variables only.
After a preliminary analysis of the data, the following findings were made:
1. Multicollinearity is not an issue among the variables
2. Only three variables (A, B, and C) have significant correlation with sales
You build a linear regression model on the dependent variable of sales with the independent
variables of A, B, and C. The results of the regression are seen in the exhibit.
Which interpretation is supported by the analysis?


1. Variables A, B, and C are significantly impacting sales and are effectively estimating sales
2. Due to the R^2 of 0.10, the model is not valid - the linear regression should be re-run with all 15 variables forced into the model to increase the R^2
4. Due to the R^2 of 0.10, the model is not valid - a different analytical model should be attempted