
Dell EMC Data Science Associate Certification Questions and Answers (Dumps and Practice Questions)



Question : Which word or phrase completes the statement? A data warehouse is to a centralized database
for reporting as an analytic sandbox is to a _______?
1. Collection of data assets for modeling

2. Collection of low-volume databases
3. Centralized database of KPIs

4. Collection of data assets for ETL


Correct Answer : 1
Explanation: Data Warehouse: A data warehouse maintains a copy of information from the source transaction systems. This architectural complexity provides the opportunity to:
Congregate data from multiple sources into a single database so a single query engine can be used to present data.
Mitigate the problem of database isolation level lock contention in transaction processing systems caused by attempts to run large, long running, analysis queries in transaction processing databases.
Maintain data history, even if the source transaction systems do not.
Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger.
Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data.
Present the organization's information consistently.
Provide a single common data model for all data of interest regardless of the data's source.
Restructure the data so that it makes sense to the business users.
Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems.
Add value to operational business applications, notably customer relationship management (CRM) systems.
Make decision-support queries easier to write.

Centralized data containers in a purpose-built space.
Supports BI and reporting, but restricts robust analyses.
Analysts are dependent on IT and DBAs for data access and schema changes.
Analysts must spend significant time to get aggregated and disaggregated data extracts from multiple sources.







Question : You do a Student's t-test to compare the average test scores of sample groups from populations A
and B. Group A averaged 10 points higher than group B. You find that this difference is significant,
with a p-value of 0.03. What does that mean?
1. There is a 3% chance that you have identified a difference between the populations when in
reality there is none.
2. The difference in scores between a sample from population A and a sample from population B
will tend to be within 3% of 10 points.
3. There is a 3% chance that a sample group from population A will score 10 points higher than a
sample group from population B.
4. There is a 97% chance that a sample group from population A will score 10 points higher than a
sample group from population B.

Correct Answer : 1
Explanation: P values evaluate how well the sample data support the devil's advocate argument that the null hypothesis is true. They measure how compatible your data are with the null hypothesis. How likely is the effect observed in your sample data if the null hypothesis is true?
High P values: your data are likely with a true null.
Low P values: your data are unlikely with a true null.
A low P value suggests that your sample provides enough evidence that you can reject the null hypothesis for the entire population.

How Do You Interpret P Values?
In technical terms, a P value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the truth of the null hypothesis. For example, suppose that a vaccine study produced a P value of 0.04. This P value indicates that if the vaccine had no effect, you'd obtain the observed difference or more in 4% of studies due to random sampling error. P values address only one question: how likely are your data, assuming a true null hypothesis? They do not measure support for the alternative hypothesis. This limitation leads to a very common misinterpretation of P values.

P Values Are NOT the Probability of Making a Mistake
Incorrect interpretations of P values are very common. The most common mistake is to interpret a P value as the probability of making a mistake by rejecting a true null hypothesis (a Type I error). There are several reasons why P values can't be the error rate. First, P values are calculated based on the assumptions that the null is true for the population and that the difference in the sample is caused entirely by random chance. Consequently, P values can't tell you the probability that the null is true or false because it is 100% true from the perspective of the calculations. Second, while a low P value indicates that your data are unlikely assuming a true null, it can't evaluate which of two competing cases is more likely:
The null is true but your sample was unusual.
The null is false.
Determining which case is more likely requires subject area knowledge and replicate studies. Let's go back to the vaccine study and compare the correct and incorrect way to interpret the P value of 0.04:
Correct: Assuming that the vaccine had no effect, you'd obtain the observed difference or more in 4% of studies due to random sampling error.
Incorrect: If you reject the null hypothesis, there's a 4% chance that you're making a mistake.

What Is the True Error Rate?
Think that this interpretation difference is simply a matter of semantics, and only important to picky statisticians? Think again. It's important to you. If a P value is not the error rate, what the heck is the error rate? (Can you guess which way this is heading now?) Sellke et al. have estimated the error rates associated with different P values. While the precise error rate depends on various assumptions, the table summarizes them for middle-of-the-road assumptions.
P value    Probability of incorrectly rejecting a true null hypothesis
0.05       At least 23% (and typically close to 50%)
0.01       At least 7% (and typically close to 15%)
Do the higher error rates in this table surprise you? Unfortunately, the common misinterpretation of P values as the error rate creates the illusion of substantially more evidence against the null hypothesis than is justified. As you can see, if you base a decision on a single study with a P value near 0.05, the difference observed in the sample may not exist at the population level. That can be costly!
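
The definition given above (the probability of an effect at least as extreme as the observed one, assuming a true null) can be made concrete with a permutation test. The following is a minimal Python sketch under stated assumptions, and it is not the Student's t-test from the question: it shuffles the group labels to simulate a true null and counts how often the shuffled difference in means is at least as large as the observed one. The function name permutation_p_value, the sample scores, and the permutation count are illustrative choices, not part of the original material.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Under the null hypothesis the group labels are interchangeable, so we
    reshuffle the pooled scores and count how often the shuffled difference
    is at least as extreme as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_a = pooled[:n_a]
        shuffled_b = pooled[n_a:]
        diff = abs(sum(shuffled_a) / n_a - sum(shuffled_b) / n_b)
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical test scores; group A averages 10 points higher than group B.
scores_a = [78, 79, 82, 85, 87, 88, 90, 91]
scores_b = [68, 69, 72, 75, 77, 78, 80, 81]
print("estimated p-value:", permutation_p_value(scores_a, scores_b))
```

The returned value answers only the question the explanation describes: how often a difference this large would appear if the labels carried no information. It says nothing about the probability that the null hypothesis itself is true.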




Question : What is one modeling or descriptive statistical function in MADlib that is typically not provided in a
standard relational database?
1. Expected value
2. Variance
3. Linear regression

4. Quantiles


Correct Answer : 3

Explanation: Linear regression models a linear relationship of a scalar dependent variable y to one or more explanatory independent variables x to build a model of coefficients.
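
As a rough illustration of what "a model of coefficients" means, the sketch below fits a one-variable least-squares line in plain Python and reports the coefficient of determination. It is not MADlib's implementation and does not call MADlib; the function names fit_simple_linear_regression and r_squared and the sample data are assumptions made for this example.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y must contain at least two distinct values")
    return 1 - ss_res / ss_tot


# Hypothetical data: y grows roughly linearly with x.
x_values = [1, 2, 3, 4, 5]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_linear_regression(x_values, y_values)
print("intercept:", b0, "slope:", b1)
print("r squared:", r_squared(x_values, y_values, b0, b1))
```

With several independent variables the same idea is usually expressed in matrix form, which is the kind of computation a library such as MADlib runs inside the database rather than in client code.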


Related Questions


Question : Trend, seasonal, and cyclical are components of a time series. What is another component?

1. Irregular
2. Linear
3. Quadratic
4. Exponential



Question : You are studying the behavior of a population, and you are provided with multidimensional data at
the individual level. You have identified four specific individuals who are valuable to your study,
and would like to find all users who are most similar to each individual. Which algorithm is the
most appropriate for this study?
1. Association rules
2. Decision trees
3. Linear regression
4. K-means clustering




Question : You are using MADlib for Linear Regression analysis. Which value does the statement return?
SELECT (linregr(depvar, indepvar)).r2 FROM zeta1;

1. Coefficients
2. Standard error
3. Goodness of fit
4. P-value


Question : Which data asset is an example of quasi-structured data?


1. XML data file
2. Database table
3. News article
4. Webserver log


Question : What would be considered "Big Data"?

1. An OLAP Cube containing customer demographic information about 100,000,000 customers

2. Aggregated statistical data stored in a relational database table


4. Spreadsheets containing monthly sales data for a Global 100 corporation



Question : A data scientist plans to classify the sentiment polarity of product reviews collected from
the Internet. What is the most appropriate model to use? Suppose labeled training data is
available.


1. Linear regression

2. Logistic regression

4. Naive Bayesian classifier