
Dell EMC Data Science Associate Certification Questions and Answers (Dumps and Practice Questions)



Question : In data visualization, which type of chart is recommended to represent frequency data?

1. Q-Q chart
2. Scatterplot
4. Line chart




Explanation: One of the more commonly used pictorials in statistics is the frequency histogram, which in some ways is similar to a bar chart and tells how many items are in each numerical category. For example, suppose that after a garage sale, you want to determine which items were the most popular: the high-priced items, the low-priced items, and so forth. Let's say you sold a total of 32 items for the following prices: $1, $2, $2, $2, $5, $5, $5, $5, $7, $8, $10, $10, $10, $10, $11, $15, $15, $15, $19, $20, $21, $21, $25, $25, $29, $29, $29, $30, $30, $30, $35, and $35.

The items sold ranged in price from $1 to $35. First, divide this range of $1 to $35 into a number of categories, called class intervals. Typically, no fewer than 5 and no more than 20 class intervals work best for a frequency histogram.

Choose the first class interval to include your lowest (smallest value) data and make sure that no overlap exists so that one piece of data does not fall into two class intervals. For example, you would not have your first class interval be $1 to $5 and your second class interval be $5 to $10 because the four items that sold for $5 would belong in both the first and the second intervals. Instead, use $1 to $5 for the first interval and $6 to $10 for the second. Class intervals are mutually exclusive.

Next, make a table of how your data are distributed (see Table 1). The number of observations that fall into each class interval is called the class frequency.
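The tally described above can be sketched in a few lines of Python. This is only an illustration using the 32 prices listed in the explanation and class intervals of width five starting at $1; the function name `class_frequencies` and its parameters are made up for this sketch, and it is not a reproduction of Table 1.

```python
# Prices of the 32 items sold, copied from the explanation above.
prices = [1, 2, 2, 2, 5, 5, 5, 5, 7, 8, 10, 10, 10, 10, 11, 15, 15, 15,
          19, 20, 21, 21, 25, 25, 29, 29, 29, 30, 30, 30, 35, 35]


def class_frequencies(values, lowest=1, width=5):
    """Count the values in each non-overlapping class interval.

    Intervals are inclusive whole-dollar ranges: lowest..lowest+width-1,
    then the next `width` values, and so on up to the largest value.
    """
    n_classes = (max(values) - lowest) // width + 1
    counts = [0] * n_classes
    for v in values:
        counts[(v - lowest) // width] += 1  # each value lands in exactly one class
    return {
        (lowest + i * width, lowest + (i + 1) * width - 1): count
        for i, count in enumerate(counts)
    }


frequencies = class_frequencies(prices)

# Print a text version of the frequency histogram, one row per class interval.
for (low, high), count in frequencies.items():
    print(f"${low:>2} to ${high:>2}: {count:>2} {'#' * count}")
```

With these settings the data falls into seven class intervals ($1 to $5 through $31 to $35), and the class frequencies add up to the 32 items sold.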


Note that each class interval has the same width. That is, $1 to $5 has a width of five dollars, inclusive; $6 to $10 has a width of five dollars, inclusive; $11 to $15 has a width of five dollars, inclusive; and so forth. From the data





Question : Which activity might be performed in the Operationalize phase of the Data Analytics Lifecycle?

1. Try different analytical techniques
2. Try different variables
4. Transform existing variables



Explanation: In the final phase, the team communicates the benefits of the project more broadly and sets up a pilot project to deploy the work in a controlled way before broadening the work to a full enterprise or ecosystem of users. In Phase 4, the team scored the model in the analytics sandbox. Phase 6 represents the first time that most analytics teams approach deploying the new analytical methods or models in a production environment. Rather than deploying these models immediately on a wide-scale basis, the team can manage the risk more effectively, and learn more, by undertaking a small-scope pilot deployment before a wide-scale rollout. This approach enables the team to learn about the performance and related constraints of the model in a production environment on a small scale and make adjustments before a full deployment. During the pilot project, the team may need to consider executing the algorithm in the database rather than with in-memory tools such as R, because running in the database is significantly faster and more efficient than running in memory, especially on larger datasets.





Question : Refer to the exhibit.
You are asked to write a report on how specific variables impact your client's sales using a data
set provided to you by the client. The data includes 15 variables that the client views as directly
related to sales, and you are restricted to these variables only.
After a preliminary analysis of the data, the following findings were made:
1. Multicollinearity is not an issue among the variables
2. Only three variables (A, B, and C) have significant correlation with sales
You build a linear regression model on the dependent variable of sales with the independent
variables of A, B, and C. The results of the regression are seen in the exhibit.
Which interpretation is supported by the analysis?


1. Variables A, B, and C are significantly impacting sales and are effectively estimating sales
2. Due to the R² of 0.10, the model is not valid - the linear regression should be re-run with all 15
variables forced into the model to increase the R²
4. Due to the R² of 0.10, the model is not valid - a different analytical model should be attempted






Related Questions


Question : Which word or phrase completes the statement? Unix is to bash as Hadoop is to:


1. NameNode
2. Sqoop
3. HDFS
4. Flume
5. Pig



Question : A call center for a large electronics company handles an average of , support calls a day.
The head of the call center would like to optimize the staffing of the call center during the rollout of
a new product due to recent customer complaints of long wait times. You have been asked to
create a model to optimize call center costs and customer wait times.
The goals for this project include:
1. Relative to the release of a product, how does the call volume change over time?
2. How to best optimize staffing based on the call volume for the newly released product, relative
to old products.
3. Historically, what time of day does the call center need to be most heavily staffed?
4. Determine the frequency of calls by both product type and customer language.
Which goals are suitable to be completed with MapReduce?


1. Goal 2 and 4
2. Goal 1 and 3
3. Goals 1, 2, 3, 4
4. Goals 2, 3, 4
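
For readers unfamiliar with the MapReduce pattern the question refers to, here is a minimal single-process sketch in Python, using the frequency count from goal 4 as the example. The call records, field values, and function names are hypothetical and are not taken from the question; a real job would run on a framework such as Hadoop. The sketch only shows the shape of a map step and a reduce step and does not by itself settle which answer option is correct.

```python
from collections import defaultdict

# Hypothetical call records: (product_type, customer_language).
calls = [
    ("phone", "en"), ("phone", "es"), ("tablet", "en"),
    ("phone", "en"), ("tablet", "en"), ("laptop", "fr"),
]


def map_call(record):
    """Map step: emit one ((product, language), 1) pair per call record."""
    product, language = record
    return (product, language), 1


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the ones emitted for a single key."""
    return key, sum(values)


mapped = [map_call(record) for record in calls]
call_counts = dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())

for (product, language), count in sorted(call_counts.items()):
    print(product, language, count)
```

Each record is handled independently in the map step and each key is summed independently in the reduce step, which is what lets this kind of count be spread across many machines.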



Question : Consider the example of an analysis for fraud detection on credit card usage. You will need to
ensure higher-risk transactions that may indicate fraudulent credit card activity are retained in your
data for analysis, and not dropped as outliers during pre-processing. What will be your approach
for loading data into the analytical sandbox for this analysis?


1. ETL
2. ELT
3. EDW
4. OLTP



Question : Trend, seasonal, and cyclical are components of a time series. What is another component?

1. Irregular
2. Linear
3. Quadratic
4. Exponential



Question : You are studying the behavior of a population, and you are provided with multidimensional data at
the individual level. You have identified four specific individuals who are valuable to your study,
and would like to find all users who are most similar to each individual. Which algorithm is the
most appropriate for this study?
1. Association rules
2. Decision trees
3. Linear regression
4. K-means clustering




Question : You are using MADlib for Linear Regression analysis. Which value does the statement return?
SELECT (linregr(depvar, indepvar)).r2 FROM zeta1;

1. Coefficients
2. Standard error
3. Goodness of fit
4. P-value
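
The `r2` field in the statement above refers to R², the coefficient of determination, which measures how well the fitted line explains the variation in the dependent variable. The following Python sketch shows how that value is computed for a simple one-variable least-squares fit. It is not MADlib code; the function name and the sample data are made up for illustration.

```python
def r_squared(x, y):
    """Return R² for a one-variable ordinary least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Least-squares slope and intercept.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R² = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: an exact line and a noisier relationship.
perfect_fit = r_squared([1, 2, 3, 4], [3, 5, 7, 9])
noisy_fit = r_squared([1, 2, 3, 4], [1, 3, 2, 4])
print(perfect_fit, noisy_fit)
```

A value near 1 means the line accounts for most of the variation in the data, while a value near 0 (such as the 0.10 in the earlier regression question) means it accounts for very little.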