
Dell EMC Data Science Associate Certification Questions and Answers (Dumps and Practice Questions)



Question : In R, functions like plot() and hist() are known as what?
1. generic functions
2. virtual methods
3. [option not shown]
4. generic methods




Correct Answer: 1 (generic functions)
Explanation: plot() and hist() are generic functions: they dispatch to a method appropriate to the class of their first argument. You can create a histogram with hist(x), where x is a numeric vector of the values to be plotted. The option freq=FALSE plots probability densities instead of frequencies, and breaks= controls the number of bins. Histograms are used very often in public health to show the distributions of independent and dependent variables. Although the basic hist() command in R is simple, getting your histogram to look exactly the way you want takes getting to know a few of its options. Here I present ways to customize your histogram for your needs.

First, I want to point out that ggplot2 is an R package that produces some amazing graphics, including histograms; I will do a post on ggplot2 in the coming year. The hist() function in base R, however, is easy and fast, and does the job for most of your histogram-ing needs. If you want to do complicated histograms, I would recommend reading up on ggplot2.

Okay, so for our purposes today, instead of importing data, I'll create some normally distributed data myself. In R, you can generate normal data with the rnorm() function:

BMI <- rnorm(n=1000, mean=24.2, sd=2.2)

So now we have some BMI data, and the basic histogram plot that comes out of R looks like this:

hist(BMI)
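For comparison, here is a rough Python sketch of the same idea (illustrative only, assuming numpy is available; the variable names and seed are mine): generate the normal data and bin it the way hist() would, with density=True playing the role of freq=FALSE.

```python
import numpy as np

# Generate 1000 normally distributed BMI values (mean 24.2, sd 2.2),
# mirroring the R call rnorm(n=1000, mean=24.2, sd=2.2).
rng = np.random.default_rng(seed=42)
bmi = rng.normal(loc=24.2, scale=2.2, size=1000)

# Bin the data as hist() would; density=True corresponds to freq=FALSE,
# and bins= plays the role of the breaks= option.
counts, bin_edges = np.histogram(bmi, bins=20, density=True)

print(len(counts), len(bin_edges))  # 20 bins produce 21 edges
```

With density=True the bar areas sum to 1, just as freq=FALSE plots probability densities rather than raw frequencies.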






Question : Review the following code:
SELECT pn, vn, sum(prc*qty)
FROM sale
GROUP BY CUBE(pn, vn)
ORDER BY 1, 2, 3;
Which combination of subtotals do you expect to be returned by the query?

1. (pn, vn)
2. ( (pn, vn), (pn) )
3. [option not shown]
4. ( (pn, vn) , (pn), (vn) , ( ) )


Correct Answer: 4
Explanation: Queries that use the ROLLUP and CUBE operators generate some of the same result sets and perform some of the same calculations as OLAP applications. The CUBE operator generates a result set that can be used for cross-tabulation reports, and a ROLLUP operation can calculate the equivalent of an OLAP dimension or hierarchy.
In addition to the subtotals generated by the ROLLUP extension, the CUBE extension generates subtotals for all combinations of the dimensions specified. If n is the number of columns listed in the CUBE, there will be 2^n subtotal combinations. Here n = 2, so the query produces subtotals for (pn, vn), (pn), (vn), and the grand total ( ).

SELECT fact_1_id,
fact_2_id,
SUM(sales_value) AS sales_value
FROM dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;

FACT_1_ID  FACT_2_ID  SALES_VALUE
---------  ---------  -----------
        1          1      4363.55
        1          2      4794.76
        1          3      4718.25
        1          4      5387.45
... (partial output; the full CUBE result also contains the subtotal rows, with NULL in the rolled-up columns)
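The grouping-set logic behind CUBE can be sketched in Python (an illustrative toy, not how the database implements it; the sale rows below are invented):

```python
from itertools import combinations

# Toy sale rows: (pn, vn, prc, qty); the values are invented for illustration.
sale = [("p1", "v1", 10.0, 2), ("p1", "v2", 5.0, 1), ("p2", "v1", 8.0, 3)]

cols = ("pn", "vn")

# CUBE(pn, vn) groups by every subset of the listed columns: 2^n subsets.
grouping_sets = [subset for r in range(len(cols), -1, -1)
                 for subset in combinations(cols, r)]
print(grouping_sets)  # [('pn', 'vn'), ('pn',), ('vn',), ()]

def subtotal(keys):
    """Sum prc*qty grouped by the given key columns."""
    idx = {"pn": 0, "vn": 1}
    out = {}
    for row in sale:
        key = tuple(row[idx[k]] for k in keys)
        out[key] = out.get(key, 0.0) + row[2] * row[3]
    return out

# The empty grouping set () is the grand total over all rows.
grand_total = subtotal(())[()]
```

For n = 2 columns this enumerates exactly the four subtotal combinations in answer 4: (pn, vn), (pn), (vn), and ( ).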




Question : In MADlib what does MAD stand for?

1. Machine Learning, Algorithms for Databases
2. Mathematical Algorithms for Databases
3. Magnetic, Agile, Deep
4. Modular, Accurate, Dependable

Correct Answer: 3 (Magnetic, Agile, Deep)
Explanation: MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data.

The MADlib mission: to foster widespread development of scalable analytic skills by harnessing efforts from commercial practice, academic research, and open-source development. There are a number of data analytics solutions that support the MapReduce principle and are able to work with NoSQL databases. However, most enterprises still rely on mature SQL data stores and therefore need traditional analytics solutions to provide in-depth analysis of their business-critical data.

MADlib is a scalable in-database analytics library that features sophisticated mathematical algorithms for SQL-based systems. It was developed jointly by researchers from UC Berkeley and engineers from Pivotal (formerly EMC/Greenplum), and can be considered an enterprise alternative to Hadoop for machine learning, data mining, and statistics tasks. In addition, MADlib supports time-series rows, which Hadoop could not process appropriately, greatly extending its capabilities for building prediction systems.

Like Analyst First, MADlib benefits from the support of the good folk at EMC-Greenplum. In particular, EMC-Greenplum's Australian team is responsible for developing the support vector machines, Latent Dirichlet Allocation (a technique for doing topic discovery in unstructured text), and sparse vectors modules.

The MAD approach takes on "data warehousing" and "business intelligence" as outmoded, low-tech approaches to getting value out of Big Data. Instead, it advocates a "Magnetic, Agile, Deep" (MAD) approach to data that shifts the locus of power from what Brian Dolan calls the "DBA priesthood" to the statisticians and analysts who actually like to crunch the numbers. This is a good thing, on many fronts.
It describes a state-of-the-art parallel data warehouse that sits on 800TB of disk, using 40 dual-processor dual-core Sun Thumper boxes, and it presents a set of general-purpose, hardcore, massively parallel statistical methods for big data. They're expressed in SQL (OMG!) but could easily be translated to MapReduce if that's your bag.


Related Questions


Question : Refer to the exhibit.
You have scored your Naive Bayes classifier model on held-out test data for cross-validation
and tabulated the sample scores as shown in the exhibit.
What are the Precision and Recall rate of the model?

1. Precision = 262/277
Recall = 262/288
2. Precision =262/288
Recall = 262/277
3. Precision = [not shown]
Recall = 288/262
4. Precision = 288/262
Recall = 277/262
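The exhibit is not reproduced here, but the standard definitions can be sketched in Python. The counts below are read off the answer options on the assumption that 262 samples are true positives, 277 samples were predicted positive, and 288 samples are actually positive:

```python
# Precision and recall from confusion-matrix counts (assumed from the options).
TP = 262          # true positives: correctly predicted positive
FP = 277 - TP     # predicted positive but actually negative
FN = 288 - TP     # actually positive but predicted negative

precision = TP / (TP + FP)   # 262/277: of predicted positives, how many are right
recall = TP / (TP + FN)      # 262/288: of actual positives, how many were found

print(round(precision, 3), round(recall, 3))
```

Under these assumptions, precision = 262/277 and recall = 262/288, matching option 1.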




Question : Which ROC curve represents a perfect model fit?
1. A
2. B
3. C
4. D




Question : Refer to the exhibit.
You have scored your Naive Bayes classifier model on held-out test data for cross-validation
and tabulated the sample scores as shown in the exhibit.
What are the False Positive Rate (FPR) and the False Negative Rate (FNR) of the model?
1. FPR = 15/262
FNR = 26/288
2. FPR = 26/288
FNR = 15/262
3. FPR = [not shown]
FNR = 288/26
4. FPR = 288/26
FNR = 262/15
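The exhibit is not reproduced here, but the standard error-rate definitions can be sketched in Python. TP and FN below follow the 262/288 split seen in the options; FP and TN are placeholders to be replaced by the values read off the tabulated scores:

```python
# False positive rate and false negative rate from confusion-matrix counts.
# TP and FN follow the options; FP and TN are invented placeholders.
TP, FN = 262, 26     # 262 + 26 = 288 actual positives
FP, TN = 15, 697     # TN is a made-up placeholder

FPR = FP / (FP + TN)   # false positives among all actual negatives
FNR = FN / (FN + TP)   # false negatives among all actual positives
```

Whatever the exhibit's exact counts, FNR = FN / (FN + TP) = 26/288 under these assumptions.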




Question : Refer to the exhibit.
Click on the calculator icon in the upper left corner. An analyst is searching a corpus of documents
for the topic "solid state disk". In the exhibit, Table A provides the inverse document frequency for
each term across the corpus, and Table B provides each term's frequency in four documents selected
from the corpus. Which of the four documents is most relevant to the analyst's search?
1. A
2. B
3. C
4. D
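The exhibit's tables are not reproduced here, but the scoring method is standard TF-IDF: rank each document by the sum, over the query terms, of its term frequency times the term's inverse document frequency. A Python sketch with invented numbers:

```python
# TF-IDF relevance ranking; all idf and tf values below are invented
# stand-ins for the exhibit's Table A and Table B.
query = ["solid", "state", "disk"]
idf = {"solid": 1.2, "state": 0.8, "disk": 1.5}           # Table A (invented)
tf = {                                                     # Table B (invented)
    "A": {"solid": 2, "state": 1, "disk": 0},
    "B": {"solid": 1, "state": 1, "disk": 1},
    "C": {"solid": 0, "state": 3, "disk": 2},
    "D": {"solid": 1, "state": 0, "disk": 3},
}

# Score each document: sum of tf(term, doc) * idf(term) over the query terms.
scores = {doc: sum(tf[doc].get(t, 0) * idf[t] for t in query) for doc in tf}
best = max(scores, key=scores.get)   # highest score = most relevant document
```

With the actual exhibit values, the document with the largest summed tf * idf score is the answer.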




Question : Refer to the exhibit
Click on the calculator icon in the upper left corner. You are going into a meeting where you know
your manager will have a question on your dataset -- specifically relating to customers that are
classified as renters with good credit status.
In order to prepare for the meeting, you create a rule: RENTER => GOOD CREDIT. What is the
confidence of the rule?
1. 63%
2. 41%
3. [option not shown]
4. 73%
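The exhibit's counts are not reproduced here, but the rule confidence is computed the standard way: confidence(RENTER => GOOD CREDIT) is the fraction of renters who also have good credit, i.e. support(renter AND good credit) / support(renter). A Python sketch with invented records:

```python
# Association-rule confidence for RENTER => GOOD CREDIT.
# The records below are invented; substitute the exhibit's counts.
records = [
    {"renter": True,  "good_credit": True},
    {"renter": True,  "good_credit": True},
    {"renter": True,  "good_credit": False},
    {"renter": False, "good_credit": True},
]

renters = [r for r in records if r["renter"]]
both = [r for r in renters if r["good_credit"]]
confidence = len(both) / len(renters)   # renters with good credit / all renters
```

For this toy data the confidence is 2/3; with the exhibit's counts, the same ratio gives the answer.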




Question :

One can work with the naive Bayes model without accepting Bayesian probability.

1. True
2. False