Correct Answer : Explanation: You can create histograms with the function hist(x), where x is a numeric vector of values to be plotted. The option freq=FALSE plots probability densities instead of frequencies, and the option breaks= controls the number of bins. Histograms are used very often in public health to show the distributions of your independent and dependent variables. Although the basic command for histograms in R, hist(), is simple, getting your histogram to look exactly the way you want takes getting to know a few of the plot's options. Here I present ways to customize your histogram for your needs.
First, I want to point out that ggplot2 is an R package that does some amazing graphics, including histograms. I will do a post on ggplot2 in the coming year. However, the hist() function in base R is really easy and fast, and does the job for most of your histogram needs. If you want to do complicated histograms, though, I would recommend reading up on ggplot2.
Okay, so for our purposes today, instead of importing data, I'll create some normally distributed data myself. In R, you can generate normal data using the rnorm() function:
BMI <- rnorm(n = 1000, mean = 24.2, sd = 2.2)  # 1000 draws from a normal with mean 24.2 and SD 2.2
So now we have some BMI data, and the basic histogram plot that comes out of R looks like this:
hist(BMI)
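To put the two options mentioned above into practice, here is a minimal sketch (using the simulated BMI vector from above) that plots densities instead of counts and asks for roughly 20 bins; the title and axis label are just illustrative choices:

hist(BMI, freq = FALSE, breaks = 20,   # densities instead of counts, ~20 bins
     main = "Histogram of BMI", xlab = "BMI")
lines(density(BMI))                    # overlay a smoothed density curve

Note that breaks= is treated as a suggestion, so the exact number of bins R draws may differ slightly from what you ask for.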
Question : Review the following code:
SELECT pn, vn, SUM(prc*qty) FROM sale GROUP BY CUBE(pn, vn) ORDER BY 1, 2, 3;
Which combinations of subtotals do you expect the query to return?
Correct Answer : Explanation: Queries that use the ROLLUP and CUBE operators generate some of the same result sets and perform some of the same calculations as OLAP applications. The CUBE operator generates a result set that can be used for cross-tabulation reports, and a ROLLUP operation can calculate the equivalent of an OLAP dimension or hierarchy. In addition to the subtotals generated by the ROLLUP extension, the CUBE extension generates subtotals for all combinations of the dimensions specified. If n is the number of columns listed in the CUBE, there will be 2^n subtotal combinations.
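To make the 2^n expansion concrete for the query in the question (a sketch, assuming the sale table from above), CUBE(pn, vn) is equivalent to spelling out all 2^2 = 4 grouping sets by hand:

-- produces the same result as GROUP BY CUBE(pn, vn)
SELECT pn, vn, SUM(prc*qty)
FROM sale
GROUP BY GROUPING SETS ((pn, vn), (pn), (vn), ())
ORDER BY 1, 2, 3;

The four grouping sets yield subtotals by (pn, vn), by pn alone, by vn alone, and a single grand-total row.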
SELECT fact_1_id, fact_2_id, SUM(sales_value) AS sales_value FROM dimension_tab GROUP BY CUBE (fact_1_id, fact_2_id) ORDER BY fact_1_id, fact_2_id;
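In the subtotal rows of a CUBE result, the collapsed columns come back as NULL, which is ambiguous when the underlying data itself contains NULLs. A common refinement (a sketch against the same dimension_tab) is the standard GROUPING() function, which returns 1 when a column is NULL because it was aggregated away:

SELECT fact_1_id,
       fact_2_id,
       GROUPING(fact_1_id) AS agg_fact_1,  -- 1 in rows where fact_1_id was rolled up
       GROUPING(fact_2_id) AS agg_fact_2,  -- 1 in rows where fact_2_id was rolled up
       SUM(sales_value) AS sales_value
FROM dimension_tab
GROUP BY CUBE (fact_1_id, fact_2_id)
ORDER BY fact_1_id, fact_2_id;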
Correct Answer : Explanation: MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical, and machine learning methods for structured and unstructured data.
The MADlib mission: to foster widespread development of scalable analytic skills by harnessing efforts from commercial practice, academic research, and open-source development. There are a number of data analytics solutions that support the MapReduce principle and are able to work with NoSQL databases. However, most enterprises still rely on mature SQL data stores and therefore need traditional analytics solutions to provide in-depth analysis of their business-critical data.
MADlib is a scalable in-database analytics library that features sophisticated mathematical algorithms for SQL-based systems. MADlib was developed jointly by researchers from UC Berkeley and engineers from Pivotal (formerly EMC/Greenplum). It can be considered an enterprise alternative to Hadoop for machine learning, data mining, and statistics tasks. In addition, MADlib supports time series data, which Hadoop could not process appropriately, greatly extending its capabilities for building prediction systems.

Like Analyst First, MADlib benefits from the support of the good folk at EMC-Greenplum. In particular, EMC-Greenplum's Australian team is responsible for developing the support vector machines, Latent Dirichlet Allocation (a technique for doing topic discovery in unstructured text), and sparse vectors modules.

The "MAD Skills" paper behind the library takes on "data warehousing" and "business intelligence" as outmoded, low-tech approaches to getting value out of Big Data. Instead, it advocates a "Magnetic, Agile, Deep" (MAD) approach to data that shifts the locus of power from what Brian Dolan calls the "DBA priesthood" to the statisticians and analysts who actually like to crunch the numbers. This is a good thing, on many fronts. The paper describes a state-of-the-art parallel data warehouse that sits on 800TB of disk, using 40 dual-processor dual-core Sun Thumper boxes, and it presents a set of general-purpose, hardcore, massively parallel statistical methods for big data. They're expressed in SQL (OMG!) but could be easily translated to MapReduce if that's your bag.
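To give a feel for what in-database analytics looks like in practice, here is a minimal sketch of training a linear regression model with MADlib's linregr_train(); it assumes MADlib is installed in a schema named madlib and uses a hypothetical houses table with price, tax, bath, and size columns:

-- train: writes the model (coefficients, r2, etc.) into houses_linregr
SELECT madlib.linregr_train(
    'houses',                       -- source table (hypothetical)
    'houses_linregr',               -- output model table
    'price',                        -- dependent variable
    'ARRAY[1, tax, bath, size]'     -- independent variables; the 1 is the intercept term
);

-- inspect the fitted coefficients and goodness of fit
SELECT coef, r2 FROM houses_linregr;

Everything happens inside the database: the training data never leaves the SQL store, which is exactly the "move the analytics to the data" idea described above.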