Dell EMC Data Science and BigData Certification Questions and Answers

Question : Which word or phrase completes the statement?
Business Intelligence is to ad-hoc reporting and dashboards as Data Science is to
______________ .

1. Alerts and Queries
2. Structured Data and Data Sources
4. Sales and profit reporting


Explanation: Data science skills are of vital and growing importance in commercial, governmental and not-for-profit organisations. Those in the management, risk, customer and IT functions increasingly need skills and/or
literacy in this area. This course introduces a range of data mining tools and techniques as they are commonly used in business.

Question : What is a property of window functions in SQL commands?

1. They can be used to calculate moving averages over various intervals.
2. They group rows into a single output row.
4. They don't require ordering of data within a window.

Correct Answer : 1

Explanation: In statistics, a moving average (rolling average or running average) is a calculation to analyze data points by creating a series of averages of different subsets of the full data set. It is also called a moving
mean (MM) or rolling mean and is a type of finite impulse response filter. Variations include simple, cumulative, and weighted forms.

Given a series of numbers and a fixed subset size, the first element of the moving average is obtained by taking the average of the initial fixed subset of the number series. Then the subset is modified by "shifting
forward"; that is, excluding the first number of the series and including the next number following the original subset in the series. This creates a new subset of numbers, which is averaged. This process is repeated
over the entire data series. The plot line connecting all the (fixed) averages is the moving average. A moving average is a set of numbers, each of which is the average of the corresponding subset of a larger set of
data points. A moving average may also use unequal weights for each data value in the subset to emphasize particular values in the subset.
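
The "shifting forward" procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original explanation; the function name `simple_moving_average` and the sample series are assumptions chosen for the example.

```python
from collections import deque


def simple_moving_average(values, window):
    """Return the average of each consecutive `window`-sized subset of `values`.

    Hypothetical helper for illustration: it keeps the current subset in a
    deque, drops the oldest value and adds the next one as it moves forward.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    current = deque()
    running_sum = 0.0
    for value in values:
        current.append(value)
        running_sum += value
        if len(current) > window:
            # Shift forward: exclude the first number of the previous subset.
            running_sum -= current.popleft()
        if len(current) == window:
            averages.append(running_sum / window)
    return averages


series = [2, 4, 6, 8, 10]
print(simple_moving_average(series, 3))  # [4.0, 6.0, 8.0]
```

A series of five values with a subset size of 3 yields three averages, one for each position of the window.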

A moving average is commonly used with time series data to smooth out short-term fluctuations and highlight longer-term trends or cycles. The threshold between short-term and long-term depends on the application, and
the parameters of the moving average will be set accordingly. For example, it is often used in technical analysis of financial data, like stock prices, returns or trading volumes. It is also used in economics to
examine gross domestic product, employment or other macroeconomic time series. Mathematically, a moving average is a type of convolution and so it can be viewed as an example of a low-pass filter used in signal
processing. When used with non-time series data, a moving average filters higher frequency components without any specific connection to time, although typically some kind of ordering is implied. Viewed simplistically,
it can be regarded as smoothing the data.
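
To connect this back to the question, here is a sketch of a moving average computed with a SQL window function, run through Python's built-in `sqlite3` module. It assumes the bundled SQLite is version 3.25 or later (the first release with window functions); the table `daily_sales` and its columns are hypothetical.

```python
import sqlite3

# In-memory database with a small, made-up table of daily sales.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)],
)

# AVG(...) OVER (...) averages the current row and up to two preceding rows.
# Unlike GROUP BY, the window function returns one output row per input row.
rows = conn.execute(
    """
    SELECT day,
           amount,
           AVG(amount) OVER (
               ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_avg
    FROM daily_sales
    ORDER BY day
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

Changing the `ROWS BETWEEN` frame changes the interval over which the average is taken, and all four input rows remain in the result rather than being collapsed into one.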

Question : You are attempting to find the Euclidean distance between two centroids:
Centroid A's coordinates: (X = 2, Y = 4)
Centroid B's coordinates: (X = 8, Y = 10)
Which formula finds the correct Euclidean distance?

1. ((2-8)^2 + (4-10)^2) or 72
2. SQRT(((2-8) x 2) + ((4-10) x 2)) or 12.17
4. SQRT((2-8)^2 + (4-10)^2) or 8.49

Correct Answer : 4

Explanation: In mathematics, the Euclidean distance or Euclidean metric is the "ordinary" (i.e., straight-line) distance between two points in Euclidean space. With this distance, Euclidean space becomes a metric space. The
associated norm is called the Euclidean norm. Older literature refers to the metric as the Pythagorean metric.
Very often, especially when measuring the distance in the plane, we use the formula for the Euclidean distance. According to the Euclidean distance formula, the distance between two points in the plane with
coordinates (x, y) and (a, b) is given by

dist((x, y), (a, b)) = sqrt((x - a)^2 + (y - b)^2)
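
Applied to the two centroids in the question, the formula can be checked with a short Python sketch. The function name `euclidean_distance` is an assumption made for this example; the coordinates are the ones given in the question.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


centroid_a = (2, 4)
centroid_b = (8, 10)

# Sum of squared differences: (2-8)^2 + (4-10)^2 = 36 + 36 = 72
squared = (centroid_a[0] - centroid_b[0]) ** 2 + (centroid_a[1] - centroid_b[1]) ** 2
distance = euclidean_distance(centroid_a, centroid_b)

print(squared, round(distance, 2))  # 72 8.49
```

The value 72 is the squared distance; taking the square root gives approximately 8.49.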

Related Questions

Question : Refer to exhibit

You are asked to write a report on how specific variables impact your client's sales using a data
set provided to you by the client. The data includes 15 variables that the client views as directly
related to sales, and you are restricted to these variables only.
After a preliminary analysis of the data, the following findings were made:
1. Multicollinearity is not an issue among the variables
2. Only three variables (A, B, and C) have significant correlation with sales
You build a linear regression model on the dependent variable of sales with the independent
variables of A, B, and C. The results of the regression are seen in the exhibit.
You cannot request additional data. What is a way that you could try to increase the R^2 of the
model without artificially inflating it?

1. Create clusters based on the data and use them as model inputs
2. Force all 15 variables into the model as independent variables
4. Break variables A, B, and C into their own univariate models

Question : You have two tables of customers in your database. Customers in cust_table_ were sent an email
promotion last year, and customers in cust_table_2 received a newsletter last year.
Customers can only be entered in once per table. You want to create a table that includes all
customers, and any of the communications they received last year. Which type of join would you
use for this table?

1. Full outer join
2. Inner join
4. Cross join
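
To see how the join types differ on data shaped like this, here is a rough Python sketch that mimics a full outer join and an inner join on customer ID. The customer IDs, the communication labels, and the helper names `full_outer_join` and `inner_join` are all hypothetical; dictionaries are used because each customer appears at most once per table.

```python
# Hypothetical stand-ins for the two customer tables, keyed by customer ID.
email_promo = {101: "email promotion", 102: "email promotion"}
newsletter = {102: "newsletter", 103: "newsletter"}


def full_outer_join(left, right):
    """Keep every key from either side; a missing side becomes None (like SQL NULL)."""
    keys = sorted(set(left) | set(right))
    return [(key, left.get(key), right.get(key)) for key in keys]


def inner_join(left, right):
    """Keep only keys that appear on both sides."""
    keys = sorted(set(left) & set(right))
    return [(key, left[key], right[key]) for key in keys]


print(full_outer_join(email_promo, newsletter))
print(inner_join(email_promo, newsletter))
```

In this sketch the full outer join returns customers 101, 102, and 103 with `None` where a communication was not received, while the inner join returns only customer 102.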

Question : In which lifecycle stage are initial hypotheses formed?

1. Model planning
2. Discovery
4. Data preparation

Question : You are given , , user profile pages of an online dating site in XML files, and they are
stored in HDFS. You are assigned to divide the users into groups based on the content of their
profiles. You have been instructed to try K-means clustering on this data. How should you
proceed?

1. Divide the data into sets of 1,000 user profiles, and run K-means clustering in RHadoop iteratively.
2. Run MapReduce to transform the data, and find relevant key value pairs.
4. Partition the data by XML file size, and run K-means clustering in each partition.

Question : The Marketing department of your company wishes to track opinion on a new product that was
recently introduced. Marketing would like to know how many positive and negative reviews are
appearing over a given period and potentially retrieve each review for more in-depth insight.
They have identified several popular product review blogs that historically have published
thousands of user reviews of your company's products.

You have been asked to provide the desired analysis. You examine the RSS feeds for each blog
and determine which fields are relevant. You then craft a regular expression to match your new
product's name and extract the relevant text from each matching review.
What is the next step you should take?

1. Use the extracted text and your regular expression to perform a sentiment analysis based on mentions of the new product
2. Convert the extracted text into a suitable document representation and index into a review corpus
4. Group the reviews using Naive Bayesian classification

Question : Which word or phrase completes the statement? A Data Scientist would consider that an RDBMS is
to a Table as R is to a ______________ .

1. List
2. Matrix
4. Array