Question : You are studying the behavior of a population, and you are provided with multidimensional data at the individual level. You have identified four specific individuals who are valuable to your study, and you would like to find all users who are most similar to each individual. Which algorithm is most appropriate for this study? 1. Association rules 2. Decision trees 3. (option text not available in the source) 4. K-means clustering
Explanation: The correct answer is option 4, k-means clustering. k-means uses an iterative algorithm that minimizes the sum of distances from each object to its cluster centroid, over all clusters. The algorithm moves objects between clusters until this sum cannot be decreased further, yielding a set of clusters that are as compact and well separated as possible. You can control the details of the minimization through several optional input parameters to k-means, including the initial values of the cluster centroids and the maximum number of iterations. Clustering is primarily an exploratory technique for discovering hidden structure in the data, often as a prelude to more focused analysis or decision processes. Specific applications of k-means include image processing, medicine, and customer segmentation. Clustering is also often used as a lead-in to classification: once the clusters are identified, labels can be applied to each cluster to classify each group based on its characteristics. Marketing and sales groups use k-means to identify customers with similar behaviors and spending patterns.
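The assign-then-update loop described above can be sketched in a few lines of plain Python. This is an invented minimal illustration, not the MADlib or any library implementation; it takes explicit initial centroids, mirroring the optional initial-centroid parameter mentioned in the explanation, so you could seed one cluster at each individual of interest and read off the most similar users from the cluster assignments:

```python
def kmeans(points, centroids, iters=20):
    """Minimal k-means sketch: repeatedly assign each point to its
    nearest centroid, then move each centroid to the mean of its
    assigned points, until the assignments stabilize."""
    centroids = list(centroids)
    k = len(centroids)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid (squared distance).
        labels = [min(range(k),
                      key=lambda j: sum((p - c) ** 2
                                        for p, c in zip(pt, centroids[j])))
                  for pt in points]
        # Update step: recompute each centroid as the mean of its cluster.
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return labels, centroids

# Toy data: two well-separated groups; seed one centroid in each.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
        (5.0, 5.0), (5.1, 4.9), (4.9, 5.1)]
labels, centroids = kmeans(data, centroids=[(0.0, 0.0), (5.0, 5.0)])
```

Here every point in the first group gets label 0 and every point in the second gets label 1; with real study data, the "users most similar to each individual" are simply the points sharing that individual's cluster label.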
Question : You are using MADlib for Linear Regression analysis. Which value does the statement return? SELECT (linregr(depvar, indepvar)).r2 FROM zeta1;
Correct Answer : The R-squared value (coefficient of determination) of the fitted linear model.
Explanation: Ordinary least-squares (OLS) linear regression refers to a stochastic model in which the conditional mean of the dependent variable (usually denoted y) is an affine function of the vector of independent variables (usually denoted x): E[y | x] = c^T x for some unknown vector of coefficients c. The assumption is that the residuals are i.i.d. Gaussian; that is, the conditional distribution of y given x is normal with mean c^T x and constant variance. OLS linear regression finds the vector of coefficients c that maximizes the likelihood of the observations. Ordinary Least Squares Regression, also called Linear Regression, is a statistical model used to fit linear models: it models a linear relationship between a scalar dependent variable and one or more explanatory independent variables to build a model of coefficients.
Training function : linregr_train(source_table, out_table, dependent_varname, independent_varname, input_group_cols := NULL, heteroskedasticity_option := NULL)
source_table : Text value. The name of the table containing the training data.
out_table : Text value. Name of the generated table containing the output model.
dependent_varname : Text value. Expression to evaluate for the dependent variable.
independent_varname : Text value. Expression list to evaluate for the independent variables. An intercept variable is not assumed; it is common to provide an explicit intercept term by including a single constant 1 term in the independent variable list.
input_group_cols : Text value. An expression list used to group the input dataset into discrete groups, running one regression per group (similar to the SQL GROUP BY clause). When this value is NULL, no grouping is used and a single result model is generated. Default value: NULL.
heteroskedasticity_option : Boolean value. When True, the heteroskedasticity of the model is also calculated and returned with the results. Default value: False.
Output table : The output table produced by the linear regression training function contains the following columns (plus any grouping columns provided during training, present only if the grouping option is used).
coef : Float array. Vector of the coefficients of the regression.
r2 : Float. R-squared coefficient of determination of the model.
std_err : Float array. Vector of the standard errors of the coefficients.
t_stats : Float array. Vector of the t-statistics of the coefficients.
p_values : Float array. Vector of the p-values of the coefficients.
condition_no : Float. The condition number of the design matrix. A high condition number is usually an indication of numeric instability in the result, yielding a less reliable model. It often arises when there is a significant amount of collinearity in the underlying design matrix, in which case other regression techniques, such as elastic net regression, may be more appropriate.
bp_stats : Float. The Breusch-Pagan statistic of heteroskedasticity. Present only if the heteroskedasticity argument was set to True when the model was trained.
bp_p_value : Float. The Breusch-Pagan calculated p-value. Present only if the heteroskedasticity parameter was set to True when the model was trained.
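To make concrete what the r2 column reports, here is a minimal pure-Python sketch of one-variable OLS with an intercept (the function name and toy data are invented for illustration; this is not MADlib code). R-squared is one minus the ratio of the residual sum of squares to the total sum of squares, so a perfect linear fit gives exactly 1:

```python
def ols_fit(x, y):
    """Fit y ~ b0 + b1*x by ordinary least squares and report R-squared."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Slope: covariance of x and y divided by variance of x.
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    b0 = my - b1 * mx
    # R-squared = 1 - SS_residual / SS_total.
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return b0, b1, 1 - ss_res / ss_tot

# Perfectly linear toy data (y = 1 + 2x), so R-squared comes out 1.
b0, b1, r2 = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
```

The query in the question, SELECT (linregr(depvar, indepvar)).r2 FROM zeta1;, returns this same goodness-of-fit statistic for the model fitted over the rows of zeta1.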
Question : Which of the following statements are correct? 1. Bayesian probability and Bayes' rule give us a way to estimate unknown probabilities from known values. 2. You can reduce the need for a lot of data by assuming conditional independence among the features in your data. 3. (option text not available in the source) 4. Only 1 and 2 5. All of 1, 2, and 3 are correct
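Statement 2 above is the heart of Naive Bayes: assuming the features are conditionally independent given the class lets each feature's conditional probability be estimated from its own counts separately, rather than requiring enough data to estimate the full joint distribution. A minimal sketch (the function names and toy weather data are invented for illustration):

```python
from collections import defaultdict

def train_nb(samples):
    """samples: list of (features_tuple, label).
    Collect per-class counts and per-(class, feature, value) counts.
    The 'naive' conditional-independence assumption means these
    per-feature counts are all the model needs."""
    class_counts = defaultdict(int)
    feat_counts = defaultdict(int)  # (class, feature_index, value) -> count
    for feats, label in samples:
        class_counts[label] += 1
        for i, v in enumerate(feats):
            feat_counts[(label, i, v)] += 1
    return class_counts, feat_counts

def predict(class_counts, feat_counts, feats):
    """Pick the class maximizing P(class) * product of P(feature | class)."""
    total = sum(class_counts.values())
    best, best_p = None, -1.0
    for c, cc in class_counts.items():
        p = cc / total
        for i, v in enumerate(feats):
            # Laplace smoothing so unseen values don't zero out the product.
            p *= (feat_counts[(c, i, v)] + 1) / (cc + 2)
        if p > best_p:
            best, best_p = c, p
    return best

data = [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
        (("rainy", "mild"), "yes"), (("rainy", "cool"), "yes")]
cc, fc = train_nb(data)
label = predict(cc, fc, ("rainy", "mild"))
```

With two features of v values each, the independence assumption needs roughly 2v conditional estimates per class instead of v squared joint ones, which is why far less training data suffices.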