Cloudera Databricks Data Science Certification Questions and Answers (Dumps and Practice Questions)

Question : Suppose you have the following two cases for movie ratings:
1. You recommend a movie to a user with a predicted rating of four stars, but he really doesn't like it and would rate it two stars.
2. You recommend a movie with a predicted rating of three stars, but the user loves it and would rate it five stars.
Which statement correctly applies?

1. In both cases, the contribution to the RMSE is the same
2. In both cases, the contribution to the RMSE is different
3. In both cases, the contribution to the RMSE could vary
4. None of the above

Correct Answer : 1

Explanation:
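Both predictions are off by two stars, one too high and one too low. Assuming RMSE is computed in the usual way (the square root of the mean of the squared errors), the sign of the error disappears when it is squared, so each case adds the same term. A minimal Python sketch of that arithmetic (the function name squared_error is illustrative only):

```python
def squared_error(predicted, actual):
    """Squared error of one prediction: the term that RMSE averages."""
    return (predicted - actual) ** 2

# Case 1: predicted 4 stars, the user would rate it 2 stars (over-prediction).
case_1 = squared_error(4, 2)

# Case 2: predicted 3 stars, the user would rate it 5 stars (under-prediction).
case_2 = squared_error(3, 5)

print(case_1, case_2)  # 4 4
```

RMSE therefore treats a disappointing recommendation and a missed favorite as equally bad, even though the two may matter differently to the user.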

Question : The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) is a frequently used measure of the
differences between values predicted by a model or an estimator and the values actually observed.
RMSD is a useful metric for evaluating which types of models?

1. Logistic regression
2. Naive Bayes classifier
3. Linear regression
4. All of the above

Correct Answer : 3

Error calculation allows you to see how well a machine learning method is performing.
One way of determining this performance is to calculate a numerical error. This number is sometimes a percentage,
but it can also be a score or a distance. The goal is usually to minimize an error percentage or distance;
however, the goal may be to minimize or maximize a score. Encog supports the following error calculation methods.

Sum of Squares Error (ESS)
Root Mean Square Error (RMS)
Mean Square Error (MSE) (default)
SOM Error (Euclidean Distance Error)
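
As a rough illustration of how the first three measures relate, the sketch below uses the textbook formulas in plain Python. It is not Encog's implementation, which may scale or name these quantities differently; the function names and sample data are made up for the example.

```python
import math

def sum_of_squares_error(predicted, actual):
    """Sum of the squared differences between predictions and observations."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))

def mean_square_error(predicted, actual):
    """Sum of squares divided by the number of observations."""
    return sum_of_squares_error(predicted, actual) / len(actual)

def root_mean_square_error(predicted, actual):
    """Square root of the mean square error, in the units of the data."""
    return math.sqrt(mean_square_error(predicted, actual))

# Hypothetical predicted and observed star ratings.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [2.0, 5.0, 5.0, 1.0]

sse = sum_of_squares_error(predicted, actual)
mse = mean_square_error(predicted, actual)
rmse = root_mean_square_error(predicted, actual)
print(sse, mse, rmse)  # 9.0 2.25 1.5
```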

RMSE measures the error of a predicted numeric value, so it applies to contexts like regression and some recommender system techniques,
which rely on predicting a numeric value. It is not relevant to classification techniques
like logistic regression and Naive Bayes, which predict categorical values.
It is also not relevant to unsupervised techniques like clustering. RMSE is a good fit for linear regression and recommendation systems.
The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) is a frequently used measure of the
differences between values predicted by a model or an estimator and the values actually observed. Basically,
the RMSD represents the sample standard deviation of the differences between predicted values and observed values.
These individual differences are called residuals when the calculations are performed over the data sample that was used for estimation,
and are called prediction errors when computed out-of-sample. The RMSD serves to aggregate the magnitudes
of the errors in predictions for various times into a single measure of predictive power. RMSD is a good measure of accuracy,
but only to compare forecasting errors of different models for a particular variable and not between variables, as it is scale-dependent.
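
The scale dependence can be seen with a small sketch: the same hypothetical forecasts expressed in metres and in centimetres give RMSE values that differ by a factor of 100, although the model is equally good in both cases. The data and the helper name rmse are assumptions made for illustration.

```python
import math

def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical forecasts and observations, measured in metres.
predicted_m = [1.0, 2.0, 3.0]
observed_m = [1.5, 2.0, 2.0]

# The same values converted to centimetres.
predicted_cm = [100 * v for v in predicted_m]
observed_cm = [100 * v for v in observed_m]

rmse_m = rmse(predicted_m, observed_m)
rmse_cm = rmse(predicted_cm, observed_cm)

# The error grows with the unit of measurement, not with model quality.
print(rmse_m, rmse_cm)
```

This is why RMSE values are only comparable between models that predict the same variable in the same units.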

Question : Select the correct statement which applies to logistic regression

1. Computationally inexpensive, easy to implement, knowledge representation easy to interpret
2. May have low accuracy
3. Works with Numeric values
4. Only 1 and 3 are correct
5. All 1,2 and 3 are correct

Correct Answer : 5

Logistic regression
Pros: Computationally inexpensive, easy to implement, knowledge representation easy to interpret
Cons: Prone to underfitting, may have low accuracy
Works with: Numeric values, nominal values
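
To illustrate the "easy to implement" and "easy to interpret" points, here is a minimal sketch of logistic regression trained with stochastic gradient descent in plain Python. The data set (hours studied versus pass/fail), the function names, and the hyperparameters are all assumptions chosen for the example, not a reference implementation.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(features, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            score = bias + sum(w * xi for w, xi in zip(weights, x))
            error = sigmoid(score) - y  # gradient of the log loss w.r.t. the score
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

def predict(weights, bias, x):
    """Return class 1 if the predicted probability is at least 0.5, else 0."""
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if sigmoid(score) >= 0.5 else 0

# Hypothetical data: hours studied (numeric feature) and pass (1) / fail (0).
features = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(features, labels)
predictions = [predict(weights, bias, x) for x in features]
print(weights, bias, predictions)
```

The learned weight can be read directly: a positive weight means that more hours studied raises the predicted probability of passing. The model draws a single linear boundary, which is one reason it can underfit more complex data.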

Related Questions

Question : PCA analyzes all the variance in the variables and reorganizes it into a new set of
components equal in number to the original variables. Regarding these new variables, which of the following statements are correct?

1. They are independent
2. They decrease in the amount of variance in the originals they account for: the first component captures the most variance, the second the second most, and so on until all the variance is accounted for
3. [option text not available in the source]
4. Only 1 and 3
5. All 1,2 and 3

Question : PCA is a parametric method of extracting relevant information from confusing data sets.

1. True
2. False

Question : In supervised learning, you have performed the following steps:
1. Determine the type of training examples.
2. Gather a training set. The training set needs to be representative of the real-world use of the function.
3. [step text not available in the source]
4. Determine the structure of the learned function and the corresponding learning algorithm.
5. Complete the design. Run the learning algorithm on the gathered training set.
6. Evaluate the accuracy of the learned function.

In the 4th step, which of the following algorithms can you apply?

1. Support Vector Machine
2. Decision trees
3. [option text not available in the source]
4. 1 and 2 only
5. All 1,2 and 3

Question : Which of the following is/are supervised learning algorithms?

1. Logistic regression
2. Naive Bayes classifier
3. [option text not available in the source]
4. Only 1 and 2
5. All 1,2 and 3

Question : Which of the following are unsupervised learning algorithms?

1. K-means algorithm
2. k-nearest neighbor
3. [option text not available in the source]
4. Hierarchical clustering
5. Logistic regression

1. 1,2,3,4
2. 1,3,4,5
3. [option text not available in the source]
4. 1,3,5
5. 2,3,4

Question : Which of the following algorithms is a supervised learning algorithm?

1. PCA
2. SVD
3. [option text not available in the source]
4. Logistic regression
5. None of the above