Cloudera Databricks Data Science Certification Questions and Answers (Dumps and Practice Questions)

Question : Of all the smokers in a particular district, % prefer brand A and % prefer brand B.
Of those smokers who prefer brand A, 30% are female, and of those who prefer brand B, 40% are female.
What is the probability that a randomly selected smoker prefers brand A, given that the person selected is a female?

Which of the following is the best way to solve this problem?

1. Bayes' Theorem
2. Poisson Distribution
4. None of the above

Correct Answer : 1

Explanation:
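The question asks for P(A | female) when P(female | A) and P(female | B) are known, which is the situation Bayes' theorem handles. Below is a minimal Python sketch of that calculation. The brand preference percentages are missing from the question text above, so they are left as the parameters p_a and p_b; the values 0.5 and 0.5 in the usage line are hypothetical placeholders, not the question's actual figures, and the function name posterior_brand_a is only illustrative.

```python
def posterior_brand_a(p_a, p_b, p_female_given_a=0.30, p_female_given_b=0.40):
    """Return P(A | female) via Bayes' theorem.

    p_a, p_b: prior probabilities that a smoker prefers brand A or brand B
              (not given in the question text, so supplied by the caller).
    """
    # Law of total probability: chance that a randomly selected smoker is female
    p_female = p_a * p_female_given_a + p_b * p_female_given_b
    # Bayes' theorem: P(A | female) = P(female | A) * P(A) / P(female)
    return p_a * p_female_given_a / p_female


# Hypothetical priors, used only to show the call
example = posterior_brand_a(0.5, 0.5)
print(example)
```

Swapping in the real brand percentages for p_a and p_b would give the probability the question asks for.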

Question : Google AdWords studies the number of men and women who click an advertisement on its search engine
during the midnight hour each day. Google finds that the number of men who click can be modeled as a
random variable with distribution Poisson(X), and likewise the number of women who click as Poisson(Y).

What is likely to be the best model of the total number of advertisement clicks during the midnight hour?

1. Binomial(X+Y,X+Y)
2. Poisson(X/Y)
4. Poisson(X+Y)

Correct Answer : 4

Explanation: In probability theory and statistics, the Poisson distribution, named after French mathematician Siméon Denis Poisson, is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.[1] The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.

For instance, an individual keeping track of the amount of mail they receive each day may notice that they receive an average of 4 letters per day. If receiving any particular piece of mail doesn't affect the arrival times of future pieces of mail, i.e., if pieces of mail from a wide range of sources arrive independently of one another, then a reasonable assumption is that the number of pieces of mail received per day obeys a Poisson distribution.[2] Other examples that may follow a Poisson distribution: the number of phone calls received by a call center per hour, the number of decay events per second from a radioactive source, or the number of taxis passing a particular street corner per hour.

In this question, the total number of clicks is the sum of the men's clicks, distributed as Poisson(X), and the women's clicks, distributed as Poisson(Y). The sum of two independent Poisson random variables also follows a Poisson distribution, with rate equal to the sum of their rates, so the total is Poisson(X+Y). The Normal and Binomial distributions can approximate the Poisson distribution in certain cases, but the other answer options do not approximate Poisson(X+Y).
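The claim that two independent Poisson counts add up to another Poisson count can be checked numerically. The sketch below, using only the Python standard library, convolves the two probability mass functions and compares the result with the Poisson pmf at the summed rate. The rates 3.0 and 5.0 are hypothetical stand-ins for X and Y, and the helper names are illustrative.

```python
import math


def poisson_pmf(k, rate):
    """P(N = k) for N ~ Poisson(rate)."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def sum_pmf(n, rate_x, rate_y):
    """P(M + W = n) for independent M ~ Poisson(rate_x) and W ~ Poisson(rate_y),
    computed by summing over every way to split n clicks between the two groups."""
    return sum(
        poisson_pmf(k, rate_x) * poisson_pmf(n - k, rate_y) for k in range(n + 1)
    )


# Hypothetical rates standing in for X and Y
rate_x, rate_y = 3.0, 5.0

# Largest gap between the convolution and Poisson(rate_x + rate_y) over n = 0..29
max_diff = max(
    abs(sum_pmf(n, rate_x, rate_y) - poisson_pmf(n, rate_x + rate_y))
    for n in range(30)
)
print(max_diff)
```

The printed gap should be at the level of floating-point rounding, which is consistent with the total being Poisson(X+Y). The check relies on the two counts being independent.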

Question :
There are 5000 colored balls, of which 1200 are pink.
What is the maximum likelihood estimate for the proportion of "pink" items in the test set of colored balls?

1. 2.4
2. 24
4. .48
5. 4.8

Correct Answer :

Explanation: Given no additional information, the MLE for the probability of an item in the test set is exactly its frequency in the training set. Applied to this question, the estimate is the observed proportion of pink balls: 1200/5000 = 0.24. The method of maximum likelihood corresponds to many well-known estimation methods in statistics. For example, one may be interested in the heights of adult female penguins, but be unable to measure the height of every single penguin in a population due to cost or time constraints. Assuming that the heights are normally (Gaussian) distributed with some unknown mean and variance, the mean and variance can be estimated with MLE while only knowing the heights of some sample of the overall population. MLE would accomplish this by taking the mean and variance as parameters and finding particular parametric values that make the observed results the most probable (given the model).
In general, for a fixed set of data and underlying statistical model, the method of maximum likelihood selects the set of values of the model parameters that maximizes the likelihood function. Intuitively, this maximizes the "agreement" of the selected model with the observed data, and for discrete random variables it indeed maximizes the probability of the observed data under the resulting distribution. Maximum-likelihood estimation gives a unified approach to estimation, which is well-defined in the case of the normal distribution and many other problems. However, in some complicated problems, difficulties do occur: in such problems, maximum-likelihood estimators are unsuitable or do not exist.
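As a rough illustration of "selecting the parameter value that maximizes the likelihood function", the Python sketch below treats each ball as an independent pink / not-pink draw (a Bernoulli model, which is an assumption of this sketch) and compares the closed-form estimate, the sample frequency, with a brute-force search over candidate proportions. The grid resolution and function name are illustrative choices.

```python
import math


def bernoulli_log_likelihood(p, successes, trials):
    """Log-likelihood of observing `successes` pink balls in `trials` draws
    when each draw is pink with probability p (constant terms omitted)."""
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)


pink, total = 1200, 5000

# Closed-form MLE for a proportion: the observed frequency
closed_form = pink / total

# Brute-force check: evaluate the log-likelihood on a grid of candidate values
grid = [i / 1000 for i in range(1, 1000)]
grid_best = max(grid, key=lambda p: bernoulli_log_likelihood(p, pink, total))

print(closed_form, grid_best)
```

Both approaches should land on the same proportion, since the log-likelihood has a single peak at the observed frequency.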

Related Questions

Question : In which of the following scenarios can you use regression to predict values?

1. Samsung can use it for mobile sales forecast
2. Mobile companies can use it to forecast manufacturing defects
3. Probability of a celebrity divorce
4. Only 1 and 2
5. All 1 , 2 and 3

Question : RMSE is a good measure of accuracy, but only to compare forecasting errors of different models for a ______, as it is scale-dependent

1. Between Variables
2. Particular Variable
3. Among all the variables
4. All of the above are correct
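What "scale-dependent" means for RMSE can be seen with a small Python sketch: the same forecasts expressed in different units give different RMSE values, so the raw numbers are only comparable when they describe the same quantity on the same scale. The weights below are made-up illustration data, not taken from any real model.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up observations and forecasts, in kilograms
actual_kg = [1.0, 2.0, 3.0]
predicted_kg = [1.5, 2.0, 2.5]

# The very same values converted to grams
actual_g = [1000 * v for v in actual_kg]
predicted_g = [1000 * v for v in predicted_kg]

rmse_kg = rmse(actual_kg, predicted_kg)
rmse_g = rmse(actual_g, predicted_g)
print(rmse_kg, rmse_g)
```

The forecasts are equally good in both cases, yet the RMSE in grams is 1000 times the RMSE in kilograms.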

Question : You are creating a classification process where the input is the income, education and
current debt of a customer. What could be the possible output of this process?

1. Probability of the customer defaulting on loan repayment
2. Percentage of the customer's loan repayment capability
3. Percentage indicating whether the customer should be given a loan or not
4. The output might be a risk class, such as "good", "acceptable", "average", or "unacceptable".
5. All of the above

Question : Suppose you have the following two cases for movie ratings:
1. You recommend a movie to a user with a predicted rating of four stars, but he really doesn't like it and would rate it two stars.
2. You recommend a movie with a predicted rating of three stars, but the user loves it and would rate it five stars.
Which statement correctly applies?

1. In both cases, the contribution to the RMSE is the same
2. In both cases, the contribution to the RMSE is different
3. In both cases, the contribution to the RMSE could vary
4. None of the above
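The arithmetic behind this question can be sketched in a few lines of Python. RMSE is built from squared differences between predicted and actual values, so each case contributes its squared error to the sum under the square root. The function name squared_error is illustrative.

```python
def squared_error(predicted, actual):
    """Contribution of a single prediction to the sum inside the RMSE."""
    return (predicted - actual) ** 2


# Case 1: predicted four stars, the user would rate it two stars
case_1 = squared_error(4, 2)

# Case 2: predicted three stars, the user would rate it five stars
case_2 = squared_error(3, 5)

print(case_1, case_2)
```

Because the difference is squared, overestimating by two stars and underestimating by two stars are penalized identically.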

Question : The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) is a frequently used measure of the
differences between values predicted by a model or an estimator and the values actually observed.
RMSD is a useful metric for evaluating which types of models?

1. Logistic regression
2. Naive Bayes classifier
3. Linear regression
4. All of the above

Question : Select the correct statements that apply to logistic regression

1. Computationally inexpensive, easy to implement, knowledge representation easy to interpret
2. May have low accuracy
3. Works with numeric values
4. Only 1 and 3 are correct
5. All 1,2 and 3 are correct