Dell EMC Data Science and BigData Certification Questions and Answers

Question :

Logistic regression is a model used for prediction of the probability of occurrence of an event.
It makes use of several variables that may be ___________

1. Numerical
2. Categorical
4. Neither 1 nor 2 is correct

Explanation: Logistic regression is a model used for prediction of the probability of occurrence of an event. It makes use of several predictor variables that may be either numerical or categorical.
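
As a minimal sketch of the explanation above, the following shows how one numerical predictor and one categorical predictor can feed the same logistic model. The predictor names (age, plan), the category list, and the coefficient values are hypothetical and chosen only for illustration; in practice the coefficients would be estimated from data.

```python
import math

def one_hot(value, categories):
    """Encode a categorical value as 0/1 indicators.

    The first category is dropped and acts as the baseline level.
    """
    return [1.0 if value == c else 0.0 for c in categories[1:]]

def predict_probability(age, plan, categories, weights, bias):
    """Return P(event) for one observation under a logistic model."""
    # The numerical predictor is used as-is; the categorical one is one-hot encoded.
    features = [age] + one_hot(plan, categories)
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function

# Hypothetical categories and coefficients, for illustration only.
PLANS = ["basic", "standard", "premium"]
WEIGHTS = [0.04, 0.5, 1.2]  # age, plan=standard, plan=premium
BIAS = -2.0

p = predict_probability(35, "premium", PLANS, WEIGHTS, BIAS)
print(f"Estimated probability of the event: {p:.3f}")
```

The output is always a value between 0 and 1, which is why the model is read as a probability of the event occurring.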

Question : Select the correct statements regarding naive Bayes classification

1. It only requires a small amount of training data to estimate the parameters
2. Independence of the variables can be assumed
4. For each class, the entire covariance matrix needs to be determined

1. 1,2,3
2. 2,3,4
4. 2,3,4

Explanation: An advantage of naive Bayes is that it only requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because the variables
are assumed to be independent, only the variances of the variables for each class need to be determined and not the entire covariance matrix.
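
The parameter estimation described above can be sketched as a small Gaussian naive Bayes classifier. This is an illustrative sketch, not a reference implementation: the toy data, the class labels "a" and "b", and the small variance floor of 1e-9 are assumptions made for the example.

```python
import math
from collections import defaultdict

def fit(X, y):
    """Estimate, per class, the prior plus the mean and variance of each feature."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    params = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Only per-feature variances are stored, not a full covariance matrix.
        # The 1e-9 floor (an assumption) avoids division by zero.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        params[label] = (n / len(X), means, variances)
    return params

def predict(params, x):
    """Return the class with the highest log posterior for observation x."""
    def log_posterior(prior, means, variances):
        # Independence assumption: the joint log density is a sum over features.
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(params, key=lambda label: log_posterior(*params[label]))

# Toy data, for illustration only: two features, two classes.
X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 4.2], [3.2, 3.9], [2.9, 4.1]]
y = ["a", "a", "a", "b", "b", "b"]

params = fit(X, y)
print(predict(params, [1.1, 2.0]))
print(predict(params, [3.1, 4.0]))
```

With d features, each class needs d means and d variances. A full covariance matrix would instead require d(d+1)/2 distinct entries per class, which is why the independence assumption reduces the amount of training data needed.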

Question : Spam filtering of the emails is an example of

1. Supervised learning
2. Unsupervised learning
4. 1 and 3 are correct
5. 2 and 3 are correct

Explanation: Clustering is an example of unsupervised learning. The clustering algorithm finds groups within the data without being told what to look for upfront. This contrasts with classification, an example of supervised
machine learning, which is the process of determining to which class an observation belongs. A common application of classification is spam filtering. With spam filtering we use labeled data to train the classifier:
e-mails marked as spam or ham.
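
To make the supervised setup concrete, here is a minimal sketch of training a spam classifier on labeled e-mails. The explanation does not name a specific algorithm, so a multinomial naive Bayes with add-one smoothing is used here purely as an illustration, and the four training messages are made up for the example.

```python
import math
from collections import Counter

def train(messages):
    """Count words per label from (text, label) pairs; label is 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocab

def classify(model, text):
    """Return the label with the highest log posterior for the given text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)  # log prior
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[label][word] + 1)
                                  / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Labeled training data (hypothetical messages, for illustration only).
training = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

model = train(training)
print(classify(model, "free prize"))
print(classify(model, "monday meeting"))
```

The labels in the training list are what make this supervised: a clustering algorithm would receive only the message texts and would have to find groups without being told which ones are spam.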

Related Questions

Question : In which lifecycle stage are appropriate analytical techniques determined?

1. Model planning
2. Model building
4. Discovery

Question : What is Hadoop?

1. Java classes for HDFS types and MapReduce job management and HDFS
2. Java classes for HDFS types and MapReduce job management and the MapReduce paradigm
4. MapReduce paradigm and massive unstructured data storage on commodity hardware

Question : You are using k-means clustering to classify heart patients for a hospital. You have chosen Patient
Sex, Height, Weight, Age and Income as measures and have used 3 clusters. When you create a
pair-wise plot of the clusters, you notice that there is significant overlap between the clusters.
What should you do?

1. Decrease the number of clusters
2. Increase the number of clusters
4. Identify additional measures to add to the analysis

Question : How does Pig's use of a schema differ from that of a traditional RDBMS?

1. Pig's schema requires that the data is physically present when the schema is defined
2. Pig's schema is required for ETL
4. Pig's schema is optional

Question : You are provided with four different datasets. Initial analysis of these datasets shows that they have
identical mean, variance and correlation values. What should your next step in the analysis be?

1. Select one of the four datasets and begin planning and building a model
2. Combine the data from all four of the datasets and begin planning and building a model
4. Visualize the data to further explore the characteristics of each data set

Question : You are asked to create a model to predict the total number of monthly subscribers for a specific
magazine. You are provided with 1 year's worth of subscription and payment data, user
demographic data, and 10 years' worth of content of the magazine (articles and pictures). Which
algorithm is the most appropriate for building a predictive model for subscribers?

1. TF-IDF
2. Linear regression
4. Decision trees