Dell EMC Data Science and BigData Certification Questions and Answers

Question : A data scientist wants to predict the probability of death from heart disease based on three risk
factors: age, gender, and blood cholesterol level.
What is the most appropriate method for this project?

1. Linear regression
2. K-means clustering
3. Logistic regression
4. Apriori algorithm

Correct Answer : Logistic regression
Explanation: The outcome to be predicted (death from heart disease: yes or no) is binary, and logistic regression models the probability of a binary outcome as a function of predictors such as age, gender, and blood cholesterol level, always returning a value between 0 and 1. Linear regression is meant for continuous outcomes and its predictions are not bounded to that range, while k-means clustering and the Apriori algorithm are unsupervised techniques for grouping records and finding association rules, not for predicting an outcome.

Logistic regression is used widely in many fields, including the medical and social sciences. For example, the Trauma and Injury Severity Score (TRISS), which is widely used to predict mortality in injured patients, was originally developed by Boyd et al. using logistic regression. Many other medical scales used to assess the severity of a patient's condition have been developed using logistic regression. Logistic regression may be used to predict whether a patient has a given disease (e.g. diabetes or coronary heart disease) based on observed characteristics of the patient, such as age, sex, body mass index, and the results of various blood tests; for coronary heart disease, predictors have included age, blood cholesterol level, systolic blood pressure, relative weight, blood hemoglobin level, smoking (at three levels), and abnormal electrocardiogram.

Another example might be to predict whether an American voter will vote Democratic or Republican, based on age, income, sex, race, state of residence, votes in previous elections, etc. The technique can also be used in engineering, especially for predicting the probability of failure of a given process, system or product. It is also used in marketing applications, such as predicting a customer's propensity to purchase a product or cancel a subscription. In economics, it can be used to predict the likelihood of a person choosing to be in the labor force, and a business application would be to predict the likelihood of a homeowner defaulting on a mortgage. Conditional random fields, an extension of logistic regression to sequential data, are used in natural language processing.
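A minimal sketch of how logistic regression turns the three risk factors into a probability, fitted here by plain batch gradient descent in pure Python. The patient rows, the feature coding (age and cholesterol as already-standardized values, gender as 0/1), the learning rate, and all function names are hypothetical and chosen only for illustration; a real project would use a tested statistics or machine learning library.

```python
import math

def sigmoid(z):
    # Logistic function: maps any real number into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    # P(death = 1 | x) = sigmoid(b0 + b1*age + b2*gender + b3*cholesterol)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def log_loss(weights, bias, X, y):
    # Average negative log-likelihood of the 0/1 labels under the model.
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(weights, bias, xi)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    # Batch gradient descent on the log loss.
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(weights, bias, xi) - yi
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical patients: [standardized age, gender (0/1), standardized cholesterol]
X = [[-1.0, 0, -0.5], [-0.5, 1, 0.0], [0.0, 0, 0.5], [0.5, 1, 1.0],
     [1.0, 0, 1.5], [1.5, 1, 0.5], [-1.5, 1, -1.0], [0.8, 0, 0.2]]
y = [0, 0, 0, 1, 1, 1, 0, 1]  # 1 = died of heart disease (hypothetical labels)

weights, bias = fit_logistic(X, y)
# Estimated probability for a new (hypothetical) older patient with high cholesterol
print(predict_proba(weights, bias, [1.2, 1, 1.0]))
```

Each fitted weight is on the log-odds scale: exp(weight) is the multiplicative change in the odds of death for a one-unit increase in that factor, holding the other factors fixed.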

Question : What are the characteristics of Big Data?

1. Data type, processing complexity, and data structure variety.
2. Data volume, business importance, and data structure variety.
4. Data volume, processing complexity, and business importance

Explanation: Three attributes stand out as defining Big Data characteristics:
Huge volume of data: Rather than thousands or millions of rows, Big Data can be billions of rows and millions of columns.
Complexity of data types and structures: Big Data reflects the variety of new data sources, formats, and structures, including digital traces left on the web and in other digital repositories for subsequent analysis.
Speed of new data creation and growth: Big Data can describe high-velocity data, with rapid data ingestion and near-real-time analysis.

Question : You are analyzing data in order to build a classifier model. You discover non-linear data and
discontinuities that will affect the model. Which analytical method would you recommend?

1. Logistic Regression
2. Decision Trees
4. ARIMA

Correct Answer : Decision Trees

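Explanation: A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (the decision taken after computing all attributes). The paths from root to leaf represent classification rules. Because each test splits the data on a threshold rather than fitting a single linear function, a tree makes no linearity assumption and can model non-linear relationships and abrupt discontinuities, which makes it a better fit for this data than logistic regression, whose decision boundary is linear in the inputs.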
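A small sketch of the idea above: a classification tree grown greedily by choosing, at each node, the threshold split that most reduces Gini impurity. Gini and the midpoint thresholds are a CART-style choice assumed here; the explanation does not name a specific algorithm. The one-feature data is hypothetical: the label switches from A to B and back at two sharp boundaries, a pattern no single linear decision boundary can capture but two threshold tests can.

```python
from collections import Counter

def gini(labels):
    # Gini impurity: 0 for a pure node, larger when classes are mixed.
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(X, y):
    # Try a threshold halfway between each pair of adjacent distinct values
    # of every feature; keep the split with the lowest weighted impurity.
    best = None  # (weighted_impurity, feature_index, threshold)
    n = len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    majority = Counter(y).most_common(1)[0][0]
    # Stop when the node is pure, too deep, or cannot be split further.
    if depth >= max_depth or len(set(y)) == 1:
        return {"leaf": majority}
    split = best_split(X, y)
    if split is None:
        return {"leaf": majority}
    _, j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    }

def predict(tree, row):
    # Follow the root-to-leaf path; that path is the classification rule.
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

# Hypothetical data: class B only in the middle band 3 <= x <= 6.
X = [[float(x)] for x in range(10)]
y = ["A", "A", "A", "B", "B", "B", "B", "A", "A", "A"]
tree = build_tree(X, y)
print(tree["threshold"])                              # 2.5
print([predict(tree, [v]) for v in (1.5, 5.5, 8.5)])  # ['A', 'B', 'A']
```

On this toy data the tree splits at 2.5 and then at 6.5, reproducing both discontinuities exactly.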

In decision analysis, a decision tree and the closely related influence diagram are used as a visual and analytical decision support tool, where the expected values (or expected utility) of competing alternatives are calculated.

A decision tree consists of 3 types of nodes:

Decision nodes - commonly represented by squares
Chance nodes - represented by circles
End nodes - represented by triangles
Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal. If, in practice, decisions have to be taken online, with no recall and under incomplete knowledge, a decision tree should be paralleled by a probability model as a best-choice model or online selection model algorithm. Another use of decision trees is as a descriptive means for calculating conditional probabilities.

Decision trees, influence diagrams, utility functions, and other decision analysis tools and methods are taught to undergraduate students in schools of business, health economics, and public health, and are examples
of operations research or management science methods.

Related Questions

Question : You have been given two populations, HEPop and HEPop, and you need to perform a hypothesis test to determine whether they are equal. However, you cannot assume that the data is normally distributed. Which of the following tests would help?

1. Use Welch t-test

2. Use Student t-test

3. Use Teacher t-test

4. Use Wilcoxon rank sum test
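For reference, the Wilcoxon rank sum test (option 4) compares two samples using only the ranks of the pooled observations, so it does not require normally distributed data. The sketch below computes the rank sum W, the equivalent Mann-Whitney U, and a two-sided p-value from the large-sample normal approximation without tie correction; the function names and sample values are hypothetical, and exact small-sample p-values would differ from this approximation.

```python
import math

def average_ranks(values):
    # 1-based ranks; tied values share the average of the positions they occupy.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_test(sample1, sample2):
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    w = sum(ranks[:n1])                  # rank sum of the first sample
    u = w - n1 * (n1 + 1) / 2            # equivalent Mann-Whitney U statistic
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # standard normal CDF at |z|
    p_two_sided = 2 * (1 - phi)
    return w, u, z, p_two_sided

# Hypothetical samples drawn from the two populations
w, u, z, p = rank_sum_test([1.1, 2.3, 2.9, 3.4, 4.0], [3.8, 4.6, 5.2, 6.1, 7.5])
print(w, u, round(p, 4))
```

In practice, `scipy.stats.ranksums` and `scipy.stats.mannwhitneyu` provide tested implementations of this test.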

Question : You are conducting a hypothesis test in which the null hypothesis is true, but you reject it. What type of error is this?

1. Type-I Error

2. Type-II Error

3. Type-III Error

4. Type-IV Error

5. There is no error

Question : You are conducting a hypothesis test for two populations, HEPop and HEPop. Which of the following statements are correct with regard to power and sample size?
A. The power of a test is the probability of correctly rejecting the null hypothesis
B. The power of a test is the probability of correctly accepting the null hypothesis
C. It is represented as (1 - probability of a Type II error)
D. Power of a test improves when the sample size increases.

1. A,B
2. A,C,D
3. A,B,C
4. B,C,D
5. A,B,C,D
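The power of a test is 1 - β, where β is the probability of a Type II error. As a sketch, assuming a two-sided two-sample z-test with a known, common standard deviation σ and n observations per group (assumptions made only to get a closed form), power can be computed with the standard library's `statistics.NormalDist` and seen to grow with sample size. The function name and the effect size used below are hypothetical.

```python
from statistics import NormalDist

def two_sample_z_power(delta, sigma, n_per_group, alpha=0.05):
    # Power = P(reject H0 | true difference in means is delta)
    #       = 1 - P(Type II error).
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = sigma * (2 / n_per_group) ** 0.5        # standard error of the difference
    shift = delta / se
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

for n in (10, 50, 200):
    power = two_sample_z_power(delta=0.5, sigma=1.0, n_per_group=n)
    print(n, round(power, 3), "Type II error:", round(1 - power, 3))
```

When delta is 0 (the null hypothesis is true), the same formula returns alpha, the Type I error rate.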

Question : Students subscribe to training materials from an educational portal and then sit the final exam. The portal provides three ways of preparing for the exam:
1. Prepare using Books
2. Prepare using Recorded Video Trainings
3. Prepare using Sample Practice Questions and Study Notes
You divide 90 students into three groups:
Group-1: uses only books for exam preparation
Group-2: uses only recorded video trainings for exam preparation
Group-3: uses only practice questions for exam preparation
Which hypothesis test can you use in this scenario to compare the groups' exam scores and determine which preparation technique is most effective?

1. You will be using Student t-test

2. You will be using Welch's t-test

3. You will be using Wilcoxon rank sum test

4. You will be using ANOVA

5. You would be applying three Student's t-tests, one for each pair of groups
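One-way ANOVA compares the means of three or more groups in a single test, instead of running three separate pairwise t-tests, which inflates the chance of a Type I error across the comparisons. A minimal sketch of its F statistic, F = (SS_between / (k - 1)) / (SS_within / (N - k)), for k groups and N observations in total; the function name and the score lists are hypothetical.

```python
def one_way_anova_f(groups):
    # groups: list of lists of scores, one list per group
    all_scores = [s for g in groups for s in g]
    n_total, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group variation: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: spread of scores around their own group mean
    ss_within = sum((s - m) ** 2 for g, m in zip(groups, group_means) for s in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical exam scores for the books, video, and practice-question groups
books = [62, 70, 68, 75, 66]
videos = [72, 78, 74, 80, 76]
practice = [81, 85, 79, 88, 84]
print(one_way_anova_f([books, videos, practice]))
```

For 90 students in three groups, the statistic would be compared against an F distribution with 2 and 87 degrees of freedom; `scipy.stats.f_oneway` returns both the F statistic and its p-value.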

Question : Which of the following statements about clustering are true?
A. It is a supervised learning technique
B. It is an unsupervised learning technique
C. This technique can be used to find hidden structure within labelled data
D. Dividing employees into three groups based on their salary is an example of clustering

1. A,B
2. B,C
3. B,C,D
4. A,B,D
5. A,B,C,D

Question : You are working as a data scientist at a data analytics company. You have been given data on various types of pizza available across premium food centers in a country, expressed as numeric values such as Calorie, Size, and Sale per day. You need to group pizzas with similar properties. Which of the following techniques would you use?

1. Association Rules

2. Naive Bayes Classifier

3. K-means Clustering

4. Linear Regression

5. Grouping
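K-means groups unlabelled numeric records by alternately assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points (Lloyd's algorithm). The sketch below uses a deterministic farthest-point initialization instead of the usual random or k-means++ seeding, purely so the result is reproducible; the function names and the three-feature pizza rows (standing in for Calorie, Size, and Sale per day, already rescaled to comparable ranges) are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    # Deterministic seeding (an assumption for reproducibility): start from the
    # first point, then repeatedly add the point farthest from all chosen centroids.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def kmeans(points, k, max_iter=100):
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical pizzas: [calorie, size, sales per day], already rescaled
pizzas = [[1.0, 1.0, 1.0], [1.2, 0.8, 1.1], [0.9, 1.1, 0.9],
          [8.0, 8.0, 7.5], [8.2, 7.9, 8.0], [7.8, 8.1, 8.2]]
labels, centroids = kmeans(pizzas, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because k-means relies on Euclidean distance, features such as Calorie and Size should be standardized first; otherwise the feature with the largest numeric range dominates the grouping.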