
Dell EMC Data Science and Big Data Certification Questions and Answers



Question : What is an appropriate data visualization to use in a presentation for a project sponsor?

1. Box and Whisker plot
2. Pie chart
3. Access Mostly Uused Products by 50000+ Subscribers
4. Density plot

Correct Answer :

Explanation: The project sponsor is responsible for the genesis of the project. This person provides the impetus and requirements for the project, defines the core business problem, generally provides the funding, and gauges the degree of value from the final outputs of the working team. The sponsor sets the priorities for the project and clarifies the desired outputs. Because the presentation is often circulated within an organization, it is critical to articulate the results properly and position the findings in a way that is appropriate for the audience. A presentation for project sponsors contains high-level takeaways for executive-level stakeholders, with a few key messages to aid their decision-making process. Focus on clean, easy visuals that are simple for the presenter to explain and for the viewer to grasp.







Question : In a Student's t-test, what is the meaning of the p-value?

1. it is the "power" of the Student's t-test
2. it is the mean of the distribution for the null hypothesis
3. Access Mostly Uused Products by 50000+ Subscribers
4. it is the area under the appropriate tails of the Student's distribution


Correct Answer :
Explanation: The P value is used all over statistics, from t-tests to regression analysis. Everyone knows that you use P values to determine statistical significance in a hypothesis test. In fact, P values
often determine what studies get published and what projects get funding.

Despite being so important, the P value is a slippery concept that people often interpret incorrectly. How do you interpret P values?

In this post, I'll help you to understand P values in a more intuitive way and to avoid a very common misinterpretation that can cost you money and credibility.
P values evaluate how well the sample data support the devil's advocate argument that the null hypothesis is true. They measure how compatible your data are with the null hypothesis: how likely is the effect observed in your sample data if the null hypothesis is true?

High P values: your data are likely under a true null.
Low P values: your data are unlikely under a true null.
A low P value suggests that your sample provides enough evidence that you can reject the null hypothesis for the entire population. In technical terms, a P value is the probability of obtaining an effect at least as
extreme as the one in your sample data, assuming the truth of the null hypothesis.

For example, suppose that a vaccine study produced a P value of 0.04. This P value indicates that if the vaccine had no effect, you'd obtain the observed difference or more in 4% of studies due to random sampling error.

P values address only one question: how likely are your data, assuming a true null hypothesis? They do not measure support for the alternative hypothesis. This limitation leads us into a very common misinterpretation of P values.

P values are NOT the probability of making a mistake.

Incorrect interpretations of P values are very common. The most common mistake is to interpret a P value as the probability of making a mistake by rejecting a true null hypothesis (a Type I error).

There are several reasons why P values can't be the error rate. First, P values are calculated based on the assumptions that the null is true for the population and that the difference in the sample is caused entirely
by random chance. Consequently, P values can't tell you the probability that the null is true or false because it is 100% true from the perspective of the calculations. Second, while a low P value indicates that your
data are unlikely assuming a true null, it can't evaluate which of two competing cases is more likely:
The null is true but your sample was unusual.
The null is false.
Determining which case is more likely requires subject area knowledge and replicate studies.
Let's go back to the vaccine study and compare the correct and incorrect way to interpret the P value of 0.04:
Correct: Assuming that the vaccine had no effect, you'd obtain the observed difference or more in 4% of studies due to random sampling error.
Incorrect: If you reject the null hypothesis, there's a 4% chance that you're making a mistake.
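The "area under the appropriate tails" definition in option 4 can be sketched numerically. The snippet below is a minimal illustration using the standard library's NormalDist as a large-sample stand-in for the Student's t distribution (for large degrees of freedom the two are nearly identical); the 2.05 test statistic is a made-up value chosen to land near the vaccine example's P value of 0.04:

```python
from statistics import NormalDist

def two_sided_p_value(t_stat: float) -> float:
    """Approximate two-sided p-value: the area under both tails
    beyond the observed test statistic. For large samples the
    Student's t distribution is close to the standard normal."""
    tail = 1.0 - NormalDist().cdf(abs(t_stat))
    return 2.0 * tail

# A test statistic near 2.05 gives a p-value near 0.04:
# under a true null, an effect at least this extreme would
# occur in roughly 4% of studies by random sampling error.
p = two_sided_p_value(2.05)
```

Note how a larger test statistic (a more extreme observed effect) leaves less area in the tails and therefore a smaller p-value.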




Question : In addition to less data movement and the ability to use larger datasets in calculations, what is a
benefit of analytical calculations in a database?

1. improved connections between disparate data sources
2. more efficient handling of categorical values
3. Access Mostly Uused Products by 50000+ Subscribers
4. full use of data aggregation functionality



Correct Answer :

Explanation: Online Analytical Processing (OLAP) databases facilitate business-intelligence queries. OLAP is a database technology that has been optimized for querying and reporting rather than for processing transactions. The source data for OLAP are Online Transactional Processing (OLTP) databases, commonly stored in data warehouses. OLAP data is derived from this historical data and aggregated into structures that permit sophisticated analysis. OLAP data is also organized hierarchically and stored in cubes instead of tables. It is a sophisticated technology that uses multidimensional structures to provide rapid access to data for analysis. This organization makes it easy for a PivotTable or PivotChart report to display high-level summaries, such as sales totals across an entire country or region, and also to display the details for sites where sales are particularly strong or weak.

OLAP databases are designed to speed up the retrieval of data. Because the OLAP server, rather than Microsoft Office Excel, computes the summarized values, less data needs to be sent to Excel when you create or change
a report. This approach enables you to work with much larger amounts of source data than you could if the data were organized in a traditional database, where Excel retrieves all of the individual records and then
calculates the summarized values.

OLAP databases contain two basic types of data: measures, which are numeric data, the quantities and averages that you use to make informed business decisions, and dimensions, which are the categories that you use to
organize these measures. OLAP databases help organize data by many levels of detail, using the same categories that you are familiar with to analyze the data.

The following sections describe each of these components in more detail:
Cube: A data structure that aggregates the measures by the levels and hierarchies of each of the dimensions that you want to analyze. Cubes combine several dimensions, such as time, geography, and product lines, with summarized data, such as sales or inventory figures. Cubes are not "cubes" in the strictly mathematical sense because they do not necessarily have equal sides. However, they are an apt metaphor for a complex concept.

Measure: A set of values in a cube that are based on a column in the cube's fact table and that are usually numeric. Measures are the central values in the cube that are preprocessed, aggregated, and analyzed. Common examples include sales, profits, revenues, and costs.

Member: An item in a hierarchy representing one or more occurrences of data. A member can be either unique or nonunique. For example, 2007 and 2008 represent unique members in the year level of a time dimension, whereas January represents a nonunique member in the month level because there can be more than one January in the time dimension if it contains data for more than one year.

Calculated member: A member of a dimension whose value is calculated at run time by using an expression. Calculated member values may be derived from other members' values. For example, a calculated member, Profit, can be determined by subtracting the value of the member Costs from the value of the member Sales.

Dimension: A set of one or more organized hierarchies of levels in a cube that a user understands and uses as the base for data analysis. For example, a geography dimension might include levels for Country/Region, State/Province, and City, and a time dimension might include a hierarchy with levels for year, quarter, month, and day. In a PivotTable or PivotChart report, each hierarchy becomes a set of fields that you can expand and collapse to reveal lower or higher levels.

Hierarchy: A logical tree structure that organizes the members of a dimension such that each member has one parent member and zero or more child members. A child is a member in the next lower level in a hierarchy that is directly related to the current member; for example, in a Time hierarchy containing the levels Quarter, Month, and Day, January is a child of Qtr1. A parent is a member in the next higher level that is directly related to the current member, and its value is usually a consolidation of the values of all of its children; for example, Qtr1 is the parent of January.

Level: Within a hierarchy, data can be organized into lower and higher levels of detail, such as the Year, Quarter, Month, and Day levels in a Time hierarchy.
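The roll-up idea behind cubes — a parent's value consolidating its children along a dimension hierarchy — can be sketched in plain Python. This is a minimal illustration with made-up fact rows (a real OLAP server precomputes and stores these aggregates in multidimensional structures):

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, sales measure).
facts = [
    (2007, "Q1", "East", 100.0),
    (2007, "Q1", "West", 150.0),
    (2007, "Q2", "East", 120.0),
    (2008, "Q1", "East", 130.0),
]

# Roll up the sales measure along the time hierarchy:
# each quarter consolidates its rows, each year its quarters.
by_quarter = defaultdict(float)
by_year = defaultdict(float)
for year, quarter, region, sales in facts:
    by_quarter[(year, quarter)] += sales
    by_year[year] += sales
```

Because the server returns only these summarized values rather than every individual record, far less data moves to the client tool.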




Related Questions


Question : In which of the following cases can you use K-means clustering?
A. Image processing
B. Customer segmentation
C. Classification of plants
D. Reducing the customer churn rate

1. A,B
2. B,C
3. A,B,C
4. B,C,D
5. A,B,C,D


Question : You want to apply K-Means clustering to a total of M objects, where each object has n attributes, and you need to create groups or clusters. What matrix dimensions would you use to store all the objects' attributes?


1. MXn
2. MX5
3. nX5
4. 5X5
5. MXM



Question : You have data on people who make purchases from a specific grocery store, including their income details. You have created clusters using this data, but in one cluster you see that only 30 people fall, with values as below:
30, 2400, 2600, 2700, 2270 etc.
What would you do in this case?


1. You will increase the number of clusters.
2. You will decrease the number of clusters.
3. You will remove those 30 people from the dataset.
4. You will multiply the standard deviation by 100.



Question : You are working on a clustering solution for customer datasets. Many variables are available for each customer, and data for a large number of customers is available. You want to reduce the number of variables for clustering; what would you do?
A. You will randomly reduce the number of variables.
B. You will find the correlation among the variables and discard the variables that are not correlated.
C. You will find the correlation among the variables and, from the highly correlated variables, consider only one or two of them.
D. You cannot discard any variable for creating clusters.
E. You can combine several variables into one variable.

1. A,B
2. B,D
3. C,D
4. C,E
5. A,E


Question : You have patients' data with height and age, where age is in years and height is in meters. You want to create clusters using these two attributes, and you want age and height to have a nearly equal effect on the clustering. What can you do?
A. You will add the numeric value 100 to each height.
B. You will convert each height value to centimeters.
C. You will divide both age and height by their respective standard deviations.
D. You will take the square root of height.

1. A,B
2. B,C
3. C,D
4. A,D
5. B,D
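The rescaling described in option C can be sketched in a few lines of Python. This is a minimal illustration with made-up heights and ages; `pstdev` is the population standard deviation from the standard library:

```python
from statistics import pstdev

def standardize(values):
    """Divide each value by the column's standard deviation so that
    attributes measured on very different scales (years vs. meters)
    contribute comparably to the distance used in clustering."""
    sd = pstdev(values)
    return [v / sd for v in values]

# Hypothetical patient data: heights in meters, ages in years.
heights = [1.50, 1.62, 1.71, 1.80]
ages = [20.0, 35.0, 50.0, 65.0]
scaled_heights = standardize(heights)
scaled_ages = standardize(ages)
# After scaling, both attributes have unit standard deviation,
# so neither dominates the Euclidean distance.
```

Without this step, age (spanning tens of units) would dwarf height (spanning fractions of a meter) in any distance-based clustering.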


Question : Which of the following are true with regard to the K-Means clustering algorithm?
A. Labels are not pre-assigned to the objects in a cluster.
B. Labels are pre-assigned to the objects in a cluster.
C. It classifies the data based on labels.
D. It discovers the center of each cluster.
E. It finds which cluster each object falls into.

1. A,B,C
2. B,C,D
3. C,D,E
4. A,D,E
5. A,C,E
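The unsupervised behavior these questions describe — no pre-assigned labels, centers discovered from an M x n data matrix, each object assigned to a cluster — can be seen in a minimal Lloyd's-algorithm sketch. The data and the simple first-k seeding below are illustrative choices (real implementations use random or k-means++ initialization):

```python
def kmeans(points, k, iters=10):
    """Minimal K-Means (Lloyd's algorithm) on an M x n matrix:
    `points` is a list of M rows, each a tuple of n attributes.
    No labels are pre-assigned; the algorithm discovers k cluster
    centers and then finds which cluster each object falls into."""
    def dist2(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    centers = [tuple(p) for p in points[:k]]  # simple deterministic seeding
    for _ in range(iters):
        # Assignment step: each object joins its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            groups[nearest].append(p)
        # Update step: each center moves to the mean of its group.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    labels = [min(range(k), key=lambda i: dist2(p, centers[i])) for p in points]
    return centers, labels

# Two well-separated blobs in 2-D (M = 6 objects, n = 2 attributes).
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(data, k=2)
```

After a few iterations the two discovered centers settle near the blob means, and the labels partition the objects accordingly.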