Dell EMC Data Science and BigData Certification Questions and Answers

Question : How does Pig's use of a schema differ from that of a traditional RDBMS?

1. Pig's schema requires that the data is physically present when the schema is defined
2. Pig's schema is required for ETL
4. Pig's schema is optional

Correct Answer : 4
Explanation: In Pig's Java implementation, the Schema class encapsulates the notion of a schema for a relational operator. A schema is a list of columns that describe the output of a relational operator. Each column in the relation is represented as a FieldSchema, a static class inside Schema. A column by definition has an alias, a type, and possibly a schema of its own (if the column is a bag or a tuple). In addition, each column in the schema has a unique auto-generated name used for tracking the lineage of the column through a sequence of statements. The lineage is tracked using a map from the predecessors' columns to the operators that generate them, where the predecessor columns are the columns required to generate the column under consideration. A reverse lookup, from the operators that generate the predecessor columns to those columns, is also maintained.

Schemas enable you to assign names to fields and declare types for fields. Schemas are optional, but you are encouraged to use them whenever possible; type declarations result in better parse-time error checking and more efficient code execution. A traditional RDBMS, by contrast, requires a schema to be defined before data can be stored.

Schemas for simple types and complex types can be used anywhere a schema definition is appropriate.

Schemas are defined with the LOAD, STREAM, and FOREACH operators using the AS clause. If you define a schema using the LOAD operator, then it is the load function that enforces the schema (see LOAD and User Defined Functions for more information).

Question : You are provided with four different datasets. Initial analysis of these datasets shows that they have
identical mean, variance and correlation values. What should your next step in the analysis be?

1. Select one of the four datasets and begin planning and building a model
2. Combine the data from all four of the datasets and begin planning and building a model
4. Visualize the data to further explore the characteristics of each dataset

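Correct Answer : 4

Explanation: Identical summary statistics do not guarantee that datasets look alike. Anscombe's quartet is the classic example: four datasets with nearly identical means, variances, and correlations whose scatter plots are strikingly different. Visualizing the data is therefore the appropriate next step before planning or building a model.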
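
As a rough illustration of the scenario in this question, the Python sketch below uses the values of Anscombe's quartet (Anscombe, 1973). The helper names (pearson, summary, ascii_scatter), the rounding precision, and the crude text plot with fixed axis ranges are illustrative assumptions only; a real analysis would normally use a plotting library instead.

```python
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973). Datasets I-III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summary(xs, ys):
    """Rounded (mean x, var x, mean y, var y, correlation) for one dataset."""
    return (
        round(mean(xs), 1),
        round(variance(xs), 1),
        round(mean(ys), 1),
        round(variance(ys), 1),
        round(pearson(xs, ys), 2),
    )


def ascii_scatter(xs, ys, width=40, height=12):
    """Crude text scatter plot on fixed axes (x: 4..19, y: 3..13)."""
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = int((x - 4) / (19 - 4) * (width - 1))
        row = height - 1 - int((y - 3) / (13 - 3) * (height - 1))
        grid[row][col] = "*"
    return "\n".join("".join(line) for line in grid)


summaries = {name: summary(xs, ys) for name, (xs, ys) in QUARTET.items()}
plots = {name: ascii_scatter(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name in QUARTET:
    print(f"Dataset {name}: summary = {summaries[name]}")
    print(plots[name])
    print("-" * 40)
```

At this rounding precision the four summaries coincide, while the four plots do not, which is the point of looking at the data before modeling it.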

Question : You are asked to create a model to predict the total number of monthly subscribers for a specific
magazine. You are provided with 1 year's worth of subscription and payment data, user
demographic data, and 10 years' worth of content of the magazine (articles and pictures). Which
algorithm is the most appropriate for building a predictive model for subscribers?

1. TF-IDF
2. Linear regression
4. Decision trees

Correct Answer : 2

Explanation: A data model explicitly describes a relationship between predictor and response variables. Linear regression fits a data model that is linear in the model coefficients. The most common type of linear regression is a least-squares fit, which can fit both lines and polynomials, among other linear models. In this question the response variable is numeric (the monthly subscriber count), which is what makes regression a natural fit.

Before you model the relationship between pairs of quantities, it is a good idea to perform correlation analysis to establish whether a linear relationship exists between these quantities. Be aware that variables can have nonlinear relationships, which correlation analysis cannot detect. For more information, see Linear Correlation.

If you need to fit data with a nonlinear model, transform the variables to make the relationship linear. Alternatively, try to fit a nonlinear function directly in MATLAB using the Statistics and Machine Learning Toolbox nlinfit function, the Optimization Toolbox lsqcurvefit function, or functions in the Curve Fitting Toolbox.
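
To make the least-squares idea concrete, here is a minimal Python sketch of a simple (one-predictor) linear fit using the closed-form formulas for slope and intercept. The function names fit_line and predict and the subscriber figures are hypothetical, invented purely for illustration; they are not data from the question.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Hypothetical monthly subscriber counts for one year (made-up numbers).
months = list(range(1, 13))
subscribers = [1020, 1055, 1068, 1110, 1131, 1149,
               1180, 1202, 1238, 1251, 1290, 1304]

slope, intercept = fit_line(months, subscribers)
forecast = predict(13, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, month 13 forecast={forecast:.0f}")
```

A single predictor keeps the sketch short; a realistic model for this question would likely include several predictors drawn from the subscription, payment, and demographic data.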

Related Questions

Question : Which word or phrase completes the statement? Business Intelligence is to monitoring trends as
Data Science is to ________ trends.

1. Predicting
2. Discarding
4. Optimizing

Question : Consider a scale that has five (5) values that range from "not important" to "very important". Which
data classification best describes this data?

1. Nominal
2. Real
4. Ordinal

Question : Which key role for a successful analytic project can provide business domain expertise with a
deep understanding of the data and key performance indicators?

1. Business User
2. Project Sponsor
4. Business Intelligence Analyst
5. None of the above

Question : On analyzing your time series data you suspect that the data represented as
y_1, y_2, y_3, ..., y_{n-1}, y_n
may have a trend component that is quadratic in nature. Which pattern of data will indicate that
the trend in the time series data is quadratic in nature?

1. (y_4 - y_2) - (y_3 - y_1) = ... = (y_n - y_{n-2}) - (y_{n-1} - y_{n-3})

2. ((y_2 - y_1) / y_1) * 100% = ... = ((y_n - y_{n-1}) / y_{n-1}) * 100%

4. (y_3 - y_2) - (y_2 - y_1) = ... = (y_n - y_{n-1}) - (y_{n-1} - y_{n-2})
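
For an exactly quadratic series y_t = a*t^2 + b*t + c sampled at equal time steps, the second differences (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) are all equal to 2a. The Python sketch below checks that property; the function names and the tolerance are illustrative assumptions, and real data with noise would only satisfy the pattern approximately.

```python
def differences(series):
    """Consecutive differences: [y_2 - y_1, y_3 - y_2, ...]."""
    return [b - a for a, b in zip(series, series[1:])]


def has_quadratic_trend(series, tol=1e-9):
    """True if second differences are constant and non-zero (within tol)."""
    if len(series) < 4:
        raise ValueError("need at least 4 points to compare second differences")
    second = differences(differences(series))
    constant = all(abs(d - second[0]) <= tol for d in second)
    # A constant second difference of zero means the trend is linear, not quadratic.
    return constant and abs(second[0]) > tol


quadratic = [2 * t * t + 3 * t + 1 for t in range(8)]  # y_t = 2t^2 + 3t + 1
linear = [5 * t + 2 for t in range(8)]                 # y_t = 5t + 2
cubic = [t ** 3 for t in range(8)]                     # y_t = t^3

for name, series in [("quadratic", quadratic), ("linear", linear), ("cubic", cubic)]:
    print(name, differences(differences(series)), has_quadratic_trend(series))
```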

Question : Which analytical method is considered unsupervised?

1. Naive Bayesian classifier
2. Decision tree
4. K-means clustering
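
To show why k-means is considered unsupervised, the following minimal Python sketch groups points using only their coordinates; no class labels are supplied at any stage. The function names, the fixed iteration count, the choice of starting centroids, and the (age, income in thousands) data points are all hypothetical and chosen only to keep the example deterministic.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid update steps."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers as (age, household income in thousands).
customers = [(25, 30), (27, 32), (24, 29), (60, 90), (62, 95), (58, 88)]

# Assumed starting centroids: one point from each visually obvious group.
centroids, clusters = kmeans(customers, [customers[0], customers[3]])

for centroid, members in zip(centroids, clusters):
    print(f"centroid={centroid}, size={len(members)}, members={members}")
```

Printing the cluster sizes, as done here, is also a quick way to spot clusters that contain very few points.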

Question : You have used k-means clustering to classify the behavior of customers for a retail store.
You decide to use household income, age, gender and yearly purchase amount as measures. You
have chosen to use 8 clusters and notice that 2 clusters have only 3 customers assigned. What
should you do?

1. Decrease the number of measures used
2. Increase the number of clusters
4. Identify additional measures to add to the analysis