
Dell EMC Data Science and BigData Certification Questions and Answers



Question : Assume that you have a data frame in R. Which function would you use to display descriptive
statistics about this variable?

1. levels
2. attributes
4. summary



Correct Answer : 4 (summary)
Explanation: summary is a generic function used to produce result summaries of the results of various model fitting functions. The function invokes particular methods which depend on the class of the first argument. Applied to a data frame, it reports descriptive statistics for each column (for numeric columns: minimum, quartiles, median, mean, and maximum).

Usage

summary(object, ...)
## Default S3 method:
summary(object, ..., digits = max(3, getOption("digits")-3))
## S3 method for class 'data.frame'
summary(object, maxsum = 7,
        digits = max(3, getOption("digits")-3), ...)
## S3 method for class 'factor'
summary(object, maxsum = 100, ...)
## S3 method for class 'matrix'
summary(object, ...)

Arguments

object : an object for which a summary is desired.
maxsum : integer, indicating how many levels should be shown for factors.
digits : integer, used for number formatting with signif() (for summary.default) or format() (for summary.data.frame).
... : additional arguments affecting the summary produced.
Details : For factors, the frequency of the first maxsum - 1 most frequent levels is shown, and the less frequent levels are summarized in "(Others)" (resulting in at most maxsum frequencies). The functions summary.lm and summary.glm are examples of particular methods which summarize the results produced by lm and glm.
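The maxsum rule for factors described above can be sketched in a few lines of Python. This is only an illustration of the counting logic, not R's implementation: the function name summarize_levels and the other_label parameter are hypothetical, and levels are always returned in descending frequency order, which is a simplification.

```python
from collections import Counter


def summarize_levels(values, maxsum=100, other_label="(Others)"):
    """Count levels; if there are more than maxsum of them, keep the
    maxsum - 1 most frequent and lump the rest under other_label."""
    counts = Counter(values).most_common()  # (level, count), most frequent first
    if len(counts) <= maxsum:
        return counts
    kept = counts[:maxsum - 1]
    other_total = sum(n for _, n in counts[maxsum - 1:])
    return kept + [(other_label, other_total)]


# Hypothetical data: four distinct levels, but only three rows allowed.
colors = ["red", "red", "red", "blue", "blue", "green", "teal"]
result = summarize_levels(colors, maxsum=3)
print(result)
```

With maxsum=3 the two most frequent levels are kept and the remaining two are merged into one row, so at most three frequencies are shown.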




Question : Which clause must be included when using window functions?
1. OVER
2. RANK
4. RANK BY



Correct Answer : 1 (OVER)
Explanation: A window function call always contains an OVER clause following the window function's name and argument(s). This is what syntactically distinguishes it from a regular function or aggregate function. The OVER clause determines exactly how the rows of the query are split up for processing by the window function. The PARTITION BY list within OVER specifies dividing the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s). For each row, the window function is computed across the rows that fall into the same partition as the current row.

An aggregate such as avg produces the same result no matter what order it processes the partition's rows in, but this is not true of all window functions. When the order matters, you can control it using ORDER BY within OVER. Here is an example:

SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) FROM empsalary;
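To see what this query returns, the sketch below runs it against an in-memory SQLite database from Python. It assumes an SQLite build with window-function support (3.25 or later); the table contents, the salary_rank alias, and the final ORDER BY are made up for illustration and are not part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empsalary (depname TEXT, empno INTEGER, salary INTEGER)")

# Hypothetical sample rows; two 'develop' employees tie on salary.
conn.executemany(
    "INSERT INTO empsalary VALUES (?, ?, ?)",
    [
        ("develop", 8, 6000),
        ("develop", 10, 5200),
        ("develop", 11, 5200),
        ("develop", 9, 4500),
        ("sales", 1, 5000),
        ("sales", 3, 4800),
    ],
)

# rank() is computed separately inside each depname partition,
# with rows ordered by salary from highest to lowest.
rows = conn.execute(
    """
    SELECT depname, empno, salary,
           rank() OVER (PARTITION BY depname ORDER BY salary DESC) AS salary_rank
    FROM empsalary
    ORDER BY depname, salary_rank, empno
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

Tied salaries share a rank, and rank() then skips the next position, so the lowest-paid 'develop' employee in this sample gets rank 4 rather than 3.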








Question : What is the purpose of the process step "parsing" in text analysis?
1. computes the TF-IDF values for all keywords and indices
2. executes the clustering and classification to organize the contents
4. imposes a structure on the unstructured/semi-structured text for downstream analysis


Correct Answer : 4 (imposes a structure on the unstructured/semi-structured text for downstream analysis)
Explanation: Parsing is the process that takes unstructured text and imposes a structure for further
analysis. The unstructured text could be a plain text file, a weblog, an Extensible Markup
Language (XML) file, a HyperText Markup Language (HTML) file, or a Word document.
Parsing deconstructs the provided text and renders it in a more structured way for the
subsequent steps.
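As a minimal sketch of this idea, the Python snippet below parses a small HTML string into a structured record using the standard library's html.parser module. The class name TextFieldParser, the choice of fields (title and paragraphs), and the sample HTML are assumptions made for illustration; a real text-analysis pipeline would pick fields to suit its downstream steps.

```python
from html.parser import HTMLParser


class TextFieldParser(HTMLParser):
    """Collect the text inside <title> and <p> elements into named fields."""

    def __init__(self):
        super().__init__()
        self.record = {"title": "", "paragraphs": []}
        self._current = None  # tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._current = tag
            if tag == "p":
                self.record["paragraphs"].append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.record["title"] += data
        elif self._current == "p":
            self.record["paragraphs"][-1] += data


# Hypothetical semi-structured input.
raw_html = (
    "<html><head><title>Q3 Report</title></head>"
    "<body><p>Sales rose.</p><p>Costs fell.</p></body></html>"
)

parser = TextFieldParser()
parser.feed(raw_html)
parser.close()
record = parser.record
print(record)
```

The resulting dictionary gives later steps, such as keyword extraction or TF-IDF computation, a predictable structure to work with instead of raw markup.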



Related Questions


Question : You have been assigned to run a linear regression model for each of , distinct districts, and
all the data is currently stored in a PostgreSQL database. Which tool/library would you use to
produce these models with the least effort?

1. MADlib
2. Mahout
4. HBase




Question : Your customer provided you with , unlabeled records and asked you to separate them into
three groups. What is the correct analytical method to use?


1. Semi Linear Regression
2. Logistic regression
4. Linear regression
5. K-means clustering


Question : You are performing a market basket analysis using the Apriori algorithm. Which measure is a ratio
describing how many more times two items are present together than would be expected if
those two items were statistically independent?


1. Confidence
2. Support
4. Lift
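The ratio described in this question is commonly written as support(A and B) / (support(A) * support(B)), which is the usual definition of lift. The short Python sketch below computes it on a toy data set; the transactions and the helper names support and lift are hypothetical and chosen only for illustration.

```python
# Hypothetical market baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def lift(a, b, transactions):
    """Observed co-occurrence of a and b divided by the co-occurrence
    expected if a and b were statistically independent."""
    joint = support({a, b}, transactions)
    return joint / (support({a}, transactions) * support({b}, transactions))


bread_milk_lift = lift("bread", "milk", transactions)
print(bread_milk_lift)
```

A value near 1 suggests the two items appear together about as often as independence would predict, a value above 1 suggests they appear together more often, and a value below 1 less often. The sketch assumes both items occur at least once, since otherwise the denominator is zero.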




Question : In which lifecycle stage are appropriate analytical techniques determined?
1. Model planning
2. Model building
4. Discovery



Question : What is Hadoop?
1. Java classes for HDFS types and MapReduce job management and HDFS
2. Java classes for HDFS types and MapReduce job management and the MapReduce paradigm
4. MapReduce paradigm and massive unstructured data storage on commodity hardware





Question : You are using k-means clustering to classify heart patients for a hospital. You have chosen Patient
Sex, Height, Weight, Age and Income as measures and have used 3 clusters. When you create a
pair-wise plot of the clusters, you notice that there is significant overlap between the clusters.
What should you do?


1. Decrease the number of clusters
2. Increase the number of clusters
4. Identify additional measures to add to the analysis