
Dell EMC Data Science and Big Data Certification Questions and Answers



Question : You have been given a huge dataset with the following occurrences:
Bread appears in 80% of all transactions, and the combination of bread and milk appears in 60% of all transactions. Which of the following statements is correct with regard to Apriori?


1. Support for {bread} is 0.8

2. Support for {bread} is 0.6

3. Support for {bread} is 1.4

4. Support for {bread} is 0.2


Correct Answer : 1
Explanation: Because bread occurs in 80% of all transactions, the support for {bread} is 0.8. Similarly, the combination {bread, milk} occurs in 60% of all transactions, so the support for {bread, milk} is 0.6.
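For illustration, here is a minimal Python sketch (not part of the exam material) that computes support as the fraction of transactions containing an itemset. The transaction list is made up so that the numbers reproduce the 0.8 and 0.6 figures above.

# Support of an itemset = fraction of transactions that contain all of its items.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread"},
    {"eggs"},
]

def support(itemset, transactions):
    hits = sum(1 for t in transactions if itemset <= t)  # itemset is a subset of t
    return hits / len(transactions)

print(support({"bread"}, transactions))          # 0.8
print(support({"bread", "milk"}, transactions))  # 0.6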




Question : For the Apriori algorithm you have decided that the minimum support value is 0.5. Which of the following are frequent itemsets, given the following percentage occurrences?
Bread->80%
Milk->70%
Bread,Milk -> 55%
Bread, Banana -> 30%
A. Bread
B. Milk
C. Bread, Milk
D. Banana
E. Bread, Banana

1. A,B,C
2. B,C,D
3. C,D,E
4. A,D,E
5. A,C,E

Correct Answer : 1
Explanation: A frequent itemset is one whose items appear together often enough, where "often enough" is formally defined by a minimum support criterion. With a minimum support of 0.5, any itemset whose support is at least 0.5 is considered frequent. Hence Bread (0.80), Milk (0.70), and the combination {Bread, Milk} (0.55) are frequent itemsets, while {Bread, Banana} (0.30) is not.
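As a small illustration (a sketch, not part of the exam material), the filtering step can be expressed in Python by keeping only those itemsets whose support meets the minimum support threshold; the support values are the percentages given in the question.

# Keep only itemsets whose support is at least the minimum support (0.5 here).
supports = {
    frozenset({"Bread"}): 0.80,
    frozenset({"Milk"}): 0.70,
    frozenset({"Bread", "Milk"}): 0.55,
    frozenset({"Bread", "Banana"}): 0.30,
}
min_support = 0.5

frequent = [set(itemset) for itemset, s in supports.items() if s >= min_support]
print(frequent)  # [{'Bread'}, {'Milk'}, {'Bread', 'Milk'}]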




Question : You have been given an itemset of three items {a,b,c} with 0.8 support, and the minimum support is defined as 0.7. Which of the following statements are correct?
A. The combination {a,b} is a frequent itemset
B. The combination {b,c} is a frequent itemset
C. The combination {a,c} is a frequent itemset
D. Item {a} is a frequent itemset
E. Item {c} is a frequent itemset

1. A,B,C
2. B,C,D
3. B,C,D,E
4. A,B,C,D
5. A,B,C,D,E

Correct Answer : 5
Explanation: The itemset {a,b,c} has 0.8 support, which is higher than the minimum support of 0.7, so it is a frequent itemset. By the Apriori (downward-closure) property, every subset of a frequent itemset is also frequent, so {a,b}, {b,c}, {a,c}, {a}, and {c} are all frequent as well.
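The downward-closure property can be illustrated with a short Python sketch that enumerates every non-empty subset of the frequent itemset {a,b,c}; all of them are guaranteed to be frequent as well.

# Every non-empty subset of a frequent itemset is itself frequent (Apriori property).
from itertools import combinations

frequent_itemset = {"a", "b", "c"}  # support 0.8, minimum support 0.7

subsets = [
    set(c)
    for r in range(1, len(frequent_itemset) + 1)
    for c in combinations(sorted(frequent_itemset), r)
]
print(subsets)
# [{'a'}, {'b'}, {'c'}, {'a', 'b'}, {'a', 'c'}, {'b', 'c'}, {'a', 'b', 'c'}]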


Related Questions


Question : You are given , , user profile pages of an online dating site in XML files, and they are
stored in HDFS. You are assigned to divide the users into groups based on the content of their
profiles. You have been instructed to try K-means clustering on this data. How should you
proceed?


1. Divide the data into sets of 1,000 user profiles, and run K-means clustering in RHadoop iteratively.
2. Run MapReduce to transform the data, and find relevant key value pairs.
3. …
4. Partition the data by XML file size, and run K-means clustering in each partition.
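Whichever option is chosen, the clustering step itself usually looks like the following hedged sketch: profile text is first turned into numeric feature vectors (for example with TF-IDF) and K-means is then run on those vectors. The sample profiles and the choice of two clusters are purely illustrative and not taken from the question.

# Hedged sketch: cluster user-profile text with K-means after vectorizing it.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

profiles = [
    "loves hiking and dogs",
    "enjoys hiking, camping and the outdoors",
    "passionate about cooking and wine",
    "amateur chef, loves trying new restaurants",
]

X = TfidfVectorizer().fit_transform(profiles)   # text -> sparse TF-IDF matrix
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # cluster assignment for each profile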



Question : The Marketing department of your company wishes to track opinion on a new product that was
recently introduced. Marketing would like to know how many positive and negative reviews are
appearing over a given period and potentially retrieve each review for more in-depth insight.
They have identified several popular product review blogs that historically have published
thousands of user reviews of your company's products.

You have been asked to provide the desired analysis. You examine the RSS feeds for each blog
and determine which fields are relevant. You then craft a regular expression to match your new
product's name and extract the relevant text from each matching review.
What is the next step you should take?


1. Use the extracted text and your regular expression to perform a sentiment analysis based on mentions of the new product
2. Convert the extracted text into a suitable document representation and index into a review corpus
3. …
4. Group the reviews using Naive Bayesian classification
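The extraction step described in the question can be sketched in Python as below; the product name "AcmePhone", the regular expression, and the sample reviews are hypothetical stand-ins, since the question does not name the product.

# Keep only the reviews that mention the (hypothetical) product name.
import re

reviews = [
    "The AcmePhone battery life is fantastic.",
    "Shipping was slow, no comment on the product itself.",
    "I returned my AcmePhone after two days, terrible screen.",
]

product_pattern = re.compile(r"\bAcmePhone\b", re.IGNORECASE)
matching = [r for r in reviews if product_pattern.search(r)]
print(matching)  # the two reviews that actually mention the product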



Question : Which word or phrase completes the statement? A Data Scientist would consider that an RDBMS is
to a Table as R is to a ______________ .

1. List
2. Matrix
3. …
4. Array



Question : Which word or phrase completes the statement? Unix is to bash as Hadoop is to:


1. NameNode
2. Sqoop
3. …
4. Flume
5. Pig



Question : A call center for a large electronics company handles an average of , support calls a day.
The head of the call center would like to optimize the staffing of the call center during the rollout of
a new product due to recent customer complaints of long wait times. You have been asked to
create a model to optimize call center costs and customer wait times.
The goals for this project include:
1. Relative to the release of a product, how does the call volume change over time?
2. How to best optimize staffing based on the call volume for the newly released product, relative
to old products.
3. …
4. Determine the frequency of calls by both product type and customer language.
Which goals are suitable to be completed with MapReduce?


1. Goal 2 and 4
2. Goal 1 and 3
3. …
4. Goals 2, 3, 4
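Counting calls by product type and customer language (goal 4) is a natural fit for MapReduce; the sketch below mimics the map and reduce steps in plain Python. The record fields and values are illustrative, not taken from the question.

# MapReduce-style counting: map each call to a ((product, language), 1) pair,
# then reduce by summing the counts per key.
from collections import defaultdict

calls = [
    {"product": "TV", "language": "en"},
    {"product": "TV", "language": "es"},
    {"product": "Phone", "language": "en"},
    {"product": "TV", "language": "en"},
]

def mapper(record):
    yield (record["product"], record["language"]), 1

def reducer(pairs):
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

print(reducer(kv for call in calls for kv in mapper(call)))
# {('TV', 'en'): 2, ('TV', 'es'): 1, ('Phone', 'en'): 1}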



Question : Consider the example of an analysis for fraud detection on credit card usage. You will need to
ensure higher-risk transactions that may indicate fraudulent credit card activity are retained in your
data for analysis, and not dropped as outliers during pre-processing. What will be your approach
for loading data into the analytical sandbox for this analysis?


1. ETL
2. ELT
3. …
4. OLTP