
Dell EMC Data Science and Big Data Certification Questions and Answers



Question : You have completed your model and are handing it off to be deployed in production. What should
you deliver to the production team, along with your commented code?
1. The production team needs to understand how your model will interact with the processes they
already support. Give them documentation on expected model inputs and outputs, and guidance
on error-handling.
2. The production team are technical, and they need to understand how the processes that they
support work, so give them the same presentation that you prepared for the analysts.
3. The production team supports the processes that run the organization, and they need context
to understand how your model interacts with the processes they already support. Give them the
same presentation that you prepared for the project sponsor.
4. The production team supports the processes that run the organization, and they need context
to understand how your model interacts with the processes they already support. Give them the
executive summary.


Correct Answer : (not shown)

Explanation: The Data Analytics Lifecycle has six phases:
1. Discovery
2. Data preparation
3. Model planning
4. Model building
5. Communicate results
6. Operationalize: the team delivers final reports, briefings, code, and technical documents. In addition, the team may run a pilot project to implement the models in a production environment.






Question : Which method is used to solve for the coefficients b0, b1, ..., bn in your linear regression model:
Y = b0 + b1x1 + b2x2 + ... + bnxn
1. Apriori Algorithm
2. Ridge and Lasso
3. Ordinary Least Squares
4. Integer programming



Correct Answer : (not shown)
Explanation: Y = b0 + b1x1 + b2x2 + ... + bnxn
In the linear model, the bi terms represent the p unknown parameters. The estimates for these
unknown parameters are chosen so that, on average, the model provides a reasonable
estimate of a person's income based on age and education. In other words, the fitted model
should minimize the overall error between the linear model and the actual observations.
Ordinary Least Squares (OLS) is a common technique used to estimate these parameters.
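As an illustrative sketch (not part of the exam material), OLS coefficients can be computed with NumPy's least-squares solver. The income/age/education numbers below are made up:

```python
import numpy as np

# Toy data: income (in thousands) modeled from age and years of education.
# All values are hypothetical, chosen only to demonstrate the mechanics.
X = np.array([[25, 12], [35, 16], [45, 14], [55, 18], [30, 12]], dtype=float)
y = np.array([30.0, 60.0, 55.0, 80.0, 35.0])

# Prepend a column of ones so the intercept b0 is estimated along with b1, b2.
A = np.column_stack([np.ones(len(X)), X])

# OLS: choose b to minimize the sum of squared errors ||A b - y||^2.
b, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print(b)  # estimated [b0, b1, b2]
```

Because OLS minimizes the squared error, the fitted model's error can never exceed that of a constant (mean-only) model, which is a special case of the linear model here.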




Question : You submit a MapReduce job to a Hadoop cluster and notice that although the job was
successfully submitted, it is not completing. What should you do?
1. Ensure that the NameNode is running
2. Ensure that the JobTracker is running
3. Ensure that the TaskTracker is running
4. Ensure that a DataNode is running



Correct Answer : (not shown)
Explanation: A TaskTracker is a node in the cluster that accepts tasks - Map, Reduce and Shuffle operations - from a JobTracker.

Every TaskTracker is configured with a set of slots, which indicate the number of tasks that it can accept. When the JobTracker tries to find somewhere to schedule a task within the MapReduce operations, it first
looks for an empty slot on the same server that hosts the DataNode containing the data; if there is none, it looks for an empty slot on a machine in the same rack.

The TaskTracker spawns separate JVM processes to do the actual work; this ensures that a process failure does not take down the TaskTracker itself. The TaskTracker monitors these spawned processes, capturing their output
and exit codes. When a process finishes, successfully or not, the tracker notifies the JobTracker. The TaskTrackers also send out heartbeat messages to the JobTracker, usually every few minutes, to reassure the
JobTracker that they are still alive. These messages also inform the JobTracker of the number of available slots, so the JobTracker can stay up to date with where in the cluster work can be delegated. The JobTracker is
the service within Hadoop that farms out MapReduce tasks to specific nodes in the cluster, ideally the nodes that have the data, or at least nodes in the same rack.

Client applications submit jobs to the JobTracker.
The JobTracker talks to the NameNode to determine the location of the data.
The JobTracker locates TaskTracker nodes with available slots at or near the data.
The JobTracker submits the work to the chosen TaskTracker nodes.
The TaskTracker nodes are monitored. If they do not submit heartbeat signals often enough, they are deemed to have failed and the work is scheduled on a different TaskTracker.
A TaskTracker will notify the JobTracker when a task fails. The JobTracker decides what to do then: it may resubmit the job elsewhere, it may mark that specific record as something to avoid, and it may even
blacklist the TaskTracker as unreliable.
When the work is completed, the JobTracker updates its status.
Client applications can poll the JobTracker for information.
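The locality-aware slot selection described above (data-local first, then rack-local, then any free slot) can be sketched in a few lines of Python. This is a simplified illustration of the scheduling preference, not Hadoop's actual code; the node and rack names are invented:

```python
def pick_tasktracker(trackers, data_node):
    """Pick a TaskTracker with a free slot, preferring data locality.

    trackers: list of dicts with 'host', 'rack', and 'free_slots'.
    data_node: dict with the 'host' and 'rack' of the block's DataNode.
    Preference order mirrors the JobTracker's: same host as the data,
    then same rack, then any tracker with a free slot.
    """
    available = [t for t in trackers if t["free_slots"] > 0]
    for t in available:
        if t["host"] == data_node["host"]:
            return t  # data-local: run the task where the block lives
    for t in available:
        if t["rack"] == data_node["rack"]:
            return t  # rack-local: same rack as the data
    return available[0] if available else None  # remote, or no capacity

# Hypothetical cluster: node1 holds the data but has no free slots.
trackers = [
    {"host": "node1", "rack": "r1", "free_slots": 0},
    {"host": "node2", "rack": "r1", "free_slots": 2},
    {"host": "node3", "rack": "r2", "free_slots": 1},
]
block = {"host": "node1", "rack": "r1"}
print(pick_tasktracker(trackers, block)["host"])  # node2 (rack-local)
```

With node1 full, the scheduler falls back to node2, the rack-local choice, before considering node3 in a different rack.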



Related Questions


Question : The web analytics team uses Hadoop to process access logs. They now want to correlate this
data with structured user data residing in their massively parallel database. Which tool should they
use to export the structured data from Hadoop?

1. Sqoop
2. Pig
3. (option not shown)
4. Scribe



Question : When would you prefer a Naive Bayes model to a logistic regression model for classification?

1. When some of the input variables might be correlated
2. When all the input variables are numerical.
3. (option not shown)
4. When you are using several categorical input variables with over 1000 possible values each.



Question : Before you build an ARMA model, how can you tell if your time series is weakly stationary?

1. The mean of the series is close to 0.
2. The series is normally distributed.
3. (option not shown)
4. There appears to be no apparent trend component
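A weakly stationary series has a constant mean and variance over time, with autocovariance that depends only on the lag. As a rough numeric sanity check (a simple sketch, not a formal test such as augmented Dickey-Fuller), one can compare the mean and variance of the two halves of the series; white noise, which is weakly stationary by construction, should show close agreement:

```python
import random

random.seed(7)
# White noise: constant mean (0) and variance (1) by construction.
series = [random.gauss(0.0, 1.0) for _ in range(2000)]

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)

def halves_stats(xs):
    """Compare mean and variance of the two halves of a series.
    A large discrepancy suggests a trend or changing variance,
    i.e. evidence against weak stationarity."""
    mid = len(xs) // 2
    first, second = xs[:mid], xs[mid:]
    return (mean(first), mean(second)), (var(first), var(second))

means, variances = halves_stats(series)
print(means, variances)
```

A series with an apparent trend would show clearly different half-means, which is why an absence of any trend component is a prerequisite for fitting an ARMA model.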


Question : What is an example of a null hypothesis?

1. that a newly created model provides a prediction of a null sample mean
2. that a newly created model does not provide better predictions than the currently existing model
3. (option not shown)
4. that a newly created model provides a prediction that will be well fit to the null distribution


Question : You have fit a decision tree classifier using input variables. The resulting tree used of the
variables, and is 5 levels deep. Some of the nodes contain only 3 data points. The AUC of the
model is 0.85. What is your evaluation of this model?

1. The tree did not split on all the input variables. You need a larger data set to get a more accurate model.
2. The AUC is high, and the small nodes are all very pure. This is an accurate model.
3. (option not shown)
4. The AUC is high, so the overall model is accurate. It is not well-calibrated, because the small nodes will give poor estimates of probability.



Question : If your intention is to show trends over time, which chart type is the most appropriate way to depict the data?

1. Line chart
2. Bar chart
3. (option not shown)
4. Histogram