Question : You have completed your model and are handing it off to be deployed in production. What should you deliver to the production team, along with your commented code?
1. The production team needs to understand how your model will interact with the processes they already support. Give them documentation on expected model inputs and outputs, and guidance on error handling.
2. The production team is technical, and they need to understand how the processes that they support work, so give them the same presentation that you prepared for the analysts.
3. … to understand how your model interacts with the processes they already support. Give them the same presentation that you prepared for the project sponsor.
4. The production team supports the processes that run the organization, and they need context to understand how your model interacts with the processes they already support. Give them the executive summary.
Explanation: Data Analytics Lifecycle:
1. Discovery
2. Data preparation
3. Model planning
4. Model building
5. Communicate results
6. Operationalize: the team delivers final reports, briefings, code, and technical documents. In addition, the team may run a pilot project to implement the models in a production environment.
Question : Which method is used to solve for the coefficients b0, b1, ..., bn in your linear regression model Y = b0 + b1x1 + b2x2 + ... + bnxn?
Correct Answer : Ordinary Least Squares (OLS). Explanation: In the linear model Y = b0 + b1x1 + b2x2 + ... + bnxn, the bi's represent the p unknown parameters. The estimates for these unknown parameters are chosen so that, on average, the model provides a reasonable estimate of a person's income based on age and education. In other words, the fitted model should minimize the overall error between the linear model and the actual observations. Ordinary Least Squares (OLS) is a common technique to estimate the parameters.
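The OLS fit described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the exam material: the age/education/income values are made up, and numpy's lstsq is used because it solves the least-squares problem directly without explicitly inverting matrices.

```python
import numpy as np

# Toy data: predict income (in $1000s) from age and years of education.
# These values are illustrative only.
X = np.array([[25, 12], [35, 16], [45, 14], [50, 18], [30, 16]], dtype=float)
y = np.array([30.0, 55.0, 52.0, 70.0, 48.0])

# Prepend a column of ones so the model is Y = b0 + b1*x1 + b2*x2.
A = np.column_stack([np.ones(len(X)), X])

# OLS picks b to minimize the sum of squared residuals ||y - A b||^2.
b, *_ = np.linalg.lstsq(A, y, rcond=None)

predicted = A @ b
residuals = y - predicted
```

A defining property of the OLS solution is that the residuals are orthogonal to every input column, which is a quick way to sanity-check a fit.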
Question : You submit a MapReduce job to a Hadoop cluster and notice that although the job was successfully submitted, it is not completing. What should you do?
1. Ensure that the NameNode is running
2. Ensure that the JobTracker is running
3. Ensure that a TaskTracker is running
4. Ensure that a DataNode is running
Correct Answer : Ensure that a TaskTracker is running. Explanation: A TaskTracker is a node in the cluster that accepts tasks - Map, Reduce and Shuffle operations - from a JobTracker.
Every TaskTracker is configured with a set of slots; these indicate the number of tasks that it can accept. When the JobTracker tries to find somewhere to schedule a task within the MapReduce operations, it first looks for an empty slot on the same server that hosts the DataNode containing the data, and if not, it looks for an empty slot on a machine in the same rack.
The TaskTracker spawns a separate JVM process for each task to do the actual work; this ensures that a process failure does not take down the TaskTracker itself. The TaskTracker monitors these spawned processes, capturing the output and exit codes. When a process finishes, successfully or not, the tracker notifies the JobTracker. The TaskTrackers also send out heartbeat messages to the JobTracker, usually every few minutes, to reassure the JobTracker that they are still alive. These messages also inform the JobTracker of the number of available slots, so the JobTracker can stay up to date with where in the cluster work can be delegated. The JobTracker is the service within Hadoop that farms out MapReduce tasks to specific nodes in the cluster, ideally the nodes that have the data, or at least are in the same rack.
Client applications submit jobs to the JobTracker. A typical job flow:
1. The JobTracker talks to the NameNode to determine the location of the data.
2. The JobTracker locates TaskTracker nodes with available slots at or near the data.
3. The JobTracker submits the work to the chosen TaskTracker nodes.
4. The TaskTracker nodes are monitored. If they do not submit heartbeat signals often enough, they are deemed to have failed and the work is scheduled on a different TaskTracker.
5. A TaskTracker will notify the JobTracker when a task fails. The JobTracker decides what to do then: it may resubmit the job elsewhere, it may mark that specific record as something to avoid, and it may even blacklist the TaskTracker as unreliable.
6. When the work is completed, the JobTracker updates its status. Client applications can poll the JobTracker for information.
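The locality-preferring placement described above (data-local slot first, then rack-local, then anywhere) can be sketched as a toy scheduler. This is a hypothetical simplification for illustration, not Hadoop code: the function name and data structures are invented, and real JobTracker scheduling involves many more factors.

```python
def pick_tasktracker(block_hosts, racks, free_slots):
    """Toy model of locality-aware task placement.

    block_hosts: list of node names holding the data block
    racks:       dict mapping node name -> rack id
    free_slots:  dict mapping node name -> number of empty task slots
    """
    # 1) Data-local: a node that holds the block and has a free slot.
    for node in block_hosts:
        if free_slots.get(node, 0) > 0:
            return node
    # 2) Rack-local: any free node in the same rack as a block host.
    local_racks = {racks[n] for n in block_hosts if n in racks}
    for node, slots in free_slots.items():
        if slots > 0 and racks.get(node) in local_racks:
            return node
    # 3) Off-rack: any node with a free slot.
    for node, slots in free_slots.items():
        if slots > 0:
            return node
    return None  # no capacity anywhere: the task has to wait


racks = {"n1": "r1", "n2": "r1", "n3": "r2"}
slots = {"n1": 0, "n2": 2, "n3": 2}
# The data lives on n1, but n1 has no free slots, so the scheduler
# falls back to n2, which is in the same rack (r1).
print(pick_tasktracker(["n1"], racks, slots))  # -> n2
```

Note that when no node has a free slot the function returns None, which mirrors the exam scenario: a job that was accepted by the JobTracker can sit idle indefinitely if no TaskTracker is alive to offer slots.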
1. When some of the input variables might be correlated
2. When all the input variables are numerical
3. …
4. When you are using several categorical input variables with over 1000 possible values each
1. that a newly created model provides a prediction of a null sample mean
2. that a newly created model does not provide better predictions than the currently existing model
3. …
4. that a newly created model provides a prediction that will be well fit to the null distribution
1. The tree did not split on all the input variables. You need a larger data set to get a more accurate model.
2. The AUC is high, and the small nodes are all very pure. This is an accurate model.
3. …
4. The AUC is high, so the overall model is accurate. It is not well-calibrated, because the small nodes will give poor estimates of probability.