Question : In which phase of the data analytics lifecycle do data scientists spend the most time in a project?
1. Discovery
2. Data Preparation
4. Communicate Results
Explanation:
Phase 1-Discovery: In Phase 1, the team learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past from which they can learn. The team assesses the resources available to support the project in terms of people, technology, time, and data. Important activities in this phase include framing the business problem as an analytics challenge that can be addressed in subsequent phases and formulating initial hypotheses (IHs) to test and begin learning the data.
Phase 2-Data preparation: Phase 2 requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project. The team needs to execute extract, load, and transform (ELT) or extract, transform, and load (ETL) to get data into the sandbox. ELT and ETL are sometimes jointly abbreviated as ETLT. Data should be transformed in the ETLT process so the team can work with it and analyze it. In this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition the data.
Phase 3-Model planning: Phase 3 is model planning, where the team determines the methods, techniques, and workflow it intends to follow for the subsequent model building phase. The team explores the data to learn about the relationships between variables and subsequently selects key variables and the most suitable models.
Phase 4-Model building: In Phase 4, the team develops datasets for testing, training, and production purposes. In addition, in this phase the team builds and executes models based on the work done in the model planning phase. The team also considers whether its existing tools will suffice for running the models, or if it will need a more robust environment for executing models and workflows (for example, fast hardware and parallel processing, if applicable).
Phase 5-Communicate results: In Phase 5, the team, in collaboration with major stakeholders, determines if the results of the project are a success or a failure based on the criteria developed in Phase 1. The team should identify key findings, quantify the business value, and develop a narrative to summarize and convey findings to stakeholders.
Phase 6-Operationalize: In Phase 6, the team delivers final reports, briefings, code, and technical documents. In addition, the team may run a pilot project to implement the models in a production environment.
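As a rough illustration of the ETL step described under Phase 2, the sketch below extracts a few records, transforms (conditions) them, and loads them into a sandbox. Everything in it is an assumption made for illustration: the records are made up, the `transform` and `load` helpers are hypothetical, and an in-memory SQLite database stands in for the analytic sandbox.

```python
import sqlite3

# Made-up raw records standing in for an extract from a source system.
raw_rows = [
    {"customer": " Alice ", "amount": "120.50"},
    {"customer": "bob", "amount": "80"},
    {"customer": "", "amount": "15.25"},  # blank name: dropped during conditioning
]

def transform(rows):
    """Condition the data: trim and title-case names, cast amounts, drop blank names."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip().title()
        if not name:
            continue
        cleaned.append((name, float(row["amount"])))
    return cleaned

def load(conn, rows):
    """Load the conditioned rows into a table in the sandbox database."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

# In-memory database used here as a stand-in for the analytic sandbox.
sandbox = sqlite3.connect(":memory:")
load(sandbox, transform(raw_rows))  # ETL order: transform first, then load

result = sandbox.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(result)
```

In an ELT variant, the raw rows would be loaded into the sandbox first and the conditioning would run there afterwards.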
Question : You are testing two new weight-gain formulas for puppies. The test gives the results: Control group: 1% weight gain; Formula A: 3% weight gain; Formula B: 4% weight gain. A one-way ANOVA returns a p-value of 0.027. What can you conclude?
1. Formula A and Formula B are about equally effective at promoting weight gain.
2. Formula A and Formula B are both effective at promoting weight gain.
4. Either Formula A or Formula B is effective at promoting weight gain.
Explanation: A one-way ANOVA (analysis of variance) is a statistical technique for testing whether three or more means are equal. It tests whether the value of a single variable differs significantly among three or more levels of a factor.
We can say we have a framework for one-way ANOVA when we have a single factor with three or more levels and multiple observations at each level.
In this kind of layout, we can calculate the mean of the observations within each level of our factor.
The concepts of factor, levels and multiple observations at each level can be best understood by an example.
Factor and Levels - An Example
Let us suppose that the Human Resources Department of a company desires to know if occupational stress varies according to age.
The variable of interest is therefore occupational stress as measured by a scale.
The factor being studied is age. There is just one factor (age) and hence a situation appropriate for one-way ANOVA.
Further suppose that the employees have been classified into three groups (levels):
less than 40
40 to 55
above 55
These three groups are the levels of the factor age, so there are three levels here. With this design, we shall have multiple observations in the form of scores on occupational stress from a number of employees belonging to the three levels of the factor age. We are interested in whether all the levels, i.e. age groups, have equal stress on average.
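Non-significance of the test statistic (F-statistic) associated with this technique would imply that age has no effect on the stress experienced by employees in their respective occupations. On the other hand, significance would imply that stress affects different age groups differently. Note that a significant result, such as a p-value of 0.027 judged against the conventional 0.05 level, indicates only that at least one group mean differs from the others; the ANOVA by itself does not identify which groups differ.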
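To make the F-statistic concrete, here is a minimal sketch that computes it by hand for the three age groups in the example. The stress scores are made up for illustration, and `one_way_anova_f` is a hypothetical helper, not a library function. Turning F into a p-value additionally requires the F distribution with the two degrees of freedom returned here, which this sketch leaves out.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)                          # number of levels of the factor
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up occupational stress scores for the three age levels.
under_40 = [4, 5, 6]
from_40_to_55 = [6, 7, 8]
above_55 = [8, 9, 10]

f_stat, df_between, df_within = one_way_anova_f([under_40, from_40_to_55, above_55])
print(f_stat, df_between, df_within)
```

A large F means the group means are spread far apart relative to the scatter within each group, which is what leads to a small p-value.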
Question : Data visualization is used in the final presentation of an analytics project. For what else is this technique commonly used?
Explanation: Data exploration is an informative search used by data consumers to form true analysis from the information gathered. Often, data is gathered in large volumes and in a non-rigid, loosely controlled manner. For true analysis, this unorganized bulk of data needs to be narrowed down. This is where data exploration is used: it examines the data, and the information within it, to guide further analysis.
Data often converges in a central repository called a data warehouse. This data can come from various sources in various formats. Relevant data is needed for tasks such as statistical reporting, trend spotting and pattern spotting. Data exploration is the process of gathering such relevant data. There are two main methods used to retrieve relevant data from large, unorganized pools: manual and automatic. The manual method is another name for data exploration, while the automatic method is also known as data mining.
Some people believe these terms are synonymous, while others see a technical difference between them. Data mining generally refers to gathering relevant data from large databases. Data exploration, on the other hand, generally refers to a data user being able to find his or her way through large amounts of data in order to gather necessary information.
1. They can be used to calculate moving averages over various intervals.
2. They group rows into a single output row.
4. They don't require ordering of data within a window.
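The moving average mentioned in the first option above can be sketched in plain Python. This is only an illustration under stated assumptions: `moving_average` is a hypothetical helper and the sales figures are made up. It mimics a trailing window of the current row plus the preceding rows, and it produces one output value per input row instead of collapsing the rows into a single result.

```python
def moving_average(values, window):
    """Trailing moving average over the current value and up to window-1 earlier ones."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)   # the window shrinks at the start of the series
        frame = values[start:i + 1]      # rows visible to the current row
        out.append(sum(frame) / len(frame))
    return out

# Made-up daily sales, already ordered by day.
daily_sales = [10, 20, 30, 40, 50]
averages = moving_average(daily_sales, 3)
print(averages)
```

The result depends on the order of the input, which is why a moving average needs the rows to be ordered within the window.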
1. Variables A, B, and C are significantly impacting sales and are effectively estimating sales.
2. Due to the R² of 0.10, the model is not valid; the linear regression should be re-run with all 15 variables forced into the model to increase the R².
4. Due to the R² of 0.10, the model is not valid; a different analytical model should be attempted.