Question: Which word or phrase completes the statement? Business Intelligence is to ad-hoc reporting and dashboards as Data Science is to ______________ .
Explanation: Data science skills are of vital and growing importance in commercial, governmental and not-for-profit organisations. Those in the management, risk, customer and IT functions increasingly need skills and/or literacy in this area. This course introduces a range of data mining tools and techniques as they are commonly used in business.
Question: What is a property of window functions in SQL?
1. They can be used to calculate moving averages over various intervals.
2. They group rows into a single output row.
4. They don't require ordering of data within a window.
Explanation: In statistics, a moving average (rolling average or running average) is a calculation that analyzes data points by creating a series of averages of different subsets of the full data set. It is also called a moving mean (MM) or rolling mean, and it is a type of finite impulse response filter. Variations include simple, cumulative, and weighted forms.
Given a series of numbers and a fixed subset size, the first element of the moving average is obtained by taking the average of the initial fixed subset of the number series. The subset is then modified by "shifting forward": the first number of the series is excluded and the next number following the original subset is included. This creates a new subset of numbers, which is averaged. The process is repeated over the entire data series. The plot line connecting all the (fixed) averages is the moving average. A moving average is therefore a set of numbers, each of which is the average of the corresponding subset of a larger set of data points. A moving average may also use unequal weights for each data value in the subset to emphasize particular values.
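The shifting-window procedure just described can be sketched in a few lines of Python. This is a minimal illustration of a simple (equal-weight) moving average only, assuming a positive integer window size; the function name `simple_moving_average` is hypothetical and not taken from any library.

```python
from collections import deque
from typing import Deque, Iterable, List


def simple_moving_average(values: Iterable[float], window: int) -> List[float]:
    """Return the simple moving average of `values` over a fixed window size.

    The first output is the mean of the first `window` values; each later
    output shifts the window forward by one element (drop the oldest value,
    add the next one).
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    buf: Deque[float] = deque()
    running_sum = 0.0
    averages: List[float] = []
    for x in values:
        buf.append(x)            # include the next number in the series
        running_sum += x
        if len(buf) > window:
            running_sum -= buf.popleft()  # exclude the oldest number
        if len(buf) == window:
            averages.append(running_sum / window)
    return averages


print(simple_moving_average([1, 2, 3, 4, 5, 6], window=3))
```

Keeping a running sum makes each step O(1) instead of re-averaging the whole window; for very long float series, recomputing the sum occasionally would limit rounding drift.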
A moving average is commonly used with time series data to smooth out short-term fluctuations and highlight longer-term trends or cycles. The threshold between short-term and long-term depends on the application, and the parameters of the moving average will be set accordingly. For example, it is often used in technical analysis of financial data, like stock prices, returns or trading volumes. It is also used in economics to examine gross domestic product, employment or other macroeconomic time series. Mathematically, a moving average is a type of convolution and so it can be viewed as an example of a low-pass filter used in signal processing. When used with non-time series data, a moving average filters higher frequency components without any specific connection to time, although typically some kind of ordering is implied. Viewed simplistically it can be regarded as smoothing the data.
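In SQL, a window function can compute this kind of moving average directly, which is the property the window-function question is testing. The sketch below runs the query through Python's built-in `sqlite3` module, assuming SQLite 3.25.0 or newer (the first version with window functions). The table `prices`, its columns, and the sample rows are hypothetical illustration data.

```python
import sqlite3

# Hypothetical sample data: (day, price). Not taken from any real source.
PRICES = [
    ("2024-01-01", 10.0),
    ("2024-01-02", 12.0),
    ("2024-01-03", 14.0),
    ("2024-01-04", 13.0),
    ("2024-01-05", 15.0),
]

# AVG() used as a window function: the frame is the current row plus the
# two preceding rows, ordered by day (a 3-row moving average).
QUERY = """
SELECT day,
       price,
       AVG(price) OVER (
           ORDER BY day
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS moving_avg_3
FROM prices
ORDER BY day
"""


def three_day_moving_average(prices):
    """Return (day, price, moving_avg_3) for every input row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE prices (day TEXT PRIMARY KEY, price REAL)")
        conn.executemany("INSERT INTO prices (day, price) VALUES (?, ?)", prices)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for day, price, avg in three_day_moving_average(PRICES):
    print(day, price, round(avg, 2))
```

Unlike `GROUP BY`, the window function keeps one output row per input row, and the `ORDER BY` inside `OVER (...)` determines which rows fall into each frame. The first two rows are averaged over fewer than three values because their frames are only partially filled.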
Question: You are attempting to find the Euclidean distance between two centroids. Centroid A's coordinates: (X = 2, Y = 4); Centroid B's coordinates: (X = 8, Y = 10). Which formula finds the correct Euclidean distance?
Explanation: In mathematics, the Euclidean distance or Euclidean metric is the "ordinary" (i.e., straight-line) distance between two points in Euclidean space. With this distance, Euclidean space becomes a metric space. The associated norm is called the Euclidean norm. Older literature refers to the metric as the Pythagorean metric. Especially when measuring distance in the plane, we use the Euclidean distance formula: the distance between two points with coordinates (x, y) and (a, b) is given by
$d = \sqrt{(x - a)^2 + (y - b)^2}$
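Applying the formula to the two centroids in the question gives d = √((8 − 2)² + (10 − 4)²) = √(36 + 36) = √72 = 6√2 ≈ 8.485. The Python sketch below does the same computation for points of any dimension; `euclidean_distance` is a hypothetical helper, cross-checked against the standard library's `math.dist` (Python 3.8+).

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


centroid_a = (2, 4)
centroid_b = (8, 10)
d = euclidean_distance(centroid_a, centroid_b)
print(d)  # sqrt((8-2)^2 + (10-4)^2) = sqrt(72), about 8.485
print(math.isclose(d, math.dist(centroid_a, centroid_b)))  # True
```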
1. Create clusters based on the data and use them as model inputs
2. Force all 15 variables into the model as independent variables
4. Break variables A, B, and C into their own univariate models
1. Divide the data into sets of 1,000 user profiles, and run K-means clustering in RHadoop iteratively.
2. Run MapReduce to transform the data, and find relevant key value pairs.
4. Partition the data by XML file size, and run K-means clustering in each partition.
1. Use the extracted text and your regular expression to perform a sentiment analysis based on mentions of the new product
2. Convert the extracted text into a suitable document representation and index into a review corpus
4. Group the reviews using Naive Bayesian classification