Question: A data scientist is asked to implement an article recommendation feature for an online magazine. The magazine does not want to use client tracking technologies such as cookies or reading history, so only the style and subject matter of the current article are available for making recommendations. All of the magazine's articles are stored in a database in a format suitable for analytics. Which method should the data scientist try first?
1. K Means Clustering
2. Naive Bayesian
3. …
4. Association Rules
Correct Answer: 1 (K Means Clustering)
Explanation: Without cookies or reading history there is no record of which articles are read together (which association rules need) and no labeled reader preferences (which a Naive Bayes classifier needs), but every article can still be described by features of its style and subject matter. Clustering those features groups similar articles, and the other articles in the current article's cluster are natural candidates to recommend. K-means uses an iterative algorithm that minimizes the sum of squared distances from each object to its cluster centroid, over all clusters. The algorithm moves objects between clusters until the sum cannot be decreased further. The result is a set of compact, well-separated clusters, although the solution is a local optimum that depends on the initial centroids. Implementations typically expose optional parameters that control the minimization, such as the initial cluster centroids and the maximum number of iterations. Clustering is primarily an exploratory technique for discovering hidden structure in data, possibly as a prelude to more focused analysis or decision processes. Specific applications of k-means include image processing, medical data analysis, and customer segmentation. Clustering is often used as a lead-in to classification: once the clusters are identified, labels can be applied to each cluster to classify each group based on its characteristics. Marketing and sales groups use k-means to identify customers who have similar behaviors and spending patterns.
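The following is a minimal pure-Python sketch of Lloyd's k-means applied to the recommendation scenario, assuming each article has already been reduced to numeric features. The article titles and the three feature scores (politics, science, long-form) are hypothetical stand-ins for whatever style and subject features the magazine's database holds, and seeding the centroids with the first k articles is a simplification; production implementations usually use k-means++ or several random restarts.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's algorithm: return the cluster index assigned to each point."""
    # Simplification (assumption): seed the centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def recommend(titles: List[str], features: List[Vector], current: str, k: int) -> List[str]:
    """Recommend the other articles that share the current article's cluster."""
    labels = kmeans(features, k)
    cluster = labels[titles.index(current)]
    return [t for t, lab in zip(titles, labels) if lab == cluster and t != current]


# Hypothetical articles scored on (politics, science, long-form) features.
titles = ["Election preview", "Mars rover update", "Budget analysis", "Vaccine explainer"]
features = [(0.9, 0.1, 0.8), (0.1, 0.9, 0.3), (0.8, 0.2, 0.9), (0.2, 0.8, 0.4)]

print(recommend(titles, features, "Election preview", k=2))  # ['Budget analysis']
```

Only the current article's features are needed at request time, which fits the magazine's no-tracking constraint; the clustering itself can be recomputed offline whenever new articles are added.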
Question: How are window functions different from regular aggregate functions?
1. Rows retain their separate identities and the window function can access more than the current row.
2. Rows are grouped into an output row and the window function can access more than the current row.
3. …
4. Rows are grouped into an output row and the window function can only access the current row.
Correct Answer: 1
Explanation: A window function performs a calculation across a set of table rows that are related to the current row. This is comparable to the calculation an aggregate function performs, but unlike a regular aggregate function, a window function does not cause rows to be grouped into a single output row: the rows retain their separate identities, and the summary result is attached to every row of the dataset. Behind the scenes, the window function can access more than just the current row of the query result. For example, RANK() assigns each row a rank based on the ordering of some attribute without collapsing the rows.
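A small sketch using Python's built-in sqlite3 module makes the contrast concrete: the GROUP BY query collapses rows, while the window-function query keeps one output row per input row. The sales table and its rows are hypothetical, and window functions require the underlying SQLite library to be version 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Ana", 300), ("East", "Ben", 500), ("West", "Cai", 200)],
)

# Regular aggregate: rows are collapsed into one output row per group.
grouped = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Window functions: every input row survives, and each row can see its partition.
windowed = conn.execute(
    """
    SELECT rep, region, amount,
           SUM(amount) OVER (PARTITION BY region) AS region_total,
           RANK() OVER (ORDER BY amount DESC) AS overall_rank
    FROM sales
    ORDER BY rep
    """
).fetchall()

print(grouped)   # [('East', 800), ('West', 200)]
print(windowed)  # [('Ana', 'East', 300, 800, 2), ('Ben', 'East', 500, 800, 1), ('Cai', 'West', 200, 200, 3)]
```

The grouped result has two rows for three sales, while the windowed result still has three, each carrying its region total and overall rank.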
Question: Consider these item sets:
(hat, scarf, coat)
(hat, scarf, coat, gloves)
(hat, scarf, gloves)
(hat, gloves)
(scarf, coat, gloves)
What is the confidence of the rule (hat, scarf) -> gloves?
Correct Answer: 2/3 (about 66.7%)
Explanation: Confidence measures how often X and Y appear together relative to how often X appears: confidence(X -> Y) = support(X ∪ Y) / support(X). Confidence can be used to judge how interesting a rule is. Three item sets contain both hat and scarf: (hat, scarf, coat), (hat, scarf, coat, gloves), and (hat, scarf, gloves). Two of them also contain gloves, so the confidence is 2/3 ≈ 66.7%.

As a second example, consider the following records:

Antecedent  Consequent
A           0
A           0
A           1
A           0
B           1
B           0
B           1

Here the antecedent is the input variable that we can control, and the consequent is the variable we are trying to predict. Real mining problems typically have more complex antecedents but usually focus on single-value consequents. Most mining algorithms would determine the following rules (targeting models), because these are simply the most common patterns in the data; a quick look at the table makes them obvious:

Rule 1: A implies 0
Rule 2: B implies 1

The confidence for Rule 1 is 3/4 because three of the four records with antecedent A have consequent 0. The confidence for Rule 2 is 2/3 because two of the three records with antecedent B have consequent 1.
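The confidence calculation can be checked with a short Python sketch that counts transactions containing the antecedent and, among those, the ones that also contain the consequent. The function name confidence is an illustrative choice, not a library API; the item sets are the ones from the question, and the antecedent/consequent records are treated as two-item sets.

```python
from typing import List, Set


def confidence(transactions: List[Set[str]], antecedent: Set[str], consequent: Set[str]) -> float:
    """confidence(X -> Y) = support(X and Y) / support(X), computed from counts."""
    with_x = [t for t in transactions if antecedent <= t]
    if not with_x:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    with_x_and_y = [t for t in with_x if consequent <= t]
    return len(with_x_and_y) / len(with_x)


baskets = [
    {"hat", "scarf", "coat"},
    {"hat", "scarf", "coat", "gloves"},
    {"hat", "scarf", "gloves"},
    {"hat", "gloves"},
    {"scarf", "coat", "gloves"},
]
print(round(confidence(baskets, {"hat", "scarf"}, {"gloves"}), 3))  # 0.667

# The antecedent/consequent table, with each record treated as a two-item set.
table = [("A", "0"), ("A", "0"), ("A", "1"), ("A", "0"), ("B", "1"), ("B", "0"), ("B", "1")]
records = [set(row) for row in table]
print(confidence(records, {"A"}, {"0"}))  # 0.75
print(round(confidence(records, {"B"}, {"1"}), 3))  # 0.667
```

Dividing counts gives the same value as dividing supports, because both supports share the same denominator (the total number of transactions).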
1. where only the subtotals are to be included in the output
2. where only the grand totals are to be included in the output
3. … in the output
4. where all subtotals and grand totals are to be included in the output