Question : You are working on a clustering solution for a customer dataset. A large number of variables is available for each customer, and data is available for a very large number of customers. You want to reduce the number of variables used for clustering. What would you do? A. Randomly reduce the number of variables. B. Find the correlation among the variables and discard the variables that are not correlated. C. Find the correlation among the variables and, from each group of highly correlated variables, keep only one or two. D. You cannot discard any variable when creating clusters. E. Combine several variables into one variable.
1. A,B 2. B,D 3. C,D 4. C,E 5. A,E
Correct Answer : 4 Explanation: When you apply a clustering technique and find that a very large number of variables is available, it is better to find the correlation among the variables and keep only one or two variables from each group of highly correlated variables, because highly correlated variables have nearly the same effect when the clusters are created. A scatter plot matrix of the variables can be used to spot the correlations. You can also combine several variables into a single variable. For example, if the dataset contains Asset and Debt, you can combine them into a Debt-to-Asset ratio and use that ratio when creating the clusters.
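The two ideas in the explanation above (keep one variable per highly correlated group, and combine variables into a ratio) can be sketched in plain Python. This is a minimal illustration, not a prescribed method: the customer values, the helper names `pearson` and `drop_correlated`, and the 0.9 threshold are all assumptions made for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def drop_correlated(columns, threshold=0.9):
    """Keep a variable only if it is not highly correlated with one already kept.

    The threshold of 0.9 is an assumed cut-off for "highly correlated".
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical customer variables (illustrative values only).
customers = {
    "income": [40, 55, 70, 85, 100],
    "spend":  [21, 27, 36, 42, 51],   # rises almost linearly with income
    "age":    [52, 23, 41, 35, 60],
}
assets = [200, 150, 400, 320, 500]
debts = [50, 90, 100, 240, 125]

# "spend" is dropped because it is highly correlated with "income".
kept = drop_correlated(customers)

# Combine two variables into one: the Debt-to-Asset ratio.
debt_to_asset = [d / a for d, a in zip(debts, assets)]

print(kept)
print(debt_to_asset)
```

Which variable survives from a correlated group depends on iteration order here; in practice that choice would normally be guided by domain knowledge.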
Question : You have patients' data with height and age, where age is in years and height is in meters. You want to create clusters using these two attributes, and you want age and height to have nearly equal effect on the clustering. What can you do? A. Add the numeric value 100 to each height. B. Convert each height value to centimeters. C. Divide both age and height by their respective standard deviations. D. Take the square root of height.
1. A,B 2. B,C 3. C,D 4. A,D 5. B,D
Correct Answer : 2 Explanation: Age in years takes values such as 50, 60, 70, or 90. When computing the distance from a centroid, the largest possible difference in age is about 90 - 0 = 90, and its square is 8100. Height in meters spans roughly 0.5 to 2, so the largest difference is about 2 - 0.5 = 1.5, and its square is only 2.25. Age therefore has far more effect on the distance than height. To bring height to a similar scale, you can convert it to centimeters: values then range up to about 200, so squared differences in height are of a magnitude comparable to those in age. Another approach is to divide each value by the standard deviation of its attribute, e.g. age / (standard deviation of age). The result is a unitless value, so the choice of units no longer affects the clustering.
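The effect of units on squared differences, and how dividing by the standard deviation removes it, can be checked with a short sketch. The patient values and the helper names `spread_squared` and `scale_by_sd` are hypothetical, chosen only for illustration.

```python
from statistics import stdev

def spread_squared(values):
    """Square of the largest difference between any two values."""
    return (max(values) - min(values)) ** 2

def scale_by_sd(values):
    """Divide each value by the sample standard deviation of its attribute."""
    sd = stdev(values)
    return [v / sd for v in values]

# Hypothetical patient data (illustrative values only).
ages = [50, 60, 70, 90]                  # years
heights_m = [1.55, 1.80, 1.62, 1.71]     # meters
heights_cm = [h * 100 for h in heights_m]

# Raw units: age dominates the squared distance.
print(spread_squared(ages), spread_squared(heights_m))

# Centimeters bring height much closer to the scale of age.
print(spread_squared(heights_cm))

# After dividing by the standard deviation, each attribute has unit spread
# and the original unit (meters or centimeters) no longer matters.
scaled_m = scale_by_sd(heights_m)
scaled_cm = scale_by_sd(heights_cm)
print(stdev(scale_by_sd(ages)), stdev(scaled_m))
```

Dividing by the standard deviation gives the same scaled heights whether the input was in meters or centimeters, which is why it is the unit-independent option.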
Question : Which of the following are true with regard to the K-Means clustering algorithm? A. Labels are not pre-assigned to the objects being clustered. B. Labels are pre-assigned to the objects being clustered. C. It classifies the data based on the labels. D. It discovers the center of each cluster. E. It finds which cluster each object falls into.
1. A,B,C 2. B,C,D 3. C,D,E 4. A,D,E 5. A,C,E
Correct Answer : 4 Explanation: Clustering does not require any predefined labels on the objects; it considers only the attributes of the objects. Hence option B is out. Clustering is different from classification, so you can discard option C as well. Because it does not use predefined labels, it is called unsupervised learning, and option A is correct. The main purpose of K-Means is to determine the center of each cluster and then measure the distance of each object from those centers. Each object is assigned to the cluster whose center is nearest to it. In the end you have the clusters and you know which cluster each object falls into, so options D and E are correct.
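The behaviour described above (no labels given, centers discovered, each object assigned to its nearest center) can be sketched as a small K-Means loop. This is a minimal sketch under stated assumptions: the points are made up, the function names are hypothetical, and the initial centers are simply the first two points rather than a recommended initialization.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, initial_centers, max_iter=100):
    """Alternate assignment and center-update steps until assignments stop changing."""
    centers = list(initial_centers)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_assignments = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged
        assignments = new_assignments
        # Update step: each center moves to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centers

# Hypothetical unlabeled 2-D objects (illustrative values only).
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]

# Assumed initialization: the first two points serve as starting centers.
assignments, centers = kmeans(points, [points[0], points[1]])

print(assignments)  # cluster index for each object
print(centers)      # discovered center of each cluster
```

The cluster indices are arbitrary identifiers produced by the algorithm, not class labels supplied with the data.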
1. Selects the values in vector v that are less than 1000 and assigns them to the vector nv 2. Sets nv to TRUE or FALSE depending on whether all elements of vector v are less than 1000 4. Selects values of vector v less than 1000, modifies v, and makes a copy to nv