The goal of clustering a set of data is to
divide them into groups of data that are near each other
choose the best data from the set
determine the nearest neighbors of each of the data
predict the class of data
The k-means algorithm...
always converges to a clustering that minimizes the mean-square vector-representative distance
can converge to different final clusterings, depending on the initial choice of representatives
is widely used in practice
is typically done by hand, using paper and pencil
should only be attempted by trained professionals
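As a worked illustration of how the starting representatives can matter, here is a minimal sketch of the standard assign-then-update iteration in plain Python. The four points, the two starting configurations, and the function name `kmeans` are all made up for this example and are not part of the quiz.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Run assign/update iterations from the given starting representatives."""
    for _ in range(max_iter):
        # Assignment step: index of the nearest representative for each point.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each representative becomes the mean of its group.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty group's representative
        if new_centroids == centroids:
            break  # nothing moved, so the iteration has converged
        centroids = new_centroids
    # Mean-square distance between each point and its representative.
    cost = sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels)) / len(points)
    return labels, centroids, cost

# Hypothetical data: corners of a 4-by-1 rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start A: one representative near each short side.
labels_a, _, cost_a = kmeans(points, [(0, 0.5), (4, 0.5)])
# Start B: one representative on each long side.
labels_b, _, cost_b = kmeans(points, [(2, 0), (2, 1)])

print(labels_a, cost_a)
print(labels_b, cost_b)
```

In this sketch both runs stop at a fixed point, but they group the points differently and the second run ends with a larger mean-square distance, so convergence alone does not guarantee the best clustering.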
The choice of k, the number of clusters to partition a set of data into,...
is a personal choice that shouldn't be discussed in public
depends on why you are clustering the data
should always be as large as your computer system can handle
has a maximum of 10
Which of the following statements about the K-means algorithm are correct?
The K-means algorithm is sensitive to outliers.
For different initializations, the K-means algorithm will definitely give the same clustering results.
The centroids in the K-means algorithm may not be any observed data points.
The K-means algorithm can detect non-convex clusters.
Considering the K-median algorithm, if points (0, 3), (2, 1), and (-2, 2) are the only points which are assigned to the first cluster now, what is the new centroid for this cluster?
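The update step in the question above can be sketched in a few lines, assuming the usual K-median convention that the new centroid is the coordinate-wise median of the assigned points. The variable names are illustrative only.

```python
from statistics import median

# Points currently assigned to the first cluster (from the question).
points = [(0, 3), (2, 1), (-2, 2)]

# Take the median of the x-coordinates and of the y-coordinates separately.
new_centroid = tuple(median(coord) for coord in zip(*points))
print(new_centroid)
```

Note that under this convention the resulting centroid does not have to be one of the assigned points.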
Considering the K-means algorithm, after current iteration, we have 3 centroids (0, 1) (2, 1), (-1, 2). Will points (2, 3) and (2, 0.5) be assigned to the same cluster in the next iteration?
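The assignment step in the question above can be checked with a short sketch, assuming Euclidean distance and nearest-centroid assignment. The helper name `nearest` is hypothetical.

```python
import math

# Centroids after the current iteration (from the question).
centroids = [(0, 1), (2, 1), (-1, 2)]

def nearest(point):
    """Return the index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

a = nearest((2, 3))
b = nearest((2, 0.5))
print(a, b, a == b)
```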
The Iris dataset contains information about Iris setosa and versicolor. What is the Euclidean distance between these two objects?
Which of the following statements are true?
Graphs, time-series data, text, and multimedia data are all examples of data types on which cluster analysis can be performed.
Agglomerative clustering is an example of a hierarchical and distance-based clustering method.
When dealing with high-dimensional data, we sometimes consider only a subset of the dimensions when performing cluster analysis.
We can only visualize the clustering results when the data is 2-dimensional.
Which of the following statements are true?
Clustering analysis is unsupervised learning since it does not require labeled training data.
It is impossible to cluster objects in a data stream. We must have all the data objects that we need to cluster ready before clustering can be performed.
Clustering analysis has a wide range of applications in tasks such as data summarization, dynamic trend detection, multimedia analysis, and biological network analysis.
When clustering, we want to put two dissimilar data objects into the same cluster.
What are some common considerations and requirements for cluster analysis?
We need to consider how to incorporate user preference for cluster size and shape into the clustering algorithm.
In order to perform cluster analysis, we need to have a similarity measure between data objects.
We need to be able to handle a mixture of different types of attributes (e.g., numerical, categorical).
We must know the number of output clusters a priori for all clustering algorithms.
What are the two types of hierarchical clustering?
Top-Down Clustering (Divisive)
Bottom-Up Clustering (Agglomerative)
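To make the agglomerative direction concrete, here is a minimal sketch of single-linkage merging on one-dimensional values. The data, the choice of single linkage, and the function name `single_linkage_merges` are assumptions made for illustration. Every value starts as its own cluster, and the two closest clusters are merged until one remains.

```python
def single_linkage_merges(values):
    """Return the sequence of merges as (cluster_a, cluster_b, distance)."""
    clusters = [[v] for v in values]  # every value starts in its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest closest-member distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical data.
merges = single_linkage_merges([1, 2, 6, 7.5, 15])
for left, right, dist in merges:
    print(left, right, dist)
```

The merge distances recorded here are the heights at which a tree diagram of the hierarchy would join its branches. A divisive procedure runs the other way, starting from one cluster and splitting it.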
What is a Dendrogram?
A tree diagram used to illustrate the arrangement of clusters in hierarchical clustering.
A tree diagram used to illustrate the arrangement of clusters in partitional clustering.
A type of hierarchical clustering.
A type of bar chart diagram to visualize k-means clusters.
The most important part of _____ is selecting the variables on which clustering is based.
interpreting and profiling clusters
selecting a clustering procedure
assessing the validity of clustering
formulating the clustering problem
The most commonly used measure of similarity is the _____ or its square.
_____ is a clustering procedure where all objects start out in one giant cluster. Clusters are formed by dividing this cluster into smaller and smaller clusters.
Which of the following is required by K-means clustering?
defined distance metric
number of clusters
initial guess as to cluster centroids
all answers are correct
In the figure above, if you draw a horizontal line at y = 2, how many clusters will be formed?
For which of the following tasks might clustering be a suitable approach?
Given sales data from a large number of products in a supermarket, estimate future sales for each of these products.
Given a database of information about your users, automatically group them into different market segments.
From the user's usage patterns on a website, identify different user groups.
Given historical weather records, predict if tomorrow's weather will be sunny or rainy.
K-means is an iterative algorithm, and two of the following steps are repeatedly carried out in its inner-loop. Which two?
Assign each point to its nearest cluster
Test on the cross-validation set
Update the cluster centroids based on the current assignment
Using the elbow method to choose K
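For reference, a single pass of an assignment step followed by a centroid update might look like the sketch below. The points, the starting centroids, and the helper names `assign` and `update` are invented for this example, and Euclidean distance is assumed.

```python
import math

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]

def update(points, labels, centroids):
    """Move each centroid to the mean of the points labeled with it."""
    moved = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            moved.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            moved.append(old)  # a centroid with no points stays where it is
    return moved

# Hypothetical data and starting centroids.
points = [(1, 1), (1, 2), (5, 5), (6, 5)]
centroids = [(0, 0), (5, 4)]

labels = assign(points, centroids)
centroids = update(points, labels, centroids)
print(labels, centroids)
```

Repeating these two calls until the labels stop changing gives the full iteration. Steps such as choosing K or evaluating on held-out data happen outside this loop.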
Clustering should be done on samples of 300 or more.