20 questions

- Multiple Choice
The goal of clustering a set of data is to

divide them into groups of data that are near each other

choose the best data from the set

determine the nearest neighbors of each of the data

predict the class of data

- Multiple Choice
The k-means algorithm...

always converges to a clustering that minimizes the mean-square vector-representative distance

can converge to different final clusterings, depending on the initial choice of representatives

is widely used in practice

is typically done by hand, using paper and pencil

should only be attempted by trained professionals
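The dependence on the initial representatives can be seen in a small sketch. The four points, the two starting configurations, and the helper names `k_means` and `sse` below are all illustrative assumptions, not part of the quiz; the code runs plain Lloyd iterations and compares the sum of squared distances reached from each start.

```python
from math import dist

def k_means(points, centroids, max_iter=100):
    """Plain Lloyd iterations; returns (centroids, labels)."""
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        labels = [min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
                  for p in points]
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            new_centroids.append(
                tuple(sum(c) / len(members) for c in zip(*members)) if members else old
            )
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Corners of a 4 x 1 rectangle (hypothetical data).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

good_c, good_l = k_means(points, [(0, 0), (4, 0)])   # start on the left/right sides
poor_c, poor_l = k_means(points, [(2, 0), (2, 1)])   # start on the bottom/top edges

print(good_l, sse(points, good_c, good_l))  # [0, 0, 1, 1] 1.0
print(poor_l, sse(points, poor_c, poor_l))  # [0, 1, 0, 1] 16.0
```

Both runs stop at a fixed point, but the second one settles on a clustering with a much larger objective value, which is why implementations commonly restart from several initializations and keep the best result.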

- Multiple Choice
The choice of k, the number of clusters to partition a set of data into,...

is a personal choice that shouldn't be discussed in public

depends on why you are clustering the data

should always be as large as your computer system can handle

has a maximum of 10

- Multiple Choice
Which of the following statements about the K-means algorithm are correct?

The K-means algorithm is sensitive to outliers.

For different initializations, the K-means algorithm will definitely give the same clustering results.

The centroids in the K-means algorithm may not be any observed data points.

The K-means algorithm can detect non-convex clusters.

- Multiple Choice
Considering the K-median algorithm, if points (0, 3), (2, 1), and (-2, 2) are the only points which are assigned to the first cluster now, what is the new centroid for this cluster?

(0,2)

(2,1)

(2,0)

(1,2)
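As a worked check, the K-median update can be sketched as a coordinate-wise median, assuming the usual Manhattan-distance formulation of K-median. The helper name `k_median_centroid` is hypothetical.

```python
from statistics import median

def k_median_centroid(points):
    """Coordinate-wise median of a list of equal-length tuples."""
    return tuple(median(coords) for coords in zip(*points))

cluster = [(0, 3), (2, 1), (-2, 2)]
centroid = k_median_centroid(cluster)
print(centroid)  # (0, 2)
```

The x-coordinates -2, 0, 2 have median 0 and the y-coordinates 1, 2, 3 have median 2. Note that (0, 2) is not one of the three input points.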

- Multiple Choice
Considering the K-means algorithm, after the current iteration, we have 3 centroids: (0, 1), (2, 1), and (-1, 2). Will points (2, 3) and (2, 0.5) be assigned to the same cluster in the next iteration?

Yes

No
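The assignment step behind this question can be sketched in a few lines, assuming Euclidean distance (the standard choice for K-means). The helper name `nearest_centroid` is hypothetical.

```python
from math import dist

def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` in Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))

centroids = [(0, 1), (2, 1), (-1, 2)]
a = nearest_centroid((2, 3), centroids)    # distances: about 2.83, 2.0, about 3.16
b = nearest_centroid((2, 0.5), centroids)  # distances: about 2.06, 0.5, about 3.35
print(a, b, a == b)  # 1 1 True
```

Both points are closest to the centroid (2, 1), so they land in the same cluster.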

- Multiple Choice
The Iris dataset contains information about Iris setosa and versicolor. What is the Euclidean distance between these two objects?

2.8

4.6

22.6

-3.6

- Multiple Choice
Which of the following statements are true?

Graphs, time-series data, text, and multimedia data are all examples of data types on which cluster analysis can be performed.

Agglomerative clustering is an example of a hierarchical and distance-based clustering method.

When dealing with high-dimensional data, we sometimes consider only a subset of the dimensions when performing cluster analysis.

We can only visualize the clustering results when the data is 2-dimensional.

- Multiple Choice
Which of the following statements are true?

Clustering analysis is unsupervised learning since it does not require labeled training data.

It is impossible to cluster objects in a data stream. We must have all the data objects that we need to cluster ready before clustering can be performed.

Clustering analysis has a wide range of applications in tasks such as data summarization, dynamic trend detection, multimedia analysis, and biological network analysis.

When clustering, we want to put two dissimilar data objects into the same cluster.

- Multiple Choice
What are some common considerations and requirements for cluster analysis?

We need to consider how to incorporate user preference for cluster size and shape into the clustering algorithm.

In order to perform cluster analysis, we need to have a similarity measure between data objects.

We need to be able to handle a mixture of different types of attributes (e.g., numerical, categorical).

We must know the number of output clusters a priori for all clustering algorithms.

- Multiple Choice
**What are the two types of Hierarchical Clustering?**

Top-Down Clustering (Divisive)

Bottom-Up Clustering (Agglomerative)

Dendrogram

K-means

- Multiple Choice
**What is a Dendrogram?**

A tree diagram used to illustrate the arrangement of clusters in hierarchical clustering.

A tree diagram used to illustrate the arrangement of clusters in partitional clustering.

A type of hierarchical clustering.

A type of bar chart diagram to visualize k-means clusters.

- Multiple Choice
The most important part of _____ is selecting the variables on which clustering is based.

interpreting and profiling clusters

selecting a clustering procedure

assessing the validity of clustering

formulating the clustering problem

- Multiple Choice
The most commonly used measure of similarity is the _____ or its square.

Euclidean distance

city-block distance

Chebyshev's distance

Manhattan distance
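For reference, the distance measures named in these options can be compared on a single pair of points. This is a minimal sketch: the points `p` and `q` and the helper names `city_block` and `chebyshev` are illustrative assumptions. City-block and Manhattan distance are two names for the same measure.

```python
from math import dist

def city_block(p, q):
    """City-block (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    """Chebyshev distance: largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(p, q))

p, q = (0, 0), (3, 4)
euclidean = dist(p, q)
print(euclidean, euclidean ** 2)  # 5.0 25.0
print(city_block(p, q))           # 7
print(chebyshev(p, q))            # 4
```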

- Multiple Choice
_____ is a clustering procedure where all objects start out in one giant cluster. Clusters are formed by dividing this cluster into smaller and smaller clusters.

Non-hierarchical clustering

Divisive clustering

Agglomerative clustering

K-means clustering

- Multiple Choice
Which of the following is required by K-means clustering?

defined distance metric

number of clusters

initial guess as to cluster centroids

all answers are correct

- Multiple Choice
**In the figure above, if you draw a horizontal line at y = 2, how many clusters will be formed?**

2

4

3

5

- Multiple Choice
For which of the following tasks might clustering be a suitable approach?

Given sales data from a large number of products in a supermarket, estimate future sales for each of these products.

Given a database of information about your users, automatically group them into different market segments.

From the user's usage patterns on a website, identify different user groups.

Given historical weather records, predict if tomorrow's weather will be sunny or rainy.

- Multiple Choice
K-means is an iterative algorithm, and two of the following steps are repeatedly carried out in its inner-loop. Which two?

Assign each point to its nearest cluster

Test on the cross-validation set

Update the cluster centroids based on the current assignment

Using the elbow method to choose K
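One pass through the K-means inner loop can be sketched as two small functions, one for each repeated step. The function names `assign` and `update`, the sample points, and the starting centroids are illustrative assumptions. Choosing K and evaluating on held-out data happen outside this loop.

```python
from math import dist

def assign(points, centroids):
    """Assignment step: label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            for p in points]

def update(points, labels, centroids):
    """Update step: move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(old)  # assumption: an empty cluster keeps its centroid
    return new_centroids

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids = [(0, 0), (10, 10)]

labels = assign(points, centroids)
centroids = update(points, labels, centroids)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```

Repeating these two calls until the labels stop changing gives the full algorithm.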

- Multiple Choice
Clustering should be done on samples of 300 or more.

False

True