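<figcaption>36 of the 500+ Type Ia supernovae discovered by the Sloan Supernova Survey</figcaption>

- CDs: the $i^{th}$ component is 1 if and only if the $i^{th}$ customer bought the CD.
- Documents: the $i^{th}$ component is 1 if and only if the $i^{th}$ word (in some order) appears in the document.
- A document viewed as a set of words is a point in the space of words (coordinate $i$ records whether word $i$ appears in the doc), that is, a vector in the space of words.

=== "Hierarchical"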
- Agglomerative (bottom up): initially each point is its own cluster; repeatedly combine the two nearest clusters.
- Divisive (top down): start with one cluster and recursively split it.
- Key operation: repeatedly combine the two nearest clusters.
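
The agglomerative procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes Euclidean points, measures the nearness of two clusters by the distance between their centroids, and stops when a requested number of clusters remains. The function names and the `num_clusters` stopping rule are assumptions made for the example.

```python
import math
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def agglomerative(points, num_clusters):
    """Bottom-up clustering: merge the two nearest clusters until
    num_clusters remain (assumed stopping rule)."""
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > num_clusters:
        # Key operation: find the pair of clusters with the closest centroids.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: math.dist(
                centroid(clusters[ij[0]]), centroid(clusters[ij[1]])
            ),
        )
        # combinations() yields i < j, so deleting j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
result = agglomerative(points, 2)
```

Recomputing every centroid on each pass keeps the sketch short but is slow; it is only meant to make the "combine two nearest clusters" step concrete.
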
=== "Point assignment"
- Maintain a set of clusters.
- Each point belongs to the nearest cluster.
Three important questions:
</ol></summary>
centroid = average of its (data) points; clustroid = (data) point closest to the other points </details>
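
To make the distinction concrete, here is a small sketch that computes both representatives for one cluster. It assumes Euclidean distance and reads "closest to other points" as "smallest sum of distances to the other points"; other readings (smallest maximum distance, smallest sum of squared distances) are equally plausible, so treat that choice as an assumption.

```python
import math


def centroid(points):
    """Average of the points; usually not itself a data point."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def clustroid(points):
    """Data point with the smallest total distance to the other points
    (assumed notion of 'closest')."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


cluster = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (6.0, 1.0)]
c = centroid(cluster)   # an averaged location
m = clustroid(cluster)  # always one of the original points
```

The clustroid only needs a distance function, so it remains usable when averaging points has no meaning.
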
</ol></summary>
centroids, clustroid </details>
</ol></summary>

- Euclidean space/distance
- $k$, the number of clusters
- $k$ clusters
- $k$, looking at the change in the average distance to centroid as $k$ increases
- $k$ clusters
- dispersed set of points
- $k$ points
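
The following sketch ties these fragments together under stated assumptions: a k-means-style loop in Euclidean space, initial points chosen as a dispersed set (first point at random, then repeatedly the point farthest from those already chosen), and the average distance to the nearest centroid reported for several values of $k$. The helper names, the fixed number of rounds, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random


def pick_dispersed(points, k, rng):
    """Pick k starting points: the first at random, then repeatedly the point
    whose minimum distance to the already chosen points is largest."""
    chosen = [rng.choice(points)]
    while len(chosen) < k:
        chosen.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in chosen))
        )
    return chosen


def kmeans(points, k, rng, rounds=20):
    """Assign each point to its nearest centroid, then recompute centroids."""
    centroids = pick_dispersed(points, k, rng)
    for _ in range(rounds):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids


def avg_distance_to_centroid(points, centroids):
    """Mean distance from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)


rng = random.Random(0)
# Synthetic data: three well-separated blobs of 30 points each.
points = [
    (rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
    for cx, cy in [(0, 0), (5, 0), (0, 5)]
    for _ in range(30)
]
avg = {k: avg_distance_to_centroid(points, kmeans(points, k, rng)) for k in range(1, 6)}
```

On data like this, the average distance should fall sharply until $k$ reaches the true number of blobs and change little afterwards, which is the signal to look for when trying different values of $k$.
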



- SUM: the vector whose $i^{th}$ component is the sum of the coordinates of the points in the $i^{th}$ dimension
- SUMSQ: the vector whose $i^{th}$ component is the sum of squares of the coordinates in the $i^{th}$ dimension

BFR suggests two approaches

N, SUM, and SUMSQ allow us to make that calculation quickly.
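
As an illustration of why these three statistics are convenient, the sketch below derives a cluster's centroid (`SUM / N`) and per-dimension variance (`SUMSQ / N - (SUM / N)^2`) from them, and shows that two summaries merge by simple addition. Whether this is exactly the calculation the sentence above refers to is an assumption, and the function names are hypothetical.

```python
def summarize(points):
    """Return (N, SUM, SUMSQ) for a list of equal-length tuples."""
    n = len(points)
    total = tuple(sum(coords) for coords in zip(*points))
    total_sq = tuple(sum(x * x for x in coords) for coords in zip(*points))
    return n, total, total_sq


def centroid_and_variance(n, total, total_sq):
    """Centroid and per-dimension variance from the summary alone."""
    centroid = tuple(s / n for s in total)
    variance = tuple(sq / n - (s / n) ** 2 for s, sq in zip(total, total_sq))
    return centroid, variance


def merge(a, b):
    """Summaries of two point sets combine by component-wise addition."""
    return (
        a[0] + b[0],
        tuple(x + y for x, y in zip(a[1], b[1])),
        tuple(x + y for x, y in zip(a[2], b[2])),
    )


stats = summarize([(1.0, 2.0), (3.0, 2.0), (5.0, 8.0)])
center, var = centroid_and_variance(*stats)
merged = merge(summarize([(1.0, 2.0)]), summarize([(3.0, 2.0), (5.0, 8.0)]))
```

Because the summary has a fixed size per cluster, the original points do not need to be kept in memory to update or query it.
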