Fall 2025
\[ z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} \]
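As a minimal sketch of this standardization, the snippet below computes z-scores column by column with NumPy. The data values and the helper name `standardize` are made up for illustration, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the formula does not say which estimator is meant.

```python
import numpy as np

def standardize(X):
    """Column-wise z-scores: (value - mean) / standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Assumption: sample standard deviation (ddof=1).
    sd = X.std(axis=0, ddof=1)
    return (X - mean) / sd

# Hypothetical data: two features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])
Z = standardize(X)
print(Z)
```

After standardizing, both columns are on the same scale, so neither feature dominates a Euclidean distance simply because of its units.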
Euclidean \(\operatorname{dist}(\text{pink}, \text{gray}) = 0.13\)
How close is blue to the pink-gray cluster? The rule we use to measure this is called the linkage:
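To make the idea concrete, here is a small sketch of three commonly used linkages (single, complete, and average), assuming Euclidean distance. The coordinates for pink, gray, and blue are hypothetical and do not reproduce the distances quoted in these notes; `linkage_distances` is an illustrative helper, not a library function.

```python
import numpy as np

def linkage_distances(point, cluster):
    """Distance from one point to a cluster under three common linkages."""
    point = np.asarray(point, dtype=float)
    cluster = np.asarray(cluster, dtype=float)
    # Euclidean distance from the point to every member of the cluster.
    d = np.linalg.norm(cluster - point, axis=1)
    return {
        "single": float(d.min()),     # closest member
        "complete": float(d.max()),   # farthest member
        "average": float(d.mean()),   # mean over all members
    }

# Hypothetical coordinates, chosen only so the arithmetic is easy to follow.
pink = [0.0, 0.0]
gray = [0.0, 4.0]
blue = [3.0, 4.0]

result = linkage_distances(blue, [pink, gray])
print(result)
```

The same point can look near or far from a cluster depending on which linkage is chosen, which is why the choice affects the resulting dendrogram.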
Then the agglomerative coefficient is: \[ \text{AC} = \frac{1}{n} \sum_{i=1}^n \frac{l_\text{max} - l(i)}{l_\text{max}} = \frac{1}{n} \sum_{i=1}^n \left(1 - \frac{l(i)}{l_\text{max}} \right) \]
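The formula can be sketched in code as follows. This assumes that \(l(i)\) is the height at which observation \(i\) is first merged into a cluster and that \(l_\text{max}\) is the height of the final merge. The naive clustering loop and the function name `agglomerative_coefficient` are illustrative only and are far slower than a library implementation.

```python
import numpy as np

def agglomerative_coefficient(X, linkage="average"):
    """Naive agglomerative clustering that returns the AC.

    Assumes l(i) = height of the first merge involving observation i,
    and l_max = height of the final merge.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances between observations.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    agg = {"single": np.min, "complete": np.max, "average": np.mean}[linkage]

    clusters = [[i] for i in range(n)]
    first_merge = np.zeros(n)
    height = 0.0
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = agg(D[np.ix_(clusters[p], clusters[q])])
                if best is None or d < best[0]:
                    best = (d, p, q)
        height, p, q = best
        # Record l(i) for observations that are merged for the first time.
        for c in (clusters[p], clusters[q]):
            if len(c) == 1:
                first_merge[c[0]] = height
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected

    l_max = height  # height of the last merge
    return float(np.mean(1.0 - first_merge / l_max))

# Hypothetical data: two tight pairs that are far apart.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
ac = agglomerative_coefficient(X, linkage="single")
print(ac)
```

Values near 1 indicate that most observations merge at heights that are small relative to the final merge, which suggests strong clustering structure.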
How many meaningful factors can be observed in the plot?
That is, we find:
To find the cluster centers, we want to minimize the within-cluster variation. Let \(C_k\) be the set of observations in cluster \(k\) and \(\mu_k\) its center: \[ W(C_k) = \sum_{i \in C_k} \|x_i - \mu_k\|_2^2 \]
The total within-cluster variation is \[ \sum_{k = 1}^K W(C_k) \]
We want to minimize this by choosing \(\mu_k\) and \(C_k\).
Eventually this will converge to a local optimum (not necessarily the global one), and you can stop.
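A minimal sketch of one way to carry out this minimization is shown below. It assumes the usual alternating scheme (Lloyd's algorithm): assign each observation to its nearest center, then recompute each center as the mean of its cluster, and stop when the assignments no longer change. The function names, the random initialization, and the toy data are assumptions made for illustration.

```python
import numpy as np

def total_within_variation(X, labels, centers):
    """Sum over clusters of W(C_k) = sum of squared distances to mu_k."""
    return float(np.sum((X - centers[labels]) ** 2))

def kmeans(X, K, n_iter=100, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initialize centers at K distinct observations chosen at random.
    centers = X[rng.choice(len(X), size=K, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: a local optimum
        labels = new_labels
        # Update step: each center becomes the mean of its cluster.
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[k] = members.mean(axis=0)
    return labels, centers

# Hypothetical data: two well-separated groups.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centers = kmeans(X, K=2)
W = total_within_variation(X, labels, centers)
print(labels, W)
```

Because the result depends on the starting centers, it is common to run the procedure from several random starts and keep the solution with the smallest total within-cluster variation.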
Then \[ s(i) = \frac{b(i) - a(i)}{\max\{ a(i), b(i) \}} \] (or \(s(i) = 0\) if there are no other points in the same cluster)
The average of \(s(i)\) over all observations is the silhouette score.
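The calculation can be sketched as follows, assuming the standard definitions: \(a(i)\) is the mean distance from observation \(i\) to the other points in its own cluster, and \(b(i)\) is the smallest mean distance from \(i\) to the points of any other cluster. Euclidean distance, the function name, and the toy data are assumptions for illustration; the sketch also assumes at least two clusters.

```python
import numpy as np

def silhouette_score(X, labels):
    """Average silhouette s(i) over all observations (needs >= 2 clusters)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            continue  # singleton cluster: s(i) = 0
        a = D[i, same].mean()
        b = min(D[i, labels == k].mean()
                for k in np.unique(labels) if k != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())

# Hypothetical one-dimensional data with a singleton cluster.
X = np.array([[0.0], [1.0], [5.0]])
labels = np.array([0, 0, 1])
score = silhouette_score(X, labels)
print(score)
```

Scores close to 1 mean points sit much nearer to their own cluster than to the next closest one; scores near 0 or below suggest overlapping or poorly assigned clusters.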