k-means clustering

Posted: 2016-06-28 , Modified: 2016-06-28

Tags: clustering

Reference

wikipedia (??Feature learning)

The \(k\)-means clustering is the following. (Lloyd’s algorithm) Let data points be \(x^{(i)}\).

  1. Initialize \(\mu_1,\ldots, \mu_k\).
  2. Repeat until convergence.
    1. Let \(c(i) = \amin_j |x^{(i)}-\mu_j|\).
    2. Let \(\mu_j = \EE_{c(i)=j} x^{(i)}\).

This is alternating minimization on the distortion function \[J(c,\mu) = \sumo im \ve{x^{(i)}-\mu_{c(i)}}^2\] with respect to \(c\) and \(\mu\). This value decreases and so converges (possible to local optimum); in practice, the \(\mu\)’s converge.

(Objective function \(\amin_S \sumo ik \sum_{x\in S_i} \ve{x-\mu_i}^2\).)

Questions

Are there provable guarantees on \(k\)-means clustering—if so, what? And/or is there a (worst-case) hardness result?

See also k-NN.