Abstract
Clustering is one of the most important tools for analysis of large datasets,and perhaps the most popular clustering algorithm is Lloyd's algorithm for$k$-means. This algorithm takes $n$ vectors$V=[v_1,\dots,v_n]\in\mathbb{R}^{d\times n}$ and outputs $k$ centroids$c_1,\dots,c_k\in\mathbb{R}^d$; these partition the vectors into clusters basedon which centroid is closest to a particular vector. We present a classical$\varepsilon$-$k$-means algorithm that performs an approximate version of oneiteration of Lloyd's algorithm with time complexity$\tilde{O}\big(\frac{\|V\|_F^2}{n}\frac{k^{2}d}{\varepsilon^2}(k +\log{n})\big)$, exponentially improving the dependence on the data size $n$ andmatching that of the "$q$-means" quantum algorithm originally proposed byKerenidis, Landman, Luongo, and Prakash (NeurIPS'19). Moreover, we propose animproved $q$-means quantum algorithm with time complexity$\tilde{O}\big(\frac{\|V\|_F}{\sqrt{n}}\frac{k^{3/2}d}{\varepsilon}(\sqrt{k}+\sqrt{d})(\sqrt{k}+ \log{n})\big)$ that quadratically improves the runtime of our classical$\varepsilon$-$k$-means algorithm in several parameters. Our quantum algorithmdoes not rely on quantum linear algebra primitives of prior work, but insteadonly uses QRAM to prepare simple states based on the current iteration'sclusters and multivariate quantum amplitude estimation. Finally, we provideclassical and quantum query lower bounds, showing that our algorithms areoptimal in most parameters.