Markus Maier

How the result of graph clustering methods depends on the construction of the graph

Feb 10, 2011

Markus Maier, Ulrike von Luxburg, Matthias Hein

Abstract: We study the scenario of graph-based clustering algorithms such as spectral clustering. Given a set of data points, one first has to construct a graph on the data points and then apply a graph clustering algorithm to find a suitable partition of the graph. Our main question is if and how the construction of the graph (choice of the graph, choice of parameters, choice of weights) influences the outcome of the final clustering result. To this end we study the convergence of cluster quality measures such as the normalized cut or the Cheeger cut on various kinds of random geometric graphs as the sample size tends to infinity. It turns out that the limit values of the same objective function are systematically different on different types of graphs. This implies that clustering results systematically depend on the graph and can be very different for different types of graph. We provide examples to illustrate the implications on spectral clustering.
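
For readers unfamiliar with the two quality measures named in the abstract, the following is a minimal sketch of how the normalized cut and the Cheeger cut of a two-way partition are commonly defined on a weighted graph. It is not taken from the paper: the function name `cut_measures`, the toy graph, and the use of NumPy are assumptions made for illustration only.

```python
import numpy as np

def cut_measures(W, in_A):
    """Return (normalized cut, Cheeger cut) of the partition (A, complement of A).

    W    : symmetric (n, n) matrix of non-negative edge weights (hypothetical input)
    in_A : boolean vector of length n, True for vertices in A
    """
    W = np.asarray(W, dtype=float)
    in_A = np.asarray(in_A, dtype=bool)
    # cut(A, A^c): total weight of edges crossing the partition
    cut = W[np.ix_(in_A, ~in_A)].sum()
    # vol(S): sum of the degrees of the vertices in S
    degrees = W.sum(axis=1)
    vol_A = degrees[in_A].sum()
    vol_B = degrees[~in_A].sum()
    ncut = cut * (1.0 / vol_A + 1.0 / vol_B)
    cheeger = cut / min(vol_A, vol_B)
    return ncut, cheeger

# Toy graph (an assumption for illustration): two triangles joined by one edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
in_A = np.array([True, True, True, False, False, False])
ncut, cheeger = cut_measures(W, in_A)
```

Both values depend on the weight matrix `W`, so building the graph differently on the same data points (another graph type, parameter, or weighting) changes the objective being optimized, which is the dependence the abstract studies.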

Optimal construction of k-nearest neighbor graphs for identifying noisy clusters

Dec 17, 2009

Markus Maier, Matthias Hein, Ulrike von Luxburg

Abstract: We study clustering algorithms based on neighborhood graphs on a random sample of data points. The question we ask is how such a graph should be constructed in order to obtain optimal clustering results. Which type of neighborhood graph should one choose, mutual k-nearest neighbor or symmetric k-nearest neighbor? What is the optimal parameter k? In our setting, clusters are defined as connected components of the t-level set of the underlying probability distribution. Clusters are said to be identified in the neighborhood graph if connected components in the graph correspond to the true underlying clusters. Using techniques from random geometric graph theory, we prove bounds on the probability that clusters are identified successfully, both in a noise-free and in a noisy setting. Those bounds lead to several conclusions. First, k has to be chosen surprisingly high (of the order n rather than of the order log n) to maximize the probability of cluster identification. Second, the major difference between the mutual and the symmetric k-nearest neighbor graph occurs when one attempts to detect the most significant cluster only.

* Theoretical Computer Science, 410(19), 1749-1764, April 2009
* 31 pages, 2 figures
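
The two graph types compared in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' construction: the function name `knn_graphs`, the Euclidean distance, the brute-force neighbor search, and the sample points are all hypothetical choices.

```python
import numpy as np

def knn_graphs(X, k):
    """Return boolean adjacency matrices (symmetric kNN, mutual kNN).

    X : (n, d) array of data points (hypothetical input)
    k : number of nearest neighbors per point
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances; a point is never its own neighbor.
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    # directed[i, j] is True if j is among the k nearest neighbors of i.
    nbrs = np.argsort(dist, axis=1)[:, :k]
    directed = np.zeros((n, n), dtype=bool)
    directed[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    # Symmetric kNN: connect i and j if either is a neighbor of the other.
    symmetric = directed | directed.T
    # Mutual kNN: connect i and j only if each is a neighbor of the other.
    mutual = directed & directed.T
    return symmetric, mutual

# Four points on a line (an assumption for illustration), with k = 1.
X = np.array([[0.0], [1.0], [2.5], [10.0]])
symmetric, mutual = knn_graphs(X, k=1)
n_symmetric_edges = int(symmetric.sum()) // 2
n_mutual_edges = int(mutual.sum()) // 2
```

Every mutual kNN edge is also a symmetric kNN edge, so the mutual graph is always at least as sparse. In this toy sample the distant point at 10.0 stays attached in the symmetric graph but is isolated in the mutual graph, which hints at why the two graphs can differ in their connected components.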
