
Title: Exact and Approximate Hierarchical Clustering Using A*

Apr 14, 2021

Craig S. Greenberg, Sebastian Macaluso, Nicholas Monath, Avinava Dubey, Patrick Flaherty, Manzil Zaheer, Amr Ahmed, Kyle Cranmer, Andrew McCallum



Abstract: Hierarchical clustering is a critical task in numerous domains. Many approaches are based on heuristics and the properties of the resulting clusterings are studied post hoc. However, in several applications, there is a natural cost function that can be used to characterize the quality of the clustering. In those cases, hierarchical clustering can be seen as a combinatorial optimization problem. To that end, we introduce a new approach based on A* search. We overcome the prohibitively large search space by combining A* with a novel trellis data structure. This combination results in an exact algorithm that scales beyond previous state of the art, from a search space with $10^{12}$ trees to $10^{15}$ trees, and an approximate algorithm that improves over baselines, even in enormous search spaces that contain more than $10^{1000}$ trees. We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks. We describe how our method provides significantly improved theoretical bounds on the time and space complexity of A* for clustering.
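To give a feel for the search-space sizes quoted in the abstract, here is a minimal sketch that counts hierarchies over n elements. It assumes the "trees" being counted are rooted binary trees over n labeled leaves, for which the standard count is the double factorial (2n - 3)!!. The abstract does not state this counting convention, and the function names below are hypothetical, chosen only for illustration.

```python
def num_binary_hierarchies(n: int) -> int:
    """Count rooted binary trees over n labeled leaves: (2n - 3)!!.

    Assumption: this is the counting convention behind the abstract's
    search-space sizes; the abstract itself does not say so.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product when n <= 2).
    for odd in range(3, 2 * n - 2, 2):
        count *= odd
    return count


def smallest_n_exceeding(limit: int) -> int:
    """Return the smallest n whose tree count is strictly greater than limit."""
    n = 1
    while num_binary_hierarchies(n) <= limit:
        n += 1
    return n


if __name__ == "__main__":
    # Print the number of leaves at which each threshold is first exceeded.
    for exponent in (12, 15, 1000):
        print(exponent, smallest_n_exceeding(10 ** exponent))
```

Because the count grows super-exponentially in n, exhaustive enumeration becomes infeasible at small n, which is the motivation the abstract gives for pairing search with a compact data structure.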

* 30 pages, 9 figures
