Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Chenglin Fan, Ping Li, Hanyu Peng

The densest subgraph of a large graph usually refers to some subgraph with the highest average degree, which has been extended to the family of $p$-means dense subgraph objectives by~\citet{veldt2021generalized}. The $p$-mean densest subgraph problem seeks a subgraph with the highest average $p$-th-power degree, whereas the standard densest subgraph problem seeks a subgraph with a simple highest average degree. It was shown that the standard peeling algorithm can perform arbitrarily poorly on generalized objective when $p>1$ but uncertain when $0<p<1$. In this paper, we are the first to show that a standard peeling algorithm can still yield $2^{1/p}$-approximation for the case $0<p < 1$. (Veldt 2021) proposed a new generalized peeling algorithm (GENPEEL), which for $p \geq 1$ has an approximation guarantee ratio $(p+1)^{1/p}$, and time complexity $O(mn)$, where $m$ and $n$ denote the number of edges and nodes in graph respectively. In terms of algorithmic contributions, we propose a new and faster generalized peeling algorithm (called GENPEEL++ in this paper), which for $p \in [1, +\infty)$ has an approximation guarantee ratio $(2(p+1))^{1/p}$, and time complexity $O(m(\log n))$, where $m$ and $n$ denote the number of edges and nodes in graph, respectively. This approximation ratio converges to 1 as $p \rightarrow \infty$.

Via

Chenglin Fan, Ping Li, Xiaoyun Li

When designing clustering algorithms, the choice of initial centers is crucial for the quality of the learned clusters. In this paper, we develop a new initialization scheme, called HST initialization, for the $k$-median problem in the general metric space (e.g., discrete space induced by graphs), based on the construction of metric embedding tree structure of the data. From the tree, we propose a novel and efficient search algorithm, for good initial centers that can be used subsequently for the local search algorithm. Our proposed HST initialization can produce initial centers achieving lower errors than those from another popular initialization method, $k$-median++, with comparable efficiency. The HST initialization can also be extended to the setting of differential privacy (DP) to generate private initial centers. We show that the error from applying DP local search followed by our private HST initialization improves previous results on the approximation error, and approaches the lower bound within a small factor. Experiments justify the theory and demonstrate the effectiveness of our proposed method. Our approach can also be extended to the $k$-means problem.

Via

Vincent Cohen-Addad, Chenglin Fan, Silvio Lattanzi, Slobodan Mitrović, Ashkan Norouzi-Fard, Nikos Parotsidis, Jakub Tarnawski

Correlation clustering is a central problem in unsupervised learning, with applications spanning community detection, duplicate detection, automated labelling and many more. In the correlation clustering problem one receives as input a set of nodes and for each node a list of co-clustering preferences, and the goal is to output a clustering that minimizes the disagreement with the specified nodes' preferences. In this paper, we introduce a simple and computationally efficient algorithm for the correlation clustering problem with provable privacy guarantees. Our approximation guarantees are stronger than those shown in prior work and are optimal up to logarithmic factors.

Via