Abstract:Replicability requires that algorithmic conclusions remain consistent when rerun on independently drawn data. A central structural question is composition: given $k$ problems each admitting a $ρ$-replicable algorithm with sample complexity $n$, how many samples are needed to solve all jointly while preserving replicability? The naive analysis yields $\widetilde{O}(nk^2)$ samples, and Bun et al. (STOC'23) observed that reductions through differential privacy give an alternative $\widetilde{O}(n^2k)$ bound, leaving open whether the optimal $\widetilde{O}(nk)$ scaling is achievable. We resolve this open problem and, more generally, show that problems with sample complexities $n_1,\ldots,n_k$ can be jointly solved with $\widetilde{O}(\sum_i n_i)$ samples while preserving constant replicability. Our approach converts each replicable algorithm into a perfectly generalizing one, composes them via a privacy-style analysis, and maps back via correlated sampling. This yields the first advanced composition theorem for replicability. En route, we obtain new bounds for the composition of perfectly generalizing algorithms with heterogeneous parameters. As part of our results, we provide a boosting theorem for the success probability of replicable algorithms. For a broad class of problems, the failure probability appears as a separate additive term independent of $ρ$, immediately yielding improved sample complexity bounds for several problems. Finally, we prove an $Ω(nk^2)$ lower bound for adaptive composition, establishing a quadratic separation from the non-adaptive setting. The key technique, which we call the phantom run, yields structural results of independent interest.
Abstract:We initiate the study of the submodular cover problem in dynamic setting where the elements of the ground set are inserted and deleted. In the classical submodular cover problem, we are given a monotone submodular function $f : 2^{V} \to \mathbb{R}^{\ge 0}$ and the goal is to obtain a set $S \subseteq V$ that minimizes the cost subject to the constraint $f(S) = f(V)$. This is a classical problem in computer science and generalizes the Set Cover problem, 2-Set Cover, and dominating set problem among others. We consider this problem in a dynamic setting where there are updates to our set $V$, in the form of insertions and deletions of elements from a ground set $\mathcal{V}$, and the goal is to maintain an approximately optimal solution with low query complexity per update. For this problem, we propose a randomized algorithm that, in expectation, obtains a $(1-O(\epsilon), O(\epsilon^{-1}))$-bicriteria approximation using polylogarithmic query complexity per update.
Abstract:Maximizing submodular functions has been increasingly used in many applications of machine learning, such as data summarization, recommendation systems, and feature selection. Moreover, there has been a growing interest in both submodular maximization and dynamic algorithms. In 2020, Monemizadeh and Lattanzi, Mitrovic, Norouzi{-}Fard, Tarnawski, and Zadimoghaddam initiated developing dynamic algorithms for the monotone submodular maximization problem under the cardinality constraint $k$. Recently, there have been some improvements on the topic made by Banihashem, Biabani, Goudarzi, Hajiaghayi, Jabbarzade, and Monemizadeh. In 2022, Chen and Peng studied the complexity of this problem and raised an important open question: "Can we extend [fully dynamic] results (algorithm or hardness) to non-monotone submodular maximization?". We affirmatively answer their question by demonstrating a reduction from maximizing a non-monotone submodular function under the cardinality constraint $k$ to maximizing a monotone submodular function under the same constraint. Through this reduction, we obtain the first dynamic algorithms to solve the non-monotone submodular maximization problem under the cardinality constraint $k$. Our algorithms maintain an $(8+\epsilon)$-approximate of the solution and use expected amortized $O(\epsilon^{-3}k^3\log^3(n)\log(k))$ or $O(\epsilon^{-1}k^2\log^3(k))$ oracle queries per update, respectively. Furthermore, we showcase the benefits of our dynamic algorithm for video summarization and max-cut problems on several real-world data sets.
Abstract:Submodular maximization under matroid and cardinality constraints are classical problems with a wide range of applications in machine learning, auction theory, and combinatorial optimization. In this paper, we consider these problems in the dynamic setting where (1) we have oracle access to a monotone submodular function $f: 2^{V} \rightarrow \mathbb{R}^+$ and (2) we are given a sequence $\mathcal{S}$ of insertions and deletions of elements of an underlying ground set $V$. We develop the first parameterized (by the rank $k$ of a matroid $\mathcal{M}$) dynamic $(4+\epsilon)$-approximation algorithm for the submodular maximization problem under the matroid constraint using an expected worst-case $O(k\log(k)\log^3{(k/\epsilon)})$ query complexity where $0 < \epsilon \le 1$. Chen and Peng at STOC'22 studied the complexity of this problem in the insertion-only dynamic model (a restricted version of the fully dynamic model where deletion is not allowed), and they raised the following important open question: *"for fully dynamic streams [sequences of insertions and deletions of elements], there is no known constant-factor approximation algorithm with poly(k) amortized queries for matroid constraints."* Our dynamic algorithm answers this question as well as an open problem of Lattanzi et al. (NeurIPS'20) affirmatively. As a byproduct, for the submodular maximization under the cardinality constraint $k$, we propose a parameterized (by the cardinality constraint $k$) dynamic algorithm that maintains a $(2+\epsilon)$-approximate solution of the sequence $\mathcal{S}$ at any time $t$ using the expected amortized worst-case complexity $O(k\epsilon^{-1}\log^2(k))$. This is the first dynamic algorithm for the problem that has a query complexity independent of the size of ground set $V$.