Abstract:We study a robust online convex optimization framework, where an adversary can introduce outliers by corrupting loss functions in an arbitrary number of rounds k, unknown to the learner. Our focus is on a novel setting allowing unbounded domains and large gradients for the losses without relying on a Lipschitz assumption. We introduce the Log Exponential Adjusted Robust and iNvex (LEARN) loss, a non-convex (invex) robust loss function to mitigate the effects of outliers and develop a robust variant of the online gradient descent algorithm by leveraging the LEARN loss. We establish tight regret guarantees (up to constants), in a dynamic setting, with respect to the uncorrupted rounds and conduct experiments to validate our theory. Furthermore, we present a unified analysis framework for developing online optimization algorithms for non-convex (invex) losses, utilizing it to provide regret bounds with respect to the LEARN loss, which may be of independent interest.
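To make the setting of the preceding abstract concrete, the following minimal sketch shows why a single corrupted round is damaging on an unbounded domain. It runs plain, non-robust online gradient descent, and is not the LEARN-based algorithm, whose loss is not specified in the abstract. The quadratic losses, the step size `eta`, the corrupted target value, and the function name are all illustrative assumptions.

```python
def online_gradient_descent(targets, x0=0.0, eta=0.1):
    """Plain OGD on losses f_t(x) = 0.5 * (x - y_t)**2; returns all iterates."""
    x = x0
    iterates = [x]
    for y in targets:
        grad = x - y        # gradient of f_t at the current iterate
        x = x - eta * grad  # unconstrained step: no projection, no gradient bound
        iterates.append(x)
    return iterates


# 50 uncorrupted rounds whose common minimizer is x = 1.
clean = [1.0] * 50

# Same sequence, but one round is corrupted by an outlier (hypothetical value).
corrupted = list(clean)
corrupted[25] = 1e6

x_clean = online_gradient_descent(clean)[-1]
x_corrupted = online_gradient_descent(corrupted)[-1]
print(f"final iterate, clean rounds:       {x_clean:.4f}")
print(f"final iterate, one corrupted round: {x_corrupted:.1f}")
```

With no Lipschitz bound, one outlier gradient can move the iterate arbitrarily far, and the later clean rounds undo the damage only geometrically. That sensitivity is what a robust loss is meant to limit.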
Abstract:Distributed gradient descent algorithms have come to the fore in modern machine learning, especially in parallelizing the handling of large datasets that are distributed across several workers. However, scant attention has been paid to analyzing the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions instead of random noise. In this paper, we formulate a novel problem in which adversarial corruptions are present in a distributed learning system. We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm. Extensive convergence analysis for (strongly) convex loss functions is provided for different choices of the stepsize. We carefully optimize the stepsize schedule to accelerate the convergence of the algorithm, while at the same time amortizing the effect of the corruption over time. Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
Abstract:We consider offline reinforcement learning (RL) with preference feedback in which the implicit reward is a linear function of an unknown parameter. Given an offline dataset, our objective consists in ascertaining the optimal action for each state, with the ultimate goal of minimizing the {\em simple regret}. We propose an algorithm, \underline{RL} with \underline{L}ocally \underline{O}ptimal \underline{W}eights or {\sc RL-LOW}, which yields a simple regret of $\exp ( - \Omega(n/H) )$ where $n$ is the number of data samples and $H$ denotes an instance-dependent hardness quantity that depends explicitly on the suboptimality gap of each action. Furthermore, we derive a first-of-its-kind instance-dependent lower bound in offline RL with preference feedback. Interestingly, we observe that the lower and upper bounds on the simple regret match order-wise in the exponent, demonstrating order-wise optimality of {\sc RL-LOW}. In view of privacy considerations in practical applications, we also extend {\sc RL-LOW} to the setting of $(\varepsilon,\delta)$-differential privacy and show, somewhat surprisingly, that the hardness parameter $H$ is unchanged in the asymptotic regime as $n$ tends to infinity; this underscores the inherent efficiency of {\sc RL-LOW} in terms of preserving the privacy of the observed rewards. Given our focus on establishing instance-dependent bounds, our work stands in stark contrast to previous works that focus on establishing worst-case regrets for offline RL with preference feedback.
Abstract:We consider a ubiquitous scenario in the study of Influence Maximization (IM), in which there is limited knowledge about the topology of the diffusion network. We set the IM problem in a multi-round diffusion campaign, aiming to maximize the number of distinct users that are influenced. Leveraging the capability of bandit algorithms to effectively balance the objectives of exploration and exploitation, as well as the expressivity of neural networks, our study explores the application of neural bandit algorithms to the IM problem. We propose the framework IM-GNB (Influence Maximization with Graph Neural Bandits), where we provide an estimate of the users' probabilities of being influenced by influencers (also known as diffusion seeds). This initial estimate forms the basis for constructing both an exploitation graph and an exploration one. Subsequently, IM-GNB handles the exploration-exploitation tradeoff, by selecting seed nodes in real-time using Graph Convolutional Networks (GCN), in which the pre-estimated graphs are employed to refine the influencers' estimated rewards in each contextual setting. Through extensive experiments on two large real-world datasets, we demonstrate the effectiveness of IM-GNB compared with other baseline methods, significantly improving the spread outcome of such diffusion campaigns, when the underlying network is unknown.
Abstract:Electromagnetic information theory (EIT) is an interdisciplinary subject that integrates deterministic electromagnetic theory with Shannon's stochastic information theory. Existing EIT analysis operates in the continuous space domain, which is not aligned with practical algorithms that work in the discrete space domain. This mismatch makes it difficult to apply EIT methodologies to practical discrete-space systems; we call it the discrete-continuous gap in this paper. To bridge this gap, we establish the discrete-continuous correspondence with a prolate spheroidal wave function (PSWF)-based ergodic capacity analysis framework. Specifically, we state and prove several discrete-continuous correspondence lemmas that establish a firm theoretical connection between discrete information-theoretic quantities and their continuous counterparts. With these lemmas, we apply the PSWF ergodic capacity bound to advanced MIMO architectures such as continuous-aperture MIMO (CAP-MIMO) and extremely large-scale MIMO (XL-MIMO). From this PSWF capacity bound, we discover the capacity saturation phenomenon both theoretically and empirically. Although the growth of MIMO performance is fundamentally limited in this EIT-based analysis framework, we reveal new opportunities in MIMO channel estimation by exploiting the EIT knowledge about the channel. Inspired by the PSWF capacity bound, we utilize continuous PSWFs to improve the pilot design of discrete MIMO channel estimators; we call the resulting estimator the PSWF channel estimator (PSWF-CE). Simulation results demonstrate the improved performance of the proposed PSWF-CE compared to traditional minimum mean squared error (MMSE) and compressed sensing-based estimators.
Abstract:The Indexed Minimum Empirical Divergence (IMED) algorithm is a highly effective approach that offers a stronger theoretical guarantee of asymptotic optimality than the Kullback--Leibler Upper Confidence Bound (KL-UCB) algorithm for the multi-armed bandit problem. Additionally, it has been observed to empirically outperform UCB-based algorithms and Thompson Sampling. Despite its effectiveness, the generalization of this algorithm to contextual bandits with linear payoffs has remained elusive. In this paper, we present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms. We demonstrate that LinIMED provides a $\widetilde{O}(d\sqrt{T})$ regret upper bound, where $d$ is the dimension of the context and $T$ is the time horizon. Furthermore, extensive empirical studies reveal that LinIMED and its variants outperform widely used linear bandit algorithms such as LinUCB and Linear Thompson Sampling in some regimes.
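For readers unfamiliar with the multi-armed IMED rule that the preceding abstract generalizes, here is a minimal sketch for Bernoulli rewards. It is not LinIMED, which the abstract does not specify. The index form used (number of pulls times the KL divergence to the best empirical mean, plus the log of the number of pulls) is the commonly stated one, recalled here as an assumption. The function names, arm means, and horizon are hypothetical.

```python
import math
import random


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def imed_bernoulli(means, horizon, seed=0):
    """Simulate an IMED-style rule on Bernoulli arms; returns the pull count per arm."""
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # initialization: pull every arm once
        else:
            mu_hat = [sums[i] / pulls[i] for i in range(k)]
            mu_star = max(mu_hat)
            # Assumed index: N_i * KL(mu_hat_i, mu_star) + log(N_i); play the minimizer.
            index = [
                pulls[i] * bernoulli_kl(mu_hat[i], mu_star) + math.log(pulls[i])
                for i in range(k)
            ]
            arm = min(range(k), key=index.__getitem__)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls


pulls = imed_bernoulli(means=[0.2, 0.8], horizon=2000)
print("pulls per arm:", pulls)
```

The empirically best arm has zero divergence, so its index is just the log of its pull count. A suboptimal arm is revisited only once that log term exceeds its own index, which keeps exploration logarithmic in the horizon.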
Abstract:Accuracy and speed are critical in image editing tasks. Pan et al. introduced a drag-based image editing framework that achieves pixel-level control using Generative Adversarial Networks (GANs). A flurry of subsequent studies enhanced this framework's generality by leveraging large-scale diffusion models. However, these methods often suffer from inordinately long processing times (exceeding 1 minute per edit) and low success rates. Addressing these issues head-on, we present InstaDrag, a rapid approach enabling high-quality drag-based image editing in ~1 second. Unlike most previous methods, we redefine drag-based editing as a conditional generation task, eliminating the need for time-consuming latent optimization or gradient-based guidance during inference. In addition, the design of our pipeline allows us to train our model on large-scale paired video frames, which contain rich motion information such as object translations, changing poses and orientations, and zooming in and out. By learning from videos, our approach can significantly outperform previous methods in terms of accuracy and consistency. Despite being trained solely on videos, our model generalizes well to local shape deformations not present in the training data (e.g., lengthening of hair or twisting of rainbows). Extensive qualitative and quantitative evaluations on benchmark datasets corroborate the superiority of our approach. The code and model will be released at https://github.com/magic-research/InstaDrag.
Abstract:We study the adversarial combinatorial bandit problem with a switching cost of $\lambda$ incurred for each selected arm that is switched in each round, considering both the bandit feedback and semi-bandit feedback settings. In the oblivious adversarial case with $K$ base arms and time horizon $T$, we derive lower bounds for the minimax regret and design algorithms to approach them. To prove these lower bounds, we design stochastic loss sequences for both feedback settings, building on an idea from Dekel et al. (2014). The lower bound for bandit feedback is $\tilde{\Omega}\big( (\lambda K)^{\frac{1}{3}} (TI)^{\frac{2}{3}}\big)$, while that for semi-bandit feedback is $\tilde{\Omega}\big( (\lambda K I)^{\frac{1}{3}} T^{\frac{2}{3}}\big)$, where $I$ is the number of base arms in the combinatorial arm played in each round. To approach these lower bounds, we design algorithms that divide the time horizon into batches to restrict the number of switches between actions. For the bandit feedback setting, where only the total loss of the combinatorial arm is observed, we introduce the Batched-Exp2 algorithm, which achieves a regret upper bound of $\tilde{O}\big((\lambda K)^{\frac{1}{3}}T^{\frac{2}{3}}I^{\frac{4}{3}}\big)$ as $T$ tends to infinity. In the semi-bandit feedback setting, where all losses for the combinatorial arm are observed, we propose the Batched-BROAD algorithm, which achieves a regret upper bound of $\tilde{O}\big( (\lambda K)^{\frac{1}{3}} (TI)^{\frac{2}{3}}\big)$.
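As a short worked comparison, dividing each upper bound stated in the preceding abstract by the matching lower bound shows how far apart they are. This is only arithmetic on the stated rates; it ignores the logarithmic factors hidden by the tilde notation, and the labels $U$ and $L$ are introduced here for illustration.

```latex
% Bandit feedback: Batched-Exp2 upper bound over the lower bound
\frac{U_{\mathrm{bandit}}}{L_{\mathrm{bandit}}}
  = \frac{(\lambda K)^{\frac{1}{3}}\, T^{\frac{2}{3}}\, I^{\frac{4}{3}}}
         {(\lambda K)^{\frac{1}{3}}\, (TI)^{\frac{2}{3}}}
  = I^{\frac{4}{3}-\frac{2}{3}}
  = I^{\frac{2}{3}}

% Semi-bandit feedback: Batched-BROAD upper bound over the lower bound
\frac{U_{\mathrm{semi}}}{L_{\mathrm{semi}}}
  = \frac{(\lambda K)^{\frac{1}{3}}\, (TI)^{\frac{2}{3}}}
         {(\lambda K I)^{\frac{1}{3}}\, T^{\frac{2}{3}}}
  = I^{\frac{2}{3}-\frac{1}{3}}
  = I^{\frac{1}{3}}
```

In both settings the stated bounds agree in their dependence on $\lambda$, $K$, and $T$, and differ only by a polynomial factor in $I$.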
Abstract:We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention. In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the stochastic instantaneous reward before observing it. When opting for abstention, the agent either suffers a fixed regret or gains a guaranteed reward. Given this added layer of complexity, we ask whether we can develop efficient algorithms that are both asymptotically and minimax optimal. We answer this question affirmatively by designing and analyzing algorithms whose regrets meet their corresponding information-theoretic lower bounds. Our results offer valuable quantitative insights into the benefits of the abstention option, laying the groundwork for further exploration in other online decision-making problems with such an option. Numerical results further corroborate our theoretical findings.
Abstract:We study best arm identification (BAI) in linear bandits in the fixed-budget regime under differential privacy constraints, when the arm rewards are supported on the unit interval. Given a finite budget $T$ and a privacy parameter $\varepsilon>0$, the goal is to minimise the error probability in finding the arm with the largest mean after $T$ sampling rounds, subject to the constraint that the policy of the decision maker satisfies a certain {\em $\varepsilon$-differential privacy} ($\varepsilon$-DP) constraint. We construct a policy satisfying the $\varepsilon$-DP constraint (called {\sc DP-BAI}) by proposing the principle of {\em maximum absolute determinants}, and derive an upper bound on its error probability. Furthermore, we derive a minimax lower bound on the error probability, and demonstrate that the lower and the upper bounds decay exponentially in $T$, with exponents in the two bounds matching order-wise in (a) the sub-optimality gaps of the arms, (b) $\varepsilon$, and (c) the problem complexity that is expressible as the sum of two terms, one characterising the complexity of standard fixed-budget BAI (without privacy constraints), and the other accounting for the $\varepsilon$-DP constraint. Additionally, we present some auxiliary results that contribute to the derivation of the lower bound on the error probability. These results, we posit, may be of independent interest and could prove instrumental in proving lower bounds on error probabilities in several other bandit problems. Whereas prior works provide results for BAI in the fixed-budget regime without privacy constraints or in the fixed-confidence regime with privacy constraints, our work fills the gap in the literature by providing the results for BAI in the fixed-budget regime under the $\varepsilon$-DP constraint.