Abhishek Sinha

Online Subset Selection using $\alpha$-Core with no Augmented Regret

Sep 29, 2022

Sourav Sahoo, Samrat Mukhopadhyay, Abhishek Sinha

Abstract: We consider the problem of sequential sparse subset selection in an online learning setup. Assume that the set $[N]$ consists of $N$ distinct elements. On the $t^{\text{th}}$ round, a monotone reward function $f_t: 2^{[N]} \to \mathbb{R}_+,$ which assigns a non-negative reward to each subset of $[N],$ is revealed to a learner. The learner selects (perhaps randomly) a subset $S_t \subseteq [N]$ of $k$ elements before the reward function $f_t$ for that round is revealed $(k \leq N)$. As a consequence of its choice, the learner receives a reward of $f_t(S_t)$ on the $t^{\text{th}}$ round. The learner's goal is to design an online subset selection policy to maximize its expected cumulative reward accrued over a given time horizon. In this connection, we propose an online learning policy called SCore (Subset Selection with Core) that solves the problem for a large class of reward functions. The proposed SCore policy is based on a new concept of $\alpha$-Core, which is a generalization of the notion of Core from the cooperative game theory literature. We establish a learning guarantee for the SCore policy in terms of a new performance metric called $\alpha$-augmented regret. In this new metric, the power of the offline benchmark is suitably augmented compared to the online policy. We give several illustrative examples to show that a broad class of reward functions, including submodular, can be efficiently learned with the SCore policy. We also outline how the SCore policy can be used under a semi-bandit feedback model and conclude the paper with a number of open problems.

Optimistic No-regret Algorithms for Discrete Caching

Aug 15, 2022

Naram Mhaisen, Abhishek Sinha, Georgios Paschos, Georgios Iosifidis

Abstract: We take a systematic look at the problem of storing whole files in a cache with limited capacity in the context of optimistic learning, where the caching policy has access to a prediction oracle (provided by, e.g., a Neural Network). The successive file requests are assumed to be generated by an adversary, and no assumption is made on the accuracy of the oracle. In this setting, we provide a universal lower bound for prediction-assisted online caching and proceed to design a suite of policies with a range of performance-complexity trade-offs. All proposed policies offer sublinear regret bounds commensurate with the accuracy of the oracle. Our results substantially improve upon all recently proposed online caching policies, which, being unable to exploit the oracle predictions, offer only $O(\sqrt{T})$ regret. In this pursuit, we design, to the best of our knowledge, the first comprehensive optimistic Follow-the-Perturbed-Leader policy, which generalizes beyond the caching problem. We also study the problem of caching files with different sizes and the bipartite network caching problem. Finally, we evaluate the efficacy of the proposed policies through extensive numerical experiments using real-world traces.

Universal Caching

May 10, 2022

Ativ Joshi, Abhishek Sinha

Abstract: In the learning literature, the performance of an online policy is commonly measured in terms of the static regret metric, which compares the cumulative loss of an online policy to that of an optimal benchmark in hindsight. In the definition of static regret, the benchmark policy remains fixed throughout the time horizon. Naturally, the resulting regret bounds become loose in non-stationary settings where fixed benchmarks often suffer from poor performance. In this paper, we investigate a stronger notion of regret minimization in the context of an online caching problem. In particular, we allow the action of the offline benchmark at any round to be decided by a finite state predictor containing arbitrarily many states. Using ideas from the universal prediction literature in information theory, we propose an efficient online caching policy with an adaptive sub-linear regret bound. To the best of our knowledge, this is the first data-dependent regret bound known for the universal caching problem. We establish this result by combining a recently proposed online caching policy with an incremental parsing algorithm, e.g., Lempel-Ziv '78. Our methods also yield a simpler learning-theoretic proof of the improved regret bound as opposed to the more involved and problem-specific combinatorial arguments used in the earlier works.
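
The abstract above names Lempel-Ziv '78 as an example of an incremental parsing algorithm. As a rough illustration only, the sketch below shows the textbook LZ78 parsing step on a request sequence: each new phrase is the shortest prefix of the remaining input that has not been seen as a phrase before. The function name `lz78_parse` and the toy request string are hypothetical, and how the paper couples the parsing with its caching policy is not reproduced here.

```python
def lz78_parse(sequence):
    """Split a sequence into LZ78 phrases.

    Each phrase is the shortest prefix of the remaining input that has
    not already appeared as a phrase. Phrases are returned as tuples.
    """
    seen = set()      # phrases discovered so far
    phrases = []      # output, in order of discovery
    current = ()      # phrase being grown symbol by symbol
    for symbol in sequence:
        current = current + (symbol,)
        if current not in seen:
            # First time this phrase appears: record it and start afresh.
            seen.add(current)
            phrases.append(current)
            current = ()
    if current:
        # Input ended in the middle of an already-seen phrase; keep the tail.
        phrases.append(current)
    return phrases


# Hypothetical request sequence over a catalog {a, b, c, d}.
print(lz78_parse("aababcabcd"))
# [('a',), ('a', 'b'), ('a', 'b', 'c'), ('a', 'b', 'c', 'd')]
```

The number of phrases grows sub-linearly in the sequence length, which is the property universal prediction schemes typically exploit.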

$k\texttt{-experts}$ -- Online Policies and Fundamental Limits

Oct 15, 2021

Samrat Mukhopadhyay, Sourav Sahoo, Abhishek Sinha

Abstract: This paper introduces and studies the $k\texttt{-experts}$ problem -- a generalization of the classic Prediction with Expert Advice (i.e., the $\texttt{Experts}$) problem. Unlike the $\texttt{Experts}$ problem, where the learner chooses exactly one expert, in this problem, the learner selects a subset of $k$ experts from a pool of $N$ experts at each round. The reward obtained by the learner at any round depends on the rewards of the selected experts. The $k\texttt{-experts}$ problem arises in many practical settings, including online ad placements, personalized news recommendations, and paging. Our primary goal is to design an online learning policy having a small regret. In this pursuit, we propose $\texttt{SAGE}$ ($\textbf{Sa}$mpled Hed$\textbf{ge}$), a framework for designing efficient online learning policies by leveraging statistical sampling techniques. We show that, for many related problems, $\texttt{SAGE}$ improves upon the state-of-the-art bounds for regret and computational complexity. Furthermore, going beyond the notion of regret, we characterize the mistake bounds achievable by online learning policies for a class of stable loss functions. We conclude the paper by establishing a tight regret lower bound for a variant of the $k\texttt{-experts}$ problem and carrying out experiments with standard datasets.
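
To make the setting in the abstract above concrete, here is a minimal sketch of a naive baseline: plain Hedge (exponential weights) run over every $k$-subset of $N$ experts. This is not the SAGE framework, whose sampling scheme is not described in the abstract. It assumes, purely for illustration, that a subset's reward is the sum of its members' rewards, and the names `hedge_over_subsets` and `eta` are hypothetical. Enumerating all $\binom{N}{k}$ subsets is only feasible for small $N$.

```python
import itertools
import math


def hedge_over_subsets(reward_rounds, k, eta):
    """Naive Hedge where every k-subset of experts is treated as one 'expert'.

    reward_rounds: list of per-round reward vectors, one entry per expert.
    Assumption: the reward of a subset is the sum of its members' rewards.
    Returns the Hedge distribution over subsets after all rounds.
    """
    n = len(reward_rounds[0])
    subsets = list(itertools.combinations(range(n), k))
    # Work with log-weights to avoid overflow for long horizons.
    log_w = {s: 0.0 for s in subsets}
    for rewards in reward_rounds:
        for s in subsets:
            log_w[s] += eta * sum(rewards[i] for i in s)
    # Normalize with the usual max-subtraction trick.
    m = max(log_w.values())
    unnorm = {s: math.exp(v - m) for s, v in log_w.items()}
    z = sum(unnorm.values())
    return {s: w / z for s, w in unnorm.items()}


# Toy example: 3 experts, choose k = 2, two rounds of rewards.
dist = hedge_over_subsets([[1, 0, 0], [1, 1, 0]], k=2, eta=1.0)
print(max(dist, key=dist.get))  # (0, 1)
```

In an online run the learner would sample its subset from this distribution at each round before seeing that round's rewards; the sketch only shows the weight update.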

D2C: Diffusion-Denoising Models for Few-shot Conditional Generation

Jun 12, 2021

Abhishek Sinha, Jiaming Song, Chenlin Meng, Stefano Ermon

Abstract: Conditional generative models of high-dimensional images have many applications, but supervision signals from conditions to images can be expensive to acquire. This paper describes Diffusion-Decoding models with Contrastive representations (D2C), a paradigm for training unconditional variational autoencoders (VAEs) for few-shot conditional image generation. D2C uses a learned diffusion-based prior over the latent representations to improve generation and contrastive self-supervised learning to improve representation quality. D2C can adapt to novel generation tasks conditioned on labels or manipulation constraints by learning from as few as 100 labeled examples. On conditional generation from new labels, D2C achieves superior performance over state-of-the-art VAEs and diffusion models. On conditional image manipulation, D2C generations are two orders of magnitude faster to produce than StyleGAN2 ones and are preferred by 50%-60% of the human evaluators in a double-blind study.

Negative Data Augmentation

Feb 09, 2021

Abhishek Sinha, Kumar Ayush, Jiaming Song, Burak Uzkent, Hongxia Jin, Stefano Ermon

Abstract: Data augmentation is often used to enlarge datasets with synthetic samples generated in accordance with the underlying data distribution. To enable a wider range of augmentations, we explore negative data augmentation strategies (NDA) that intentionally create out-of-distribution samples. We show that such negative out-of-distribution samples provide information on the support of the data distribution, and can be leveraged for generative modeling and representation learning. We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator. We prove that under suitable conditions, optimizing the resulting objective still recovers the true data distribution but can directly bias the generator towards avoiding samples that lack the desired structure. Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities. Further, we incorporate the same negative data augmentation strategy in a contrastive learning framework for self-supervised representation learning on images and videos, achieving improved performance on downstream image classification, object detection, and action recognition tasks. These results suggest that prior knowledge on what does not constitute valid data is an effective form of weak supervision across a range of unsupervised learning tasks.

* Accepted at ICLR 2021

Online Caching with Optimal Switching Regret

Jan 18, 2021

Samrat Mukhopadhyay, Abhishek Sinha

Abstract: We consider the classical uncoded caching problem from an online learning point of view. A cache of limited storage capacity can hold $C$ files at a time from a large catalog. A user requests an arbitrary file from the catalog at each time slot. Before the file request from the user arrives, a caching policy populates the cache with any $C$ files of its choice. In the case of a cache-hit, the policy receives a unit reward, and zero reward otherwise. In addition to that, there is a cost associated with fetching files to the cache, which we refer to as the switching cost. The objective is to design a caching policy that incurs minimal regret while considering both the rewards due to cache-hits and the switching cost due to the file fetches. The main contribution of this paper is the switching regret analysis of a Follow the Perturbed Leader-based anytime caching policy, which is shown to have an order-optimal switching regret. In this pursuit, we improve the best-known switching regret bound for this problem by a factor of $\Theta(\sqrt{C}).$ We conclude the paper by comparing the performance of different popular caching policies using a publicly available trace from a commercial CDN server.

* 11 pages, 3 figures, to be submitted to ISIT, 2021
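
The caching setup in the abstract above (fill the cache with $C$ files before the request arrives, earn a unit reward on a hit) can be sketched with a generic Follow the Perturbed Leader rule. This is a simplified illustration, not the paper's exact policy: the single Gaussian noise vector `gamma`, the $\eta\sqrt{t}$ noise scaling, and the function names are assumptions, and the switching cost discussed in the abstract is ignored.

```python
import numpy as np


def ftpl_cache(counts, gamma, eta_t, cache_size):
    """Pick the cache_size files with the largest perturbed request counts."""
    perturbed = counts + eta_t * gamma
    return set(np.argsort(perturbed)[-cache_size:].tolist())


def run_ftpl(requests, num_files, cache_size, eta=1.0, seed=0):
    """Simulate the FTPL sketch on a request sequence and return the hit count."""
    rng = np.random.default_rng(seed)
    gamma = rng.standard_normal(num_files)  # noise drawn once (assumption)
    counts = np.zeros(num_files)            # cumulative requests per file
    hits = 0
    for t, file_id in enumerate(requests, start=1):
        # The cache is chosen before the request for round t is revealed.
        cache = ftpl_cache(counts, gamma, eta * np.sqrt(t), cache_size)
        hits += int(file_id in cache)
        counts[file_id] += 1
    return hits


# Hypothetical trace: files 0 and 1 dominate a catalog of 4 files.
trace = [0, 1, 0, 1, 0, 1] * 5
print(run_ftpl(trace, num_files=4, cache_size=2))
```

Growing the noise scale with $\sqrt{t}$ is one common way to avoid fixing the horizon in advance; whether it matches the anytime policy analyzed in the paper is not confirmed by the abstract.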

Caching in Networks without Regret

Sep 17, 2020

Debjit Paria, Krishnakumar, Abhishek Sinha

Abstract: We consider the online $\textsf{Bipartite Caching}$ problem where $n$ users are connected to $m$ caches in the form of a bipartite network. Each of the $m$ caches has a file storage capacity of $C$. There is a library consisting of $N > C$ distinct files. Each user can request any one of the files from the library at each time slot. We allow the file request sequences to be chosen in an adversarial fashion. A user's request at a time slot is satisfied if the requested file is already hosted on at least one of the caches connected to the user at that time slot. Our objective is to design an efficient online caching policy with minimal regret. In this paper, we propose $\textsf{LeadCache}$, an online caching policy based on the $\textsf{Follow the Perturbed Leader}$ (FTPL) paradigm. We show that $\textsf{LeadCache}$ is regret optimal up to a multiplicative factor of $\tilde{O}(n^{0.375}).$ As a byproduct of our analysis, we design a new linear-time deterministic Pipage rounding procedure for the LP relaxation of a well-known NP-hard combinatorial optimization problem in this area. Our new rounding algorithm substantially improves upon the currently best-known complexity for this problem. Moreover, we show the surprising result that under mild Strong-Law-type assumptions on the file request sequence, the rate of file fetches to the caches approaches zero under the $\textsf{LeadCache}$ policy. Finally, we derive a tight universal regret lower bound for the $\textsf{Bipartite Caching}$ problem, which critically makes use of results from graph coloring theory and certifies the announced approximation ratio.

* This is version 1 of the article analyzing the theoretical aspects of the problem. The article will be updated soon with numerical results on real traces
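
The hit rule stated in the abstract above (a request is satisfied if the file sits on at least one cache connected to the user) is easy to state in code. The sketch below only evaluates that rule for one time slot; the data layout (dictionaries keyed by user and cache ids) and the name `satisfied_requests` are illustrative assumptions, and no part of the LeadCache policy itself is shown.

```python
def satisfied_requests(connections, cache_contents, requests):
    """Count users whose request is served in one time slot.

    connections:    dict mapping user id -> iterable of connected cache ids
    cache_contents: dict mapping cache id -> set of file ids currently stored
    requests:       dict mapping user id -> requested file id
    """
    hits = 0
    for user, file_id in requests.items():
        # Satisfied if any connected cache already hosts the file.
        if any(file_id in cache_contents[c] for c in connections[user]):
            hits += 1
    return hits


# Toy network: 3 users, 2 caches, each cache holding C = 2 files.
connections = {0: [0], 1: [0, 1], 2: [1]}
cache_contents = {0: {"a", "b"}, 1: {"b", "c"}}
requests = {0: "a", 1: "c", 2: "a"}
print(satisfied_requests(connections, cache_contents, requests))  # 2
```

User 2 misses because file "a" is only on cache 0, to which it is not connected. This coupling between neighbouring caches is what distinguishes the bipartite problem from $m$ independent single-cache problems.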

On the Benefits of Models with Perceptually-Aligned Gradients

May 04, 2020

Gunjan Aggarwal, Abhishek Sinha, Nupur Kumari, Mayank Singh

Abstract: Adversarially robust models have been shown to learn more robust and interpretable features than standard trained models. As shown by Tsipras et al. (2018), such robust models inherit useful interpretable properties where the gradient aligns perceptually well with images, and adding a large targeted adversarial perturbation leads to an image resembling the target class. We perform experiments to show that interpretable and perceptually aligned gradients are present even in models that do not show high robustness to adversarial attacks. Specifically, we perform adversarial training with attacks under different max-perturbation bounds. Adversarial training with a low max-perturbation bound results in models that have interpretable features with only a slight drop in performance over clean samples. In this paper, we leverage models with interpretable perceptually-aligned features and show that adversarial training with a low max-perturbation bound can improve the performance of models for zero-shot and weakly supervised localization tasks.

* Accepted at ICLR 2020 Workshop: Towards Trustworthy ML

Inducing Cooperative behaviour in Sequential-Social dilemmas through Multi-Agent Reinforcement Learning using Status-Quo Loss

Feb 13, 2020

Pinkesh Badjatiya, Mausoom Sarkar, Abhishek Sinha, Siddharth Singh, Nikaash Puri, Jayakumar Subramanian, Balaji Krishnamurthy

Abstract: In social dilemma situations, individual rationality leads to sub-optimal group outcomes. Several human engagements can be modeled as sequential (multi-step) social dilemmas. However, in contrast to humans, Deep Reinforcement Learning agents trained to optimize individual rewards in sequential social dilemmas converge to selfish, mutually harmful behavior. We introduce a status-quo loss (SQLoss) that encourages an agent to stick to the status quo, rather than repeatedly changing its policy. We show how agents trained with SQLoss evolve cooperative behavior in several social dilemma matrix games. To work with social dilemma games that have visual input, we propose GameDistill. GameDistill uses self-supervision and clustering to automatically extract cooperative and selfish policies from a social dilemma game. We combine GameDistill and SQLoss to show how agents evolve socially desirable cooperative behavior in the Coin Game.
