University of Texas
Abstract: Uncertainty quantification for multi-view learning is motivated by the increasing use of multi-view data in scientific problems. A common variant of multi-view learning is late fusion: train separate predictors on individual views and combine them after single-view predictions are available. Existing methods for uncertainty quantification for late fusion often rely on undesirable distributional assumptions for validity. Conformal prediction is one approach that avoids such distributional assumptions. However, naively applying conformal prediction to late-stage fusion pipelines often produces overly conservative and uninformative prediction regions, limiting its downstream utility. We propose a novel methodology, Multi-View Conformal Prediction (MVCP), where conformal prediction is instead performed separately on the single-view predictors and only fused subsequently. Our framework extends the standard scalar formulation of a score function to a multivariate score that produces more efficient downstream prediction regions in both classification and regression settings. We then demonstrate that such improvements can be realized in methods built atop conformalized regressors, specifically in robust predict-then-optimize pipelines.
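To make the late-fusion idea concrete in the regression case, here is a minimal sketch that performs split conformal prediction per view and then fuses the per-view scores; the function names and the max-based fusion rule are illustrative assumptions, not the paper's exact MVCP construction.

```python
import numpy as np

def fused_conformal_interval(preds_cal, y_cal, preds_test, alpha=0.1):
    """Per-view split conformal prediction with a simple multivariate fusion.

    preds_cal:  list of calibration-set predictions, one array per view.
    y_cal:      calibration targets.
    preds_test: list of per-view point predictions for one test example.
    The max-over-views fusion rule here is only one possible choice.
    """
    # Per-view nonconformity scores: absolute residuals on calibration data.
    per_view_scores = [np.abs(p - y_cal) for p in preds_cal]
    # Collapse the multivariate score by taking the worst view per example.
    fused_scores = np.max(np.stack(per_view_scores, axis=0), axis=0)
    n = len(y_cal)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(fused_scores, level, method="higher")
    # Fused region: intersection of per-view intervals of half-width q.
    lo = max(p - q for p in preds_test)
    hi = min(p + q for p in preds_test)
    return lo, hi
```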
Abstract: We study online classification under smoothed adversaries. In this setting, at each time point, the adversary draws an example from a distribution that has a bounded density with respect to a fixed base measure, which is known a priori to the learner. For binary classification and scalar-valued regression, previous works \citep{haghtalab2020smoothed, block2022smoothed} have shown that smoothed online learning is as easy as learning in the iid batch setting under the PAC model. However, we show that smoothed online classification can be harder than iid batch classification when the label space is unbounded. In particular, we construct a hypothesis class that is learnable in the iid batch setting under the PAC model but is not learnable under the smoothed online model. Finally, we identify a condition that ensures that the PAC learnability of a hypothesis class is sufficient for its smoothed online learnability.
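For reference, the smoothness assumption above is commonly formalized as follows (following \citep{haghtalab2020smoothed}; the exact parametrization used in the paper may differ): a distribution $\mathcal{D}$ is $\sigma$-smooth with respect to the base measure $\mu$ if
$$\frac{d\mathcal{D}}{d\mu}(x) \;\le\; \frac{1}{\sigma} \quad \text{for all } x,$$
and at each round the adversary is restricted to drawing its example from some $\sigma$-smooth distribution.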
Abstract: We study online classification when the learner has access to predictions about future examples. We design an online learner whose expected regret is never worse than the worst-case regret, gracefully improves with the quality of the predictions, and can be significantly better than the worst-case regret when the predictions of future examples are accurate. As a corollary, we show that if the learner is always guaranteed to observe data where future examples are easily predictable, then online learning can be as easy as transductive online learning. Our results complement recent work in online algorithms with predictions and smoothed online classification, which go beyond worst-case analysis by using machine-learned predictions and distributional assumptions, respectively.
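A standard way to obtain the "never worse than worst-case, better with accurate predictions" behavior is to run exponential weights over two meta-experts: one that follows the machine-learned predictions and one that runs a worst-case online learner. The sketch below shows such a generic combiner; it is not the paper's specific algorithm.

```python
import numpy as np

def combine_advice_and_robust(advice_losses, robust_losses, eta=None):
    """Exponential-weights combiner over two meta-experts (illustrative only).

    advice_losses[t]: round-t loss of the learner that trusts the predictions.
    robust_losses[t]: round-t loss of a worst-case online learner.
    All losses are assumed to lie in [0, 1].
    """
    T = len(advice_losses)
    if eta is None:
        eta = np.sqrt(np.log(2) / max(T, 1))
    w = np.ones(2)
    total_loss = 0.0
    for t in range(T):
        p = w / w.sum()
        losses = np.array([advice_losses[t], robust_losses[t]])
        total_loss += float(p @ losses)   # expected loss of the combiner
        w *= np.exp(-eta * losses)        # multiplicative-weights update
    return total_loss
```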
Abstract: Multitask Reinforcement Learning (MTRL) approaches have gained increasing attention due to their wide applicability in many important Reinforcement Learning (RL) tasks. However, while recent advancements in MTRL theory have focused on improved statistical efficiency obtained by assuming a shared structure across tasks, exploration, a crucial aspect of RL, has been largely overlooked. This paper addresses this gap by showing that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with a myopic exploration design such as $\epsilon$-greedy, which is inefficient in general, can be sample-efficient for MTRL. To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL. It may also shed light on the enigmatic empirical success of myopic exploration in practice. To validate the role of diversity, we conduct experiments on synthetic robotic control environments, where the diverse task set aligns with the task selection made by automatic curriculum learning, which has been empirically shown to improve sample efficiency.
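For concreteness, the myopic exploration design referenced above is the usual $\epsilon$-greedy rule, sketched generically below; this is only the exploration component, not the paper's full policy-sharing MTRL algorithm.

```python
import numpy as np

def eps_greedy_action(q_values, epsilon, rng=None):
    """Standard epsilon-greedy exploration over a finite action set."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # explore uniformly at random
    return int(np.argmax(q_values))               # exploit the greedy action
```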
Abstract: We study a novel pure exploration problem: the $\epsilon$-Thresholding Bandit Problem (TBP) with fixed confidence in stochastic linear bandits. We prove a lower bound on the sample complexity and extend an algorithm designed for Best Arm Identification in the linear case to TBP, obtaining an asymptotically optimal algorithm.
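For context, one standard fixed-confidence formulation of the $\epsilon$-thresholding objective is the following (the paper's exact variant may differ in its details): given a threshold $\tau$, tolerance $\epsilon > 0$, and confidence level $\delta$, the learner must return a set of arms $\hat{S}$ satisfying
$$\mathbb{P}\Big(\{a : \mu_a \ge \tau + \epsilon\} \subseteq \hat{S} \ \text{ and } \ \hat{S} \cap \{a : \mu_a \le \tau - \epsilon\} = \emptyset\Big) \;\ge\; 1 - \delta,$$
where in the linear setting $\mu_a = \langle \theta^\star, x_a \rangle$ for known features $x_a$ and an unknown parameter $\theta^\star$, and the goal is to do so with as few samples as possible.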
Abstract: We study the problem of learning to predict the next state of a dynamical system when the underlying evolution function is unknown. Unlike previous work, we place no parametric assumptions on the dynamical system and study the problem from a learning theory perspective. We define new combinatorial measures and dimensions and show that they quantify the optimal mistake and regret bounds in the realizable and agnostic settings, respectively.
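To make the agnostic objective concrete, regret against a class $\mathcal{F}$ of candidate evolution functions can be written as follows (a generic formulation with loss $\ell$ and predictions $\hat{x}_{t+1}$; the paper's exact protocol may differ):
$$\mathrm{Reg}_T \;=\; \sum_{t=1}^{T} \ell\big(\hat{x}_{t+1}, x_{t+1}\big) \;-\; \min_{f \in \mathcal{F}} \sum_{t=1}^{T} \ell\big(f(x_t), x_{t+1}\big),$$
with the realizable setting corresponding to a trajectory generated by some $f^\star \in \mathcal{F}$, in which case mistakes are counted instead of regret.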
Abstract: Offline reinforcement learning (RL) aims to learn a policy that maximizes the expected cumulative reward using a pre-collected dataset. Offline RL with low-rank MDPs or general function approximation has been widely studied recently, but existing algorithms with sample complexity $O(\epsilon^{-2})$ for finding an $\epsilon$-optimal policy either require uniform data coverage assumptions or are computationally inefficient. In this paper, we propose a primal-dual algorithm for offline RL with low-rank MDPs in the discounted infinite-horizon setting. Our algorithm is the first computationally efficient algorithm in this setting that achieves a sample complexity of $O(\epsilon^{-2})$ under a partial data coverage assumption. This improves upon a recent work that requires $O(\epsilon^{-4})$ samples. Moreover, our algorithm extends the previous work to the offline constrained RL setting by supporting constraints on additional reward signals.
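For context, primal-dual methods for discounted offline RL typically build on the occupancy-measure linear program below (a generic formulation; the paper's algorithm adds low-rank function approximation and partial-coverage handling on top of it):
$$\max_{d \ge 0} \ \mathbb{E}_{(s,a) \sim d}\big[r(s,a)\big] \quad \text{s.t.} \quad \sum_{a} d(s,a) \;=\; (1-\gamma)\,\mu_0(s) \;+\; \gamma \sum_{s',a'} P(s \mid s',a')\, d(s',a') \quad \forall s,$$
whose Lagrangian in the value variables $v(\cdot)$ yields the saddle-point problem that the primal and dual iterates optimize.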
Abstract: The study of reinforcement learning from human feedback (RLHF) has gained prominence in recent years due to its role in the development of LLMs. Neuroscience research shows that human responses to stimuli depend on partially-observed "internal states." Unfortunately, current models of RLHF do not take this into consideration. Moreover, most RLHF models do not account for intermediate feedback, which is gaining importance in empirical work and can help improve both sample complexity and alignment. To address these limitations, we model RLHF as reinforcement learning with partially observed reward-states (PORRL). We show reductions from the two dominant forms of human feedback in RLHF, cardinal and dueling feedback, to PORRL. For cardinal feedback, we develop generic statistically efficient algorithms and instantiate them to present POR-UCRL and POR-UCBVI. For dueling feedback, we show that a naive reduction to cardinal feedback fails to achieve sublinear dueling regret. We then present the first explicit reduction that converts guarantees for cardinal regret to dueling regret. We show that our models and guarantees in both settings generalize and extend existing ones. Finally, we identify a recursive structure on our model that could improve the statistical and computational tractability of PORRL, giving examples from past work on RLHF as well as learning perfect reward machines, which PORRL subsumes.
Abstract: In online binary classification under \textit{apple tasting} feedback, the learner only observes the true label if it predicts "1". We revisit this classical partial-feedback setting, first studied by \cite{helmbold2000apple}, and study online learnability from a combinatorial perspective. We show that the Littlestone dimension continues to provide a tight quantitative characterization of apple tasting in the agnostic setting, closing an open question posed by \cite{helmbold2000apple}. In addition, we give a new combinatorial parameter, called the Effective width, that tightly quantifies the minimax expected number of mistakes in the realizable setting. As a corollary, we use the Effective width to establish a \textit{trichotomy} of the minimax expected number of mistakes in the realizable setting. In particular, we show that in the realizable setting, the expected number of mistakes of any learner under apple tasting feedback can only be $\Theta(1)$, $\Theta(\sqrt{T})$, or $\Theta(T)$.
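The feedback protocol is easy to state in code; the sketch below renders one round of apple tasting with hypothetical `predict`/`update` helpers supplied by the learner.

```python
def apple_tasting_round(predict, update, x, true_label):
    """One round of online classification under apple-tasting feedback.

    The learner observes the true label only when it predicts 1.
    """
    y_hat = predict(x)                   # learner's prediction in {0, 1}
    mistake = int(y_hat != true_label)   # incurred, but not necessarily observed
    if y_hat == 1:
        update(x, true_label)            # feedback revealed only on a "1" prediction
    return y_hat, mistake
```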
Abstract: This paper provides norm-based generalization bounds for the Transformer architecture that do not depend on the input sequence length. We employ a covering-number-based approach to prove our bounds. We use three novel covering number bounds for the function class of bounded linear transformations to upper bound the Rademacher complexity of the Transformer. Furthermore, we show that this generalization bound applies to the common Transformer training technique of masking and then predicting the masked word. We also run a simulation study on a sparse majority dataset that empirically validates our theoretical findings.
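The covering-number route typically passes through a chaining bound of the following form (Dudley's entropy integral, stated generically; the constants and the norm used in the paper may differ):
$$\widehat{\mathfrak{R}}_S(\mathcal{F}) \;\le\; \inf_{\alpha > 0} \left( 4\alpha \;+\; \frac{12}{\sqrt{n}} \int_{\alpha}^{D} \sqrt{\log \mathcal{N}\big(\mathcal{F}, \varepsilon, \|\cdot\|_{L_2(S)}\big)} \, d\varepsilon \right),$$
where $n$ is the sample size and $D$ bounds the radius of $\mathcal{F}$ under the chosen norm; sequence-length-independent covering numbers thus translate into length-independent Rademacher complexity and, in turn, generalization bounds.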