Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shana Moothedath

Multi-Task Representation Learning for Conservative Linear Bandits

May 12, 2026

Jiabin Lin, Shana Moothedath

Abstract:This paper presents the Constrained Multi-Task Representation Learning (CMTRL) framework for linear bandits. We consider T linear bandit tasks in a d dimensional space, which share a common low-dimensional representation of dimension r, where r is much smaller than the minimum of d and T. Furthermore, tasks are constrained so that only actions meeting specific safety or performance requirements are allowed, referred to as conservative (safe) bandits. We introduce a novel algorithm, Safe-Alternating projected Gradient Descent and minimization (Safe-AltGDmin), to recover a low-rank feature matrix while satisfying the given constraints. Building on this algorithm, we propose a multi-task representation learning framework for conservative linear bandits and establish theoretical guarantees for its regret and sample complexity bounds. We presented experiments and compared the performance of our algorithm with benchmark algorithms.

Via

Access Paper or Ask Questions

Provable Multi-Task Reinforcement Learning: A Representation Learning Framework with Low Rank Rewards

Apr 04, 2026

Yaoze Guo, Shana Moothedath

Abstract:Multi-task representation learning (MTRL) is an approach that learns shared latent representations across related tasks, facilitating collaborative learning that improves the overall learning efficiency. This paper studies MTRL for multi-task reinforcement learning (RL), where multiple tasks have the same state-action space and transition probabilities, but different rewards. We consider T linear Markov Decision Processes (MDPs) where the reward functions and transition dynamics admit linear feature embeddings of dimension d. The relatedness among the tasks is captured by a low-rank structure on the reward matrices. Learning shared representations across multiple RL tasks is challenging due to the complex and policy-dependent nature of data that leads to a temporal progression of error. Our approach adopts a reward-free reinforcement learning framework to first learn a data-collection policy. This policy then informs an exploration strategy for estimating the unknown reward matrices. Importantly, the data collected under this well-designed policy enable accurate estimation, which ultimately supports the learning of an near-optimal policy. Unlike existing approaches that rely on restrictive assumptions such as Gaussian features, incoherence conditions, or access to optimal solutions, we propose a low-rank matrix estimation method that operates under more general feature distributions encountered in RL settings. Theoretical analysis establishes that accurate low-rank matrix recovery is achievable under these relaxed assumptions, and we characterize the relationship between representation error and sample complexity. Leveraging the learned representation, we construct near-optimal policies and prove a regret bound. Experimental results demonstrate that our method effectively learns robust shared representations and task dynamics from finite data.

Via

Access Paper or Ask Questions

Learning Shared Representations for Multi-Task Linear Bandits

Apr 01, 2026

Jiabin Lin, Shana Moothedath

Abstract:Multi-task representation learning is an approach that learns shared latent representations across related tasks, facilitating knowledge transfer and improving sample efficiency. This paper introduces a novel approach to multi-task representation learning in linear bandits. We consider a setting with T concurrent linear bandit tasks, each with feature dimension d, that share a common latent representation of dimension r \ll min{d,T}$, capturing their underlying relatedness. We propose a new Optimism in the Face of Uncertainty Linear (OFUL) algorithm that leverages shared low-rank representations to enhance decision-making in a sample-efficient manner. Our algorithm first collects data through an exploration phase, estimates the shared model via spectral initialization, and then conducts OFUL based learning over a newly constructed confidence set. We provide theoretical guarantees for the confidence set and prove that the unknown reward vectors lie within the confidence set with high probability. We derive cumulative regret bounds and show that the proposed approach achieves \tilde{O}(\sqrt{drNT}), a significant improvement over solving the T tasks independently, resulting in a regret of \tilde{O}(dT\sqrt{N}). We performed numerical simulations to validate the performance of our algorithm for different problem sizes.

Via

Access Paper or Ask Questions

Post-Decision State-Based Online Learning for Delay-Energy-Aware Flow Allocation in Wireless Systems

Jan 06, 2026

Mahesh Ganesh Bhat, Shana Moothedath, Prasanna Chaporkar

Abstract:We develop a structure-aware reinforcement learning (RL) approach for delay- and energy-aware flow allocation in 5G User Plane Functions (UPFs). We consider a dynamic system with $K$ heterogeneous UPFs of varying capacities that handle stochastic arrivals of $M$ flow types, each with distinct rate requirements. We model the system as a Markov decision process (MDP) to capture the stochastic nature of flow arrivals and departures (possibly unknown), as well as the impact of flow allocation in the system. To solve this problem, we propose a post-decision state (PDS) based value iteration algorithm that exploits the underlying structure of the MDP. By separating action-controlled dynamics from exogenous factors, PDS enables faster convergence and efficient adaptive flow allocation, even in the absence of statistical knowledge about exogenous variables. Simulation results demonstrate that the proposed method converges faster and achieves lower long-term cost than standard Q-learning, highlighting the effectiveness of PDS-based RL for resource allocation in wireless networks.

* This work has been submitted to IEEE ICC 2026 for possible publication

Via

Access Paper or Ask Questions

Diffusion-based Decentralized Federated Multi-Task Representation Learning

Dec 29, 2025

Donghwa Kang, Shana Moothedath

Abstract:Representation learning is a widely adopted framework for learning in data-scarce environments to obtain a feature extractor or representation from various different yet related tasks. Despite extensive research on representation learning, decentralized approaches remain relatively underexplored. This work develops a decentralized projected gradient descent-based algorithm for multi-task representation learning. We focus on the problem of multi-task linear regression in which multiple linear regression models share a common, low-dimensional linear representation. We present an alternating projected gradient descent and minimization algorithm for recovering a low-rank feature matrix in a diffusion-based decentralized and federated fashion. We obtain constructive, provable guarantees that provide a lower bound on the required sample complexity and an upper bound on the iteration complexity of our proposed algorithm. We analyze the time and communication complexity of our algorithm and show that it is fast and communication-efficient. We performed numerical simulations to validate the performance of our algorithm and compared it with benchmark algorithms.

Via

Access Paper or Ask Questions

Beyond Centralization: Provable Communication Efficient Decentralized Multi-Task Learning

Dec 27, 2025

Donghwa Kang, Shana Moothedath

Abstract:Representation learning is a widely adopted framework for learning in data-scarce environments, aiming to extract common features from related tasks. While centralized approaches have been extensively studied, decentralized methods remain largely underexplored. We study decentralized multi-task representation learning in which the features share a low-rank structure. We consider multiple tasks, each with a finite number of data samples, where the observations follow a linear model with task-specific parameters. In the decentralized setting, task data are distributed across multiple nodes, and information exchange between nodes is constrained by a communication network. The goal is to recover the underlying feature matrix whose rank is much smaller than both the parameter dimension and the number of tasks. We propose a new alternating projected gradient and minimization algorithm with provable accuracy guarantees. We provide comprehensive characterizations of the time, communication, and sample complexities. Importantly, the communication complexity is independent of the target accuracy, which significantly reduces communication cost compared to prior methods. Numerical simulations validate the theoretical analysis across different dimensions and network topologies, and demonstrate regimes in which decentralized learning outperforms centralized federated approaches.

Via

Access Paper or Ask Questions

Neural Contextual Bandits Under Delayed Feedback Constraints

Apr 16, 2025

Mohammadali Moghimi, Sharu Theresa Jose, Shana Moothedath

Figure 1 for Neural Contextual Bandits Under Delayed Feedback Constraints

Figure 2 for Neural Contextual Bandits Under Delayed Feedback Constraints

Abstract:This paper presents a new algorithm for neural contextual bandits (CBs) that addresses the challenge of delayed reward feedback, where the reward for a chosen action is revealed after a random, unknown delay. This scenario is common in applications such as online recommendation systems and clinical trials, where reward feedback is delayed because the outcomes or results of a user's actions (such as recommendations or treatment responses) take time to manifest and be measured. The proposed algorithm, called Delayed NeuralUCB, uses an upper confidence bound (UCB)-based exploration strategy. Under the assumption of independent and identically distributed sub-exponential reward delays, we derive an upper bound on the cumulative regret over a T-length horizon. We further consider a variant of the algorithm, called Delayed NeuralTS, that uses Thompson Sampling-based exploration. Numerical experiments on real-world datasets, such as MNIST and Mushroom, along with comparisons to benchmark approaches, demonstrate that the proposed algorithms effectively manage varying delays and are well-suited for complex real-world scenarios.

Via

Access Paper or Ask Questions

Byzantine Resilient Federated Multi-Task Representation Learning

Mar 24, 2025

Tuan Le, Shana Moothedath

Figure 1 for Byzantine Resilient Federated Multi-Task Representation Learning

Figure 2 for Byzantine Resilient Federated Multi-Task Representation Learning

Figure 3 for Byzantine Resilient Federated Multi-Task Representation Learning

Figure 4 for Byzantine Resilient Federated Multi-Task Representation Learning

Abstract:In this paper, we propose BR-MTRL, a Byzantine-resilient multi-task representation learning framework that handles faulty or malicious agents. Our approach leverages representation learning through a shared neural network model, where all clients share fixed layers, except for a client-specific final layer. This structure captures shared features among clients while enabling individual adaptation, making it a promising approach for leveraging client data and computational power in heterogeneous federated settings to learn personalized models. To learn the model, we employ an alternating gradient descent strategy: each client optimizes its local model, updates its final layer, and sends estimates of the shared representation to a central server for aggregation. To defend against Byzantine agents, we employ geometric median aggregation for robust client-server communication. Our method enables personalized learning while maintaining resilience in distributed settings. We implemented the proposed alternating gradient descent algorithm in a federated testbed built using Amazon Web Services (AWS) platform and compared its performance with various benchmark algorithms and their variations. Through extensive experiments using real-world datasets, including CIFAR-10 and FEMINIST, we demonstrated the effectiveness and robustness of our approach and its transferability to new unseen clients with limited data, even in the presence of Byzantine adversaries.

Via

Access Paper or Ask Questions

Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits

Oct 02, 2024

Jiabin Lin, Shana Moothedath, Namrata Vaswani

Figure 1 for Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits

Figure 2 for Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits

Abstract:We study how representation learning can improve the learning efficiency of contextual bandit problems. We study the setting where we play T contextual linear bandits with dimension d simultaneously, and these T bandit tasks collectively share a common linear representation with a dimensionality of r much smaller than d. We present a new algorithm based on alternating projected gradient descent (GD) and minimization estimator to recover a low-rank feature matrix. Using the proposed estimator, we present a multi-task learning algorithm for linear contextual bandits and prove the regret bound of our algorithm. We presented experiments and compared the performance of our algorithm against benchmark algorithms.

Via

Access Paper or Ask Questions

Thompson Sampling for Stochastic Bandits with Noisy Contexts: An Information-Theoretic Regret Analysis

Jan 21, 2024

Sharu Theresa Jose, Shana Moothedath

Abstract:We explore a stochastic contextual linear bandit problem where the agent observes a noisy, corrupted version of the true context through a noise channel with an unknown noise parameter. Our objective is to design an action policy that can approximate" that of an oracle, which has access to the reward model, the channel parameter, and the predictive distribution of the true context from the observed noisy context. In a Bayesian framework, we introduce a Thompson sampling algorithm for Gaussian bandits with Gaussian context noise. Adopting an information-theoretic analysis, we demonstrate the Bayesian regret of our algorithm concerning the oracle's action policy. We also extend this problem to a scenario where the agent observes the true context with some delay after receiving the reward and show that delayed true contexts lead to lower Bayesian regret. Finally, we empirically demonstrate the performance of the proposed algorithms against baselines.

Via

Access Paper or Ask Questions