Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Takuma Udagawa

Offline Contextual Bandits in the Presence of New Actions

May 18, 2026

Ren Kishimoto, Tatsuhiro Shimizu, Kazuki Kawamura, Takanori Muroi, Yusuke Narita, Yuki Sasamoto, Kei Tateno, Takuma Udagawa, Yuta Saito

Abstract:Automated decision-making algorithms drive applications such as recommendation systems and search engines. These algorithms often rely on off-policy contextual bandits or off-policy learning (OPL). Conventionally, OPL selects actions that maximize the expected reward from an existing action set. However, in many real-world scenarios, actions, such as news articles or video content, change continuously, and the action space evolves over time after data collection. We define actions introduced after deploying the logging policy as new actions and focus on OPL with new actions. Existing OPL methods identify optimal actions from the existing set effectively but cannot learn and select new actions because no relevant data are logged. To address this limitation, we propose a new OPL method that leverages action features. We first introduce the Local Combination PseudoInverse (LCPI) estimator for the policy gradient, generalizing the PseudoInverse estimator initially proposed for off-policy evaluation of slate bandits. LCPI controls the trade-off between reward-modeling condition and the condition for data collection regarding the action features, capturing the interaction effects among different dimensions of action features. Furthermore, we propose a generalized algorithm called Policy Optimization for Effective New Actions (PONA), which integrates LCPI, a component specialized for new action selection, with Doubly Robust (DR), which excels at learning within existing actions. We define PONA as a weighted sum of the LCPI and DR estimators, optimizing both the selection of existing and new actions, and allowing the proportion of new action selections to be adjusted by the weight parameter. Through extensive experiments, we demonstrate that PONA efficiently selects new actions while maintaining the overall policy performance as opposed to most existing methods that cannot select new actions.

* 12pages, 7 figures

Via

Access Paper or Ask Questions

Off-Policy Evaluation for Ranking Policies under Deterministic Logging Policies

Mar 23, 2026

Koichi Tanaka, Kazuki Kawamura, Takanori Muroi, Yusuke Narita, Yuki Sasamoto, Kei Tateno, Takuma Udagawa, Wei-Wei Du, Yuta Saito

Abstract:Off-Policy Evaluation (OPE) is an important practical problem in algorithmic ranking systems, where the goal is to estimate the expected performance of a new ranking policy using only offline logged data collected under a different, logging policy. Existing estimators, such as the ranking-wise and position-wise inverse propensity score (IPS) estimators, require the data collection policy to be sufficiently stochastic and suffer from severe bias when the logging policy is fully deterministic. In this paper, we propose novel estimators, Click-based Inverse Propensity Score (CIPS), exploiting the intrinsic stochasticity of user click behavior to address this challenge. Unlike existing methods that rely on the stochasticity of the logging policy, our approach uses click probability as a new form of importance weighting, enabling low-bias OPE even under deterministic logging policies where existing methods incur substantial bias. We provide theoretical analyses of the bias and variance properties of the proposed estimators and show, through synthetic and real-world experiments, that our estimators achieve significantly lower bias compared to strong baselines, for a range of experimental settings with completely deterministic logging policies.

* Published as a conference paper at ICLR 2026

Via

Access Paper or Ask Questions

Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification

Apr 19, 2025

Takuma Udagawa, Yang Zhao, Hiroshi Kanayama, Bishwaranjan Bhattacharjee

Figure 1 for Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification

Figure 2 for Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification

Figure 3 for Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification

Figure 4 for Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification

Abstract:Large language models (LLMs) acquire general linguistic knowledge from massive-scale pretraining. However, pretraining data mainly comprised of web-crawled texts contain undesirable social biases which can be perpetuated or even amplified by LLMs. In this study, we propose an efficient yet effective annotation pipeline to investigate social biases in the pretraining corpora. Our pipeline consists of protected attribute detection to identify diverse demographics, followed by regard classification to analyze the language polarity towards each attribute. Through our experiments, we demonstrate the effect of our bias analysis and mitigation measures, focusing on Common Crawl as the most representative pretraining corpus.

Via

Access Paper or Ask Questions

Robust ASR Error Correction with Conservative Data Filtering

Jul 18, 2024

Takuma Udagawa, Masayuki Suzuki, Masayasu Muraoka, Gakuto Kurata

Figure 1 for Robust ASR Error Correction with Conservative Data Filtering

Figure 2 for Robust ASR Error Correction with Conservative Data Filtering

Figure 3 for Robust ASR Error Correction with Conservative Data Filtering

Figure 4 for Robust ASR Error Correction with Conservative Data Filtering

Abstract:Error correction (EC) based on large language models is an emerging technology to enhance the performance of automatic speech recognition (ASR) systems. Generally, training data for EC are collected by automatically pairing a large set of ASR hypotheses (as sources) and their gold references (as targets). However, the quality of such pairs is not guaranteed, and we observed various types of noise which can make the EC models brittle, e.g. inducing overcorrection in out-of-domain (OOD) settings. In this work, we propose two fundamental criteria that EC training data should satisfy: namely, EC targets should (1) improve linguistic acceptability over sources and (2) be inferable from the available context (e.g. source phonemes). Through these criteria, we identify low-quality EC pairs and train the models not to make any correction in such cases, the process we refer to as conservative data filtering. In our experiments, we focus on Japanese ASR using a strong Conformer-CTC as the baseline and finetune Japanese LLMs for EC. Through our evaluation on a suite of 21 internal benchmarks, we demonstrate that our approach can significantly reduce overcorrection and improve both the accuracy and quality of ASR results in the challenging OOD settings.

Via

Access Paper or Ask Questions

INDUS: Effective and Efficient Language Models for Scientific Applications

May 17, 2024

Bishwaranjan Bhattacharjee, Aashka Trivedi, Masayasu Muraoka, Muthukumaran Ramasubramanian, Takuma Udagawa, Iksha Gurung, Rong Zhang, Bharath Dandala, Rahul Ramachandran, Manil Maskey(+24 more)

Figure 1 for INDUS: Effective and Efficient Language Models for Scientific Applications

Figure 2 for INDUS: Effective and Efficient Language Models for Scientific Applications

Figure 3 for INDUS: Effective and Efficient Language Models for Scientific Applications

Figure 4 for INDUS: Effective and Efficient Language Models for Scientific Applications

Abstract:Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated LLMs trained using domain-focused corpora perform better on specialized tasks. Inspired by this pivotal insight, we developed INDUS, a comprehensive suite of LLMs tailored for the Earth science, biology, physics, heliophysics, planetary sciences and astrophysics domains and trained using curated scientific corpora drawn from diverse data sources. The suite of models include: (1) an encoder model trained using domain-specific vocabulary and corpora to address natural language understanding tasks, (2) a contrastive-learning-based general text embedding model trained using a diverse set of datasets drawn from multiple sources to address information retrieval tasks and (3) smaller versions of these models created using knowledge distillation techniques to address applications which have latency or resource constraints. We also created three new scientific benchmark datasets namely, CLIMATE-CHANGE-NER (entity-recognition), NASA-QA (extractive QA) and NASA-IR (IR) to accelerate research in these multi-disciplinary fields. Finally, we show that our models outperform both general-purpose encoders (RoBERTa) and existing domain-specific encoders (SciBERT) on these new tasks as well as existing benchmark tasks in the domains of interest.

Via

Access Paper or Ask Questions

A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models

Oct 13, 2023

Takuma Udagawa, Aashka Trivedi, Michele Merler, Bishwaranjan Bhattacharjee

Figure 1 for A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models

Figure 2 for A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models

Figure 3 for A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models

Figure 4 for A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models

Abstract:Large language models have become a vital component in modern NLP, achieving state of the art performance in a variety of tasks. However, they are often inefficient for real-world deployment due to their expensive inference costs. Knowledge distillation is a promising technique to improve their efficiency while retaining most of their effectiveness. In this paper, we reproduce, compare and analyze several representative methods for task-agnostic (general-purpose) distillation of Transformer language models. Our target of study includes Output Distribution (OD) transfer, Hidden State (HS) transfer with various layer mapping strategies, and Multi-Head Attention (MHA) transfer based on MiniLMv2. Through our extensive experiments, we study the effectiveness of each method for various student architectures in both monolingual (English) and multilingual settings. Overall, we show that MHA transfer based on MiniLMv2 is generally the best option for distillation and explain the potential reasons behind its success. Moreover, we show that HS transfer remains as a competitive baseline, especially under a sophisticated layer mapping strategy, while OD transfer consistently lags behind other approaches. Findings from this study helped us deploy efficient yet effective student models for latency-critical applications.

* Accepted to EMNLP 2023 Industry Track

Via

Access Paper or Ask Questions

Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems

Sep 07, 2023

Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Masayasu Muraoka, George Saon

Figure 1 for Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems

Figure 2 for Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems

Figure 3 for Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems

Figure 4 for Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems

Abstract:Transferring the knowledge of large language models (LLMs) is a promising technique to incorporate linguistic knowledge into end-to-end automatic speech recognition (ASR) systems. However, existing works only transfer a single representation of LLM (e.g. the last layer of pretrained BERT), while the representation of a text is inherently non-unique and can be obtained variously from different layers, contexts and models. In this work, we explore a wide range of techniques to obtain and transfer multiple representations of LLMs into a transducer-based ASR system. While being conceptually simple, we show that transferring multiple representations of LLMs can be an effective alternative to transferring only a single representation.

* Submitted to ICASSP 2024

Via

Access Paper or Ask Questions

Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models

Mar 16, 2023

Aashka Trivedi, Takuma Udagawa, Michele Merler, Rameswar Panda, Yousef El-Kurdi, Bishwaranjan Bhattacharjee

Figure 1 for Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models

Figure 2 for Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models

Figure 3 for Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models

Figure 4 for Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models

Abstract:Large pre-trained language models have achieved state-of-the-art results on a variety of downstream tasks. Knowledge Distillation (KD) of a smaller student model addresses their inefficiency, allowing for deployment in resource-constraint environments. KD however remains ineffective, as the student is manually selected from a set of existing options already pre-trained on large corpora, a sub-optimal choice within the space of all possible student architectures. This paper proposes KD-NAS, the use of Neural Architecture Search (NAS) guided by the Knowledge Distillation process to find the optimal student model for distillation from a teacher, for a given natural language task. In each episode of the search process, a NAS controller predicts a reward based on a combination of accuracy on the downstream task and latency of inference. The top candidate architectures are then distilled from the teacher on a small proxy set. Finally the architecture(s) with the highest reward is selected, and distilled on the full downstream task training set. When distilling on the MNLI task, our KD-NAS model produces a 2 point improvement in accuracy on GLUE tasks with equivalent GPU latency with respect to a hand-crafted student architecture available in the literature. Using Knowledge Distillation, this model also achieves a 1.4x speedup in GPU Latency (3.2x speedup on CPU) with respect to a BERT-Base Teacher, while maintaining 97% performance on GLUE Tasks (without CoLA). We also obtain an architecture with equivalent performance as the hand-crafted student model on the GLUE benchmark, but with a 15% speedup in GPU latency (20% speedup in CPU latency) and 0.8 times the number of parameters

* 12 pages, 3 figures

Via

Access Paper or Ask Questions

Sentence Identification with BOS and EOS Label Combinations

Jan 31, 2023

Takuma Udagawa, Hiroshi Kanayama, Issei Yoshida

Abstract:The sentence is a fundamental unit in many NLP applications. Sentence segmentation is widely used as the first preprocessing task, where an input text is split into consecutive sentences considering the end of the sentence (EOS) as their boundaries. This task formulation relies on a strong assumption that the input text consists only of sentences, or what we call the sentential units (SUs). However, real-world texts often contain non-sentential units (NSUs) such as metadata, sentence fragments, nonlinguistic markers, etc. which are unreasonable or undesirable to be treated as a part of an SU. To tackle this issue, we formulate a novel task of sentence identification, where the goal is to identify SUs while excluding NSUs in a given text. To conduct sentence identification, we propose a simple yet effective method which combines the beginning of the sentence (BOS) and EOS labels to determine the most probable SUs and NSUs based on dynamic programming. To evaluate this task, we design an automatic, language-independent procedure to convert the Universal Dependencies corpora into sentence identification benchmarks. Finally, our experiments on the sentence identification task demonstrate that our proposed method generally outperforms sentence segmentation baselines which only utilize EOS labels.

* Accepted to EACL 2023 (Findings)

Via

Access Paper or Ask Questions

Policy-Adaptive Estimator Selection for Off-Policy Evaluation

Nov 25, 2022

Takuma Udagawa, Haruka Kiyohara, Yusuke Narita, Yuta Saito, Kei Tateno

Figure 1 for Policy-Adaptive Estimator Selection for Off-Policy Evaluation

Figure 2 for Policy-Adaptive Estimator Selection for Off-Policy Evaluation

Figure 3 for Policy-Adaptive Estimator Selection for Off-Policy Evaluation

Figure 4 for Policy-Adaptive Estimator Selection for Off-Policy Evaluation

Abstract:Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual policies using only offline logged data. Although many estimators have been developed, there is no single estimator that dominates the others, because the estimators' accuracy can vary greatly depending on a given OPE task such as the evaluation policy, number of actions, and noise level. Thus, the data-driven estimator selection problem is becoming increasingly important and can have a significant impact on the accuracy of OPE. However, identifying the most accurate estimator using only the logged data is quite challenging because the ground-truth estimation accuracy of estimators is generally unavailable. This paper studies this challenging problem of estimator selection for OPE for the first time. In particular, we enable an estimator selection that is adaptive to a given OPE task, by appropriately subsampling available logged data and constructing pseudo policies useful for the underlying estimator selection task. Comprehensive experiments on both synthetic and real-world company data demonstrate that the proposed procedure substantially improves the estimator selection compared to a non-adaptive heuristic.

* accepted at AAAI'23

Via

Access Paper or Ask Questions