Shankar Kumar

Predicting Compact Phrasal Rewrites with Large Language Models for ASR Post Editing

Jan 23, 2025

Hao Zhang, Felix Stahlberg, Shankar Kumar

Abstract:Large Language Models (LLMs) excel at rewriting tasks such as text style transfer and grammatical error correction. While there is considerable overlap between the inputs and outputs in these tasks, the decoding cost still increases with output length, regardless of the amount of overlap. By leveraging the overlap between the input and the output, Kaneko and Okazaki (2023) proposed model-agnostic edit span representations to compress the rewrites to save computation. They reported an output length reduction rate of nearly 80% with minimal accuracy impact in four rewriting tasks. In this paper, we propose alternative edit phrase representations inspired by phrase-based statistical machine translation. We systematically compare our phrasal representations with their span representations. We apply the LLM rewriting model to the task of Automatic Speech Recognition (ASR) post editing and show that our target-phrase-only edit representation has the best efficiency-accuracy trade-off. On the LibriSpeech test set, our method closes 50-60% of the WER gap between the edit span model and the full rewrite model while losing only 10-20% of the length reduction rate of the edit span model.

* accepted by ICASSP 2025
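
To make the idea of a compressed rewrite concrete, here is a minimal sketch of a span-style edit representation: only the source positions that change, plus their replacement tokens, are emitted instead of the full output. The function names and the `(start, end, replacement)` tuple layout are assumptions for illustration, and the alignment comes from Python's `difflib`; this is not the exact format used by Kaneko and Okazaki (2023) or by the phrasal representations proposed in the paper.

```python
from difflib import SequenceMatcher


def to_edit_spans(source, target):
    """Return (start, end, replacement_tokens) for every changed source region."""
    src, tgt = source.split(), target.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # src[i1:i2] is replaced by tgt[j1:j2]; an empty slice on either
            # side encodes a pure insertion or a pure deletion.
            spans.append((i1, i2, tgt[j1:j2]))
    return spans


def apply_edit_spans(source, spans):
    """Rebuild the full rewrite from the source and its edit spans."""
    src = source.split()
    out, cursor = [], 0
    for start, end, replacement in spans:
        out.extend(src[cursor:start])  # copy the unchanged tokens
        out.extend(replacement)        # emit the rewritten tokens
        cursor = end
    out.extend(src[cursor:])
    return " ".join(out)


hypothesis = "i red the book yesterday"
reference = "i read the book yesterday"
spans = to_edit_spans(hypothesis, reference)
print(spans)  # [(1, 2, ['read'])]
print(apply_edit_spans(hypothesis, spans) == reference)  # True
```

In this toy case the model would need to emit one short span rather than five tokens, which is the kind of output length reduction the abstract refers to.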

Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models

Nov 13, 2024

Felix Stahlberg, Jared Lichtarge, Shankar Kumar

Abstract:We propose a novel parameter-efficient training (PET) method for large language models that adapts models to downstream tasks by optimizing a small subset of the existing model parameters. Unlike prior methods, this subset is not fixed in location; rather, the set of modified parameters evolves over the course of training. This dynamic parameter selection can yield good performance with many fewer parameters than extant methods. Our method enables a seamless scaling of the subset size across an arbitrary proportion of the total model size, while popular PET approaches like prompt tuning and LoRA cover only a small part of this spectrum. We match or outperform prompt tuning and LoRA in most cases on a variety of NLP tasks (MT, QA, GSM8K, SuperGLUE) for a given parameter budget across different model families and sizes.

* NeurIPS 2024 Workshop on Adaptive Foundation Models
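
The following toy sketch illustrates what a training-time subset that moves could look like. The abstract does not state how the subset is chosen, so the selection rule here (keep the `k` coordinates whose candidate value is farthest from the initial value, and reset the rest) is purely an assumption for illustration, as are the function name `dynamic_subset_step` and the quadratic toy loss.

```python
def dynamic_subset_step(params, init, grads, lr, k):
    """One update in which at most k parameters may differ from their initial values."""
    # Take an ordinary gradient step on every coordinate first.
    candidate = [p - lr * g for p, g in zip(params, grads)]
    # Assumed selection rule: rank coordinates by distance from the initial value.
    order = sorted(range(len(candidate)),
                   key=lambda i: abs(candidate[i] - init[i]),
                   reverse=True)
    keep = set(order[:k])
    # Coordinates outside the current subset fall back to their initial values,
    # so the subset can change from step to step.
    return [candidate[i] if i in keep else init[i] for i in range(len(candidate))]


# Toy problem: minimise sum((p - t)^2), whose gradient is 2 * (p - t).
target = [3.0, -0.1, 2.0, 0.05]
init = [0.0, 0.0, 0.0, 0.0]
params = list(init)
for _ in range(50):
    grads = [2.0 * (p - t) for p, t in zip(params, target)]
    params = dynamic_subset_step(params, init, grads, lr=0.1, k=2)

changed = [i for i, (p, p0) in enumerate(zip(params, init)) if p != p0]
print(changed)  # indices of the parameters that were actually modified
```

With a budget of `k=2`, only the two coordinates that matter most for this loss end up modified, while the others stay at their initial values.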

Spelling Correction through Rewriting of Non-Autoregressive ASR Lattices

Sep 24, 2024

Leonid Velikovich, Christopher Li, Diamantino Caseiro, Shankar Kumar, Pat Rondon, Kandarp Joshi, Xavier Velez

Abstract:For end-to-end Automatic Speech Recognition (ASR) models, recognizing personal or rare phrases can be hard. A promising way to improve accuracy is through spelling correction (or rewriting) of the ASR lattice, where potentially misrecognized phrases are replaced with acoustically similar and contextually relevant alternatives. However, rewriting is challenging for ASR models trained with connectionist temporal classification (CTC) due to noisy hypotheses produced by a non-autoregressive, context-independent beam search. We present a finite-state transducer (FST) technique for rewriting wordpiece lattices generated by Transformer-based CTC models. Our algorithm performs grapheme-to-phoneme (G2P) conversion directly from wordpieces into phonemes, avoiding explicit word representations and exploiting the richness of the CTC lattice. Our approach requires no retraining or modification of the ASR model. We achieved up to a 15.2% relative reduction in sentence error rate (SER) on a test set with contextually relevant entities.

* 8 pages, 7 figures

Long-Form Speech Translation through Segmentation with Finite-State Decoding Constraints on Large Language Models

Oct 23, 2023

Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Ke Wu

Abstract:One challenge in speech translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations. To address this mismatch, we adapt large language models (LLMs) to split long ASR transcripts into segments that can be independently translated so as to maximize the overall translation quality. We overcome the tendency of hallucination in LLMs by incorporating finite-state constraints during decoding; these eliminate invalid outputs without requiring additional training. We discover that LLMs are adaptable to transcripts containing ASR errors through prompt-tuning or fine-tuning. Relative to a state-of-the-art automatic punctuation baseline, our best LLM improves the average BLEU by 2.9 points for English-German, English-Spanish, and English-Arabic TED talk translation in 9 test sets, just by improving segmentation.

* accepted to the Findings of EMNLP 2023. arXiv admin note: text overlap with arXiv:2212.09895
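
As a rough illustration of how finite-state constraints can rule out invalid segmentations at decoding time, the sketch below only ever allows two kinds of next token: the next unread transcript token, or a segment break. The break symbol `<br>`, the function names, and the toy scoring function standing in for an LLM are all assumptions; the paper's actual constraint automaton and decoder are not reproduced here.

```python
BREAK = "<br>"  # hypothetical segment-break symbol


def allowed_next(source, position, last_was_break):
    """Tokens the constraint permits: copy the next source token or break."""
    allowed = set()
    if position < len(source):
        allowed.add(source[position])
        # No break before the first token and no two breaks in a row.
        if position > 0 and not last_was_break:
            allowed.add(BREAK)
    return allowed


def constrained_greedy_decode(source, score_fn):
    """Greedy decoding restricted to outputs that copy the source verbatim."""
    output, position, last_was_break = [], 0, False
    while position < len(source):
        options = allowed_next(source, position, last_was_break)
        best = max(sorted(options), key=lambda tok: score_fn(output, tok))
        output.append(best)
        if best == BREAK:
            last_was_break = True
        else:
            position += 1
            last_was_break = False
    return output


def toy_score(prefix, token):
    """Stand-in for model scores: prefer a break right after a full stop."""
    if token == BREAK:
        return 1.0 if prefix and prefix[-1].endswith(".") else 0.0
    return 0.5


transcript = ["hello", "world.", "how", "are", "you"]
segmented = constrained_greedy_decode(transcript, toy_score)
print(segmented)  # ['hello', 'world.', '<br>', 'how', 'are', 'you']
```

Because every non-break token is copied from the transcript, the output cannot drop, alter, or invent words, whatever scores the model assigns.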

Heterogeneous Federated Learning Using Knowledge Codistillation

Oct 04, 2023

Jared Lichtarge, Ehsan Amid, Shankar Kumar, Tien-Ju Yang, Rohan Anil, Rajiv Mathews

Abstract:Federated Averaging, and many federated learning algorithm variants which build upon it, have a limitation: all clients must share the same model architecture. This results in unused modeling capacity on many clients, which limits model performance. To address this issue, we propose a method that involves training a small model on the entire pool and a larger model on a subset of clients with higher capacity. The models exchange information bidirectionally via knowledge distillation, utilizing an unlabeled dataset on a server without sharing parameters. We present two variants of our method, which improve upon federated averaging on image classification and language modeling tasks. We show this technique can be useful even if only out-of-domain or limited in-domain distillation data is available. Additionally, the bi-directional knowledge distillation allows for domain transfer between the models when different pool populations introduce domain shift.
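
The bidirectional exchange described above can be sketched as a pair of distillation losses computed on the same unlabeled examples: each model is pulled toward the other's predictive distribution, and only predictions (not parameters) are shared. Using KL divergence between softmax outputs is a common choice for distillation but is an assumption here, as are the function names; the two variants presented in the paper are not reproduced.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def codistillation_losses(small_logits, large_logits):
    """Average distillation loss for each model on a batch of unlabeled examples."""
    loss_small = loss_large = 0.0
    for small_row, large_row in zip(small_logits, large_logits):
        p_small, p_large = softmax(small_row), softmax(large_row)
        loss_small += kl_divergence(p_large, p_small)  # small model learns from large
        loss_large += kl_divergence(p_small, p_large)  # large model learns from small
    count = len(small_logits)
    return loss_small / count, loss_large / count


# Two unlabeled server-side examples with three classes each.
small = [[2.0, 0.5, 0.1], [0.2, 0.2, 1.5]]
large = [[1.0, 1.0, 0.1], [0.1, 0.3, 2.5]]
print(codistillation_losses(small, large))
```

In a real system each loss would be backpropagated only into its own model, treating the other model's predictions as fixed targets.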

Towards an On-device Agent for Text Rewriting

Aug 22, 2023

Yun Zhu, Yinxiao Liu, Felix Stahlberg, Shankar Kumar, Yu-hui Chen, Liangchen Luo, Lei Shu, Renjie Liu, Jindong Chen, Lei Meng

Abstract:Large Language Models (LLMs) have demonstrated impressive capabilities for text rewriting. Nonetheless, the large sizes of these models make them impractical for on-device inference, which would otherwise allow for enhanced privacy and economical inference. Creating a smaller yet potent language model for text rewriting presents a formidable challenge because it requires balancing the need for a small size with the need to retain the emergent capabilities of the LLM, which requires costly data collection. To address the above challenge, we introduce a new instruction tuning approach for building a mobile-centric text rewriting model. Our strategies enable the generation of high-quality training data without any human labeling. In addition, we propose a heuristic reinforcement learning framework which substantially enhances performance without requiring preference data. To further bridge the performance gap with the larger server-side model, we propose an effective approach that combines the mobile rewrite agent with the server model using a cascade. To tailor the text rewriting tasks to mobile scenarios, we introduce MessageRewriteEval, a benchmark that focuses on text rewriting for messages through natural language instructions. Through empirical experiments, we demonstrate that our on-device model surpasses the current state-of-the-art LLMs in text rewriting while maintaining a significantly reduced model size. Notably, we show that our proposed cascading approach improves model performance.

Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR

May 28, 2023

W. Ronny Huang, Hao Zhang, Shankar Kumar, Shuo-yiin Chang, Tara N. Sainath

Abstract:We propose a method of segmenting long-form speech by separating semantically complete sentences within the utterance. This prevents the ASR decoder from needlessly processing faraway context while also preventing it from missing relevant context within the current sentence. Semantically complete sentence boundaries are typically demarcated by punctuation in written text; but unfortunately, spoken real-world utterances rarely contain punctuation. We address this limitation by distilling punctuation knowledge from a bidirectional teacher language model (LM) trained on written, punctuated text. We compare our segmenter, which is distilled from the LM teacher, against a segmenter distilled from an acoustic-pause-based teacher used in other works, on a streaming ASR pipeline. The pipeline with our segmenter achieves a 3.2% relative WER gain along with a 60 ms median end-of-segment latency reduction on a YouTube captioning task.

* Interspeech 2023. First 3 authors contributed equally

Measuring Re-identification Risk

Apr 12, 2023

CJ Carey, Travis Dick, Alessandro Epasto, Adel Javanmard, Josh Karlin, Shankar Kumar, Andres Munoz Medina, Vahab Mirrokni, Gabriel Henrique Nunes, Sergei Vassilvitskii (+1 more)

Abstract:Compact user representations (such as embeddings) form the backbone of personalization services. In this work, we present a new theoretical framework to measure re-identification risk in such user representations. Our framework, based on hypothesis testing, formally bounds the probability that an attacker may be able to obtain the identity of a user from their representation. As an application, we show how our framework is general enough to model important real-world applications such as Chrome's Topics API for interest-based advertising. We complement our theoretical bounds by showing provably good attack algorithms for re-identification that we use to estimate the re-identification risk in the Topics API. We believe this work provides a rigorous and interpretable notion of re-identification risk and a framework to measure it that can be used to inform real-world applications.

Improved Long-Form Spoken Language Translation with Large Language Models

Dec 19, 2022

Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Axel H. Ng

Abstract:A challenge in spoken language translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations. To address this mismatch, we fine-tune a general-purpose, large language model to split long ASR transcripts into segments that can be independently translated so as to maximize the overall translation quality. We compare to several segmentation strategies and find that our approach improves BLEU score on three languages by an average of 2.7 BLEU overall compared to an automatic punctuation baseline. Further, we demonstrate the effectiveness of two constrained decoding strategies to improve well-formedness of the model output from above 99% to 100%.

Conciseness: An Overlooked Language Task

Nov 08, 2022

Felix Stahlberg, Aashish Kumar, Chris Alberti, Shankar Kumar

Abstract:We report on novel investigations into training models that make sentences concise. We define the task and show that it is different from related tasks such as summarization and simplification. For evaluation, we release two test sets, consisting of 2000 sentences each, that were annotated by two and five human annotators, respectively. We demonstrate that conciseness is a difficult task for which zero-shot setups with large neural language models often do not perform well. Given the limitations of these approaches, we propose a synthetic data generation method based on round-trip translations. Using this data to either train Transformers from scratch or fine-tune T5 models yields our strongest baselines that can be further improved by fine-tuning on an artificial conciseness dataset that we derived from multi-annotator machine translation test sets.

* EMNLP 2022 Workshop on Text Simplification, Accessibility, and Readability (TSAR)
