Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shuhe Li

LCS-CTC: Leveraging Soft Alignments to Enhance Phonetic Transcription Robustness

Aug 05, 2025

Zongli Ye, Jiachen Lian, Akshaj Gupta, Xuanru Zhou, Krish Patel, Haodong Li, Hwi Joo Park, Chenxu Guo, Shuhe Li, Sam Wang(+9 more)

Abstract:Phonetic speech transcription is crucial for fine-grained linguistic analysis and downstream speech applications. While Connectionist Temporal Classification (CTC) is a widely used approach for such tasks due to its efficiency, it often falls short in recognition performance, especially under unclear and nonfluent speech. In this work, we propose LCS-CTC, a two-stage framework for phoneme-level speech recognition that combines a similarity-aware local alignment algorithm with a constrained CTC training objective. By predicting fine-grained frame-phoneme cost matrices and applying a modified Longest Common Subsequence (LCS) algorithm, our method identifies high-confidence alignment zones which are used to constrain the CTC decoding path space, thereby reducing overfitting and improving generalization ability, which enables both robust recognition and text-free forced alignment. Experiments on both LibriSpeech and PPA demonstrate that LCS-CTC consistently outperforms vanilla CTC baselines, suggesting its potential to unify phoneme modeling across fluent and non-fluent speech.

* 2025 ASRU

Via

Access Paper or Ask Questions

Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection

May 28, 2025

Jinming Zhang, Xuanru Zhou, Jiachen Lian, Shuhe Li, William Li, Zoe Ezzes, Rian Bogley, Lisa Wauters, Zachary Miller, Jet Vonk(+3 more)

Abstract:Speech dysfluency detection is crucial for clinical diagnosis and language assessment, but existing methods are limited by the scarcity of high-quality annotated data. Although recent advances in TTS model have enabled synthetic dysfluency generation, existing synthetic datasets suffer from unnatural prosody and limited contextual diversity. To address these limitations, we propose LLM-Dys -- the most comprehensive dysfluent speech corpus with LLM-enhanced dysfluency simulation. This dataset captures 11 dysfluency categories spanning both word and phoneme levels. Building upon this resource, we improve an end-to-end dysfluency detection framework. Experimental validation demonstrates state-of-the-art performance. All data, models, and code are open-sourced at https://github.com/Berkeley-Speech-Group/LLM-Dys.

* Submitted to Interspeech 2025

Via

Access Paper or Ask Questions

Dysfluent WFST: A Framework for Zero-Shot Speech Dysfluency Transcription and Detection

May 22, 2025

Chenxu Guo, Jiachen Lian, Xuanru Zhou, Jinming Zhang, Shuhe Li, Zongli Ye, Hwi Joo Park, Anaisha Das, Zoe Ezzes, Jet Vonk(+6 more)

Abstract:Automatic detection of speech dysfluency aids speech-language pathologists in efficient transcription of disordered speech, enhancing diagnostics and treatment planning. Traditional methods, often limited to classification, provide insufficient clinical insight, and text-independent models misclassify dysfluency, especially in context-dependent cases. This work introduces Dysfluent-WFST, a zero-shot decoder that simultaneously transcribes phonemes and detects dysfluency. Unlike previous models, Dysfluent-WFST operates with upstream encoders like WavLM and requires no additional training. It achieves state-of-the-art performance in both phonetic error rate and dysfluency detection on simulated and real speech data. Our approach is lightweight, interpretable, and effective, demonstrating that explicit modeling of pronunciation behavior in decoding, rather than complex architectures, is key to improving dysfluency processing systems.

* Interspeech 2025

Via

Access Paper or Ask Questions

Routing Towards Discriminative Power of Class Capsules

Mar 07, 2021

Haoyu Yang, Shuhe Li, Bei Yu

Figure 1 for Routing Towards Discriminative Power of Class Capsules

Figure 2 for Routing Towards Discriminative Power of Class Capsules

Figure 3 for Routing Towards Discriminative Power of Class Capsules

Figure 4 for Routing Towards Discriminative Power of Class Capsules

Abstract:Capsule networks are recently proposed as an alternative to modern neural network architectures. Neurons are replaced with capsule units that represent specific features or entities with normalized vectors or matrices. The activation of lower layer capsules affects the behavior of the following capsules via routing links that are constructed during training via certain routing algorithms. We discuss the routing-by-agreement scheme in dynamic routing algorithm which, in certain cases, leads the networks away from optimality. To obtain better and faster convergence, we propose a routing algorithm that incorporates a regularized quadratic programming problem which can be solved efficiently. Particularly, the proposed routing algorithm targets directly on the discriminative power of class capsules making the correct decision on input instances. We conduct experiments on MNIST, MNIST-Fashion, and CIFAR-10 and show competitive classification results compared to existing capsule networks.

* 6 pages,4 figures

Via

Access Paper or Ask Questions

Bridging the Gap Between Layout Pattern Sampling and Hotspot Detection via Batch Active Learning

Jul 13, 2018

Haoyu Yang, Shuhe Li, Cyrus Tabery, Bingqing Lin, Bei Yu

Figure 1 for Bridging the Gap Between Layout Pattern Sampling and Hotspot Detection via Batch Active Learning

Figure 2 for Bridging the Gap Between Layout Pattern Sampling and Hotspot Detection via Batch Active Learning

Figure 3 for Bridging the Gap Between Layout Pattern Sampling and Hotspot Detection via Batch Active Learning

Figure 4 for Bridging the Gap Between Layout Pattern Sampling and Hotspot Detection via Batch Active Learning

Abstract:Layout hotpot detection is one of the main steps in modern VLSI design. A typical hotspot detection flow is extremely time consuming due to the computationally expensive mask optimization and lithographic simulation. Recent researches try to facilitate the procedure with a reduced flow including feature extraction, training set generation and hotspot detection, where feature extraction methods and hotspot detection engines are deeply studied. However, the performance of hotspot detectors relies highly on the quality of reference layout libraries which are costly to obtain and usually predetermined or randomly sampled in previous works. In this paper, we propose an active learning-based layout pattern sampling and hotspot detection flow, which simultaneously optimizes the machine learning model and the training set that aims to achieve similar or better hotspot detection performance with much smaller number of training instances. Experimental results show that our proposed method can significantly reduce lithography simulation overhead while attaining satisfactory detection accuracy on designs under both DUV and EUV lithography technologies.

* 8 pages, 7 figures

Via

Access Paper or Ask Questions