Sayan Shaw

Pushing the Limits of On-Device Streaming ASR: A Compact, High-Accuracy English Model for Low-Latency Inference

Apr 16, 2026

Nenad Banfic, David Fan, Kunal Vaishnavi, Sam Kemp, Sunghoon Choi, Rui Ren, Sayan Shaw, Meng Tang

Abstract: Deploying high-quality automatic speech recognition (ASR) on edge devices requires models that jointly optimize accuracy, latency, and memory footprint while operating entirely on CPU without GPU acceleration. We conduct a systematic empirical study of state-of-the-art ASR architectures, encompassing encoder-decoder, transducer, and LLM-based paradigms, evaluated across batch, chunked, and streaming inference modes. Through a comprehensive benchmark of over 50 configurations spanning OpenAI Whisper, NVIDIA Nemotron, Parakeet TDT, Canary, Conformer Transducer, and Qwen3-ASR, we identify NVIDIA's Nemotron Speech Streaming as the strongest candidate for real-time English streaming on resource-constrained hardware. We then re-implement the complete streaming inference pipeline in ONNX Runtime and conduct a controlled evaluation of multiple post-training quantization strategies, including importance-weighted k-quant, mixed-precision schemes, and round-to-nearest quantization, combined with graph-level operator fusion. These optimizations reduce the model from 2.47 GB to as little as 0.67 GB while maintaining word error rate (WER) within 1% absolute of the full-precision PyTorch baseline. Our recommended configuration, the int4 k-quant variant, achieves 8.20% average streaming WER across eight standard benchmarks, running comfortably faster than real-time on CPU with 0.56 s algorithmic latency, establishing a new quality-efficiency Pareto point for on-device streaming ASR.
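
The abstract reports accuracy as word error rate (WER) and bounds the quantization loss at "1% absolute". As a minimal sketch of what those two terms usually mean, the snippet below computes WER as word-level edit distance divided by reference length, and checks whether two WER values differ by at most one percentage point. The function names, the lowercase-and-split text normalization, and the default tolerance are assumptions made for illustration; the paper's own scoring pipeline is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def within_absolute_tolerance(wer_a: float, wer_b: float, tolerance: float = 0.01) -> bool:
    """True if two WER values (as fractions) differ by at most `tolerance`.

    "1% absolute" is read here as one percentage point (0.01), not as 1%
    of the baseline value.
    """
    return abs(wer_a - wer_b) <= tolerance
```

Under this reading, a quantized model at 8.2% WER would satisfy the bound against any baseline between 7.2% and 9.2% WER.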

Neighbor-Based Optimized Logistic Regression Machine Learning Model For Electric Vehicle Occupancy Detection

Apr 28, 2022

Sayan Shaw, Keaton Chia, Jan Kleissl

Abstract: This paper presents an optimized logistic regression machine learning model that predicts the occupancy of an Electric Vehicle (EV) charging station given the occupancy of neighboring stations. The model was optimized for the time of day. Trained on data from 57 EV charging stations around the University of California San Diego campus, the model achieved an 88.43% average accuracy and 92.23% maximum accuracy in predicting occupancy, outperforming a persistence model benchmark.

* 5 pages, 3 figures
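
The general idea in the abstract, predicting one station's occupancy from its neighbors and comparing against a persistence benchmark, can be sketched as follows. This is an illustrative toy, not the authors' model: the data is synthetic, the gradient-descent trainer and all function names are hypothetical, the time-of-day optimization is omitted, and "persistence" is assumed to mean predicting that the next observation equals the previous one.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n_samples, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for k, xi in enumerate(x):
                grad_w[k] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(weights, bias, x) -> int:
    """Classify as occupied (1) when the predicted probability is at least 0.5."""
    return int(sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) >= 0.5)


def accuracy(predictions, labels) -> float:
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Synthetic example (assumption): two neighboring stations observed at eight
# time steps, with the target station mirroring the first neighbor.
neighbor_occupancy = [(0, 0), (1, 0), (1, 1), (0, 1), (1, 1), (0, 0), (1, 0), (0, 1)]
target_occupancy = [0, 1, 1, 0, 1, 0, 1, 0]

weights, bias = train_logistic_regression(neighbor_occupancy, target_occupancy)
model_predictions = [predict(weights, bias, x) for x in neighbor_occupancy]
model_accuracy = accuracy(model_predictions, target_occupancy)

# Persistence baseline: the prediction for step t is the observation at t - 1,
# so it is scored on steps 1..7 only.
persistence_accuracy = accuracy(target_occupancy[:-1], target_occupancy[1:])
```

The toy evaluates the classifier on its own training data, so the two accuracies only show the mechanics of the comparison and say nothing about the reported results.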
