
Kai Yu

Sherman

Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning

Jun 06, 2025
Via arXiv

Masked Self-distilled Transducer-based Keyword Spotting with Semi-autoregressive Decoding

May 30, 2025
Via arXiv

HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer

May 28, 2025
Via arXiv

Towards General Discrete Speech Codec for Complex Acoustic Environments: A Study of Reconstruction and Downstream Task Consistency

May 28, 2025
Via arXiv

Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling

May 26, 2025
Via arXiv

NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering

May 26, 2025
Via arXiv

MFA-KWS: Effective Keyword Spotting with Multi-head Frame-asynchronous Decoding

May 26, 2025
Via arXiv

ProgRM: Build Better GUI Agents with Progress Rewards

May 23, 2025
Via arXiv

Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate

May 22, 2025
Via arXiv

Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism

May 20, 2025
Via arXiv