speech


VALLR-Pin: Dual-Decoding Visual Speech Recognition for Mandarin with Pinyin-Guided LLM Refinement

Dec 23, 2025

Aliasing-Free Neural Audio Synthesis

Dec 23, 2025

SpidR: Learning Fast and Stable Linguistic Units for Spoken Language Models Without Supervision

Dec 23, 2025

SlideTailor: Personalized Presentation Slide Generation for Scientific Papers

Dec 23, 2025

QuarkAudio Technical Report

Dec 23, 2025

LP-CFM: Perceptual Invariance-Aware Conditional Flow Matching for Speech Modeling

Dec 23, 2025

Fun-Audio-Chat Technical Report

Dec 23, 2025

FlashLips: 100-FPS Mask-Free Latent Lip-Sync using Reconstruction Instead of Diffusion or GANs

Dec 23, 2025

Unsupervised Single-Channel Audio Separation with Diffusion Source Priors

Dec 23, 2025

Real-Time Streamable Generative Speech Restoration with Flow Matching

Dec 22, 2025