speech


Pushing the Frontier of Audiovisual Perception with Large-Scale Multimodal Correspondence Learning

Dec 22, 2025
Via arXiv

MauBERT: Universal Phonetic Inductive Biases for Few-Shot Acoustic Units Discovery

Dec 22, 2025
Via arXiv

Real-Time Streamable Generative Speech Restoration with Flow Matching

Dec 22, 2025
Via arXiv

Kunnafonidilaw ka Cadeau: an ASR dataset of present-day Bambara

Dec 22, 2025
Via arXiv

Enhancing Fully Formatted End-to-End Speech Recognition with Knowledge Distillation via Multi-Codebook Vector Quantization

Dec 22, 2025
Via arXiv

In-Context Audio Control of Video Diffusion Transformers

Dec 21, 2025
Via arXiv

Reliable Audio Deepfake Detection in Variable Conditions via Quantum-Kernel SVMs

Dec 21, 2025
Via arXiv

Smark: A Watermark for Text-to-Speech Diffusion Models via Discrete Wavelet Transform

Dec 21, 2025
Via arXiv

Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis

Dec 21, 2025
Via arXiv

Asynchronous Pipeline Parallelism for Real-Time Multilingual Lip Synchronization in Video Communication Systems

Dec 20, 2025
Via arXiv