Bin Ma

Univ. Western Ontario

FunAudio-ASR Technical Report

Sep 15, 2025

Insight Rumors: A Novel Textual Rumor Locating and Marking Model Leveraging Att_BiMamba2 Network

Aug 18, 2025

ClearerVoice-Studio: Bridging Advanced Speech Processing Research and Practical Deployment

Jun 24, 2025

Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction

May 27, 2025

ZenFlow: Enabling Stall-Free Offloading Training via Asynchronous Updates

May 18, 2025

Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding

May 12, 2025

Conditional Latent Diffusion-Based Speech Enhancement Via Dual Context Learning

Jan 17, 2025

HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution

Jan 17, 2025

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Jan 10, 2025

Speech Separation using Neural Audio Codecs with Embedding Loss

Nov 27, 2024