Guanglu Wan

GeoAlign: Geometric Feature Realignment for MLLM Spatial Reasoning

Apr 14, 2026

LongCat-AudioDiT: High-Fidelity Diffusion Text-to-Speech in the Waveform Latent Space

Mar 31, 2026

Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese

May 16, 2025

AS-Speech: Adaptive Style For Speech Synthesis

Sep 09, 2024

MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research

Jun 26, 2024

CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs

May 27, 2024

Learning or Self-aligning? Rethinking Instruction Fine-tuning

Mar 02, 2024

A Task-oriented Dialog Model with Task-progressive and Policy-aware Pre-training

Oct 01, 2023

CPPF: A contextual and post-processing-free model for automatic speech recognition

Sep 21, 2023

Enhancing Multilingual Speech Recognition through Language Prompt Tuning and Frame-Level Language Adapter

Sep 19, 2023