Zhaoye Fei

MOSS-TTSD: Text to Spoken Dialogue Generation

Mar 20, 2026

MOSS-TTS Technical Report

Mar 18, 2026

MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models

Feb 12, 2026

MOVA: Towards Scalable and Synchronized Video-Audio Generation

Feb 09, 2026

MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization

Jan 08, 2026

WESR: Scaling and Evaluating Word-level Event-Speech Recognition

Jan 08, 2026

MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance

Oct 02, 2025

CodecBench: A Comprehensive Benchmark for Acoustic and Semantic Evaluation

Aug 28, 2025

VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search

Apr 12, 2025

World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning

Mar 13, 2025