Zhifei Xie

Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation

May 19, 2026

Deep-Reporter: Deep Research for Grounded Multimodal Long-Form Generation

Apr 12, 2026

PASK: Toward Intent-Aware Proactive Agents with Long-Term Memory

Apr 09, 2026

Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models

Mar 04, 2025

Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities

Oct 16, 2024

Mini-Omni2: Towards Open-source GPT-4o Model with Vision, Speech and Duplex

Oct 15, 2024

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

Aug 30, 2024

DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework

Aug 21, 2024