Manjie Xu

FireRed-OCR Technical Report

Mar 02, 2026
Via arXiv

Code over Words: Overcoming Semantic Inertia via Code-Grounded Reasoning

Jan 26, 2026
Via arXiv

Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model

May 19, 2025
Via arXiv

Learning to Plan with Personalized Preferences

Feb 02, 2025
Via arXiv

SRC-gAudio: Sampling-Rate-Controlled Audio Generation

Oct 09, 2024
Via arXiv

Towards Diverse and Efficient Audio Captioning via Diffusion Models

Sep 14, 2024
Via arXiv

STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment

Sep 13, 2024
Via arXiv

Video-to-Audio Generation with Hidden Alignment

Jul 10, 2024
Via arXiv

Active Reasoning in an Open-World Environment

Nov 03, 2023
Via arXiv

MEWL: Few-shot multimodal word learning with referential uncertainty

Jun 01, 2023
Via arXiv