Le Xu

OV-InstructTTS: Towards Open-Vocabulary Instruct Text-to-Speech

Jan 04, 2026
Via arXiv

GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation

Dec 29, 2025
Via arXiv

OmniDrive-R1: Reinforcement-driven Interleaved Multi-modal Chain-of-Thought for Trustworthy Vision-Language Autonomous Driving

Dec 16, 2025
Via arXiv

Perception Activator: An intuitive and portable framework for brain cognitive exploration

Jul 03, 2025
Via arXiv

Mitigating Audiovisual Mismatch in Visual-Guide Audio Captioning

May 28, 2025
Via arXiv

Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model

May 19, 2025
Via arXiv

DynamicDTA: Drug-Target Binding Affinity Prediction Using Dynamic Descriptors and Graph Representation

May 13, 2025
Via arXiv

ASMR: Activation-sharing Multi-resolution Coordinate Networks For Efficient Inference

May 20, 2024
Via arXiv

MOSEL: Inference Serving Using Dynamic Modality Selection

Oct 27, 2023
Via arXiv

Controllable Residual Speaker Representation for Voice Conversion

Sep 15, 2023
Via arXiv