Lu Xu

DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization

Aug 20, 2025
Via arXiv

Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice

Jul 24, 2025
Via arXiv

MoSE: Skill-by-Skill Mixture-of-Expert Learning for Autonomous Driving

Jul 10, 2025
Via arXiv

From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition

May 22, 2025
Via arXiv

RFUAV: A Benchmark Dataset for Unmanned Aerial Vehicle Detection and Identification

Mar 12, 2025
Via arXiv

Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent

Jul 31, 2024
Via arXiv

NTIRE 2024 Challenge on Night Photography Rendering

Jun 18, 2024
Via arXiv

Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model

Jun 15, 2024
Via arXiv

Sparsity- and Hybridity-Inspired Visual Parameter-Efficient Fine-Tuning for Medical Diagnosis

May 28, 2024
Via arXiv

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

May 09, 2024
Via arXiv