
Xiaohui Li

Think Before You Talk: Enhancing Meaningful Dialogue Generation in Full-Duplex Speech Language Models with Planning-Inspired Text Guidance

Aug 10, 2025
Via arXiv

The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs

Jul 10, 2025
Via arXiv

APTOS-2024 challenge report: Generation of synthetic 3D OCT images from fundus photographs

Jun 09, 2025
Via arXiv

Road Similarity-Based BEV-Satellite Image Matching for UGV Localization

Apr 23, 2025
Via arXiv

Self-Supervised Traversability Learning with Online Prototype Adaptation for Off-Road Autonomous Driving

Apr 16, 2025
Via arXiv

EGVD: Event-Guided Video Diffusion Model for Physically Realistic Large-Motion Frame Interpolation

Mar 26, 2025
Via arXiv

Research on the Offshore Marine Communication Environment Based on Satellite Remote Sensing Data

Feb 19, 2025
Via arXiv

DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency

Jan 17, 2025
Via arXiv

Subspace-Constrained Quadratic Matrix Factorization: Algorithm and Applications

Nov 07, 2024
Via arXiv

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Sep 26, 2024
Via arXiv