Xiang Bai

Huazhong University of Science and Technology

Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

Mar 12, 2026

ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding

Feb 26, 2026

TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering

Feb 26, 2026

GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving

Feb 09, 2026

I2E: From Image Pixels to Actionable Interactive Environments for Text-Guided Image Editing

Jan 07, 2026

Crowded Video Individual Counting Informed by Social Grouping and Spatial-Temporal Displacement Priors

Jan 03, 2026

FitControler: Toward Fit-Aware Virtual Try-On

Dec 30, 2025

MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning

Dec 16, 2025

GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation

Dec 14, 2025

DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning

Dec 14, 2025