Hongsheng Li

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

Jun 05, 2025
Via arXiv

MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning

Jun 05, 2025
Via arXiv

Probability-Consistent Preference Optimization for Enhanced LLM Reasoning

May 29, 2025
Via arXiv

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

May 27, 2025
Via arXiv

SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving

May 22, 2025
Via arXiv

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

May 22, 2025
Via arXiv

Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space

May 22, 2025
Via arXiv

GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning

May 22, 2025
Via arXiv

UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning

May 21, 2025
Via arXiv

Self-NPO: Negative Preference Optimization of Diffusion Models by Simply Learning from Itself without Explicit Preference Annotations

May 17, 2025
Via arXiv