Pengxiang Li

From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes

Jun 05, 2025

Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking

May 26, 2025

Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL

May 21, 2025

TransDiffuser: End-to-end Trajectory Generation with Decorrelated Multi-modal Representation for Autonomous Driving

May 14, 2025

Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning

May 06, 2025

Iterative Trajectory Exploration for Multimodal Agents

Apr 30, 2025

InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners

Apr 19, 2025

Finetuning Generative Trajectory Model with Reinforcement Learning from Human Feedback

Mar 13, 2025

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Feb 17, 2025

The Curse of Depth in Large Language Models

Feb 09, 2025