Shengyi Qian

Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning

May 28, 2026
Via arXiv

Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning

May 21, 2026
Via arXiv

Beyond Language Modeling: An Exploration of Multimodal Pretraining

Mar 03, 2026
Via arXiv

Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking

Feb 23, 2026
Via arXiv

Learning Personalized Agents from Human Feedback

Feb 18, 2026
Via arXiv

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

Jan 15, 2026
Via arXiv

DigiData: Training and Evaluating General-Purpose Mobile Control Agents

Nov 11, 2025
Via arXiv

DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data

May 21, 2025
Via arXiv

Multi-Object Hallucination in Vision-Language Models

Jul 08, 2024
Via arXiv

3D-MVP: 3D Multiview Pretraining for Robotic Manipulation

Jun 26, 2024
Via arXiv