
Shengyi Qian

Beyond Language Modeling: An Exploration of Multimodal Pretraining

Mar 03, 2026

Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking

Feb 23, 2026

Learning Personalized Agents from Human Feedback

Feb 18, 2026

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

Jan 15, 2026

DigiData: Training and Evaluating General-Purpose Mobile Control Agents

Nov 11, 2025

DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data

May 21, 2025

Multi-Object Hallucination in Vision-Language Models

Jul 08, 2024

3D-MVP: 3D Multiview Pretraining for Robotic Manipulation

Jun 26, 2024

Multimodal Graph Benchmark

Jun 24, 2024

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Jun 12, 2024