Manling Li

Spatial Mental Modeling from Limited Views

Jun 26, 2025
Via arXiv

Exploring Diffusion Transformer Designs via Grafting

Jun 06, 2025
Via arXiv

Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging

May 08, 2025
Via arXiv

RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning

Apr 24, 2025
Via arXiv

Re-thinking Temporal Search for Long-Form Video Understanding

Apr 03, 2025
Via arXiv

Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas

Mar 04, 2025
Via arXiv

The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination

Feb 22, 2025
Via arXiv

EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents

Feb 13, 2025
Via arXiv

SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering

Feb 10, 2025
Via arXiv

LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models

Dec 03, 2024
Via arXiv