
Yi Ru Wang

MolmoAct2: Action Reasoning Models for Real-world Deployment

May 04, 2026

RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains

Apr 06, 2026

MolmoAct: Action Reasoning Models that can Reason in Space

Aug 12, 2025

PointArena: Probing Multimodal Grounding Through Language-Guided Pointing

May 15, 2025

SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation

Jan 30, 2025

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation

Oct 01, 2024

Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

Jun 27, 2024

NEWTON: Are Large Language Models Capable of Physical Reasoning?

Oct 10, 2023

AR2-D2: Training a Robot Without a Robot

Jun 23, 2023

MVTrans: Multi-View Perception of Transparent Objects

Feb 22, 2023