Jiafei Duan

SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation

Jan 30, 2025

SAT: Spatial Aptitude Training for Multimodal Language Models

Dec 10, 2024

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation

Oct 01, 2024

Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

Jun 27, 2024

RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics

Jun 15, 2024

Octopi: Object Property Reasoning with Large Tactile-Language Models

May 05, 2024

EVE: Enabling Anyone to Train Robots using Augmented Reality

Apr 09, 2024

THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation

Feb 13, 2024

Selective Visual Representations Improve Convergence and Generalization for Embodied AI

Nov 07, 2023

NEWTON: Are Large Language Models Capable of Physical Reasoning?

Oct 10, 2023