Gabriel Sarch

Vero: An Open RL Recipe for General Visual Reasoning

Apr 07, 2026

Out of Sight, Not Out of Context? Egocentric Spatial Reasoning in VLMs Across Disjoint Frames

May 30, 2025

Grounded Reinforcement Learning for Visual Reasoning

May 29, 2025

Grounding Task Assistance with Multimodal Cues from a Single Demonstration

May 02, 2025

ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights

Jun 20, 2024

Neural Representations of Dynamic Visual Stimuli

Jun 04, 2024

HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models

Apr 29, 2024

ODIN: A Single Model for 2D and 3D Perception

Jan 04, 2024

Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models

Oct 23, 2023

3D View Prediction Models of the Dorsal Visual Stream

Sep 04, 2023