Winson Han

MolmoPoint: Better Pointing for VLMs with Grounding Tokens

Mar 30, 2026
Via arXiv

Unified Spatio-Temporal Token Scoring for Efficient Video VLMs

Mar 18, 2026
Via arXiv

MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation

Mar 17, 2026
Via arXiv

MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation

Feb 11, 2026
Via arXiv

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

Jan 15, 2026
Via arXiv

Visual Representations inside the Language Model

Oct 06, 2025
Via arXiv

MolmoAct: Action Reasoning Models that can Reason in Space

Aug 12, 2025
Via arXiv

GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation

May 19, 2025
Via arXiv

The One RING: a Robotic Indoor Navigation Generalist

Dec 18, 2024
Via arXiv

Holodeck: Language Guided Generation of 3D Embodied AI Environments

Dec 14, 2023
Via arXiv