
Yanjiang Guo

VLAW: Iterative Co-Improvement of Vision-Language-Action Policy and World Model

Feb 15, 2026
Via arXiv

BagelVLA: Enhancing Long-Horizon Manipulation via Interleaved Vision-Language-Action Generation

Feb 11, 2026
Via arXiv

VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models

Jan 06, 2026
Via arXiv

villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models

Jul 31, 2025
Via arXiv

UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent

Jan 31, 2025
Via arXiv

Improving Vision-Language-Action Model with Online Reinforcement Learning

Jan 28, 2025
Via arXiv

Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations

Dec 19, 2024
Via arXiv

Prediction with Action: Visual Policy Learning via Joint Denoising Process

Nov 27, 2024
Via arXiv

Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning

Aug 26, 2024
Via arXiv

DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment

Jul 01, 2023
Via arXiv