Kaidong Zhang

RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models

May 10, 2026

A1: A Fully Transparent Open-Source, Adaptive and Efficient Truncated Vision-Language-Action Model

Apr 07, 2026

Semantic One-Dimensional Tokenizer for Image Reconstruction and Generation

Mar 17, 2026

Mind the Generative Details: Direct Localized Detail Preference Optimization for Video Diffusion Models

Jan 08, 2026

RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation

May 03, 2025

A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation

Apr 21, 2025

InfiniteWorld: A Unified Scalable Simulation Framework for General Visual-Language Robot Interaction

Dec 08, 2024

NeRF Inpainting with Geometric Diffusion Prior and Balanced Score Distillation

Nov 23, 2024

StableV2V: Stablizing Shape Consistency in Video-to-Video Editing

Nov 17, 2024

PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation

Oct 14, 2024