Rongtao Xu

LaplacianFormer: Rethinking Linear Attention with Laplacian Kernel

Apr 22, 2026

AnySlot: Goal-Conditioned Vision-Language-Action Policies for Zero-Shot Slot-Level Placement

Apr 14, 2026

A1: A Fully Transparent Open-Source, Adaptive and Efficient Truncated Vision-Language-Action Model

Apr 07, 2026

ManipArena: Comprehensive Real-world Evaluation of Reasoning-Oriented Generalist Robot Manipulation

Mar 30, 2026

HMR-1: Hierarchical Massage Robot with Vision-Language-Model for Embodied Healthcare

Mar 09, 2026

NaVIDA: Vision-Language Navigation with Inverse Dynamics Augmentation

Jan 26, 2026

MoFu: Scale-Aware Modulation and Fourier Fusion for Multi-Subject Video Generation

Dec 26, 2025

GLaD: Geometric Latent Distillation for Vision-Language-Action Models

Dec 10, 2025

Explicit Temporal-Semantic Modeling for Dense Video Captioning via Context-Aware Cross-Modal Interaction

Nov 13, 2025

CurriFlow: Curriculum-Guided Depth Fusion with Optical Flow-Based Temporal Alignment for 3D Semantic Scene Completion

Oct 14, 2025