Picture for Xiaoda Yang

Xiaoda Yang

TMD-Bench: A Multi-Level Evaluation Paradigm for Music-Dance Co-Generation

Add code
May 03, 2026
Viaarxiv icon

A Progressive Training Strategy for Vision-Language Models to Counteract Spatio-Temporal Hallucinations in Embodied Reasoning

Add code
Apr 12, 2026
Viaarxiv icon

From Perception to Planning: Evolving Ego-Centric Task-Oriented Spatiotemporal Reasoning via Curriculum Learning

Add code
Apr 12, 2026
Viaarxiv icon

ImVideoEdit: Image-learning Video Editing via 2D Spatial Difference Attention Blocks

Add code
Apr 09, 2026
Viaarxiv icon

SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation

Add code
Mar 23, 2026
Viaarxiv icon

CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation

Add code
Jun 24, 2025
Figure 1 for CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation
Figure 2 for CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation
Figure 3 for CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation
Figure 4 for CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation
Viaarxiv icon

Vela: Scalable Embeddings with Voice Large Language Models for Multimodal Retrieval

Add code
Jun 17, 2025
Viaarxiv icon

Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation

Add code
May 30, 2025
Viaarxiv icon

Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision

Add code
Apr 30, 2025
Viaarxiv icon

EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model

Add code
Apr 18, 2025
Viaarxiv icon