Xiao Yang

Mind over Space: Can Multimodal Large Language Models Mentally Navigate?

Mar 23, 2026
Via arXiv

OmniEarth: A Benchmark for Evaluating Vision-Language Models in Geospatial Tasks

Mar 10, 2026
Via arXiv

GeoAlignCLIP: Enhancing Fine-Grained Vision-Language Alignment in Remote Sensing via Multi-Granular Consistency Learning

Mar 10, 2026
Via arXiv

Helios: Real Real-Time Long Video Generation Model

Mar 04, 2026
Via arXiv

Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search

Mar 02, 2026
Via arXiv

VidDoS: Universal Denial-of-Service Attack on Video-based Large Language Models

Mar 02, 2026
Via arXiv

FT-Dojo: Towards Autonomous LLM Fine-Tuning with Language Agents

Mar 02, 2026
Via arXiv

UniRef-Image-Edit: Towards Scalable and Consistent Multi-Reference Image Editing

Feb 15, 2026
Via arXiv

Geometry-Aware Rotary Position Embedding for Consistent Video World Model

Feb 08, 2026
Via arXiv

Adaptive 1D Video Diffusion Autoencoder

Feb 04, 2026
Via arXiv