Xiangyu Zhu

MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model

Jun 16, 2026
Via arXiv

GeoHAT: Geometry-Adaptive Hybrid Action Transformer for Mobile Manipulation

Jun 11, 2026
Via arXiv

DGSG-Mind: Dynamic 3D Gaussian Scene Graphs for Long-Term Scene Understanding and Grounding

May 28, 2026
Via arXiv

Adaptive 3D Convolution for Remote Sensing Image Fusion

May 10, 2026
Via arXiv

TurboTalk: Progressive Distillation for One-Step Audio-Driven Talking Avatar Generation

Apr 16, 2026
Via arXiv

Direct Discrepancy Replay: Distribution-Discrepancy Condensation and Manifold-Consistent Replay for Continual Face Forgery Detection

Apr 14, 2026
Via arXiv

From Intuition to Investigation: A Tool-Augmented Reasoning MLLM Framework for Generalizable Face Anti-Spoofing

Mar 01, 2026
Via arXiv

One Ring to Rule Them All: Unifying Group-Based RL via Dynamic Power-Mean Geometry

Jan 30, 2026
Via arXiv

UPA: Unsupervised Prompt Agent via Tree-Based Search and Selection

Jan 30, 2026
Via arXiv

Pose-RFT: Enhancing MLLMs for 3D Pose Generation via Hybrid Action Reinforcement Fine-Tuning

Aug 11, 2025
Via arXiv