Xiaodan Liang

LaVieID: Local Autoregressive Diffusion Transformers for Identity-Preserving Video Creation

Aug 11, 2025
Via arXiv

X-SAM: From Segment Anything to Any Segmentation

Aug 06, 2025
Via arXiv

C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning

Jul 22, 2025
Via arXiv

3D-MoRe: Unified Modal-Contextual Reasoning for Embodied Question Answering

Jul 16, 2025
Via arXiv

PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly

Jun 10, 2025
Via arXiv

Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs

Jun 06, 2025
Via arXiv

TreeRPO: Tree Relative Policy Optimization

Jun 05, 2025
Via arXiv

Ground-R1: Incentivizing Grounded Visual Reasoning via Reinforcement Learning

May 26, 2025
Via arXiv

MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents

May 26, 2025
Via arXiv

SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning

May 25, 2025
Via arXiv