Xiaodan Liang

SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

Apr 27, 2025

A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation

Apr 21, 2025

FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model

Mar 25, 2025

Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models

Mar 24, 2025

WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation

Mar 11, 2025

Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?

Mar 08, 2025

Structured Preference Optimization for Vision-Language Long-Horizon Task Planning

Feb 28, 2025

UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting

Feb 25, 2025

TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba

Feb 21, 2025

ComposeAnyone: Controllable Layout-to-Human Generation with Decoupled Multimodal Conditions

Jan 21, 2025