Picture for Xihui Liu

Xihui Liu

EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios

Add code
Dec 05, 2024
Viaarxiv icon

Moto: Latent Motion Token as the Bridging Language for Robot Manipulation

Add code
Dec 05, 2024
Viaarxiv icon

GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration

Add code
Dec 05, 2024
Viaarxiv icon

MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation

Add code
Dec 04, 2024
Viaarxiv icon

SAMPart3D: Segment Any Part in 3D Objects

Add code
Nov 11, 2024
Viaarxiv icon

WorldSimBench: Towards Video Generation Models as World Simulators

Add code
Oct 23, 2024
Figure 1 for WorldSimBench: Towards Video Generation Models as World Simulators
Figure 2 for WorldSimBench: Towards Video Generation Models as World Simulators
Figure 3 for WorldSimBench: Towards Video Generation Models as World Simulators
Figure 4 for WorldSimBench: Towards Video Generation Models as World Simulators
Viaarxiv icon

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Add code
Oct 17, 2024
Viaarxiv icon

LVD-2M: A Long-take Video Dataset with Temporally Dense Captions

Add code
Oct 14, 2024
Viaarxiv icon

Loong: Generating Minute-level Long Videos with Autoregressive Language Models

Add code
Oct 03, 2024
Viaarxiv icon

Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding

Add code
Oct 02, 2024
Figure 1 for Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
Figure 2 for Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
Figure 3 for Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
Figure 4 for Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
Viaarxiv icon