Jiebo Luo

SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users

Apr 14, 2025
Via arXiv

ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration

Apr 11, 2025
Via arXiv

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

Apr 09, 2025
Via arXiv

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)

Apr 04, 2025
Via arXiv

HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation

Mar 31, 2025
Via arXiv

JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization

Mar 30, 2025
Via arXiv

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

Mar 16, 2025
Via arXiv

OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting

Mar 12, 2025
Via arXiv

QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension

Mar 11, 2025
Via arXiv

Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs

Feb 26, 2025
Via arXiv