Xiaodan Liang

PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos

Dec 02, 2024

AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning

Nov 18, 2024

InstruGen: Automatic Instruction Generation for Vision-and-Language Navigation Via Large Multimodal Models

Nov 18, 2024

VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation

Nov 14, 2024

StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration

Nov 07, 2024

Continual LLaVA: Continual Instruction Tuning in Large Vision-Language Models

Nov 04, 2024

Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes

Oct 14, 2024

PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation

Oct 14, 2024

Learning Interaction-aware 3D Gaussian Splatting for One-shot Hand Avatars

Oct 11, 2024

UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation

Oct 03, 2024