Ruimao Zhang

F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

Jul 17, 2024

Open-World Human-Object Interaction Detection via Multi-modal Prompts

Jun 11, 2024

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

May 30, 2024

SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension

Apr 25, 2024

MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control

Mar 19, 2024

Toward Accurate Camera-based 3D Object Detection via Cascade Depth Estimation and Calibration

Feb 07, 2024

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

Dec 13, 2023

X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-modal Knowledge Transfer

Dec 12, 2023

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models

Dec 11, 2023

SEED-Bench-2: Benchmarking Multimodal Large Language Models

Nov 28, 2023