Xiaojian Ma

Multi-modal Situated Reasoning in 3D Scenes

Sep 04, 2024
Via arXiv

Task-oriented Sequential Grounding in 3D Scenes

Aug 07, 2024
Via arXiv

UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

Jul 07, 2024
Via arXiv

OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

Jun 27, 2024
Via arXiv

Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space

May 27, 2024
Via arXiv

Unifying 3D Vision-Language Understanding via Promptable Queries

May 19, 2024
Via arXiv

Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting

Mar 22, 2024
Via arXiv

VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

Mar 18, 2024
Via arXiv

RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation

Mar 08, 2024
Via arXiv

CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update

Dec 18, 2023
Via arXiv