Shilong Liu

MMedAgent: Learning to Use Medical Tools with Multi-modal Agent

Jul 02, 2024
Via arXiv

CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

Jul 01, 2024
Via arXiv

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

May 16, 2024
Via arXiv

Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models

May 07, 2024
Via arXiv

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Mar 21, 2024
Via arXiv

TAPTR: Tracking Any Point with Transformers as Detection

Mar 19, 2024
Via arXiv

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

Jan 25, 2024
Via arXiv

Interfacing Foundation Models' Embeddings

Dec 12, 2023
Via arXiv

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

Dec 05, 2023
Via arXiv

T-Rex: Counting by Visual Prompting

Nov 22, 2023
Via arXiv