Qi Wu

SCI Institute, UC Davis

T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval

Aug 21, 2024

Navigating Beyond Instructions: Vision-and-Language Navigation in Obstructed Environments

Jul 31, 2024

XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training

Jul 28, 2024

Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning

Jul 23, 2024

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

Jul 17, 2024

Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models

Jul 09, 2024

HumanPlus: Humanoid Shadowing and Imitation from Humans

Jun 15, 2024

SMART: Scene-motion-aware human action recognition framework for mental disorder group

Jun 07, 2024

Why Only Text: Empowering Vision-and-Language Navigation with Multi-modal Prompts

Jun 04, 2024

Augmented Commonsense Knowledge for Remote Object Grounding

Jun 03, 2024