Yiwu Zhong

Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models

Mar 27, 2024

Towards Learning a Generalist Model for Embodied Navigation

Dec 06, 2023

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Nov 13, 2023

Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models

Oct 04, 2023

Learning Concise and Descriptive Attributes for Visual Recognition

Aug 07, 2023

Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations

Mar 31, 2023

RegionCLIP: Region-based Language-Image Pretraining

Dec 16, 2021

Grounded Language-Image Pre-training

Dec 07, 2021

Learning to Generate Scene Graph from Natural Language Supervision

Sep 06, 2021

Comprehensive Image Captioning via Scene Graph Decomposition

Jul 23, 2020