Yatai Ji

IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

Jul 10, 2024

PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

Jun 20, 2024

Taming Lookup Tables for Efficient Image Retouching

Mar 28, 2024

Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection

Nov 28, 2023

Global and Local Semantic Completion Learning for Vision-Language Pre-training

Jun 12, 2023

Multimodal Prototype-Enhanced Network for Few-Shot Action Recognition

Dec 09, 2022

Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning

Nov 24, 2022

MAP: Modality-Agnostic Uncertainty-Aware Vision-Language Pre-training Model

Oct 11, 2022