Jianbing Zhang

The Devil is in the Few Shots: Iterative Visual Knowledge Completion for Few-shot Learning

Apr 19, 2024
Via arXiv

MixRED: A Mix-lingual Relation Extraction Dataset

Mar 23, 2024
Via arXiv

Cobra Effect in Reference-Free Image Captioning Metrics

Feb 18, 2024
Via arXiv

EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models

Feb 15, 2024
Via arXiv

SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

Jan 17, 2024
Via arXiv

M2DF: Multi-grained Multi-curriculum Denoising Framework for Multimodal Aspect-based Sentiment Analysis

Oct 23, 2023
Via arXiv

Bounding and Filling: A Fast and Flexible Framework for Image Captioning

Oct 15, 2023
Via arXiv

DRIN: Dynamic Relation Interactive Network for Multimodal Entity Linking

Oct 09, 2023
Via arXiv

Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models

Aug 06, 2023
Via arXiv

Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model

Aug 02, 2023
Via arXiv