Zhibin Lan

Countering the Over-Reliance Trap: Mitigating Object Hallucination for LVLMs via a Self-Validation Framework

Jan 30, 2026
Via arXiv

Towards Fine-Grained Code-Switch Speech Translation with Semantic Space Alignment

Nov 09, 2025
Via arXiv

LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning

Mar 04, 2025
Via arXiv

"I've Heard of You!": Generate Spoken Named Entity Recognition Data for Unseen Entities

Dec 26, 2024
Via arXiv

Empowering Backbone Models for Visual Text Generation with Input Granularity Control and Glyph-Aware Training

Oct 06, 2024
Via arXiv

Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation

Jul 03, 2024
Via arXiv

A Survey on Multi-modal Machine Translation: Tasks, Methods and Challenges

May 23, 2024
Via arXiv

Exploring Better Text Image Translation with Multimodal Codebook

Jun 02, 2023
Via arXiv