Ran He

InfoBFR: Real-World Blind Face Restoration via Information Bottleneck

Jan 26, 2025

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Jan 03, 2025

Sample Correlation for Fingerprinting Deep Face Recognition

Dec 30, 2024

Prototypical Distillation and Debiased Tuning for Black-box Unsupervised Domain Adaptation

Dec 30, 2024

Towards Compatible Fine-tuning for Vision-Language Model Updates

Dec 30, 2024

T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs

Dec 02, 2024

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs

Nov 22, 2024

Breaking the Low-Rank Dilemma of Linear Attention

Nov 14, 2024

Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey

Nov 14, 2024

Not Just Object, But State: Compositional Incremental Learning without Forgetting

Nov 05, 2024