Linli Yao

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction

May 28, 2025
Via arXiv

RICo: Refined In-Context Contribution for Automatic Instruction-Tuning Data Selection

May 18, 2025
Via arXiv

ICon: In-Context Contribution for Automatic Data Selection

May 08, 2025
Via arXiv

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

Apr 24, 2025
Via arXiv

Generative Frame Sampler for Long Video Understanding

Mar 12, 2025
Via arXiv

Temporal Reasoning Transfer from Text to Video

Oct 08, 2024
Via arXiv

UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos

Jun 24, 2024
Via arXiv

DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models

May 31, 2024
Via arXiv

LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?

Apr 16, 2024
Via arXiv

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Dec 04, 2023
Via arXiv