Linli Yao

UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos

Jun 24, 2024

DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models

May 31, 2024

LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?

Apr 16, 2024

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Dec 04, 2023

Edit As You Wish: Video Description Editing with Multi-grained Commands

May 15, 2023

Rethinking Benchmarks for Cross-modal Image-text Retrieval

Apr 21, 2023

CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge

Nov 17, 2022

Image Difference Captioning with Pre-training and Contrastive Learning

Feb 09, 2022

YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos

Apr 12, 2020