Linjie Yang

Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos

Dec 19, 2023
Via arXiv

Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling

Oct 11, 2023
Via arXiv

Selective Feature Adapter for Dense Vision Transformers

Oct 03, 2023
Via arXiv

The Devil is in the Details: A Deep Dive into the Rabbit Hole of Data Filtering

Sep 27, 2023
Via arXiv

Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation

Jul 27, 2023
Via arXiv

Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?

Jul 22, 2023
Via arXiv

Exploring the Role of Audio in Video Captioning

Jun 21, 2023
Via arXiv

$R^{2}$Former: Unified $R$etrieval and $R$eranking Transformer for Place Recognition

Apr 06, 2023
Via arXiv

FAQ: Feature Aggregated Queries for Transformer-based Video Object Detectors

Mar 20, 2023
Via arXiv

Revisiting Training-free NAS Metrics: An Efficient Training-based Method

Nov 16, 2022
Via arXiv