Srikar Appalaraju

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

Jul 17, 2024

RAVEN: Multitask Retrieval Augmented Vision-Language Learning

Jun 27, 2024

Enhancing Vision-Language Pre-training with Rich Supervisions

Mar 05, 2024

DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models

Nov 15, 2023

Multiple-Question Multiple-Answer Text-VQA

Nov 15, 2023

A Multi-Modal Multilingual Benchmark for Document Image Classification

Oct 25, 2023

DocFormerv2: Local Features for Document Understanding

Jun 02, 2023

SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation

Feb 07, 2023

YORO: Lightweight End to End Visual Grounding

Nov 15, 2022

MixGen: A New Multi-Modal Data Augmentation

Jun 16, 2022