Scene Text Recognition


Scene text recognition is the process of identifying and transcribing text in natural scenes using computer vision techniques.

A Simple and Effective Temporal Grounding Pipeline for Basketball Broadcast Footage

Oct 30, 2024

Integrating Audio Narrations to Strengthen Domain Generalization in Multimodal First-Person Action Recognition

Sep 15, 2024

Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition

Feb 21, 2024

Extremely Fine-Grained Visual Classification over Resembling Glyphs in the Wild

Aug 25, 2024

IndicSTR12: A Dataset for Indic Scene Text Recognition

Mar 12, 2024

Lumos : Empowering Multimodal LLMs with Scene Text Recognition

Feb 12, 2024

Open-Vocabulary Scene Text Recognition via Pseudo-Image Labeling and Margin Loss

Mar 12, 2024

AIris: An AI-powered Wearable Assistive Device for the Visually Impaired

May 13, 2024

Towards Open-Vocabulary Audio-Visual Event Localization

Nov 18, 2024

VIPTR: A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition

Jan 24, 2024