Scene Text Recognition


Scene text recognition is the process of identifying and transcribing text in natural scenes using computer vision techniques.

ESTR-CoT: Towards Explainable and Accurate Event Stream based Scene Text Recognition with Chain-of-Thought Reasoning

Jul 02, 2025

LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving

Aug 17, 2025

Seeing the Signs: A Survey of Edge-Deployable OCR Models for Billboard Visibility Analysis

Jul 15, 2025

MiDashengLM: Efficient Audio Understanding with General Audio Captions

Aug 06, 2025

TransLPRNet: Lite Vision-Language Network for Single/Dual-line Chinese License Plate Recognition

Jul 23, 2025

Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach

Jul 27, 2025

Efficient and Accurate Scene Text Recognition with Cascaded-Transformers

Mar 24, 2025

Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation

Mar 20, 2025

Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition

Mar 24, 2025

Team RAS in 9th ABAW Competition: Multimodal Compound Expression Recognition Approach

Jul 02, 2025