Text Extraction From Documents


Text extraction from documents is the process of recovering machine-readable text from scanned documents or images.

Measuring the sensitivity of LLM-based structured extraction to prompt, model, and schema choices in clinical discharge summaries

Jun 04, 2026
Via arXiv

Handwriting Extraction and Analysis of Signature Lists in Swiss Popular Initiatives

Jun 03, 2026
Via arXiv

AutoForest: Automatically Generating Forest Plots from Biomedical Studies with End-to-End Evidence Extraction and Synthesis

Jun 01, 2026
Via arXiv

Plan2Map: A Multimodal Benchmark for Document-Grounded Geospatial Boundary Reconstruction from Planning Records

Jun 01, 2026
Via arXiv

Construction of Historical Knowledge Graphs Based on BERT and Graph Neural Networks

Jun 01, 2026
Via arXiv

Peacemaker at ATE-IT: Automatic term extraction from Italian text for waste management data using encoder model

May 31, 2026
Via arXiv

Inference-Free Multimodal Learned Sparse Retrieval for Production-Scale Visual Document Search

May 29, 2026
Via arXiv

Bundesrecht: An Open Library and Corpus for German Statutory Reference Processing

May 29, 2026
Via arXiv

IPO-Mine: A Toolkit and Dataset for Section-Structured Analysis of Long, Multimodal IPO Documents

May 27, 2026
Via arXiv

Inferring the Size of Large Language Models From Popular Text Memorization

May 28, 2026
Via arXiv