Text Extraction From Documents


Text extraction from documents is the process of recovering machine-readable text from scanned documents or images.

AutoSAM: an Agentic Framework for Automating Input File Generation for the SAM Code with Multi-Modal Retrieval-Augmented Generation

Mar 25, 2026
Via arXiv

CoCR-RAG: Enhancing Retrieval-Augmented Generation in Web Q&A via Concept-oriented Context Reconstruction

Mar 25, 2026
Via arXiv

Context Selection for Hypothesis and Statistical Evidence Extraction from Full-Text Scientific Articles

Mar 22, 2026
Via arXiv

TopoChunker: Topology-Aware Agentic Document Chunking Framework

Mar 19, 2026
Via arXiv

VAREX: A Benchmark for Multi-Modal Structured Extraction from Documents

Mar 16, 2026
Via arXiv

Selective Fine-Tuning of GPT Architectures for Parameter-Efficient Clinical Text Classification

Mar 15, 2026
Via arXiv

TheraAgent: Multi-Agent Framework with Self-Evolving Memory and Evidence-Calibrated Reasoning for PET Theranostics

Mar 14, 2026
Via arXiv

GLM-OCR Technical Report

Mar 11, 2026
Via arXiv

DocSage: An Information Structuring Agent for Multi-Doc Multi-Entity Question Answering

Mar 12, 2026
Via arXiv

MITRA: An AI Assistant for Knowledge Retrieval in Physics Collaborations

Mar 10, 2026
Via arXiv