Text Extraction From Documents


Text extraction from documents is the process of recovering machine-readable text from scanned documents or images.

Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information

May 09, 2025
Via arXiv

Query-driven Document-level Scientific Evidence Extraction from Biomedical Studies

May 09, 2025
Via arXiv

DocSpiral: A Platform for Integrated Assistive Document Annotation through Human-in-the-Spiral

May 06, 2025
Via arXiv

Evaluation of LLMs on Long-tail Entity Linking in Historical Documents

May 06, 2025
Via arXiv

HalluMix: A Task-Agnostic, Multi-Domain Benchmark for Real-World Hallucination Detection

May 01, 2025
Via arXiv

Natural Language Processing tools for Pharmaceutical Manufacturing Information Extraction from Patents

May 01, 2025
Via arXiv

SubGrapher: Visual Fingerprinting of Chemical Structures

Apr 28, 2025
Via arXiv

Transformer-Based Extraction of Statutory Definitions from the U.S. Code

Apr 23, 2025
Via arXiv

Automatic Text Summarization (ATS) for Research Documents in Sorani Kurdish

Apr 20, 2025
Via arXiv

Towards Leveraging Large Language Model Summaries for Topic Modeling in Source Code

Apr 24, 2025
Via arXiv