Simone Giovannini

Hierarchical structure understanding in complex tables with VLLMs: a benchmark and experiments

Nov 11, 2025

Luca Bindini, Simone Giovannini, Simone Marinai, Valeria Nardoni, Kimiya Noor Ali

Abstract: This work investigates the ability of Vision Large Language Models (VLLMs) to understand and interpret the structure of tables in scientific articles. Specifically, we explore whether VLLMs can infer the hierarchical structure of tables without additional processing. As a basis for our experiments we use the PubTables-1M dataset, a large-scale corpus of scientific tables. From this dataset, we extract a subset of tables that we introduce as Complex Hierarchical Tables (CHiTab): a benchmark collection of complex tables containing hierarchical headings. We adopt a series of prompt engineering strategies to probe the models' comprehension capabilities, experimenting with various prompt formats and writing styles. Multiple state-of-the-art open-weights VLLMs are evaluated on the benchmark, first in their off-the-shelf versions and then after fine-tuning some of them on our task. We also measure human performance on a small set of tables and compare it with the performance of the evaluated VLLMs. The experiments support our intuition that generic VLLMs, not explicitly designed for understanding the structure of tables, can perform this task. This study provides insights into the potential and limitations of VLLMs in processing complex tables and offers guidance for future work on integrating structured data understanding into general-purpose VLLMs.

BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations

Jan 06, 2025

Simone Giovannini, Fabio Coppini, Andrea Gemelli, Simone Marinai

Abstract: We present a unified dataset for document Question Answering (QA), obtained by combining several public datasets related to Document AI and visually rich document understanding (VRDU). Our main contribution is twofold: on the one hand, we reformulate existing Document AI tasks, such as Information Extraction (IE), as Question Answering tasks, making the dataset a suitable resource for training and evaluating Large Language Models; on the other hand, we release the OCR of all the documents and include the exact position of each answer in the document image as a bounding box. Using this dataset, we explore the impact of different prompting techniques (some of which include bounding box information) on the performance of open-weight models, identifying the most effective approaches for document comprehension.