Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sunisth Kumar

Table-Text Alignment: Explaining Claim Verification Against Tables in Scientific Papers

Jun 12, 2025

Xanh Ho, Sunisth Kumar, Yun-Ang Wu, Florian Boudin, Atsuhiro Takasu, Akiko Aizawa

Abstract:Scientific claim verification against tables typically requires predicting whether a claim is supported or refuted given a table. However, we argue that predicting the final label alone is insufficient: it reveals little about the model's reasoning and offers limited interpretability. To address this, we reframe table-text alignment as an explanation task, requiring models to identify the table cells essential for claim verification. We build a new dataset by extending the SciTab benchmark with human-annotated cell-level rationales. Annotators verify the claim label and highlight the minimal set of cells needed to support their decision. After the annotation process, we utilize the collected information and propose a taxonomy for handling ambiguous cases. Our experiments show that (i) incorporating table alignment information improves claim verification performance, and (ii) most LLMs, while often predicting correct labels, fail to recover human-aligned rationales, suggesting that their predictions do not stem from faithful reasoning.

* 8 pages; code and data are available at https://github.com/Alab-NII/SciTabAlign

Via

Access Paper or Ask Questions

Cross-Lingual NER for Financial Transaction Data in Low-Resource Languages

Jul 16, 2023

Sunisth Kumar, Davide Liu, Alexandre Boulenger

Figure 1 for Cross-Lingual NER for Financial Transaction Data in Low-Resource Languages

Figure 2 for Cross-Lingual NER for Financial Transaction Data in Low-Resource Languages

Figure 3 for Cross-Lingual NER for Financial Transaction Data in Low-Resource Languages

Figure 4 for Cross-Lingual NER for Financial Transaction Data in Low-Resource Languages

Abstract:We propose an efficient modeling framework for cross-lingual named entity recognition in semi-structured text data. Our approach relies on both knowledge distillation and consistency training. The modeling framework leverages knowledge from a large language model (XLMRoBERTa) pre-trained on the source language, with a student-teacher relationship (knowledge distillation). The student model incorporates unsupervised consistency training (with KL divergence loss) on the low-resource target language. We employ two independent datasets of SMSs in English and Arabic, each carrying semi-structured banking transaction information, and focus on exhibiting the transfer of knowledge from English to Arabic. With access to only 30 labeled samples, our model can generalize the recognition of merchants, amounts, and other fields from English to Arabic. We show that our modeling approach, while efficient, performs best overall when compared to state-of-the-art approaches like DistilBERT pre-trained on the target language or a supervised model directly trained on labeled data in the target language. Our experiments show that it is enough to learn to recognize entities in English to reach reasonable performance in a low-resource language in the presence of a few labeled samples of semi-structured data. The proposed framework has implications for developing multi-lingual applications, especially in geographies where digital endeavors rely on both English and one or more low-resource language(s), sometimes mixed with English or employed singly.

* 5 pages, 3 figures. Presented at the SIGIR 2023 Workshop on Knowledge Discovery from Unstructured Data in Financial Services (KDF)

Via

Access Paper or Ask Questions