Abstract: Despite remarkable advances in Natural Language Processing, Entity Linking (EL) remains challenging in the humanities because of complex document typologies, a lack of domain-specific datasets and models, and long-tail entities, i.e., entities under-represented in Knowledge Bases (KBs). This paper addresses these issues with two main contributions. The first is DELICATE, a novel neuro-symbolic method for EL on historical Italian that combines a BERT-based encoder with contextual information from Wikidata, selecting appropriate KB entities on the basis of temporal plausibility and entity type consistency. The second is ENEIDE, a multi-domain EL corpus in historical Italian, semi-automatically extracted from two annotated editions that span the 19th and 20th centuries and include literary and political texts. Results show that DELICATE outperforms other EL models on historical Italian, even when compared with larger architectures with billions of parameters. Further analyses show that DELICATE's confidence scores and feature sensitivity yield results that are more explainable and interpretable than those of purely neural methods.
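To make the idea of selecting KB entities by temporal plausibility and entity type consistency more concrete, the following is a minimal sketch and not DELICATE's actual implementation. All names (`Candidate`, `is_plausible`, `select_entity`), the coarse type labels, the placeholder identifiers, and the rule that an entity cannot come into existence after the document's date are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """A hypothetical KB candidate for one mention (illustrative only)."""
    qid: str                          # placeholder KB identifier
    score: float                      # assumed encoder similarity score
    entity_type: str                  # assumed coarse type: "PER", "LOC", "ORG"
    start_year: Optional[int] = None  # birth or inception year, if known


def is_plausible(c: Candidate, mention_type: str, doc_year: int) -> bool:
    """Symbolic checks: type consistency and temporal plausibility."""
    if c.entity_type != mention_type:
        return False  # type of candidate disagrees with the mention
    if c.start_year is not None and c.start_year > doc_year:
        return False  # entity did not exist yet when the text was written
    return True


def select_entity(candidates: List[Candidate], mention_type: str,
                  doc_year: int) -> Optional[Candidate]:
    """Return the highest-scoring candidate that passes both checks."""
    plausible = [c for c in candidates
                 if is_plausible(c, mention_type, doc_year)]
    if not plausible:
        return None  # no link proposed when nothing is plausible
    return max(plausible, key=lambda c: c.score)


# Hypothetical candidates for a person mention in a text dated 1870.
candidates = [
    Candidate("Q_A", 0.90, "PER", 1950),  # too recent for an 1870 text
    Candidate("Q_B", 0.70, "PER", 1800),  # plausible person
    Candidate("Q_C", 0.95, "LOC", None),  # wrong type
]
best = select_entity(candidates, "PER", 1870)
print(best.qid if best else None)
```

In this sketch the symbolic checks act as hard filters applied before the neural score is used; a real system might instead treat them as soft features combined with the encoder score.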



Abstract: This paper introduces KwicKwocKwac 1.0 (KwicKK), a web application designed to support the annotation and enrichment of digital texts in the humanities. KwicKK provides a user-friendly interface that lets scholars and researchers perform semi-automatic markup of textual documents and identify relevant entities such as people, organizations, and locations. Key functionalities include the visualization of annotated texts using the KeyWord in Context (KWIC), KeyWord Out Of Context (KWOC), and KeyWord After Context (KWAC) methodologies, automatic disambiguation of generic references, and integration with Wikidata for Linked Open Data connections. The application supports metadata input and offers multiple download formats, promoting accessibility and ease of use. Developed primarily for the National Edition of Aldo Moro's works, KwicKK aims to lower the technical barriers for users while fostering deeper engagement with digital scholarly resources. The architecture relies on contemporary web technologies, with scalability and reliability as design goals. Future developments will explore user experience enhancements, collaborative features, and the integration of additional data sources.
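As an illustration of the KeyWord in Context (KWIC) view mentioned above, the following is a minimal sketch of a token-based concordance. It is not KwicKK's code: the function name `kwic`, the `window` parameter, the simple `\w+` tokenization, and the sample sentence are all assumptions made for this example.

```python
import re
from typing import List, Tuple


def kwic(text: str, keyword: str, window: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each occurrence.

    Matching is case-insensitive and works on whole tokens; `window`
    is the number of tokens kept on each side of the keyword.
    """
    tokens = re.findall(r"\w+", text)  # naive tokenization (assumption)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, tok, right))
    return rows


# Hypothetical sample sentence, used only to show the output shape.
sample = "Moro met the delegation before Moro left Rome"
for left, key, right in kwic(sample, "moro", window=2):
    print(f"{left:>20} | {key} | {right}")
```

Each row places the keyword in a fixed column with its surrounding context on either side, which is what lets an annotator scan many occurrences of the same entity at once.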