Lydia Pintscher

Schema Generation for Large Knowledge Graphs Using Large Language Models

Jun 04, 2025

Bohui Zhang, Yuan He, Lydia Pintscher, Albert Meroño Peñuela, Elena Simperl

Abstract: Schemas are vital for ensuring data quality in the Semantic Web and natural language processing. Traditionally, their creation demands substantial involvement from knowledge engineers and domain experts. Leveraging the impressive capabilities of large language models (LLMs) in related tasks like ontology engineering, we explore automatic schema generation using LLMs. To bridge the resource gap, we introduce two datasets: YAGO Schema and Wikidata EntitySchema, along with evaluation metrics. The LLM-based pipelines effectively utilize local and global information from knowledge graphs (KGs) to generate validating schemas in Shape Expressions (ShEx). Experiments demonstrate LLMs' strong potential in producing high-quality ShEx schemas, paving the way for scalable, automated schema generation for large KGs. Furthermore, our benchmark introduces a new challenge for structured generation, pushing the limits of LLMs on syntactically rich formalisms.

Graph-Linguistic Fusion: Using Language Models for Wikidata Vandalism Detection

May 23, 2025

Mykola Trokhymovych, Lydia Pintscher, Ricardo Baeza-Yates, Diego Saez-Trumper

Abstract: We introduce a next-generation vandalism detection system for Wikidata, one of the largest open-source structured knowledge bases on the Web. Wikidata is highly complex: its items incorporate an ever-expanding universe of factual triples and multilingual texts. While edits can alter both structured and textual content, our approach converts all edits into a single space using a method we call Graph2Text. This allows for evaluating all content changes for potential vandalism using a single multilingual language model. This unified approach improves coverage and simplifies maintenance. Experiments demonstrate that our solution outperforms the current production system. Additionally, we are releasing the code under an open license along with a large dataset of various human-generated knowledge alterations, enabling further research.

QURATOR: Innovative Technologies for Content and Data Curation

Apr 25, 2020

Georg Rehm, Peter Bourgonje, Stefanie Hegele, Florian Kintzel, Julián Moreno Schneider, Malte Ostendorff, Karolina Zaczynska, Armin Berger, Stefan Grill, Sören Räuchle, and 30 more

Abstract: In all domains and sectors, the demand for intelligent systems to support the processing and generation of digital content is rapidly increasing. The availability of vast amounts of content and the pressure to publish new content quickly and in rapid succession require faster, more efficient and smarter processing and generation methods. With a consortium of ten partners from research and industry and a broad range of expertise in AI, Machine Learning and Language Technologies, the QURATOR project, funded by the German Federal Ministry of Education and Research, develops a sustainable and innovative technology platform that provides services to support knowledge workers in various industries in addressing the challenges they face when curating digital content. The project's vision and ambition is to establish an ecosystem for content curation technologies that significantly pushes the current state of the art and transforms its region, the metropolitan area Berlin-Brandenburg, into a global centre of excellence for curation technologies.

* Proceedings of QURATOR 2020: The conference for intelligent content solutions, Berlin, Germany, February 2020