Title: Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets

May 16, 2025

Erica Cai, Sean McQuade, Kevin Young, Brendan O'Connor


Abstract: When knowledge graphs (KGs) are automatically extracted from text, are they accurate enough for downstream analysis? Unfortunately, current annotated datasets cannot be used to evaluate this question, since their KGs are highly disconnected, too small, or overly complex. To address this gap, we introduce AffilKG (https://doi.org/10.5281/zenodo.15427977), a collection of six datasets that are the first to pair complete book scans with large, labeled knowledge graphs. Each dataset features affiliation graphs, which are simple KGs that capture Member relationships between Person and Organization entities and are useful in studies of migration, community interactions, and other social phenomena. In addition, three datasets include expanded KGs with a wider variety of relation types. Our preliminary experiments demonstrate significant variability in model performance across datasets, underscoring AffilKG's ability to enable two critical advances: (1) benchmarking how extraction errors propagate to graph-level analyses (e.g., community structure), and (2) validating KG extraction methods for real-world social science research.
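To make the idea of an affiliation graph concrete, here is a minimal Python sketch, not the authors' code or the AffilKG file format. It assumes a graph can be represented as a set of (person, organization) Member edges. The names, the toy edges, and the helpers `edge_prf` and `co_membership` are all hypothetical. It compares an extracted graph against a gold graph at the edge level, then at the level of the person-to-person co-membership network derived from it, which is one simple way extraction errors can carry over into graph-level structure.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: gold and extracted (person, organization) Member edges.
gold = {("Ana", "Guild"), ("Ben", "Guild"), ("Ben", "Union"), ("Cy", "Union")}
extracted = {("Ana", "Guild"), ("Ben", "Guild"), ("Cy", "Union"), ("Cy", "Guild")}


def edge_prf(gold_items, pred_items):
    """Precision, recall, and F1 of a predicted set against a gold set."""
    tp = len(gold_items & pred_items)
    precision = tp / len(pred_items) if pred_items else 0.0
    recall = tp / len(gold_items) if gold_items else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def co_membership(edges):
    """Project Member edges onto unordered person pairs sharing an organization."""
    members = defaultdict(set)
    for person, org in edges:
        members[org].add(person)
    pairs = set()
    for people in members.values():
        pairs.update(frozenset(pair) for pair in combinations(sorted(people), 2))
    return pairs


# Edge-level agreement between the extracted and gold affiliation graphs.
edge_scores = edge_prf(gold, extracted)

# Agreement on the derived person-to-person network.
pair_scores = edge_prf(co_membership(gold), co_membership(extracted))

print("Member edges (P, R, F1):", edge_scores)
print("Co-membership pairs (P, R, F1):", pair_scores)
```

In this toy case one wrong Member edge and one missing edge change the co-membership scores differently from the edge-level scores, which illustrates why edge-level accuracy alone may not predict the accuracy of downstream network analysis.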
