Drexel University College of Computing and Informatics, Philadelphia, PA, USA
Abstract:Metadata vocabularies are essential for advancing FAIR and FARR data principles, but their development constrained by limited human resources and inconsistent standardization practices. This paper introduces MatSci-YAMZ, a platform that integrates artificial intelligence (AI) and human-in-the-loop (HILT), including crowdsourcing, to support metadata vocabulary development. The paper reports on a proof-of-concept use case evaluating the AI-HILT model in materials science, a highly interdisciplinary domain Six (6) participants affiliated with the NSF Institute for Data-Driven Dynamical Design (ID4) engaged with the MatSci-YAMZ plaform over several weeks, contributing term definitions and providing examples to prompt the AI-definitions refinement. Nineteen (19) AI-generated definitions were successfully created, with iterative feedback loops demonstrating the feasibility of AI-HILT refinement. Findings confirm the feasibility AI-HILT model highlighting 1) a successful proof of concept, 2) alignment with FAIR and open-science principles, 3) a research protocol to guide future studies, and 4) the potential for scalability across domains. Overall, MatSci-YAMZ's underlying model has the capacity to enhance semantic transparency and reduce time required for consensus building and metadata vocabulary development.


Abstract:FAIR metadata is critical to supporting FAIR data overall. Transparency, community engagement, and flexibility are key aspects of FAIR that apply to metadata. This paper presents YAMZ (Yet Another Metadata Zoo), a community-driven vocabulary application that supports FAIR. The history ofYAMZ and its original features are reviewed, followed by a presentation of recent innovations and a discussion of how YAMZ supports FAIR principles. The conclusion identifies next steps and key outputs.