Roque Lopez

BDI-Kit Demo: A Toolkit for Programmable and Conversational Data Harmonization

Apr 07, 2026

Roque Lopez, Yurong Liu, Christos Koutras, Juliana Freire

Abstract: Data harmonization remains a major bottleneck for integrative analysis due to heterogeneity in schemas, value representations, and domain-specific conventions. BDI-Kit provides an extensible toolkit for schema and value matching. It exposes two complementary interfaces tailored to different user needs: a Python API enabling developers to construct harmonization pipelines programmatically, and an AI-assisted chat interface allowing domain experts to harmonize data through natural language dialogue. This demonstration showcases how users interact with BDI-Kit to iteratively explore, validate, and refine schema and value matches through a combination of automated matching, AI-assisted reasoning, and user-driven refinement. We present two scenarios: (i) using the Python API to programmatically compose primitives, examine intermediate outputs, and reuse transformations; and (ii) conversing with the AI assistant in natural language to access BDI-Kit's capabilities and iteratively refine outputs based on the assistant's suggestions.
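To make the two matching tasks named in the abstract concrete, here is a minimal sketch of schema matching (aligning column names) and value matching (aligning cell values). It is not BDI-Kit's API: it uses only `difflib` from the Python standard library, and the helper names `similarity` and `best_matches`, the example columns, and the thresholds are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matches(source, target, threshold=0.5):
    """Map each source string to its most similar target string.

    A source string whose best score falls below `threshold` maps to
    None, flagging it for manual review instead of forcing a match.
    """
    matches = {}
    for s in source:
        score, candidate = max((similarity(s, t), t) for t in target)
        matches[s] = candidate if score >= threshold else None
    return matches


# Schema matching: align source column names with a target schema.
source_columns = ["Age", "ethnic_group", "zipcode"]
target_columns = ["age", "ethnicity", "sex"]
column_map = best_matches(source_columns, target_columns)

# Value matching: align the values of one matched column pair,
# with a stricter threshold because the strings are short.
source_values = ["Stage II", "stage iii"]
target_values = ["Stage II", "Stage III"]
value_map = best_matches(source_values, target_values, threshold=0.8)

print(column_map)
print(value_map)
```

Returning `None` for weak matches leaves room for the user-driven refinement the abstract describes: an unmatched column such as `zipcode` is surfaced for review rather than silently mapped.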

Interactive Data Harmonization with LLM Agents

Feb 10, 2025

Aécio Santos, Eduardo H. M. Pena, Roque Lopez, Juliana Freire

Abstract: Data harmonization is an essential task that entails integrating datasets from diverse sources. Despite years of research in this area, it remains a time-consuming and challenging task due to schema mismatches, varying terminologies, and differences in data collection methodologies. This paper presents the case for agentic data harmonization as a means to both empower experts to harmonize their data and to streamline the process. We introduce Harmonia, a system that combines LLM-based reasoning, an interactive user interface, and a library of data harmonization primitives to automate the synthesis of data harmonization pipelines. We demonstrate Harmonia in a clinical data harmonization scenario, where it helps to interactively create reusable pipelines that map datasets to a standard format. Finally, we discuss challenges and open problems, and suggest research directions for advancing our vision.
