Abstract:As Large Language Models (LLMs) have become capable of generating long and descriptive code summaries, accurate and reliable evaluation of factual consistency has become a critical challenge. However, previous evaluation methods are primarily designed for short summaries of isolated code snippets. Consequently, they struggle to provide fine-grained evaluation of multi-sentence functionalities and fail to accurately assess dependency context commonly found in real-world code summaries. To address this, we propose ReFEree, a reference-free and fine-grained method for evaluating factual consistency in real-world code summaries. We define factual inconsistency criteria specific to code summaries and evaluate them at the segment level using these criteria along with dependency information. These segment-level results are then aggregated into a fine-grained score. We construct a code summarization benchmark with human-annotated factual consistency labels. The evaluation results demonstrate that ReFEree achieves the highest correlation with human judgment among 13 baselines, improving 15-18% over the previous state-of-the-art. Our code and data are available at https://github.com/bsy99615/ReFEree.git.
Abstract:Scientific research metadata is vital to ensure the validity, reusability, and cost-effectiveness of research efforts. The MEDFORD metadata language was previously introduced to simplify the process of writing and maintaining metadata for non-programmers. However, barriers to entry and usability remain, including limited automatic validation, difficulty of data transport, and user unfamiliarity with text file editing. To address these issues, we introduce MEDFORD-in-a-Box (MIAB), a documentation ecosystem to facilitate researcher adoption and earlier metadata capture. MIAB contains many improvements, including an updated MEDFORD parser with expanded validation routines and BagIt export capability. MIAB also includes an improved VS Code extension that supports these changes through a visual IDE. By simplifying metadata generation, this new tool supports the creation of correct, consistent, and reusable metadata, ultimately improving research reproducibility.
Abstract:Semi-supervised learning on real-world graphs is frequently challenged by heterophily, where the observed graph is unreliable or label-disassortative. Many existing graph neural networks either rely on a fixed adjacency structure or attempt to handle structural noise through regularization. In this work, we explicitly capture structural uncertainty by modeling a posterior distribution over signed adjacency matrices, allowing each edge to be positive, negative, or absent. We propose a sparse signed message passing network that is naturally robust to edge noise and heterophily, which can be interpreted from a Bayesian perspective. By combining (i) posterior marginalization over signed graph structures with (ii) sparse signed message aggregation, our approach offers a principled way to handle both edge noise and heterophily. Experimental results demonstrate that our method outperforms strong baseline models on heterophilic benchmarks under both synthetic and real-world structural noise.