Abstract:How can system-generated responses be efficiently verified, especially in the high-stakes biomedical domain? To address this challenge, we introduce eTracer, a plug-and-play framework that enables traceable text generation by grounding claims against contextual evidence. Through post-hoc grounding, each response claim is aligned with contextual evidence that either supports or contradicts it. Building on claim-level grounding results, eTracer not only enables users to precisely trace responses back to their contextual source but also quantifies response faithfulness, thereby enabling the verifiability and trustworthiness of generated responses. Experiments show that our claim-level grounding approach alleviates the limitations of conventional grounding methods in aligning generated statements with contextual sentence-level evidence, resulting in substantial improvements in overall grounding quality and user verification efficiency. The code and data are available at https://github.com/chubohao/eTracer.
Abstract:Verifying system-generated summaries remains challenging, as effective verification requires precise attribution to the source context, which is especially crucial in high-stakes medical domains. To address this challenge, we introduce PCoA, an expert-annotated benchmark for medical aspect-based summarization with phrase-level context attribution. PCoA aligns each aspect-based summary with its supporting contextual sentences and contributory phrases within them. We further propose a fine-grained, decoupled evaluation framework that independently assesses the quality of generated summaries, citations, and contributory phrases. Through extensive experiments, we validate the quality and consistency of the PCoA dataset and benchmark several large language models on the proposed task. Experimental results demonstrate that PCoA provides a reliable benchmark for evaluating system-generated summaries with phrase-level context attribution. Furthermore, comparative experiments show that explicitly identifying relevant sentences and contributory phrases before summarization can improve overall quality. The data and code are available at https://github.com/chubohao/PCoA.