Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Veronica Chatrath

Emily

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

May 21, 2026

Akshay Manglik, Apaar Shanker, Kaustubh Deshpande, Jason Qin, Yash Maurya, Veronica Chatrath, Vijay S. Kalmath, Levi Lentz, Yuan, Xue

Abstract:Diagnosing failures in LLM agents remains largely manual. Practitioners inspect a small subset of execution traces, form ad-hoc hypotheses, and iterate. This process misses patterns that only emerge across trace populations and does not scale to production corpora where individual traces span tens of thousands of tokens. We formalize the problem of corpus-level trace diagnostics. Given a corpus of execution traces, the goal is to produce grounded natural-language insights that characterize systematic behavioral patterns across trace groups, each linked to supporting evidence. We present the Insights Generator (IG), a multi-agent system that answers diagnostic questions by proposing and testing hypotheses across the trace corpus to produce an evidence-backed insights report. We evaluate IG across qualitative and objective dimensions, spanning rubric-based report assessment and downstream performance improvements achieved by implementing IG insights. Human experts using IG reports improve scaffold performance by 30.4pp over the unmodified baseline scaffold, and coding agents leveraging IG-derived insights show consistent and stable gains. Across benchmarks, IG's scout-investigator architecture produces findings comparable in detection coverage to competing approaches, while domain experts rated IG reports as leading depth and evidence quality.

Via

Access Paper or Ask Questions

MIDST Challenge at SaTML 2025: Membership Inference over Diffusion-models-based Synthetic Tabular data

Mar 19, 2026

Masoumeh Shafieinejad, Xi He, Mahshid Alinoori, John Jewell, Sana Ayromlou, Wei Pang, Veronica Chatrath, Garui Sharma, Deval Pandya

Abstract:Synthetic data is often perceived as a silver-bullet solution to data anonymization and privacy-preserving data publishing. Drawn from generative models like diffusion models, synthetic data is expected to preserve the statistical properties of the original dataset while remaining resilient to privacy attacks. Recent developments of diffusion models have been effective on a wide range of data types, but their privacy resilience, particularly for tabular formats, remains largely unexplored. MIDST challenge sought a quantitative evaluation of the privacy gain of synthetic tabular data generated by diffusion models, with a specific focus on its resistance to membership inference attacks (MIAs). Given the heterogeneity and complexity of tabular data, multiple target models were explored for MIAs, including diffusion models for single tables of mixed data types and multi-relational tables with interconnected constraints. MIDST inspired the development of novel black-box and white-box MIAs tailored to these target diffusion models as a key outcome, enabling a comprehensive evaluation of their privacy efficacy. The MIDST GitHub repository is available at https://github.com/VectorInstitute/MIDST

* 4 page, 1 table

Via

Access Paper or Ask Questions

VeRO: An Evaluation Harness for Agents to Optimize Agents

Feb 25, 2026

Varun Ursekar, Apaar Shanker, Veronica Chatrath, Yuan, Xue, Sam Denton

Abstract:An important emerging application of coding agents is agent optimization: the iterative improvement of a target agent through edit-execute-evaluate cycles. Despite its relevance, the community lacks a systematic understanding of coding agent performance on this task. Agent optimization differs fundamentally from conventional software engineering: the target agent interleaves deterministic code with stochastic LLM completions, requiring structured capture of both intermediate reasoning and downstream execution outcomes. To address these challenges, we introduce VERO (Versioning, Rewards, and Observations), which provides (1) a reproducible evaluation harness with versioned agent snapshots, budget-controlled evaluation, and structured execution traces, and (2) a benchmark suite of target agents and tasks with reference evaluation procedures. Using VERO, we conduct an empirical study comparing optimizer configurations across tasks and analyzing which modifications reliably improve target agent performance. We release VERO to support research on agent optimization as a core capability for coding agents.

Via

Access Paper or Ask Questions

FedRAG: A Framework for Fine-Tuning Retrieval-Augmented Generation Systems

Jun 12, 2025

Val Andrei Fajardo, David B. Emerson, Amandeep Singh, Veronica Chatrath, Marcelo Lotif, Ravi Theja, Alex Cheung, Izuki Matsuba

Abstract:Retrieval-augmented generation (RAG) systems have been shown to be effective in addressing many of the drawbacks of relying solely on the parametric memory of large language models. Recent work has demonstrated that RAG systems can be improved via fine-tuning of their retriever and generator models. In this work, we introduce FedRAG, a framework for fine-tuning RAG systems across centralized and federated architectures. FedRAG supports state-of-the-art fine-tuning methods, offering a simple and intuitive interface and a seamless conversion from centralized to federated training tasks. FedRAG is also deeply integrated with the modern RAG ecosystem, filling a critical gap in available tools.

* 9 pages, 4 figures, 2 tables. Accepted for the CODEML Workshop at ICML 2025. Framework code available at https://github.com/VectorInstitute/fed-rag

Via

Access Paper or Ask Questions

ViLBias: A Framework for Bias Detection using Linguistic and Visual Cues

Dec 22, 2024

Shaina Raza, Caesar Saleh, Emrul Hasan, Franklin Ogidi, Maximus Powers, Veronica Chatrath, Marcelo Lotif, Roya Javadi, Anam Zahid, Vahid Reza Khazaie

Figure 1 for ViLBias: A Framework for Bias Detection using Linguistic and Visual Cues

Figure 2 for ViLBias: A Framework for Bias Detection using Linguistic and Visual Cues

Figure 3 for ViLBias: A Framework for Bias Detection using Linguistic and Visual Cues

Figure 4 for ViLBias: A Framework for Bias Detection using Linguistic and Visual Cues

Abstract:The integration of Large Language Models (LLMs) and Vision-Language Models (VLMs) opens new avenues for addressing complex challenges in multimodal content analysis, particularly in biased news detection. This study introduces ViLBias, a framework that leverages state of the art LLMs and VLMs to detect linguistic and visual biases in news content, addressing the limitations of traditional text-only approaches. Our contributions include a novel dataset pairing textual content with accompanying visuals from diverse news sources and a hybrid annotation framework, combining LLM-based annotations with human review to enhance quality while reducing costs and improving scalability. We evaluate the efficacy of LLMs and VLMs in identifying biases, revealing their strengths in detecting subtle framing and text-visual inconsistencies. Empirical analysis demonstrates that incorporating visual cues alongside text enhances bias detection accuracy by 3 to 5 %, showcasing the complementary strengths of LLMs in generative reasoning and Small Language Models (SLMs) in classification. This study offers a comprehensive exploration of LLMs and VLMs as tools for detecting multimodal biases in news content, highlighting both their potential and limitations. Our research paves the way for more robust, scalable, and nuanced approaches to media bias detection, contributing to the broader field of natural language processing and multimodal analysis. (The data and code will be made available for research purposes).

* Under review

Via

Access Paper or Ask Questions

Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths?

Nov 08, 2024

Veronica Chatrath, Marcelo Lotif, Shaina Raza

Figure 1 for Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths?

Figure 2 for Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths?

Figure 3 for Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths?

Figure 4 for Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths?

Abstract:Political misinformation poses significant challenges to democratic processes, shaping public opinion and trust in media. Manual fact-checking methods face issues of scalability and annotator bias, while machine learning models require large, costly labelled datasets. This study investigates the use of state-of-the-art large language models (LLMs) as reliable annotators for detecting political factuality in news articles. Using open-source LLMs, we create a politically diverse dataset, labelled for bias through LLM-generated annotations. These annotations are validated by human experts and further evaluated by LLM-based judges to assess the accuracy and reliability of the annotations. Our approach offers a scalable and robust alternative to traditional fact-checking, enhancing transparency and public trust in media.

* Accepted at Socially Responsible Language Modelling Research (SoLaR) Workshop at NeurIPS 2024

Via

Access Paper or Ask Questions

MBIAS: Mitigating Bias in Large Language Models While Retaining Context

May 22, 2024

Shaina Raza, Ananya Raval, Veronica Chatrath

Figure 1 for MBIAS: Mitigating Bias in Large Language Models While Retaining Context

Figure 2 for MBIAS: Mitigating Bias in Large Language Models While Retaining Context

Figure 3 for MBIAS: Mitigating Bias in Large Language Models While Retaining Context

Figure 4 for MBIAS: Mitigating Bias in Large Language Models While Retaining Context

Abstract:In addressing the critical need for safety in Large Language Models (LLMs), it is crucial to ensure that the outputs are not only safe but also retain their contextual accuracy. Many existing LLMs are safe fine-tuned either with safety demonstrations, or rely only on adversarial testing. While able to get safe outputs, they often risk losing contextual meaning as they mitigate bias and toxicity. In response, we present MBIAS, a LLM framework instruction fine-tuned on a custom dataset specifically designed for safety interventions. MBIAS aims to address the significant issues of bias and toxicity in LLMs generations that typically manifest as underrepresentation or negative portrayals across various demographics, including inappropriate linguistic mentions and biased content in social media. We experiment on MBIAS for safety interventions using various configurations, and demonstrate more than a 30\% reduction in overall bias and toxicity while successfully retaining key information. Additionally, a demographic analysis on an out-of-distribution test set confirms the robustness of our approach, with reductions in bias and toxicity exceeding 90\% across various demographics. The dataset and instruction fine-tuned MBIAS are made available to the research community at https://huggingface.co/newsmediabias/MBIAS.

Via

Access Paper or Ask Questions

Closing the Perception-Action Loop for Semantically Safe Navigation in Semi-Static Environments

Apr 22, 2024

Jingxing Qian, Siqi Zhou, Nicholas Jianrui Ren, Veronica Chatrath, Angela P. Schoellig

Figure 1 for Closing the Perception-Action Loop for Semantically Safe Navigation in Semi-Static Environments

Figure 2 for Closing the Perception-Action Loop for Semantically Safe Navigation in Semi-Static Environments

Figure 3 for Closing the Perception-Action Loop for Semantically Safe Navigation in Semi-Static Environments

Figure 4 for Closing the Perception-Action Loop for Semantically Safe Navigation in Semi-Static Environments

Abstract:Autonomous robots navigating in changing environments demand adaptive navigation strategies for safe long-term operation. While many modern control paradigms offer theoretical guarantees, they often assume known extrinsic safety constraints, overlooking challenges when deployed in real-world environments where objects can appear, disappear, and shift over time. In this paper, we present a closed-loop perception-action pipeline that bridges this gap. Our system encodes an online-constructed dense map, along with object-level semantic and consistency estimates into a control barrier function (CBF) to regulate safe regions in the scene. A model predictive controller (MPC) leverages the CBF-based safety constraints to adapt its navigation behaviour, which is particularly crucial when potential scene changes occur. We test the system in simulations and real-world experiments to demonstrate the impact of semantic information and scene change handling on robot behavior, validating the practicality of our approach.

* Manuscript accepted to ICRA 2024

Via

Access Paper or Ask Questions

FakeWatch: A Framework for Detecting Fake News to Ensure Credible Elections

Mar 14, 2024

Shaina Raza, Tahniat Khan, Drai Paulen-Patterson, Veronica Chatrath, Mizanur Rahman, Oluwanifemi Bamgbose

Figure 1 for FakeWatch: A Framework for Detecting Fake News to Ensure Credible Elections

Figure 2 for FakeWatch: A Framework for Detecting Fake News to Ensure Credible Elections

Figure 3 for FakeWatch: A Framework for Detecting Fake News to Ensure Credible Elections

Figure 4 for FakeWatch: A Framework for Detecting Fake News to Ensure Credible Elections

Abstract:In today's technologically driven world, the rapid spread of fake news, particularly during critical events like elections, poses a growing threat to the integrity of information. To tackle this challenge head-on, we introduce FakeWatch, a comprehensive framework carefully designed to detect fake news. Leveraging a newly curated dataset of North American election-related news articles, we construct robust classification models. Our framework integrates a model hub comprising of both traditional machine learning (ML) techniques and cutting-edge Language Models (LMs) to discern fake news effectively. Our overarching objective is to provide the research community with adaptable and precise classification models adept at identifying the ever-evolving landscape of misinformation. Quantitative evaluations of fake news classifiers on our dataset reveal that, while state-of-the-art LMs exhibit a slight edge over traditional ML models, classical models remain competitive due to their balance of accuracy and computational efficiency. Additionally, qualitative analyses shed light on patterns within fake news articles. This research lays the groundwork for future endeavors aimed at combating misinformation, particularly concerning electoral processes. We provide our labeled data and model publicly for use and reproducibility.

* arXiv admin note: substantial text overlap with arXiv:2312.03730

Via

Access Paper or Ask Questions

FakeWatch ElectionShield: A Benchmarking Framework to Detect Fake News for Credible US Elections

Nov 27, 2023

Tahniat Khan, Mizanur Rahman, Veronica Chatrath, Oluwanifemi Bamgbose, Shaina Raza

Abstract:In today's technologically driven world, the spread of fake news, particularly during crucial events such as elections, presents an increasing challenge to the integrity of information. To address this challenge, we introduce FakeWatch ElectionShield, an innovative framework carefully designed to detect fake news. We have created a novel dataset of North American election-related news articles through a blend of advanced language models (LMs) and thorough human verification, for precision and relevance. We propose a model hub of LMs for identifying fake news. Our goal is to provide the research community with adaptable and accurate classification models in recognizing the dynamic nature of misinformation. Extensive evaluation of fake news classifiers on our dataset and a benchmark dataset shows our that while state-of-the-art LMs slightly outperform the traditional ML models, classical models are still competitive with their balance of accuracy, explainability, and computational efficiency. This research sets the foundation for future studies to address misinformation related to elections.

* under review

Via

Access Paper or Ask Questions