Arkajyoti Chakraborty

When Actions Teach You to Think: Reasoning-Action Synergy via Reinforcement Learning in Conversational Agents

Dec 12, 2025

Mrinal Rawat, Arkajyoti Chakraborty, Neha Gupta, Roberto Pieraccini

Abstract: Supervised fine-tuning (SFT) has emerged as one of the most effective ways to improve the performance of large language models (LLMs) in downstream tasks. However, SFT can have difficulty generalizing when the underlying data distribution changes, even when the new data does not fall completely outside the training domain. Recent reasoning-focused models such as o1 and R1 have demonstrated consistent gains over their non-reasoning counterparts, highlighting the importance of reasoning for improved generalization and reliability. However, collecting high-quality reasoning traces for SFT remains challenging: annotations are costly, subjective, and difficult to scale. To address this limitation, we leverage Reinforcement Learning (RL) to enable models to learn reasoning strategies directly from task outcomes. We propose a pipeline in which LLMs generate reasoning steps that guide both the invocation of tools (e.g., function calls) and the final answer generation for conversational agents. Our method employs Group Relative Policy Optimization (GRPO) with rewards designed around tool accuracy and answer correctness, allowing the model to iteratively refine its reasoning and actions. Experimental results demonstrate that our approach improves both the quality of reasoning and the precision of tool invocations, achieving a 1.5% relative improvement over the SFT model (trained without explicit thinking) and a 40% gain over the vanilla Qwen3-1.7B base model. These findings demonstrate the promise of unifying reasoning and action learning through RL to build more capable and generalizable conversational agents.
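As a rough illustration of the group-relative idea behind GRPO, the sketch below scores a group of sampled responses to one prompt and normalizes each reward against the group mean and standard deviation. The reward function, its equal weights, and all names here are hypothetical; the abstract only says that rewards are designed around tool accuracy and answer correctness, not how they are combined.

```python
from statistics import mean, pstdev


def combined_reward(tool_correct: bool, answer_correct: bool,
                    w_tool: float = 0.5, w_answer: float = 0.5) -> float:
    """Hypothetical reward: weighted sum of tool accuracy and answer correctness.

    The weights are an assumption made for this sketch, not values from the paper.
    """
    return w_tool * float(tool_correct) + w_answer * float(answer_correct)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its own sampling group.

    Each advantage is (reward - group mean) / (group std + eps), so responses
    are ranked relative to their siblings rather than against a learned critic.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to the same prompt: (tool call correct, answer correct).
group = [(True, True), (True, False), (False, True), (False, False)]
rewards = [combined_reward(t, a) for t, a in group]
advantages = group_relative_advantages(rewards)
```

Responses that beat the group average receive a positive advantage and are reinforced; when every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.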

UCSC at SemEval-2025 Task 3: Context, Models and Prompt Optimization for Automated Hallucination Detection in LLM Output

May 05, 2025

Sicong Huang, Jincheng He, Shiyuan Huang, Karthik Raja Anandan, Arkajyoti Chakraborty, Ian Lane

Abstract: Hallucinations pose a significant challenge for large language models when answering knowledge-intensive queries. As LLMs become more widely adopted, it is crucial not only to detect whether hallucinations occur but also to pinpoint exactly where in the LLM output they occur. SemEval 2025 Task 3, Mu-SHROOM: Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes, is a recent effort in this direction. This paper describes the UCSC system submission to the shared Mu-SHROOM task. We introduce a framework that first retrieves relevant context, then identifies false content in the answer, and finally maps that content back to spans in the LLM output. The process is further enhanced by automatically optimizing prompts. Our system achieves the highest overall performance, ranking #1 in average position across all languages. We release our code and experiment results.
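The final step of such a pipeline, mapping flagged content back to positions in the output, can be pictured with a minimal sketch. It assumes the earlier steps return the false content as exact substrings of the output, which is a simplification for illustration; the function name and example text are hypothetical and this is not the UCSC system's implementation.

```python
def map_to_spans(output: str, false_phrases: list[str]) -> list[tuple[int, int]]:
    """Return sorted (start, end) character spans for every occurrence of each phrase.

    Assumes each flagged phrase appears verbatim in the output; phrases that
    do not occur are silently skipped.
    """
    spans = []
    for phrase in false_phrases:
        if not phrase:  # an empty phrase would match at every position
            continue
        start = output.find(phrase)
        while start != -1:
            spans.append((start, start + len(phrase)))
            start = output.find(phrase, start + 1)
    return sorted(spans)


llm_output = "Paris is the capital of Germany."
spans = map_to_spans(llm_output, ["Germany"])
flagged_text = [llm_output[s:e] for s, e in spans]
```

Exact matching breaks down as soon as the detector paraphrases the false content, so a real system would likely need fuzzy or token-level alignment instead.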

* 6 pages, 1 figure

Mechanistic Anomaly Detection for "Quirky" Language Models

Apr 09, 2025

David O. Johnston, Arkajyoti Chakraborty, Nora Belrose

Abstract: As LLMs grow in capability, the task of supervising LLMs becomes more challenging. Supervision failures can occur if LLMs are sensitive to factors that supervisors are unaware of. We investigate Mechanistic Anomaly Detection (MAD) as a technique to augment supervision of capable models; we use internal model features to identify anomalous training signals so they can be investigated or discarded. We train detectors to flag points from the test environment that differ substantially from the training environment, and experiment with a large variety of detector features and scoring rules to detect anomalies in a set of "quirky" language models. We find that detectors can achieve high discrimination on some tasks, but no detector is effective across all models and tasks. MAD techniques may be effective in low-stakes applications, but advances in both detection and evaluation are likely needed if they are to be used in high-stakes settings.
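To make the notion of a detector with a scoring rule concrete, here is a minimal sketch of one simple possibility: fit per-feature statistics on training-environment feature vectors, then score a test point by its summed squared z-scores (a diagonal, Mahalanobis-style distance). The choice of scoring rule, the toy two-dimensional features, and all names are assumptions for illustration; the abstract does not specify which features or rules the authors use.

```python
from statistics import mean, pstdev


def fit_detector(train_features: list[list[float]]) -> tuple[list[float], list[float]]:
    """Estimate a per-dimension mean and standard deviation from training features."""
    dims = list(zip(*train_features))
    mus = [mean(d) for d in dims]
    sigmas = [pstdev(d) for d in dims]
    return mus, sigmas


def anomaly_score(x: list[float], mus: list[float], sigmas: list[float],
                  eps: float = 1e-8) -> float:
    """Sum of squared z-scores: larger means further from the training distribution."""
    return sum(((xi - m) / (s + eps)) ** 2 for xi, m, s in zip(x, mus, sigmas))


# Toy stand-ins for internal model features collected in the training environment.
train = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.2, 0.8]]
mus, sigmas = fit_detector(train)

typical_score = anomaly_score([0.4, 0.6], mus, sigmas)
unusual_score = anomaly_score([5.0, -4.0], mus, sigmas)
```

A point would be flagged when its score exceeds a threshold chosen on held-out data; how well such a threshold separates normal from anomalous behavior is exactly what varies across models and tasks.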

* ICLR Building Trust Workshop 2025

Generating Clarification Questions for Disambiguating Contracts

Mar 12, 2024

Anmol Singhal, Chirag Jain, Preethu Rose Anish, Arkajyoti Chakraborty, Smita Ghaisas

Abstract: Enterprises frequently enter into commercial contracts that can serve as vital sources of project-specific requirements. Contractual clauses are obligatory, and the requirements derived from contracts can detail the downstream implementation activities that non-legal stakeholders, including requirement analysts, engineers, and delivery personnel, need to conduct. However, comprehending contracts is cognitively demanding and error-prone for such stakeholders due to the extensive use of legalese and the inherent complexity of contract language. Furthermore, contracts often contain ambiguously worded clauses to ensure comprehensive coverage. In contrast, non-legal stakeholders require a detailed and unambiguous comprehension of contractual clauses to craft actionable requirements. In this work, we introduce a novel legal NLP task that involves generating clarification questions for contracts. These questions aim to identify contract ambiguities on a document level, thereby assisting non-legal stakeholders in obtaining the necessary details for eliciting requirements. This task is challenged by three core issues: (1) data availability, (2) the length and unstructured nature of contracts, and (3) the complexity of legal text. To address these issues, we propose ConRAP, a retrieval-augmented prompting framework for generating clarification questions to disambiguate contractual text. Experiments conducted on contracts sourced from the publicly available CUAD dataset show that ConRAP with ChatGPT can detect ambiguities with an F2 score of 0.87. Human evaluators deem 70% of the generated clarification questions useful.
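For readers unfamiliar with the F2 score reported above, it is the F-beta measure with beta = 2, which weights recall more heavily than precision. The sketch below implements the standard formula; the precision and recall values in the example are invented for illustration and are not figures from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta: (1 + beta^2) * P * R / (beta^2 * P + R).

    With beta = 2, recall counts for more than precision, which suits tasks
    where missing an ambiguity is costlier than raising a false alarm.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative values only: high recall compensates for modest precision under F2.
f2_example = f_beta(precision=0.5, recall=1.0)
f1_example = f_beta(precision=0.5, recall=1.0, beta=1.0)
```

With the same inputs, F2 comes out higher than F1 here because the perfect recall is weighted more strongly.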

* 9 pages, 3 figures, accepted to LREC-COLING 2024

An Emotion-guided Approach to Domain Adaptive Fake News Detection using Adversarial Learning

Nov 26, 2022

Arkajyoti Chakraborty, Inder Khatri, Arjun Choudhry, Pankaj Gupta, Dinesh Kumar Vishwakarma, Mukesh Prasad

Abstract: Recent works on fake news detection have shown the efficacy of using emotions as a feature for improved performance. However, the cross-domain impact of emotion-guided features for fake news detection remains an open problem. In this work, we propose an emotion-guided, domain-adaptive, multi-task approach for cross-domain fake news detection, demonstrating the efficacy of emotion-guided models in cross-domain settings for various datasets.

* Accepted in the Student Abstract & Poster Presentation track at AAAI 2023. arXiv admin note: substantial text overlap with arXiv:2211.13718

Emotion-guided Cross-domain Fake News Detection using Adversarial Domain Adaptation

Nov 24, 2022

Arjun Choudhry, Inder Khatri, Arkajyoti Chakraborty, Dinesh Kumar Vishwakarma, Mukesh Prasad

Abstract: Recent works on fake news detection have shown the efficacy of using emotions or emotion-based features for improved performance. However, the impact of these emotion-guided features for fake news detection in cross-domain settings, where we face the problem of domain shift, is still largely unexplored. In this work, we evaluate the impact of emotion-guided features for cross-domain fake news detection, and further propose an emotion-guided, domain-adaptive approach using adversarial learning. We demonstrate the efficacy of emotion-guided models in cross-domain settings for various combinations of source and target datasets drawn from the FakeNewsAMT, Celeb, Politifact, and Gossipcop datasets.

* Accepted as a Short Paper in the 19th International Conference on Natural Language Processing (ICON) 2022
