Vatsal Gupta

Prose2Policy (P2P): A Practical LLM Pipeline for Translating Natural-Language Access Policies into Executable Rego

Mar 16, 2026

Vatsal Gupta, Darshan Sreenivasamurthy

Abstract: Prose2Policy (P2P) is a practical LLM-based tool that translates natural-language access control policies (NLACPs) into executable Rego code (the policy language of Open Policy Agent, OPA). It provides a modular, end-to-end pipeline that performs policy detection, component extraction, schema validation, linting, compilation, and automatic test generation and execution. Prose2Policy is designed to bridge the gap between human-readable access requirements and machine-enforceable policy-as-code (PaC) while emphasizing deployment reliability and auditability. We evaluated Prose2Policy on the ACRE dataset and demonstrated a 95.3% compile rate for accepted policies, with automated testing achieving an 82.2% positive-test pass rate and a 98.9% negative-test pass rate. These results indicate that Prose2Policy produces syntactically robust and behaviorally consistent Rego policies suitable for Zero Trust and compliance-driven environments.

NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models

Jul 15, 2024

Pranshu Pandya, Agney S Talwarr, Vatsal Gupta, Tushar Kataria, Vivek Gupta, Dan Roth

Abstract: Cognitive textual and visual reasoning tasks, such as puzzles, series, and analogies, demand the ability to quickly reason, decipher, and evaluate patterns both textually and spatially. While LLMs and VLMs, through extensive training on large amounts of human-curated data, have attained a high level of pseudo-human intelligence in some common-sense reasoning tasks, they still struggle with more complex reasoning tasks that require cognitive understanding. In this work, we introduce a new dataset, NTSEBench, designed to evaluate the cognitive multi-modal reasoning and problem-solving skills of large models. The dataset comprises 2,728 multiple-choice questions with a total of 4,642 images across 26 categories, sampled from the NTSE examination conducted nationwide in India, featuring both visual and textual general aptitude questions that do not rely on rote learning. We establish baselines on the dataset using state-of-the-art LLMs and VLMs. To facilitate a comparison between open-source and proprietary models, we propose four distinct modeling strategies to handle different modalities (text and images) in the dataset instances.

* 15 pages, 2 figures, 5 tables

FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts

Jun 27, 2024

Shubhankar Singh, Purvi Chaurasia, Yerram Varun, Pranshu Pandya, Vatsal Gupta, Vivek Gupta, Dan Roth

Abstract: Existing benchmarks for visual question answering lack visual grounding and complexity, particularly in evaluating spatial reasoning skills. We introduce FlowVQA, a novel benchmark aimed at assessing the capabilities of visual question-answering multimodal language models in reasoning with flowcharts as visual contexts. FlowVQA comprises 2,272 carefully generated and human-verified flowchart images from three distinct content sources, along with 22,413 diverse question-answer pairs, to test a spectrum of reasoning tasks, including information localization, decision-making, and logical progression. We conduct a thorough baseline evaluation on a suite of both open-source and proprietary multimodal language models using various strategies, followed by an analysis of directional bias. The results underscore the benchmark's potential as a vital tool for advancing the field of multimodal modeling, providing a focused and challenging environment for enhancing model performance in visual and logical reasoning tasks.

ChIRAAG: ChatGPT Informed Rapid and Automated Assertion Generation

Jan 31, 2024

Bhabesh Mali, Karthik Maddala, Sweeya Reddy, Vatsal Gupta, Chandan Karfa, Ramesh Karri

Abstract: System Verilog Assertion (SVA) formulation, a critical yet complex task, is a prerequisite in the Formal Property Verification (FPV) process. Traditionally, SVA formulation involves expert-driven interpretation of specifications, which is time-consuming and prone to human error. However, with recent advances in Large Language Models (LLMs), LLM-informed automatic assertion generation is gaining interest. We designed a novel LLM-based pipeline to generate assertions in English, Linear Temporal Logic, and SVA from natural language specifications. We developed a custom LLM based on OpenAI GPT4 for our experiments. Furthermore, we developed testbenches to verify and validate the LLM-generated assertions. Only 43% of LLM-generated raw assertions had errors, including syntax and logical errors. By iteratively prompting the LLMs using carefully crafted prompts derived from test case failures, the pipeline could generate correct SVAs after a maximum of nine iterations of prompting. Our results show that LLMs can streamline the assertion generation workflow, reshaping verification workflows.

* 6 pages, 3 figures and 1 table

Multi-Set Inoculation: Assessing Model Robustness Across Multiple Challenge Sets

Nov 15, 2023

Vatsal Gupta, Pranshu Pandya, Tushar Kataria, Vivek Gupta, Dan Roth

Abstract: Language models, given their black-box nature, often exhibit sensitivity to input perturbations, leading to trust issues due to hallucinations. To bolster trust, it's essential to understand these models' failure modes and devise strategies to enhance their performance. In this study, we propose a framework to study the effect of input perturbations on language models of different scales, from pre-trained models to large language models (LLMs). We use fine-tuning to train a model that is robust to perturbations, and we investigate whether exposure to one perturbation improves or degrades the model's performance on other perturbations. To address multi-perturbation robustness, we suggest three distinct training strategies. We also extend the framework to LLMs via chain-of-thought (CoT) prompting with exemplars. We instantiate our framework for the Tabular-NLI task and show that the proposed strategies make the model robust to different perturbations without losing accuracy on a given dataset.

* 13 pages, 2 figures, 12 tables
