Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Anirban Majumdar

Synthesizing POMDP Policies: Sampling Meets Model-checking via Learning

May 14, 2026

Debraj Chakraborty, Anirban Majumdar, Prince Mathew, Sayan Mukherjee, Jean-François Raskin

Abstract:Partially Observable Markov Decision Processes (POMDPs) are the standard framework for decision-making under uncertainty. While sampling-based methods scale well, they lack formal correctness guarantees, making them unsuitable for safety-critical applications. Conversely, formal synthesis techniques provide correctness-by-construction but often struggle with scalability, as general POMDP synthesis is undecidable. To bridge this gap, we propose a synthesis framework that integrates sampling, automata learning, and model-checking. Inspired by Angluin's $L^*$ algorithm, our approach utilizes sampling as a membership oracle and model-checking as an equivalence oracle. This enables the synthesis of finite-state controllers with formal guarantees, provided the sampling-induced policy is regular. We establish a relative completeness result for this framework. Experimental results from our prototypical implementation demonstrate that this method successfully solves threshold-safety problems that remain challenging for existing formal synthesis tools. We believe our algorithm serves as a valuable component in a portfolio approach to tackling the inherent difficulty of POMDP synthesis problems.

* Paper accepted at 38th International Conference on Computer Aided Verification (CAV 2026), Lisbon, Portugal, July 2026

Via

Access Paper or Ask Questions

About Time: Model-free Reinforcement Learning with Timed Reward Machines

Dec 19, 2025

Anirban Majumdar, Ritam Raha, Rajarshi Roy, David Parker, Marta Kwiatkowska

Figure 1 for About Time: Model-free Reinforcement Learning with Timed Reward Machines

Figure 2 for About Time: Model-free Reinforcement Learning with Timed Reward Machines

Figure 3 for About Time: Model-free Reinforcement Learning with Timed Reward Machines

Figure 4 for About Time: Model-free Reinforcement Learning with Timed Reward Machines

Abstract:Reward specification plays a central role in reinforcement learning (RL), guiding the agent's behavior. To express non-Markovian rewards, formalisms such as reward machines have been introduced to capture dependencies on histories. However, traditional reward machines lack the ability to model precise timing constraints, limiting their use in time-sensitive applications. In this paper, we propose timed reward machines (TRMs), which are an extension of reward machines that incorporate timing constraints into the reward structure. TRMs enable more expressive specifications with tunable reward logic, for example, imposing costs for delays and granting rewards for timely actions. We study model-free RL frameworks (i.e., tabular Q-learning) for learning optimal policies with TRMs under digital and real-time semantics. Our algorithms integrate the TRM into learning via abstractions of timed automata, and employ counterfactual-imagining heuristics that exploit the structure of the TRM to improve the search. Experimentally, we demonstrate that our algorithm learns policies that achieve high rewards while satisfying the timing constraints specified by the TRM on popular RL benchmarks. Moreover, we conduct comparative studies of performance under different TRM semantics, along with ablations that highlight the benefits of counterfactual-imagining.

Via

Access Paper or Ask Questions

Static and Dynamic Synthesis of Bengali and Devanagari Signatures

Jan 30, 2024

Miguel A. Ferrer, Sukalpa Chanda, Moises Diaz, Chayan Kr. Banerjee, Anirban Majumdar, Cristina Carmona-Duarte, Parikshit Acharya, Umapada Pal

Figure 1 for Static and Dynamic Synthesis of Bengali and Devanagari Signatures

Figure 2 for Static and Dynamic Synthesis of Bengali and Devanagari Signatures

Figure 3 for Static and Dynamic Synthesis of Bengali and Devanagari Signatures

Figure 4 for Static and Dynamic Synthesis of Bengali and Devanagari Signatures

Abstract:Developing an automatic signature verification system is challenging and demands a large number of training samples. This is why synthetic handwriting generation is an emerging topic in document image analysis. Some handwriting synthesizers use the motor equivalence model, the well-established hypothesis from neuroscience, which analyses how a human being accomplishes movement. Specifically, a motor equivalence model divides human actions into two steps: 1) the effector independent step at cognitive level and 2) the effector dependent step at motor level. In fact, recent work reports the successful application to Western scripts of a handwriting synthesizer, based on this theory. This paper aims to adapt this scheme for the generation of synthetic signatures in two Indic scripts, Bengali (Bangla), and Devanagari (Hindi). For this purpose, we use two different online and offline databases for both Bengali and Devanagari signatures. This paper reports an effective synthesizer for static and dynamic signatures written in Devanagari or Bengali scripts. We obtain promising results with artificially generated signatures in terms of appearance and performance when we compare the results with those for real signatures.

* IEEE Transactions on Cybernetics, v. 48(10), p. 2896-2907, 2018
* Accepted version. Published on IEEE Transactions on Cybernetics [ISSN 2168-2267], v. 48(10), p. 2896-2907

Via

Access Paper or Ask Questions