Abhishek Kulkarni


Automaton-Guided Curriculum Generation for Reinforcement Learning Agents

Apr 11, 2023
Yash Shukla, Abhishek Kulkarni, Robert Wright, Alvaro Velasquez, Jivko Sinapov

Figures 1–3 for Automaton-Guided Curriculum Generation for Reinforcement Learning Agents

Despite advances in Reinforcement Learning, many sequential decision-making tasks remain prohibitively expensive and impractical to learn. Recently, approaches that automatically generate reward functions from logical task specifications have been proposed to mitigate this issue; however, they scale poorly on long-horizon tasks (i.e., tasks where the agent needs to perform a series of correct actions to reach the goal state, considering future transitions while choosing an action). Employing a curriculum (a sequence of increasingly complex tasks) further improves the learning speed of the agent by sequencing intermediate tasks suited to the learning capacity of the agent. However, generating curricula from the logical specification remains an unsolved problem. To this end, we propose AGCL, Automaton-guided Curriculum Learning, a novel method for automatically generating curricula for the target task in the form of Directed Acyclic Graphs (DAGs). AGCL encodes the specification in the form of a deterministic finite automaton (DFA), and then uses the DFA along with the Object-Oriented MDP (OOMDP) representation to generate a curriculum as a DAG, where vertices correspond to tasks and edges correspond to the direction of knowledge transfer. Experiments in gridworld and physics-based simulated robotics domains show that the curricula produced by AGCL achieve improved time-to-threshold performance on a complex sequential decision-making problem relative to state-of-the-art curriculum learning (e.g., teacher-student, self-play) and automaton-guided reinforcement learning baselines (e.g., Q-Learning for Reward Machines). Further, we demonstrate that AGCL performs well even in the presence of noise in the task's OOMDP description, and also when distractor objects are present that are not modeled in the logical specification of the tasks' objectives.
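One way to picture the DFA-to-curriculum step is the following minimal sketch: each non-accepting DFA state becomes a sub-task ("drive the automaton from here to acceptance"), and DAG edges point from a task to the strictly easier task it builds on. This is an illustrative, hypothetical helper, not the paper's AGCL algorithm — AGCL additionally uses the OOMDP object description, which is omitted here.

```python
from collections import deque

def curriculum_dag(dfa_transitions, accepting):
    """Derive a curriculum DAG from a DFA.

    dfa_transitions: {state: {symbol: next_state}}
    accepting: set of accepting states.
    Returns (edges, order): edges (q, q') mean the policy learned for the
    easier sub-task q' can seed learning for q; order lists non-accepting
    sub-tasks easiest-first (closest to acceptance).
    """
    # Reverse adjacency: which states reach q in one step.
    rev = {}
    for q, moves in dfa_transitions.items():
        for q2 in moves.values():
            rev.setdefault(q2, set()).add(q)

    # BFS from the accepting states gives each state's distance to acceptance.
    dist = {a: 0 for a in accepting}
    frontier = deque(accepting)
    while frontier:
        q = frontier.popleft()
        for p in rev.get(q, ()):
            if p not in dist:
                dist[p] = dist[q] + 1
                frontier.append(p)

    # DAG edges follow DFA transitions that move strictly closer to acceptance,
    # so the graph is acyclic by construction.
    edges = []
    for q, moves in dfa_transitions.items():
        for q2 in moves.values():
            if q in dist and q2 in dist and dist[q2] < dist[q]:
                edges.append((q, q2))

    order = sorted((q for q in dist if q not in accepting),
                   key=lambda q: dist[q])
    return edges, order
```

For a chain specification q0 →a→ q1 →b→ q2 (accepting), this yields the curriculum [q1, q0]: first learn to finish the last step, then learn the full task on top of it.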

* To be presented at The International Conference on Automated Planning and Scheduling (ICAPS) 2023 

Neural Decoder for Topological Codes using Pseudo-Inverse of Parity Check Matrix

Jan 24, 2019
Chaitanya Chinni, Abhishek Kulkarni, Dheeraj M. Pai, Kaushik Mitra, Pradeep Kiran Sarvepalli

Figures 1–4 for Neural Decoder for Topological Codes using Pseudo-Inverse of Parity Check Matrix

Recent developments in the field of deep learning have motivated many researchers to apply these methods to problems in quantum information. Torlai and Melko first proposed a decoder for surface codes based on neural networks. Since then, many other researchers have applied neural networks to study a variety of problems in the context of decoding. An important development in this regard was due to Varsamopoulos et al., who proposed a two-step decoder using neural networks. Subsequent work of Maskara et al. used the same concept for decoding under various noise models. We propose a similar two-step neural decoder using the pseudo-inverse of the parity-check matrix for topological color codes. We show that it outperforms the state-of-the-art performance of non-neural decoders for the independent Pauli error noise model on a 2D hexagonal color code. Our final decoder is independent of the noise model and achieves a threshold of $10\%$. Our result is comparable to the recent work on neural decoders for quantum error correction by Maskara et al. Our decoder also appears to have significant advantages in training cost and network complexity at larger code lengths compared to that of Maskara et al. Our proposed method can also be extended to arbitrary dimensions and other stabilizer codes.
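The algebraic half of a two-step decoder can be sketched as follows: a right inverse P of the parity-check matrix H over GF(2) maps any measured syndrome s to a "pure error" e = Ps with He = s, leaving the neural network only the job of predicting the logical correction. This is an illustrative sketch for a full-row-rank H with a hypothetical helper name, not the paper's color-code construction.

```python
import numpy as np

def gf2_syndrome_map(H):
    """Right inverse P of a full-row-rank binary matrix H over GF(2),
    i.e. (H @ P) % 2 equals the identity. Then e = (P @ s) % 2 is a
    'pure error' consistent with syndrome s: (H @ e) % 2 == s."""
    H = H % 2
    m, n = H.shape
    A = np.hstack([H, np.eye(m, dtype=int)])  # [H | I], eliminate over GF(2)
    row, pivots = 0, []
    for col in range(n):
        piv = next((r for r in range(row, m) if A[r, col]), None)
        if piv is None:
            continue
        A[[row, piv]] = A[[piv, row]]          # swap pivot row into place
        for r in range(m):
            if r != row and A[r, col]:
                A[r] = A[r] ^ A[row]           # XOR = addition mod 2
        pivots.append(col)
        row += 1
    assert row == m, "H must have full row rank"
    T = A[:, n:]               # T @ H is in reduced row-echelon form
    P = np.zeros((n, m), dtype=int)
    P[pivots, :] = T           # H[:, pivots] is invertible with inverse T
    return P
```

For the toy parity-check matrix H = [[1,1,0],[0,1,1]] (3-bit repetition code), the syndrome s = (1, 0) maps to the pure error e = (1, 0, 0), i.e. a flip on the first bit — exactly one error consistent with that syndrome.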

* 12 pages, 12 figures, 2 tables, submitted to the 2019 IEEE International Symposium on Information Theory 