Abstract: AlphaZero, using a combination of Deep Neural Networks and Monte Carlo Tree Search (MCTS), has successfully trained reinforcement learning agents tabula rasa. The neural MCTS algorithm has been successful in finding near-optimal strategies for games through self-play. However, the AlphaZero algorithm has a significant drawback: it takes a long time to converge and requires substantial computational power, because games such as Chess, Go, and Shogi call for complex neural networks. As a result, it is very difficult to pursue neural MCTS research without cutting-edge hardware, which is a roadblock for many aspiring neural MCTS researchers. In this paper, we propose a new neural MCTS algorithm, called Dual MCTS, which helps overcome these drawbacks. Dual MCTS uses two different search trees, a single deep neural network, and a new update technique for the search trees that combines PUCB, a sliding window, and the epsilon-greedy algorithm. This technique is applicable to any MCTS-based algorithm to reduce the number of updates to the tree. We show that Dual MCTS performs better than one of the most widely used neural MCTS algorithms, AlphaZero, on various symmetric and asymmetric games.
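To make two of the ingredients named above concrete, the following is a minimal sketch of PUCB-based action selection with an epsilon-greedy override. It is not the Dual MCTS update rule, whose details the abstract does not give: the PUCB form is the commonly used AlphaZero-style formula, the names `pucb_scores`, `select_action`, `c_puct`, and `epsilon` are assumptions chosen for illustration, and the sliding window and the two search trees are omitted.

```python
import math
import random


def pucb_scores(q, n, p, c_puct=1.0):
    """AlphaZero-style PUCB score per action (an assumed, commonly used form).

    q: mean action values, n: visit counts, p: prior probabilities.
    """
    total_visits = sum(n)
    return [
        q[a] + c_puct * p[a] * math.sqrt(total_visits) / (1 + n[a])
        for a in range(len(q))
    ]


def select_action(q, n, p, epsilon=0.1, c_puct=1.0, rng=random):
    """With probability epsilon explore uniformly; otherwise maximize PUCB."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    scores = pucb_scores(q, n, p, c_puct)
    return max(range(len(scores)), key=scores.__getitem__)
```

With `epsilon=0.0` the selection is purely PUCB-driven, so a rarely visited action with the same prior is preferred over a heavily visited one of equal value.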
Abstract: The formal semantics of an interpreted first-order logic (FOL) statement can be given in Tarskian Semantics or in an essentially equivalent Game Semantics. The latter maps the statement and the interpretation into a two-player semantic game. Many combinatorial problems can be described using interpreted FOL statements and can be mapped into a semantic game. Therefore, learning to play a semantic game perfectly leads to the solution of a specific instance of a combinatorial problem. We adapt the AlphaZero algorithm so that it becomes better at learning to play semantic games, which have different characteristics from Go and Chess. We propose a general framework, Persephone, to map the FOL description of a combinatorial problem to a semantic game so that it can be solved through a neural-MCTS-based reinforcement learning algorithm. Our goal for Persephone is to make it tabula rasa, mapping a problem stated in interpreted FOL to a solution without human intervention.
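As an illustration of the semantic-game idea, the sketch below plays out a quantified statement over a small finite domain: the verifier chooses values for existential variables, the falsifier chooses values for universal ones, and the statement is true exactly when the verifier has a winning strategy. This is an exhaustive search under stated assumptions, not Persephone and not a neural MCTS; the function name `verifier_wins` and the prefix encoding are hypothetical.

```python
def verifier_wins(prefix, domain, predicate, assignment=None):
    """Return True if the verifier has a winning strategy.

    prefix: list of (kind, variable) pairs, kind is "forall" or "exists".
    domain: finite collection of values every variable ranges over.
    predicate: function from an assignment dict to bool (the matrix).
    """
    assignment = dict(assignment or {})
    if not prefix:
        # All moves have been made; the matrix decides the winner.
        return predicate(assignment)
    (kind, var), rest = prefix[0], prefix[1:]
    if kind not in ("forall", "exists"):
        raise ValueError(f"unknown quantifier: {kind}")
    outcomes = (
        verifier_wins(rest, domain, predicate, {**assignment, var: value})
        for value in domain
    )
    # Verifier moves on "exists" (needs one good move);
    # falsifier moves on "forall" (verifier must survive every move).
    return any(outcomes) if kind == "exists" else all(outcomes)


def sums_to_zero_mod_5(a):
    return (a["x"] + a["y"]) % 5 == 0


domain = range(5)
forall_exists = verifier_wins(
    [("forall", "x"), ("exists", "y")], domain, sums_to_zero_mod_5
)
exists_forall = verifier_wins(
    [("exists", "y"), ("forall", "x")], domain, sums_to_zero_mod_5
)
```

Here `forall_exists` is `True` because the verifier can always answer `x` with `y = (-x) mod 5`, while `exists_forall` is `False` because no single `y` works against every `x`. Swapping the quantifier order changes who moves first and therefore who wins.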