Reinforcement Learning (RL) consists of designing agents that make intelligent decisions without human supervision. When used alongside function approximators such as Neural Networks (NNs), RL is capable of solving extremely complex problems. Deep Q-Learning, an RL algorithm that uses Deep NNs, has achieved super-human performance in some specific tasks. It is also possible, however, to use Variational Quantum Circuits (VQCs) as function approximators in RL algorithms. This work empirically studies the performance and trainability of such VQC-based Deep Q-Learning models in classic control benchmark environments. More specifically, we investigate how data re-uploading affects both of these metrics. We show that the magnitude and the variance of the gradients of these models remain substantial throughout training due to the moving targets of Deep Q-Learning. Moreover, we empirically show that increasing the number of qubits does not lead to an exponential vanishing of the magnitude and variance of the gradients for a Parameterized Quantum Circuit (PQC) approximating a 2-design, contrary to what the Barren Plateau Phenomenon would predict. This hints that VQCs may be especially well suited for use as function approximators in such a context.
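The "moving targets" mentioned above can be illustrated with a minimal sketch of the standard Deep Q-Learning regression target, y = r + gamma * max_a' Q(s', a'). The function name `td_targets` and the toy numbers are assumptions made for illustration only; they are not taken from the paper, and the Q-values here are plain arrays rather than the output of a VQC.

```python
import numpy as np

def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """Standard one-step Q-Learning targets: r + gamma * max_a' Q(s', a').

    `next_q_values` has shape (batch, n_actions) and would normally come
    from the (target) Q-function approximator; terminal steps do not bootstrap.
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q_values = np.asarray(next_q_values, dtype=float)
    dones = np.asarray(dones, dtype=float)
    return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)

# Hypothetical batch of two transitions; the second one is terminal.
rewards = [1.0, 1.0]
dones = [0, 1]

# Same transitions, evaluated with the approximator before and after an update.
q_next_before = [[0.5, 2.0], [3.0, 1.0]]
q_next_after = [[0.5, 4.0], [3.0, 1.0]]

targets_before = td_targets(rewards, q_next_before, dones, gamma=0.9)
targets_after = td_targets(rewards, q_next_after, dones, gamma=0.9)
```

Because the target depends on the approximator's own estimates, the regression problem changes whenever the parameters change, which is the mechanism the abstract points to for gradients staying substantial during training.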
This research examines the role of the quantum Fisher Information Matrix (FIM) in enhancing the performance of Parameterized Quantum Circuit (PQC)-based reinforcement learning agents. While previous studies have highlighted the effectiveness of PQC-based policies preconditioned with the quantum FIM in contextual bandits, the impact of the quantum FIM in broader reinforcement learning contexts, such as Markov Decision Processes, is less clear. Through a detailed analysis of Löwner inequalities between quantum and classical FIMs, this study uncovers the distinctions between the two types of FIM and the implications of using each. Our results indicate that a PQC-based agent using the quantum FIM without additional insights typically incurs a larger approximation error and does not guarantee improved performance compared to the classical FIM. Empirical evaluations in classic control benchmarks suggest that, although quantum FIM preconditioning outperforms standard gradient ascent, it is in general not superior to classical FIM preconditioning.
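As a rough illustration of FIM preconditioning and of a Löwner-order check, the sketch below uses a classical softmax policy over three actions in a single-state bandit. It covers the classical FIM only; the quantum FIM would require access to the circuit state and is not modelled here. All names (`classical_fim`, `preconditioned_step`, `loewner_leq`), the damping term, and the toy rewards are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classical_fim(theta):
    """F = sum_a pi(a) * score_a score_a^T, with score_a = e_a - pi for softmax logits."""
    pi = softmax(theta)
    scores = np.eye(len(theta)) - pi     # row a holds e_a - pi
    return (scores * pi[:, None]).T @ scores

def preconditioned_step(theta, grad, lr=0.1, damping=1e-3):
    """One natural-gradient ascent step; damping keeps the singular FIM invertible."""
    F = classical_fim(theta)
    return theta + lr * np.linalg.solve(F + damping * np.eye(len(theta)), grad)

def loewner_leq(A, B, tol=1e-10):
    """True if A <= B in the Loewner order, i.e. B - A is positive semidefinite."""
    return bool(np.linalg.eigvalsh(B - A).min() >= -tol)

theta = np.zeros(3)
rewards = np.array([1.0, 0.0, 0.0])      # hypothetical per-action rewards
pi = softmax(theta)
grad = pi * (rewards - pi @ rewards)     # gradient of expected reward w.r.t. logits

F = classical_fim(theta)
new_theta = preconditioned_step(theta, grad)
```

Here `loewner_leq(F, F + np.eye(3))` holds trivially; comparing a classical FIM against a quantum FIM in the same way is the kind of matrix inequality the abstract refers to.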
Quantum Machine Learning models are naturally built from Variational Quantum Circuits (VQCs). There are already some empirical results indicating that such models provide an advantage in supervised and unsupervised learning tasks. However, less is known about their application to Reinforcement Learning (RL). In this work, we consider Policy Gradients using a hardware-efficient ansatz. We prove that the complexity of obtaining an ε-approximation of the gradient using quantum hardware scales only logarithmically with the number of parameters, when complexity is measured in the number of quantum circuit executions. We test the performance of such models in benchmarking environments and verify empirically that such quantum models outperform typical classical neural networks used in those environments, using a fraction of the number of parameters. Moreover, we propose using the Fisher Information spectrum to show that the quantum model is less prone to barren plateaus than its classical counterpart. As a different use case, we consider the application of such variational quantum models to the problem of quantum control and show their feasibility in the quantum-quantum domain.
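For context on how gradients of a variational circuit can be obtained from circuit executions, the sketch below applies the generic parameter-shift rule to a single-qubit RY rotation, simulated exactly with NumPy. This is an assumption-laden illustration: the abstract does not state which gradient estimator is used, the example has one parameter and no sampling noise, and it does not reproduce the paper's complexity bound. The names `ry`, `expectation_z`, and `parameter_shift_grad` are hypothetical.

```python
import numpy as np

PAULI_Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def ry(theta):
    """Matrix of the single-qubit rotation RY(theta)."""
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.array([[c, -s], [s, c]])

def expectation_z(theta):
    """<Z> for the state RY(theta)|0>, which equals cos(theta)."""
    state = ry(theta) @ np.array([1.0, 0.0])
    return float(state @ PAULI_Z @ state)

def parameter_shift_grad(f, theta, shift=np.pi / 2.0):
    """Exact derivative of f for a gate generated by a Pauli operator,
    obtained from two evaluations at shifted parameter values."""
    return (f(theta + shift) - f(theta - shift)) / (2.0 * np.sin(shift))

theta = 0.7
grad = parameter_shift_grad(expectation_z, theta)
```

On hardware, each call to `expectation_z` would be replaced by an average over repeated circuit executions, so the number of executions needed for a given accuracy becomes the relevant cost.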