Using cellular networks for massive machine-type communications (mMTC) is appealing because the infrastructure already exists. However, the massive number of user equipments (UEs) can congest and overload the cellular network's random access channel (RACH). To mitigate this problem, we first present a novel model of a two-priority RACH, which allows us to define access patterns that describe the random access behavior of UEs as observed by the base station (BS). We then propose a non-uniform preamble selection scheme that offers increased flexibility in allocating resources to the different UE priority classes. Next, we formulate an allocation model that finds the optimal access probabilities to maximize the success rate of high-priority UEs while constraining low-priority UEs. Finally, we develop a reinforcement learning approach based on multi-armed bandits to solve the optimization problem; it provides a near-optimal yet scalable solution and does not require the BS to know the number of UEs in the network.
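To make the bandit idea concrete, the following minimal sketch shows how a BS could learn an access probability from observed outcomes alone, without knowing the number of UEs. It is an illustration under stated assumptions, not the method developed in this work: it uses a generic epsilon-greedy bandit, uniform rather than non-uniform preamble selection, and a hypothetical scalarized reward (`LOW_WEIGHT`) in place of the constrained formulation. All names (`EpsilonGreedyBandit`, `simulate_slot`, the UE counts, and the candidate probabilities) are invented for the example.

```python
import random


class EpsilonGreedyBandit:
    """Generic epsilon-greedy bandit (a stand-in, not the paper's algorithm)."""

    def __init__(self, n_arms, epsilon, rng):
        self.epsilon = epsilon
        self.rng = rng
        self.counts = [0] * n_arms    # number of pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select(self):
        # Pull every arm once before exploiting any estimate.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))  # explore
        return self.best_arm()                           # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

    def best_arm(self):
        return max(range(len(self.values)), key=self.values.__getitem__)


def simulate_slot(n_high, n_low, p_low, n_preambles, rng):
    """Simulate one RACH slot with uniform preamble selection (assumed).

    High-priority UEs always transmit; each low-priority UE transmits with
    probability p_low. A UE succeeds if no other UE picked its preamble.
    Returns (high-priority successes, low-priority successes).
    """
    picks = [(rng.randrange(n_preambles), True) for _ in range(n_high)]
    picks += [(rng.randrange(n_preambles), False)
              for _ in range(n_low) if rng.random() < p_low]
    usage = {}
    for preamble, _ in picks:
        usage[preamble] = usage.get(preamble, 0) + 1
    high_ok = sum(1 for p, is_high in picks if is_high and usage[p] == 1)
    low_ok = sum(1 for p, is_high in picks if not is_high and usage[p] == 1)
    return high_ok, low_ok


if __name__ == "__main__":
    rng = random.Random(0)
    candidate_p_low = [0.0, 0.1, 0.2, 0.4, 0.8]  # arms (hypothetical grid)
    LOW_WEIGHT = 0.5  # hypothetical weight on low-priority successes
    bandit = EpsilonGreedyBandit(len(candidate_p_low), epsilon=0.1, rng=rng)
    for _ in range(5000):
        arm = bandit.select()
        # The UE counts are used only by the simulator, never by the bandit.
        high_ok, low_ok = simulate_slot(20, 100, candidate_p_low[arm], 54, rng)
        bandit.update(arm, high_ok + LOW_WEIGHT * low_ok)
    print("learned p_low:", candidate_p_low[bandit.best_arm()])
```

The bandit only ever sees the per-slot reward, which mirrors the property that the BS need not know how many UEs are contending; the simulator's UE counts stand in for the unknown environment.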