Accurate disturbance estimation is essential for safe robot operation. The recently proposed neural moving horizon estimation (NeuroMHE), which uses a portable neural network to model the MHE's weightings, shows promise in this context. Currently, NeuroMHE is trained through gradient descent, with its gradient computed recursively using a Kalman filter. This paper proposes a trust-region policy optimization method for training NeuroMHE. We achieve this by providing the second-order derivatives of MHE, referred to as the MHE Hessian. Remarkably, we establish that much of the computation already used to obtain the gradient, especially the Kalman filter, can be efficiently reused to compute the MHE Hessian. As a result, the computational complexity of the Hessian is linear in the MHE horizon. In validation on an open-source real quadrotor flight dataset, our approach achieves data-efficient training (<5 min) and outperforms a state-of-the-art neural estimator by up to 68.1% in force estimation accuracy while using only 1.4% of its network parameters. Furthermore, our method is more robust to network initialization than its gradient descent counterpart.