Extremely large-scale multiple-input multiple-output (XL-MIMO) enables the formation of narrow beams, effectively mitigating path loss in high-frequency communications. This capability makes the integration of wideband high-frequency communications with XL-MIMO a key enabler for future 6G networks. Realizing the full potential of such wideband XL-MIMO systems critically depends on acquiring accurate channel state information. However, this acquisition is significantly challenged by inherent wideband XL-MIMO channel characteristics, including near-field propagation effects, beam split, and spatial non-stationarity. We formulate channel estimation as a maximum a posteriori problem and propose an unrolled proximal gradient descent network. The network integrates learnable step sizes and replaces the proximal operator with a neural network to implicitly learn channel prior knowledge without requiring explicit regularization terms. To enhance the convergence behavior, we incorporate a monotonic descent constraint on the layer-wise estimation error during training. This constrained learning problem is addressed using a primal-dual training approach. Theoretical analysis is provided to characterize the duality gap and convergence behavior of the proposed method. Furthermore, simulation results are presented to validate its effectiveness, demonstrating gains in estimation accuracy over both traditional and deep learning-based methods.
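As an illustration only, the structure of an unrolled proximal gradient descent estimator can be sketched as follows. This is a minimal sketch under stated assumptions, not the proposed network: it assumes a real-valued linear observation model y = A h, uses fixed per-layer step sizes in place of learned ones, and substitutes soft-thresholding for the learned proximal network. All function names (`unrolled_pgd`, `soft_threshold`, `descent_violations`) and the toy matrix are hypothetical, and the exact form of the monotonic descent constraint used in training is assumed here to be a non-increasing layer-wise squared error.

```python
import math


def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def soft_threshold(v, tau):
    """Placeholder proximal operator (prox of tau * l1 norm).

    Assumption: stands in for the learned proximal network.
    """
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in v]


def unrolled_pgd(y, A, step_sizes, thresholds):
    """Run one proximal gradient step per layer and return all iterates.

    Each layer has its own step size and threshold; in a trained network
    these would be learnable parameters (fixed here for illustration).
    """
    At = transpose(A)
    h = [0.0] * len(A[0])
    iterates = [h]
    for mu, tau in zip(step_sizes, thresholds):
        residual = [r - yi for r, yi in zip(matvec(A, h), y)]
        grad = matvec(At, residual)  # gradient of 0.5 * ||A h - y||^2
        z = [hi - mu * g for hi, g in zip(h, grad)]
        h = soft_threshold(z, tau)
        iterates.append(h)
    return iterates


def sq_error(h, h_true):
    """Squared estimation error of one iterate."""
    return sum((a - b) ** 2 for a, b in zip(h, h_true))


def descent_violations(errors):
    """Amount by which each layer fails to reduce the error (0 if it does).

    Assumption: a simple non-increasing-error form of the constraint.
    """
    return [max(0.0, errors[k] - errors[k - 1]) for k in range(1, len(errors))]


# Toy noiseless example with a hypothetical 3x2 measurement matrix.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_true = [1.0, 0.0]
y = matvec(A, h_true)
iterates = unrolled_pgd(y, A, step_sizes=[0.3] * 3, thresholds=[0.01] * 3)
errors = [sq_error(h, h_true) for h in iterates]
violations = descent_violations(errors)
```

In a primal-dual training scheme, quantities such as `violations` would typically be weighted by dual variables and added to the training loss, with the dual variables updated by ascent; that training loop is omitted here.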