Abstract: Accurate cascaded channel state information is pivotal for extremely large-scale intelligent reflecting surfaces (XL-IRSs) in next-generation wireless networks. However, the large XL-IRS aperture places users in the near field (NF), where propagation follows spherical rather than planar wavefronts, which complicates cascaded channel estimation. Conventional dictionary-based methods suffer from cumulative quantization errors and high complexity, especially in uniform planar array (UPA) systems. To address these issues, we first propose a tensor model for NF cascaded channels that exploits the tensor product of the horizontal and vertical response vectors of the UPA-structured base station (BS) and the incident-reflective array response vector of the IRS. This structure exploits the spatial characteristics of the arrays and allows the factor matrices to be estimated independently, which improves efficiency. To avoid quantization errors, we further propose an off-grid cascaded channel estimation framework based on sparse Tucker decomposition. Specifically, we model the received signal as a Tucker tensor whose sparse core tensor captures the path gain-delay terms and whose three factor matrices are spanned by the BS and NF IRS array responses. We then formulate a sparse core tensor minimization problem with tri-modal log-sum sparsity constraints, which makes the otherwise NP-hard sparse recovery problem tractable. Finally, we accelerate the method with higher-order singular value decomposition (HOSVD) preprocessing, combined with majorization-minimization (MM) and a tailored tensor over-relaxation fast iterative shrinkage-thresholding algorithm (FISTA). We derive the Cramér-Rao lower bound and analyze convergence. Simulations show that the proposed scheme improves the normalized mean square error (NMSE) by 13.6 dB over benchmark schemes while requiring significantly less runtime.
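The pipeline summarized above (a Tucker model with a sparse core, HOSVD preprocessing, a log-sum sparsity penalty handled by MM, and FISTA for each inner problem) can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm. It assumes the three factor matrices are already known, whereas the abstract describes them as spanned by BS and NF IRS array responses and estimated off-grid. It also substitutes random matrices with orthonormal columns for those array responses and uses plain FISTA instead of the tailored tensor over-relaxation variant. All function names (`mode_product`, `tucker`, `hosvd`, `soft`, `fista_weighted_l1`, `log_sum_mm`), the array sizes, and the parameters `lam`, `eps`, and the iteration counts are hypothetical.

```python
import numpy as np


def mode_product(X, A, n):
    """n-mode product X x_n A: multiply every mode-n fiber of X by A."""
    Y = np.tensordot(A, X, axes=(1, n))  # new axis first, other axes keep order
    return np.moveaxis(Y, 0, n)


def tucker(G, factors):
    """Tucker model G x_1 A_1 x_2 A_2 x_3 A_3 (no Kronecker matrix is formed)."""
    Y = G
    for n, A in enumerate(factors):
        Y = mode_product(Y, A, n)
    return Y


def hosvd(Y, ranks):
    """Truncated HOSVD: leading left singular vectors of each mode-n unfolding."""
    Us = []
    for n, r in enumerate(ranks):
        unfolding = np.moveaxis(Y, n, 0).reshape(Y.shape[n], -1)
        U, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        Us.append(U[:, :r])
    core = tucker(Y, [U.conj().T for U in Us])
    return core, Us


def soft(X, thr):
    """Complex soft-thresholding, the prox of a (weighted) l1 norm."""
    mag = np.abs(X)
    return X * np.maximum(mag - thr, 0.0) / np.maximum(mag, 1e-30)


def fista_weighted_l1(Y, factors, thr, G0, n_iter):
    """Plain FISTA for min_G 0.5*||Y - tucker(G)||_F^2 + sum(thr * |G|)."""
    adjoints = [A.conj().T for A in factors]
    # Lipschitz constant of the gradient: product of squared spectral norms
    L = np.prod([np.linalg.norm(A, 2) ** 2 for A in factors])
    G, V, t = G0.copy(), G0.copy(), 1.0
    for _ in range(n_iter):
        grad = tucker(tucker(V, factors) - Y, adjoints)
        G_new = soft(V - grad / L, thr / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        V = G_new + ((t - 1.0) / t_new) * (G_new - G)  # momentum step
        G, t = G_new, t_new
    return G


def log_sum_mm(Y, factors, lam, eps=1e-3, n_outer=3, n_inner=50):
    """MM for the penalty lam * sum(log(|g| + eps)).

    Each outer step replaces the concave log-sum term by its tangent at the
    current estimate: a weighted l1 norm with weights 1 / (|g| + eps).
    """
    shape = tuple(A.shape[1] for A in factors)
    G = np.zeros(shape, dtype=complex)
    w = np.ones(shape)  # assumption: first pass is plain l1
    for _ in range(n_outer):
        G = fista_weighted_l1(Y, factors, lam * w, G, n_inner)
        w = 1.0 / (np.abs(G) + eps)
    return G


# Toy, noiseless example. Hypothetical sizes: (BS horizontal, BS vertical, IRS).
rng = np.random.default_rng(0)
M, K = (8, 8, 16), (4, 4, 6)
# Hypothetical factor matrices with orthonormal columns (not real array responses)
factors = [np.linalg.qr(rng.standard_normal((m, k)) + 1j * rng.standard_normal((m, k)))[0]
           for m, k in zip(M, K)]
support = [(0, 1, 2), (2, 3, 0), (3, 0, 5)]
G_true = np.zeros(K, dtype=complex)
for idx in support:
    G_true[idx] = np.exp(1j * rng.uniform(0, 2 * np.pi))  # unit-modulus path gains
Y = tucker(G_true, factors)

core, Us = hosvd(Y, K)
G_hat = log_sum_mm(Y, factors, lam=1e-2)
nmse = np.linalg.norm(G_hat - G_true) ** 2 / np.linalg.norm(G_true) ** 2
print(f"core NMSE = {10 * np.log10(nmse):.1f} dB")
```

The gradient is computed with mode-n products only, so the cost scales with the sizes of the three factor matrices rather than with their Kronecker product, which mirrors the efficiency argument behind the tensor model. After the first pass, the reweighting step sharply raises the threshold on entries that are already near zero and leaves large entries almost unpenalized. This is the usual reason to prefer a log-sum penalty over plain l1.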