The massive number of antennas in extremely large aperture array (ELAA) systems shifts signal propagation in Internet of Things (IoT) communication systems towards the near-field, spherical-wave regime. We propose a reconfigurable intelligent surface (RIS)-assisted beamfocusing mechanism, for which designing a two-dimensional beam codebook that spans both the angular and distance domains is challenging. To address this issue, we introduce a novel Transformer-based two-stage beam training algorithm consisting of a coarse search phase and a fine search phase. The proposed mechanism provides a fine-grained codebook with enhanced spatial resolution, enabling precise beamfocusing. Specifically, in the first stage, beam training with a simple codebook estimates the approximate location of the device and determines whether it lies within the beamfocusing range (BFR) or the non-beamfocusing range (NBFR). In the second stage, a fine-grained beam search is conducted with a more precise codebook. Experimental results show that the proposed method greatly improves the precision of RIS-assisted beamfocusing. It achieves a beam selection accuracy of up to 97% at a signal-to-noise ratio (SNR) of 20 dB and outperforms the baseline method by 10% to 50% across different SNRs.
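To make the coarse-to-fine idea concrete, the following is a minimal sketch of a two-stage search over an angle-distance codebook. It is not the Transformer-based algorithm proposed here: both stages are replaced by a plain exhaustive sweep, the RIS is reduced to a single linear aperture with a noise-free line-of-sight channel, and every name and value (`focus_vector`, `two_stage_search`, `bfr_limit`, the grids, the wavelength, the element count) is a hypothetical choice made only for illustration.

```python
import numpy as np

WAVELENGTH = 0.01         # metres; assumed value (roughly 30 GHz)
N_ELEMENTS = 64           # assumed number of elements in a linear aperture
SPACING = WAVELENGTH / 2  # assumed half-wavelength element spacing


def focus_vector(theta, r):
    """Unit-norm spherical-wave vector focused at angle theta (rad) and range r (m)."""
    x = (np.arange(N_ELEMENTS) - (N_ELEMENTS - 1) / 2) * SPACING
    # Exact element-to-point distance (law of cosines), not the plane-wave approximation.
    dist = np.sqrt(r**2 + x**2 - 2 * r * x * np.sin(theta))
    return np.exp(-2j * np.pi * (dist - r) / WAVELENGTH) / np.sqrt(N_ELEMENTS)


def best_codeword(channel, thetas, ranges):
    """Sweep a (theta, r) grid exhaustively; return (gain, theta, r) of the best codeword."""
    best = (-1.0, None, None)
    for theta in thetas:
        for r in ranges:
            gain = abs(np.vdot(focus_vector(theta, r), channel))
            if gain > best[0]:
                best = (gain, theta, r)
    return best


def two_stage_search(channel, coarse_thetas, coarse_ranges, bfr_limit, n_fine=9):
    """Coarse sweep, BFR/NBFR decision, then a fine sweep around the coarse estimate."""
    # Stage 1: coarse codebook gives an approximate location.
    g0, th0, r0 = best_codeword(channel, coarse_thetas, coarse_ranges)
    # Assumed decision rule: a coarse range within bfr_limit counts as BFR.
    region = "BFR" if r0 <= bfr_limit else "NBFR"

    # Stage 2: finer grid centred on the coarse estimate (n_fine should be odd).
    half = coarse_thetas[1] - coarse_thetas[0]
    fine_thetas = np.clip(np.linspace(th0 - half, th0 + half, n_fine),
                          -np.pi / 2, np.pi / 2)
    if region == "BFR":
        fine_ranges = np.linspace(0.75 * r0, 1.25 * r0, n_fine)  # refine distance too
    else:
        fine_ranges = np.array([r0])  # angle-only refinement outside the BFR
    g1, th1, r1 = best_codeword(channel, fine_thetas, fine_ranges)
    return {"region": region, "coarse": (g0, th0, r0), "fine": (g1, th1, r1)}


coarse_thetas = np.linspace(-np.pi / 3, np.pi / 3, 33)
coarse_ranges = np.array([2.0, 4.0, 8.0, 16.0, 32.0])
true_theta, true_r = 0.31, 5.0               # hypothetical off-grid device position
channel = focus_vector(true_theta, true_r)   # noise-free line-of-sight channel
result = two_stage_search(channel, coarse_thetas, coarse_ranges, bfr_limit=20.0)
```

Because the fine grid is centred on the coarse estimate, the second stage can only match or improve the gain found in the first. In this sketch the exhaustive sweeps stand in for whatever learned predictor selects the codewords, so it says nothing about the training overhead or accuracy of the proposed method.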