Abstract: Despite offering high sensitivity, a high signal-to-noise ratio, and a broad spectral range, single-pixel imaging (SPI) is limited by low measurement efficiency and long data-acquisition times. To address this, we propose a wavelength-multiplexed, spatially incoherent diffractive optical processor combined with a compact/shallow digital artificial neural network (ANN) to implement compressive SPI. Specifically, we model the bucket detection process in conventional SPI as a linear intensity transformation with spatially and spectrally varying point-spread functions. This transformation matrix is treated as a learnable parameter and jointly optimized with a shallow digital ANN composed of two hidden nonlinear layers. The wavelength-multiplexed diffractive processor is then configured via data-free optimization to approximate this pre-trained transformation matrix; after this optimization, the diffractive processor remains static/fixed. Upon multi-wavelength illumination and diffractive modulation, the target spatial information of the input object is spectrally encoded. A single-pixel detector captures the output spectral power at each illumination band, which is then rapidly decoded by the jointly trained digital ANN to reconstruct the input image. In addition to our numerical analyses demonstrating the feasibility of this approach, we experimentally validated it in a proof-of-concept demonstration using an array of light-emitting diodes (LEDs). Overall, this work demonstrates a computational imaging framework for compressive SPI that can be useful in applications such as biomedical imaging, autonomous devices, and remote sensing.
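The encoding/decoding pipeline described above can be sketched numerically: a learned linear intensity transform maps the input object to a handful of spectral bucket measurements, and a shallow two-hidden-layer network maps those measurements back to an image. The sketch below is a minimal illustration, not the paper's implementation; the dimensions (a 16x16 object, 32 illumination bands) and the untrained random weights are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: a 16x16 input object encoded into 32
# single-pixel measurements, one per illumination wavelength.
n_pixels, n_bands = 16 * 16, 32

# Linear intensity transform standing in for the diffractive processor's
# spatially and spectrally varying point-spread functions (non-negative,
# since it acts on intensities). In the paper this matrix is learned.
H = rng.random((n_bands, n_pixels))

def relu(z):
    return np.maximum(z, 0.0)

# Shallow digital decoder with two hidden nonlinear layers.
# Weights here are random placeholders; in the paper they are
# jointly trained with the transformation matrix.
W1 = rng.standard_normal((64, n_bands)) * 0.1
W2 = rng.standard_normal((64, 64)) * 0.1
W3 = rng.standard_normal((n_pixels, 64)) * 0.1

def reconstruct(x):
    y = H @ x                    # spectrally encoded bucket measurements
    h = relu(W2 @ relu(W1 @ y))  # two hidden nonlinear layers
    return W3 @ h                # reconstructed image (flattened)

x = rng.random(n_pixels)         # flattened input object
x_hat = reconstruct(x)
```

Note the compression: 256 pixels are recovered from only 32 scalar detector readings, which is what shortens the acquisition time relative to raster or pattern-sequential SPI.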
Abstract: Reinforcement finetuning (RFT) has shown great potential for enhancing the mathematical reasoning capabilities of large language models (LLMs), but it is often sample- and compute-inefficient, requiring extensive training. In this work, we introduce AdaRFT (Adaptive Curriculum Reinforcement Finetuning), a method that significantly improves both the efficiency and final accuracy of RFT through adaptive curriculum learning. AdaRFT dynamically adjusts the difficulty of training problems based on the model's recent reward signals, ensuring that the model consistently trains on tasks that are challenging but solvable. This adaptive sampling strategy accelerates learning by maintaining an optimal difficulty range, avoiding wasted computation on problems that are too easy or too hard. AdaRFT requires only a lightweight extension to standard RFT algorithms like Proximal Policy Optimization (PPO), without modifying the reward function or model architecture. Experiments on competition-level math datasets, including AMC, AIME, and IMO-style problems, demonstrate that AdaRFT significantly improves both training efficiency and reasoning performance. We evaluate AdaRFT across multiple data distributions and model sizes, showing that it reduces the number of training steps by up to 2x and improves accuracy by a considerable margin, offering a more scalable and effective RFT framework.