Abstract: Recent R1-Zero-like research further demonstrates that reasoning extension has given large language models (LLMs) unprecedented reasoning capabilities, and that reinforcement learning (RL) is the core technology for eliciting complex reasoning. However, conducting RL experiments directly on hyperscale models involves high computational costs and resource demands, posing significant risks. We propose the Compass-Thinker-7B model, which aims to explore the potential of RL with fewer computational resources and lower costs, and to provide insights for further research into RL recipes for larger models. Compass-Thinker-7B is trained from an open-source model through a specially designed RL pipeline. We curate a dataset of 30k verifiable mathematics problems for this pipeline. By configuring data and training settings with different difficulty distributions for different stages, the potential of the model is gradually released and training efficiency is improved. Extensive evaluations show that Compass-Thinker-7B possesses exceptional reasoning potential and achieves superior performance on mathematics compared to RL models of the same size. In particular, on the challenging AIME2024 evaluation, Compass-Thinker-7B achieves 40% accuracy.
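The abstract does not say how a problem is "verified". One common reading is that each problem has a reference answer that can be checked automatically, which yields a rule-based reward. The following is a minimal sketch under that assumption. The function names `extract_boxed` and `verifiable_reward`, the `\boxed{...}` answer convention, and the exact-match comparison are all illustrative choices, not details taken from the Compass-Thinker-7B pipeline.

```python
def extract_boxed(text):
    # Return the content of the last \boxed{...} in text, or None if absent.
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth, out = 1, []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: everything collected is the answer.
                return "".join(out).strip()
        out.append(ch)
        i += 1
    return None  # unbalanced braces: treat as no answer


def verifiable_reward(response, reference):
    # Hypothetical binary reward: 1.0 on an exact match with the reference
    # answer (ignoring spaces), 0.0 otherwise or when no answer is found.
    answer = extract_boxed(response)
    if answer is None:
        return 0.0
    same = answer.replace(" ", "") == reference.replace(" ", "")
    return 1.0 if same else 0.0
```

A real checker would likely also normalize mathematically equivalent forms (for example `0.5` versus `\frac{1}{2}`), which this sketch deliberately leaves out.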
Abstract: Intelligent reflecting surface (IRS) has emerged as a promising technique to extend the wireless signal coverage of an access point (AP) and improve the communication performance cost-effectively. In order to reduce the path loss of the cascaded user-IRS-AP channels, the IRS-integrated AP architecture has been proposed to deploy the IRSs and the antenna array of the AP within the same antenna radome. To reduce the pilot overhead for estimating all IRS-involved channels, in this paper we propose a novel codebook-based IRS reflection design for the IRS-integrated AP to enhance the coverage performance in a given area. In particular, a codebook consisting of a small number of codewords is designed offline by employing an efficient sector division strategy based on the azimuth angle. To ensure the performance of each sector, we optimize its corresponding codeword for the IRS reflection pattern to maximize the sector-min-average-effective-channel-power (SMAECP) by applying the alternating optimization (AO) and semidefinite relaxation (SDR) methods. With the designed codebook, the AP performs IRS reflection training by sequentially applying all codewords and selects the one achieving the best communication performance for data transmission. Numerical results show that our proposed codebook design can enhance the average channel power over the whole coverage area, as compared to the system without IRS. Moreover, our proposed codebook-based IRS reflection design is shown to achieve significant performance gains over other benchmark schemes in both single-user and multi-user transmissions.
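The online training step described above (apply every codeword in turn, keep the best one) can be sketched as follows. This is a toy illustration under stated assumptions: a single-antenna AP, a single user, and an effective channel of the form h_d + Σ_n g_n e^{jθ_n}. The names `effective_channel_power` and `select_codeword` are hypothetical, and the codewords are random placeholders rather than the AO/SDR-optimized, sector-based codewords of the paper.

```python
import cmath
import random


def effective_channel_power(direct, cascaded, phases):
    # Toy model: |h_d + sum_n g_n * exp(j * theta_n)|^2 for one antenna.
    h = direct + sum(g * cmath.exp(1j * t) for g, t in zip(cascaded, phases))
    return abs(h) ** 2


def select_codeword(codebook, measure_power):
    # Apply each codeword in turn and return (best index, best power).
    powers = [measure_power(codeword) for codeword in codebook]
    best = max(range(len(codebook)), key=powers.__getitem__)
    return best, powers[best]


rng = random.Random(0)
num_elements, num_codewords = 16, 4

# Placeholder channels (assumed i.i.d. Gaussian purely for illustration).
direct = complex(rng.gauss(0, 1), rng.gauss(0, 1))
cascaded = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(num_elements)]

# Placeholder codebook: random phase shifts, NOT the optimized codewords.
codebook = [
    [rng.uniform(0, 2 * cmath.pi) for _ in range(num_elements)]
    for _ in range(num_codewords)
]

best_index, best_power = select_codeword(
    codebook, lambda c: effective_channel_power(direct, cascaded, c)
)
```

The training overhead in this sketch scales with the number of codewords rather than with the number of IRS elements, which is the motivation the abstract gives for keeping the codebook small.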
Abstract: Existing works on IRS have mainly considered IRS being deployed in the environment to dynamically control the wireless channels between the BS and its served users. In contrast, we propose in this paper a new integrated IRS-BS architecture by deploying IRSs inside the BS antenna radome. Since the distance between the integrated IRSs and the BS antenna array is practically small, the path loss among them is significantly reduced and the real-time control of the IRS reflection by the BS becomes easier to implement. However, the resultant near-field channel model also becomes drastically different. Thus, we propose an element-wise channel model for IRS to characterize the channel vector between each single-antenna user and the antenna array of the BS, which includes the direct (without any IRS reflection) as well as the single and double IRS-reflection channel components. Then, we formulate a problem to optimize the reflection coefficients of all IRS reflecting elements for maximizing the uplink sum rate of the users. By considering two typical cases with and without perfect CSI at the BS, the formulated problem is solved efficiently by adopting the successive refinement method and the iterative random phase algorithm (IRPA), respectively. Numerical results validate the substantial capacity gain of the integrated IRS-BS architecture over the conventional multi-antenna BS without integrated IRS. Moreover, the proposed algorithms significantly outperform other benchmark schemes in terms of sum rate, and the IRPA without CSI can approach the performance upper bound with perfect CSI as the training overhead increases.
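The abstract does not spell out the steps of the IRPA, so the following is only a generic random-phase search in the same spirit: try random reflection patterns, measure the resulting rate without any channel knowledge, and keep the best pattern seen so far. The single-user, single-antenna channel model, the 2-bit phase quantization (`levels=4`), and the names `rate` and `random_phase_search` are assumptions made for illustration.

```python
import cmath
import math
import random


def rate(direct, cascaded, phases, noise_power=1.0):
    # Toy achievable rate log2(1 + |h|^2 / noise) for a single user.
    h = direct + sum(g * cmath.exp(1j * t) for g, t in zip(cascaded, phases))
    return math.log2(1.0 + abs(h) ** 2 / noise_power)


def random_phase_search(measure, num_elements, num_rounds, rng, levels=4):
    # Draw random discrete phase patterns and keep the best measured one.
    step = 2 * math.pi / levels
    best_phases, best_value, history = None, float("-inf"), []
    for _ in range(num_rounds):
        phases = [step * rng.randrange(levels) for _ in range(num_elements)]
        value = measure(phases)  # feedback only, no CSI required
        if value > best_value:
            best_phases, best_value = phases, value
        history.append(best_value)  # best-so-far after each training round
    return best_phases, best_value, history


rng = random.Random(1)
num_elements = 8
direct = complex(rng.gauss(0, 1), rng.gauss(0, 1))
cascaded = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(num_elements)]

best_phases, best_value, history = random_phase_search(
    lambda p: rate(direct, cascaded, p), num_elements, num_rounds=200, rng=rng
)

# Perfect-CSI upper bound for this toy model: all terms perfectly co-phased.
upper_bound = math.log2(1.0 + (abs(direct) + sum(abs(g) for g in cascaded)) ** 2)
```

In this sketch `history` is non-decreasing and never exceeds `upper_bound`, which mirrors the qualitative claim that performance without CSI moves toward the perfect-CSI bound as more training rounds are spent.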
Abstract: Intelligent reflecting surface (IRS) has emerged as a promising technique to enhance wireless communication performance cost-effectively. The existing literature has mainly considered IRS being deployed near user terminals to improve their performance. However, this approach may incur a high cost if IRSs need to be densely deployed in the network to cater to random user locations. To avoid such a high deployment cost, in this paper we consider a new IRS-aided wireless network architecture, where IRSs are deployed in the vicinity of each base station (BS) to assist in its communications with distributed users regardless of their locations. Besides significantly enhancing the IRSs' signal coverage, this scheme helps reduce the IRS-associated channel estimation overhead as compared to conventional user-side IRSs, by exploiting the nearly static BS-IRS channels over short distances. For this scheme, we propose a new two-stage transmission protocol to efficiently achieve IRS channel estimation and reflection optimization for uplink data transmission. In addition, we propose effective methods for solving the user-IRS association problem based on long-term channel knowledge and the selected user-IRS-BS cascaded channel estimation problem. Finally, all IRSs' passive reflections are jointly optimized with the BS's multi-antenna receive combining to maximize the minimum achievable rate among all users for data transmission. Numerical results show that the proposed co-site-IRS-empowered BS scheme can achieve significant performance gains over the conventional BS without co-site IRS and over existing schemes for IRS channel estimation and reflection optimization, thus enabling an appealing low-cost and high-performance BS design for future wireless networks.