Hailin Zhang

SALE : Low-bit Estimation for Efficient Sparse Attention in Long-context LLM Prefilling

May 30, 2025

Xiaodong Ji, Hailin Zhang, Fangcheng Fu, Bin Cui

Abstract: Many advanced Large Language Model (LLM) applications require long-context processing, but the self-attention module becomes a bottleneck during the prefilling stage of inference due to its quadratic time complexity with respect to sequence length. Existing sparse attention methods accelerate attention computation by skipping less significant regions of the attention map. However, these approaches typically perform coarse-grained inspection of the attention map, incurring considerable loss in model accuracy. In this paper, we propose SALE, a fine-grained sparse attention method that accelerates the long-context prefilling stage of LLMs with negligible loss in model accuracy. SALE achieves fast and accurate fine-grained attention weight estimation through 4-bit quantized query-key products, followed by block-sparse attention to accelerate prefilling computations. For importance evaluation of query-key pairs, we adopt our Relative Attention Score metric, which offers significantly higher efficiency within our framework. We implement a custom CUDA kernel optimized for our approach for hardware efficiency, reducing the additional overhead to approximately 11% of the full attention latency. Notably, SALE requires no parameter training and can be seamlessly integrated into existing systems with trivial code modifications. Experiments on long-context benchmarks demonstrate that our method outperforms existing approaches in accuracy-efficiency trade-offs, achieving at least 3.36x speedups on Llama-3.1-8B for sequences longer than 64K while maintaining model quality.
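The general idea of estimating attention from low-bit query-key products and then keeping only the important blocks can be illustrated with a small sketch. This is not the authors' CUDA kernel, and it does not reproduce the Relative Attention Score metric, which the abstract does not define. The function names, the per-vector symmetric quantization, the block-maximum pooling, and the threshold `tau` are all assumptions made for illustration. Causal masking and the softmax scaling factor are omitted.

```python
import math

def quantize_int4(vec):
    """Symmetric quantization of one vector to integers in [-7, 7] (assumed scheme)."""
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    ints = [max(-7, min(7, round(x / scale))) for x in vec]
    return ints, scale

def approx_scores(queries, keys):
    """Estimate every query-key dot product from the 4-bit integers only."""
    q_quant = [quantize_int4(q) for q in queries]
    k_quant = [quantize_int4(k) for k in keys]
    return [
        [sum(a * b for a, b in zip(q_int, k_int)) * q_scale * k_scale
         for k_int, k_scale in k_quant]
        for q_int, q_scale in q_quant
    ]

def select_blocks(scores, block, tau):
    """For each query block, list the key blocks to keep.

    Hypothetical rule (a stand-in, not the paper's metric): pool the estimated
    scores by block maximum, then keep a key block when
    exp(pooled - best pooled score for this query block) >= tau.
    """
    n_q, n_k = len(scores), len(scores[0])
    kept = []
    for q0 in range(0, n_q, block):
        pooled = [
            max(scores[i][j]
                for i in range(q0, min(q0 + block, n_q))
                for j in range(k0, min(k0 + block, n_k)))
            for k0 in range(0, n_k, block)
        ]
        best = max(pooled)
        kept.append([b for b, s in enumerate(pooled) if math.exp(s - best) >= tau])
    return kept

# Toy data: the first key block aligns with the queries, the second opposes them.
queries = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2], [0.8, 0.0]]
keys = [[4.0, 0.0], [3.5, 0.5], [-4.0, 0.0], [-3.0, -1.0]]
estimated = approx_scores(queries, keys)
kept_blocks = select_blocks(estimated, block=2, tau=0.01)
print(kept_blocks)
```

Only the kept blocks would then be passed to a full-precision block-sparse attention computation. The estimate only needs to rank blocks, not reproduce the exact scores.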

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

May 12, 2025

Xiaomi LLM-Core Team: Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu (+55 more)

Abstract: We present MiMo-7B, a large language model born for reasoning tasks, with optimization across both pre-training and post-training stages. During pre-training, we enhance the data preprocessing pipeline and employ a three-stage data mixing strategy to strengthen the base model's reasoning potential. MiMo-7B-Base is pre-trained on 25 trillion tokens, with an additional Multi-Token Prediction objective for enhanced performance and accelerated inference speed. During post-training, we curate a dataset of 130K verifiable mathematics and programming problems for reinforcement learning, integrating a test-difficulty-driven code-reward scheme to alleviate sparse-reward issues and employing strategic data resampling to stabilize training. Extensive evaluations show that MiMo-7B-Base possesses exceptional reasoning potential, outperforming even much larger 32B models. The final RL-tuned model, MiMo-7B-RL, achieves superior performance on mathematics, code and general reasoning tasks, surpassing the performance of OpenAI o1-mini. The model checkpoints are available at https://github.com/xiaomimimo/MiMo.

Orbital-Angular-Momentum Versus MIMO: Orthogonality, Degree of Freedom, and Capacity

Aug 13, 2024

Haiyue Jing, Wenchi Cheng, Xiang-Gen Xia, Hailin Zhang

Abstract: Plane wave based wireless communications have become increasingly mature, with traditional resources such as time and frequency already well utilized. To further increase capacity in response to the rapidly growing demand of wireless communications, a promising option is to use the twist wave, which carries orbital angular momentum (OAM). In this paper, we discuss OAM based wireless communications in terms of orthogonality, degree of freedom (DoF), and capacity, where both the transmitter and the receiver use uniform circular array (UCA) antennas. In particular, we compare OAM based wireless communications with multiple-input-multiple-output (MIMO) based wireless communications in terms of DoF and capacity. Numerical results show that the DoF of OAM based wireless communications is greater than or equal to that of correlated MIMO based wireless communications when the transmitter and receiver antennas are well aligned. OAM based wireless communications can achieve larger capacity than correlated MIMO in the high signal-to-noise ratio (SNR) region under the line-of-sight scenario.

Orbital-Angular-Momentum Embedded Massive MIMO: Achieving Multiplicative Spectrum-Efficiency for mmWave Communications

Aug 13, 2024

Wenchi Cheng, Hailin Zhang, Liping Liang, Haiyue Jing, Zan Li

Abstract: By enabling very high bandwidth for radio communications, millimeter-wave (mmWave), which can easily be integrated with massive-multiple-input-multiple-output (massive-MIMO) due to small antenna size, has been attracting growing attention as a candidate for fifth-generation (5G) and beyond-5G wireless communications networks. On the other hand, communication over the orthogonal states/modes of orbital angular momentum (OAM) is a subset of the solutions offered by massive-MIMO communications. Traditional massive-MIMO based mmWave communications did not consider the potential spectrum-efficiency-gain (SE-gain) offered by the orthogonal states of OAM. However, the highly anticipated maximum SE-gain for OAM and massive-MIMO communications is the product of the SE-gains offered by OAM and multiplexing-MIMO. In this paper, we propose the OAM-embedded-MIMO (OEM) communication framework to obtain the multiplicative SE-gain for joint OAM and massive-MIMO based mmWave wireless communications. We design the parabolic antenna for each uniform circular array antenna to converge OAM signals. Then, we develop the mode-decomposition and multiplexing-detection scheme to obtain the transmit signal on each OAM-mode of each transmit antenna. Also, we develop the OEM-water-filling power allocation policy to achieve the maximum multiplicative SE-gain for OEM communications. Extensive simulations validate and evaluate our parabolic antenna based converging method, mode-decomposition and multiplexing-detection scheme, and OEM-water-filling policy, showing that our proposed OEM mmWave communications can significantly increase the spectrum efficiency as compared with traditional massive-MIMO based mmWave communications.
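For readers unfamiliar with water-filling, the classical form of the policy can be sketched as follows. This is the textbook algorithm for parallel channels, not the OEM-water-filling policy of the paper, whose details the abstract does not give. The function name and the example channel gains are assumptions; gains are taken as gain-to-noise ratios, so the objective is the sum of log2(1 + g_i p_i) under a total power budget.

```python
def water_filling(gains, total_power):
    """Classical water-filling over parallel channels.

    gains: positive gain-to-noise ratios g_i (assumed inputs).
    Returns powers p_i >= 0 that sum to total_power, with
    p_i = max(0, level - 1/g_i) for a common water level.
    """
    if total_power <= 0 or any(g <= 0 for g in gains):
        raise ValueError("total_power and all gains must be positive")
    # Strongest channels first; weak channels are dropped until all powers are positive.
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    active = len(order)
    while True:
        inverse_sum = sum(1.0 / gains[order[i]] for i in range(active))
        level = (total_power + inverse_sum) / active
        if level - 1.0 / gains[order[active - 1]] > 0:
            break
        active -= 1
    powers = [0.0] * len(gains)
    for i in range(active):
        powers[order[i]] = level - 1.0 / gains[order[i]]
    return powers

# Three channels with unit total power: the weakest channel receives nothing.
allocation = water_filling([2.0, 1.0, 0.1], 1.0)
print(allocation)
```

With a single active channel the remaining power is always positive, so the loop terminates for any positive budget.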

Achieving Practical OAM Based Wireless Communications With Misaligned Transceiver

Aug 13, 2024

Wenchi Cheng, Haiyue Jing, Wei Zhang, Zan Li, Hailin Zhang

Abstract: Orbital angular momentum (OAM) has attracted much attention for radio vortex wireless communications due to the orthogonality among different OAM-modes. To maintain the orthogonality among different OAM modes at the receiver, strict alignment between transmit and receive antennas is required. However, it is not practical to guarantee transceiver alignment in wireless communications. The phase turbulence resulting from misaligned transceivers leads to serious inter-mode interference among different OAM modes, and therefore to failure in detecting the signals of multiple OAM modes at the receiver. To achieve practical OAM based wireless communications, in this paper we investigate radio vortex wireless communications with misaligned transmit and receive antennas. We propose a joint Beamforming and Pre-detection (BePre) scheme, which uses two unitary matrices to convert the channel matrix into the equivalent circulant matrix for keeping the orthogonality among OAM-modes at the receiver. Then, the OAM signals can be detected with the mode-decomposition scheme at the misaligned receiver. Extensive simulations validate that our joint BePre scheme can efficiently detect the signals of multiple OAM-modes for the misaligned transceiver and can significantly increase the spectrum efficiency.
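Why a circulant channel matrix preserves mode orthogonality can be checked numerically: any circulant matrix is diagonalized by the discrete Fourier transform (DFT), so DFT-based mode decomposition sees no inter-mode leakage. The sketch below verifies only this linear-algebra fact. It does not implement BePre, it assumes that mode decomposition on a UCA corresponds to a DFT, and the channel coefficients are made-up illustrative values.

```python
import cmath

def dft_matrix(n):
    """Unitary n-point DFT matrix."""
    return [[cmath.exp(-2j * cmath.pi * r * c / n) / n ** 0.5 for c in range(n)]
            for r in range(n)]

def circulant(first_column):
    """Circulant matrix whose entry (r, c) is first_column[(r - c) mod n]."""
    n = len(first_column)
    return [[first_column[(r - c) % n] for c in range(n)] for r in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def hermitian(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def mode_domain(channel):
    """Channel seen in the mode domain: F * H * F^H."""
    f = dft_matrix(len(channel))
    return matmul(matmul(f, channel), hermitian(f))

# Hypothetical 4x4 circulant channel (illustrative coefficients only).
first_column = [1.0 + 0.5j, 0.3 - 0.2j, 0.1j, -0.2 + 0.0j]
modes = mode_domain(circulant(first_column))
# Largest off-diagonal magnitude: inter-mode interference in this toy model.
leakage = max(abs(modes[i][j]) for i in range(4) for j in range(4) if i != j)
print(leakage)
```

The diagonal of `modes` holds the per-mode channel gains, which equal the unnormalized DFT of the first column. A non-circulant channel would leave nonzero off-diagonal entries.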

Quasi-Fractal UCA Based OAM for Highly Efficient Orthogonal Transmission

Aug 10, 2024

Wenchi Cheng, Haiyue Jing, Wei Zhang, Keyi Zhang, Hailin Zhang

Abstract: The development of orbital angular momentum (OAM)-based radio vortex transmission presents a promising opportunity for increasing the capacity of wireless communication in correlated channels due to its inherent orthogonality among different OAM modes. One of the most popular schemes for highly efficient OAM transmission is the digital baseband associated with a uniform circular array (UCA) based transceiver. However, the periodicity of the complex-exponential feed generally restricts the maximum number of orthogonal signals carried by multiple OAM modes to the array-element number of the UCA antenna, which poses an open question of how to employ more OAM modes given a fixed number of array elements. Furthermore, signals modulated with high-order OAM modes are difficult for the receiver to capture because they diverge severely while propagating in free space, thus severely limiting the capacity of radio vortex communications. To overcome the above challenges, in this paper we propose the quasi-fractal UCA (QF-UCA) antenna based OAM multiplexing transmission, which is based on a partly element-overlapped fractal geometry layout and effectively uses low-order OAM modes. We perform the two-dimension OAM modulation (TOM) and demodulation (TOD) schemes with the orthogonal OAM mode number exceeding the array-element number, which is beyond the traditional concept of multiple antennas based wireless communications. Simulation results show that our proposed scheme can achieve more orthogonal multiplexing streams than the maximum number achievable by traditional multiple antenna systems.
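The periodicity limit mentioned in the abstract can be seen in a few lines. Assuming the usual complex-exponential feed exp(j 2π l n / N) for mode l on element n of an N-element UCA, modes l and l + N produce identical feeds, so at most N modes are distinguishable. The function names below are hypothetical, and the sketch illustrates only this limitation, not the QF-UCA scheme.

```python
import cmath

def uca_feed(mode, num_elements):
    """Phase feed of one OAM mode across the UCA elements (assumed standard form)."""
    return [cmath.exp(2j * cmath.pi * mode * n / num_elements)
            for n in range(num_elements)]

def inner_product(a, b):
    """Complex inner product <a, b>."""
    return sum(x * y.conjugate() for x, y in zip(a, b))

N = 8
# Distinct modes within one period are orthogonal.
within_period = abs(inner_product(uca_feed(1, N), uca_feed(3, N)))
# Modes 1 and 1 + N alias onto the same feed, so they cannot be separated.
aliased = abs(inner_product(uca_feed(1, N), uca_feed(1 + N, N)))
print(within_period, aliased)
```

The first value is numerically zero, while the second equals N, the inner product of a feed with itself.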

The Rise of UAV Fleet Technologies for Emergency Wireless Communications in Harsh Environments

Jul 24, 2024

Zhuohui Yao, Wenchi Cheng, Wei Zhang, Tao Zhang, Hailin Zhang

Abstract: For unforeseen emergencies, such as natural disasters and pandemic events, there is a strong need to cope with the explosive growth of mobile data traffic in extremely critical environments. An unmanned aerial vehicle (UAV) fleet is an effective way to facilitate the Emergency wireless COmmunication NETwork (EcoNet). In this article, a MUlti-tier Heterogeneous UAV Network (MuHun), which consists of different UAV fleets at different altitudes, is proposed to flexibly serve various emergencies. We refresh the key performance indicators of full coverage, network capacity, low latency, and energy efficiency in harsh environments. Then, we present the special challenges regarding the shadowing-dominated complex channel model, energy-supply-limited short endurance, the coexistence of various communication mechanisms, and the communication island for underground users in UAV-based EcoNet, followed by the MuHun-based EcoNet architecture and its advantages. Furthermore, some potential solutions such as the new hybrid-channel adapted resource allocation, reconfigurable intelligent surface assisted UAV communications, competitive heterogeneous-networks, and magnetic induction based air-to-ground/underground communications are discussed to effectively achieve full coverage, high capacity, high energy efficiency, and diverse qualities of services for EcoNets in harsh environments.

Virtual Full-Duplex Wireless Communications with Zero-Interval Modulation and Sampling

Jul 24, 2024

Jianyu Wang, Wenchi Cheng, Wei Zhang, Hailin Zhang

Abstract: In this paper, we propose a virtual full-duplex (VFD) technique with zero-interval modulation and sampling (ZIMS), where two half-duplex (HD) transceivers can simultaneously transmit signals and each transceiver can effectively receive the desired information. In ZIMS-VFD, the transceiver inserts a zero-interval for each symbol in the transmit signal and provides self-interference (SI)-free intervals for itself. Meanwhile, it samples the receive signal in the provided SI-free intervals and restores the desired symbols. Based on orthogonal frequency division multiplexing (OFDM), we formulate the system model and show the transmit signal structure. Then, we give the transceiver design for single input single output (SISO) ZIMS-VFD and extend it to multiple input multiple output (MIMO) communications. Numerical results verify our theoretical analyses and show that ZIMS-VFD can effectively increase the capacity and approach the FD without SI.

Resource Allocation for 5G-UAV Based Emergency Wireless Communications

Jul 24, 2024

Zhuohui Yao, Wenchi Cheng, Wei Zhang, Hailin Zhang

Abstract: For unforeseen natural disasters, such as earthquakes, hurricanes, and floods, the traditional communication infrastructure is unavailable or seriously disrupted, along with persistent secondary disasters. Under such circumstances, there is a strong need to deploy emergency wireless communication (EWC) networks to restore connectivity in accident/incident areas. Emerging fifth-generation (5G)/beyond-5G (B5G) wireless communication systems, like unmanned aerial vehicle (UAV) assisted networks and intelligent reflecting surface (IRS) based communication systems, are expected to be designed or re-farmed for supporting temporary high quality communications in post-disaster areas. However, the channel characteristics of post-disaster areas change quickly because of topographical changes caused by secondary disasters, imposing new but critical challenges for EWC networks. In this paper, we propose a novel heterogeneous $\mathcal{F}$ composite fading channel model for EWC networks which accurately models and characterizes the composite fading channel with reflectors, path-loss exponent, fading, and shadowing parameters in 5G-UAV based EWC networks. Based on the model, we develop the optimal power allocation scheme with a simple closed-form expression and the numerical results based optimal joint bandwidth-power allocation scheme. We derive the corresponding capacities and compare the energy efficiency between IRS and traditional relay based 5G-UAVs. Numerical results show that the new heterogeneous Fisher-Snedecor $\mathcal{F}$ composite fading channel adapted resource allocation schemes can achieve higher capacity and energy efficiency than those of traditional channel model adapted resource allocation schemes, thus providing better communications service for post-disaster areas.

Mode Hopping with OAM-Based Index Modulation

Jul 18, 2024

Liping Liang, Wenchi Cheng, Wei Zhang, Hailin Zhang

Abstract: The orbital angular momentum (OAM) based mode hopping (MH) scheme is expected to be a potential anti-jamming technology in radio vortex wireless communications. However, it only uses one OAM-mode for hopping, thus resulting in low spectrum efficiency (SE). Index modulation offers a trade-off between the SE and performance reliability. In this paper, we propose an MH with OAM-based index modulation scheme, where several OAM-modes are activated for hopping, to achieve high SE at a given bit error rate in radio vortex wireless communications. Based on the proposed scheme, we derive the upper bound and lower bound of achievable SEs. Furthermore, in order to take advantage of index information, we derive the optimal hopped OAM-modes to achieve the maximum SE. Numerical results show that our proposed MH with index modulation scheme can achieve high SE while satisfying a certain reliability of radio vortex wireless communications.

* 7 pages, 5 figures, accepted by 2019 IEEE Global Communications Conference (GLOBECOM)
