Jialin Zhuang

Advancing Expert Specialization for Better MoE

May 28, 2025

Hongcan Guo, Haolang Lu, Guoshun Nan, Bolun Chu, Jialin Zhuang, Yuan Yang, Wenhao Che, Sicong Leng, Qimei Cui, Xudong Jiang

Abstract: Mixture-of-Experts (MoE) models enable efficient scaling of large language models (LLMs) by activating only a subset of experts per input. However, we observe that the commonly used auxiliary load balancing loss often leads to expert overlap and overly uniform routing, which hinders expert specialization and degrades overall performance during post-training. To address this, we propose a simple yet effective solution that introduces two complementary objectives: (1) an orthogonality loss that encourages experts to process distinct types of tokens, and (2) a variance loss that encourages more discriminative routing decisions. A gradient-level analysis shows that these objectives are compatible with the existing auxiliary loss and help optimize the training process. Experiments across various model architectures and multiple benchmarks show that our method significantly enhances expert specialization. Notably, it improves classic MoE baselines trained with the auxiliary loss by up to 23.79% while maintaining load balancing in downstream tasks, without any architectural modifications or additional components. We will release our code to the community.

* 33 pages, 6 figures
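The abstract names three training terms: the standard auxiliary load balancing loss, an orthogonality loss, and a variance loss. The sketch below shows one plausible shape for each, assuming top-1 routing. The load balancing term follows the well-known Switch Transformer form E · Σᵢ fᵢ·Pᵢ; the orthogonality and variance terms are hypothetical stand-ins (their exact definitions are not given in the abstract), and the 0.1 weights are arbitrary. It uses NumPy for readability, so it computes values only, not gradients.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax over the expert axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def load_balance_loss(router_logits):
    """Switch-Transformer-style auxiliary loss E * sum_i f_i * P_i.

    router_logits: (T, E) array for T tokens and E experts (top-1 routing).
    """
    T, E = router_logits.shape
    probs = softmax(router_logits)
    top1 = probs.argmax(axis=1)
    f = np.bincount(top1, minlength=E) / T  # fraction of tokens sent to each expert
    P = probs.mean(axis=0)                  # mean router probability per expert
    return float(E * np.sum(f * P))


def orthogonality_loss(expert_outputs):
    """Hypothetical orthogonality penalty: sum of squared off-diagonal cosine
    similarities between the E experts' output vectors, shape (E, d)."""
    norms = np.linalg.norm(expert_outputs, axis=1, keepdims=True)
    U = expert_outputs / np.maximum(norms, 1e-12)
    G = U @ U.T                              # (E, E) cosine-similarity matrix
    off_diag = G - np.diag(np.diag(G))       # zero out the diagonal
    return float(np.sum(off_diag ** 2))


def variance_loss(router_logits):
    """Hypothetical variance objective: negative per-token variance of the
    routing distribution, so minimizing it favors sharper routing."""
    probs = softmax(router_logits)
    return float(-probs.var(axis=1).mean())


# Example: combine the three terms with hypothetical weights.
rng = np.random.default_rng(0)
logits = rng.normal(size=(16, 4))            # 16 tokens, 4 experts
outputs = rng.normal(size=(4, 8))            # one 8-dim output per expert
total = (load_balance_loss(logits)
         + 0.1 * orthogonality_loss(outputs)
         + 0.1 * variance_loss(logits))
print(f"combined auxiliary loss: {total:.4f}")
```

In an autodiff framework the same expressions would apply, with the caveat that `f` comes from a non-differentiable argmax, so the load balancing gradient flows only through `P`. Note the tension the abstract describes: uniform routing minimizes the balancing term but scores worst on the variance term.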

Extract the Best, Discard the Rest: CSI Feedback with Offline Large AI Models

May 13, 2025

Jialin Zhuang, Yafei Wang, Hongwei Hou, Yu Han, Wenjin Wang, Shi Jin, Jiangzhou Wang

Abstract: Large AI models (LAMs) have shown strong potential in wireless communication tasks, but their practical deployment remains hindered by latency and computational constraints. In this work, we focus on the challenge of integrating LAMs into channel state information (CSI) feedback for frequency-division duplex (FDD) massive multiple-input multiple-output (MIMO) systems. To this end, we propose two offline frameworks, namely site-specific LAM-enhanced CSI feedback (SSLCF) and multi-scenario LAM-enhanced CSI feedback (MSLCF), that incorporate LAMs into the codebook-based CSI feedback paradigm without requiring real-time inference. Specifically, SSLCF generates a site-specific enhanced codebook by fine-tuning on locally collected CSI data, while MSLCF improves generalization by pre-generating a set of environment-aware codebooks. Both frameworks build on a LAM with a vision-based backbone, which is pre-trained on large-scale image datasets and fine-tuned with CSI data to generate customized codebooks. The resulting network, named LVM4CF, captures the structural similarity between CSI and images, allowing the LAM to refine codewords tailored to specific environments. To optimize the codebook refinement capability of LVM4CF under both single- and dual-side deployment modes, we further propose corresponding training and inference algorithms. Simulation results show that our frameworks significantly outperform existing schemes in both reconstruction accuracy and system throughput, without introducing additional inference latency or computational overhead. These results also support the core design methodology of our frameworks, extracting the best and discarding the rest, as a promising pathway for integrating LAMs into future wireless systems.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
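Both frameworks keep the usual codebook-based feedback loop and change only the codebook, which is why no inference is needed at run time. The loop itself can be sketched as below, assuming a single-antenna user and a conventional oversampled DFT codebook as a stand-in. In SSLCF or MSLCF the matrix `C` would instead be a codebook generated offline by LVM4CF; that refinement step is not shown, and all function names here are hypothetical.

```python
import numpy as np


def dft_codebook(n_antennas, oversampling=4):
    """Oversampled DFT beam codebook, shape (n_antennas, n_antennas * oversampling).
    Each column is a unit-norm steering vector."""
    n = np.arange(n_antennas)[:, None]
    k = np.arange(n_antennas * oversampling)[None, :]
    return np.exp(-2j * np.pi * n * k / (n_antennas * oversampling)) / np.sqrt(n_antennas)


def select_codeword(h, codebook):
    """UE side: pick the codeword index maximizing |c^H h| / ||h||."""
    h_unit = h / np.linalg.norm(h)
    gains = np.abs(codebook.conj().T @ h_unit)
    return int(np.argmax(gains))


def feedback_bits(codebook):
    """Number of bits needed to feed back one codeword index."""
    return int(np.ceil(np.log2(codebook.shape[1])))


def reconstruct(index, codebook):
    """BS side: the fed-back index selects the channel direction estimate."""
    return codebook[:, index]


# Example: a 32-antenna array with a 4x-oversampled codebook (128 codewords).
rng = np.random.default_rng(1)
C = dft_codebook(32, oversampling=4)
h = rng.normal(size=32) + 1j * rng.normal(size=32)  # toy i.i.d. channel
idx = select_codeword(h, C)
h_hat = reconstruct(idx, C)
cos_sim = np.abs(np.vdot(h_hat, h)) / np.linalg.norm(h)  # |h_hat^H h| / ||h||
print(idx, feedback_bits(C), round(float(cos_sim), 3))
```

Only the index (7 bits here) crosses the uplink, so a better codebook raises `cos_sim` without changing the feedback overhead or the UE's per-slot computation. Under MSLCF, the same selection would run over whichever pre-generated codebook matches the current environment; how that match is made is not described in the abstract.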

CovNet: Covariance Information-Assisted CSI Feedback for FDD Massive MIMO Systems

Dec 17, 2024

Jialin Zhuang, Xuan He, Yafei Wang, Jiale Liu, Wenjin Wang

Abstract: In this paper, we propose a novel covariance information-assisted channel state information (CSI) feedback scheme for frequency-division duplex (FDD) massive multi-input multi-output (MIMO) systems. Unlike most existing CSI feedback schemes, which rely on instantaneous CSI only, the proposed CovNet, built primarily from convolutional neural network (CNN) and Transformer architectures, leverages CSI covariance information to achieve high-performance CSI reconstruction. To utilize covariance information efficiently, we propose a covariance information processing procedure and carefully design a covariance information processing network (CIPN) to further process it. Moreover, the feed-forward network (FFN) in CovNet is designed to jointly leverage the 2D characteristics of the CSI matrix in the angle and delay domains. Simulation results demonstrate that the proposed network effectively leverages covariance information and outperforms the state-of-the-art (SOTA) scheme across the full compression ratio (CR) range.

* This work has been submitted to the IEEE for possible publication
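The abstract does not specify CovNet's covariance processing procedure or the CIPN, so the sketch below shows only two generic preprocessing steps it alludes to. The first maps a CSI matrix into the angle and delay domains with unitary DFTs. The second estimates a spatial covariance from past snapshots as side information. The DFT convention, the 16-tap truncation, and the function names are illustrative assumptions, and the toy channels are i.i.d. Gaussian rather than a real propagation model.

```python
import numpy as np


def to_angle_delay(H):
    """Map spatial-frequency CSI H (N_t antennas x N_c subcarriers) to the
    angular-delay domain: unitary DFT over antennas, unitary IDFT over subcarriers."""
    return np.fft.ifft(np.fft.fft(H, axis=0, norm="ortho"), axis=1, norm="ortho")


def truncate_delay(H_ad, n_taps):
    """Keep the first n_taps delay bins, where most multipath energy is assumed to lie."""
    return H_ad[:, :n_taps]


def sample_covariance(H_history):
    """Spatial covariance R = E[h h^H], estimated from past snapshots.

    H_history: (K, N_t, N_c) complex array of K CSI snapshots; every subcarrier
    column is treated as one sample of the N_t-dim channel vector.
    """
    K, n_t, n_c = H_history.shape
    X = H_history.transpose(1, 0, 2).reshape(n_t, K * n_c)
    return X @ X.conj().T / (K * n_c)


# Example with toy i.i.d. Gaussian channels (no real propagation model).
rng = np.random.default_rng(2)
history = rng.normal(size=(10, 32, 64)) + 1j * rng.normal(size=(10, 32, 64))
R = sample_covariance(history)                     # (32, 32) side information
H_now = history[-1]
H_ad = truncate_delay(to_angle_delay(H_now), 16)   # (32, 16) compact input
print(R.shape, H_ad.shape)
```

Because both DFTs use `norm="ortho"`, the transform preserves channel energy, so truncation is the only lossy step. The covariance changes far more slowly than instantaneous CSI, which is what makes it usable as side information for reconstruction.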
