Jia Guo

When Attention is Beneficial for Learning Wireless Resource Allocation Efficiently?

Jul 03, 2025

Jia Guo, Chenyang Yang

Abstract: Owing to the use of the attention mechanism to leverage the dependency across tokens, Transformers are efficient for natural language processing. By harnessing the permutation properties that broadly exist in resource allocation policies, each of which maps measurable environmental parameters (e.g., channel matrix) to optimized variables (e.g., precoding matrix), graph neural networks (GNNs) are promising for learning these policies efficiently in terms of scalability and generalizability. To reap the benefits of both architectures, there is a recent trend of incorporating the attention mechanism into GNNs for learning wireless policies. Nevertheless, is the attention mechanism really needed for resource allocation? In this paper, we strive to answer this question by analyzing the structures of functions defined on sets and of numerical algorithms, given that the permutation properties of wireless policies are induced by the involved sets (say, the user set). In particular, we prove that the permutation equivariant functions on a single set can be recursively expressed by two types of functions: one involves attention, and the other does not. We proceed to re-express the numerical algorithms for optimizing several representative resource allocation problems in recursive forms. We find that when interference (say, multi-user or inter-data stream interference) is not reflected in the measurable parameters of a policy, attention needs to be used to model the interference. With this insight, we establish a framework for designing GNNs that align with these structures. Taking reconfigurable intelligent surface-aided hybrid precoding as an example, the learning efficiency of the proposed GNN is validated via simulations.

* 13 pages, 6 figures
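
The permutation equivariance that the abstract refers to can be illustrated with a small sketch. This is not the paper's recursive construction: the two layers below (`mean_aggregate_layer` and `attention_layer`) and their coefficients are hypothetical examples, chosen only to show one equivariant map without attention, one with attention, and a check that reordering the set elements reorders the outputs in the same way.

```python
import math

def mean_aggregate_layer(x, a=0.5, b=2.0):
    """Equivariant map without attention: own input mixed with the set mean.
    The coefficients a and b are arbitrary illustrative values."""
    m = sum(x) / len(x)
    return [a * xi + b * m for xi in x]

def attention_layer(x):
    """Equivariant map with attention: softmax-weighted sum over the set."""
    out = []
    for xi in x:
        scores = [xi * xj for xj in x]          # pairwise scores
        top = max(scores)                       # subtract max for stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(sum(w * xj for w, xj in zip(weights, x)) / total)
    return out

def position_dependent_layer(x):
    """Counter-example: the output depends on the index, so it is not equivariant."""
    return [i * xi for i, xi in enumerate(x)]

def is_permutation_equivariant(f, x, perm, tol=1e-9):
    """Check f(permute(x)) == permute(f(x)) for one given permutation."""
    lhs = f([x[p] for p in perm])
    fx = f(x)
    rhs = [fx[p] for p in perm]
    return all(abs(l - r) <= tol for l, r in zip(lhs, rhs))

x = [0.3, -1.2, 2.0, 0.7]      # toy set elements, e.g. one scalar per user
perm = [2, 0, 3, 1]            # an arbitrary reordering of the users

for layer in (mean_aggregate_layer, attention_layer, position_dependent_layer):
    print(layer.__name__, is_permutation_equivariant(layer, x, perm))
```

Both set-based layers pass the check for this permutation, while the index-dependent layer does not; whether attention is actually needed for a given policy is the question the paper analyzes.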

Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs

Jun 18, 2025

Ling Team, Bin Hu, Cai Chen, Deng Zhao, Ding Liu, Dingnan Jin, Feng Zhu, Hao Dai, Hongzhi Luan, Jia Guo(+36 more)

Abstract: We present Ring-lite, a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL) to achieve efficient and robust reasoning capabilities. Built upon the publicly available Ling-lite model, a 16.8 billion parameter model with 2.75 billion activated parameters, our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks (e.g., AIME, LiveCodeBench, GPQA-Diamond) while activating only one-third of the parameters required by comparable models. To accomplish this, we introduce a joint training pipeline integrating distillation with RL, revealing undocumented challenges in MoE RL training. First, we identify optimization instability during RL training, and we propose Constrained Contextual Computation Policy Optimization (C3PO), a novel approach that enhances training stability and improves computational throughput via an algorithm-system co-design methodology. Second, we empirically demonstrate that selecting distillation checkpoints based on entropy loss for RL training, rather than validation metrics, yields superior performance-efficiency trade-offs in subsequent RL training. Finally, we develop a two-stage training paradigm to harmonize multi-domain data integration, addressing domain conflicts that arise in training with mixed datasets. We will release the model, dataset, and code.

* Technical Report

Search is All You Need for Few-shot Anomaly Detection

Apr 16, 2025

Qishan Wang, Jia Guo, Shuyong Gao, Haofen Wang, Li Xiong, Junjie Hu, Hanqi Guo, Wenqiang Zhang

Abstract: Few-shot anomaly detection (FSAD) has emerged as a crucial yet challenging task in industrial inspection, where normal distribution modeling must be accomplished with only a few normal images. While existing approaches typically employ multi-modal foundation models combining language and vision modalities for prompt-guided anomaly detection, these methods often demand sophisticated prompt engineering and extensive manual tuning. In this paper, we demonstrate that a straightforward nearest-neighbor search framework can surpass state-of-the-art performance in both single-class and multi-class FSAD scenarios. Our proposed method, VisionAD, consists of four simple yet essential components: (1) scalable vision foundation models that extract universal and discriminative features; (2) dual augmentation strategies: support augmentation to enhance feature matching adaptability and query augmentation to address the oversights of single-view prediction; (3) multi-layer feature integration that captures both low-frequency global context and high-frequency local details with minimal computational overhead; and (4) a class-aware visual memory bank enabling efficient one-for-all multi-class detection. Extensive evaluations across the MVTec-AD, VisA, and Real-IAD benchmarks demonstrate VisionAD's exceptional performance. Using only one normal image as support, our method achieves image-level AUROC scores of 97.4%, 94.8%, and 70.8%, respectively, outperforming current state-of-the-art approaches by significant margins (+1.6%, +3.2%, and +1.4%). The training-free nature and superior few-shot capabilities of VisionAD make it particularly appealing for real-world applications where samples are scarce or expensive to obtain. Code is available at https://github.com/Qiqigeww/VisionAD.
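
The nearest-neighbor search idea in the abstract can be sketched in a few lines. This is a generic memory-bank illustration and not the VisionAD implementation: the function names, the toy 2-D features, and the choice of taking the maximum patch score as the image-level score are all assumptions, and the augmentation, multi-layer integration, and class-aware components are left out.

```python
import math

def build_memory_bank(normal_images):
    """Collect patch features from the few normal support images into one bank."""
    return [patch for image in normal_images for patch in image]

def patch_scores(query_image, bank):
    """Score each query patch by the distance to its nearest normal patch."""
    return [min(math.dist(patch, ref) for ref in bank) for patch in query_image]

def image_score(query_image, bank):
    """Image-level anomaly score, assumed here to be the largest patch score."""
    return max(patch_scores(query_image, bank))

# Toy 2-D "features"; a real pipeline would extract them with a vision backbone.
support = [[(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]]   # one normal image, three patches
bank = build_memory_bank(support)

normal_query = [(0.1, 0.0), (0.9, 0.1)]            # patches close to the bank
defect_query = [(0.1, 0.0), (3.0, 4.0)]            # one patch far from the bank

print("normal:", image_score(normal_query, bank))
print("defect:", image_score(defect_query, bank))
```

No training step appears anywhere in the sketch: the only state is the bank of normal features, so a query with a patch far from every stored feature receives a high score.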

Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models

Apr 09, 2025

Ling Team, Caizhi Tang, Chilin Fu, Chunwei Wu, Jia Guo, Jianwen Wang, Jingyu Hu, Liang Jiang, Meng Li, Peng Jiao(+8 more)

Abstract: This technical report presents Ring-Lite-Distill, a lightweight reasoning model derived from our open-source Mixture-of-Experts (MoE) Large Language Model (LLM), Ling-Lite. This study demonstrates that through meticulous high-quality data curation and ingenious training paradigms, the compact MoE model Ling-Lite can be further trained to achieve exceptional reasoning capabilities, while maintaining its parameter-efficient architecture with only 2.75 billion activated parameters, establishing an efficient lightweight reasoning architecture. In particular, in constructing this model, we have not merely focused on enhancing advanced reasoning capabilities, exemplified by high-difficulty mathematical problem solving, but rather aimed to develop a reasoning model with more comprehensive competency coverage. Our approach ensures coverage across reasoning tasks of varying difficulty levels while preserving generic capabilities, such as instruction following, tool use, and knowledge retention. We show that Ring-Lite-Distill's reasoning ability reaches a level comparable to DeepSeek-R1-Distill-Qwen-7B, while its general capabilities significantly surpass those of DeepSeek-R1-Distill-Qwen-7B. The models are accessible at https://huggingface.co/inclusionAI

* 10 pages

Learning Precoding in Multi-user Multi-antenna Systems: Transformer or Graph Transformer?

Mar 04, 2025

Yuxuan Duan, Jia Guo, Chenyang Yang

Abstract: Transformers have been designed for channel acquisition tasks such as channel prediction and other tasks such as precoding, while graph neural networks (GNNs) have been demonstrated to be efficient for learning a multitude of communication tasks. Nonetheless, whether Transformers are efficient for tasks other than channel acquisition, and how to reap the benefits of both architectures, are less understood. In this paper, we take learning precoding policies in multi-user multi-antenna systems as an example to answer these questions. We notice that a Transformer tailored for precoding can reflect multi-user interference, which is essential for its generalizability to the number of users. Yet the tailored Transformer can leverage only part of the permutation property of precoding policies and hence is not generalizable to the number of antennas, the same as a GNN learning over a homogeneous graph. To provide useful insight, we establish the relation between Transformers and the GNNs that learn over heterogeneous graphs. Based on the relation, we propose Graph Transformers, namely 2D- and 3D-Gformers, for exploiting the permutation properties of baseband precoding and hybrid precoding policies. The learning performance, inference and training complexity, and size-generalizability of the Gformers are evaluated and compared with Transformers and GNNs via simulations.

* 13 pages, 9 figures

Diffusion Model for Multiple Antenna Communications

Feb 03, 2025

Jia Guo, Xiaoxia Xu, Yuanwei Liu, Arumugam Nallanathan

Abstract: The potential of applying diffusion models (DMs) to multiple antenna communications is discussed. A unified framework for applying DMs to multiple antenna tasks is first proposed. Then, the tasks are divided into two categories, i.e., decision-making tasks and generation tasks, depending on whether an optimization of system parameters is involved. For each category, it is discussed 1) how the framework can be used for each task and 2) why the DM is superior to traditional artificial intelligence (TAI) and conventional optimization methods. It is highlighted that DMs are well-suited for scenarios with strong interference and noise, excelling in modeling complex data distributions and exploring better actions. A case study of learning beamforming with a DM is then provided to demonstrate the superiority of DMs with simulation results. Finally, the applications of DMs to emerging multiple antenna technologies and promising research directions are discussed.

* 7 pages, 4 figures

GPASS: Deep Learning for Beamforming in Pinching-Antenna Systems (PASS)

Feb 03, 2025

Jia Guo, Yuanwei Liu, Arumugam Nallanathan

Abstract: A novel GPASS architecture is proposed for jointly learning pinching beamforming and transmit beamforming in pinching antenna systems (PASS). GPASS has a staged architecture, where the positions of pinching antennas are first learned by a sub-GNN. Then, the transmit beamforming is learned by another sub-GNN based on the antenna positions. The sub-GNNs incorporate the permutation property of the beamforming policy, which helps improve the learning performance. The optimal solution structure of transmit beamforming is also leveraged to simplify the mappings to be learned. Numerical results demonstrate that the proposed architecture can achieve a higher spectral efficiency (SE) than a heuristic baseline method with low inference complexity.

* 5 pages, 3 figures

SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages

Dec 02, 2024

Jia Guo, Longxu Dou, Guangtao Zeng, Stanley Kok, Wei Lu, Qian Liu

Abstract: In this paper, we introduce SailCompass, a reproducible and robust evaluation benchmark for assessing Large Language Models (LLMs) on Southeast Asian (SEA) languages. SailCompass encompasses three main SEA languages and eight primary tasks, comprising 14 datasets that cover three task types (generation, multiple-choice questions, and classification). To improve the robustness of the evaluation approach, we explore different prompt configurations for multiple-choice questions and leverage calibrations to improve the faithfulness of classification tasks. With SailCompass, we derive the following findings: (1) SEA-specialized LLMs still outperform general LLMs, although the gap has narrowed; (2) a balanced language distribution is important for developing better SEA-specialized LLMs; (3) advanced prompting techniques (e.g., calibration, perplexity-based ranking) are necessary to better utilize LLMs. All datasets and evaluation scripts are public.

* code: https://github.com/sail-sg/sailcompass

Enhancing Brain Age Estimation with a Multimodal 3D CNN Approach Combining Structural MRI and AI-Synthesized Cerebral Blood Volume Data

Dec 01, 2024

Jordan Jomsky, Zongyu Li, Yiren Zhang, Jia Guo

Abstract: The growing global aging population necessitates enhanced methods for assessing brain aging and related neurodegenerative changes. Brain Age Gap Estimation (BrainAGE) offers a neuroimaging biomarker for understanding these changes by predicting brain age from MRI scans. Current approaches primarily use T1-weighted magnetic resonance imaging (T1w MRI) data, capturing only structural brain information. To address the lack of functional data, we integrated AI-generated Cerebral Blood Volume (AICBV) with T1w MRI, combining both structural and functional metrics. We developed a deep learning model using a VGG-based architecture to predict brain age. Our model achieved a mean absolute error (MAE) of 3.95 years and \(R^2 = 0.94\) on the test set (\(n = 288\)), outperforming existing models trained on similar data. We further created gradient-based class activation maps (Grad-CAM) to visualize the regions of the brain that most influenced the model's predictions, providing interpretable insights into the structural and functional contributors to brain aging.

* 10 pages, 5 figures
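
For readers unfamiliar with the two metrics quoted in the abstract, the following sketch shows how MAE, \(R^2\), and the brain age gap are commonly computed. The ages and predictions are made-up toy values, not data from the paper, and the abstract does not state which definition of \(R^2\) was used; the coefficient of determination is assumed here.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between predicted and true age, in years."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def brain_age_gaps(y_true, y_pred):
    """Brain age gap per subject: predicted brain age minus chronological age."""
    return [p - t for t, p in zip(y_true, y_pred)]

# Hypothetical toy values for four subjects (years).
chronological_age = [60.0, 70.0, 80.0, 50.0]
predicted_age = [62.0, 67.0, 81.0, 55.0]

print("MAE:", mean_absolute_error(chronological_age, predicted_age))
print("R^2:", r_squared(chronological_age, predicted_age))
print("gaps:", brain_age_gaps(chronological_age, predicted_age))
```

A positive gap means the model judges the brain to look older than the subject's chronological age.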

Deep Learning for Beamforming in Multi-User Continuous Aperture Array (CAPA) Systems

Nov 14, 2024

Jia Guo, Yuanwei Liu, Hyundong Shin, Arumugam Nallanathan

Abstract: A DeepCAPA (Deep Learning for Continuous Aperture Array (CAPA)) framework is proposed to learn beamforming in CAPA systems. The beamforming optimization problem is first formulated, and it is mathematically proved that the optimal beamforming lies in the subspace spanned by the users' conjugate channel responses. Two challenges are encountered when directly applying deep neural networks (DNNs) to solve the formulated problem. i) Both the input and output spaces are infinite-dimensional, which is not compatible with DNNs. Finite-dimensional representations of the inputs and outputs are derived to address this challenge. ii) A closed-form loss function is unavailable for training the DNN. To tackle this challenge, two additional DNNs are trained to approximate the operations without closed-form expressions for expediting gradient back-propagation. To improve learning performance and reduce training complexity, the permutation equivariance properties of the mappings to be learned are mathematically proved. As a further advance, the DNNs are designed as graph neural networks to leverage these properties. Numerical results demonstrate that: i) the proposed DeepCAPA framework achieves higher spectral efficiency and lower inference complexity compared to matched filtering and the state-of-the-art Fourier-based discretization method, and ii) DeepCAPA approaches the performance upper bound of optimizing beamforming in the spatially discrete array-based system as the number of antennas in a fixed-sized area tends toward infinity.

* 13 pages, 11 figures. arXiv admin note: text overlap with arXiv:2408.11230
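
The subspace statement in the abstract can be written out as follows. The notation is an assumption made for illustration and is not taken from the paper: \(K\) users, \(h_j(\mathbf{r})\) for the channel response of user \(j\) at position \(\mathbf{r}\) on the aperture, \(v_k(\mathbf{r})\) for the beamformer of user \(k\), and \(b_{jk}\) for complex coefficients.

```latex
% Sketch under assumed notation: the continuous beamformer of user k is a
% linear combination of the K users' conjugate channel responses.
v_k(\mathbf{r}) \;=\; \sum_{j=1}^{K} b_{jk}\, h_j^{*}(\mathbf{r}),
\qquad k = 1, \dots, K .
```

Under this form, an infinite-dimensional function \(v_k(\mathbf{r})\) is described by the \(K\) coefficients \(b_{1k}, \dots, b_{Kk}\), which is one way to read the abstract's remark that finite-dimensional representations of the outputs can be derived.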
