Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mingjie Shao

LLM Agent for Hyper-Parameter Optimization

Jun 18, 2025

Wanzhe Wang, Jianqiu Peng, Menghao Hu, Weihuang Zhong, Tong Zhang, Shuai Wang, Yixin Zhang, Mingjie Shao, Wanli Ni

Figure 1 for LLM Agent for Hyper-Parameter Optimization

Figure 2 for LLM Agent for Hyper-Parameter Optimization

Figure 3 for LLM Agent for Hyper-Parameter Optimization

Figure 4 for LLM Agent for Hyper-Parameter Optimization

Abstract:Hyper-parameters are essential and critical for the performance of communication algorithms. However, current hyper-parameters tuning methods for warm-start particles swarm optimization with cross and mutation (WS-PSO-CM) algortihm for radio map-enabled unmanned aerial vehicle (UAV) trajectory and communication are primarily heuristic-based, exhibiting low levels of automation and unsatisfactory performance. In this paper, we design an large language model (LLM) agent for automatic hyper-parameters-tuning, where an iterative framework and model context protocol (MCP) are applied. In particular, the LLM agent is first setup via a profile, which specifies the mission, background, and output format. Then, the LLM agent is driven by the prompt requirement, and iteratively invokes WS-PSO-CM algorithm for exploration. Finally, the LLM agent autonomously terminates the loop and returns a set of hyper-parameters. Our experiment results show that the minimal sum-rate achieved by hyper-parameters generated via our LLM agent is significantly higher than those by both human heuristics and random generation methods. This indicates that an LLM agent with PSO knowledge and WS-PSO-CM algorithm background is useful in finding high-performance hyper-parameters.

* 6 pages, 6 figures

Via

Access Paper or Ask Questions

Communication-Efficient Federated Learning by Quantized Variance Reduction for Heterogeneous Wireless Edge Networks

Jan 20, 2025

Shuai Wang, Yanqing Xu, Chaoqun You, Mingjie Shao, Tony Q. S. Quek

Figure 1 for Communication-Efficient Federated Learning by Quantized Variance Reduction for Heterogeneous Wireless Edge Networks

Figure 2 for Communication-Efficient Federated Learning by Quantized Variance Reduction for Heterogeneous Wireless Edge Networks

Figure 3 for Communication-Efficient Federated Learning by Quantized Variance Reduction for Heterogeneous Wireless Edge Networks

Figure 4 for Communication-Efficient Federated Learning by Quantized Variance Reduction for Heterogeneous Wireless Edge Networks

Abstract:Federated learning (FL) has been recognized as a viable solution for local-privacy-aware collaborative model training in wireless edge networks, but its practical deployment is hindered by the high communication overhead caused by frequent and costly server-device synchronization. Notably, most existing communication-efficient FL algorithms fail to reduce the significant inter-device variance resulting from the prevalent issue of device heterogeneity. This variance severely decelerates algorithm convergence, increasing communication overhead and making it more challenging to achieve a well-performed model. In this paper, we propose a novel communication-efficient FL algorithm, named FedQVR, which relies on a sophisticated variance-reduced scheme to achieve heterogeneity-robustness in the presence of quantized transmission and heterogeneous local updates among active edge devices. Comprehensive theoretical analysis justifies that FedQVR is inherently resilient to device heterogeneity and has a comparable convergence rate even with a small number of quantization bits, yielding significant communication savings. Besides, considering non-ideal wireless channels, we propose FedQVR-E which enhances the convergence of FedQVR by performing joint allocation of bandwidth and quantization bits across devices under constrained transmission delays. Extensive experimental results are also presented to demonstrate the superior performance of the proposed algorithms over their counterparts in terms of both communication efficiency and application performance.

Via

Access Paper or Ask Questions

Downlink MIMO Channel Estimation from Bits: Recoverability and Algorithm

Nov 25, 2024

Rajesh Shrestha, Mingjie Shao, Mingyi Hong, Wing-Kin Ma, Xiao Fu

Figure 1 for Downlink MIMO Channel Estimation from Bits: Recoverability and Algorithm

Figure 2 for Downlink MIMO Channel Estimation from Bits: Recoverability and Algorithm

Figure 3 for Downlink MIMO Channel Estimation from Bits: Recoverability and Algorithm

Figure 4 for Downlink MIMO Channel Estimation from Bits: Recoverability and Algorithm

Abstract:In frequency division duplex (FDD) massive MIMO systems, a major challenge lies in acquiring the downlink channel state information}\ (CSI) at the base station (BS) from limited feedback sent by the user equipment (UE). To tackle this fundamental task, our contribution is twofold: First, a simple feedback framework is proposed, where a compression and Gaussian dithering-based quantization strategy is adopted at the UE side, and then a maximum likelihood estimator (MLE) is formulated at the BS side. Recoverability of the MIMO channel under the widely used double directional model is established. Specifically, analyses are presented for two compression schemes -- showing one being more overhead-economical and the other computationally lighter at the UE side. Second, to realize the MLE, an alternating direction method of multipliers (ADMM) algorithm is proposed. The algorithm is carefully designed to integrate a sophisticated harmonic retrieval (HR) solver as subroutine, which turns out to be the key of effectively tackling this hard MLE problem.Extensive numerical experiments are conducted to validate the efficacy of our approach.

Via

Access Paper or Ask Questions

Language-Queried Target Sound Extraction Without Parallel Training Data

Sep 14, 2024

Hao Ma, Zhiyuan Peng, Xu Li, Yukai Li, Mingjie Shao, Qiuqiang Kong, Ju Liu

Figure 1 for Language-Queried Target Sound Extraction Without Parallel Training Data

Figure 2 for Language-Queried Target Sound Extraction Without Parallel Training Data

Figure 3 for Language-Queried Target Sound Extraction Without Parallel Training Data

Figure 4 for Language-Queried Target Sound Extraction Without Parallel Training Data

Abstract:Language-queried target sound extraction (TSE) aims to extract specific sounds from mixtures based on language queries. Traditional fully-supervised training schemes require extensively annotated parallel audio-text data, which are labor-intensive. We introduce a language-free training scheme, requiring only unlabelled audio clips for TSE model training by utilizing the multi-modal representation alignment nature of the contrastive language-audio pre-trained model (CLAP). In a vanilla language-free training stage, target audio is encoded using the pre-trained CLAP audio encoder to form a condition embedding for the TSE model, while during inference, user language queries are encoded by CLAP text encoder. This straightforward approach faces challenges due to the modality gap between training and inference queries and information leakage from direct exposure to target audio during training. To address this, we propose a retrieval-augmented strategy. Specifically, we create an embedding cache using audio captions generated by a large language model (LLM). During training, target audio embeddings retrieve text embeddings from this cache to use as condition embeddings, ensuring consistent modalities between training and inference and eliminating information leakage. Extensive experiment results show that our retrieval-augmented approach achieves consistent and notable performance improvements over existing state-of-the-art with better generalizability.

* Submitted to ICASSP 2025

Via

Access Paper or Ask Questions

One-Bit MIMO Detection: From Global Maximum-Likelihood Detector to Amplitude Retrieval Approach

Jul 13, 2024

Mingjie Shao, Wei-Kun Chen, Cheng-Yang Yu, Ya-Feng Liu, Wing-Kin Ma

Abstract:As communication systems advance towards the future 6G era, the incorporation of large-scale antenna arrays in base stations (BSs) presents challenges such as increased hardware costs and energy consumption. To address these issues, the use of one-bit analog-to-digital converters (ADCs)/digital-to-analog converters (DACs) has gained significant attentions. This paper focuses on one-bit multiple-input multiple-output (MIMO) detection in an uplink multiuser transmission scenario where the BS employs one-bit ADCs. One-bit quantization retains only the sign information and loses the amplitude information, which poses a unique challenge in the corresponding detection problem. The maximum-likelihood (ML) formulation of one-bit MIMO detection has a challenging likelihood function that hinders the application of many high-performance detectors developed for classic MIMO detection (under high-resolution ADCs). While many approximate methods for the ML detection problem have been studied, it lacks an efficient global algorithm. This paper fills this gap by proposing an efficient branch-and-bound algorithm, which is guaranteed to find the global solution of the one-bit ML MIMO detection problem. Additionally, a new amplitude retrieval (AR) detection approach is developed, incorporating explicit amplitude variables into the problem formulation. The AR approach yields simpler objective functions that enable the development of efficient algorithms offering both global and approximate solutions. The paper also contributes to the computational complexity analysis of both ML and AR detection problems. Extensive simulations are conducted to demonstrate the effectiveness and efficiency of the proposed formulations and algorithms.

Via

Access Paper or Ask Questions

An Efficient Algorithm for Sum-Rate Maximization in Fluid Antenna-Assisted ISAC System

May 10, 2024

Qian Zhang, Mingjie Shao, Tong Zhang, Gaojie Chen, Ju Liu

Abstract:In this letter, we investigate the fluid antenna (FA)-assisted integrated sensing and communication (ISAC) system, where communication and radar sensing employ the co-waveform design. Specifically, we focus on the beamformer design and antenna position configuration to realize a higher communication rate while guaranteeing the minimum radar probing power. Different from existing beamformer algorithms, we propose an efficient proximal distance algorithm (PDA) to solve the multiuser sum-rate maximization problem with radar sensing constraint to obtain the closed-form beamforming vector. In addition, we develop an extrapolated projected gradient (EPG) algorithm to obtain a better antenna location configuration for FA to enhance the ISAC performance. Numerical results show that the considered FA-assisted ISAC system enjoys a higher sum-rate by the proposed algorithm, compared with that in existing non-FA ISAC systems.

Via

Access Paper or Ask Questions

Extreme Point Pursuit -- Part I: A Framework for Constant Modulus Optimization

Mar 11, 2024

Junbin Liu, Ya Liu, Wing-Kin Ma, Mingjie Shao, Anthony Man-Cho So

Figure 1 for Extreme Point Pursuit -- Part I: A Framework for Constant Modulus Optimization

Figure 2 for Extreme Point Pursuit -- Part I: A Framework for Constant Modulus Optimization

Abstract:This study develops a framework for a class of constant modulus (CM) optimization problems, which covers binary constraints, discrete phase constraints, semi-orthogonal matrix constraints, non-negative semi-orthogonal matrix constraints, and several types of binary assignment constraints. Capitalizing on the basic principles of concave minimization and error bounds, we study a convex-constrained penalized formulation for general CM problems. The advantage of such formulation is that it allows us to leverage non-convex optimization techniques, such as the simple projected gradient method, to build algorithms. As the first part of this study, we explore the theory of this framework. We study conditions under which the formulation provides exact penalization results. We also examine computational aspects relating to the use of the projected gradient method for each type of CM constraint. Our study suggests that the proposed framework has a broad scope of applicability.

Via

Access Paper or Ask Questions

Extreme Point Pursuit -- Part II: Further Error Bound Analysis and Applications

Mar 11, 2024

Junbin Liu, Ya Liu, Wing-Kin Ma, Mingjie Shao, Anthony Man-Cho So

Figure 1 for Extreme Point Pursuit -- Part II: Further Error Bound Analysis and Applications

Figure 2 for Extreme Point Pursuit -- Part II: Further Error Bound Analysis and Applications

Figure 3 for Extreme Point Pursuit -- Part II: Further Error Bound Analysis and Applications

Figure 4 for Extreme Point Pursuit -- Part II: Further Error Bound Analysis and Applications

Abstract:In the first part of this study, a convex-constrained penalized formulation was studied for a class of constant modulus (CM) problems. In particular, the error bound techniques were shown to play a vital role in providing exact penalization results. In this second part of the study, we continue our error bound analysis for the cases of partial permutation matrices, size-constrained assignment matrices and non-negative semi-orthogonal matrices. We develop new error bounds and penalized formulations for these three cases, and the new formulations possess good structures for building computationally efficient algorithms. Moreover, we provide numerical results to demonstrate our framework in a variety of applications such as the densest k-subgraph problem, graph matching, size-constrained clustering, non-negative orthogonal matrix factorization and sparse fair principal component analysis.

Via

Access Paper or Ask Questions

CLAPSep: Leveraging Contrastive Pre-trained Models for Multi-Modal Query-Conditioned Target Sound Extraction

Feb 27, 2024

Hao Ma, Zhiyuan Peng, Mingjie Shao, Ju Liu, Xu Li, Xixin Wu

Figure 1 for CLAPSep: Leveraging Contrastive Pre-trained Models for Multi-Modal Query-Conditioned Target Sound Extraction

Figure 2 for CLAPSep: Leveraging Contrastive Pre-trained Models for Multi-Modal Query-Conditioned Target Sound Extraction

Figure 3 for CLAPSep: Leveraging Contrastive Pre-trained Models for Multi-Modal Query-Conditioned Target Sound Extraction

Figure 4 for CLAPSep: Leveraging Contrastive Pre-trained Models for Multi-Modal Query-Conditioned Target Sound Extraction

Abstract:Universal sound separation (USS) aims to extract arbitrary types of sounds from real-world sound recordings. Language-queried target sound extraction (TSE) is an effective approach to achieving USS. Such systems consist of two components: a query network that converts user queries into conditional embeddings, and a separation network that extracts the target sound based on conditional embeddings. Existing methods mainly suffer from two issues: firstly, they require training a randomly initialized model from scratch, lacking the utilization of pre-trained models, and substantial data and computational resources are needed to ensure model convergence; secondly, existing methods need to jointly train a query network and a separation network, which tends to lead to overfitting. To address these issues, we build the CLAPSep model based on contrastive language-audio pre-trained model (CLAP). We achieve this by using a pre-trained text encoder of CLAP as the query network and introducing pre-trained audio encoder weights of CLAP into the separation network to fully utilize the prior knowledge embedded in the pre-trained model to assist in target sound extraction tasks. Extensive experimental results demonstrate that the proposed method saves training resources while ensuring the model's performance and generalizability. Additionally, we explore the model's ability to comprehensively utilize language/audio multi-modal and positive/negative multi-valent user queries, enhancing system performance while providing diversified application modes.

Via

Access Paper or Ask Questions

Extending Whisper with prompt tuning to target-speaker ASR

Dec 13, 2023

Hao Ma, Zhiyuan Peng, Mingjie Shao, Jing Li, Ju Liu

Figure 1 for Extending Whisper with prompt tuning to target-speaker ASR

Figure 2 for Extending Whisper with prompt tuning to target-speaker ASR

Figure 3 for Extending Whisper with prompt tuning to target-speaker ASR

Figure 4 for Extending Whisper with prompt tuning to target-speaker ASR

Abstract:Target-speaker automatic speech recognition (ASR) aims to transcribe the desired speech of a target speaker from multi-talker overlapped utterances. Most of the existing target-speaker ASR (TS-ASR) methods involve either training from scratch or fully fine-tuning a pre-trained model, leading to significant training costs and becoming inapplicable to large foundation models. This work leverages prompt tuning, a parameter-efficient fine-tuning approach, to extend Whisper, a large-scale single-talker ASR model, to TS-ASR. Experimental results show that prompt tuning can achieve performance comparable to state-of-the-art full fine-tuning approaches while only requiring about 1% of task-specific model parameters. Notably, the original Whisper's features, such as inverse text normalization and timestamp prediction, are retained in target-speaker ASR, keeping the generated transcriptions natural and informative.

* ICASSP 2024

Via

Access Paper or Ask Questions