School of Computer Science, Tianjin University
Abstract:The development of Large AI Models (LAMs) for wireless communications, particularly for complex tasks like spectrum sensing, is critically dependent on the availability of vast, diverse, and realistic datasets. Addressing this need, this paper introduces the ChangShuoRadioData (CSRD) framework, an open-source, modular simulation platform designed for generating large-scale synthetic radio frequency (RF) data. CSRD simulates the end-to-end transmission and reception process, incorporating an extensive range of modulation schemes (100 types, including analog, digital, OFDM, and OTFS), configurable channel models featuring both statistical fading and site-specific ray tracing using OpenStreetMap data, and detailed modeling of realistic RF front-end impairments for various antenna configurations (SISO/MISO/MIMO). Using this framework, we characterize CSRD2025, a substantial dataset benchmark comprising over 25,000,000 frames (approx. 200TB), which is approximately 10,000 times larger than the widely used RML2018 dataset. CSRD2025 offers unprecedented signal diversity and complexity, specifically engineered to bridge the Sim2Real gap. Furthermore, we provide processing pipelines to convert IQ data into spectrograms annotated in COCO format, facilitating object detection approaches for time-frequency signal analysis. The dataset specification includes standardized 8:1:1 training, validation, and test splits (via frame indices) to ensure reproducible research. The CSRD framework is released at https://github.com/Singingkettle/ChangShuoRadioData to accelerate the advancement of AI-driven spectrum sensing and management.
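The 8:1:1 split over frame indices mentioned above can be illustrated with a minimal sketch. The abstract does not say how CSRD2025 assigns indices to each split, so the shuffling, the fixed seed, and the function name `split_frame_indices` are assumptions made only for illustration.

```python
import random


def split_frame_indices(num_frames, ratios=(8, 1, 1), seed=0):
    """Shuffle frame indices reproducibly and cut them into train/val/test.

    Hypothetical helper: the ratios follow the 8:1:1 convention from the
    abstract, but the shuffling and seed are assumptions.
    """
    indices = list(range(num_frames))
    random.Random(seed).shuffle(indices)  # fixed seed gives the same split on every run
    total = sum(ratios)
    n_train = num_frames * ratios[0] // total
    n_val = num_frames * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # any rounding remainder lands in the test split
    return train, val, test


train, val, test = split_frame_indices(1000)
print(len(train), len(val), len(test))  # 800 100 100
```

Storing the split as index lists rather than copying frames keeps the split cheap to distribute, which matters for a dataset of this size.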
Abstract:Integrated sensing and communication (ISAC) is a promising candidate technology for 6G because it improves spectral efficiency and energy efficiency. The orthogonal frequency division multiplexing (OFDM) signal is a mainstream candidate ISAC waveform. However, OFDM signals suffer from inter-symbol interference (ISI) and inter-carrier interference (ICI) when the round-trip delay exceeds the cyclic prefix (CP) duration, which limits the maximum sensing range of the ISAC system. When detecting a long-range target, the wide beam inevitably covers the close-range target, whose echo power is much larger than that of the long-range target. To tackle these problems, a multiple signal classification (MUSIC) and least squares (LS)-based spatial signal separation method is proposed to separate the echo signals reflected from different targets. Moreover, a coherent compensation-based sensing signal processing method at the receiver is proposed to enhance the signal-to-interference-plus-noise ratio (SINR) of the OFDM block, so that the range-Doppler map (RDM) is generated with higher SINR. Simulation results reveal that the proposed method enhances the SINR of the RDM by 10 dB for a target at 500 m compared with the two-dimensional fast Fourier transform (2D-FFT) method. In addition, the detection probability is significantly improved compared with the benchmark method.
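The range limit imposed by the CP follows from the round-trip delay 2R/c having to fit inside the CP duration. The sketch below works through that arithmetic; the CP duration of 2 microseconds is a hypothetical value chosen for illustration and is not taken from the paper.

```python
C = 299_792_458.0  # speed of light in m/s


def max_isi_free_range(cp_duration_s):
    """Largest target range whose round-trip delay still fits inside the CP."""
    return C * cp_duration_s / 2.0


def exceeds_cp(target_range_m, cp_duration_s):
    """True when the echo's round-trip delay is longer than the CP."""
    round_trip_delay_s = 2.0 * target_range_m / C
    return round_trip_delay_s > cp_duration_s


cp = 2e-6  # hypothetical CP duration in seconds (assumption)
print(max_isi_free_range(cp))   # about 299.8 m
print(exceeds_cp(500.0, cp))    # True: a 500 m target falls outside this CP
print(exceeds_cp(100.0, cp))    # False
```

With this assumed CP, an echo from 500 m arrives after the CP has ended, which is the regime where ISI and ICI appear.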
Abstract:Two subspace fitting approaches are proposed for wideband near-field localization. Unlike in conventional far-field systems, where distance and angle can be estimated separately, spherical wave propagation in near-field systems couples these parameters. We therefore derive a frequency-domain near-field signal model for multi-target wideband systems and develop a subspace fitting-based MUSIC method that jointly estimates distance and angle. To reduce complexity, a Fresnel approximation MUSIC algorithm is further introduced to decouple the distance and angle parameters. Numerical results verify the effectiveness of both proposed approaches.
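To see why a Fresnel approximation helps decouple the parameters, consider the textbook geometry of a linear array: the second-order expansion of the spherical-wave distance has a term linear in the element offset that depends only on angle, and a quadratic term that carries the distance. The sketch below compares the exact and approximated distances numerically. The geometry (angle measured from broadside, element at offset `x` on the array axis) and the numbers are assumptions, since the abstract does not give the paper's signal model.

```python
import math


def exact_distance(r, theta, x):
    """Exact distance from a source at range r and angle theta (from broadside)
    to an array element located at offset x along the array axis."""
    return math.sqrt(r * r + x * x - 2.0 * r * x * math.sin(theta))


def fresnel_distance(r, theta, x):
    """Second-order (Fresnel) expansion of the same distance.

    The term linear in x depends only on the angle; the term quadratic in x
    is the only one that involves the range r.
    """
    return r - x * math.sin(theta) + (x * math.cos(theta)) ** 2 / (2.0 * r)


r, theta = 10.0, math.radians(30.0)  # hypothetical source position
for x in (0.0, 0.25, 0.5):           # hypothetical element offsets in metres
    err = abs(exact_distance(r, theta, x) - fresnel_distance(r, theta, x))
    print(f"x={x:.2f} m  approximation error={err:.2e} m")
```

The error grows with the cube of the element offset, so the approximation is most accurate for elements close to the array reference point.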
Abstract:As privacy protection gains increasing importance, more models are being trained on edge devices and subsequently merged into the central server through Federated Learning (FL). However, current research overlooks the impact of network topology, physical distance, and data heterogeneity on edge devices, leading to issues such as increased latency and degraded model performance. To address these issues, we propose a new federated learning scheme on edge devices called Federated Learning with Encrypted Data Sharing (FedEDS). FedEDS uses the client model and the model's stochastic layer to train the data encryptor. The data encryptor generates encrypted data and shares it with other clients. The client uses the corresponding client's stochastic layer and encrypted data to train and adjust the local model. FedEDS uses the client's local private data and encrypted shared data from other clients to train the model. This approach accelerates the convergence speed of federated learning training and mitigates the negative impact of data heterogeneity, making it suitable for application services deployed on edge devices requiring rapid convergence. Experimental results show the efficacy of FedEDS in promoting model performance.
Abstract:Integrated sensing and communication (ISAC) has gained traction in academia and industry. Recently, multipath components (MPCs), as a type of spatial resource, have the potential to improve the sensing performance in ISAC systems, especially in richly scattering environments. In this paper, we propose to leverage MPC and Khatri-Rao space-time (KRST) code within a single ISAC system to realize high-accuracy sensing for multiple dynamic targets and multi-user communication. Specifically, we propose a novel MPC-enhanced sensing processing scheme with symbol-level fusion, referred to as the "SL-MPS" scheme, to achieve high-accuracy localization of multiple dynamic targets and empower the single ISAC system with a new capability of absolute velocity estimation for multiple targets with a single sensing attempt. Furthermore, the KRST code is applied to flexibly balance communication and sensing performance in richly scattering environments. To evaluate the contribution of MPCs, the closed-form Cramér-Rao lower bounds (CRLBs) of location and absolute velocity estimation are derived. Simulation results illustrate that the proposed SL-MPS scheme is more robust and accurate in localization and absolute velocity estimation compared with the existing state-of-the-art schemes.
Abstract:A near-field motion parameter estimation method is proposed. In contrast to far-field sensing systems, the near-field sensing system leverages spherical-wave characteristics to enable full-vector location and velocity estimation. Despite promising advantages, the near-field sensing system faces a significant challenge, where location and velocity parameters are intricately coupled within the signal. To address this challenge, a novel subarray-based variational message passing (VMP) method is proposed for near-field joint location and velocity estimation. First, a factor graph representation is introduced, employing subarray-level directional and Doppler parameters as intermediate variables to decouple the complex location-velocity dependencies. Based on this, the variational Bayesian inference is employed to obtain closed-form posterior distributions of subarray-level parameters. Subsequently, the message passing technique is employed, enabling tractable computation of location and velocity marginal distributions. Two implementation strategies are proposed: 1) System-level fusion that aggregates all subarray posteriors for centralized estimation, or 2) Subarray-level fusion where locally processed estimates from subarrays are fused through the Gaussian product rule. Cramér-Rao bounds for location and velocity estimation are derived, providing theoretical performance limits. Numerical results demonstrate that the proposed VMP method outperforms existing approaches while achieving an order of magnitude lower complexity. Specifically, the proposed VMP method achieves centimeter-level location accuracy and sub-m/s velocity accuracy. It also demonstrates robust performance for high-mobility targets, making the proposed VMP method suitable for real-time near-field sensing and communication applications.
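The Gaussian product rule used for subarray-level fusion can be sketched for the scalar case: multiplying Gaussian densities yields a Gaussian whose precision is the sum of the individual precisions and whose mean is the precision-weighted average. The function name and the example numbers below are hypothetical, and the paper's estimates are vector-valued, so this is only a one-dimensional illustration.

```python
def fuse_gaussians(means, variances):
    """Fuse independent scalar Gaussian estimates with the product rule.

    The fused precision is the sum of the individual precisions, and the
    fused mean is the precision-weighted average of the individual means.
    """
    precisions = [1.0 / v for v in variances]
    fused_var = 1.0 / sum(precisions)
    fused_mean = fused_var * sum(m * p for m, p in zip(means, precisions))
    return fused_mean, fused_var


# Hypothetical per-subarray estimates of one coordinate (metres) and their variances.
mean, var = fuse_gaussians([1.0, 1.2, 0.9], [0.04, 0.09, 0.01])
print(mean, var)
```

The fused variance is always smaller than the smallest input variance, so the most confident subarray dominates the result without the others being discarded.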
Abstract:Unsafe prompts pose significant safety risks to large language models (LLMs). Existing methods for detecting unsafe prompts rely on data-driven fine-tuning to train guardrail models, necessitating significant data and computational resources. In contrast, recently emerged few-shot gradient-based methods require only a few safe and unsafe reference prompts. A gradient-based approach identifies unsafe prompts by analyzing consistent patterns of the gradients of safety-critical parameters in LLMs. Although effective, its restriction to directional similarity (cosine similarity) introduces "directional bias", limiting its capability to identify unsafe prompts. To overcome this limitation, we introduce GradCoo, a novel gradient co-occurrence analysis method that expands the scope of safety-critical parameter identification to include unsigned gradient similarity, thereby reducing the impact of "directional bias" and enhancing the accuracy of unsafe prompt detection. Comprehensive experiments on the widely-used benchmark datasets ToxicChat and XStest demonstrate that our proposed method can achieve state-of-the-art (SOTA) performance compared to existing methods. Moreover, we confirm the generalizability of GradCoo in detecting unsafe prompts across a range of LLM base models with various sizes and origins.
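A toy example can show what a purely directional measure misses. Below, two gradient vectors touch the same parameters with the same magnitudes but opposite signs: cosine similarity reports them as opposite, while a similarity computed on absolute values reports them as identical. The `unsigned_cosine` function is an illustrative stand-in chosen here, not GradCoo's actual definition, which the abstract does not specify, and the vectors are made up.

```python
import math


def cosine(u, v):
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def unsigned_cosine(u, v):
    """Cosine similarity of element-wise absolute values.

    Illustrative stand-in for an unsigned similarity (assumption): it
    compares which parameters are active, ignoring gradient sign.
    """
    return cosine([abs(a) for a in u], [abs(b) for b in v])


reference_grad = [0.9, -0.1, 0.8, 0.0]  # hypothetical gradient of an unsafe reference prompt
prompt_grad = [-0.9, 0.1, -0.8, 0.0]    # same parameters activated, signs flipped

print(cosine(reference_grad, prompt_grad))           # close to -1
print(unsigned_cosine(reference_grad, prompt_grad))  # close to 1
```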
Abstract:A Large Language Model (LLM) tends to generate inconsistent and sometimes contradictory outputs when presented with a prompt that has equivalent semantics but is expressed differently from the original prompt. To achieve semantic consistency of an LLM, one of the key approaches is to finetune the model with prompt-output pairs with semantically equivalent meanings. Despite its effectiveness, a data-driven finetuning method incurs substantial computation costs in data preparation and model optimization. In this regime, an LLM is treated as a "black box", restricting our ability to gain deeper insights into its internal mechanism. In this paper, we aim to enhance the semantic consistency of LLMs through a more interpretable method, namely model editing. We first identify the model components (i.e., attention heads) that have a key impact on the semantic consistency of an LLM. We subsequently inject biases into the output of these model components along the semantic-consistency activation direction. It is noteworthy that these modifications are cost-effective, without reliance on mass manipulations of the original model parameters. Through comprehensive experiments on the constructed NLU and open-source NLG datasets, our method demonstrates significant improvements in the semantic consistency and task performance of LLMs. Additionally, our method exhibits promising generalization capabilities by performing well on tasks beyond the primary tasks.
Abstract:Large Language Models (LLMs) often generate inconsistent responses when prompted with semantically equivalent paraphrased inputs. Recently, activation steering, a technique that modulates LLM behavior by adjusting their latent representations during inference time, has been explored to improve the semantic consistency of LLMs. However, these methods typically operate at the model component level, such as layer hidden states or attention heads. They face a challenge due to the "polysemanticity issue", where the model components of LLMs typically encode multiple entangled features, making precise steering difficult. To address this challenge, we drill down to feature-level representations and propose LF-Steering, a novel activation steering approach to precisely identify latent feature representations responsible for semantic inconsistency. More specifically, our method maps the hidden states of the relevant transformer layer into a sparsely activated, high-dimensional feature space based on a sparse autoencoder (SAE), ensuring model steering based on decoupled feature representations with minimal interference. Comprehensive experiments on both NLU and NLG datasets demonstrate the effectiveness of our method in enhancing semantic consistency, resulting in significant performance gains for various NLU and NLG tasks.
Abstract:Integrated sensing and communication (ISAC) has emerged as a pivotal enabling technology for sixth-generation (6G) mobile communication systems. ISAC research in dense urban areas has been plagued by severe multipath interference, which has driven extensive research on multipath interference elimination. However, transforming the multipath component (MPC) from enemy into friend is a viable and mutually beneficial option. In this paper, we preliminarily explore MPC-aided ISAC signal processing and apply a space-time code to improve the ISAC performance. Specifically, we propose a symbol-level fusion for MPC-aided localization (SFMC) scheme to achieve robust and high-accuracy localization, and apply a Khatri-Rao space-time (KRST) code to improve the communication and sensing performance in rich multipath environments. Simulation results demonstrate that the proposed SFMC scheme has more robust localization performance with higher accuracy, compared with the existing state-of-the-art schemes. The proposed SFMC would benefit highly reliable communication and sub-meter level localization in rich multipath scenarios.