By integrating recent advances in large language models (LLMs) and generative models into the emerging semantic communication (SC) paradigm, in this article we put forward to a novel framework of language-oriented semantic communication (LSC). In LSC, machines communicate using human language messages that can be interpreted and manipulated via natural language processing (NLP) techniques for SC efficiency. To demonstrate LSC's potential, we introduce three innovative algorithms: 1) semantic source coding (SSC) which compresses a text prompt into its key head words capturing the prompt's syntactic essence while maintaining their appearance order to keep the prompt's context; 2) semantic channel coding (SCC) that improves robustness against errors by substituting head words with their lenghthier synonyms; and 3) semantic knowledge distillation (SKD) that produces listener-customized prompts via in-context learning the listener's language style. In a communication task for progressive text-to-image generation, the proposed methods achieve higher perceptual similarities with fewer transmissions while enhancing robustness in noisy communication channels.
This paper proposes new framework of communication system leveraging promising generation capabilities of multi-modal generative models. Regarding nowadays smart applications, successful communication can be made by conveying the perceptual meaning, which we set as text prompt. Text serves as a suitable semantic representation of image data as it has evolved to instruct an image or generate image through multi-modal techniques, by being interpreted in a manner similar to human cognition. Utilizing text can also reduce the overload compared to transmitting the intact data itself. The transmitter converts objective image to text through multi-model generation process and the receiver reconstructs the image using reverse process. Each word in the text sentence has each syntactic role, responsible for particular piece of information the text contains. For further efficiency in communication load, the transmitter sequentially sends words in priority of carrying the most information until reaches successful communication. Therefore, our primary focus is on the promising design of a communication system based on image-to-text transformation and the proposed schemes for sequentially transmitting word tokens. Our work is expected to pave a new road of utilizing state-of-the-art generative models to real communication systems
In this paper, we introduce a novel semantic generative communication (SGC) framework, where generative users leverage text-to-image (T2I) generators to create images locally from downloaded text prompts, while non-generative users directly download images from a base station (BS). Although generative users help reduce downlink transmission energy at the BS, they consume additional energy for image generation and for uploading their generator state information (GSI). We formulate the problem of minimizing the total energy consumption of the BS and the users, and devise a generative user selection algorithm. Simulation results corroborate that our proposed algorithm reduces total energy by up to 54% compared to a baseline with all non-generative users.
This paper presents a wireless data collection framework that employs an unmanned aerial vehicle (UAV) to efficiently gather data from distributed IoT sensors deployed in a large area. Our approach takes into account the non-zero communication ranges of the sensors to optimize the flight path of the UAV, resulting in a variation of the Traveling Salesman Problem (TSP). We prove mathematically that the optimal waypoints for this TSP-variant problem are restricted to the boundaries of the sensor communication ranges, greatly reducing the solution space. Building on this finding, we develop a low-complexity UAV-assisted sensor data collection algorithm, and demonstrate its effectiveness in a selected use case where we minimize the total energy consumption of the UAV and sensors by jointly optimizing the UAV's travel distance and the sensors' communication ranges.
Low probability of detection (LPD) has recently emerged as a means to enhance the privacy and security of wireless networks. Unlike existing wireless security techniques, LPD measures aim to conceal the entire existence of wireless communication instead of safeguarding the information transmitted from users. Motivated by LPD communication, in this paper, we study a privacy-preserving and distributed framework based on graph neural networks to minimise the detectability of a wireless ad-hoc network as a whole and predict an optimal communication region for each node in the wireless network, allowing them to communicate while remaining undetected from external actors. We also demonstrate the effectiveness of the proposed method in terms of two performance measures, i.e., mean absolute error and median absolute error.
In this work, we propose and develop a simple experimental testbed to study the feasibility of a novel idea by coupling radio frequency (RF) sensing technology with Correlated Knowledge Distillation (CKD) theory towards designing lightweight, near real-time and precise human pose monitoring systems. The proposed CKD framework transfers and fuses pose knowledge from a robust "Teacher" model to a parameterized "Student" model, which can be a promising technique for obtaining accurate yet lightweight pose estimates. To assure its efficacy, we implemented CKD for distilling logits in our integrated Software Defined Radio (SDR)-based experimental setup and investigated the RF-visual signal correlation. Our CKD-RF sensing technique is characterized by two modes -- a camera-fed Teacher Class Network (e.g., images, videos) with an SDR-fed Student Class Network (e.g., RF signals). Specifically, our CKD model trains a dual multi-branch teacher and student network by distilling and fusing knowledge bases. The resulting CKD models are then subsequently used to identify the multimodal correlation and teach the student branch in reverse. Instead of simply aggregating their learnings, CKD training comprised multiple parallel transformations with the two domains, i.e., visual images and RF signals. Once trained, our CKD model can efficiently preserve privacy and utilize the multimodal correlated logits from the two different neural networks for estimating poses without using visual signals/video frames (by using only the RF signals).
Response ranking in dialogues plays a crucial role in retrieval-based conversational systems. In a multi-turn dialogue, to capture the gist of a conversation, contextual information serves as essential knowledge to achieve this goal. In this paper, we present a flexible neural framework that can integrate contextual information from multiple channels. Specifically for the current task, our approach is to provide two information channels in parallel, Fusing Conversation history and domain knowledge extracted from Candidate provenance (FCC), where candidate responses are curated, as contextual information to improve the performance of multi-turn dialogue response ranking. The proposed approach can be generalized as a module to incorporate miscellaneous contextual features for other context-oriented tasks. We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks. Our experimental results show that our framework significantly outperforms the previous state-of-the-art models, improving Recall@1 by 7% and MAP by 4%. Furthermore, we conduct ablation studies to evaluate the contributions of each information channel, and of the framework components, to the overall ranking performance, providing additional insights and directions for further improvements.
We present the InterviewBot that dynamically integrates conversation history and customized topics into a coherent embedding space to conduct 10 mins hybrid-domain (open and closed) conversations with foreign students applying to U.S. colleges for assessing their academic and cultural readiness. To build a neural-based end-to-end dialogue model, 7,361 audio recordings of human-to-human interviews are automatically transcribed, where 440 are manually corrected for finetuning and evaluation. To overcome the input/output size limit of a transformer-based encoder-decoder model, two new methods are proposed, context attention and topic storing, allowing the model to make relevant and consistent interactions. Our final model is tested both statistically by comparing its responses to the interview data and dynamically by inviting professional interviewers and various students to interact with it in real-time, finding it highly satisfactory in fluency and context awareness.
Autoencoder (AE) is a neural network (NN) architecture that is trained to reconstruct an input at its output. By measuring the reconstruction errors of new input samples, AE can detect anomalous samples deviated from the trained data distribution. The key to success is to achieve high-fidelity reconstruction (HFR) while restricting AE's capability of generalization beyond training data, which should be balanced commonly via iterative re-training. Alternatively, we propose a novel framework of AE-based anomaly detection, coined HFR-AE, by projecting new inputs into a subspace wherein the trained AE achieves HFR, thereby increasing the gap between normal and anomalous sample reconstruction errors. Simulation results corroborate that HFR-AE improves the area under receiver operating characteristic curve (AUROC) under different AE architectures and settings by up to 13.4% compared to Vanilla AE-based anomaly detection.
Federated learning (FL) enables the training of models among distributed clients without compromising the privacy of training datasets, while the invisibility of clients datasets and the training process poses a variety of security threats. This survey provides the taxonomy of poisoning attacks and experimental evaluation to discuss the need for robust FL.