Semantic communications have gained significant attention as a promising approach to address the transmission bottleneck, especially with the continuous development of 6G techniques. Distinct from the well-investigated physical channel impairments, this paper focuses on semantic impairments in images, particularly those arising from adversarial perturbations. Specifically, we propose a novel metric for quantifying the intensity of semantic impairment and develop a semantic impairment dataset. Furthermore, we introduce a deep learning enabled semantic communication system, termed DeepSC-RI, to enhance the robustness of image transmission. It incorporates a multi-scale semantic extractor with a dual-branch architecture that extracts semantics at varying granularity. The fine-grained branch incorporates a semantic importance evaluation module to identify and prioritize crucial semantics, while the coarse-grained branch adopts a hierarchical approach for capturing the robust semantics. These two streams of semantics are fused via a cross-attention-based semantic fusion module. Experimental results demonstrate the superior performance of DeepSC-RI under various levels of semantic impairment intensity.
In this paper, we propose a robust semantic communication system for the speech-to-text translation task, named Ross-S2T, which delivers only the essential semantic information. In particular, a deep semantic encoder is developed to directly condense and convert the speech in the source language to the textual semantic features associated with the target language, thus enabling the design of a deep learning-enabled semantic communication system for speech-to-text translation that can be jointly trained in an end-to-end manner. Moreover, to cope with the practical communication scenario in which the input speech is corrupted, a novel generative adversarial network (GAN)-enabled deep semantic compensator is proposed to predict the lost semantic information in the source speech and produce the textual semantic features in the target language simultaneously, which establishes a robust semantic transmission mechanism for dynamic speech input. According to the simulation results, the proposed Ross-S2T achieves significant speech-to-text translation performance gains over the conventional approach and exhibits high robustness against corrupted speech input.
Deep learning enabled semantic communications have shown great potential to significantly improve transmission efficiency and alleviate spectrum scarcity by effectively exchanging the semantics behind the data. Recently, the emergence of large models with billions of parameters has unveiled remarkable human-like intelligence, offering a promising avenue for advancing semantic communication by enhancing semantic and contextual understanding. This article systematically investigates large model-empowered semantic communication systems, from potential applications to system design. First, we propose a new semantic communication architecture that integrates large models into semantic communication through the introduction of a memory module. Then, typical applications are illustrated to show the benefits of the new architecture. In addition, we discuss the key designs in implementing the new semantic communication systems, from module design to system training. Finally, potential research directions are identified to advance large model-empowered semantic communications.
The metaverse is expected to provide immersive entertainment, education, and business applications. However, virtual reality (VR) transmission over wireless networks is data- and computation-intensive, making it critical to introduce novel solutions that meet stringent quality-of-service requirements. With recent advances in edge intelligence and deep learning, we have developed a novel multi-view synthesizing framework that can efficiently provide computation, storage, and communication resources for wireless content delivery in the metaverse. We propose a three-dimensional (3D)-aware generative model that uses collections of single-view images. These single-view images are transmitted to a group of users with overlapping fields of view, which avoids massive content transmission compared to transmitting tiles or whole 3D models. We then present a federated learning approach to guarantee an efficient learning process. The training performance can be improved by characterizing the vertical and horizontal data samples with a large latent feature space, while low-latency communication can be achieved with a reduced number of transmitted parameters during federated learning. We also propose a federated transfer learning framework to enable fast domain adaptation to different target domains. Simulation results have demonstrated the effectiveness of our proposed federated multi-view synthesizing framework for VR content delivery.
The enormous data volume of video poses a significant burden on the network. In particular, transferring high-definition surveillance videos to the cloud consumes a significant amount of spectrum resources. To address these issues, we propose a surveillance video transmission system enabled by end-cloud computing. Specifically, the cameras actively down-sample the original video, and a redundant frame elimination module is then employed to further reduce the data volume of surveillance videos. We then develop a key-frame assisted video super-resolution model to reconstruct the high-quality video at the cloud side. Moreover, we propose a strategy of extracting key frames from source videos for better reconstruction performance by utilizing the peak signal-to-noise ratio (PSNR) of adjacent frames to measure the propagation distance of key frame information. Simulation results show that the developed system can effectively reduce the data volume through end-cloud collaboration and significantly outperforms existing video super-resolution models in terms of PSNR and structural similarity index (SSIM).
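The PSNR-based key-frame idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the actual selection rule, so the rule used here (start a new key frame whenever the PSNR between adjacent frames drops below a threshold) is an assumption, as are the function names `psnr` and `select_key_frames`, the flat-list frame representation, and the 30 dB default.

```python
import math


def psnr(frame_a, frame_b, peak=255.0):
    """PSNR in dB between two equally sized frames given as flat pixel lists."""
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)) / len(frame_a)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


def select_key_frames(frames, threshold_db=30.0):
    """Return indices of key frames (hypothetical rule, not the paper's).

    The first frame is always a key frame. A later frame becomes a key
    frame when its PSNR against the previous frame falls below
    threshold_db, i.e. when adjacent frames differ too much for the
    previous key frame's information to carry over.
    """
    if not frames:
        return []
    keys = [0]
    for i in range(1, len(frames)):
        if psnr(frames[i - 1], frames[i]) < threshold_db:
            keys.append(i)
    return keys


if __name__ == "__main__":
    video = [
        [10, 10, 10, 10],
        [10, 10, 10, 11],      # nearly identical to the previous frame
        [200, 200, 200, 200],  # scene change
        [200, 200, 200, 200],
    ]
    print(select_key_frames(video))
```

In this toy input the third frame differs sharply from the second, so it is the only frame besides the first that is marked as a key frame.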
In cellular networks, resource allocation is usually performed in a centralized way, which imposes high computational complexity on the base station (BS) and incurs high transmission overhead. This paper explores a distributed resource allocation method that aims to maximize energy efficiency (EE) while ensuring the quality of service (QoS) for users. Specifically, to address wireless channel conditions, we propose a robust meta federated reinforcement learning (MFRL) framework that allows local users to optimize transmit power and assign channels using locally trained neural network models, so as to offload the computational burden from the cloud server to the local users and reduce the transmission overhead associated with local channel state information. The BS performs the meta learning procedure to initialize a general global model, enabling rapid adaptation to different environments with improved EE performance. The federated learning technique, based on decentralized reinforcement learning, promotes collaboration and mutual benefits among users. Analysis and numerical results demonstrate that the proposed MFRL framework accelerates the reinforcement learning process, decreases transmission overhead, and offloads computation, while outperforming the conventional decentralized reinforcement learning algorithm in terms of convergence speed and EE performance across various scenarios.
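The federated aggregation step mentioned in the abstract above can be sketched with plain weighted parameter averaging (the generic FedAvg rule). This is not claimed to be the aggregation used in the MFRL framework; the function name `federated_average`, the flat-list model representation, and the sample-count weighting are all assumptions made for illustration.

```python
def federated_average(local_models, sample_counts):
    """Weighted average of local model parameters (generic FedAvg sketch).

    local_models: list of parameter vectors, one flat list per user.
    sample_counts: number of local samples per user, used as weights.
    """
    if not local_models or len(local_models) != len(sample_counts):
        raise ValueError("need one sample count per local model")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(local_models[0])
    if any(len(m) != n_params for m in local_models):
        raise ValueError("all local models must have the same size")
    # Each global parameter is the sample-weighted mean of the local ones.
    return [
        sum(m[i] * n for m, n in zip(local_models, sample_counts)) / total
        for i in range(n_params)
    ]


if __name__ == "__main__":
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Only the parameter vectors and sample counts are exchanged in this sketch, which is the sense in which federated schemes avoid uploading raw local data.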
In semantic communications, only task-relevant information is transmitted, yielding significant performance gains over conventional communications. To satisfy user requirements for different tasks, in this paper we investigate semantic-aware resource allocation in a multi-cell network serving multiple tasks. First, semantic entropy is defined and quantified to measure the semantic information for different tasks. Then, we develop a novel quality-of-experience (QoE) model to formulate the semantic-aware resource allocation problem in terms of semantic compression, channel assignment, and transmit power allocation. To solve the formulated problem, we first decouple it into two subproblems. The first one is to optimize semantic compression with given channel assignment and power allocation results, which is solved by a deep Q-network (DQN) based method. The second one is to optimize the channel assignment and transmit power, which is modeled as a many-to-one matching game and solved by a proposed low-complexity matching algorithm. Simulation results validate the effectiveness and superiority of the proposed semantic-aware resource allocation method, as well as its compatibility with conventional and semantic communications.
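To make the many-to-one matching formulation in the abstract above concrete, here is a minimal sketch using the textbook deferred-acceptance procedure, in which users propose to channels and each channel keeps at most a fixed quota of users. This is a generic stand-in, not the paper's low-complexity algorithm; the function name, the explicit preference lists, and the per-channel `quota` are assumptions, and every user is assumed to appear in the preference list of each channel it may propose to.

```python
def many_to_one_matching(user_prefs, channel_prefs, quota):
    """User-proposing deferred acceptance with per-channel quotas.

    user_prefs: {user: [channels, most preferred first]}
    channel_prefs: {channel: [users, most preferred first]}
    quota: {channel: maximum number of users it can serve}
    Returns {channel: [accepted users, most preferred first]}.
    """
    # Lower rank value means the channel prefers that user more.
    rank = {
        c: {u: i for i, u in enumerate(order)}
        for c, order in channel_prefs.items()
    }
    next_choice = {u: 0 for u in user_prefs}
    assigned = {c: [] for c in channel_prefs}
    free = list(user_prefs)
    while free:
        u = free.pop()
        if next_choice[u] >= len(user_prefs[u]):
            continue  # user has exhausted its list and stays unmatched
        c = user_prefs[u][next_choice[u]]
        next_choice[u] += 1
        assigned[c].append(u)
        assigned[c].sort(key=lambda x: rank[c][x])
        if len(assigned[c]) > quota[c]:
            # Over quota: reject the least preferred user, who proposes again.
            free.append(assigned[c].pop())
    return assigned


if __name__ == "__main__":
    users = {"u1": ["c1", "c2"], "u2": ["c1", "c2"], "u3": ["c1", "c2"]}
    channels = {"c1": ["u2", "u1", "u3"], "c2": ["u1", "u3", "u2"]}
    print(many_to_one_matching(users, channels, {"c1": 1, "c2": 2}))
```

In a resource allocation setting the preference lists would be derived from a utility such as the QoE achieved on each channel; here they are written out by hand to keep the example small.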
Data-driven methods for structural health monitoring (SHM), which mine hidden structural performance from the correlations among monitored time series data, have received wide attention recently. However, missing data significantly impairs the application of these methods. Missing data is a frequently encountered issue in time series data in SHM and many other real-world applications, and it harms standardized data mining and downstream tasks such as condition assessment. Imputation approaches based on spatiotemporal relations among monitoring data have been developed to handle this issue; however, no additional information is added during imputation. This paper thus develops a robust damage identification method that accounts for missing data, based on a long short-term memory (LSTM) model and a dropout mechanism in the autoencoder (AE) framework. Input channels are randomly dropped to simulate missing data during training, and the reconstruction error is used as both the loss function and the damage indicator. The quasi-static response (cable tension) of a cable-stayed bridge released in the 1st IPC-SHM is employed to verify the proposed method, and the results show that missing data imputation and damage identification can be implemented together in a unified way.
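The channel-dropping step and the reconstruction-error indicator described in the abstract above can be sketched as follows. The LSTM autoencoder itself is omitted; only the data masking and the error computation are shown. Zero-filling dropped channels, the mean-squared-error form, and the names `drop_channels` and `reconstruction_error` are assumptions made for illustration.

```python
import random


def drop_channels(window, drop_prob, rng):
    """Zero out whole sensor channels to imitate missing data.

    window: list of time steps, each a list with one value per channel.
    drop_prob: probability that a channel is dropped for this window.
    rng: a random.Random instance, passed in for reproducibility.
    Returns (masked_window, keep_flags).
    """
    n_channels = len(window[0])
    keep = [rng.random() >= drop_prob for _ in range(n_channels)]
    masked = [
        [value if kept else 0.0 for value, kept in zip(step, keep)]
        for step in window
    ]
    return masked, keep


def reconstruction_error(target, reconstruction):
    """Mean squared error over all time steps and channels.

    During training the target would be the complete (unmasked) window,
    so the model has to fill in the dropped channels; the same quantity
    then serves as the damage indicator at test time.
    """
    total, count = 0.0, 0
    for t_step, r_step in zip(target, reconstruction):
        for t, r in zip(t_step, r_step):
            total += (t - r) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    rng = random.Random(0)
    window = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    masked, keep = drop_channels(window, 0.5, rng)
    print(keep, masked)
    print(reconstruction_error(window, masked))
```

Because the error is measured against the complete window, a model trained this way performs imputation and produces a damage indicator with the same forward pass, which matches the unified treatment the abstract describes.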
In this paper, we propose an environment semantics aided wireless communication framework to reduce the transmission latency and improve the transmission reliability, where semantic information is extracted from environment image data, selectively encoded based on its task-relevance, and then fused to make decisions for channel related tasks. As a case study, we develop an environment semantics aided network architecture for mmWave communication systems, which is composed of a semantic feature extraction network, a feature selection algorithm, a task-oriented encoder, and a decision network. With images taken from street cameras and user's identification information as the inputs, the environment semantics aided network architecture is trained to predict the optimal beam index and the blockage state for the base station. Without pilot training or costly beam scans, the environment semantics aided network architecture can realize extremely efficient beam prediction and timely blockage prediction, thus meeting requirements for ultra-reliable and low-latency communications (URLLCs). Simulation results demonstrate that, compared with existing works, the proposed environment semantics aided network architecture can reduce system overheads such as storage space and computational cost while achieving satisfactory prediction accuracy and protecting user privacy.
Semantic communication is regarded as a breakthrough beyond the Shannon paradigm, as it transmits only semantic information to significantly improve communication efficiency. This article introduces a framework for a generalized semantic communication system, which exploits the semantic information in both the multimodal source and the wireless channel environment. Subsequently, the developed deep learning enabled end-to-end semantic communication and environment semantics aided wireless communication techniques are demonstrated through two examples. The article concludes with several research challenges to be addressed in developing such a generalized semantic communication system.