Generative artificial intelligence (GenAI) and communication networks are expected to have groundbreaking synergies in 6G. Connecting GenAI agents over a wireless network can potentially unleash the power of collective intelligence and pave the way for artificial general intelligence (AGI). However, current wireless networks are designed as a "data pipe" and are not suited to accommodate and leverage the power of GenAI. In this paper, we propose the GenAINet framework in which distributed GenAI agents communicate knowledge (high-level concepts or abstracts) to accomplish arbitrary tasks. We first provide a network architecture integrating GenAI capabilities to manage both network protocols and applications. Building on this, we investigate effective communication and reasoning problems by proposing a semantic-native GenAINet. Specifically, GenAI agents extract semantic concepts from multi-modal raw data and build a knowledge base representing their semantic relations, which GenAI models retrieve for planning and reasoning. Under this paradigm, an agent can learn quickly from other agents' experience and make better decisions with efficient communication. Furthermore, we conduct two case studies: in wireless device query, we show that extracting and transferring knowledge can improve query accuracy with reduced communication; in wireless power control, we show that distributed agents can improve decisions via collaborative reasoning. Finally, we argue that developing a hierarchical semantic-level Telecom world model is a key path towards a network of collective intelligence.
Wireless communications at high-frequency bands with large antenna arrays face challenges in beam management, which can potentially be improved by multimodal sensing information from cameras, LiDAR, radar, and GPS. In this paper, we present a multimodal transformer deep learning framework for sensing-assisted beam prediction. We employ a convolutional neural network to extract features from a sequence of images, point clouds, and raw radar data sampled over time. At each convolutional layer, we use transformer encoders to learn the hidden relations between feature tokens from different modalities and time instances over the abstraction space and to produce encoded vectors for the next level of feature extraction. We train the model on a combination of different modalities with supervised learning. We make the model more robust to imbalanced data by using focal loss and an exponential moving average. We also evaluate data processing and augmentation techniques such as image enhancement, segmentation, background filtering, multimodal data flipping, radar signal transformation, and GPS angle calibration. Experimental results show that our solution trained on image and GPS data produces the best distance-based accuracy of predicted beams at 78.44%, with effective generalization to unseen day scenarios near 73% and night scenarios over 84%. This outperforms using other modalities and arbitrary data processing techniques, which demonstrates the effectiveness of transformers with feature fusion in performing radio beam prediction from images and GPS. Furthermore, our solution could be pretrained on large sequences of multimodal wireless data and fine-tuned for multiple downstream radio network tasks.
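The two imbalance-handling ingredients named above, focal loss and an exponential moving average (EMA), can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the default `gamma` and `decay` values, and the use of plain Python lists instead of tensors are all assumptions made for illustration, and the EMA is applied here to model weights, which is one common usage.

```python
import math

def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)**gamma * log(p_t).

    probs  : predicted class probabilities (assumed to sum to 1)
    target : index of the true class (here, the true beam index)
    gamma  : focusing parameter (assumed value); gamma = 0 gives cross-entropy
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def ema_update(shadow, weights, decay=0.99):
    """One EMA step: shadow <- decay * shadow + (1 - decay) * weights."""
    return [decay * s + (1.0 - decay) * w for s, w in zip(shadow, weights)]

# A confidently correct sample contributes far less loss than a misclassified one,
# so rare, hard classes are not drowned out by frequent, easy ones.
easy = focal_loss([0.9, 0.1], target=0)
hard = focal_loss([0.1, 0.9], target=0)

# The shadow weights move only slightly towards the latest weights at each step.
shadow = ema_update([1.0, 1.0], [0.0, 2.0], decay=0.9)
```

The down-weighting factor `(1 - p_t)**gamma` is what distinguishes focal loss from cross-entropy, while the EMA copy of the weights smooths out step-to-step noise during training.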
In this work, we study the problem of semantic communication and inference, in which a student agent (i.e., mobile device) queries a teacher agent (i.e., cloud server) to generate higher-order data semantics living in a simplicial complex. Specifically, the teacher first maps its data into a k-order simplicial complex and learns its high-order correlations. For effective communication and inference, the teacher seeks minimally sufficient and invariant semantic structures prior to conveying information. These minimal simplicial structures are found by judiciously removing simplices selected by the Hodge Laplacians without compromising the inference query accuracy. Subsequently, the student locally runs its own set of queries based on a masked simplicial convolutional autoencoder (SCAE), leveraging both its local knowledge and the remote teacher's knowledge. Numerical results corroborate the effectiveness of the proposed approach in terms of improving inference query accuracy under different channel conditions and simplicial structures. Experiments on a coauthorship dataset show that removing simplices by ranking the Laplacian values yields an 85% reduction in payload size without sacrificing accuracy. Joint semantic communication and inference by masked SCAE improves query accuracy by 25% compared to local student-based query and 15% compared to remote teacher-based query. Finally, incorporating channel semantics is shown to effectively improve inference accuracy, notably at low SNR values.
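For readers unfamiliar with Hodge Laplacians, the following sketch builds the first-order Hodge Laplacian L_1 = B_1^T B_1 + B_2 B_2^T from the boundary matrices of a small simplicial complex. It only illustrates the standard definition: the helper names are hypothetical, simplices are assumed to be tuples of sorted vertex indices, and the criterion the abstract uses to rank and remove simplices is not specified there and is not reproduced here.

```python
def boundary_matrix(faces, simplices):
    """Signed incidence matrix: rows index (k-1)-simplices, columns index k-simplices."""
    row = {f: i for i, f in enumerate(faces)}
    B = [[0] * len(simplices) for _ in faces]
    for j, s in enumerate(simplices):
        for drop in range(len(s)):
            face = s[:drop] + s[drop + 1:]   # delete one vertex to get a face
            B[row[face]][j] = (-1) ** drop   # alternating orientation sign
    return B

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(r, c)) for c in zip(*B)] for r in A]

def hodge_laplacian_1(nodes, edges, triangles):
    """L_1 = B_1^T B_1 (lower part) + B_2 B_2^T (upper part)."""
    B1 = boundary_matrix(nodes, edges)
    down = matmul(transpose(B1), B1)
    if not triangles:                        # no 2-simplices: upper part is zero
        return down
    B2 = boundary_matrix(edges, triangles)
    up = matmul(B2, transpose(B2))
    return [[d + u for d, u in zip(dr, ur)] for dr, ur in zip(down, up)]

# A single filled triangle on vertices 0, 1, 2.
nodes = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]
L1_filled = hodge_laplacian_1(nodes, edges, [(0, 1, 2)])
L1_hollow = hodge_laplacian_1(nodes, edges, [])
```

Comparing `L1_filled` with `L1_hollow` shows how the presence of a 2-simplex changes the edge-level operator, which is the kind of structural information a Laplacian-based ranking can exploit.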
The evolution of generative artificial intelligence (GenAI) constitutes a turning point in reshaping the future of technology in different aspects. Wireless networks in particular, with the blooming of self-evolving networks, represent a rich field for exploiting GenAI and reaping several benefits that can fundamentally change the way wireless networks are designed and operated today. To be specific, large language models (LLMs), a subfield of GenAI, are envisioned to open up a new era of autonomous wireless networks, in which a multimodal large model trained over various Telecom data can be fine-tuned to perform several downstream tasks, eliminating the need for dedicated AI models for each task and paving the way for the realization of artificial general intelligence (AGI)-empowered wireless networks. In this article, we aim to unfold the opportunities that can be reaped from integrating LLMs into the Telecom domain. In particular, we aim to present a forward-looking vision of a new realm of possibilities and applications of LLMs in future wireless networks, defining directions for designing, training, testing, and deploying Telecom LLMs, and revealing insights on the associated theoretical and practical challenges.
The recent progress of artificial intelligence (AI) opens up new frontiers in the possibility of automating many tasks involved in Telecom network design, implementation, and deployment. This has been further pushed forward with the evolution of generative AI, including the emergence of large language models (LLMs), which are believed to be the cornerstone toward realizing self-governed, interactive AI agents. Motivated by this, in this paper, we aim to adapt the paradigm of LLMs to the Telecom domain. In particular, we fine-tune several LLMs, including BERT, distilled BERT, RoBERTa, and GPT-2, to the Telecom domain languages, and demonstrate a use case for identifying the 3rd Generation Partnership Project (3GPP) standard working groups. We train the selected models on 3GPP technical documents (Tdoc) from the years 2009-2019 and predict the Tdoc categories for the years 2020-2023. The results demonstrate that the fine-tuned BERT and RoBERTa models achieve 84.6% accuracy, while the GPT-2 model achieves 83% in identifying 3GPP working groups. The distilled BERT model, with around 50% fewer parameters, achieves similar performance to the others. This corroborates that fine-tuning pretrained LLMs can effectively identify the categories of Telecom language. The developed framework is a stepping stone towards realizing intent-driven and self-evolving wireless networks from Telecom languages, and paves the way for the implementation of generative AI in the Telecom domain.
Semantic communication enables intelligent agents to extract meaning (or semantics) of information via interaction, to carry out collaborative tasks. In this paper, we study semantic communication from a topological space perspective, in which higher-order data semantics live in a simplicial complex. Specifically, a transmitter first maps its data into a $k$-order simplicial complex and then learns its high-order correlations. The simplicial structure and corresponding features are encoded into semantic embeddings in latent space for transmission. Subsequently, the receiver decodes the structure and infers the missing or distorted data. The transmitter and receiver collaboratively train a simplicial convolutional autoencoder to accomplish the semantic communication task. Experiments are carried out on a real dataset, the Semantic Scholar Open Research Corpus, where one part of the semantic embedding is missing or distorted during communication. Numerical results show that semantic communication enabled by the simplicial convolutional autoencoder effectively rebuilds the simplicial features and infers the missing data with $95\%$ accuracy, while achieving stable performance under channel noise. In contrast, communication enabled by a conventional autoencoder fails to infer any missing data. Moreover, our approach is shown to effectively infer the distorted data without prior simplicial structure knowledge at the receiver, by learning extracted semantic information during communications. Leveraging the topological nature of information, the proposed method is also shown to be more reliable and efficient compared to several baselines, notably at low signal-to-noise ratio (SNR) levels.
Recently, along with the rapid development of mobile communication technology, edge computing theory and techniques have been attracting more and more attention from researchers and engineers worldwide. By using the network edge, edge computing can bridge the gap between the capacity of the cloud and the requirements of devices, and can thus accelerate content delivery and improve the quality of mobile services. In order to bring more intelligence to edge systems than traditional optimization methodology allows, and driven by current deep learning techniques, we propose to integrate Deep Reinforcement Learning techniques and the Federated Learning framework with mobile edge systems to optimize mobile edge computing, caching, and communication. To this end, we design the "In-Edge AI" framework, which intelligently uses the collaboration among devices and edge nodes to exchange learning parameters for better training and inference of the models, and thereby carries out dynamic system-level optimization and application-level enhancement while reducing unnecessary system communication load. "In-Edge AI" is evaluated and shown to achieve near-optimal performance with relatively low learning overhead, while the system is cognitive and adaptive to mobile communication systems. Finally, we discuss several related challenges and opportunities that point to a promising future for "In-Edge AI".
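The exchange of learning parameters among devices and edge nodes can be illustrated with federated averaging, a common aggregation rule in Federated Learning. The abstract does not state which aggregation rule "In-Edge AI" uses, so this is an assumed stand-in rather than the framework's actual method; the function name and the flat-list representation of model parameters are also illustrative.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size.

    client_weights : list of parameter vectors, one per device
    client_sizes   : number of local training samples on each device
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two devices with locally trained parameters; the second holds three times more data.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only the parameter vectors travel over the network in such a scheme, not the raw training data, which is how this style of collaboration can keep communication load low.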
It is well accepted that image segmentation can benefit from utilizing multilevel cues. This paper focuses on utilizing FCNN-based dense semantic predictions in bottom-up image segmentation, arguing that semantic cues should be taken into account from the very beginning. In this way, we can avoid, as far as possible, merging regions of similar appearance but distinct semantic categories, which addresses the semantic inefficiency problem. We also propose a straightforward way to use contour cues to suppress the noise in multilevel cues and thus improve segmentation robustness. The evaluation on BSDS500 shows that we obtain competitive region and boundary performance. Furthermore, since all individual regions can be assigned appropriate semantic labels during the computation, we are able to extract adjusted semantic segmentations. The experiment on Pascal VOC 2012 shows our improvement over the original semantic segmentations, which are derived directly from the dense predictions.
Many deep Convolutional Neural Networks (CNNs) make incorrect predictions on adversarial samples obtained by imperceptible perturbations of clean samples. We hypothesize that this is caused by a failure to suppress unusual signals within network layers. As a remedy, we propose the use of Symmetric Activation Functions (SAFs) in non-linear signal transducer units. These units suppress signals of exceptional magnitude. We prove that SAF networks can perform classification tasks to arbitrary precision in a simplified situation. In practice, rather than use SAFs alone, we add them into CNNs to improve their robustness. The modified CNNs can be easily trained using popular strategies with a moderate training load. Our experiments on MNIST and CIFAR-10 show that the modified CNNs perform similarly to plain ones on clean samples, and are remarkably more robust against adversarial and nonsense samples.
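The idea of a unit that suppresses signals of exceptional magnitude can be illustrated with a simple even function. The abstract does not give the exact form of the SAF, so the Gaussian bump exp(-x^2) below is an assumed example chosen only because it is symmetric about zero and decays for large inputs; the names `symmetric_activation`, `saf_unit`, and `relu` are illustrative.

```python
import math

def symmetric_activation(x):
    """Assumed example of a symmetric activation: f(x) = exp(-x^2), with f(x) = f(-x)."""
    return math.exp(-x * x)

def relu(x):
    """Standard rectifier, shown for contrast: its output grows without bound."""
    return max(0.0, x)

def saf_unit(inputs, weights, bias):
    """One unit: affine pre-activation followed by the symmetric activation."""
    pre = sum(w * x for w, x in zip(weights, inputs)) + bias
    return symmetric_activation(pre)

typical = saf_unit([0.2, -0.1], [1.0, 1.0], 0.0)    # small pre-activation, strong response
extreme = saf_unit([50.0, 40.0], [1.0, 1.0], 0.0)   # unusually large pre-activation, suppressed
```

With a rectifier, an unusually large pre-activation passes through at full magnitude, whereas the symmetric unit maps it close to zero; this contrast is the intuition behind using such units against adversarial and nonsense inputs.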
Color image segmentation is an important topic in the image processing field. MRF-MAP is often adopted in unsupervised segmentation methods, but the performance of these methods is far behind that of recent interactive segmentation tools supervised by user inputs. Furthermore, existing unsupervised methods also suffer from low efficiency and a high risk of being trapped in local optima, because MRF-MAP is currently solved by iterative frameworks with inaccurate initial color distribution models. To address these problems, this letter designs an efficient method to calculate the energy functions approximately in a non-iterative style, and proposes a new binary segmentation algorithm based on a slightly tuned Lanczos eigensolver. The experiments demonstrate that the new algorithm achieves competitive performance compared with two state-of-the-art segmentation methods.