Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yao Sun

School of Aeronautical Engineering, Air Force Engineering University, Xi'an, China

QoE Optimization for Semantic Self-Correcting Video Transmission in Multi-UAV Networks

Jul 09, 2025

Xuyang Chen, Chong Huang, Daquan Feng, Lei Luo, Yao Sun, Xiang-Gen Xia

Abstract:Real-time unmanned aerial vehicle (UAV) video streaming is essential for time-sensitive applications, including remote surveillance, emergency response, and environmental monitoring. However, it faces challenges such as limited bandwidth, latency fluctuations, and high packet loss. To address these issues, we propose a novel semantic self-correcting video transmission framework with ultra-fine bitrate granularity (SSCV-G). In SSCV-G, video frames are encoded into a compact semantic codebook space, and the transmitter adaptively sends a subset of semantic indices based on bandwidth availability, enabling fine-grained bitrate control for improved bandwidth efficiency. At the receiver, a spatio-temporal vision transformer (ST-ViT) performs multi-frame joint decoding to reconstruct dropped semantic indices by modeling intra- and inter-frame dependencies. To further improve performance under dynamic network conditions, we integrate a multi-user proximal policy optimization (MUPPO) reinforcement learning scheme that jointly optimizes communication resource allocation and semantic bitrate selection to maximize user Quality of Experience (QoE). Extensive experiments demonstrate that the proposed SSCV-G significantly outperforms state-of-the-art video codecs in coding efficiency, bandwidth adaptability, and packet loss robustness. Moreover, the proposed MUPPO-based QoE optimization consistently surpasses existing benchmarks.

* 13 pages

Via

Access Paper or Ask Questions

Building Floor Number Estimation from Crowdsourced Street-Level Images: Munich Dataset and Baseline Method

May 23, 2025

Yao Sun, Sining Chen, Yifan Tian, Xiao Xiang Zhu

Abstract:Accurate information on the number of building floors, or above-ground storeys, is essential for household estimation, utility provision, risk assessment, evacuation planning, and energy modeling. Yet large-scale floor-count data are rarely available in cadastral and 3D city databases. This study proposes an end-to-end deep learning framework that infers floor numbers directly from unrestricted, crowdsourced street-level imagery, avoiding hand-crafted features and generalizing across diverse facade styles. To enable benchmarking, we release the Munich Building Floor Dataset, a public set of over 6800 geo-tagged images collected from Mapillary and targeted field photography, each paired with a verified storey label. On this dataset, the proposed classification-regression network attains 81.2% exact accuracy and predicts 97.9% of buildings within +/-1 floor. The method and dataset together offer a scalable route to enrich 3D city models with vertical information and lay a foundation for future work in urban informatics, remote sensing, and geographic information science. Source code and data will be released under an open license at https://github.com/ya0-sun/Munich-SVI-Floor-Benchmark.

* Code and data: https://github.com/ya0-sun/Munich-SVI-Floor-Benchmark

Via

Access Paper or Ask Questions

TUM2TWIN: Introducing the Large-Scale Multimodal Urban Digital Twin Benchmark Dataset

May 13, 2025

Olaf Wysocki, Benedikt Schwab, Manoj Kumar Biswanath, Michael Greza, Qilin Zhang, Jingwei Zhu, Thomas Froech, Medhini Heeramaglore, Ihab Hijazi, Khaoula Kanna(+24 more)

Abstract:Urban Digital Twins (UDTs) have become essential for managing cities and integrating complex, heterogeneous data from diverse sources. Creating UDTs involves challenges at multiple process stages, including acquiring accurate 3D source data, reconstructing high-fidelity 3D models, maintaining models' updates, and ensuring seamless interoperability to downstream tasks. Current datasets are usually limited to one part of the processing chain, hampering comprehensive UDTs validation. To address these challenges, we introduce the first comprehensive multimodal Urban Digital Twin benchmark dataset: TUM2TWIN. This dataset includes georeferenced, semantically aligned 3D models and networks along with various terrestrial, mobile, aerial, and satellite observations boasting 32 data subsets over roughly 100,000 $m^2$ and currently 767 GB of data. By ensuring georeferenced indoor-outdoor acquisition, high accuracy, and multimodal data integration, the benchmark supports robust analysis of sensors and the development of advanced reconstruction methods. Additionally, we explore downstream tasks demonstrating the potential of TUM2TWIN, including novel view synthesis of NeRF and Gaussian Splatting, solar potential analysis, point cloud semantic segmentation, and LoD3 building reconstruction. We are convinced this contribution lays a foundation for overcoming current limitations in UDT creation, fostering new research directions and practical solutions for smarter, data-driven urban environments. The project is available under: https://tum2t.win

* Submitted to the ISPRS Journal of Photogrammetry and Remote Sensing

Via

Access Paper or Ask Questions

Joint Knowledge and Power Management for Secure Semantic Communication Networks

Apr 21, 2025

Xuesong Liu, Yansong Liu, Haoyu Tang, Fangzhou Zhao, Le Xia, Yao Sun

Abstract:Recently, semantic communication (SemCom) has shown its great superiorities in resource savings and information exchanges. However, while its unique background knowledge guarantees accurate semantic reasoning and recovery, semantic information security-related concerns are introduced at the same time. Since the potential eavesdroppers may have the same background knowledge to accurately decrypt the private semantic information transmitted between legal SemCom users, this makes the knowledge management in SemCom networks rather challenging in joint consideration with the power control. To this end, this paper focuses on jointly addressing three core issues of power allocation, knowledge base caching (KBC), and device-to-device (D2D) user pairing (DUP) in secure SemCom networks. We first develop a novel performance metric, namely semantic secrecy throughput (SST), to quantify the information security level that can be achieved at each pair of D2D SemCom users. Next, an SST maximization problem is formulated subject to secure SemCom-related delay and reliability constraints. Afterward, we propose a security-aware resource management solution using the Lagrange primal-dual method and a two-stage method. Simulation results demonstrate our proposed solution nearly doubles the SST performance and realizes less than half of the queuing delay performance compared to different benchmarks.

Via

Access Paper or Ask Questions

Falcon: Fractional Alternating Cut with Overcoming Minima in Unsupervised Segmentation

Apr 08, 2025

Xiao Zhang, Xiangyu Han, Xiwen Lai, Yao Sun, Pei Zhang, Konrad Kording

Abstract:Today's unsupervised image segmentation algorithms often segment suboptimally. Modern graph-cut based approaches rely on high-dimensional attention maps from Transformer-based foundation models, typically employing a relaxed Normalized Cut solved recursively via the Fiedler vector (the eigenvector of the second smallest eigenvalue). Consequently, they still lag behind supervised methods in both mask generation speed and segmentation accuracy. We present a regularized fractional alternating cut (Falcon), an optimization-based K-way Normalized Cut without relying on recursive eigenvector computations, achieving substantially improved speed and accuracy. Falcon operates in two stages: (1) a fast K-way Normalized Cut solved by extending into a fractional quadratic transformation, with an alternating iterative procedure and regularization to avoid local minima; and (2) refinement of the resulting masks using complementary low-level information, producing high-quality pixel-level segmentations. Experiments show that Falcon not only surpasses existing state-of-the-art methods by an average of 2.5% across six widely recognized benchmarks (reaching up to 4.3\% improvement on Cityscapes), but also reduces runtime by around 30% compared to prior graph-based approaches. These findings demonstrate that the semantic information within foundation-model attention can be effectively harnessed by a highly parallelizable graph cut framework. Consequently, Falcon can narrow the gap between unsupervised and supervised segmentation, enhancing scalability in real-world applications and paving the way for dense prediction-based vision pre-training in various downstream tasks. The code is released in https://github.com/KordingLab/Falcon.

Via

Access Paper or Ask Questions

A semantic communication-based workload-adjustable transceiver for wireless AI-generated content (AIGC) delivery

Mar 24, 2025

Runze Cheng, Yao Sun, Lan Zhang, Lei Feng, Lei Zhang, Muhammad Ali Imran

Abstract:With the significant advances in generative AI (GAI) and the proliferation of mobile devices, providing high-quality AI-generated content (AIGC) services via wireless networks is becoming the future direction. However, the primary challenges of AIGC service delivery in wireless networks lie in unstable channels, limited bandwidth resources, and unevenly distributed computational resources. In this paper, we employ semantic communication (SemCom) in diffusion-based GAI models to propose a Resource-aware wOrkload-adjUstable TransceivEr (ROUTE) for AIGC delivery in dynamic wireless networks. Specifically, to relieve the communication resource bottleneck, SemCom is utilized to prioritize semantic information of the generated content. Then, to improve computational resource utilization in both edge and local and reduce AIGC semantic distortion in transmission, modified diffusion-based models are applied to adjust the computing workload and semantic density in cooperative content generation. Simulations verify the superiority of our proposed ROUTE in terms of latency and content quality compared to conventional AIGC approaches.

Via

Access Paper or Ask Questions

Semantic Communication for the Internet of Sounds: Architecture, Design Principles, and Challenges

Jul 16, 2024

Chengsi Liang, Yao Sun, Christo Kurisummoottil Thomas, Lina Mohjazi, Walid Saad

Figure 1 for Semantic Communication for the Internet of Sounds: Architecture, Design Principles, and Challenges

Figure 2 for Semantic Communication for the Internet of Sounds: Architecture, Design Principles, and Challenges

Figure 3 for Semantic Communication for the Internet of Sounds: Architecture, Design Principles, and Challenges

Figure 4 for Semantic Communication for the Internet of Sounds: Architecture, Design Principles, and Challenges

Abstract:The Internet of Sounds (IoS) combines sound sensing, processing, and transmission techniques, enabling collaboration among diverse sound devices. To achieve perceptual quality of sound synchronization in the IoS, it is necessary to precisely synchronize three critical factors: sound quality, timing, and behavior control. However, conventional bit-oriented communication, which focuses on bit reproduction, may not be able to fulfill these synchronization requirements under dynamic channel conditions. One promising approach to address the synchronization challenges of the IoS is through the use of semantic communication (SC) that can capture and leverage the logical relationships in its source data. Consequently, in this paper, we propose an IoS-centric SC framework with a transceiver design. The designed encoder extracts semantic information from diverse sources and transmits it to IoS listeners. It can also distill important semantic information to reduce transmission latency for timing synchronization. At the receiver's end, the decoder employs context- and knowledge-based reasoning techniques to reconstruct and integrate sounds, which achieves sound quality synchronization across diverse communication environments. Moreover, by periodically sharing knowledge, SC models of IoS devices can be updated to optimize their synchronization behavior. Finally, we explore several open issues on mathematical models, resource allocation, and cross-layer protocols.

Via

Access Paper or Ask Questions

S-RAN: Semantic-Aware Radio Access Networks

Jul 15, 2024

Yao Sun, Lan Zhang, Linke Guo, Jian Li, Dusit Niyato, Yuguang Fang

Abstract:Semantic communication (SemCom) has been a transformative paradigm, emphasizing the precise exchange of meaningful information over traditional bit-level transmissions. However, existing SemCom research, primarily centered on simplified scenarios like single-pair transmissions with direct wireless links, faces significant challenges when applied to real-world radio access networks (RANs). This article introduces a Semantic-aware Radio Access Network (S-RAN), offering a holistic systematic view of SemCom beyond single-pair transmissions. We begin by outlining the S-RAN architecture, introducing new physical components and logical functions along with key design challenges. We then present transceiver design for end-to-end transmission to overcome conventional SemCom transceiver limitations, including static channel conditions, oversimplified background knowledge models, and hardware constraints. Later, we delve into the discussion on radio resource management for multiple users, covering semantic channel modeling, performance metrics, resource management algorithms, and a case study, to elaborate distinctions from resource management for legacy RANs. Finally, we highlight open research challenges and potential solutions. The objective of this article is to serve as a basis for advancing SemCom research into practical wireless systems.

Via

Access Paper or Ask Questions

Fighter flight trajectory prediction based on spatio-temporal graphcial attention network

May 13, 2024

Yao Sun, Tengyu Jing, Jiapeng Wang, Wei Wang

Figure 1 for Fighter flight trajectory prediction based on spatio-temporal graphcial attention network

Figure 2 for Fighter flight trajectory prediction based on spatio-temporal graphcial attention network

Figure 3 for Fighter flight trajectory prediction based on spatio-temporal graphcial attention network

Figure 4 for Fighter flight trajectory prediction based on spatio-temporal graphcial attention network

Abstract:Quickly and accurately predicting the flight trajectory of a blue army fighter in close-range air combat helps a red army fighter gain a dominant situation, which is the winning factor in later air combat. However,due to the high speed and even hypersonic capabilities of advanced fighters, the diversity of tactical maneuvers,and the instantaneous nature of situational transitions,it is difficult to meet the requirements of practical combat applications in terms of prediction accuracy.To improve prediction accuracy,this paper proposes a spatio-temporal graph attention network (ST-GAT) using encoding and decoding structures to predict the flight trajectory. The encoder adopts a parallel structure of Transformer and GAT branches embedded with the multi-head self-attention mechanism in each front end. The Transformer branch network is used to extract the temporal characteristics of historical trajectories and capture the impact of the fighter's historical state on future trajectories, while the GAT branch network is used to extract spatial features in historical trajectories and capture potential spatial correlations between fighters.Then we concatenate the outputs of the two branches into a new feature vector and input it into a decoder composed of a fully connected network to predict the future position coordinates of the blue army fighter.The computer simulation results show that the proposed network significantly improves the prediction accuracy of flight trajectories compared to the enhanced CNN-LSTM network (ECNN-LSTM), with improvements of 47% and 34% in both ADE and FDE indicators,providing strong support for subsequent autonomous combat missions.

Via

Access Paper or Ask Questions

Improving Channel Resilience for Task-Oriented Semantic Communications: A Unified Information Bottleneck Approach

Apr 30, 2024

Shuai Lyu, Yao Sun, Linke Guo, Xiaoyong Yuan, Fang Fang, Lan Zhang, Xianbin Wang

Figure 1 for Improving Channel Resilience for Task-Oriented Semantic Communications: A Unified Information Bottleneck Approach

Figure 2 for Improving Channel Resilience for Task-Oriented Semantic Communications: A Unified Information Bottleneck Approach

Figure 3 for Improving Channel Resilience for Task-Oriented Semantic Communications: A Unified Information Bottleneck Approach

Abstract:Task-oriented semantic communications (TSC) enhance radio resource efficiency by transmitting task-relevant semantic information. However, current research often overlooks the inherent semantic distinctions among encoded features. Due to unavoidable channel variations from time and frequency-selective fading, semantically sensitive feature units could be more susceptible to erroneous inference if corrupted by dynamic channels. Therefore, this letter introduces a unified channel-resilient TSC framework via information bottleneck. This framework complements existing TSC approaches by controlling information flow to capture fine-grained feature-level semantic robustness. Experiments on a case study for real-time subchannel allocation validate the framework's effectiveness.

* This work has been submitted to the IEEE Communications Letters

Via

Access Paper or Ask Questions