Abstract:Rydberg atomic receivers offer a quantum-native alternative to conventional RF front-ends by directly detecting electromagnetic fields via highly excited atomic states. While their quantum-limited sensitivity and hardware simplicity make them promising for future wireless systems, extending their use to scalable multi-antenna and multi-carrier configurations, termed Scalable Atomic-MIMO (SA-MIMO), remains largely unexplored. This paper introduces a novel RF transmitter-atomic receiver architecture that addresses this gap. The core idea lies in a novel modulation technique called Phase-Rotated Symbol Spreading (PRSS), which transforms the nonlinear phase retrieval problem inherent to atomic detection into a tractable linear demultiplexing task. PRSS enables efficient signal processing and supports scalable MUX/DeMUX operations in both atomic MIMO and atomic OFDM systems. Simulation results show that the proposed system achieves up to 2.5 dB gain under optimal maximum-likelihood detection and over 10 dB under suboptimal detection in MIMO settings. These results establish PRSS assisted SA-MIMO as a promising architecture for realizing high-sensitivity, interference-resilient atomic wireless communication.
Abstract:Multi-view understanding, the ability to reconcile visual information across diverse viewpoints for effective navigation, manipulation, and 3D scene comprehension, is a fundamental challenge in Multi-Modal Large Language Models (MLLMs) to be used as embodied agents. While recent MLLMs have shown impressive advances in high-level reasoning and planning, they frequently fall short when confronted with multi-view geometric consistency and cross-view correspondence. To comprehensively evaluate the challenges of MLLMs in multi-view scene reasoning, we propose All-Angles Bench, a benchmark of over 2,100 human carefully annotated multi-view question-answer pairs across 90 diverse real-world scenes. Our six tasks (counting, attribute identification, relative distance, relative direction, object manipulation, and camera pose estimation) specifically test model's geometric correspondence and the capacity to align information consistently across views. Our extensive experiments, benchmark on 27 representative MLLMs including Gemini-2.0-Flash, Claude-3.7-Sonnet, and GPT-4o against human evaluators reveals a substantial performance gap, indicating that current MLLMs remain far from human-level proficiency. Through in-depth analysis, we show that MLLMs are particularly underperforming under two aspects: (1) cross-view correspondence for partially occluded views and (2) establishing the coarse camera poses. These findings highlight the necessity of domain-specific refinements or modules that embed stronger multi-view awareness. We believe that our All-Angles Bench offers valuable insights and contribute to bridging the gap between MLLMs and human-level multi-view understanding. The project and benchmark are publicly available at https://danielchyeh.github.io/All-Angles-Bench/.
Abstract:The optimizations of both memory depth and kernel functions are critical for wideband digital pre-distortion (DPD). However, the memory depth is usually determined via exhaustive search over a wide range for the sake of linearization optimality, followed by the kernel selection of each memory depth, yielding excessive computational cost. In this letter, we aim to provide an efficient solution that jointly optimizes the memory depth and kernels while preserving reasonable linearization performance. Specifically, we propose to formulate this optimization as a blockweighted least absolute shrinkage and selection operator (Lasso) problem, where kernels are assigned regularization weights based on their polynomial orders. Then, a block coordinate descent algorithm is introduced to solve the block-weighted Lasso problem. Measurement results on a generalized memory polynomial (GMP) model demonstrates that our proposed solution reduces memory depth by 31.6% and kernel count by 85% compared to the full GMP, while achieving -46.4 dB error vector magnitude (EVM) for signals of 80 MHz bandwidth. In addition, the proposed solution outperforms both the full GMP and the GMP pruned by standard Lasso by at least 0.7 dB in EVM.
Abstract:Life-transformative applications such as immersive extended reality are revolutionizing wireless communications and computer vision (CV). This paper presents a novel framework for importance-aware adaptive data transmissions, designed specifically for real-time CV applications where task-specific fidelity is critical. A novel importance-weighted mean square error (IMSE) metric is introduced as a task-oriented measure of reconstruction quality, considering sub-pixel-level importance (SP-I) and semantic segment-level importance (SS-I) models. To minimize IMSE under total power constraints, data-importance-aware waterfilling approaches are proposed to optimally allocate transmission power according to data importance and channel conditions, prioritizing sub-streams with high importance. Simulation results demonstrate that the proposed approaches significantly outperform margin-adaptive waterfilling and equal power allocation strategies. The data partitioning that combines both SP-I and SS-I models is shown to achieve the most significant improvements, with normalized IMSE gains exceeding $7\,$dB and $10\,$dB over the baselines at high SNRs ($>10\,$dB). These substantial gains highlight the potential of the proposed framework to enhance data efficiency and robustness in real-time CV applications, especially in bandwidth-limited and resource-constrained environments.
Abstract:This paper presents a novel framework for importance-aware adaptive data transmission, designed specifically for real-time computer vision (CV) applications where task-specific fidelity is critical. An importance-weighted mean square error (IMSE) metric is introduced, assigning data importance based on bit positions within pixels and semantic relevance within visual segments, thus providing a task-oriented measure of reconstruction quality.To minimize IMSE under the total power constraint, a data-importance-aware waterfilling approach is proposed to optimally allocate transmission power according to data importance and channel conditions. Simulation results demonstrate that the proposed approach significantly outperforms margin-adaptive waterfilling and equal power allocation strategies, achieving more than $7$ dB and $10$ dB gains in normalized IMSE at high SNRs ($> 10$ dB), respectively. These results highlight the potential of the proposed framework to enhance data efficiency and robustness in real-time CV applications, especially in bandwidth-limited and resource-constrained environments.
Abstract:This paper explores the concept of information importance in multi-modal task-oriented semantic communication systems, emphasizing the need for high accuracy and efficiency to fulfill task-specific objectives. At the transmitter, generative AI (GenAI) is employed to partition visual data objects into semantic segments, each representing distinct, task-relevant information. These segments are subsequently encoded into tokens, enabling precise and adaptive transmission control. Building on this frame work, we present importance-aware source and channel coding strategies that dynamically adjust to varying levels of significance at the segment, token, and bit levels. The proposed strategies prioritize high fidelity for essential information while permitting controlled distortion for less critical elements, optimizing overall resource utilization. Furthermore, we address the source-channel coding challenge in semantic multiuser systems, particularly in multicast scenarios, where segment importance varies among receivers. To tackle these challenges, we propose solutions such as rate-splitting coded progressive transmission, ensuring flexibility and robustness in task-specific semantic communication.
Abstract:In this paper, it is identified that lowering the reference level at the vector signal analyzer can significantly improve the performance of iterative learning control (ILC). We present a mathematical explanation for this phenomenon, where the signals experience logarithmic transform prior to analogue-to-digital conversion, resulting in non-uniform quantization. This process reduces the quantization noise of low-amplitude signals that constitute a substantial portion of orthogonal frequency division multiplexing (OFDM) signals, thereby improving ILC performance. Measurement results show that compared to setting the reference level to the peak amplitude, lowering the reference level achieves 3 dB improvement on error vector magnitude (EVM) and 15 dB improvement on normalized mean square error (NMSE) for 320 MHz WiFi OFDM signals.
Abstract:Insulators are crucial insulation components and structural supports in power grids, playing a vital role in the transmission lines. Due to temperature fluctuations, internal stress, or damage from hail, insulators are prone to injury. Automatic detection of damaged insulators faces challenges such as diverse types, small defect targets, and complex backgrounds and shapes. Most research for detecting insulator defects has focused on a single defect type or a specific material. However, the insulators in the grid's transmission lines have different colors and materials. Various insulator defects coexist, and the existing methods have difficulty meeting the practical application requirements. Current methods suffer from low detection accuracy and mAP0.5 cannot meet application requirements. This paper proposes an improved YOLOv7 model for multi-type insulator defect detection. First, our model replaces the SPPCSPC module with the RFB module to enhance the network's feature extraction capability. Second, a CA mechanism is introduced into the head part to enhance the network's feature representation ability and to improve detection accuracy. Third, a WIoU loss function is employed to address the low-quality samples hindering model generalization during training, thereby improving the model's overall performance. The experimental results indicate that the proposed model exhibits enhancements across various performance metrics. Specifically, there is a 1.6% advancement in mAP_0.5, a corresponding 1.6% enhancement in mAP_0.5:0.95, a 1.3% elevation in precision, and a 1% increase in recall. Moreover, the model achieves parameter reduction by 3.2 million, leading to a decrease of 2.5 GFLOPS in computational cost. Notably, there is also an improvement of 2.81 milliseconds in single-image detection speed.
Abstract:The safe operation of high-voltage transmission lines ensures the power grid's security. Various foreign objects attached to the transmission lines, such as balloons, kites and nesting birds, can significantly affect the safe and stable operation of high-voltage transmission lines. With the advancement of computer vision technology, periodic automatic inspection of foreign objects is efficient and necessary. Existing detection methods have low accuracy because foreign objects at-tached to the transmission lines are complex, including occlusions, diverse object types, significant scale variations, and complex backgrounds. In response to the practical needs of the Yunnan Branch of China Southern Power Grid Co., Ltd., this paper proposes an improved YOLOv8m-based model for detecting foreign objects on transmission lines. Experiments are conducted on a dataset collected from Yunnan Power Grid. The proposed model enhances the original YOLOv8m by in-corporating a Global Attention Module (GAM) into the backbone to focus on occluded foreign objects, replacing the SPPF module with the SPPCSPC module to augment the model's multiscale feature extraction capability, and introducing the Focal-EIoU loss function to address the issue of high- and low-quality sample imbalances. These improvements accelerate model convergence and enhance detection accuracy. The experimental results demonstrate that our proposed model achieves a 2.7% increase in mAP_0.5, a 4% increase in mAP_0.5:0.95, and a 6% increase in recall.
Abstract:High-voltage transmission lines are located far from the road, resulting in inconvenient inspection work and rising maintenance costs. Intelligent inspection of power transmission lines has become increasingly important. However, subsequent intelligent inspection relies on accurately detecting various key components. Due to the low detection accuracy of key components in transmission line image inspection, this paper proposed an improved object detection model based on the YOLOv5s (You Only Look Once Version 5 Small) model to improve the detection accuracy of key components of transmission lines. According to the characteristics of the power grid inspection image, we first modify the distance measurement in the k-means clustering to improve the anchor matching of the YOLOv5s model. Then, we add the convolutional block attention module (CBAM) attention mechanism to the backbone network to improve accuracy. Finally, we apply the focal loss function to reduce the impact of class imbalance. Our improved method's mAP (mean average precision) reached 98.1%, the precision reached 97.5%, the recall reached 94.4%, and the detection rate reached 84.8 FPS (frames per second). The experimental results show that our improved model improves detection accuracy and has performance advantages over other models.