Yi Ma

ResiTok: A Resilient Tokenization-Enabled Framework for Ultra-Low-Rate and Robust Image Transmission

May 03, 2025

Zhenyu Liu, Yi Ma, Rahim Tafazolli

Abstract: Real-time transmission of visual data over wireless networks remains highly challenging, even when leveraging advanced deep neural networks, particularly under severe channel conditions such as limited bandwidth and weak connectivity. In this paper, we propose a novel Resilient Tokenization-Enabled (ResiTok) framework designed for ultra-low-rate image transmission that achieves exceptional robustness while maintaining high reconstruction quality. By reorganizing visual information into hierarchical token groups consisting of essential key tokens and supplementary detail tokens, ResiTok enables progressive encoding and graceful degradation of visual quality under constrained channel conditions. A key contribution is our resilient 1D tokenization method integrated with a specialized zero-out training strategy, which systematically simulates token loss during training, empowering the neural network to effectively compress and reconstruct images from incomplete token sets. Furthermore, the channel-adaptive coding and modulation design dynamically allocates coding resources according to prevailing channel conditions, yielding superior semantic fidelity and structural consistency even at extremely low channel bandwidth ratios. Evaluation results demonstrate that ResiTok outperforms state-of-the-art methods in both semantic similarity and visual quality, with significant advantages under challenging channel conditions.
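The zero-out training strategy can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that tokens form a flat 1D list with the key tokens first and the detail tokens after them in decreasing order of importance, and that token loss is simulated by zeroing a random-length tail of detail tokens. The function `zero_out_detail_tokens` is hypothetical and is not the authors' code.

```python
import random
from typing import List, Optional


def zero_out_detail_tokens(tokens: List[float], num_key: int,
                           rng: Optional[random.Random] = None) -> List[float]:
    """Return a copy of `tokens` with a random-length tail of detail tokens zeroed.

    Hypothetical layout: tokens[:num_key] are key tokens and are always kept;
    the remaining detail tokens are assumed to be ordered by decreasing
    importance, so dropping a suffix mimics losing the least important ones.
    """
    rng = rng or random.Random()
    num_detail = len(tokens) - num_key
    if num_detail <= 0:
        # Nothing but key tokens: keep everything.
        return list(tokens)
    # Keep between 0 and num_detail detail tokens (both ends inclusive).
    keep = rng.randint(0, num_detail)
    cutoff = num_key + keep
    return [t if i < cutoff else 0.0 for i, t in enumerate(tokens)]
```

Applied to each training sample before decoding, such masking exposes the decoder to every truncation level, which is one plausible way to obtain the graceful degradation the abstract describes.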


SA-MIMO: Scalable Quantum-Based Wireless Communications

Apr 27, 2025

Jiuyu Liu, Yi Ma, Rahim Tafazolli


Abstract: Rydberg atomic receivers offer a quantum-native alternative to conventional RF front-ends by directly detecting electromagnetic fields via highly excited atomic states. While their quantum-limited sensitivity and hardware simplicity make them promising for future wireless systems, extending their use to scalable multi-antenna and multi-carrier configurations, termed Scalable Atomic-MIMO (SA-MIMO), remains largely unexplored. This paper introduces a novel RF transmitter-atomic receiver architecture that addresses this gap. The core idea lies in a novel modulation technique called Phase-Rotated Symbol Spreading (PRSS), which transforms the nonlinear phase retrieval problem inherent to atomic detection into a tractable linear demultiplexing task. PRSS enables efficient signal processing and supports scalable MUX/DeMUX operations in both atomic MIMO and atomic OFDM systems. Simulation results show that the proposed system achieves gains of up to 2.5 dB under optimal maximum-likelihood detection and over 10 dB under suboptimal detection in MIMO settings. These results establish PRSS-assisted SA-MIMO as a promising architecture for realizing high-sensitivity, interference-resilient atomic wireless communication.

* This work has been submitted to the IEEE for possible publication


Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs

Apr 21, 2025

Chun-Hsiao Yeh, Chenyu Wang, Shengbang Tong, Ta-Ying Cheng, Rouyu Wang, Tianzhe Chu, Yuexiang Zhai, Yubei Chen, Shenghua Gao, Yi Ma


Abstract: Multi-view understanding, the ability to reconcile visual information across diverse viewpoints for effective navigation, manipulation, and 3D scene comprehension, is a fundamental challenge for Multi-Modal Large Language Models (MLLMs) used as embodied agents. While recent MLLMs have shown impressive advances in high-level reasoning and planning, they frequently fall short when confronted with multi-view geometric consistency and cross-view correspondence. To comprehensively evaluate the challenges MLLMs face in multi-view scene reasoning, we propose All-Angles Bench, a benchmark of over 2,100 carefully human-annotated multi-view question-answer pairs across 90 diverse real-world scenes. Our six tasks (counting, attribute identification, relative distance, relative direction, object manipulation, and camera pose estimation) specifically test models' geometric correspondence and their capacity to align information consistently across views. Our extensive experiments, benchmarking 27 representative MLLMs including Gemini-2.0-Flash, Claude-3.7-Sonnet, and GPT-4o against human evaluators, reveal a substantial performance gap, indicating that current MLLMs remain far from human-level proficiency. Through in-depth analysis, we show that MLLMs particularly underperform in two aspects: (1) cross-view correspondence for partially occluded views and (2) establishing coarse camera poses. These findings highlight the need for domain-specific refinements or modules that embed stronger multi-view awareness. We believe that All-Angles Bench offers valuable insights and contributes to bridging the gap between MLLMs and human-level multi-view understanding. The project and benchmark are publicly available at https://danielchyeh.github.io/All-Angles-Bench/.

* Project page: https://danielchyeh.github.io/All-Angles-Bench/
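Scoring a multiple-choice benchmark of this kind reduces to computing accuracy per task and overall. The sketch below assumes a hypothetical record format of `(task, predicted_answer, gold_answer)`; it is an illustration, not the benchmark's official evaluation script.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_task_accuracy(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Compute accuracy for each task plus an 'overall' entry.

    Each record is (task, predicted_answer, gold_answer); this schema is a
    hypothetical choice for illustration.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for task, pred, gold in records:
        total[task] += 1
        # Normalize case and whitespace so "B" and " b " count as the same option.
        if pred.strip().lower() == gold.strip().lower():
            correct[task] += 1
    scores = {task: correct[task] / total[task] for task in total}
    n = sum(total.values())
    scores["overall"] = sum(correct.values()) / n if n else 0.0
    return scores
```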


Block-Weighted Lasso for Joint Optimization of Memory Depth and Kernels in Wideband DPD

Apr 18, 2025

Jinfei Wang, Yi Ma, Fei Tong, Ziming He

Abstract: Optimizing both the memory depth and the kernel functions is critical for wideband digital pre-distortion (DPD). However, the memory depth is usually determined via an exhaustive search over a wide range for the sake of linearization optimality, followed by kernel selection for each memory depth, which incurs excessive computational cost. In this letter, we aim to provide an efficient solution that jointly optimizes the memory depth and kernels while preserving reasonable linearization performance. Specifically, we propose to formulate this optimization as a block-weighted least absolute shrinkage and selection operator (Lasso) problem, where kernels are assigned regularization weights based on their polynomial orders. Then, a block coordinate descent algorithm is introduced to solve the block-weighted Lasso problem. Measurement results on a generalized memory polynomial (GMP) model demonstrate that our proposed solution reduces memory depth by 31.6% and kernel count by 85% compared to the full GMP, while achieving -46.4 dB error vector magnitude (EVM) for signals of 80 MHz bandwidth. In addition, the proposed solution outperforms both the full GMP and the GMP pruned by standard Lasso by at least 0.7 dB in EVM.

* 4 pages, 1 figure
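A weighted Lasso of this kind can be solved by cycling through the kernel blocks and applying a soft-thresholding update to each coefficient. The sketch below works under stated assumptions: real-valued data (GMP kernels are normally complex-valued), the objective $\tfrac{1}{2}\|y - Xw\|_2^2 + \lambda \sum_b \alpha_b \sum_{j \in b} |w_j|$ with one weight $\alpha_b$ per polynomial-order block, and plain Python lists instead of a numerics library. The name `block_weighted_lasso` is hypothetical.

```python
from typing import List, Sequence


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|w|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def block_weighted_lasso(X: Sequence[Sequence[float]], y: Sequence[float],
                         blocks: List[List[int]], block_weights: List[float],
                         lam: float, n_sweeps: int = 200) -> List[float]:
    """Minimize 0.5*||y - Xw||^2 + lam * sum_b block_weights[b] * sum_{j in b} |w_j|.

    `blocks` lists the column indices of each block (assumed here: one block per
    polynomial order); every column must belong to exactly one block.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    r = list(y)  # residual y - Xw, starting from w = 0
    for _ in range(n_sweeps):
        for b, cols in enumerate(blocks):
            t = lam * block_weights[b]
            for j in cols:
                if col_sq[j] == 0.0:
                    continue
                # Correlation of column j with the partial residual that excludes w_j.
                rho = sum(X[i][j] * r[i] for i in range(n)) + col_sq[j] * w[j]
                new = soft_threshold(rho, t) / col_sq[j]
                delta = new - w[j]
                if delta != 0.0:
                    for i in range(n):
                        r[i] -= X[i][j] * delta
                    w[j] = new
    return w
```

Kernels whose coefficients end at exactly zero are pruned; assigning larger $\alpha_b$ to higher polynomial orders (an assumption here) pushes those kernels out first.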


Data-Importance-Aware Power Allocation for Adaptive Real-Time Communication in Computer Vision Applications

Apr 11, 2025

Chunmei Xu, Yi Ma, Rahim Tafazolli, Jiangzhou Wang

Abstract: Life-transformative applications such as immersive extended reality are revolutionizing wireless communications and computer vision (CV). This paper presents a novel framework for importance-aware adaptive data transmission, designed specifically for real-time CV applications where task-specific fidelity is critical. A novel importance-weighted mean square error (IMSE) metric is introduced as a task-oriented measure of reconstruction quality, considering sub-pixel-level importance (SP-I) and semantic segment-level importance (SS-I) models. To minimize IMSE under a total power constraint, data-importance-aware waterfilling approaches are proposed to optimally allocate transmission power according to data importance and channel conditions, prioritizing sub-streams with high importance. Simulation results demonstrate that the proposed approaches significantly outperform margin-adaptive waterfilling and equal power allocation strategies. The data partitioning that combines both SP-I and SS-I models is shown to achieve the most significant improvements, with normalized IMSE gains exceeding $7\,$dB and $10\,$dB over the baselines at high SNRs ($>10\,$dB). These substantial gains highlight the potential of the proposed framework to enhance data efficiency and robustness in real-time CV applications, especially in bandwidth-limited and resource-constrained environments.

* Submitted to JSAC
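The idea of weighting power by importance can be sketched with the classic weighted water-filling rule $p_i = \max(0,\, w_i \nu - 1/g_i)$, where $w_i$ is the importance weight of sub-stream $i$, $g_i$ its channel gain-to-noise ratio, and the water level $\nu$ is found by bisection so that $\sum_i p_i$ equals the power budget. This is a stand-in: the rule maximizes $\sum_i w_i \log(1 + g_i p_i)$ rather than the IMSE objective of the paper, whose closed form the abstract does not give. The function name is hypothetical.

```python
from typing import List


def weighted_waterfilling(weights: List[float], gains: List[float],
                          total_power: float, iters: int = 200) -> List[float]:
    """Allocate p_i = max(0, w_i * nu - 1/g_i) with sum(p_i) = total_power.

    Stand-in: classic weighted water-filling (maximizes sum_i w_i*log(1 + g_i*p_i)),
    not the paper's IMSE-optimal rule. Weights and gains must be positive.
    """
    def alloc(nu: float) -> List[float]:
        return [max(0.0, w * nu - 1.0 / g) for w, g in zip(weights, gains)]

    lo, hi = 0.0, 1.0
    # Raise the water level until the allocation uses at least the budget.
    while sum(alloc(hi)) < total_power:
        hi *= 2.0
    # Bisection on the water level nu.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(alloc(mid)) < total_power:
            lo = mid
        else:
            hi = mid
    return alloc(hi)
```

A higher weight raises a sub-stream's individual water level, so important streams receive more power, while streams with very weak gains may receive none.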


Data-Importance-Aware Waterfilling for Adaptive Real-Time Communication in Computer Vision Applications

Feb 28, 2025

Chunmei Xu, Yi Ma, Rahim Tafazolli


Abstract: This paper presents a novel framework for importance-aware adaptive data transmission, designed specifically for real-time computer vision (CV) applications where task-specific fidelity is critical. An importance-weighted mean square error (IMSE) metric is introduced, assigning data importance based on bit positions within pixels and semantic relevance within visual segments, thus providing a task-oriented measure of reconstruction quality. To minimize IMSE under the total power constraint, a data-importance-aware waterfilling approach is proposed to optimally allocate transmission power according to data importance and channel conditions. Simulation results demonstrate that the proposed approach significantly outperforms margin-adaptive waterfilling and equal power allocation strategies, achieving more than $7$ dB and $10$ dB gains in normalized IMSE at high SNRs ($> 10$ dB), respectively. These results highlight the potential of the proposed framework to enhance data efficiency and robustness in real-time CV applications, especially in bandwidth-limited and resource-constrained environments.

* Accepted in IEEE ICC2025
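The rationale for bit-position importance follows from simple arithmetic: flipping bit $k$ of an integer pixel changes its value by $2^k$, so the squared error is $4^k$. The sketch below computes those per-bit errors and an importance-weighted MSE with one semantic weight per pixel; the weighting scheme and both function names are illustrative assumptions, not the paper's exact IMSE definition.

```python
from typing import List, Sequence


def bit_position_weights(bit_depth: int = 8) -> List[int]:
    """Squared error caused by flipping bit k alone, i.e. (2**k)**2 = 4**k.

    Illustrative assumption: bit importance is taken proportional to this error.
    """
    return [4 ** k for k in range(bit_depth)]


def imse(original: Sequence[int], received: Sequence[int],
         segment_weights: Sequence[float]) -> float:
    """Importance-weighted MSE with one (hypothetical) semantic weight per pixel."""
    num = sum(s * (x - y) ** 2 for x, y, s in zip(original, received, segment_weights))
    return num / sum(segment_weights)
```

For an 8-bit pixel, an error in the most significant bit costs $4^7 = 16384$ times as much as an error in the least significant bit, which is why an importance-aware allocation should favor high-order bits.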


Importance-Aware Source-Channel Coding for Multi-Modal Task-Oriented Semantic Communication

Feb 22, 2025

Yi Ma, Chunmei Xu, Zhenyu Liu, Siqi Zhang, Rahim Tafazolli

Abstract: This paper explores the concept of information importance in multi-modal task-oriented semantic communication systems, emphasizing the need for high accuracy and efficiency to fulfill task-specific objectives. At the transmitter, generative AI (GenAI) is employed to partition visual data objects into semantic segments, each representing distinct, task-relevant information. These segments are subsequently encoded into tokens, enabling precise and adaptive transmission control. Building on this framework, we present importance-aware source and channel coding strategies that dynamically adjust to varying levels of significance at the segment, token, and bit levels. The proposed strategies prioritize high fidelity for essential information while permitting controlled distortion for less critical elements, optimizing overall resource utilization. Furthermore, we address the source-channel coding challenge in semantic multiuser systems, particularly in multicast scenarios, where segment importance varies among receivers. To tackle these challenges, we propose solutions such as rate-splitting coded progressive transmission, ensuring flexibility and robustness in task-specific semantic communication.

* Accepted by IEEE ICMLCN 2025
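Prioritizing essential segments can be illustrated, in its simplest form, as splitting a fixed budget of tokens or bits across semantic segments in proportion to their importance. The sketch uses the largest-remainder method with integer importance scores; it is a generic allocation rule chosen for illustration, not the rate-splitting scheme proposed in the paper, and `allocate_budget` is a hypothetical name.

```python
from typing import List


def allocate_budget(importance: List[int], budget: int) -> List[int]:
    """Split an integer budget across segments in proportion to non-negative
    integer importance scores, using the largest-remainder method."""
    total = sum(importance)
    if total == 0:
        return [0] * len(importance)
    # Integer floor of each proportional share, plus its exact remainder.
    shares = [budget * w // total for w in importance]
    remainders = [budget * w % total for w in importance]
    leftover = budget - sum(shares)
    # Give the leftover units to the largest remainders (ties favour earlier segments).
    order = sorted(range(len(importance)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares
```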


Exploiting Non-uniform Quantization for Enhanced ILC in Wideband Digital Pre-distortion

Feb 12, 2025

Jinfei Wang, Yi Ma, Fei Tong, Ziming He

Abstract: In this paper, we identify that lowering the reference level at the vector signal analyzer can significantly improve the performance of iterative learning control (ILC). We present a mathematical explanation for this phenomenon: the signals undergo a logarithmic transform prior to analogue-to-digital conversion, resulting in non-uniform quantization. This process reduces the quantization noise of low-amplitude signals, which constitute a substantial portion of orthogonal frequency division multiplexing (OFDM) signals, thereby improving ILC performance. Measurement results show that, compared to setting the reference level to the peak amplitude, lowering the reference level achieves a 3 dB improvement in error vector magnitude (EVM) and a 15 dB improvement in normalized mean square error (NMSE) for 320 MHz WiFi OFDM signals.

* 4 pages, 7 figures
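The effect of logarithmic companding on quantization noise can be reproduced with a small experiment. The abstract does not specify the analyzer's transform, so μ-law companding (μ = 255) stands in for it here; the 6-bit resolution and the test amplitudes are likewise arbitrary choices.

```python
import math
from typing import List


def uniform_quantize(x: float, bits: int, full_scale: float = 1.0) -> float:
    """Round to a uniform grid with step 2*full_scale / 2**bits, clipped to +/-full_scale."""
    step = 2.0 * full_scale / (2 ** bits)
    q = round(x / step) * step
    return max(-full_scale, min(full_scale, q))


def mu_law_quantize(x: float, bits: int, mu: float = 255.0,
                    full_scale: float = 1.0) -> float:
    """Compress with a mu-law (logarithmic) curve, quantize uniformly, then expand."""
    c = math.copysign(full_scale * math.log1p(mu * abs(x) / full_scale) / math.log1p(mu), x)
    cq = uniform_quantize(c, bits, full_scale)
    return math.copysign(full_scale / mu * ((1.0 + mu) ** (abs(cq) / full_scale) - 1.0), cq)


def mse(xs: List[float], ys: List[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)
```

For low-amplitude samples the companded quantizer has a much finer effective step, so its error is lower; near full scale the trade-off reverses. OFDM signals, with their high peak-to-average power ratio, spend most of their time at low amplitudes, which matches the mechanism the abstract describes.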


Improved YOLOv7 model for insulator defect detection

Feb 11, 2025

Zhenyue Wang, Guowu Yuan, Hao Zhou, Yi Ma, Yutang Ma, Dong Chen

Abstract: Insulators are crucial insulation components and structural supports in power grids, playing a vital role in transmission lines. Due to temperature fluctuations, internal stress, or hail impacts, insulators are prone to damage. Automatic detection of damaged insulators faces challenges such as diverse insulator types, small defect targets, and complex backgrounds and shapes. Most research on insulator defect detection has focused on a single defect type or a specific material. However, the insulators in a grid's transmission lines come in different colors and materials, and various insulator defects coexist, so existing methods have difficulty meeting practical application requirements: they suffer from low detection accuracy, and their mAP_0.5 falls short of what applications require. This paper proposes an improved YOLOv7 model for multi-type insulator defect detection. First, our model replaces the SPPCSPC module with the RFB module to enhance the network's feature extraction capability. Second, a CA mechanism is introduced into the head to enhance the network's feature representation ability and improve detection accuracy. Third, a WIoU loss function is employed to mitigate the low-quality samples that hinder model generalization during training, thereby improving the model's overall performance. The experimental results indicate improvements across various performance metrics: a 1.6% increase in mAP_0.5, a 1.6% increase in mAP_0.5:0.95, a 1.3% increase in precision, and a 1% increase in recall. Moreover, the model has 3.2 million fewer parameters, reducing computational cost by 2.5 GFLOPs. Notably, single-image detection time also improves by 2.81 milliseconds.

* 19 pages, 13 figures
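The abstract does not say which Wise-IoU (WIoU) variant is used. As an illustration, the sketch below implements the v1 form from the original Wise-IoU paper, $\mathcal{L} = R_{WIoU}\,(1 - \mathrm{IoU})$ with $R_{WIoU} = \exp\big(((x - x_{gt})^2 + (y - y_{gt})^2)/(W_g^2 + H_g^2)\big)$, where $(x, y)$ are box centres and $W_g, H_g$ the width and height of the smallest enclosing box; later variants add a dynamic focusing coefficient on top of this. Boxes are `(x1, y1, x2, y2)` tuples with positive area, by assumption.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def wiou_v1_loss(pred: Box, gt: Box) -> float:
    """Wise-IoU v1: R_WIoU * (1 - IoU).

    In a training framework the enclosing-box term is detached from the
    gradient; this plain-Python version computes values only.
    """
    px, py = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    r = math.exp(((px - gx) ** 2 + (py - gy) ** 2) / (wg ** 2 + hg ** 2))
    return r * (1.0 - iou(pred, gt))
```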


Foreign-Object Detection in High-Voltage Transmission Line Based on Improved YOLOv8m

Feb 11, 2025

Zhenyue Wang, Guowu Yuan, Hao Zhou, Yi Ma, Yutang Ma

Abstract: The safe operation of high-voltage transmission lines is essential to the power grid's security. Various foreign objects attached to transmission lines, such as balloons, kites, and nesting birds, can significantly affect the safe and stable operation of high-voltage transmission lines. With the advancement of computer vision technology, periodic automatic inspection for foreign objects is both efficient and necessary. Existing detection methods have low accuracy because foreign objects attached to transmission lines present complex conditions, including occlusions, diverse object types, significant scale variations, and complex backgrounds. In response to the practical needs of the Yunnan Branch of China Southern Power Grid Co., Ltd., this paper proposes an improved YOLOv8m-based model for detecting foreign objects on transmission lines. Experiments are conducted on a dataset collected from Yunnan Power Grid. The proposed model enhances the original YOLOv8m by incorporating a Global Attention Module (GAM) into the backbone to focus on occluded foreign objects, replacing the SPPF module with the SPPCSPC module to augment the model's multiscale feature extraction capability, and introducing the Focal-EIoU loss function to address the imbalance between high- and low-quality samples. These improvements accelerate model convergence and enhance detection accuracy. The experimental results demonstrate that our proposed model achieves a 2.7% increase in mAP_0.5, a 4% increase in mAP_0.5:0.95, and a 6% increase in recall.

* 24 pages, 16 figures
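The Focal-EIoU loss multiplies the EIoU loss by $\mathrm{IoU}^{\gamma}$, where $\mathcal{L}_{EIoU} = 1 - \mathrm{IoU} + \rho^2(b, b^{gt})/c^2 + \rho^2(w, w^{gt})/C_w^2 + \rho^2(h, h^{gt})/C_h^2$, with $\rho$ the Euclidean distance, $b$ the box centre, $C_w, C_h$ the width and height of the smallest enclosing box, and $c$ its diagonal. The sketch below evaluates it for a single box pair in plain Python; the `(x1, y1, x2, y2)` box format and the default $\gamma = 0.5$ are assumptions, since the abstract does not state the paper's hyperparameters.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), positive area assumed


def focal_eiou_loss(pred: Box, gt: Box, gamma: float = 0.5) -> float:
    """Focal-EIoU: IoU**gamma * L_EIoU for one predicted/ground-truth box pair."""
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter)
    # Width and height of the smallest enclosing box; c^2 = cw^2 + ch^2.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    # Squared distance between box centres.
    dc2 = ((pred[0] + pred[2] - gt[0] - gt[2]) / 2) ** 2 + ((pred[1] + pred[3] - gt[1] - gt[3]) / 2) ** 2
    eiou = 1 - iou + dc2 / (cw ** 2 + ch ** 2) + (pw - gw) ** 2 / cw ** 2 + (ph - gh) ** 2 / ch ** 2
    return iou ** gamma * eiou
```

The $\mathrm{IoU}^{\gamma}$ factor down-weights low-overlap (low-quality) predictions, which is how the loss addresses the imbalance between high- and low-quality samples.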
