Abstract: Medical ultrasound videos are widely used for medical inspection, disease diagnosis, and surgical planning. High-fidelity segmentation of lesion areas and target organs is a key component of the computer-assisted surgery workflow. The low contrast and noisy backgrounds of ultrasound videos cause mis-segmentation of organ boundaries, which can lead to the loss of small objects and increased boundary segmentation errors. Object tracking in long videos also remains a significant research challenge. To overcome these challenges, we propose a memory bank-based wavelet filtering and fusion network that adopts an encoder-decoder structure to extract fine-grained spatial detail and integrate high-frequency (HF) information. Specifically, a memory-based wavelet convolution is introduced in the encoder to simultaneously capture category-level and detailed information while exploiting adjacent-frame information. Cascaded wavelet compression fuses multiscale frequency-domain features and expands the receptive field within each convolutional layer. A long short-term memory bank with cross-attention and memory compression mechanisms is designed to track objects in long videos. To fully exploit the boundary-sensitive HF details of feature maps, an HF-aware feature fusion module built on adaptive wavelet filters is designed for the decoder. In extensive benchmarks on four ultrasound video datasets (two thyroid nodule datasets, a thyroid gland dataset, and a heart dataset), our method shows marked improvements in segmentation metrics over state-of-the-art methods. In particular, it segments small thyroid nodules more accurately, demonstrating its effectiveness for small ultrasound objects in long videos. The code is available at https://github.com/XiAooZ/MWNet.
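The wavelet-based HF fusion idea can be illustrated with a minimal PyTorch sketch (not the authors' MWNet code): a fixed Haar wavelet transform, implemented as a depthwise stride-2 convolution, splits an encoder feature map into one low-frequency and three high-frequency bands, and the HF bands then gate a decoder feature map. The class names HaarDWT and HFFusion are hypothetical placeholders introduced only for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HaarDWT(nn.Module):
    """Single-level 2-D Haar wavelet decomposition as a fixed, depthwise,
    stride-2 convolution. Returns the low-frequency band and the three
    high-frequency (detail) bands."""

    def __init__(self, channels: int):
        super().__init__()
        ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
        lh = torch.tensor([[-0.5, -0.5], [0.5, 0.5]])
        hl = torch.tensor([[-0.5, 0.5], [-0.5, 0.5]])
        hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
        filters = torch.stack([ll, lh, hl, hh]).unsqueeze(1)  # (4, 1, 2, 2)
        filters = filters.repeat(channels, 1, 1, 1)           # (4C, 1, 2, 2)
        self.register_buffer("filters", filters)
        self.channels = channels

    def forward(self, x):
        # Depthwise conv: each input channel yields its own 4 wavelet bands.
        out = F.conv2d(x, self.filters, stride=2, groups=self.channels)
        b, _, h, w = out.shape
        out = out.view(b, self.channels, 4, h, w)
        ll, lh, hl, hh = out.unbind(dim=2)
        return ll, (lh, hl, hh)


class HFFusion(nn.Module):
    """Illustrative HF-aware fusion: the three high-frequency bands of the
    encoder feature produce a gate that re-injects boundary detail into the
    decoder feature."""

    def __init__(self, channels: int):
        super().__init__()
        self.dwt = HaarDWT(channels)
        self.gate = nn.Sequential(
            nn.Conv2d(3 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, encoder_feat, decoder_feat):
        _, (lh, hl, hh) = self.dwt(encoder_feat)
        hf = torch.cat([lh, hl, hh], dim=1)
        hf = F.interpolate(hf, size=decoder_feat.shape[-2:],
                           mode="bilinear", align_corners=False)
        return decoder_feat * (1.0 + self.gate(hf))
```

For example, `HFFusion(64)(torch.randn(1, 64, 64, 64), torch.randn(1, 64, 32, 32))` returns a gated decoder feature of shape (1, 64, 32, 32); in MWNet the filters are additionally adaptive rather than fixed Haar kernels.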
Abstract: Current embodied AI systems face severe engineering impediments, primarily poor cross-scenario adaptability, rigid inter-module coupling, and fragmented inference acceleration. To overcome these limitations, we propose RoboNeuron, a universal deployment framework for embodied intelligence. RoboNeuron is the first framework to deeply integrate the cognitive capabilities of Large Language Models (LLMs) and Vision-Language-Action (VLA) models with the real-time execution backbone of the Robot Operating System (ROS). We utilize the Model Context Protocol (MCP) as a semantic bridge that enables the LLM to dynamically orchestrate underlying robotic tools. The framework establishes a highly modular architecture that strictly decouples sensing, reasoning, and control by leveraging ROS's unified communication interfaces. Crucially, we introduce an automated tool that translates ROS messages into callable MCP functions, significantly streamlining development. RoboNeuron significantly enhances cross-scenario adaptability and component flexibility, while establishing a systematic platform for horizontal performance benchmarking and laying a robust foundation for scalable real-world embodied applications.
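The ROS-to-MCP bridging described above can be sketched roughly as follows, assuming ROS 2's rclpy and the FastMCP helper from the official MCP Python SDK; the server name, tool name, and topic are hypothetical placeholders, not RoboNeuron's actual interface.

```python
# Minimal sketch: expose a ROS 2 velocity publisher as an MCP tool an LLM can call.
import rclpy
from geometry_msgs.msg import Twist
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("robot-tools")                # hypothetical MCP server name

rclpy.init()
node = rclpy.create_node("mcp_bridge")
cmd_pub = node.create_publisher(Twist, "/cmd_vel", 10)


@mcp.tool()
def drive(linear_x: float, angular_z: float) -> str:
    """Publish a velocity command so the LLM can steer the robot."""
    msg = Twist()
    msg.linear.x = linear_x
    msg.angular.z = angular_z
    cmd_pub.publish(msg)
    return f"published Twist(linear.x={linear_x}, angular.z={angular_z})"


if __name__ == "__main__":
    mcp.run()   # serves the tool over stdio to an MCP-capable LLM client
```

An automated translator as described in the abstract would generate one such tool wrapper per ROS message/topic pair instead of writing them by hand.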
Abstract: Most deep learning methods that achieve high segmentation accuracy require deep network architectures that are too heavy and complex to run on embedded devices with limited storage and memory. To address this issue, this paper proposes an efficient Generative Adversarial Transformer (GATrans) that achieves high-precision semantic segmentation while remaining extremely compact. The framework uses a Global Transformer Network (GTNet) as the generator, efficiently extracting multi-level features through residual connections. GTNet employs global transformer blocks with progressively linear computational complexity to reassign global features based on a learnable similarity function. To focus on both object-level and pixel-level information, GATrans optimizes an objective function that incorporates structural similarity losses. We validate the effectiveness of our approach through extensive experiments on the Vaihingen dataset, achieving an average F1 score of 90.17% and an overall accuracy of 91.92%.
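The linear-complexity global attention mentioned above can be illustrated with a generic efficient-attention sketch in PyTorch: queries and keys are normalized separately so the key-value product is formed first, giving O(N) rather than O(N^2) cost in the number of tokens. This is a stand-in under stated assumptions, not the paper's actual GTNet block; the class name LinearGlobalAttention is hypothetical.

```python
import torch
import torch.nn as nn


class LinearGlobalAttention(nn.Module):
    """Illustrative linear-complexity global attention (efficient-attention
    style): softmax over the feature dim for queries and over the token dim
    for keys lets the (dk x dk) key-value context be computed first."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.dk = heads, dim // heads
        self.to_qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                           # x: (B, N, dim)
        b, n, _ = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        # split heads -> (B, heads, N, dk)
        q = q.view(b, n, self.heads, self.dk).transpose(1, 2)
        k = k.view(b, n, self.heads, self.dk).transpose(1, 2)
        v = v.view(b, n, self.heads, self.dk).transpose(1, 2)
        q = q.softmax(dim=-1)                       # normalize over features
        k = k.softmax(dim=-2)                       # normalize over tokens
        context = k.transpose(-2, -1) @ v           # (B, heads, dk, dk)
        out = q @ context                           # (B, heads, N, dk)
        out = out.transpose(1, 2).reshape(b, n, self.heads * self.dk)
        return self.proj(out)
```

For a 256x256 feature map flattened to N = 65,536 tokens, the context matrix stays at dk x dk, which is what makes the global operation affordable on embedded hardware.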