Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhuo Wang

Iterative Distortion Cancellation Algorithms for Single-Sideband Systems

Aug 12, 2025

Jun Dong, Tianwai Bo, Zhuo Wang, Haolei Gao, Zhongwei Tan, Yi Dong

Figure 1 for Iterative Distortion Cancellation Algorithms for Single-Sideband Systems

Figure 2 for Iterative Distortion Cancellation Algorithms for Single-Sideband Systems

Figure 3 for Iterative Distortion Cancellation Algorithms for Single-Sideband Systems

Figure 4 for Iterative Distortion Cancellation Algorithms for Single-Sideband Systems

Abstract:We propose an iterative distortion cancellation algorithm to digitally mitigate the impact of double-sideband dither signal amplitude from the automatic bias control module on Kramers-Kronig receivers without modifying physical layer structures. The algorithm utilizes the KK relation for initial signal decisions and reconstructs the distortion caused by dither signals. Experimental tests in back-to-back showed it improved tolerance to dither amplitudes up to 10% V{\pi}. For 80-km fiber transmission, the algorithm increased the receiver sensitivity by more than 1 dB, confirming the effectiveness of the proposed distortion cancellation method.

Via

Access Paper or Ask Questions

Test-Time Scaling with Reflective Generative Model

Jul 02, 2025

Zixiao Wang, Yuxin Wang, Xiaorui Wang, Mengting Xing, Jie Gao, Jianjun Xu, Guangcan Liu, Chenhui Jin, Zhuo Wang, Shengzhuo Zhang(+1 more)

Abstract:We introduce our first reflective generative model MetaStone-S1, which obtains OpenAI o3's performance via the self-supervised process reward model (SPRM). Through sharing the backbone network and using task-specific heads for next token prediction and process scoring respectively, SPRM successfully integrates the policy model and process reward model(PRM) into a unified interface without extra process annotation, reducing over 99% PRM parameters for efficient reasoning. Equipped with SPRM, MetaStone-S1 is naturally suitable for test time scaling (TTS), and we provide three reasoning effort modes (low, medium, and high), based on the controllable thinking length. Moreover, we empirically establish a scaling law that reveals the relationship between total thinking computation and TTS performance. Experiments demonstrate that our MetaStone-S1 achieves comparable performance to OpenAI-o3-mini's series with only 32B parameter size. To support the research community, we have open-sourced MetaStone-S1 at https://github.com/MetaStone-AI/MetaStone-S1.

Via

Access Paper or Ask Questions

FinHEAR: Human Expertise and Adaptive Risk-Aware Temporal Reasoning for Financial Decision-Making

Jun 10, 2025

Jiaxiang Chen, Mingxi Zou, Zhuo Wang, Qifan Wang, Dongning Sun, Chi Zhang, Zenglin Xu

Abstract:Financial decision-making presents unique challenges for language models, demanding temporal reasoning, adaptive risk assessment, and responsiveness to dynamic events. While large language models (LLMs) show strong general reasoning capabilities, they often fail to capture behavioral patterns central to human financial decisions-such as expert reliance under information asymmetry, loss-averse sensitivity, and feedback-driven temporal adjustment. We propose FinHEAR, a multi-agent framework for Human Expertise and Adaptive Risk-aware reasoning. FinHEAR orchestrates specialized LLM-based agents to analyze historical trends, interpret current events, and retrieve expert-informed precedents within an event-centric pipeline. Grounded in behavioral economics, it incorporates expert-guided retrieval, confidence-adjusted position sizing, and outcome-based refinement to enhance interpretability and robustness. Empirical results on curated financial datasets show that FinHEAR consistently outperforms strong baselines across trend prediction and trading tasks, achieving higher accuracy and better risk-adjusted returns.

Via

Access Paper or Ask Questions

Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation

Jun 09, 2025

Jiaxiang CHen, Zhuo Wang, Mingxi Zou, Qifan Wang, Zenglin Xu

Figure 1 for Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation

Figure 2 for Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation

Figure 3 for Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation

Figure 4 for Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation

Abstract:Human reasoning is flexible, adaptive, and grounded in prior experience-qualities that large language models (LLMs) still struggle to emulate. Existing methods either explore diverse reasoning paths at inference time or search for optimal workflows through expensive operations, but both fall short in leveraging multiple reusable strategies in a structured, efficient manner. We propose Guideline Forest, a framework that enhances LLMs reasoning by inducing structured reasoning strategies-called guidelines-from verified examples and executing them via step-wise aggregation. Unlike test-time search or single-path distillation, our method draws on verified reasoning experiences by inducing reusable guidelines and expanding each into diverse variants. Much like human reasoning, these variants reflect alternative thought patterns, are executed in parallel, refined via self-correction, and aggregated step by step-enabling the model to adaptively resolve uncertainty and synthesize robust solutions.We evaluate Guideline Forest on four benchmarks-GSM8K, MATH-500, MBPP, and HumanEval-spanning mathematical and programmatic reasoning. Guideline Forest consistently outperforms strong baselines, including CoT, ReAct, ToT, FoT, and AFlow. Ablation studies further highlight the effectiveness of multi-path reasoning and stepwise aggregation, underscoring the Guideline Forest's adaptability and generalization potential.

Via

Access Paper or Ask Questions

AutoMR: A Universal Time Series Motion Recognition Pipeline

Feb 21, 2025

Likun Zhang, Sicheng Yang, Zhuo Wang, Haining Liang, Junxiao Shen

Figure 1 for AutoMR: A Universal Time Series Motion Recognition Pipeline

Figure 2 for AutoMR: A Universal Time Series Motion Recognition Pipeline

Figure 3 for AutoMR: A Universal Time Series Motion Recognition Pipeline

Figure 4 for AutoMR: A Universal Time Series Motion Recognition Pipeline

Abstract:In this paper, we present an end-to-end automated motion recognition (AutoMR) pipeline designed for multimodal datasets. The proposed framework seamlessly integrates data preprocessing, model training, hyperparameter tuning, and evaluation, enabling robust performance across diverse scenarios. Our approach addresses two primary challenges: 1) variability in sensor data formats and parameters across datasets, which traditionally requires task-specific machine learning implementations, and 2) the complexity and time consumption of hyperparameter tuning for optimal model performance. Our library features an all-in-one solution incorporating QuartzNet as the core model, automated hyperparameter tuning, and comprehensive metrics tracking. Extensive experiments demonstrate its effectiveness on 10 diverse datasets, achieving state-of-the-art performance. This work lays a solid foundation for deploying motion-capture solutions across varied real-world applications.

* 5 figures

Via

Access Paper or Ask Questions

Duo Streamers: A Streaming Gesture Recognition Framework

Feb 17, 2025

Boxuan Zhu, Sicheng Yang, Zhuo Wang, Haining Liang, Junxiao Shen

Figure 1 for Duo Streamers: A Streaming Gesture Recognition Framework

Figure 2 for Duo Streamers: A Streaming Gesture Recognition Framework

Figure 3 for Duo Streamers: A Streaming Gesture Recognition Framework

Figure 4 for Duo Streamers: A Streaming Gesture Recognition Framework

Abstract:Gesture recognition in resource-constrained scenarios faces significant challenges in achieving high accuracy and low latency. The streaming gesture recognition framework, Duo Streamers, proposed in this paper, addresses these challenges through a three-stage sparse recognition mechanism, an RNN-lite model with an external hidden state, and specialized training and post-processing pipelines, thereby making innovative progress in real-time performance and lightweight design. Experimental results show that Duo Streamers matches mainstream methods in accuracy metrics, while reducing the real-time factor by approximately 92.3%, i.e., delivering a nearly 13-fold speedup. In addition, the framework shrinks parameter counts to 1/38 (idle state) and 1/9 (busy state) compared to mainstream models. In summary, Duo Streamers not only offers an efficient and practical solution for streaming gesture recognition in resource-constrained devices but also lays a solid foundation for extended applications in multimodal and diverse scenarios.

* 10 pages, 4 figures

Via

Access Paper or Ask Questions

Singing Voice Conversion with Accompaniment Using Self-Supervised Representation-Based Melody Features

Feb 07, 2025

Wei Chen, Binzhu Sha, Jing Yang, Zhuo Wang, Fan Fan, Zhiyong Wu

Abstract:Melody preservation is crucial in singing voice conversion (SVC). However, in many scenarios, audio is often accompanied with background music (BGM), which can cause audio distortion and interfere with the extraction of melody and other key features, significantly degrading SVC performance. Previous methods have attempted to address this by using more robust neural network-based melody extractors, but their performance drops sharply in the presence of complex accompaniment. Other approaches involve performing source separation before conversion, but this often introduces noticeable artifacts, leading to a significant drop in conversion quality and increasing the user's operational costs. To address these issues, we introduce a novel SVC method that uses self-supervised representation-based melody features to improve melody modeling accuracy in the presence of BGM. In our experiments, we compare the effectiveness of different self-supervised learning (SSL) models for melody extraction and explore for the first time how SSL benefits the task of melody extraction. The experimental results demonstrate that our proposed SVC model significantly outperforms existing baseline methods in terms of melody accuracy and shows higher similarity and naturalness in both subjective and objective evaluations across noisy and clean audio environments.

* Accepted by ICASSP2025

Via

Access Paper or Ask Questions

AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

Jan 26, 2025

Junan Zhang, Jing Yang, Zihao Fang, Yuancheng Wang, Zehua Zhang, Zhuo Wang, Fan Fan, Zhizheng Wu

Figure 1 for AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

Figure 2 for AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

Figure 3 for AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

Figure 4 for AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

Abstract:We introduce AnyEnhance, a unified generative model for voice enhancement that processes both speech and singing voices. Based on a masked generative model, AnyEnhance is capable of handling both speech and singing voices, supporting a wide range of enhancement tasks including denoising, dereverberation, declipping, super-resolution, and target speaker extraction, all simultaneously and without fine-tuning. AnyEnhance introduces a prompt-guidance mechanism for in-context learning, which allows the model to natively accept a reference speaker's timbre. In this way, it could boost enhancement performance when a reference audio is available and enable the target speaker extraction task without altering the underlying architecture. Moreover, we also introduce a self-critic mechanism into the generative process for masked generative models, yielding higher-quality outputs through iterative self-assessment and refinement. Extensive experiments on various enhancement tasks demonstrate AnyEnhance outperforms existing methods in terms of both objective metrics and subjective listening tests. Demo audios are publicly available at https://amphionspace.github.io/anyenhance/.

* 12 pages, 4 figures

Via

Access Paper or Ask Questions

Diverse Generation while Maintaining Semantic Coordination: A Diffusion-Based Data Augmentation Method for Object Detection

Aug 06, 2024

Sen Nie, Zhuo Wang, Xinxin Wang, Kun He

Abstract:Recent studies emphasize the crucial role of data augmentation in enhancing the performance of object detection models. However,existing methodologies often struggle to effectively harmonize dataset diversity with semantic coordination.To bridge this gap, we introduce an innovative augmentation technique leveraging pre-trained conditional diffusion models to mediate this balance. Our approach encompasses the development of a Category Affinity Matrix, meticulously designed to enhance dataset diversity, and a Surrounding Region Alignment strategy, which ensures the preservation of semantic coordination in the augmented images. Extensive experimental evaluations confirm the efficacy of our method in enriching dataset diversity while seamlessly maintaining semantic coordination. Our method yields substantial average improvements of +1.4AP, +0.9AP, and +3.4AP over existing alternatives on three distinct object detection models, respectively.

* 15 pages, 7 figures, ICPR2024

Via

Access Paper or Ask Questions

68-Channel Highly-Integrated Neural Signal Processing PSoC with On-Chip Feature Extraction, Compression, and Hardware Accelerators for Neuroprosthetics in 22nm FDSOI

Jul 12, 2024

Liyuan Guo, Annika Weiße, Seyed Mohammad Ali Zeinolabedin, Franz Marcus Schüffny, Marco Stolba, Qier Ma, Zhuo Wang, Stefan Scholze, Andreas Dixius, Marc Berthel(+6 more)

Figure 1 for 68-Channel Highly-Integrated Neural Signal Processing PSoC with On-Chip Feature Extraction, Compression, and Hardware Accelerators for Neuroprosthetics in 22nm FDSOI

Figure 2 for 68-Channel Highly-Integrated Neural Signal Processing PSoC with On-Chip Feature Extraction, Compression, and Hardware Accelerators for Neuroprosthetics in 22nm FDSOI

Figure 3 for 68-Channel Highly-Integrated Neural Signal Processing PSoC with On-Chip Feature Extraction, Compression, and Hardware Accelerators for Neuroprosthetics in 22nm FDSOI

Figure 4 for 68-Channel Highly-Integrated Neural Signal Processing PSoC with On-Chip Feature Extraction, Compression, and Hardware Accelerators for Neuroprosthetics in 22nm FDSOI

Abstract:Multi-channel electrophysiology systems for recording of neuronal activity face significant data throughput limitations, hampering real-time, data-informed experiments. These limitations impact both experimental neurobiology research and next-generation neuroprosthetics. We present a novel solution that leverages the high integration density of 22nm FDSOI CMOS technology to address these challenges. The proposed highly integrated programmable System-on-Chip comprises 68-channel 0.41 \textmu W/Ch recording frontends, spike detectors, 16-channel 0.87-4.39 \textmu W/Ch action potential and 8-channel 0.32 \textmu W/Ch local field potential codecs, as well as a MAC-assisted power-efficient processor operating at 25 MHz (5.19 \textmu W/MHz). The system supports on-chip training processes for compression, training and inference for neural spike sorting. The spike sorting achieves an average accuracy of 91.48% or 94.12% depending on the utilized features. The proposed PSoC is optimized for reduced area (9 mm2) and power. On-chip processing and compression capabilities free up the data bottlenecks in data transmission (up to 91% space saving ratio), and moreover enable a fully autonomous yet flexible processor-driven operation. Combined, these design considerations overcome data-bottlenecks by allowing on-chip feature extraction and subsequent compression.

* 26 pages, 14 figures, 1 table, Journal

Via

Access Paper or Ask Questions