Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jing Yang

AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

Jan 26, 2025

Junan Zhang, Jing Yang, Zihao Fang, Yuancheng Wang, Zehua Zhang, Zhuo Wang, Fan Fan, Zhizheng Wu

Figure 1 for AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

Figure 2 for AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

Figure 3 for AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

Figure 4 for AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

Abstract:We introduce AnyEnhance, a unified generative model for voice enhancement that processes both speech and singing voices. Based on a masked generative model, AnyEnhance is capable of handling both speech and singing voices, supporting a wide range of enhancement tasks including denoising, dereverberation, declipping, super-resolution, and target speaker extraction, all simultaneously and without fine-tuning. AnyEnhance introduces a prompt-guidance mechanism for in-context learning, which allows the model to natively accept a reference speaker's timbre. In this way, it could boost enhancement performance when a reference audio is available and enable the target speaker extraction task without altering the underlying architecture. Moreover, we also introduce a self-critic mechanism into the generative process for masked generative models, yielding higher-quality outputs through iterative self-assessment and refinement. Extensive experiments on various enhancement tasks demonstrate AnyEnhance outperforms existing methods in terms of both objective metrics and subjective listening tests. Demo audios are publicly available at https://amphionspace.github.io/anyenhance/.

* 12 pages, 4 figures

Via

Access Paper or Ask Questions

Average Reward Reinforcement Learning for Wireless Radio Resource Management

Jan 12, 2025

Kun Yang, Jing Yang, Cong Shen

Figure 1 for Average Reward Reinforcement Learning for Wireless Radio Resource Management

Figure 2 for Average Reward Reinforcement Learning for Wireless Radio Resource Management

Figure 3 for Average Reward Reinforcement Learning for Wireless Radio Resource Management

Figure 4 for Average Reward Reinforcement Learning for Wireless Radio Resource Management

Abstract:In this paper, we address a crucial but often overlooked issue in applying reinforcement learning (RL) to radio resource management (RRM) in wireless communications: the mismatch between the discounted reward RL formulation and the undiscounted goal of wireless network optimization. To the best of our knowledge, we are the first to systematically investigate this discrepancy, starting with a discussion of the problem formulation followed by simulations that quantify the extent of the gap. To bridge this gap, we introduce the use of average reward RL, a method that aligns more closely with the long-term objectives of RRM. We propose a new method called the Average Reward Off policy Soft Actor Critic (ARO SAC) is an adaptation of the well known Soft Actor Critic algorithm in the average reward framework. This new method achieves significant performance improvement our simulation results demonstrate a 15% gain in the system performance over the traditional discounted reward RL approach, underscoring the potential of average reward RL in enhancing the efficiency and effectiveness of wireless network optimization.

* Accepted by Asilomar 2024

Via

Access Paper or Ask Questions

Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation

Jan 07, 2025

Kam Woh Ng, Jing Yang, Jia Wei Sii, Jiankang Deng, Chee Seng Chan, Yi-Zhe Song, Tao Xiang, Xiatian Zhu

Figure 1 for Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation

Figure 2 for Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation

Figure 3 for Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation

Figure 4 for Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation

Abstract:In this paper, we push the boundaries of fine-grained 3D generation into truly creative territory. Current methods either lack intricate details or simply mimic existing objects -- we enable both. By lifting 2D fine-grained understanding into 3D through multi-view diffusion and modeling part latents as continuous distributions, we unlock the ability to generate entirely new, yet plausible parts through interpolation and sampling. A self-supervised feature consistency loss further ensures stable generation of these unseen parts. The result is the first system capable of creating novel 3D objects with species-specific details that transcend existing examples. While we demonstrate our approach on birds, the underlying framework extends beyond things that can chirp! Code will be released at https://github.com/kamwoh/chirpy3d.

* 20 pages

Via

Access Paper or Ask Questions

MarsSQE: Stereo Quality Enhancement for Martian Images Using Bi-level Cross-view Attention

Dec 30, 2024

Mai Xu, Yinglin Zhu, Qunliang Xing, Jing Yang, Xin Zou

Figure 1 for MarsSQE: Stereo Quality Enhancement for Martian Images Using Bi-level Cross-view Attention

Figure 2 for MarsSQE: Stereo Quality Enhancement for Martian Images Using Bi-level Cross-view Attention

Figure 3 for MarsSQE: Stereo Quality Enhancement for Martian Images Using Bi-level Cross-view Attention

Figure 4 for MarsSQE: Stereo Quality Enhancement for Martian Images Using Bi-level Cross-view Attention

Abstract:Stereo images captured by Mars rovers are transmitted after lossy compression due to the limited bandwidth between Mars and Earth. Unfortunately, this process results in undesirable compression artifacts. In this paper, we present a novel stereo quality enhancement approach for Martian images, named MarsSQE. First, we establish the first dataset of stereo Martian images. Through extensive analysis of this dataset, we observe that cross-view correlations in Martian images are notably high. Leveraging this insight, we design a bi-level cross-view attention-based quality enhancement network that fully exploits these inherent cross-view correlations. Specifically, our network integrates pixel-level attention for precise matching and patch-level attention for broader contextual information. Experimental results demonstrate the effectiveness of our MarsSQE approach.

Via

Access Paper or Ask Questions

RFPPO: Motion Dynamic RRT based Fluid Field - PPO for Dynamic TF/TA Routing Planning

Dec 28, 2024

Rongkun Xue, Jing Yang, Yuyang Jiang, Yiming Feng, Zi Yang

Figure 1 for RFPPO: Motion Dynamic RRT based Fluid Field - PPO for Dynamic TF/TA Routing Planning

Figure 2 for RFPPO: Motion Dynamic RRT based Fluid Field - PPO for Dynamic TF/TA Routing Planning

Figure 3 for RFPPO: Motion Dynamic RRT based Fluid Field - PPO for Dynamic TF/TA Routing Planning

Figure 4 for RFPPO: Motion Dynamic RRT based Fluid Field - PPO for Dynamic TF/TA Routing Planning

Abstract:Existing local dynamic route planning algorithms, when directly applied to terrain following/terrain avoidance, or dynamic obstacle avoidance for large and medium-sized fixed-wing aircraft, fail to simultaneously meet the requirements of real-time performance, long-distance planning, and the dynamic constraints of large and medium-sized aircraft. To deal with this issue, this paper proposes the Motion Dynamic RRT based Fluid Field - PPO for dynamic TF/TA routing planning. Firstly, the action and state spaces of the proximal policy gradient algorithm are redesigned using disturbance flow fields and artificial potential field algorithms, establishing an aircraft dynamics model, and designing a state transition process based on this model. Additionally, a reward function is designed to encourage strategies for obstacle avoidance, terrain following, terrain avoidance, and safe flight. Experimental results on real DEM data demonstrate that our algorithm can complete long-distance flight tasks through collision-free trajectory planning that complies with dynamic constraints, without the need for prior global planning.

* 2024 IEEE Intelligent Vehicles Symposium

Via

Access Paper or Ask Questions

Acquisition of Spatially-Varying Reflectance and Surface Normals via Polarized Reflectance Fields

Dec 13, 2024

Jing Yang, Pratusha Bhuvana Prasad, Qing Zhang, Yajie Zhao

Figure 1 for Acquisition of Spatially-Varying Reflectance and Surface Normals via Polarized Reflectance Fields

Figure 2 for Acquisition of Spatially-Varying Reflectance and Surface Normals via Polarized Reflectance Fields

Figure 3 for Acquisition of Spatially-Varying Reflectance and Surface Normals via Polarized Reflectance Fields

Figure 4 for Acquisition of Spatially-Varying Reflectance and Surface Normals via Polarized Reflectance Fields

Abstract:Accurately measuring the geometry and spatially-varying reflectance of real-world objects is a complex task due to their intricate shapes formed by concave features, hollow engravings and diverse surfaces, resulting in inter-reflection and occlusion when photographed. Moreover, issues like lens flare and overexposure can arise from interference from secondary reflections and limitations of hardware even in professional studios. In this paper, we propose a novel approach using polarized reflectance field capture and a comprehensive statistical analysis algorithm to obtain highly accurate surface normals (within 0.1mm/px) and spatially-varying reflectance data, including albedo, specular separation, roughness, and anisotropy parameters for realistic rendering and analysis. Our algorithm removes image artifacts via analytical modeling and further employs both an initial step and an optimization step computed on the whole image collection to further enhance the precision of per-pixel surface reflectance and normal measurement. We showcase the captured shapes and reflectance of diverse objects with a wide material range, spanning from highly diffuse to highly glossy - a challenge unaddressed by prior techniques. Our approach enhances downstream applications by offering precise measurements for realistic rendering and provides a valuable training dataset for emerging research in inverse rendering. We will release the polarized reflectance fields of several captured objects with this work.

Via

Access Paper or Ask Questions

FedDW: Distilling Weights through Consistency Optimization in Heterogeneous Federated Learning

Dec 05, 2024

Jiayu Liu, Yong Wang, Nianbin Wang, Jing Yang, Xiaohui Tao

Figure 1 for FedDW: Distilling Weights through Consistency Optimization in Heterogeneous Federated Learning

Figure 2 for FedDW: Distilling Weights through Consistency Optimization in Heterogeneous Federated Learning

Figure 3 for FedDW: Distilling Weights through Consistency Optimization in Heterogeneous Federated Learning

Figure 4 for FedDW: Distilling Weights through Consistency Optimization in Heterogeneous Federated Learning

Abstract:Federated Learning (FL) is an innovative distributed machine learning paradigm that enables neural network training across devices without centralizing data. While this addresses issues of information sharing and data privacy, challenges arise from data heterogeneity across clients and increasing network scale, leading to impacts on model performance and training efficiency. Previous research shows that in IID environments, the parameter structure of the model is expected to adhere to certain specific consistency principles. Thus, identifying and regularizing these consistencies can mitigate issues from heterogeneous data. We found that both soft labels derived from knowledge distillation and the classifier head parameter matrix, when multiplied by their own transpose, capture the intrinsic relationships between data classes. These shared relationships suggest inherent consistency. Therefore, the work in this paper identifies the consistency between the two and leverages it to regulate training, underpinning our proposed FedDW framework. Experimental results show FedDW outperforms 10 state-of-the-art FL methods, improving accuracy by an average of 3% in highly heterogeneous settings. Additionally, we provide a theoretical proof that FedDW offers higher efficiency, with the additional computational load from backpropagation being negligible. The code is available at https://github.com/liuvvvvv1/FedDW.

Via

Access Paper or Ask Questions

Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective

Dec 02, 2024

Jinouwen Zhang, Rongkun Xue, Yazhe Niu, Yun Chen, Jing Yang, Hongsheng Li, Yu Liu

Abstract:Generative models, particularly diffusion models, have achieved remarkable success in density estimation for multimodal data, drawing significant interest from the reinforcement learning (RL) community, especially in policy modeling in continuous action spaces. However, existing works exhibit significant variations in training schemes and RL optimization objectives, and some methods are only applicable to diffusion models. In this study, we compare and analyze various generative policy training and deployment techniques, identifying and validating effective designs for generative policy algorithms. Specifically, we revisit existing training objectives and classify them into two categories, each linked to a simpler approach. The first approach, Generative Model Policy Optimization (GMPO), employs a native advantage-weighted regression formulation as the training objective, which is significantly simpler than previous methods. The second approach, Generative Model Policy Gradient (GMPG), offers a numerically stable implementation of the native policy gradient method. We introduce a standardized experimental framework named GenerativeRL. Our experiments demonstrate that the proposed methods achieve state-of-the-art performance on various offline-RL datasets, offering a unified and practical guideline for training and deploying generative policies.

Via

Access Paper or Ask Questions

Pretrained Reversible Generation as Unsupervised Visual Representation Learning

Nov 29, 2024

Rongkun Xue, Jinouwen Zhang, Yazhe Niu, Dazhong Shen, Bingqi Ma, Yu Liu, Jing Yang

Abstract:Recent generative models based on score matching and flow matching have significantly advanced generation tasks, but their potential in discriminative tasks remains underexplored. Previous approaches, such as generative classifiers, have not fully leveraged the capabilities of these models for discriminative tasks due to their intricate designs. We propose Pretrained Reversible Generation (PRG), which extracts unsupervised representations by reversing the generative process of a pretrained continuous flow model. PRG effectively reuses unsupervised generative models, leveraging their high capacity to serve as robust and generalizable feature extractors for downstream tasks. Our method consistently outperforms prior approaches across multiple benchmarks, achieving state-of-the-art performance among generative model-based methods, including 78\% top-1 accuracy on ImageNet. Extensive ablation studies further validate the effectiveness of our approach.

Via

Access Paper or Ask Questions

Decision Feedback In-Context Symbol Detection over Block-Fading Channels

Nov 12, 2024

Li Fan, Jing Yang, Cong Shen

Figure 1 for Decision Feedback In-Context Symbol Detection over Block-Fading Channels

Figure 2 for Decision Feedback In-Context Symbol Detection over Block-Fading Channels

Figure 3 for Decision Feedback In-Context Symbol Detection over Block-Fading Channels

Abstract:Pre-trained Transformers, through in-context learning (ICL), have demonstrated exceptional capabilities to adapt to new tasks using example prompts \textit{without model update}. Transformer-based wireless receivers, where prompts consist of the pilot data in the form of transmitted and received signal pairs, have shown high estimation accuracy when pilot data are abundant. However, pilot information is often costly and limited in practice. In this work, we propose the \underline{DE}cision \underline{F}eedback \underline{IN}-Cont\underline{E}xt \underline{D}etection (DEFINED) solution as a new wireless receiver design, which bypasses channel estimation and directly performs symbol detection using the (sometimes extremely) limited pilot data. The key innovation in DEFINED is the proposed decision feedback mechanism in ICL, where we sequentially incorporate the detected symbols into the prompts to improve the detections for subsequent symbols. Extensive experiments across a broad range of wireless communication settings demonstrate that DEFINED achieves significant performance improvements, in some cases only needing a single pilot pair.

Via

Access Paper or Ask Questions