Hongjin Song

Joint Residual Reweighting for Classifier Free Guidance in Flow-Matching Zero-Shot TTS

Jun 24, 2026

Runwu Shi, Yujin Wang, Hongjin Song, Chunxiang Jin

Abstract: Classifier-free guidance (CFG) is widely used in flow-matching-based zero-shot text-to-speech (TTS), where generation is typically controlled by two conditions: the target text and a prompt speech signal. Standard CFG strengthens these conditions jointly, while recent branch-selective guidance methods attempt to enhance text or speaker conditioning separately, often leading to a trade-off between text correctness and speaker similarity. In this paper, we revisit CFG under independently masked text and speech-prompt conditions, and decompose the guidance field into text, speaker, and joint residuals. We show that conventional speaker-selective guidance entangles the speaker residual with the joint residual, which may disturb text-related generation. Based on this observation, we propose joint residual reweighting, which independently controls the speaker and joint residuals within the standard CFG framework. Experiments on F5-TTS and CosyVoice2 show that the proposed method improves speaker similarity while maintaining competitive text correctness, demonstrating the usefulness of the joint residual for balancing speaker fidelity and text accuracy in zero-shot TTS.
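The abstract does not state the decomposition formulas, so the following is only a minimal sketch of one plausible reading: an inclusion-exclusion split over the four predictions obtained by masking the text and the speech prompt independently. The function names, the weights `w_text`, `w_spk`, `w_joint`, and the random vectors standing in for model outputs are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def decompose_guidance(v_ts, v_t, v_s, v_0):
    """Split predictions into text, speaker, and joint residuals (assumed form).

    v_ts: prediction with text and speech prompt
    v_t:  prediction with text only (speech prompt masked)
    v_s:  prediction with speech prompt only (text masked)
    v_0:  prediction with both conditions masked
    """
    r_text = v_t - v_0
    r_spk = v_s - v_0
    # Whatever the two single-condition residuals do not explain.
    r_joint = v_ts - v_t - v_s + v_0
    return r_text, r_spk, r_joint

def reweighted_velocity(v_ts, v_t, v_s, v_0, w_text=1.0, w_spk=1.0, w_joint=1.0):
    """Recombine the residuals with independent (hypothetical) weights."""
    r_text, r_spk, r_joint = decompose_guidance(v_ts, v_t, v_s, v_0)
    return v_0 + w_text * r_text + w_spk * r_spk + w_joint * r_joint

# Random vectors stand in for the four model outputs.
rng = np.random.default_rng(0)
v_ts, v_t, v_s, v_0 = rng.normal(size=(4, 8))
r_text, r_spk, r_joint = decompose_guidance(v_ts, v_t, v_s, v_0)

# With unit weights the fully conditioned prediction is recovered.
recovered = reweighted_velocity(v_ts, v_t, v_s, v_0)

# A speaker-selective difference (v_ts - v_t) mixes two residuals.
speaker_selective = v_ts - v_t
```

Under this assumed split, `speaker_selective` equals `r_spk + r_joint`, which is consistent with the abstract's remark that speaker-selective guidance entangles the speaker and joint residuals; giving `w_spk` and `w_joint` separate values is what lets the two be controlled independently.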

Applying Ensemble Models based on Graph Neural Network and Reinforcement Learning for Wind Power Forecasting

Jan 28, 2025

Hongjin Song, Qianrun Chen, Tianqi Jiang, Yongfeng Li, Xusheng Li, Wenjun Xi, Songtao Huang

Abstract: Accurately predicting the wind power output of a wind farm across various time scales, a task known as Wind Power Forecasting (WPF), is a critical issue in wind power trading and utilization. The WPF problem remains unresolved due to numerous influencing variables, such as wind speed, temperature, latitude, and longitude. Furthermore, achieving high prediction accuracy is crucial for maintaining electric grid stability and ensuring supply security. In this paper, we model all wind turbines within a wind farm as nodes in a graph built from their geographical locations. Accordingly, we propose an ensemble model based on graph neural networks and reinforcement learning (EMGRL) for WPF. Our approach includes: (1) applying graph neural networks to capture the time-series data from neighboring wind farms relevant to the target wind farm; (2) establishing a general state embedding that integrates the target wind farm's data with the historical performance of base models on the target wind farm; (3) ensembling and leveraging the advantages of all base models through an actor-critic reinforcement learning framework for WPF.
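The abstract says the graph is built from geographical locations but does not say how. As a rough sketch of one common choice, the snippet below connects each node to its k nearest neighbors by great-circle distance. The k-nearest-neighbor rule, the function names, and the coordinates are all assumptions for illustration and may differ from the paper's construction.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat, lon):
    """Pairwise great-circle distances in km for coordinates in degrees."""
    lat, lon = np.radians(lat), np.radians(lon)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def knn_adjacency(lat, lon, k=2):
    """Symmetric boolean adjacency linking each node to its k nearest nodes."""
    dist = haversine_km(np.asarray(lat, dtype=float), np.asarray(lon, dtype=float))
    np.fill_diagonal(dist, np.inf)             # a node is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]  # indices of the k closest nodes
    adj = np.zeros(dist.shape, dtype=bool)
    rows = np.repeat(np.arange(dist.shape[0]), k)
    adj[rows, nearest.ravel()] = True
    return adj | adj.T                         # make the graph undirected

# Hypothetical turbine coordinates (degrees).
lat = [40.00, 40.01, 40.02, 40.50]
lon = [116.00, 116.01, 116.00, 116.50]
adj = knn_adjacency(lat, lon, k=2)
```

The resulting adjacency matrix could then be handed to a graph neural network layer; a distance threshold or a Gaussian kernel on the distances would be equally reasonable alternatives.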

DST-GTN: Dynamic Spatio-Temporal Graph Transformer Network for Traffic Forecasting

Apr 18, 2024

Songtao Huang, Hongjin Song, Tianqi Jiang, Akbar Telikani, Jun Shen, Qingguo Zhou, Binbin Yong, Qiang Wu

Abstract: Accurate traffic forecasting is essential for effective urban planning and congestion management. Deep learning (DL) approaches have achieved considerable success in traffic forecasting but still face challenges in capturing the intricacies of traffic dynamics. In this paper, we identify and address these challenges by emphasizing that spatial features are inherently dynamic and change over time. A novel in-depth feature representation, called Dynamic Spatio-Temporal (Dyn-ST) features, is introduced, which encapsulates spatial characteristics across varying times. Moreover, a Dynamic Spatio-Temporal Graph Transformer Network (DST-GTN) is proposed to capture Dyn-ST features and other dynamic adjacency relations between intersections. The DST-GTN can model dynamic ST relationships between nodes accurately and refine the representation of global and local ST characteristics by adopting adaptive weights in low-pass and all-pass filters, enabling the extraction of Dyn-ST features from traffic time-series data. Through numerical experiments on public datasets, the DST-GTN achieves state-of-the-art performance for a range of traffic forecasting tasks and demonstrates enhanced stability.
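The abstract mentions weighting low-pass and all-pass filters without giving their form. A generic sketch of that idea is shown below, using the symmetrically normalized adjacency with self-loops as the low-pass filter and the identity as the all-pass filter. The filter definitions, the fixed scalars `alpha` and `beta` (which would presumably be learned in an adaptive model), and the ring graph are assumptions for illustration, not the DST-GTN architecture.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetrically normalized adjacency with self-loops (a low-pass filter)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def mixed_filter(x, adj, alpha, beta):
    """Weighted sum of a smoothed (low-pass) and an unchanged (all-pass) signal."""
    low_pass = normalized_adjacency(adj) @ x
    all_pass = x
    return alpha * low_pass + beta * all_pass

# Hypothetical 4-node ring graph with one feature per node.
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]], dtype=float)
x = np.array([[1.0], [0.0], [0.0], [0.0]])

smoothed = mixed_filter(x, ring, alpha=1.0, beta=0.0)   # pure low-pass
unchanged = mixed_filter(x, ring, alpha=0.0, beta=1.0)  # pure all-pass
blended = mixed_filter(x, ring, alpha=0.5, beta=0.5)
```

Raising `alpha` emphasizes neighborhood-level structure, while raising `beta` preserves each node's own signal; this is one way to read the abstract's balance between global and local characteristics.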
