Qijun Huang

Towards Hallucination-Free Music: A Reinforcement Learning Preference Optimization Framework for Reliable Song Generation

Aug 07, 2025

Huaicheng Zhang, Wei Tan, Guangzheng Li, Yixuan Zhang, Hangting Chen, Shun Lei, Chenyu Yang, Zhiyong Wu, Shuai Wang, Qijun Huang (+1 more)

Abstract: Recent advances in audio-based generative language models have accelerated AI-driven lyric-to-song generation. However, these models frequently suffer from content hallucination, producing outputs misaligned with the input lyrics and undermining musical coherence. Current supervised fine-tuning (SFT) approaches, limited by passive label-fitting, exhibit constrained self-improvement and poor hallucination mitigation. To address this core challenge, we propose a novel reinforcement learning (RL) framework leveraging preference optimization for hallucination control. Our key contributions include: (1) Developing a robust hallucination preference dataset constructed via phoneme error rate (PER) computation and rule-based filtering to capture alignment with human expectations; (2) Implementing and evaluating three distinct preference optimization strategies within the RL framework: Direct Preference Optimization (DPO), Proximal Policy Optimization (PPO), and Group Relative Policy Optimization (GRPO). DPO operates off-policy to enhance positive token likelihood, achieving a significant 7.4% PER reduction. PPO and GRPO employ an on-policy approach, training a PER-based reward model to iteratively optimize sequences via reward maximization and KL-regularization, yielding PER reductions of 4.9% and 4.7%, respectively. Comprehensive objective and subjective evaluations confirm that our methods effectively suppress hallucinations while preserving musical quality. Crucially, this work presents a systematic, RL-based solution to hallucination control in lyric-to-song generation. The framework's transferability also unlocks potential for music style adherence and musicality enhancement, opening new avenues for future generative song research.
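
The abstract builds its preference dataset from phoneme error rate (PER) and rule-based filtering. As a rough illustration only, the sketch below computes PER as edit distance divided by reference length and picks a (chosen, rejected) pair from candidate generations. The phoneme sequences are assumed to be already transcribed, and `build_preference_pair` and its `min_gap` threshold are hypothetical stand-ins for the paper's filtering rules, which the abstract does not specify.

```python
from typing import List, Optional, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER = edit distance / number of reference phonemes."""
    if not ref:
        raise ValueError("reference phoneme sequence must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def build_preference_pair(
    ref: Sequence[str],
    candidates: List[Sequence[str]],
    min_gap: float = 0.1,  # hypothetical filtering rule, not from the paper
) -> Optional[Tuple[Sequence[str], Sequence[str]]]:
    """Return (chosen, rejected) by PER, or None if the gap is too small."""
    scored = sorted(candidates, key=lambda c: phoneme_error_rate(ref, c))
    best, worst = scored[0], scored[-1]
    gap = phoneme_error_rate(ref, worst) - phoneme_error_rate(ref, best)
    if gap < min_gap:
        return None
    return best, worst
```

A pair produced this way could feed an off-policy method such as DPO, while the PER itself could serve as a reward signal for on-policy methods; how the authors actually wire these together is not stated in the abstract.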

A Novel Field-Free SOT Magnetic Tunnel Junction With Local VCMA-Induced Switching

Dec 24, 2023

Rui Zhou, Haiyang Zhang, Hao Wang, Jin He, Qijun Huang, Sheng Chang

Abstract: By integrating the local voltage-controlled magnetic anisotropy (VCMA) effect, the Dzyaloshinskii-Moriya interaction (DMI) effect, and the spin-orbit torque (SOT) effect, we propose a novel device structure for a field-free magnetic tunnel junction (MTJ). Micromagnetic simulation shows that the device utilizes the chiral symmetry breaking caused by the DMI effect to induce a non-collinear spin texture under the influence of SOT current. This, combined with the perpendicular magnetic anisotropy (PMA) gradient generated by the local VCMA effect, enables deterministic switching of the MTJ state without an external field. The impact of variations in DMI strength and PMA gradient on the magnetization dynamics is analyzed.

Non-Spike Timing-Dependent Plasticity based Unsupervised Memristive Neural Networks with High Hardware Compatibility

Jul 23, 2019

Zhiri Tang, Yanhua Chen, Hao Wang, Jin He, Qijun Huang, Sheng Chang

Abstract: With the growth of memristor research, memristive neural networks (MNNs) have recently become a hot research topic. Because memristors can mimic spike timing-dependent plasticity (STDP), research on STDP-based MNNs is increasing rapidly. However, although state-of-the-art STDP-based MNNs have many applications, such as pattern recognition, the STDP mechanism requires a relatively complex hardware framework and yields low processing speed, which hinders the hardware realization of MNNs. This paper constructs a non-STDP-based unsupervised MNN. Compared with the STDP method on two common structures, feedforward and crossbar, non-STDP-based MNNs not only retain the advantages of STDP-based MNNs, including high accuracy and fast convergence in pattern recognition, but also offer better hardware performance, with fewer hardware resources and higher processing speed. By combining memristive characteristics with a simple mechanism, non-STDP-based MNNs achieve better hardware compatibility, which may offer a new perspective on engineering applications of memristive neural networks.

* 9 pages, 10 figures

The MBPEP: a deep ensemble pruning algorithm providing high quality uncertainty prediction

Feb 25, 2019

Ruihan Hu, Qijun Huang, Sheng Chang, Hao Wang, Jin He

Abstract: Machine learning algorithms have been applied effectively to various real-world tasks. However, it is difficult to provide high-quality machine learning solutions that accommodate an unknown distribution of input datasets; this difficulty is called the uncertainty prediction problem. In this paper, a margin-based Pareto deep ensemble pruning (MBPEP) model is proposed. Using deep ensemble networks, it achieves high-quality uncertainty estimation with a small mean prediction interval width (MPIW) and a high prediction interval coverage probability (PICP). In addition, unique loss functions are proposed that make the sub-learners trainable with standard gradient descent. Furthermore, a Pareto pruning method based on margin-criterion fine-tuning is introduced to optimize the ensembles. Several experiments, including uncertainty prediction for classification and regression, are conducted to analyze the performance of MBPEP. The experimental results show that MBPEP achieves a small interval width and a low learning error with an optimal number of ensembles. On real-world problems, MBPEP performs well on incoming datasets with unknown distributions and improves learning performance on a multi-task problem compared with each single model.

* Applied Intelligence (2019)
* 20 pages, 7 figures
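
The two interval metrics named in the abstract can be made concrete with a short sketch. It uses the commonly used definitions (PICP as the fraction of targets falling inside their predicted interval, MPIW as the average interval width); the function names and the toy data are illustrative assumptions, not taken from the paper, and the MBPEP loss and pruning steps are not reproduced here.

```python
from typing import Sequence


def picp(y_true: Sequence[float],
         lower: Sequence[float],
         upper: Sequence[float]) -> float:
    """Prediction interval coverage probability: share of targets covered."""
    n = len(y_true)
    if n == 0 or len(lower) != n or len(upper) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    covered = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return covered / n


def mpiw(lower: Sequence[float], upper: Sequence[float]) -> float:
    """Mean prediction interval width: average of (upper - lower)."""
    if len(lower) == 0 or len(lower) != len(upper):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Toy data (assumed): four targets with their predicted intervals.
y = [1.0, 2.0, 3.0, 4.0]
lo = [0.5, 2.5, 2.0, 3.0]
hi = [1.5, 3.5, 4.0, 5.0]
coverage = picp(y, lo, hi)   # three of four targets are covered
width = mpiw(lo, hi)         # widths are 1, 1, 2, 2
```

The two metrics pull against each other: widening every interval raises PICP but also raises MPIW, which is why the abstract reports both.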

A Hardware Friendly Unsupervised Memristive Neural Network with Weight Sharing Mechanism

Jan 01, 2019

Zhiri Tang, Ruohua Zhu, Peng Lin, Jin He, Hao Wang, Qijun Huang, Sheng Chang, Qiming Ma

Abstract: Memristive neural networks (MNNs), which use memristors as neurons or synapses, have recently become a hot research topic. However, most memristors are not compatible with mainstream integrated circuit technology, and their stability at large scale is not yet satisfactory. In this paper, a hardware-friendly MNN circuit is introduced in which the memristive characteristics are implemented by digital integrated circuits. With this method, spike timing-dependent plasticity (STDP) and unsupervised learning are realized. A weight sharing mechanism is proposed to bridge the gap between network scale and hardware resources. Experimental results show that it significantly reduces hardware resource usage while maintaining good recognition accuracy and high speed. Moreover, resource usage grows more slowly than the network scale, which suggests the method's potential for realizing large-scale neuromorphic networks.

* Neurocomputing 2019
* 10 pages, 11 figures
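
For readers unfamiliar with STDP, the sketch below shows the textbook pair-based exponential rule: a synapse is strengthened when the presynaptic spike precedes the postsynaptic spike and weakened otherwise. This is a generic software illustration, not the digital circuit or the weight sharing mechanism described in the abstract, and all parameter values (`a_plus`, `a_minus`, the time constants, the weight bounds) are assumptions chosen for the example.

```python
import math


def stdp_delta_w(dt: float,
                 a_plus: float = 0.01,     # assumed potentiation amplitude
                 a_minus: float = 0.012,   # assumed depression amplitude
                 tau_plus: float = 20.0,   # assumed time constant (ms)
                 tau_minus: float = 20.0   # assumed time constant (ms)
                 ) -> float:
    """Weight change for one spike pair, with dt = t_post - t_pre in ms."""
    if dt > 0:    # pre fires before post: potentiation
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:    # post fires before pre: depression
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0


def apply_stdp(w: float, dt: float,
               w_min: float = 0.0, w_max: float = 1.0) -> float:
    """Update a weight with the STDP change and clip it to [w_min, w_max]."""
    return min(w_max, max(w_min, w + stdp_delta_w(dt)))
```

The magnitude of the change decays as the two spikes move further apart in time, so closely timed spike pairs dominate learning.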
