Abstract: Gradient-based Markov Chain Monte Carlo methods have recently received much attention for sampling discrete distributions, with notable examples such as Norm Constrained Gradient (NCG), Auxiliary Variable Gradient (AVG), and Discrete Hamiltonian Assisted Metropolis Sampling (DHAMS). In this work, we propose the Preconditioned Discrete-HAMS (PDHAMS) algorithm, which extends DHAMS by incorporating a second-order, quadratic approximation of the potential function and uses the Gaussian integral trick to avoid directly sampling a pairwise Markov random field. The PDHAMS sampler not only satisfies generalized detailed balance, hence enabling irreversible sampling, but is also rejection-free for a target distribution with a quadratic potential function. In various numerical experiments, the PDHAMS algorithm consistently yields superior performance compared with the other methods.
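To make the role of the Gaussian integral trick concrete, the following is a minimal sketch in our own notation (an illustration under stated assumptions, not necessarily the paper's exact construction). For a pairwise Markov random field over binary states $x \in \{0,1\}^d$ with $p(x) \propto \exp(\tfrac{1}{2} x^\top W x + b^\top x)$, pick a diagonal matrix $D$ with diagonal entries $d$ such that $A = W + D$ is positive definite; since $x_i^2 = x_i$ for binary $x_i$, the diagonal shift only alters the linear term. Augmenting with a Gaussian auxiliary variable $z$ yields the joint
\[
p(x, z) \propto \exp\Big( -\tfrac{1}{2} z^\top A^{-1} z + \big(z + b - \tfrac{1}{2} d\big)^\top x \Big),
\]
under which $z \mid x \sim N(Ax, A)$ and, given $z$, the components of $x$ are independent Bernoulli variables with logits $z_i + b_i - d_i/2$. The pairwise coupling is thus absorbed into the Gaussian, so the discrete field never has to be sampled directly.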
Abstract: Text-to-speech (TTS) synthesis has seen renewed progress under the discrete modeling paradigm. Existing autoregressive approaches often rely on single-codebook representations, which suffer from significant information loss. Even with post-hoc refinement techniques such as flow matching, these methods fail to recover fine-grained details (e.g., prosodic nuances, speaker-specific timbres), especially in challenging scenarios like singing voice or music synthesis. We propose QTTS, a novel TTS framework built upon our new audio codec, QDAC. The core innovation of QDAC lies in its end-to-end training of an ASR-based autoregressive network with a GAN, which achieves superior semantic feature disentanglement for scalable, near-lossless compression. QTTS models these discrete codes using two strategies: the Hierarchical Parallel architecture, which uses a dual-AR structure to model inter-codebook dependencies for higher-quality synthesis, and the Delay Multihead approach, which employs parallelized prediction with a fixed delay to accelerate inference. Our experiments demonstrate that the proposed framework achieves higher synthesis quality and better preserves expressive content compared to the baselines. This suggests that scaling up compression via multi-codebook modeling is a promising direction for high-fidelity, general-purpose speech and audio generation.
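The abstract does not spell out the Delay Multihead scheme, so the sketch below is only an assumed illustration of a fixed-delay token layout in the style of MusicGen's delay pattern; apply_delay_pattern, revert_delay_pattern, and pad_id are hypothetical names, not QTTS APIs. Shifting codebook k right by k frames lets all K codebook heads be predicted in parallel at each autoregressive step:

```python
import numpy as np

def apply_delay_pattern(codes: np.ndarray, pad_id: int) -> np.ndarray:
    """Shift codebook k right by k frames (fixed-delay layout, assumed).

    codes: (K, T) integer array of codec tokens, one row per codebook.
    Returns a (K, T + K - 1) array in which row k is delayed by k steps,
    so one decoding step can emit one token per codebook in parallel.
    """
    K, T = codes.shape
    out = np.full((K, T + K - 1), pad_id, dtype=codes.dtype)
    for k in range(K):
        out[k, k:k + T] = codes[k]  # row k starts k frames later
    return out

def revert_delay_pattern(delayed: np.ndarray) -> np.ndarray:
    """Undo the delay to recover the frame-aligned (K, T) code grid."""
    K, L = delayed.shape
    T = L - (K - 1)
    return np.stack([delayed[k, k:k + T] for k in range(K)])

# Example: 4 codebooks, 6 frames of codes.
codes = np.arange(24).reshape(4, 6)
delayed = apply_delay_pattern(codes, pad_id=-1)
assert np.array_equal(revert_delay_pattern(delayed), codes)
```

Under such a layout, generating T frames takes T + K - 1 sequential steps rather than K * T, which is the kind of inference speedup a fixed delay buys.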