Fares Schulz

Pitch-Conditioned Instrument Sound Synthesis From an Interactive Timbre Latent Space

Oct 05, 2025

Christian Limberg, Fares Schulz, Zhe Zhang, Stefan Weinzierl

Abstract: This paper presents a novel approach to neural instrument sound synthesis using a two-stage semi-supervised learning framework capable of generating pitch-accurate, high-quality music samples from an expressive timbre latent space. Existing approaches that achieve sufficient quality for music production often rely on high-dimensional latent representations that are difficult to navigate and provide unintuitive user experiences. We address this limitation through a two-stage training paradigm: first, we train a pitch-timbre disentangled 2D representation of audio samples using a Variational Autoencoder; second, we use this representation as conditioning input for a Transformer-based generative model. The learned 2D latent space serves as an intuitive interface for navigating and exploring the sound landscape. We demonstrate that the proposed method effectively learns a disentangled timbre space, enabling expressive and controllable audio generation with reliable pitch conditioning. Experimental results show the model's ability to capture subtle variations in timbre while maintaining a high degree of pitch accuracy. The usability of our method is demonstrated in an interactive web application, highlighting its potential as a step towards future music production environments that are both intuitive and creatively empowering: https://pgesam.faresschulz.com

* 8 pages, accepted to the Proceedings of the 28th International Conference on Digital Audio Effects (DAFx25) - demo: https://pgesam.faresschulz.com

ANIRA: An Architecture for Neural Network Inference in Real-Time Audio Applications

Jun 14, 2025

Valentin Ackva, Fares Schulz

Abstract: Numerous tools for neural network inference are currently available, yet many do not meet the requirements of real-time audio applications. In response, we introduce anira, an efficient cross-platform library. To ensure compatibility with a broad range of neural network architectures and frameworks, anira supports ONNX Runtime, LibTorch, and TensorFlow Lite as backends. Each inference engine exhibits real-time violations, which anira mitigates by decoupling the inference from the audio callback to a static thread pool. The library incorporates built-in latency management and extensive benchmarking capabilities, both crucial to ensure a continuous signal flow. Three different neural network architectures for audio effect emulation are then subjected to benchmarking across various configurations. Statistical modeling is employed to identify the influence of various factors on performance. The findings indicate that for stateless models, ONNX Runtime exhibits the lowest runtimes. For stateful models, LibTorch demonstrates the fastest performance. Our results also indicate that for certain model-engine combinations, the initial inferences take longer, particularly when these inferences exhibit a higher incidence of real-time violations.

* 8 pages, accepted to the Proceedings of the 5th IEEE International Symposium on the Internet of Sounds (2024) - repository: github.com/anira-project/anira
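
The decoupling idea in the abstract above can be illustrated with a minimal Python sketch. This is not anira's API (anira is a separate library with its own interface): the class `DecoupledInference`, its methods, and the `latency_blocks` parameter are hypothetical names chosen for illustration. The sketch also simplifies in two ways: it uses a single worker thread rather than a static thread pool, and Python's `queue.Queue` takes locks, so it is not real-time safe in the strict sense. It only shows the structure, in which the audio callback hands a block to a worker and returns previously computed output without waiting for the model.

```python
import queue
import threading


class DecoupledInference:
    """Hypothetical helper: runs `model` on a worker thread, outside the audio callback."""

    def __init__(self, model, block_size, latency_blocks=1):
        self.model = model
        self.block_size = block_size
        self.inputs = queue.Queue()
        self.outputs = queue.Queue()
        # Pre-fill the output queue with silence. This is the latency the
        # host pays in exchange for never waiting on the model.
        for _ in range(latency_blocks):
            self.outputs.put([0.0] * block_size)
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        # Worker loop: inference happens here, not in the callback.
        while True:
            block = self.inputs.get()
            if block is None:  # sentinel pushed by close()
                break
            self.outputs.put(self.model(block))

    def process(self, block):
        """Called from the audio callback; never waits for inference to finish."""
        self.inputs.put(block)
        try:
            return self.outputs.get_nowait()
        except queue.Empty:
            # Underrun: the model has not caught up, so emit silence.
            return [0.0] * self.block_size

    def close(self):
        # Stop the worker after it has handled all queued blocks.
        self.inputs.put(None)
        self.worker.join()


if __name__ == "__main__":
    # Stand-in "model": a gain of 2 applied sample by sample.
    engine = DecoupledInference(lambda b: [2.0 * x for x in b], block_size=4)
    first = engine.process([1.0, 2.0, 3.0, 4.0])
    print(first)  # the pre-filled silent block, i.e. one block of latency
    engine.close()
```

In this sketch the first call to `process` always returns the pre-filled silent block, because the output queue is first-in, first-out. Later blocks carry the model output delayed by `latency_blocks` blocks, provided the worker keeps up; when it does not, the callback falls back to silence instead of blocking.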
