Abstract:Real-time music tracking systems follow a musical performance and at any time report the current position in a corresponding score. Most existing methods approach this problem exclusively in the audio domain, typically applying online time warping (OLTW) techniques to the incoming audio and an audio representation of the score. Audio OLTW techniques have seen incremental improvements in both features and model heuristics, but their performance has plateaued over the past ten years. We argue that converting and representing the performance in the symbolic domain -- thereby transforming music tracking into a symbolic task -- can be a more effective approach, even when the domain transformation is imperfect. Our music tracking system combines two real-time components: one performing audio-to-note transcription and the other a novel symbol-level tracker that aligns the transcribed input with the score. We compare this mixed audio-symbolic approach with an equivalent audio-only counterpart and demonstrate that our method outperforms the latter in terms of both precision, i.e., absolute tracking error, and robustness, i.e., tracking success.
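To make the OLTW idea concrete, here is a minimal sketch of an online-DTW-style follower in Python. It is illustrative only, not the paper's tracker: the function name `online_tracker`, the chroma-like feature inputs, and the `band` search window are assumptions made for this example.

```python
import numpy as np

def online_tracker(score_feats, feature_stream, band=50):
    """Simplified online-DTW-style score follower (illustrative sketch only).

    score_feats    : (N, d) array of per-frame score features (e.g. chroma).
    feature_stream : iterable yielding (d,) feature vectors from live audio.
    band           : number of score frames ahead of the current position
                     considered at each update.

    Yields the estimated score frame index after each incoming audio frame.
    """
    N = len(score_feats)
    acc = np.full(N, np.inf)   # accumulated alignment cost per score frame
    acc[0] = 0.0               # assume the performance starts at the score beginning
    pos = 0                    # current estimated score position

    for frame in feature_stream:
        # local distance of the new audio frame to every score frame
        cost = np.linalg.norm(score_feats - frame, axis=1)
        new = np.full(N, np.inf)
        lo, hi = pos, min(N, pos + band)
        for j in range(lo, hi):
            # standard DTW predecessors: stay, advance one frame, skip one frame
            prev = min(acc[j],
                       acc[j - 1] if j >= 1 else np.inf,
                       acc[j - 2] if j >= 2 else np.inf)
            new[j] = cost[j] + prev
        acc = new
        pos = lo + int(np.argmin(acc[lo:hi]))
        yield pos
```

The symbol-level tracker described in the abstract performs the analogous incremental alignment on transcribed note events instead of audio feature frames.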
Abstract:MIDI performances are generally useful in performance research and music information retrieval, and even more so if they can be connected to a score. This connection is usually established by means of alignment, linking either notes or time points between the score and the performance. The first obstacle when trying to establish such an alignment is that a performance realizes one (out of many) structural versions of the score that can plausibly result from instructions such as repeats, variations, and navigation markers like 'dal segno/da capo al coda'. Before alignment algorithms can be applied, a score needs to be unfolded, that is, its repeats and navigation markers need to be explicitly written out to create a single timeline without jumps that matches the performance. In the curation of large performance corpora this process is carried out manually, as no tools are available to infer the repeat structure of the performance. To ease this process, we develop a method to automatically infer the repeat structure of a MIDI performance, given a symbolically encoded score including repeat and navigation markers. The intuition guiding our design is twofold: 1) local alignment of every contiguous section of the score with a performance section containing the same material should receive a high alignment gain, whereas local alignment with any other performance section should accrue low or zero gain; and 2) stitching local alignments together according to a valid structural version of the score should result in an approximate full alignment, and correspondingly high global accumulated gain, if the structural version corresponds to the performance, and in low gain for all other, ill-fitting structural versions.
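A minimal sketch of this two-part intuition, assuming a hypothetical `local_align` gain function and precomputed candidate unfoldings (neither is specified in the abstract):

```python
def unfolding_gain(sections, perf, local_align):
    """Accumulated gain of stitching local alignments for one candidate
    structural version (unfolding) of the score.

    sections    : score sections of this unfolding, in playing order.
    perf        : the MIDI performance (e.g. a note array).
    local_align : hypothetical function (section, perf, start) -> (gain, end)
                  returning the local alignment gain of `section` against the
                  performance from index `start`, and the index where it ends.
    """
    total, pos = 0.0, 0
    for section in sections:
        gain, pos = local_align(section, perf, pos)
        total += gain
    return total


def infer_repeat_structure(candidate_unfoldings, perf, local_align):
    """Return the structural version whose stitched local alignments
    accumulate the highest global gain over the performance."""
    return max(candidate_unfoldings,
               key=lambda u: unfolding_gain(u, perf, local_align))
```

Here `candidate_unfoldings` would be produced by writing out every valid combination of repeats and navigation markers from the symbolically encoded score.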
Abstract:Denoising Diffusion Probabilistic Models (DDPMs) have made great strides in generating high-quality samples in both discrete and continuous domains. However, Discrete DDPMs (D3PMs) have yet to be applied to the domain of Symbolic Music. This work presents the direct generation of Polyphonic Symbolic Music using D3PMs. Our model exhibits state-of-the-art sample quality, according to current quantitative evaluation metrics, and allows for flexible infilling at the note level. We further show that our models are amenable to post-hoc classifier guidance, widening the scope of possible applications. However, we also cast a critical view on the quantitative evaluation of music sample quality via statistical metrics, and present a simple algorithm that can confound these metrics with completely spurious, non-musical samples.
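As an illustration of the kind of statistical metric in question (not the paper's exact evaluation suite; the function names and the 12-bin pitch-class choice are assumptions), an overlap measure between generated and reference pitch-class distributions might look like this:

```python
import numpy as np

def pitch_class_histogram(pitches):
    """Normalized 12-bin pitch-class histogram over an iterable of MIDI pitches."""
    hist = np.zeros(12)
    for p in pitches:
        hist[p % 12] += 1
    return hist / max(hist.sum(), 1.0)

def overlapping_area(generated_pitches, reference_pitches):
    """Overlapping area between the generated and reference pitch-class
    distributions; higher values mean the generated set matches the
    reference statistics more closely."""
    g = pitch_class_histogram(generated_pitches)
    r = pitch_class_histogram(reference_pitches)
    return float(np.minimum(g, r).sum())
```

A degenerate sampler that merely reproduces the reference pitch-class statistics, with no musical structure at all, would score near-perfectly on such a metric, which is precisely the kind of confound the abstract cautions against.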
Abstract:This paper introduces the ACCompanion, an expressive accompaniment system. Like a musician who accompanies a soloist playing a given musical piece, our system can produce a human-like rendition of the accompaniment part that follows the soloist's choices in terms of tempo, dynamics, and articulation. The ACCompanion works in the symbolic domain, i.e., it needs a musical instrument capable of producing and playing MIDI data, with explicitly encoded onset, offset, and pitch for each played note. We describe the components that go into such a system, from real-time score following and prediction to expressive performance generation and online adaptation to the expressive choices of the human player. Based on our experience with repeated live demonstrations in front of various audiences, we offer an analysis of the challenges of combining these components into a system that is highly reactive and precise, while still being a reliable musical partner, robust to possible performance errors and responsive to expressive variations.
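As a rough illustration of the online adaptation such a system needs (a minimal sketch under assumed inputs, not the ACCompanion's actual tempo model), the soloist's local tempo could be re-estimated from recently matched note onsets:

```python
import numpy as np

def update_seconds_per_beat(matched_score_beats, matched_perf_seconds,
                            prev_spb, alpha=0.5):
    """Re-estimate the soloist's tempo from recently matched note onsets
    (illustrative sketch only).

    matched_score_beats  : score onsets (in beats) of the last few matched notes.
    matched_perf_seconds : corresponding performed onset times (in seconds).
    prev_spb             : previous seconds-per-beat estimate.
    alpha                : smoothing factor blending the new and old estimates.
    """
    if len(matched_score_beats) < 2:
        return prev_spb
    # slope of performed time over score time = local seconds per beat
    slope = np.polyfit(np.asarray(matched_score_beats, dtype=float),
                       np.asarray(matched_perf_seconds, dtype=float), 1)[0]
    return alpha * slope + (1.0 - alpha) * prev_spb
```

Given the current estimate, an accompaniment note at score beat b could then be scheduled at the last matched onset time plus (b minus the last matched beat) times the estimated seconds per beat.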