Pau Torras

A Dataset for the Recognition of Historical and Handwritten Music Scores in Western Notation

May 18, 2026

Pau Torras, Jiří Mayer, Carles Badal, Martina Dvořáková, Markéta Herzanová Vlková, Gerard Asbert, Vojtěch Dvořák, Samuel Šomorjai, Jan Hajič, Alicia Fornés

Abstract: A large amount of musical heritage has been digitised by memory institutions: libraries, museums, and archives. Nevertheless, the field of Optical Music Recognition (OMR) has struggled with making this music machine-readable, despite advances in deep learning, mostly because no datasets for training systems in realistic conditions were available. The MusiCorpus dataset aims to remedy this situation by providing 1,309 pages of historical sheet music, primarily handwritten, with MusicXML transcriptions and symbol annotations. It is the largest dataset of handwritten music to date and the first dataset containing a realistic and representative sample of musical document collections from memory institutions, suitable for training and evaluating both end-to-end and object detection-based OMR systems and comparing their performance.

* Under review at Scientific Data

Structured Analysis and Comparison of Alphabets in Historical Handwritten Ciphers

Oct 29, 2024

Martín Méndez, Pau Torras, Adrià Molina, Jialuo Chen, Oriol Ramos-Terrades, Alicia Fornés

Abstract: Historical ciphered manuscripts are documents that were typically used in sensitive communications within military and diplomatic contexts or among members of secret societies. These secret messages were concealed by inventing a method of writing employing symbols from diverse sources such as digits, alchemy signs and Latin or Greek characters. When studying a new, unseen cipher, the automatic search and grouping of ciphers with a similar alphabet can aid the scholar in its transcription and cryptanalysis because it indicates a probability that the underlying cipher is similar. In this study, we address this need by proposing the CSI metric, a novel way of comparing pairs of ciphered documents. We assess its effectiveness in an unsupervised clustering scenario utilising visual features, including SIFT, pre-trained learnt embeddings, and OCR descriptors.

* Accepted at ECCV24 Workshop AI4DH

The Common Optical Music Recognition Evaluation Framework

Dec 20, 2023

Pau Torras, Sanket Biswas, Alicia Fornés

Abstract: The quality of Optical Music Recognition (OMR) systems is rather difficult to measure. There is no lingua franca shared among OMR datasets that allows systems' performance to be compared on equal grounds, since most of them are specialised for certain approaches. As a result, most state-of-the-art works currently report metrics that cannot be compared directly. In this paper we identify the need for a common music representation language and propose the Music Tree Notation (MTN) format, thanks to which the definition of standard metrics is possible. This format represents music as a set of primitives that group together into higher-abstraction nodes, a compromise between the expression of fully graph-based and sequential notation formats. We have also developed a specific set of OMR metrics and a typeset score dataset as a proof of concept of this idea.

* 18 pages, 4 figures, 3 tables, submitted (under review) to the International Journal on Document Analysis and Recognition
