
Caterina Mauri

Coconstructions in spoken data: UD annotation guidelines and first results

Mar 30, 2026

Ludovica Pannitto, Sylvain Kahane, Kaja Dobrovoljc, Elena Battaglia, Bruno Guillaume, Caterina Mauri, Eleonora Zucchini

Abstract: The paper proposes annotation guidelines for syntactic dependencies that span speaker turns (including collaborative coconstructions proper, answers to wh-questions, and backchannels) in spoken language treebanks within the Universal Dependencies framework. Two representations are proposed: a speaker-based representation that follows the segmentation into speech turns, and a dependency-based representation with dependencies across speech turns. New proposals are also put forward to distinguish reformulations from repairs and to promote elements in unfinished phrases.

Is Semi-Automatic Transcription Useful in Corpus Creation? Preliminary Considerations on the KIParla Corpus

Mar 17, 2026

Martina Simonotti, Ludovica Pannitto, Eleonora Zucchini, Silvia Ballarè, Caterina Mauri

Abstract: This paper analyzes the integration of Automatic Speech Recognition (ASR) into the transcription workflow of the KIParla corpus, a resource of spoken Italian. In a two-phase experiment, 11 expert and novice transcribers produced both manual and ASR-assisted transcriptions of identical audio segments across three types of conversation; these transcriptions were then analyzed through a combination of statistical modeling, word-level alignment and a series of annotation-based metrics. The results show that ASR-assisted workflows can increase transcription speed but do not consistently improve overall accuracy, with effects depending on multiple factors such as workflow configuration, conversation type and annotator experience. Analyses combining alignment-based metrics, descriptive statistics and statistical modeling provide a systematic framework for monitoring transcription behavior across annotators and workflows. Despite its limitations, ASR-assisted transcription, potentially supported by task-specific fine-tuning, could be integrated into the KIParla transcription workflow to accelerate corpus creation without compromising transcription quality.

The KIPARLA Forest treebank of spoken Italian: an overview of initial design choices

Nov 10, 2024

Ludovica Pannitto, Caterina Mauri

Abstract: The paper presents an overview of the initial design choices discussed in view of creating a treebank for the Italian KIParla corpus.

Did somebody say "Gest-IT"? A pilot exploration of multimodal data management

Oct 21, 2024

Ludovica Pannitto, Lorenzo Albanesi, Laura Marion, Federica Maria Martines, Carmelo Caruso, Claudia S. Bianchini, Francesca Masini, Caterina Mauri

Abstract: The paper presents a pilot exploration of the construction, management and analysis of a multimodal corpus. Through a three-layer annotation that provides orthographic, prosodic, and gestural transcriptions, the Gest-IT resource makes it possible to investigate how gesture-making patterns vary in conversations between sighted people and people with visual impairment. After discussing the transcription methods and technical procedures employed in our study, we propose a unified CoNLL-U corpus and outline our future steps.