Abstract: Bilingual and multilingual language models offer a promising path toward scaling NLP systems across diverse languages and users. However, their performance often varies widely across languages: prior work shows that adding more languages can degrade performance for some languages (such as English) while improving it for others (typically more data-constrained languages). In this work, we investigate the causes of these inconsistencies by comparing bilingual and monolingual language models. Our analysis reveals that unequal data quality, not just data quantity, is a major driver of performance degradation in bilingual settings. We propose a simple yet effective data filtering strategy that selects higher-quality bilingual training data using only high-quality English data. Applied to French, German, and Chinese, our approach improves monolingual performance by 2-4% and reduces bilingual model performance gaps to 1%. These results highlight the overlooked importance of data quality in multilingual pretraining and offer a practical recipe for balancing performance.
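The abstract does not spell out the filtering mechanism; the sketch below illustrates one plausible instantiation as an assumption, a perplexity filter driven by a reference model trained on high-quality English text. The model name ("gpt2"), threshold, and helper functions are placeholders, not the paper's actual recipe.

```python
# Hypothetical sketch of perplexity-based data filtering (not the paper's code).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder reference model standing in for one trained on high-quality English data.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Perplexity of a document under the reference language model."""
    ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

def filter_corpus(documents, threshold=200.0):
    """Keep documents the reference model scores as 'natural' (low perplexity)."""
    return [doc for doc in documents if perplexity(doc) < threshold]
```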
Abstract: We introduce a set of training-free, ABX-style discrimination tasks to evaluate how multilingual language models represent language identity (form) and semantic content (meaning). Inspired by speech processing, these zero-shot tasks measure whether minimal differences in representation can be reliably detected, offering a flexible and interpretable alternative to probing. Applied to XLM-R (Conneau et al., 2020) across pretraining checkpoints and layers, we find that language discrimination declines over training and becomes concentrated in lower layers, while meaning discrimination strengthens over time and stabilizes in deeper layers. We then explore probing tasks, showing some alignment between our metrics and linguistic learning performance. Our results position ABX tasks as a lightweight framework for analyzing the structure of multilingual representations.
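As a rough illustration of the ABX idea (not the paper's exact protocol), the sketch below scores a triplet (A, B, X) as correct when X's representation is closer to A, which shares the target property (e.g. language or meaning), than to B, which does not. The function names and the choice of cosine similarity are assumptions.

```python
# Minimal ABX-style discrimination sketch over precomputed embeddings.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def abx_accuracy(triplets):
    """triplets: iterable of (a, b, x) embedding vectors, where a and x share
    the target property (e.g. language) and b does not."""
    correct, total = 0, 0
    for a, b, x in triplets:
        # x should be more similar to a than to b if the property is encoded
        correct += cosine(x, a) > cosine(x, b)
        total += 1
    return correct / total
```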
Abstract: Spoken language understanding research to date has generally carried a heavy text perspective. Most datasets are derived from text that is subsequently synthesized into speech, and most models rely on automatic transcriptions of speech. This comes at the expense of prosody: additional information carried by the speech signal beyond the phonetics of the words themselves, and difficult to recover from text alone. In this work, we investigate the role of prosody in Spoken Question Answering. By isolating prosodic and lexical information on the SLUE-SQA-5 dataset, which consists of natural speech, we demonstrate that models trained on prosodic information alone can perform reasonably well by exploiting prosodic cues. However, we find that when lexical information is available, models rely predominantly on it. Our findings suggest that while prosodic cues provide valuable supplementary information, more effective integration methods are required for prosody to contribute meaningfully alongside lexical features.
Abstract: We introduce EmphAssess, a prosodic benchmark designed to evaluate the capability of speech-to-speech models to encode and reproduce prosodic emphasis. We apply it to two tasks: speech resynthesis and speech-to-speech translation. In both cases, the benchmark evaluates the model's ability to encode emphasis in the speech input and accurately reproduce it in the output, potentially across a change of speaker and language. As part of the evaluation pipeline, we introduce EmphaClass, a new model that classifies emphasis at the frame or word level.
Abstract: We present ProsAudit, a benchmark in English to assess structural prosodic knowledge in self-supervised learning (SSL) speech models. It consists of two subtasks, their corresponding metrics, and an evaluation dataset. In the protosyntax task, the model must correctly identify strong versus weak prosodic boundaries. In the lexical task, the model must correctly distinguish between pauses inserted between words and within words. We also provide human evaluation scores on this benchmark. We evaluated a series of SSL models and found that they all performed above chance on both tasks, even when trained on an unseen language. However, non-native models performed significantly worse than native ones on the lexical task, highlighting the importance of lexical knowledge for this task. We also found a clear effect of training data size, with models trained on more data performing better on both subtasks.
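The zero-shot comparison such benchmarks rely on can be sketched as follows: an item counts as correct if the model assigns a higher score to the natural variant than to the manipulated one. The scoring function is left abstract here and is an illustrative assumption, not the benchmark's released code.

```python
# Generic minimal-pair accuracy for a zero-shot prosodic judgment task.
def minimal_pair_accuracy(pairs, score):
    """pairs: iterable of (natural, manipulated) inputs;
    score: callable returning the model's score (e.g. pseudo log-probability)."""
    results = [score(natural) > score(manipulated) for natural, manipulated in pairs]
    return sum(results) / len(results)
```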
Abstract: Word- or word-fragment-based language models (LMs) are typically preferred over character-based ones in many downstream applications. This may not be surprising, as words seem to be more linguistically relevant units than characters. Words provide at least two kinds of relevant information: boundary information and meaningful units. However, word boundary information may be absent or unreliable in the case of speech input (word boundaries are not marked explicitly in the speech stream). Here, we systematically compare LSTMs as a function of the input unit (character, phoneme, word, word part), with or without gold boundary information. We probe linguistic knowledge in the networks at the lexical, syntactic, and semantic levels using three speech-adapted, psycholinguistically inspired black-box NLP benchmarks (pWUGGY, pBLIMP, pSIMI). We find that the absence of boundaries costs between 2% and 28% in relative performance, depending on the task. We show that gold boundaries can be replaced by boundaries found automatically with an unsupervised segmentation algorithm, and that even modest segmentation performance yields a gain on two of the three tasks compared to basic character/phone-based models without boundary information.
Abstract: According to the Language Familiarity Effect (LFE), people are better at discriminating between speakers of their native language. Although this cognitive effect has been widely studied in the literature, experiments have only been conducted on a limited number of language pairs, and their results only show the presence of the effect without yielding a graded measure that may vary across language pairs. In this work, we show that the computational model of the LFE introduced by Thorburn, Feldman, and Schatz (2019) can address these two limitations. In a first experiment, we demonstrate the model's capacity to yield a graded measure of the LFE by replicating behavioural findings on native and accented speech. In a second experiment, we evaluate the LFE on a large number of language pairs, including many that have never been tested on humans. We show that the effect is replicated across a wide array of languages, providing further evidence of its universality. Building on the graded measure of the LFE, we also show that languages belonging to the same family yield smaller scores, supporting the idea of an effect of language distance on the LFE.
Abstract: Unsupervised representation models based on Contrastive Predictive Coding (CPC) [1] are used in spoken language modelling primarily because they encode phonetic information. In this study, we ask what other types of information are present in CPC speech representations. We focus on three categories: phone class, gender, and language, and compare monolingual and bilingual models. Using qualitative and quantitative tools, we find that both gender and phone class information are present in both types of models. Language information, however, is very salient only in the bilingual model, suggesting that CPC models learn to discriminate languages when trained on multiple languages. Some language information can also be retrieved from monolingual models, but it is more diffusely spread across features. These patterns hold when the analyses are carried out on the discrete units from a downstream clustering model. However, although the number of target clusters has no effect on phone class and language information, more gender information is encoded with more clusters. Finally, we find that exposure to two languages incurs some cost on a downstream phoneme discrimination task.
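The abstract does not name the quantitative tools; a common choice, shown below purely as an illustrative assumption, is a linear probe trained on CPC features to test whether a property (phone class, gender, or language) is linearly decodable from the representations.

```python
# Illustrative linear-probe setup on precomputed CPC features (not the paper's exact analysis).
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def probe_accuracy(features, labels):
    """features: array of frame- or utterance-level CPC vectors;
    labels: the property to decode (e.g. phone class, gender, language)."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))
```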
Abstract: We present the Zero Resource Speech Challenge 2021, which asks participants to learn a language model directly from audio, without any text or labels. The challenge is based on the Libri-light dataset, which provides up to 60k hours of audio from English audiobooks without any associated text. We provide a pipeline baseline system consisting of an encoder based on contrastive predictive coding (CPC), a quantizer (k-means), and a standard language model (BERT or LSTM). The metrics evaluate the learned representations at the acoustic (ABX discrimination), lexical (spot-the-word), syntactic (acceptability judgment), and semantic (similarity judgment) levels. We present an overview of the eight submitted systems from four groups and discuss the main results.
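To make the baseline pipeline concrete, the sketch below shows the discretization step in schematic form: frame-level CPC features are clustered with k-means, and each utterance becomes a sequence of unit ids ("pseudo-text") on which the language model is trained. The function names, number of units, and de-duplication step are illustrative assumptions, not the challenge's released code.

```python
# Schematic sketch of the CPC -> k-means -> pseudo-text step of the baseline.
import numpy as np
from sklearn.cluster import KMeans

def build_pseudo_text(cpc_features, n_units=50):
    """cpc_features: list of (n_frames, dim) arrays, one per utterance."""
    all_frames = np.concatenate(cpc_features, axis=0)
    km = KMeans(n_clusters=n_units, n_init=10).fit(all_frames)
    pseudo_text = []
    for feats in cpc_features:
        units = km.predict(feats)
        # Optionally collapse consecutive repeated units, as is common for unit LMs.
        deduped = [u for i, u in enumerate(units) if i == 0 or u != units[i - 1]]
        pseudo_text.append(" ".join(str(u) for u in deduped))
    return pseudo_text  # each string can be fed to a standard LM (LSTM or BERT)
```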
Abstract: We introduce a new unsupervised task, spoken language modeling: the learning of linguistic representations from raw audio signals without any labels, along with the Zero Resource Speech Benchmark 2021: a suite of four black-box, zero-shot metrics probing the quality of the learned models at four linguistic levels: phonetics, lexicon, syntax, and semantics. We present the results and analyses of a composite baseline made of the concatenation of three unsupervised systems: self-supervised contrastive representation learning (CPC), clustering (k-means), and language modeling (LSTM or BERT). The language models are trained on pseudo-text derived from clustering the learned representations. This simple pipeline achieves better-than-chance performance on all four metrics, demonstrating the feasibility of spoken language modeling from raw speech. It nevertheless performs worse than text-based 'topline' systems trained on the same data, delineating the space to be explored by more sophisticated end-to-end models.