Teodora Răgman

How Open is Open TTS? A Practical Evaluation of Open Source TTS Tools for Romanian

Mar 25, 2026

Teodora Răgman, Adrian Bogdan Stânea, Horia Cucu, Adriana Stan

Abstract: Open-source text-to-speech (TTS) frameworks have emerged as highly adaptable platforms for developing speech synthesis systems across a wide range of languages. However, their applicability is not uniform, particularly when the target language is under-resourced or when computational resources are constrained. In this study, we systematically assess the feasibility of building novel TTS models using four widely adopted open-source architectures: FastPitch, VITS, Grad-TTS, and Matcha-TTS. Our evaluation spans multiple dimensions, including qualitative aspects such as ease of installation, dataset preparation, and hardware requirements, as well as quantitative assessments of synthesis quality for Romanian. We employ both objective metrics and subjective listening tests to evaluate intelligibility, speaker similarity, and naturalness of the generated speech. The results reveal significant challenges in toolchain setup, data preprocessing, and computational efficiency, which can hinder adoption in low-resource contexts. By grounding the analysis in reproducible protocols and accessible evaluation criteria, this work aims to inform best practices and promote more inclusive, language-diverse TTS development. All information needed to reproduce this study (i.e., code and data) is available in our git repository: https://gitlab.com/opentts_ragman/OpenTTS

* Published in IEEE Access

Efficient training strategies for natural sounding speech synthesis and speaker adaptation based on FastPitch

Oct 09, 2024

Teodora Răgman, Adriana Stan

Abstract: This paper focuses on adapting the functionalities of the FastPitch model to the Romanian language; extending the set of speakers from one to eighteen; synthesising speech using an anonymous identity; and replicating the identities of new, unseen speakers. During this work, the effects of various configurations and training strategies were tested and discussed, along with their advantages and weaknesses. Finally, we settled on a new configuration, built on top of the FastPitch architecture, capable of producing natural speech synthesis for both known (identities from the training dataset) and unknown (identities learnt through short reference samples) speakers. The anonymous speaker can be used for text-to-speech synthesis when one wants to cancel out the identity information while keeping the semantic content whole and clear. Lastly, we discuss possible limitations of our work, which will form the basis for future investigations and advancements.

* Accepted at 2024 IEEE 20th International Conference on Intelligent Computer Communication and Processing (ICCP 2024)
