The Irish language is rich in its diversity of dialects and accents. This compounds the difficulty of creating a speech recognition system for the low-resource language, as such a system must contend with a high degree of variability with limited corpora. A recent study investigating dialect bias in Irish ASR found that balanced training corpora gave rise to unequal dialect performance, with performance for the Ulster dialect being consistently worse than for the Connacht or Munster dialects. Motivated by this, the present experiments investigate spoken dialect identification of Irish, with a view to incorporating such a system into the speech recognition pipeline. Two acoustic classification models are tested, XLS-R and ECAPA-TDNN, in conjunction with a text-based classifier using a pretrained Irish-language BERT model. The ECAPA-TDNN, particularly a model pretrained for language identification on the VoxLingua107 dataset, performed best overall, with an accuracy of 73%. This was further improved to 76% by fusing the model's outputs with the text-based model. The Ulster dialect was most accurately identified, with an accuracy of 94%, however the model struggled to disambiguate between the Connacht and Munster dialects, suggesting a more nuanced approach may be necessary to robustly distinguish between the dialects of Irish.
ASR systems are generally built for the spoken 'standard', and their performance declines for non-standard dialects/varieties. This is a problem for a language like Irish, where there is no single spoken standard, but rather three major dialects: Ulster (Ul), Connacht (Co) and Munster (Mu). As a diagnostic to quantify the effect of the speaker's dialect on recognition performance, 12 ASR systems were trained, firstly using baseline dialect-balanced training corpora, and then using modified versions of the baseline corpora, where dialect-specific materials were either subtracted or added. Results indicate that dialect-balanced corpora do not yield a similar performance across the dialects: the Ul dialect consistently underperforms, whereas Mu yields lowest WERs. There is a close relationship between Co and Mu dialects, but one that is not symmetrical. These results will guide future corpus collection and system building strategies to optimise for cross-dialect performance equity.