Antonios Anastasopoulos

Archimedes, Athena Research Center, Greece, Department of Computer Science, George Mason University

Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing

Sep 27, 2023

Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, Shinji Watanabe

Abstract: Recent works in end-to-end speech-to-text translation (ST) have proposed multi-tasking methods with soft parameter sharing which leverage machine translation (MT) data via secondary encoders that map text inputs to an eventual cross-modal representation. In this work, we instead propose an ST/MT multi-tasking framework with hard parameter sharing in which all model parameters are shared cross-modally. Our method reduces the speech-text modality gap via a pre-processing stage which converts speech and text inputs into two discrete token sequences of similar length; this allows models to indiscriminately process both modalities simply using a joint vocabulary. With experiments on MuST-C, we demonstrate that our multi-tasking framework improves attentional encoder-decoder, Connectionist Temporal Classification (CTC), transducer, and joint CTC/attention models by an average of +0.5 BLEU without any external MT data. Further, we show that this framework incorporates external MT data, yielding +0.8 BLEU, and also improves transfer learning from pre-trained textual models, yielding +1.8 BLEU.

Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization

Sep 27, 2023

Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe, Sanjeev Khudanpur

Abstract: Incorporating longer context has been shown to benefit machine translation, but the inclusion of context in end-to-end speech translation (E2E-ST) remains under-studied. To bridge this gap, we introduce target language context in E2E-ST, enhancing coherence and overcoming memory constraints of extended audio segments. Additionally, we propose context dropout to ensure robustness to the absence of context, and further improve performance by adding speaker information. Our proposed contextual E2E-ST outperforms the isolated utterance-based E2E-ST approach. Lastly, we demonstrate that in conversational speech, contextual information primarily contributes to capturing context style, as well as resolving anaphora and named entities.

Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages

Jun 13, 2023

Claytone Sikasote, Kalinda Siaminwe, Stanly Mwape, Bangiwe Zulu, Mofya Phiri, Martin Phiri, David Zulu, Mayumbo Nyirenda, Antonios Anastasopoulos

Abstract: This work introduces Zambezi Voice, an open-source multilingual speech resource for Zambian languages. It contains two collections of datasets: unlabelled audio recordings of radio news and talk show programs (160 hours) and labelled data (over 80 hours) consisting of read speech recorded from text sourced from publicly available literature books. The dataset is created for speech recognition but can be extended to multilingual speech processing research for both supervised and unsupervised learning approaches. To our knowledge, this is the first multilingual speech dataset created for Zambian languages. We exploit pretraining and cross-lingual transfer learning by finetuning the Wav2Vec2.0 large-scale multilingual pre-trained model to build end-to-end (E2E) speech recognition models as our baselines. The dataset is released publicly under a Creative Commons BY-NC-ND 4.0 license and can be accessed via https://github.com/unza-speech-lab/zambezi-voice.

* Accepted at INTERSPEECH 2023. This pre-print version differs slightly from the version accepted to INTERSPEECH 2023: Figure 1 is not included in INTERSPEECH 2023!

BIG-C: a Multimodal Multi-Purpose Dataset for Bemba

May 26, 2023

Claytone Sikasote, Eunice Mukonde, Md Mahfuz Ibn Alam, Antonios Anastasopoulos

Abstract: We present BIG-C (Bemba Image Grounded Conversations), a large multimodal dataset for Bemba. While Bemba is the most populous language of Zambia, it exhibits a dearth of resources which renders the development of language technologies or language processing research almost impossible. The dataset is comprised of multi-turn dialogues between Bemba speakers based on images, transcribed and translated into English. There are more than 92,000 utterances/sentences, amounting to more than 180 hours of audio data with corresponding transcriptions and English translations. We also provide baselines on speech recognition (ASR), machine translation (MT) and speech translation (ST) tasks, and sketch out other potential future multimodal uses of our dataset. We hope that by making the dataset available to the research community, this work will foster research and encourage collaboration across the language, speech, and vision communities, especially for languages outside the "traditionally" used high-resourced ones. All data and code are publicly available: https://github.com/csikasote/bigc.

* Accepted to ACL 2023

CODET: A Benchmark for Contrastive Dialectal Evaluation of Machine Translation

May 26, 2023

Md Mahfuz Ibn Alam, Sina Ahmadi, Antonios Anastasopoulos

Abstract: Neural machine translation (NMT) systems exhibit limited robustness in handling source-side linguistic variations. Their performance tends to degrade when faced with even slight deviations in language usage, such as different domains or variations introduced by second-language speakers. It is intuitive to extend this observation to encompass dialectal variations as well, but the work allowing the community to evaluate MT systems on this dimension is limited. To alleviate this issue, we compile and release CODET, a contrastive dialectal benchmark encompassing 882 different variations from nine different languages. We also quantitatively demonstrate the challenges large MT models face in effectively translating dialectal variants. We are releasing all code and data.

Script Normalization for Unconventional Writing of Under-Resourced Languages in Bilingual Communities

May 25, 2023

Sina Ahmadi, Antonios Anastasopoulos

Abstract: The wide accessibility of social media has provided linguistically under-represented communities with an extraordinary opportunity to create content in their native languages. This, however, comes with certain challenges in script normalization, particularly where the speakers of a language in a bilingual community rely on another script or orthography to write their native language. This paper addresses the problem of script normalization for several such languages that are mainly written in a Perso-Arabic script. Using synthetic data with various levels of noise and a transformer-based model, we demonstrate that the problem can be effectively remediated. We conduct a small-scale evaluation of real data as well. Our experiments indicate that script normalization is also beneficial to improve the performance of downstream tasks such as machine translation and language identification.

* To appear in the proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL)

GlobalBench: A Benchmark for Global Progress in Natural Language Processing

May 24, 2023

Yueqi Song, Catherine Cui, Simran Khanuja, Pengfei Liu, Fahim Faisal, Alissa Ostapenko, Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Yulia Tsvetkov (+2 more)

Abstract: Despite the major advances in NLP, significant disparities in NLP system performance across languages still exist. Arguably, these are due to uneven resource allocation and sub-optimal incentives to work on less resourced languages. To track and further incentivize the global development of equitable language technology, we introduce GlobalBench. Prior multilingual benchmarks are static and have focused on a limited number of tasks and languages. In contrast, GlobalBench is an ever-expanding collection that aims to dynamically track progress on all NLP datasets in all languages. Rather than solely measuring accuracy, GlobalBench also tracks the estimated per-speaker utility and equity of technology across all languages, providing a multi-faceted view of how language technology is serving people of the world. Furthermore, GlobalBench is designed to identify the most under-served languages, and rewards research efforts directed towards those languages. At present, the most under-served languages are the ones with a relatively high population, but nonetheless overlooked by composite multilingual benchmarks (like Punjabi, Portuguese, and Wu Chinese). Currently, GlobalBench covers 966 datasets in 190 languages, and has 1,128 system submissions spanning 62 languages.

* Preprint, 9 pages

LIMIT: Language Identification, Misidentification, and Translation using Hierarchical Models in 350+ Languages

May 23, 2023

Milind Agarwal, Md Mahfuz Ibn Alam, Antonios Anastasopoulos

Abstract: Knowing the language of an input text/audio is a necessary first step for using almost every natural language processing (NLP) tool such as taggers, parsers, or translation systems. Language identification is a well-studied problem, sometimes even considered solved; in reality, most of the world's 7000 languages are not supported by current systems. This lack of representation affects large-scale data mining efforts and further exacerbates data shortage for low-resource languages. We take a step towards tackling the data bottleneck by compiling a corpus of over 50K parallel children's stories in 350+ languages and dialects, and the computation bottleneck by building lightweight hierarchical models for language identification. Our data can serve as benchmark data for language identification of short texts and for understudied translation directions such as those between Indian or African languages. Our proposed method, Hierarchical LIMIT, uses limited computation to expand coverage into excluded languages while maintaining prediction quality.

* 25 pages, 2 figures, 13 tables

GMNLP at SemEval-2023 Task 12: Sentiment Analysis with Phylogeny-Based Adapters

Apr 25, 2023

Md Mahfuz Ibn Alam, Ruoyu Xie, Fahim Faisal, Antonios Anastasopoulos

Abstract: This report describes GMU's sentiment analysis system for the SemEval-2023 shared task AfriSenti-SemEval. We participated in all three sub-tasks: Monolingual, Multilingual, and Zero-Shot. Our approach uses models initialized with AfroXLMR-large, a pre-trained multilingual language model trained on African languages and fine-tuned correspondingly. We also introduce augmented training data along with original training data. Alongside finetuning, we perform phylogeny-based adapter tuning to create several models and ensemble the best models for the final submission. Our system achieves the best F1-score on track 5: Amharic, with 6.2 points higher F1-score than the second-best performing system on this track. Overall, our system ranks 5th among the 10 systems participating in all 15 tracks.

* Accepted at SemEval Workshop at ACL 2023

PALI: A Language Identification Benchmark for Perso-Arabic Scripts

Apr 03, 2023

Sina Ahmadi, Milind Agarwal, Antonios Anastasopoulos

Abstract: The Perso-Arabic scripts are a family of scripts that are widely adopted and used by various linguistic communities around the globe. Identifying various languages using such scripts is crucial to language technologies and challenging in low-resource setups. As such, this paper sheds light on the challenges of detecting languages using Perso-Arabic scripts, especially in bilingual communities where "unconventional" writing is practiced. To address this, we use a set of supervised techniques to classify sentences into their languages. Building on these, we also propose a hierarchical model that targets clusters of languages that are more often confused by the classifiers. Our experimental results indicate the effectiveness of our solutions.

* 13 pages - accepted at VarDial at EACL 2023
