Pierrette Bouillon

ISSCO, University of Geneva

Aladdin-FTI @ AMIYA Three Wishes for Arabic NLP: Fidelity, Diglossia, and Multidialectal Generation

Feb 18, 2026

Jonathan Mutal, Perla Al Almaoui, Simon Hengchen, Pierrette Bouillon

Abstract: Arabic dialects have long been under-represented in Natural Language Processing (NLP) research due to their non-standardization and high variability, which pose challenges for computational modeling. Recent advances in the field, such as Large Language Models (LLMs), offer promising avenues to address this gap by enabling Arabic to be modeled as a pluricentric language rather than a monolithic system. This paper presents Aladdin-FTI, our submission to the AMIYA shared task. The proposed system is designed to both generate and translate dialectal Arabic (DA). Specifically, the model supports text generation in Moroccan, Egyptian, Palestinian, Syrian, and Saudi dialects, as well as bidirectional translation between these dialects, Modern Standard Arabic (MSA), and English. The code and trained model are publicly available.

* 13 pages, Paper submitted to the AMIYA shared task at the VarDial workshop, co-located with EACL 2026

Arabizi vs LLMs: Can the Genie Understand the Language of Aladdin?

Feb 28, 2025

Perla Al Almaoui, Pierrette Bouillon, Simon Hengchen

Abstract: In this era of rapid technological advancements, communication continues to evolve as new linguistic phenomena emerge. Among these is Arabizi, a hybrid form of Arabic that incorporates Latin characters and numbers to represent the spoken dialects of Arab communities. Arabizi is widely used on social media and allows people to communicate in an informal and dynamic way, but it poses significant challenges for machine translation due to its lack of formal structure and deeply embedded cultural nuances. This case study arises from a growing need to translate Arabizi for gisting purposes. It evaluates the capacity of different LLMs to decode and translate Arabizi, focusing on multiple Arabic dialects that have rarely been studied up until now. Using a combination of human evaluators and automatic metrics, this research project investigates the models' performance in translating Arabizi into both Modern Standard Arabic and English. Key questions explored include which dialects are translated most effectively and whether translations into English surpass those into Arabic.

* Submitted to MT Summit 2025

Helping Domain Experts Build Speech Translation Systems

Oct 07, 2015

Manny Rayner, Alejandro Armando, Pierrette Bouillon, Sarah Ebling, Johanna Gerlach, Sonia Halimi, Irene Strasly, Nikos Tsourakis

Abstract: We present a new platform, "Regulus Lite", which supports rapid development and web deployment of several types of phrasal speech translation systems using a minimal formalism. A distinguishing feature is that most development work can be performed directly by domain experts. We motivate the need for platforms of this type and discuss three specific cases: medical speech translation, speech-to-sign-language translation and voice questionnaires. We briefly describe initial experiences in developing practical systems.

* 12 pages, 1 figure, to appear in Proc. Future and Emerging Trends in Language Technology 2015, Seville, Spain

Mental State Adjectives: the Perspective of Generative Lexicon

Jul 15, 1996

Pierrette Bouillon

Abstract: This paper focusses on mental state adjectives and offers a unified analysis in the theory of Generative Lexicon (Pustejovsky, 1991, 1995). We show that, instead of enumerating the various syntactic constructions they enter into, with the different senses which arise, it is possible to give them a rich typed semantic representation which will explain both their semantic and syntactic polymorphism.

* 6 pages, uses colap.sty. tar gzip uuencode. To appear in Proceedings of COLING-96

Adapting the Core Language Engine to French and Spanish

May 10, 1996

Manny Rayner, David Carter, Pierrette Bouillon

Abstract: We describe how substantial domain-independent language-processing systems for French and Spanish were quickly developed by manually adapting an existing English-language system, the SRI Core Language Engine. We explain the adaptation process in detail, and argue that it provides a fairly general recipe for converting a grammar-based system for English into a corresponding one for a Romance language.

* 9 pages, aclap.sty; to appear in NLP+IA 96; see also http://www.cam.sri.com/

Hybrid Transfer in an English-French Spoken Language Translator

May 26, 1995

Manny Rayner, Pierrette Bouillon

Abstract: The paper argues the importance of high-quality translation for spoken language translation systems. It describes an architecture suitable for rapid development of high-quality limited-domain translation systems, which has been implemented within an advanced prototype English-to-French spoken language translator. The focus of the paper is the hybrid transfer model, which combines unification-based rules and a set of trainable statistical preferences; roughly, rules encode domain-independent grammatical information and preferences encode domain-dependent distributional information. The preferences are trained from sets of examples produced by the system, which have been annotated by human judges as correct or incorrect. An experiment is described in which the model was tested on a 2000-utterance sample of previously unseen data.

* 7 pages, LaTeX (2.09 preferred); eaclap.sty; Procs of IA '95 (Montpellier, France)
