Goran Glavaš

One For All & All For One: Bypassing Hyperparameter Tuning with Model Averaging For Cross-Lingual Transfer

Oct 16, 2023

Fabian David Schmidt, Ivan Vulić, Goran Glavaš

Abstract:Multilingual language models enable zero-shot cross-lingual transfer (ZS-XLT): fine-tuned on sizable source-language task data, they perform the task in target languages without labeled instances. The effectiveness of ZS-XLT hinges on the linguistic proximity between languages and the amount of pretraining data for a language. Because of this, model selection based on source-language validation is unreliable: it picks model snapshots with suboptimal target-language performance. As a remedy, some work optimizes ZS-XLT by extensively tuning hyperparameters: the follow-up work then routinely struggles to replicate the original results. Other work searches over narrower hyperparameter grids, reporting substantially lower performance. In this work, we therefore propose an unsupervised evaluation protocol for ZS-XLT that decouples performance maximization from hyperparameter tuning. As a robust and more transparent alternative to extensive hyperparameter tuning, we propose to accumulatively average snapshots from different runs into a single model. We run broad ZS-XLT experiments on both higher-level semantic tasks (NLI, extractive QA) and a lower-level token classification task (NER) and find that conventional model selection based on source-language validation quickly plateaus to suboptimal ZS-XLT performance. On the other hand, our accumulative run-by-run averaging of models trained with different hyperparameters boosts ZS-XLT performance and closely correlates with "oracle" ZS-XLT, i.e., model selection based on target-language validation performance.

* Accepted to Findings of EMNLP 2023
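
The accumulative run-by-run averaging described in the abstract can be illustrated with a minimal sketch. It assumes each snapshot is a plain dictionary of parameter lists; the names `accumulate_average` and `average_runs` and the toy values are hypothetical and not taken from the paper's code.

```python
from typing import Dict, List

# Toy stand-in for model weights: parameter name -> flat list of values (assumption).
Params = Dict[str, List[float]]


def accumulate_average(running: Params, new: Params, n_seen: int) -> Params:
    """Fold one more snapshot into a running mean that already covers n_seen snapshots."""
    return {
        name: [(r * n_seen + x) / (n_seen + 1) for r, x in zip(running[name], new[name])]
        for name in running
    }


def average_runs(snapshots: List[Params]) -> List[Params]:
    """Return the averaged model obtained after each additional run is folded in."""
    running = snapshots[0]
    trajectory = [running]
    for n_seen, snapshot in enumerate(snapshots[1:], start=1):
        running = accumulate_average(running, snapshot, n_seen)
        trajectory.append(running)
    return trajectory


# Three hypothetical runs trained with different hyperparameters.
runs = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
    {"w": [5.0, 0.0], "b": [2.0]},
]
trajectory = average_runs(runs)
print(trajectory[-1])
```

Each entry of `trajectory` is the single model one would evaluate after adding another run, so no run is ever selected or discarded on the basis of validation data.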

NewsRecLib: A PyTorch-Lightning Library for Neural News Recommendation

Oct 02, 2023

Andreea Iana, Goran Glavaš, Heiko Paulheim

Abstract:NewsRecLib is an open-source library based on PyTorch-Lightning and Hydra developed for training and evaluating neural news recommendation models. The foremost goals of NewsRecLib are to promote reproducible research and rigorous experimental evaluation by (i) providing a unified and highly configurable framework for exhaustive experimental studies and (ii) enabling a thorough analysis of the performance contribution of different model architecture components and training regimes. NewsRecLib is highly modular, allows specifying experiments in a single configuration file, and includes extensive logging facilities. Moreover, NewsRecLib provides out-of-the-box implementations of several prominent neural models, training methods, standard evaluation benchmarks, and evaluation metrics for news recommendation.

* Accepted at the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)

Train Once, Use Flexibly: A Modular Framework for Multi-Aspect Neural News Recommendation

Jul 29, 2023

Andreea Iana, Goran Glavaš, Heiko Paulheim

Abstract:Recent neural news recommenders (NNR) extend content-based recommendation by (1) aligning additional aspects such as topic or sentiment between the candidate news and user history or (2) diversifying recommendations w.r.t. these aspects. This customization is achieved by "hardcoding" additional constraints into NNR's architecture and/or training objectives: any change in the desired recommendation behavior thus requires the model to be retrained with a modified objective, impeding wide adoption of multi-aspect news recommenders. In this work, we introduce MANNeR, a modular framework for flexible multi-aspect (neural) news recommendation that supports ad-hoc customization over individual aspects at inference time. With metric-based learning at its core, MANNeR obtains aspect-specialized news encoders and then flexibly combines aspect-specific similarity scores for final ranking. Evaluation on two standard news recommendation benchmarks (one in English, one in Norwegian) shows that MANNeR consistently outperforms state-of-the-art NNRs on both standard content-based recommendation and single- and multi-aspect customization. Moreover, with MANNeR we can trivially scale the importance and find the optimal trade-off between content-based recommendation performance and aspect-based diversity of recommendations. Finally, we show that both MANNeR's content-based recommendation and aspect customization are robust to domain and language transfer.
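
Combining aspect-specific similarity scores at inference time might look like the sketch below. It assumes a simple weighted sum of precomputed per-aspect scores, with a negative weight standing in for diversification; the function `combine_scores`, the aspect names, and all numbers are hypothetical illustrations rather than MANNeR's actual scoring function.

```python
from typing import Dict, List


def combine_scores(
    scores: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[str]:
    """Rank candidates by a weighted sum of their per-aspect similarity scores."""
    total = {
        candidate: sum(weights.get(aspect, 0.0) * s for aspect, s in per_aspect.items())
        for candidate, per_aspect in scores.items()
    }
    return sorted(total, key=total.get, reverse=True)


# Hypothetical similarities between each candidate and the user's click history.
scores = {
    "news_a": {"content": 0.9, "topic": 0.8},
    "news_b": {"content": 0.7, "topic": 0.1},
    "news_c": {"content": 0.5, "topic": 0.9},
}

# Positive topic weight: prefer candidates that match the user's usual topics.
aligned = combine_scores(scores, {"content": 1.0, "topic": 1.0})
# Negative topic weight: penalize topical similarity to diversify the ranking.
diverse = combine_scores(scores, {"content": 1.0, "topic": -1.0})
print(aligned, diverse)
```

Because only the weights change between the two calls, the ranking behavior is adjusted without retraining any encoder, which is the property the abstract emphasizes.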

mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs

Jul 13, 2023

Gregor Geigle, Abhay Jain, Radu Timofte, Goran Glavaš

Abstract:Modular vision-language models (Vision-LLMs) align pretrained image encoders with (pretrained) large language models (LLMs), representing a computationally much more efficient alternative to end-to-end training of large vision-language models from scratch, which is prohibitively expensive for most. Vision-LLMs instead post-hoc condition LLMs to 'understand' the output of an image encoder. With the abundance of readily available high-quality English image-text data as well as monolingual English LLMs, the research focus has been on English-only Vision-LLMs. Multilingual vision-language models are still predominantly obtained via expensive end-to-end pretraining, resulting in comparatively smaller models, trained on limited multilingual image data supplemented with text-only multilingual corpora. In this work, we present mBLIP, the first multilingual Vision-LLM, which we obtain in a computationally efficient manner -- on consumer hardware using only a few million training examples -- by leveraging a pretrained multilingual LLM. To this end, we re-align an image encoder previously tuned to an English LLM to a new, multilingual LLM -- for this, we leverage multilingual data from a mix of vision-and-language tasks, which we obtain by machine-translating high-quality English data to 95 languages. On the IGLUE benchmark, mBLIP yields results competitive with state-of-the-art models. Moreover, in image captioning on XM3600, mBLIP (zero-shot) even outperforms PaLI-X (a model with 55B parameters). Compared to these very large multilingual vision-language models trained from scratch, we obtain mBLIP by training orders of magnitude fewer parameters on orders of magnitude less data. We release our model and code at https://github.com/gregor-ge/mBLIP.

Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations

Jun 14, 2023

Gregor Geigle, Radu Timofte, Goran Glavaš

Abstract:Vision-and-language (VL) models with separate encoders for each modality (e.g., CLIP) have become the go-to models for zero-shot image classification and image-text retrieval. The bulk of the evaluation of these models is, however, performed with English text only: the costly creation of language-specific image-caption datasets has limited multilingual VL benchmarks to a handful of high-resource languages. In this work, we introduce Babel-ImageNet, a massively multilingual benchmark that offers (partial) translations of 1000 ImageNet labels to 92 languages, built without resorting to machine translation (MT) or requiring manual annotation. We instead automatically obtain reliable translations of ImageNet concepts by linking them -- via shared WordNet synsets -- to BabelNet, a massively multilingual lexico-semantic network. We evaluate 8 different publicly available multilingual CLIP models on zero-shot image classification (ZS-IC) for each of the 92 Babel-ImageNet languages, demonstrating a significant gap between English ImageNet performance and that of high-resource languages (e.g., German or Chinese), and an even bigger gap for low-resource languages (e.g., Sinhala or Lao). Crucially, we show that the models' ZS-IC performance on Babel-ImageNet highly correlates with their performance in image-text retrieval, validating that Babel-ImageNet is suitable for estimating the quality of the multilingual VL representation spaces for the vast majority of languages that lack gold image-text data. Finally, we show that the performance of multilingual CLIP for low-resource languages can be drastically improved via cheap, parameter-efficient language-specific training. We make our code and data publicly available: https://github.com/gregor-ge/Babel-ImageNet.
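
Zero-shot image classification with separate image and text encoders reduces to a nearest-label search in the shared embedding space. The sketch below assumes the embeddings are already computed and uses cosine similarity; the two-dimensional vectors and the German labels are made-up toy values, and no real CLIP model is involved.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def classify(image_emb: List[float], label_embs: Dict[str, List[float]]) -> str:
    """Return the label whose text embedding is most similar to the image embedding."""
    return max(label_embs, key=lambda label: cosine(image_emb, label_embs[label]))


# Hypothetical text embeddings of translated class labels (German in this toy case).
label_embs = {"Hund": [1.0, 0.0], "Katze": [0.0, 1.0]}
# Hypothetical image embedding produced by the image encoder.
prediction = classify([0.9, 0.1], label_embs)
print(prediction)
```

Swapping in label embeddings for another language changes nothing else in the procedure, which is why translated label sets suffice for a multilingual benchmark of this kind.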

Free Lunch: Robust Cross-Lingual Transfer via Model Checkpoint Averaging

May 26, 2023

Fabian David Schmidt, Ivan Vulić, Goran Glavaš

Abstract:Massively multilingual language models have displayed strong performance in zero-shot (ZS-XLT) and few-shot (FS-XLT) cross-lingual transfer setups, where models fine-tuned on task data in a source language are transferred without any or with only a few annotated instances to the target language(s). However, current work typically overestimates model performance as fine-tuned models are frequently evaluated at model checkpoints that generalize best to validation instances in the target languages. This effectively violates the main assumptions of "true" ZS-XLT and FS-XLT. Such XLT setups require robust methods that do not depend on labeled target language data for validation and model selection. In this work, aiming to improve the robustness of "true" ZS-XLT and FS-XLT, we propose a simple and effective method that averages different checkpoints (i.e., model snapshots) during task fine-tuning. We conduct exhaustive ZS-XLT and FS-XLT experiments across higher-level semantic tasks (NLI, extractive QA) and lower-level token classification tasks (NER, POS). The results indicate that averaging model checkpoints yields systematic and consistent performance gains across diverse target languages in all tasks. Importantly, it simultaneously substantially desensitizes XLT to varying hyperparameter choices in the absence of target language validation. We also show that checkpoint averaging benefits performance when further combined with run averaging (i.e., averaging the parameters of models fine-tuned over independent runs).

* Accepted to appear in the Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics
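
Checkpoint averaging within a run, followed by run averaging across independent runs, can be sketched as two applications of the same element-wise mean. The example assumes uniform weights and toy parameter dictionaries; `mean_params` and the values are hypothetical and not drawn from the paper's implementation.

```python
from typing import Dict, List

# Toy stand-in for model weights: parameter name -> flat list of values (assumption).
Params = Dict[str, List[float]]


def mean_params(models: List[Params]) -> Params:
    """Element-wise uniform mean over a list of parameter dictionaries."""
    n = len(models)
    return {
        name: [sum(values) / n for values in zip(*(m[name] for m in models))]
        for name in models[0]
    }


# Checkpoints saved at different steps of two hypothetical fine-tuning runs.
run_1 = [{"w": [0.0, 2.0]}, {"w": [2.0, 4.0]}]
run_2 = [{"w": [4.0, 0.0]}, {"w": [6.0, 2.0]}]

# Step 1: average the checkpoints inside each run.
ckpt_avg_1 = mean_params(run_1)
ckpt_avg_2 = mean_params(run_2)
# Step 2: average the resulting models across runs.
run_avg = mean_params([ckpt_avg_1, ckpt_avg_2])
print(ckpt_avg_1, ckpt_avg_2, run_avg)
```

Neither step consults labeled target-language data, which matches the "true" ZS-XLT setting the abstract argues for.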

Leveraging Open Information Extraction for Improving Few-Shot Trigger Detection Domain Transfer

May 23, 2023

David Dukić, Kiril Gashteovski, Goran Glavaš, Jan Šnajder

Abstract:Event detection is a crucial information extraction task in many domains, such as Wikipedia or news. The task typically relies on trigger detection (TD) -- identifying token spans in the text that evoke specific events. While the notion of triggers should ideally be universal across domains, domain transfer for TD from high- to low-resource domains results in significant performance drops. We address the problem of negative transfer for TD by coupling triggers between domains using subject-object relations obtained from a rule-based open information extraction (OIE) system. We demonstrate that relations injected through multi-task training can act as mediators between triggers in different domains, enhancing zero- and few-shot TD domain transfer and reducing negative transfer, in particular when transferring from a high-resource source Wikipedia domain to a low-resource target news domain. Additionally, we combine the extracted relations with masked language modeling on the target domain and obtain further TD performance gains. Finally, we demonstrate that the results are robust to the choice of the OIE system.

A General-Purpose Multilingual Document Encoder

May 11, 2023

Onur Galoğlu, Robert Litschko, Goran Glavaš

Abstract:Massively multilingual pretrained transformers (MMTs) have tremendously pushed the state of the art on multilingual NLP and cross-lingual transfer of NLP models in particular. While a large body of work leveraged MMTs to mine parallel data and induce bilingual document embeddings, much less effort has been devoted to training a general-purpose (massively) multilingual document encoder that can be used for both supervised and unsupervised document-level tasks. In this work, we pretrain a massively multilingual document encoder as a hierarchical transformer model (HMDE) in which a shallow document transformer contextualizes sentence representations produced by a state-of-the-art pretrained multilingual sentence encoder. We leverage Wikipedia as a readily available source of comparable documents for creating training data, and train HMDE by means of a cross-lingual contrastive objective, further exploiting the category hierarchy of Wikipedia for creation of difficult negatives. We evaluate the effectiveness of HMDE on the two arguably most common and prominent cross-lingual document-level tasks: (1) cross-lingual transfer for topical document classification and (2) cross-lingual document retrieval. HMDE is significantly more effective than (i) aggregations of segment-based representations and (ii) multilingual Longformer. Crucially, owing to its massively multilingual lower transformer, HMDE successfully generalizes to languages unseen in document-level pretraining. We publicly release our code and models at https://github.com/ogaloglu/pre-training-multilingual-document-encoders.

Transfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese

Apr 18, 2023

Vésteinn Snæbjarnarson, Annika Simonsen, Goran Glavaš, Ivan Vulić

Abstract:Multilingual language models have pushed the state of the art in cross-lingual NLP transfer. The majority of zero-shot cross-lingual transfer approaches, however, use one and the same massively multilingual transformer (e.g., mBERT or XLM-R) to transfer to all target languages, irrespective of their typological, etymological, and phylogenetic relations to other languages. In particular, readily available data and models of resource-rich sibling languages are often ignored. In this work, we empirically show, in a case study for Faroese -- a low-resource language from a high-resource language family -- that by leveraging the phylogenetic information and departing from the 'one-size-fits-all' paradigm, one can improve cross-lingual transfer to low-resource languages. In particular, we leverage abundant resources of other Scandinavian languages (i.e., Danish, Norwegian, Swedish, and Icelandic) for the benefit of Faroese. Our evaluation results show that we can substantially improve the transfer performance to Faroese by exploiting data and models of closely related high-resource languages. Further, we release a new web corpus of Faroese, Faroese datasets for named entity recognition (NER) and semantic text similarity (STS), and new language models trained on all Scandinavian languages.

Simplifying Content-Based Neural News Recommendation: On User Modeling and Training Objectives

Apr 06, 2023

Andreea Iana, Goran Glavaš, Heiko Paulheim

Abstract:The advent of personalized news recommendation has given rise to increasingly complex recommender architectures. Most neural news recommenders rely on user click behavior and typically introduce dedicated user encoders that aggregate the content of clicked news into user embeddings (early fusion). These models are predominantly trained with standard point-wise classification objectives. The existing body of work exhibits two main shortcomings: (1) despite general design homogeneity, direct comparisons between models are hindered by varying evaluation datasets and protocols; (2) it leaves alternative model designs and training objectives vastly unexplored. In this work, we present a unified framework for news recommendation, allowing for a systematic and fair comparison of news recommenders across several crucial design dimensions: (i) candidate-awareness in user modeling, (ii) click behavior fusion, and (iii) training objectives. Our findings challenge the status quo in neural news recommendation. We show that replacing sizable user encoders with parameter-efficient dot products between candidate and clicked news embeddings (late fusion) often yields substantial performance gains. Moreover, our results render contrastive training a viable alternative to point-wise classification objectives.

* Accepted at the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023)
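
The late-fusion idea from the abstract, scoring a candidate by dot products with the clicked news embeddings instead of passing the clicks through a user encoder, can be sketched as follows. Mean-pooling the per-click scores is an assumption made for this illustration, and the function names and embeddings are hypothetical.

```python
from typing import List


def dot(u: List[float], v: List[float]) -> float:
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def late_fusion_score(candidate: List[float], clicked: List[List[float]]) -> float:
    """Mean of the dot products between the candidate and every clicked news embedding."""
    return sum(dot(candidate, c) for c in clicked) / len(clicked)


# Hypothetical embeddings of two news items the user clicked on.
clicked = [[1.0, 0.0], [0.0, 1.0]]
score_a = late_fusion_score([1.0, 0.0], clicked)
score_b = late_fusion_score([0.0, 2.0], clicked)
print(score_a, score_b)
```

The scoring step itself has no trainable parameters, so all learned capacity sits in the news encoder that produces the embeddings.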
