Nikola Ljubešić

Multilingual Power and Ideology Identification in the Parliament: a Reference Dataset and Simple Baselines

May 12, 2024
Via arXiv

Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining

Apr 08, 2024
Via arXiv

CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation

Mar 26, 2024
Via arXiv

Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages

Mar 13, 2024
Via arXiv

Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

Nov 15, 2023
Via arXiv

The ParlaSent multilingual training dataset for sentiment identification in parliamentary proceedings

Sep 18, 2023
Via arXiv

CLASSLA-Stanza: The Next Step for Linguistic Processing of South Slavic Languages

Aug 11, 2023
Via arXiv

Findings of the VarDial Evaluation Campaign 2023

May 31, 2023
Via arXiv

ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification

Mar 08, 2023
Via arXiv

Geographic Adaptation of Pretrained Language Models

Mar 16, 2022
Via arXiv