Nikola Ljubešić

Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining

Apr 08, 2024
Nikola Ljubešić, Vít Suchomel, Peter Rupnik, Taja Kuzman, Rik van Noord

Via arXiv

CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation

Mar 26, 2024
Nikola Ljubešić, Taja Kuzman

Via arXiv

Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages

Mar 13, 2024
Rik van Noord, Taja Kuzman, Peter Rupnik, Nikola Ljubešić, Miquel Esplà-Gomis, Gema Ramírez-Sánchez, Antonio Toral

Via arXiv

Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

Nov 15, 2023
Stephen Mayhew, Terra Blevins, Shuheng Liu, Marek Šuppa, Hila Gonen, Joseph Marvin Imperial, Börje F. Karlsson, Peiqin Lin, Nikola Ljubešić, LJ Miranda, Barbara Plank, Arij Riabi, Yuval Pinter

Via arXiv

The ParlaSent multilingual training dataset for sentiment identification in parliamentary proceedings

Sep 18, 2023
Michal Mochtak, Peter Rupnik, Nikola Ljubešić

Via arXiv

CLASSLA-Stanza: The Next Step for Linguistic Processing of South Slavic Languages

Aug 11, 2023
Luka Terčon, Nikola Ljubešić

Via arXiv

Findings of the VarDial Evaluation Campaign 2023

May 31, 2023
Noëmi Aepli, Çağrı Çöltekin, Rob Van Der Goot, Tommi Jauhiainen, Mourhaf Kazzaz, Nikola Ljubešić, Kai North, Barbara Plank, Yves Scherrer, Marcos Zampieri

Via arXiv

ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification

Mar 08, 2023
Taja Kuzman, Igor Mozetič, Nikola Ljubešić

Via arXiv