Nikola Ljubešić

Multilingual Power and Ideology Identification in the Parliament: a Reference Dataset and Simple Baselines

May 12, 2024

Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining

Apr 08, 2024

CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation

Mar 26, 2024

Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages

Mar 13, 2024

Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

Nov 15, 2023

The ParlaSent multilingual training dataset for sentiment identification in parliamentary proceedings

Sep 18, 2023

CLASSLA-Stanza: The Next Step for Linguistic Processing of South Slavic Languages

Aug 11, 2023

Findings of the VarDial Evaluation Campaign 2023

May 31, 2023

ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification

Mar 08, 2023

Geographic Adaptation of Pretrained Language Models

Mar 16, 2022