Nikola Ljubešić

Identifying Primary Stress Across Related Languages and Dialects with Transformer-based Speech Encoder Models

May 30, 2025
Via arXiv

CLASSLA-Express: a Train of CLARIN.SI Workshops on Language Resources and Tools with Easily Expanding Route

Dec 02, 2024
Via arXiv

LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification

Nov 29, 2024
Via arXiv

Multilingual Power and Ideology Identification in the Parliament: a Reference Dataset and Simple Baselines

May 12, 2024
Via arXiv

Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining

Apr 08, 2024
Via arXiv

CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation

Mar 26, 2024
Via arXiv

Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages

Mar 13, 2024
Via arXiv

Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

Nov 15, 2023
Via arXiv

The ParlaSent multilingual training dataset for sentiment identification in parliamentary proceedings

Sep 18, 2023
Via arXiv

CLASSLA-Stanza: The Next Step for Linguistic Processing of South Slavic Languages

Aug 11, 2023
Via arXiv