Peter Rupnik

Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining

Apr 08, 2024
Nikola Ljubešić, Vít Suchomel, Peter Rupnik, Taja Kuzman, Rik van Noord

Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages

Mar 13, 2024
Rik van Noord, Taja Kuzman, Peter Rupnik, Nikola Ljubešić, Miquel Esplà-Gomis, Gema Ramírez-Sánchez, Antonio Toral

The ParlaSent multilingual training dataset for sentiment identification in parliamentary proceedings

Sep 18, 2023
Michal Mochtak, Peter Rupnik, Nikola Ljubešić

The ParlaSent-BCS dataset of sentiment-annotated parliamentary debates from Bosnia-Herzegovina, Croatia, and Serbia

Jun 02, 2022
Michal Mochtak, Peter Rupnik, Nikola Ljubešić

The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild

Jan 11, 2022
Taja Kuzman, Peter Rupnik, Nikola Ljubešić
