Matea Tashkovska

Towards Open Foundation Language Model and Corpus for Macedonian: A Low-Resource Language

Jun 11, 2025

Stefan Krsteski, Matea Tashkovska, Borjan Sazdov, Hristijan Gjoreski, Branislav Gerazov

Abstract: The increase in technological adoption worldwide comes with demands for novel tools to be used by the general population. Large Language Models (LLMs) provide a great opportunity in this respect, but their capabilities remain limited for low-resource languages, restricting applications in countries where such languages are spoken. We create several resources to facilitate the adoption of LLMs and to support research advancements for Macedonian. We collect the largest Macedonian corpus to date, consisting of 40GB of textual data and totaling 3.5B words. To support conversational applications, we collect a 106k-instance instruction dataset, carefully built to be culturally grounded. For evaluation, we construct a Macedonian evaluation suite covering seven benchmarks. Finally, we train domestic-yak, a state-of-the-art 8B-parameter model, on our curated datasets and evaluate it against eight baseline models using the newly constructed benchmark suite. Our model outperforms all existing models in the 8B parameter range across all benchmarks, and achieves performance comparable to models up to 10x larger. Furthermore, a qualitative analysis with native speakers reveals that our model is preferred over larger counterparts, receiving higher ratings for grammatical correctness and cultural appropriateness. All datasets, code, and model weights are openly released, setting a foundation for advancing LLMs in similarly underrepresented languages. These resources are publicly available at github.com/LVSTCK for source code, and at huggingface.co/LVSTCK for pretrained model weights and data.

* Camera-ready version accepted at SlavNLP-2025@ACL

A System for Differentiation of Schizophrenia and Bipolar Disorder based on rsfMRI

Jul 01, 2023

Daniela Janeva, Stefan Krsteski, Matea Tashkovska, Nikola Jovanovski, Tomislav Kartalov, Dimitar Taskovski, Zoran Ivanovski, Branislav Gerazov

Abstract: Schizophrenia and bipolar disorder are debilitating psychiatric illnesses that can be challenging to diagnose accurately. The similarities between the two disorders make it difficult to differentiate between them using traditional diagnostic tools. Recently, resting-state functional magnetic resonance imaging (rsfMRI) has emerged as a promising tool for the diagnosis of psychiatric disorders. This paper presents several methods for differentiating schizophrenia and bipolar disorder based on features extracted from rsfMRI data. The best-performing system uses 1D Convolutional Neural Networks to analyze patterns in Intrinsic Connectivity time courses obtained from rsfMRI and potentially identify biomarkers that distinguish between the two disorders. We evaluate the system's performance on a large dataset of patients with schizophrenia and bipolar disorder and demonstrate that the system achieves an Area Under Curve (AUC) score of 0.7078 in differentiating patients with these disorders. Our results suggest that rsfMRI-based classification systems have great potential for improving the accuracy of psychiatric diagnoses and may ultimately lead to more effective treatments for patients with these disorders.
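
For readers unfamiliar with the AUC metric reported in the abstract, the following is a minimal sketch of how a ROC AUC can be computed for a two-class problem. It is not the authors' evaluation code: the function name `roc_auc`, the toy labels, and the toy scores are all hypothetical and chosen only for illustration. The sketch uses the pairwise interpretation of AUC, that is, the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the ROC AUC for binary labels.

    labels: iterable of 0/1 class labels (hypothetical encoding:
            1 = one disorder, 0 = the other).
    scores: iterable of classifier scores, higher means "more likely class 1".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the paper).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.4f}")
```

On this toy data, 7 of the 9 positive/negative pairs are ranked correctly, so the result is 7/9. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives some context for the 0.7078 reported above.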
