Abstract: We present Gamayun, a 1.5B-parameter multilingual language model trained entirely from scratch on 2.5T tokens. Designed for efficiency and deployment in resource-constrained environments, Gamayun addresses the lack of research on small non-English-centric LLMs by adopting a novel two-stage pre-training strategy: balanced multilingual training for cross-lingual alignment, followed by high-quality English enrichment to transfer performance gains across languages. Our model supports 12 languages, with a special focus on Russian. Despite a significantly smaller training budget than comparable models, Gamayun outperforms LLaMA3.2-1B (9T tokens) on all considered benchmarks and surpasses Qwen2.5-1.5B (18T tokens) on a wide range of English and multilingual tasks. It matches or exceeds Qwen3 (36T tokens) on most tasks outside advanced STEM, achieving state-of-the-art results in Russian, including on the MERA benchmark, among models of comparable size (1-2B parameters).
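To make the two-stage schedule concrete, here is a minimal sketch of how such a data mixture might be implemented. The language list, the stage-2 English weight, and the sampling loop are all illustrative assumptions, not the paper's actual recipe.

```python
import random

# Hypothetical two-stage data mixture, sketching the schedule described in the
# abstract: stage 1 balances all languages for cross-lingual alignment; stage 2
# enriches the mix with high-quality English. Languages and weights are assumed.

LANGUAGES = ["en", "ru", "de", "fr", "es", "it", "pt", "zh", "ar", "tr", "uk", "kk"]

def mixture_weights(stage: int) -> dict[str, float]:
    if stage == 1:
        # Balanced multilingual training: equal weight per language.
        return {lang: 1.0 / len(LANGUAGES) for lang in LANGUAGES}
    # Stage 2: upweight high-quality English, split the rest evenly.
    en_weight = 0.5  # assumed enrichment ratio
    rest = (1.0 - en_weight) / (len(LANGUAGES) - 1)
    return {lang: (en_weight if lang == "en" else rest) for lang in LANGUAGES}

def sample_language(stage: int, rng: random.Random) -> str:
    langs, probs = zip(*mixture_weights(stage).items())
    return rng.choices(langs, weights=probs, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    for stage in (1, 2):
        counts = {lang: 0 for lang in LANGUAGES}
        for _ in range(10_000):
            counts[sample_language(stage, rng)] += 1
        print(f"stage {stage}: en share = {counts['en'] / 10_000:.2f}")
```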
Abstract: Deep convolutional neural networks (CNNs) have achieved remarkable performance in single image super-resolution (SISR). However, very deep networks are difficult to train and yield diminishing performance gains. Two main lines of work address this problem: improving the network architecture for better propagation of features through a large number of layers, and designing attention mechanisms that select the most informative features. Recent SISR solutions propose advanced attention and self-attention mechanisms; however, constructing a network that uses an attention block in the most efficient way remains challenging. To address this issue, we propose a general recursively defined residual block (RDRB) for better feature extraction and propagation through network layers. Based on the RDRB, we design the recursively defined residual network (RDRN), a novel network architecture that utilizes attention blocks efficiently. Extensive experiments show that the proposed model achieves state-of-the-art results on several popular super-resolution benchmarks, outperforming previous methods by up to 0.43 dB.
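Below is a minimal sketch of what a recursively defined residual block could look like, assuming the simplest reading of the abstract: a depth-0 block is a plain convolutional unit, and a depth-d block wraps two depth-(d-1) blocks in a residual connection. The actual RDRB composition rule and the placement of attention blocks in RDRN may differ; `ConvUnit` and the depth parameter are illustrative.

```python
import torch
import torch.nn as nn

class ConvUnit(nn.Module):
    """Assumed base case: conv -> ReLU -> conv, channel-preserving."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)

class RDRB(nn.Module):
    """Recursive case: two sub-blocks of depth-1 lower, plus a skip connection."""
    def __init__(self, channels: int, depth: int):
        super().__init__()
        if depth == 0:
            self.inner = ConvUnit(channels)
        else:
            self.inner = nn.Sequential(
                RDRB(channels, depth - 1),
                RDRB(channels, depth - 1),
            )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.inner(x)

if __name__ == "__main__":
    block = RDRB(channels=64, depth=3)  # 2^3 base units with nested skips
    x = torch.randn(1, 64, 32, 32)
    print(block(x).shape)  # torch.Size([1, 64, 32, 32])
```

The appeal of such a recursive definition is that skip connections appear at every scale of the network, so gradients have short paths to every layer regardless of total depth.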