Abstract: This report introduces Kandinsky 5.0, a family of state-of-the-art foundation models for high-resolution image and 10-second video synthesis. The framework comprises three core line-ups of models: Kandinsky 5.0 Image Lite - a line-up of 6B parameter image generation models; Kandinsky 5.0 Video Lite - a line-up of fast and lightweight 2B parameter text-to-video and image-to-video models; and Kandinsky 5.0 Video Pro - 19B parameter models that achieve superior video generation quality. We provide a comprehensive review of the data curation lifecycle - including collection, processing, filtering, and clustering - for the multi-stage training pipeline, which involves extensive pre-training and incorporates quality-enhancement techniques such as supervised fine-tuning (SFT) and reinforcement learning (RL)-based post-training. We also present novel architectural, training, and inference optimizations that enable Kandinsky 5.0 to achieve high generation speeds and state-of-the-art performance across various tasks, as demonstrated by human evaluation. As a large-scale, publicly available generative framework, Kandinsky 5.0 leverages the full potential of its pre-training and subsequent stages, and can be adapted to a wide range of generative applications. We hope that this report, together with the release of our open-source code and training checkpoints, will substantially advance the development and accessibility of high-quality generative models for the research community.
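The training recipe itself is detailed in the full report; purely as an illustration of what a reward-guided post-training step for a diffusion transformer can look like, here is a minimal reward-weighted sketch. The dit and reward_model callables, the rectified-flow-style noising, and the softmax reward weighting are hypothetical assumptions for illustration, not the Kandinsky 5.0 API or its actual RL objective.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch only: the report does not specify its post-training code.
# We ASSUME a simple reward-weighted denoising objective as a stand-in for an
# RL-based post-training stage; `dit` and `reward_model` are hypothetical
# placeholders, not the Kandinsky 5.0 interface.

def rl_post_training_step(dit, reward_model, latents, text_emb, optimizer):
    """One reward-weighted fine-tuning step for a diffusion transformer."""
    b = latents.size(0)
    t = torch.rand(b, device=latents.device)          # diffusion timesteps in [0, 1]
    noise = torch.randn_like(latents)
    # Linear-interpolation (rectified-flow style) noising, a common choice.
    noisy = (1 - t.view(-1, 1, 1, 1)) * latents + t.view(-1, 1, 1, 1) * noise

    pred = dit(noisy, t, text_emb)                    # model predicts the velocity
    target = noise - latents
    per_sample = F.mse_loss(pred, target, reduction="none").mean(dim=(1, 2, 3))

    with torch.no_grad():
        # Higher-reward samples get larger weight (softmax-normalized rewards,
        # rescaled so the mean weight is 1).
        w = torch.softmax(reward_model(latents, text_emb), dim=0) * b

    loss = (w * per_sample).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this toy objective, samples the reward model scores highly dominate the gradient, nudging the generator toward preferred outputs; real RL post-training pipelines are considerably more involved.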




Abstract: Recent progress in transformer-based architectures has demonstrated remarkable success in video generation tasks. However, the quadratic complexity of full attention mechanisms remains a critical bottleneck, particularly for high-resolution and long-duration video sequences. In this paper, we propose NABLA, a novel Neighborhood Adaptive Block-Level Attention mechanism that dynamically adapts to sparsity patterns in video diffusion transformers (DiTs). By leveraging block-wise attention with an adaptive, sparsity-driven threshold, NABLA reduces computational overhead while preserving generative quality. Our method does not require custom low-level operator design and can be seamlessly integrated with PyTorch's Flex Attention operator. Experiments demonstrate that NABLA achieves up to 2.7x faster training and inference than the baseline, with almost no degradation in quantitative metrics (CLIP score, VBench score, human evaluation score) or visual quality. The code and model weights are available here: https://github.com/gen-ai-team/Wan2.1-NABLA
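NABLA's actual implementation is in the linked repository; the following is only a minimal sketch of the mechanism as the abstract describes it: pool queries and keys into blocks, estimate a coarse attention map, keep the key blocks covering a target fraction of each query block's attention mass (the adaptive threshold), and hand the resulting mask to PyTorch's Flex Attention. The block size of 64, the mean-pooled coarse map, and the keep_mass parameter are illustrative assumptions, not NABLA's exact procedure.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

BLOCK = 64  # illustrative block size; must divide the sequence length here

def nabla_like_attention(q, k, v, keep_mass=0.9):
    """Block-sparse attention in the spirit of NABLA (sketch, not the official code).

    q, k, v: [batch, heads, seq, dim] with seq divisible by BLOCK.
    keep_mass: fraction of coarse attention mass to retain per query block.
    """
    B, H, S, D = q.shape
    nb = S // BLOCK

    # 1) Coarse map: mean-pool queries/keys over blocks, then softmax.
    q_pool = q.view(B, H, nb, BLOCK, D).mean(dim=3)
    k_pool = k.view(B, H, nb, BLOCK, D).mean(dim=3)
    coarse = torch.softmax(q_pool @ k_pool.transpose(-1, -2) / D**0.5, dim=-1)

    # 2) Adaptive threshold: per query block, keep the smallest set of key
    #    blocks whose cumulative probability reaches keep_mass.
    vals, idx = coarse.sort(dim=-1, descending=True)
    prefix = vals.cumsum(dim=-1) - vals          # mass covered *before* each block
    keep = torch.zeros_like(coarse, dtype=torch.bool)
    keep.scatter_(-1, idx, prefix < keep_mass)   # the top block is always kept

    # 3) Lower the block-level keep map into a Flex Attention block mask.
    def mask_mod(b, h, q_idx, kv_idx):
        return keep[b, h, q_idx // BLOCK, kv_idx // BLOCK]

    block_mask = create_block_mask(mask_mod, B, H, S, S, device=q.device)
    return flex_attention(q, k, v, block_mask=block_mask)
```

In practice flex_attention is wrapped in torch.compile, which is where skipped blocks translate into real speedups; raising keep_mass trades speed for fidelity.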




Abstract: In recent years, deep learning has achieved significant results in many practical problems, such as computer vision, natural language processing, and speech recognition. For many years, the main goal of research was to improve model quality, even when the complexity was impractically high. However, for production solutions, which often require real-time operation, model latency plays a crucial role. Current state-of-the-art architectures are found with neural architecture search (NAS), which takes model complexity into account. However, designing a search space suitable for specific hardware remains a challenging task. To address this problem, we propose the matrix efficiency measure (MEM), a measure of the hardware efficiency of a neural architecture search space; a search space comprising hardware-efficient operations; a latency-aware scaling method; and ISyNet, a set of architectures designed to be both fast on specialized neural processing unit (NPU) hardware and accurate. We show the advantage of the designed architectures for NPU devices on ImageNet, as well as their generalization ability on downstream classification and detection tasks.
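The exact definition of MEM is given in the paper; purely as an illustration of the kind of quantity it captures, the sketch below ASSUMES a crude proxy: the fraction of a network's estimated FLOPs that land in matrix-multiplication-friendly operations (convolutions and linear layers), which NPU matrix engines execute efficiently. All function names are hypothetical.

```python
import torch
import torch.nn as nn

# Illustrative proxy only, NOT the paper's MEM formula: score a model by the
# share of its FLOPs spent in matrix ops (Conv2d / Linear) versus vector ops.

MATRIX_OPS = (nn.Conv2d, nn.Linear)

def approx_mem(model: nn.Module, x: torch.Tensor) -> float:
    """Fraction of estimated FLOPs in matrix-engine-friendly ops (assumed proxy)."""
    matrix, total = 0, 0
    hooks = []

    def hook(m, inp, out):
        nonlocal matrix, total
        if isinstance(m, nn.Conv2d):
            kh, kw = m.kernel_size
            f = out.numel() * (m.in_channels // m.groups) * kh * kw
        elif isinstance(m, nn.Linear):
            f = out.numel() * m.in_features
        else:
            f = out.numel()  # crude 1-FLOP-per-element estimate for vector ops
        total += f
        if isinstance(m, MATRIX_OPS):
            matrix += f

    for m in model.modules():
        if len(list(m.children())) == 0:  # hook leaf modules only
            hooks.append(m.register_forward_hook(hook))
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    return matrix / total

# Example: a candidate dominated by convolutions scores close to 1.0.
net = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
print(f"approx MEM = {approx_mem(net, torch.randn(1, 3, 64, 64)):.3f}")
```

Under a proxy like this, a search space built from high-scoring operations keeps the NPU's matrix engine busy, which is the intuition behind restricting the space to hardware-efficient operations.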