Abstract: Recent studies in time series forecasting have explored formulating regression as a classification task. By discretizing the continuous target space into bins and predicting over a fixed set of classes, these approaches benefit from stable training, robust uncertainty modeling, and compatibility with modern deep learning architectures. However, most existing methods rely on one-hot encoding, which ignores the inherent ordinal structure of the underlying values. As a result, they provide no information about the relative distance between predicted and true values during training. In this paper, we address this limitation by introducing binary cumulative encoding (BCE), which represents scalar targets as monotonic binary vectors. This encoding implicitly preserves order and magnitude information, allowing the model to learn distance-aware representations while still operating within a classification framework. We propose a convolutional neural network architecture specifically designed for BCE, incorporating residual and dilated convolutions to enable fast and expressive temporal modeling. Through extensive experiments on benchmark forecasting datasets, we show that our approach outperforms widely used methods in both point and probabilistic forecasting, while requiring fewer parameters and enabling faster training.
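A minimal sketch of how such a cumulative (thermometer-style) encoding could look, assuming equal-width bins and the convention that entry j indicates whether the target exceeds the j-th internal bin edge; the function names and the bin-centre decoding rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def binary_cumulative_encode(y, bin_edges):
    """Map scalar targets to monotonic binary (thermometer-style) vectors.

    Entry j answers "does y exceed the j-th internal bin edge?", so nearby
    values share a long common prefix and order/magnitude are preserved.
    """
    thresholds = np.asarray(bin_edges)[1:-1]            # internal edges only
    y = np.asarray(y).reshape(-1, 1)
    return (y > thresholds[None, :]).astype(np.float32)

def binary_cumulative_decode(codes, bin_edges):
    """Recover a point estimate as the centre of the predicted bin."""
    edges = np.asarray(bin_edges)
    centres = (edges[:-1] + edges[1:]) / 2
    bin_idx = codes.round().sum(axis=1).astype(int)     # thresholds crossed
    return centres[bin_idx]

# Example: 5 equal-width bins over [0, 10).
edges = np.linspace(0.0, 10.0, 6)
print(binary_cumulative_encode([1.2, 7.9], edges))
# [[0. 0. 0. 0.]
#  [1. 1. 1. 0.]]
```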
Abstract: Large Language Models (LLMs) with Mixture of Experts (MoE) layers have recently gained significant attention, and state-of-the-art LLMs now utilize this architecture. There is a substantial body of research on how to train such models and how to select hyperparameters for this architecture. However, there is a lack of studies focusing on post-evaluation analysis of MoE layer properties. In this paper, we take a first step toward closing this gap by evaluating expert contributions on the quiz-based MMLU benchmark. We show that most experts were never activated during inference on this benchmark. Additionally, the output distribution of the gating networks is much closer to uniform than sparse. Finally, we demonstrate that the average performance of experts within the same layer varies significantly.
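The kind of post-hoc analysis described here could be sketched as follows, assuming router softmax outputs have been collected for a single MoE layer; the top-k routing convention, the function name, and the entropy-based uniformity measure are assumptions for illustration, not the paper's exact evaluation code.

```python
import torch

def expert_usage_stats(gate_probs: torch.Tensor, top_k: int = 2):
    """Summarise gating behaviour from collected router outputs.

    gate_probs: (num_tokens, num_experts) softmax outputs of one MoE layer.
    Returns (a) the fraction of experts ever selected under top-k routing and
    (b) the mean gating entropy normalised by the uniform-distribution
    entropy, so 1.0 means perfectly uniform and 0.0 means fully sparse.
    """
    num_experts = gate_probs.shape[1]
    top_idx = gate_probs.topk(top_k, dim=-1).indices      # routed experts per token
    used = torch.zeros(num_experts, dtype=torch.bool)
    used[top_idx.unique()] = True
    ever_activated = used.float().mean().item()

    entropy = -(gate_probs * gate_probs.clamp_min(1e-9).log()).sum(-1)
    uniform_entropy = torch.log(torch.tensor(float(num_experts)))
    normalised_entropy = (entropy / uniform_entropy).mean().item()
    return ever_activated, normalised_entropy

# Toy example: 1000 tokens routed over 64 experts.
probs = torch.softmax(torch.randn(1000, 64), dim=-1)
print(expert_usage_stats(probs, top_k=2))
```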
Abstract: Recent studies have shown that reducing symmetries in neural networks enhances linear mode connectivity between networks without requiring parameter space alignment, leading to improved performance of linearly interpolated neural networks. In practice, however, neural network interpolation is rarely used; ensembles of networks are far more common. In this paper, we empirically investigate the impact of reducing symmetries on the performance of deep ensembles and Mixture of Experts (MoE) across five datasets. Additionally, to explore linear mode connectivity in more depth, we introduce the Mixture of Interpolated Experts (MoIE). Our results show that deep ensembles built on asymmetric neural networks achieve significantly better performance as ensemble size increases compared to their symmetric counterparts. In contrast, our experiments do not provide conclusive evidence on whether reducing symmetries affects the MoE and MoIE architectures.
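Linear mode connectivity is typically probed by interpolating the parameters of two independently trained networks with the same architecture; the minimal PyTorch sketch below illustrates only that standard interpolation step (the construction of MoIE itself is not specified in the abstract, and the function name and float-only merging rule are assumptions).

```python
import copy
import torch.nn as nn

def interpolate_state_dicts(model_a: nn.Module, model_b: nn.Module, alpha: float) -> nn.Module:
    """Build a network with parameters theta(alpha) = (1 - alpha) * theta_a + alpha * theta_b.

    Linear mode connectivity asks whether the loss stays low along this path
    without first aligning the two parameterisations.
    """
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    merged = {
        k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] if sd_a[k].is_floating_point() else sd_a[k]
        for k in sd_a  # integer buffers (e.g. BatchNorm counters) are copied from model_a
    }
    model_c = copy.deepcopy(model_a)
    model_c.load_state_dict(merged)
    return model_c

# Toy usage: midpoint between two (here untrained) copies of the same architecture.
net_a, net_b = nn.Linear(4, 2), nn.Linear(4, 2)
midpoint = interpolate_state_dicts(net_a, net_b, alpha=0.5)
```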
Abstract: In recent years, significant effort has been directed toward adapting modern neural network architectures to tabular data. However, despite their larger parameter counts and longer training and inference times, these models often fail to consistently outperform vanilla multilayer perceptron (MLP) neural networks. Moreover, MLP-based ensembles have recently demonstrated superior performance and efficiency compared to advanced deep learning methods. Therefore, rather than building ever deeper and more complex models, we investigate whether MLPs can be replaced with more efficient architectures without sacrificing performance. In this paper, we first introduce GG MoE, a mixture-of-experts (MoE) model with a Gumbel-Softmax gating function. We then demonstrate that GG MoE with an embedding layer outperforms standard MoE and MLP models across $38$ datasets. Finally, we show that both MoE and GG MoE use significantly fewer parameters than MLPs, making them a promising alternative for scaling and ensemble methods.
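A hedged sketch of what a Gumbel-Softmax-gated MoE block might look like in PyTorch; the class name, expert sizes, and straight-through (hard=True) sampling are illustrative assumptions, not the paper's GG MoE specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelGateMoE(nn.Module):
    """Mixture-of-experts block with Gumbel-Softmax gating (illustrative only).

    Each expert is a small MLP; the gate draws differentiable samples from a
    categorical relaxation, so routing stays trainable end to end.
    """
    def __init__(self, d_in, d_hidden, d_out, num_experts=4, tau=1.0):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_out))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(d_in, num_experts)
        self.tau = tau

    def forward(self, x):
        # (batch, num_experts) relaxed one-hot weights; hard=True makes a
        # discrete choice in the forward pass with a straight-through gradient.
        weights = F.gumbel_softmax(self.gate(x), tau=self.tau, hard=True)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, E, d_out)
        return (weights.unsqueeze(-1) * expert_out).sum(dim=1)

# Toy usage on a batch of tabular feature vectors.
moe = GumbelGateMoE(d_in=16, d_hidden=32, d_out=1)
print(moe(torch.randn(8, 16)).shape)   # torch.Size([8, 1])
```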