Mikkel N. Schmidt

On Joint Regularization and Calibration in Deep Ensembles

Nov 07, 2025

Laurits Fredsgaard, Mikkel N. Schmidt

Abstract: Deep ensembles are a powerful tool in machine learning, improving both model performance and uncertainty calibration. While ensembles are typically formed by training and tuning models individually, evidence suggests that jointly tuning the ensemble can lead to better performance. This paper investigates the impact of jointly tuning weight decay, temperature scaling, and early stopping on both predictive performance and uncertainty quantification. Additionally, we propose a partially overlapping holdout strategy as a practical compromise between enabling joint evaluation and maximizing the use of data for training. Our results demonstrate that jointly tuning the ensemble generally matches or improves performance, with significant variation in effect size across different tasks and metrics. We highlight the trade-offs between individual and joint optimization in deep ensemble training, with the overlapping holdout strategy offering an attractive practical solution. We believe our findings provide valuable insights and guidance for practitioners looking to optimize deep ensemble models. Code is available at: https://github.com/lauritsf/ensemble-optimality-gap

* Transactions on Machine Learning Research (2025) ISSN: 2835-8856
* 39 pages, 8 figures, 11 tables
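
As a rough illustration of what jointly tuning temperature scaling can mean, the sketch below picks one shared temperature by minimizing the negative log-likelihood of the averaged ensemble prediction on a holdout set. It is a generic example and not the paper's code: the function names, the toy logits, and the grid search are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_probs(member_logits, temperature):
    """Average the temperature-scaled softmax outputs of all members."""
    probs = [softmax(l, temperature) for l in member_logits]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]

def ensemble_nll(logits, labels, temperature):
    """Mean negative log-likelihood of the averaged ensemble prediction.

    logits[i][m] is the logit vector of member m for holdout example i.
    """
    total = 0.0
    for member_logits, y in zip(logits, labels):
        total -= math.log(ensemble_probs(member_logits, temperature)[y])
    return total / len(labels)

def tune_joint_temperature(logits, labels, grid):
    """Grid search for the shared temperature that minimizes ensemble NLL."""
    return min(grid, key=lambda t: ensemble_nll(logits, labels, t))

# Hypothetical holdout data: 3 examples, 2 ensemble members, 2 classes.
holdout_logits = [
    [[4.0, 0.0], [3.0, 1.0]],
    [[0.5, 2.5], [1.0, 3.0]],
    [[3.5, 0.5], [0.0, 2.0]],
]
holdout_labels = [0, 1, 1]
grid = [0.5, 1.0, 2.0, 4.0]

best_t = tune_joint_temperature(holdout_logits, holdout_labels, grid)
print("shared temperature:", best_t)
```

Tuning each member individually would instead minimize each member's own NLL, which does not account for the smoothing that averaging already provides.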

Equivariant Neural Diffusion for Molecule Generation

Jun 12, 2025

François Cornet, Grigory Bartosh, Mikkel N. Schmidt, Christian A. Naesseth

Abstract: We introduce Equivariant Neural Diffusion (END), a novel diffusion model for molecule generation in 3D that is equivariant to Euclidean transformations. Compared to current state-of-the-art equivariant diffusion models, the key innovation in END lies in its learnable forward process for enhanced generative modelling. Rather than pre-specified, the forward process is parameterized through a time- and data-dependent transformation that is equivariant to rigid transformations. Through a series of experiments on standard molecule generation benchmarks, we demonstrate the competitive performance of END compared to several strong baselines for both unconditional and conditional generation.

* 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

On Local Posterior Structure in Deep Ensembles

Mar 17, 2025

Mikkel Jordahn, Jonas Vestergaard Jensen, Mikkel N. Schmidt, Michael Riis Andersen

Abstract: Bayesian Neural Networks (BNNs) often improve model calibration and predictive uncertainty quantification compared to point estimators such as maximum-a-posteriori (MAP). Similarly, deep ensembles (DEs) are known to improve calibration, and it is therefore natural to hypothesize that deep ensembles of BNNs (DE-BNNs) should provide even further improvements. In this work, we systematically investigate this across a number of datasets, neural network architectures, and BNN approximation methods and surprisingly find that when the ensembles grow large enough, DEs consistently outperform DE-BNNs on in-distribution data. To shed light on this observation, we conduct several sensitivity and ablation studies. Moreover, we show that even though DE-BNNs outperform DEs on out-of-distribution metrics, this comes at the cost of decreased in-distribution performance. As a final contribution, we open-source the large pool of trained models to facilitate further research on this topic.

* Code and models available at https://github.com/jonasvj/OnLocalPosteriorStructureInDeepEnsembles

Blind Equalization using a Variational Autoencoder with Second Order Volterra Channel Model

Oct 21, 2024

Søren Føns Nielsen, Darko Zibar, Mikkel N. Schmidt

Abstract: Existing communication hardware is being pushed to its limits to accommodate ever-increasing global internet usage. This leads to non-linear distortion in the communication link that requires non-linear equalization techniques to operate the link at a reasonable bit error rate. This paper addresses the challenge of blind non-linear equalization using a variational autoencoder (VAE) with a second-order Volterra channel model. The VAE framework's cost function, the evidence lower bound (ELBO), is derived for real-valued constellations and can be evaluated analytically without resorting to sampling techniques. We demonstrate the effectiveness of our approach through simulations on a synthetic Wiener-Hammerstein channel and a simulated intensity modulated direct detection (IM/DD) optical link. The results show significant improvements in equalization performance, compared to a VAE with linear channel assumptions, highlighting the importance of appropriate channel modeling in unsupervised VAE equalizer frameworks.

* Submitted
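
For readers unfamiliar with the channel model named in the abstract, the sketch below evaluates a textbook second-order Volterra series, y[n] = sum_k h1[k] x[n-k] + sum_{i,j} h2[i][j] x[n-i] x[n-j], for a real-valued input. It only illustrates the general model class; the kernel values, the shared memory length for both kernels, and the function name `volterra2` are assumptions, and nothing here reproduces the paper's VAE equalizer.

```python
def volterra2(x, h1, h2):
    """Second-order Volterra series output with zero initial conditions.

    x  : list of real input samples
    h1 : first-order (linear) kernel of length M
    h2 : second-order kernel as an M x M nested list
    """
    memory = len(h1)
    y = []
    for n in range(len(x)):
        # Delayed inputs x[n], x[n-1], ...; samples before time 0 count as zero.
        taps = [x[n - k] if n - k >= 0 else 0.0 for k in range(memory)]
        linear = sum(h1[k] * taps[k] for k in range(memory))
        quadratic = sum(
            h2[i][j] * taps[i] * taps[j]
            for i in range(memory)
            for j in range(memory)
        )
        y.append(linear + quadratic)
    return y

# Hypothetical 4-PAM-like symbols and kernels with memory length 2.
symbols = [1.0, -1.0, 3.0, -3.0]
h1 = [1.0, 0.5]
h2 = [[0.1, 0.0], [0.0, 0.0]]

received = volterra2(symbols, h1, h2)
linear_only = volterra2(symbols, h1, [[0.0, 0.0], [0.0, 0.0]])
print(received)
print(linear_only)
```

With the second-order kernel set to zero the model reduces to an ordinary FIR filter, which corresponds to the linear channel assumption the abstract compares against.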

End-to-End Learning of Transmitter and Receiver Filters in Bandwidth Limited Fiber Optic Communication Systems

Sep 18, 2024

Søren Føns Nielsen, Francesco Da Ros, Mikkel N. Schmidt, Darko Zibar

Abstract: This paper investigates the application of end-to-end (E2E) learning for joint optimization of pulse-shaper and receiver filter to reduce intersymbol interference (ISI) in bandwidth-limited communication systems. We investigate this in two numerical simulation models: 1) an additive white Gaussian noise (AWGN) channel with bandwidth limitation and 2) an intensity modulated direct detection (IM/DD) link employing an electro-absorption modulator. For both simulation models, we implement a wavelength division multiplexing (WDM) scheme to ensure that the learned filters adhere to the bandwidth constraints of the WDM channels. Our findings reveal that E2E learning greatly surpasses traditional single-sided transmitter pulse-shaper or receiver filter optimization methods, achieving significant performance gains in terms of symbol error rate with shorter filter lengths. These results suggest that E2E learning can decrease the complexity and enhance the performance of future high-speed optical communication systems.

* Under review

Explaining time series models using frequency masking

Jun 19, 2024

Thea Brüsch, Kristoffer K. Wickstrøm, Mikkel N. Schmidt, Tommy S. Alstrøm, Robert Jenssen

Abstract: Time series data is fundamentally important for describing many critical domains such as healthcare, finance, and climate, where explainable models are necessary for safe automated decision-making. Developing eXplainable AI (XAI) in these domains therefore implies explaining salient information in the time series. Current methods for obtaining saliency maps assume localized information in the raw input space. In this paper, we argue that the salient information of a number of time series is more likely to be localized in the frequency domain. We propose FreqRISE, which uses masking-based methods to produce explanations in the frequency and time-frequency domain and shows the best performance across a number of tasks.

* Submitted to the Next Generation of AI Safety workshop at ICML 2024
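
To make the idea of masking in the frequency domain concrete, the sketch below computes a RISE-style Monte Carlo relevance score per frequency bin: random binary masks are applied to the spectrum, the masked signal is fed to a model, and each bin accumulates the model scores obtained while it was kept. This is a generic illustration and not the authors' FreqRISE implementation; the naive DFT, the toy model `bin3_energy`, and all parameter values are assumptions.

```python
import cmath
import math
import random

def dft(x):
    """Naive discrete Fourier transform (fine for short signals)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft_real(spectrum):
    """Inverse DFT, returning only the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]

def frequency_relevance(signal, model, n_masks=200, keep_prob=0.5, seed=0):
    """Estimate per-bin relevance by averaging scores over random frequency masks."""
    rng = random.Random(seed)
    n = len(signal)
    spectrum = dft(signal)
    n_bins = n // 2 + 1  # non-negative frequencies only
    relevance = [0.0] * n_bins
    for _ in range(n_masks):
        mask = [1.0 if rng.random() < keep_prob else 0.0 for _ in range(n_bins)]
        # Mirror the mask onto negative frequencies so the masked signal stays real.
        full_mask = [mask[min(k, n - k)] for k in range(n)]
        masked = idft_real([s * m for s, m in zip(spectrum, full_mask)])
        score = model(masked)
        for k in range(n_bins):
            relevance[k] += score * mask[k]
    return [r / (n_masks * keep_prob) for r in relevance]

def bin3_energy(x):
    """Hypothetical 'model': responds only to the component at frequency bin 3."""
    n = len(x)
    return abs(sum(x[t] * cmath.exp(-2j * math.pi * 3 * t / n) for t in range(n))) / n

# Toy signal with two sinusoids, at bins 3 and 5 of a 32-sample window.
signal = [
    math.sin(2 * math.pi * 3 * t / 32) + math.sin(2 * math.pi * 5 * t / 32)
    for t in range(32)
]
relevance = frequency_relevance(signal, bin3_energy)
top_bin = max(range(len(relevance)), key=relevance.__getitem__)
print("most relevant frequency bin:", top_bin)
```

Because the toy model depends on a single frequency, the relevance concentrates on that bin, whereas a mask over raw time steps would spread it across the whole window.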

End-to-End Learning of Pulse-Shaper and Receiver Filter in the Presence of Strong Intersymbol Interference

May 22, 2024

Søren Føns Nielsen, Francesco Da Ros, Mikkel N. Schmidt, Darko Zibar

Abstract: We numerically demonstrate that joint optimization of an FIR-based pulse-shaper and receiver filter results in improved system performance and shorter filter lengths (lower complexity) for 4-PAM 100 GBd IM/DD systems.

* 4 pages (3 article pages + 1 page for references) and 5 figures. Submitted to European Conference on Optical Communications (ECOC) 2024

Coherent energy and force uncertainty in deep learning force fields

Dec 07, 2023

Peter Bjørn Jørgensen, Jonas Busk, Ole Winther, Mikkel N. Schmidt

Abstract: In machine learning energy potentials for atomic systems, forces are commonly obtained as the negative derivative of the energy function with respect to atomic positions. To quantify aleatoric uncertainty in the predicted energies, a widely used modeling approach involves predicting both a mean and variance for each energy value. However, this model is not differentiable under the usual white noise assumption, so energy uncertainty does not naturally translate to force uncertainty. In this work we propose a machine learning potential energy model in which energy and force aleatoric uncertainty are linked through a spatially correlated noise process. We demonstrate our approach on an equivariant message passing neural network potential trained on energies and forces on two out-of-equilibrium molecular datasets. Furthermore, we also show how to obtain epistemic uncertainties in this setting based on a Bayesian interpretation of deep ensemble models.

* Presented at Advancing Molecular Machine Learning - Overcoming Limitations [ML4Molecules], ELLIS workshop, VIRTUAL, December 8, 2023, unofficial NeurIPS 2023 side-event

Multi-view self-supervised learning for multivariate variable-channel time series

Jul 20, 2023

Thea Brüsch, Mikkel N. Schmidt, Tommy S. Alstrøm

Abstract: Labeling of multivariate biomedical time series data is a laborious and expensive process. Self-supervised contrastive learning alleviates the need for large, labeled datasets through pretraining on unlabeled data. However, for multivariate time series data, the set of input channels often varies between applications, and most existing work does not allow for transfer between datasets with different sets of input channels. We propose learning one encoder to operate on all input channels individually. We then use a message passing neural network to extract a single representation across channels. We demonstrate the potential of this method by pretraining our model on a dataset with six EEG channels and then fine-tuning it on a dataset with two different EEG channels. We compare models with and without the message passing neural network across different contrastive loss functions. We show that our method, combined with the TS2Vec loss, outperforms all other methods in most settings.

* To appear in the proceedings of the 2023 IEEE International Workshop on Machine Learning for Signal Processing

Synthetic data shuffling accelerates the convergence of federated learning under data heterogeneity

Jun 23, 2023

Bo Li, Yasin Esfandiari, Mikkel N. Schmidt, Tommy S. Alstrøm, Sebastian U. Stich

Abstract: In federated learning, data heterogeneity is a critical challenge. A straightforward solution is to shuffle the clients' data to homogenize the distribution. However, this may violate data access rights, and how and when shuffling can accelerate the convergence of a federated optimization algorithm is not theoretically well understood. In this paper, we establish a precise and quantifiable correspondence between data heterogeneity and parameters in the convergence rate when a fraction of data is shuffled across clients. We prove that shuffling can quadratically reduce the gradient dissimilarity with respect to the shuffling percentage, accelerating convergence. Inspired by the theory, we propose a practical approach that addresses the data access rights issue by shuffling locally generated synthetic data. The experimental results show that shuffling synthetic data improves the performance of multiple existing federated learning algorithms by a large margin.
