Abstract: Neural Differential Equations (NDEs) excel at modeling continuous-time dynamics, effectively handling challenges such as irregular observations, missing values, and noise. Despite their advantages, NDEs face a fundamental obstacle in adopting dropout, a cornerstone of deep learning regularization, which leaves them susceptible to overfitting. To address this research gap, we introduce Continuum Dropout, a universally applicable regularization technique for NDEs built upon the theory of alternating renewal processes. Continuum Dropout formulates the on-off mechanism of dropout as a stochastic process that alternates between active (evolution) and inactive (paused) states in continuous time. This provides a principled approach to preventing overfitting and enhancing the generalization capabilities of NDEs. Moreover, Continuum Dropout offers a structured framework for quantifying predictive uncertainty via Monte Carlo sampling at test time. Through extensive experiments, we demonstrate that Continuum Dropout outperforms existing regularization methods for NDEs, achieving superior performance on various time series and image classification tasks. It also yields better-calibrated and more trustworthy probability estimates, highlighting its effectiveness for uncertainty-aware modeling.
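To make the on-off formulation concrete, the NumPy sketch below shows one way an alternating renewal mask could gate a neural ODE's vector field, with masks resampled at test time for Monte Carlo uncertainty estimates. The exponential holding times, the Euler integrator, the rate values, and the function names (`sample_onoff_mask`, `integrate_node_with_dropout`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sample_onoff_mask(T, dt, dim, rate_on_to_off, rate_off_to_on, rng):
    """Sample an alternating renewal (on/off) process per hidden unit.

    Assumes exponential holding times; returns a (num_steps, dim) binary mask.
    """
    num_steps = int(T / dt)
    mask = np.ones((num_steps, dim))
    state = np.ones(dim)                        # start all units "on"
    next_switch = rng.exponential(1.0 / rate_on_to_off, size=dim)
    t = 0.0
    for k in range(num_steps):
        switched = next_switch <= t
        if switched.any():
            state[switched] = 1.0 - state[switched]
            rates = np.where(state[switched] == 1.0, rate_on_to_off, rate_off_to_on)
            next_switch[switched] = t + rng.exponential(1.0 / rates)
        mask[k] = state
        t += dt
    return mask

def integrate_node_with_dropout(f, h0, T, dt, mask):
    """Euler integration of dh/dt = m(t) * f(h, t): units evolve only while 'on'."""
    h = h0.copy()
    t = 0.0
    for k in range(mask.shape[0]):
        h = h + dt * mask[k] * f(h, t)          # paused units hold their state
        t += dt
    return h

# Monte Carlo predictions at test time: resample masks to quantify uncertainty.
rng = np.random.default_rng(0)
f = lambda h, t: np.tanh(h)                     # toy vector field
preds = [integrate_node_with_dropout(f, np.ones(4), 1.0, 0.01,
                                     sample_onoff_mask(1.0, 0.01, 4, 2.0, 4.0, rng))
         for _ in range(32)]
mean, std = np.mean(preds, axis=0), np.std(preds, axis=0)
```

Resampling the on-off masks at inference, as in the last lines, mirrors the Monte Carlo sampling the abstract describes for obtaining predictive uncertainty.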
Abstract: Domain generalization (DG) aims to learn models that generalize well to unseen domains by training only on a set of source domains. Sharpness-Aware Minimization (SAM) has been a popular approach to this end, seeking flat minima in the total loss landscape. However, we show that minimizing the sharpness of the total loss does not guarantee flatness within each individual domain. In particular, SAM can converge to fake flat minima, where the total loss exhibits flat minima while sharp minima persist in individual domains. Moreover, the current perturbation update in the gradient ascent step is ineffective at directly reducing the sharpness of individual domains. Motivated by these findings, we introduce a novel DG algorithm, Decreased-overhead Gradual Sharpness-Aware Minimization (DGSAM), that applies gradual domain-wise perturbation to reduce sharpness consistently across domains while maintaining computational efficiency. Our experiments demonstrate that DGSAM outperforms state-of-the-art DG methods, achieving improved robustness to domain shifts and better performance across various benchmarks, while reducing computational overhead compared to SAM.
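As a rough illustration of how a gradual, domain-wise perturbation can be realized in a SAM-style update, the PyTorch sketch below accumulates a small ascent step per source domain before taking a single descent step at the perturbed weights. The function name `dgsam_step`, the per-domain radius `rho / num_domains`, and the exact update order are assumptions made for illustration, not the authors' DGSAM procedure.

```python
import torch

def dgsam_step(model, domain_batches, loss_fn, optimizer, rho=0.05):
    """One training step with gradual, domain-wise SAM-style perturbation (sketch).

    Assumption: the weight perturbation is built incrementally, one source
    domain at a time, before a single descent step at the perturbed point.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    eps_total = [torch.zeros_like(p) for p in params]
    step_rho = rho / len(domain_batches)

    # Gradual ascent: add a small perturbation per domain, in sequence.
    for x, y in domain_batches:
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, params)
        norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
        with torch.no_grad():
            for p, g, e in zip(params, grads, eps_total):
                step = step_rho * g / norm
                p.add_(step)            # move weights toward higher domain loss
                e.add_(step)

    # Descent: gradient of the total loss evaluated at the perturbed weights.
    optimizer.zero_grad()
    total_loss = sum(loss_fn(model(x), y) for x, y in domain_batches)
    total_loss.backward()

    # Restore the original weights before applying the optimizer step.
    with torch.no_grad():
        for p, e in zip(params, eps_total):
            p.sub_(e)
    optimizer.step()
```

The contrast with vanilla SAM is that the perturbation is assembled gradually from individual-domain gradients rather than from a single gradient of the total loss, which is the behavior the abstract attributes to DGSAM.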