Abstract: Modeling the continuous-time dynamics of probability distributions from time-dependent data samples is a fundamental problem in many fields, including digital health. The aim is to analyze how the distribution of a biomarker, such as glucose, evolves over time and how these changes may reflect the progression of chronic diseases such as diabetes. In this paper, we propose a novel probabilistic model based on a mixture of Gaussian distributions to capture how samples from a continuous-time stochastic process evolve over time. To model potential distribution shifts over time, we introduce a time-dependent function parameterized by a Neural Ordinary Differential Equation (Neural ODE) and estimate it non-parametrically using the Maximum Mean Discrepancy (MMD). The proposed model is highly interpretable, detects subtle temporal shifts, and remains computationally efficient. Through simulation studies, we show that it performs competitively in terms of estimation accuracy against state-of-the-art, less interpretable methods such as normalized gradient flows and non-parametric kernel density estimators. Finally, we demonstrate the utility of our method on digital clinical-trial data, showing how the intervention alters the time-dependent distribution of glucose levels and enabling a rigorous comparison of control and treatment groups from novel mathematical and clinical perspectives.
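As a concrete sketch of the modeling idea described above (the precise parameterization is not given in the abstract, so the form below, including the symbols $K$, $\pi_k$, $\mu_k$, $\Sigma_k$, $\phi$, $f_\eta$, $t_j$, and $\hat{P}_{t_j}$, is an assumption), one may write the time-dependent density as a Gaussian mixture whose parameters evolve according to a Neural ODE:
\[
\rho_t(x) \;=\; \sum_{k=1}^{K} \pi_k(t)\, \mathcal{N}\big(x \mid \mu_k(t), \Sigma_k(t)\big),
\qquad
\dot{\phi}(t) \;=\; f_\eta\big(\phi(t), t\big),
\]
where $\phi(t)$ collects the mixture weights, means, and covariances, and $f_\eta$ is the Neural ODE vector field with parameters $\eta$. In such a setup, $\eta$ would be estimated by minimizing an empirical discrepancy of the form $\sum_j \widehat{\mathrm{MMD}}^2\big(\rho_{t_j}, \hat{P}_{t_j}\big)$, where $\hat{P}_{t_j}$ denotes the empirical distribution of the observed samples at time $t_j$.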
Abstract: We study an approximate controllability problem for the continuity equation and its application to constructing transport maps with normalizing flows. Specifically, we construct time-dependent controls $\theta=(w, a, b)$ in the vector field $w(a^\top x + b)_+$ to approximately transport a known base density $\rho_{\mathrm{B}}$ to a target density $\rho_*$. The approximation error is measured in relative entropy, and the controls $\theta$ are constructed to be piecewise constant in time, with explicit bounds on the number of switches. Our main result relies on an assumption on the relative tail decay of $\rho_*$ and $\rho_{\mathrm{B}}$, and provides hints toward characterizing the reachable space of the continuity equation in relative entropy.
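For reference, the controlled dynamics in question are those of the continuity equation driven by the single-ReLU vector field above; in the notation of the abstract, and introducing a final time $T$ for illustration, the transport problem reads
\[
\partial_t \rho(t,x) + \nabla_x \cdot \Big( \rho(t,x)\, w(t)\,\big(a(t)^\top x + b(t)\big)_+ \Big) = 0,
\qquad \rho(0,\cdot) = \rho_{\mathrm{B}},
\]
with the piecewise-constant controls $\theta(t)=(w(t),a(t),b(t))$ chosen so that $\rho(T,\cdot)$ is $\varepsilon$-close to $\rho_*$ in relative entropy.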
Abstract: Neural ordinary differential equations (neural ODEs) have emerged as a natural tool for supervised learning from a control perspective, yet a complete understanding of their optimal architecture remains elusive. In this work, we examine the interplay between their width $p$ and number of layer transitions $L$ (effectively the depth $L+1$). Specifically, we assess the model expressivity in terms of its capacity to interpolate either a finite dataset $D$ comprising $N$ pairs of points or two probability measures in $\mathbb{R}^d$ within a Wasserstein error margin $\varepsilon>0$. Our findings reveal a balancing trade-off between $p$ and $L$, with $L$ scaling as $O(1+N/p)$ for dataset interpolation, and $L=O\left(1+(p\varepsilon^d)^{-1}\right)$ for measure interpolation. In the autonomous case, where $L=0$, a separate study is required, which we undertake, focusing on dataset interpolation. We address the relaxed problem of $\varepsilon$-approximate controllability and establish an error decay of $\varepsilon\sim O(\log(p)p^{-1/d})$. This decay rate is a consequence of applying a universal approximation theorem to a custom-built Lipschitz vector field that interpolates $D$. In the high-dimensional setting, we further demonstrate that $p=O(N)$ neurons are likely sufficient to achieve exact control.
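The width-$p$ neural ODE studied in this line of work is typically of the form below (the activation $\sigma$, the control names $w_i, a_i, b_i$, and the time horizon $T$ are assumed here for concreteness):
\[
\dot{x}(t) \;=\; \sum_{i=1}^{p} w_i(t)\, \sigma\big(a_i(t)^\top x(t) + b_i(t)\big), \qquad t\in(0,T),
\]
where the controls $(w_i, a_i, b_i)_{i=1}^{p}$ are piecewise constant, taking $L+1$ values on a partition of $(0,T)$; the $L$ switching times play the role of layer transitions, and the autonomous case $L=0$ corresponds to time-independent controls.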
Abstract: Classification of $N$ points becomes a simultaneous control problem when viewed through the lens of neural ordinary differential equations (neural ODEs), which represent the time-continuous limit of residual networks. For the narrow model, with one neuron per hidden layer, it has been shown that the task can be achieved using $O(N)$ neurons. In this study, we focus on estimating the number of neurons required for efficient cluster-based classification, particularly in the worst-case scenario where points are independently and uniformly distributed in $[0,1]^d$. Our analysis provides a novel method for quantifying the probability of requiring fewer than $O(N)$ neurons, emphasizing the asymptotic behavior as both $d$ and $N$ increase. Additionally, under the sole assumption that the data are in general position, we propose a new constructive algorithm that simultaneously classifies clusters of $d$ points from any initial configuration, effectively reducing the maximal complexity to $O(N/d)$ neurons.
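In the narrow model referred to above, the dynamics reduce to a single-neuron version of the width-$p$ formulation (again, the activation $\sigma$ and control names are assumed for illustration):
\[
\dot{x}(t) \;=\; w(t)\,\sigma\big(a(t)^\top x(t) + b(t)\big),
\]
with piecewise-constant controls $(w, a, b)$, so that each constant piece corresponds to one neuron in the associated residual-network reading; handling a cluster of $d$ points per piece is what reduces the count from $O(N)$ to $O(N/d)$ neurons.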