Abstract: Physics-inspired inference often hinges on the ability to construct a likelihood, or the probability of observing a sequence of data given a model. These likelihoods can be directly maximized for parameter estimation, incorporated into Bayesian frameworks, or even used as loss functions in neural networks. Yet many models, despite being conceptually simple, lack tractable likelihoods. A notable example arises in estimating protein production from snapshot measurements of actively dividing cells. Here, the challenge stems from cell divisions occurring at non-exponentially distributed intervals, with each division stochastically partitioning protein content between daughter cells, making the protein count in any given cell a function of its full division history. Such history dependence precludes a straightforward likelihood based on a (standard Markovian) master equation. Instead, we employ conditional normalizing flows (a class of neural network models designed to learn probability distributions) to approximate otherwise intractable likelihoods from simulated data. As a case study, we examine activation of the yeast gene \emph{glc3}, which is involved in glycogen synthesis and expressed under nutrient-limiting conditions. We monitor this activity using snapshot fluorescence measurements via flow cytometry, where GFP expression reflects \emph{glc3} promoter activity. A na\"ive analysis of the flow cytometry data, ignoring cell division, suggests that many cells are active with low expression. However, fluorescent proteins persist and can be inherited, so cells may appear active merely by retaining ancestral fluorescence. Explicitly accounting for the (non-Markovian) effects of cell division reveals that \emph{glc3} is mostly inactive under stress, showing that while cells occasionally activate the gene, its expression is only transient.
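To make the approach concrete, the sketch below trains a conditional density model on simulated (parameter, count) pairs and uses the result as a surrogate likelihood. It is a minimal sketch only, assuming PyTorch: the simulate function is a hypothetical toy stand-in for the full birth/division simulator, and the single conditional affine transform (a one-layer normalizing flow, equivalent to a conditional Gaussian) stands in for the more expressive flow stacks used in practice.

\begin{verbatim}
import math
import torch
import torch.nn as nn

def simulate(theta):
    # Hypothetical toy simulator: protein count ~ Normal(theta, sqrt(theta)).
    # In the paper's setting this would be the full stochastic
    # birth/division simulation with inherited protein content.
    return theta + torch.sqrt(theta) * torch.randn_like(theta)

class ConditionalAffineFlow(nn.Module):
    # Learns p(x | theta) by mapping x to z = (x - mu(theta)) / sigma(theta)
    # with a standard-normal base distribution on z.
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2))  # -> (mu, log sigma)

    def log_prob(self, x, theta):
        mu, log_sigma = self.net(theta).chunk(2, dim=-1)
        z = (x - mu) * torch.exp(-log_sigma)
        log_base = -0.5 * z**2 - 0.5 * math.log(2 * math.pi)
        return (log_base - log_sigma).squeeze(-1)  # change of variables

flow = ConditionalAffineFlow()
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
for step in range(2000):
    theta = 1.0 + 9.0 * torch.rand(256, 1)    # draw parameters
    x = simulate(theta)                       # draw matching data
    loss = -flow.log_prob(x, theta).mean()    # fit by maximum likelihood
    opt.zero_grad(); loss.backward(); opt.step()
# flow.log_prob(x_obs, theta) now acts as a tractable surrogate likelihood,
# ready for maximization or use inside a Bayesian sampler.
\end{verbatim}

Because the affine transform is invertible with a tractable Jacobian, the learned log-likelihood is exact for the model being fit; stacking additional transforms recovers a fully expressive conditional normalizing flow.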

Abstract: State space models, such as Mamba, have recently garnered attention in time series forecasting due to their ability to capture sequence patterns. However, on electricity consumption benchmarks, Mamba forecasts exhibit a mean error of approximately 8\%, and on traffic occupancy benchmarks the mean error reaches 18\%. This raises the question of whether such predictions are simply inaccurate or whether they fall within the expected spread of the historical data. To address this, we propose a dual-network framework based on the Mamba architecture for probabilistic forecasting, where one network generates point forecasts while the other estimates predictive uncertainty by modeling the variance. We abbreviate our tool, Mamba with probabilistic time series forecasting, as Mamba-ProbTSF; the code for its implementation is available on GitHub (https://github.com/PessoaP/Mamba-ProbTSF). Evaluating this approach on synthetic and real-world benchmark datasets, we find that the Kullback-Leibler divergence between the learned distributions and the data (which, in the limit of infinite data, should converge to zero if the model correctly captures the underlying probability distribution) is reduced to the order of $10^{-3}$ for synthetic data and $10^{-1}$ for the real-world benchmarks, demonstrating the approach's effectiveness. On both the electricity consumption and traffic occupancy benchmarks, the true trajectory stays within the predicted uncertainty interval at the two-sigma level about 95\% of the time. We end by considering potential limitations, adjustments to improve performance, and considerations for applying this framework to processes with purely or largely stochastic dynamics, in which the stochastic changes accumulate, as observed for example in pure Brownian motion or molecular dynamics trajectories.
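To illustrate the dual-network idea, here is a minimal sketch assuming PyTorch. A GRU stands in for the Mamba block so the example runs without the CUDA-dependent mamba-ssm package, and the SeqRegressor class and toy sine data are illustrative; none of this is the Mamba-ProbTSF API itself.

\begin{verbatim}
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeqRegressor(nn.Module):
    # Sequence-to-one regressor; replacing nn.GRU with a Mamba block
    # recovers the architecture described above.
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                # x: (batch, time, 1)
        h, _ = self.rnn(x)
        return self.head(h[:, -1])       # one-step-ahead prediction

mean_net, var_net = SeqRegressor(), SeqRegressor()  # the two networks
nll = nn.GaussianNLLLoss()
opt = torch.optim.Adam(list(mean_net.parameters()) +
                       list(var_net.parameters()), lr=1e-3)

for step in range(1000):
    # Toy data: noisy sine segments; the target is the next point.
    phase = torch.rand(64, 1) * 6.28
    grid = torch.linspace(0.0, 3.0, 33).unsqueeze(0)
    series = torch.sin(grid + phase) + 0.1 * torch.randn(64, 33)
    x, y = series[:, :-1].unsqueeze(-1), series[:, -1:]
    mu = mean_net(x)                          # point forecast
    var = F.softplus(var_net(x)) + 1e-6       # positive predictive variance
    loss = nll(mu, y, var)                    # Gaussian negative log-likelihood
    opt.zero_grad(); loss.backward(); opt.step()
\end{verbatim}

The two-sigma band quoted above is then $\mu \pm 2\sqrt{\mathrm{var}}$; a well-calibrated model should contain the true trajectory within that band roughly 95\% of the time.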
Abstract: Across the scientific realm, we find ourselves subtracting or dividing stochastic signals. For instance, consider a stochastic realization, $x$, generated from the addition or multiplication of two stochastic signals $a$ and $b$, namely $x = a + b$ or $x = ab$. In the $x = a + b$ example, $a$ can be a fluorescence background and $b$ the signal of interest, whose statistics are to be learned from the measured $x$. Similarly, when writing $x = ab$, $a$ can be thought of as the illumination intensity and $b$ the density of fluorescent molecules of interest. Yet dividing or subtracting stochastic signals amplifies noise, so we ask instead whether, using the statistics of $a$ and the measurements of $x$ as input, we can recover the statistics of $b$. Here, we show how normalizing flows can generate an approximation of the probability distribution over $b$, thereby avoiding subtraction or division altogether. This method is implemented in our software package, NFdeconvolve, available on GitHub with a tutorial linked in the main text.
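For the additive case $x = a + b$, the idea can be sketched as follows, assuming PyTorch. The learnable flow is reduced to a single affine transform (i.e., a Gaussian model for $b$) to keep the example short, and all distributions and names below are illustrative rather than NFdeconvolve's actual implementation.

\begin{verbatim}
import math
import torch

# Known statistics of the background a, and measurements x = a + b.
a_dist = torch.distributions.Normal(2.0, 0.5)
b_true = torch.distributions.Normal(5.0, 1.0)  # ground truth, to be recovered
x_obs = a_dist.sample((2000,)) + b_true.sample((2000,))

# Learnable affine flow on a standard-normal base: b = mu + exp(log_s) * z.
mu = torch.zeros(1, requires_grad=True)
log_s = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([mu, log_s], lr=0.05)

def log_pb(b):                       # log-density of the learned p(b)
    z = (b - mu) * torch.exp(-log_s)
    return -0.5 * z**2 - 0.5 * math.log(2 * math.pi) - log_s

for step in range(500):
    a = a_dist.sample((256,))        # Monte Carlo draws from the known p(a)
    # log p(x) = log E_a[ p_b(x - a) ]: fit the convolution to the data,
    # so the statistics of b are learned without ever using a subtracted
    # signal as an estimate of b.
    lp = log_pb(x_obs.unsqueeze(1) - a.unsqueeze(0))
    loss = -(torch.logsumexp(lp, dim=1) - math.log(256)).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print(mu.item(), log_s.exp().item())  # should approach 5.0 and 1.0
\end{verbatim}

The multiplicative case $x = ab$ follows the same pattern with the change of variables $b = x/a$, including the corresponding $1/|a|$ Jacobian factor inside the expectation.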