Abstract: Dynamical modeling of multisite human intracranial neural recordings is essential for developing neurotechnologies such as brain-computer interfaces (BCIs). Linear dynamical models are widely used for this purpose due to their interpretability and their suitability for BCIs. In particular, these models enable flexible real-time inference, even in the presence of missing neural samples, which often occur in wireless BCIs. However, neural activity can exhibit nonlinear structure that is not captured by linear models. Furthermore, while recurrent neural network models can capture nonlinearity, their inference does not directly handle missing observations. To address this gap, recent work introduced DFINE, a deep learning framework that integrates neural networks with linear state-space models to capture nonlinearities while enabling flexible inference. However, DFINE was developed for intracortical recordings that measure localized neuronal populations. Here we extend DFINE to the modeling of multisite human intracranial electroencephalography (iEEG) recordings. We find that DFINE significantly outperforms linear state-space models (LSSMs) in forecasting future neural activity. Furthermore, DFINE matches or exceeds the accuracy of a gated recurrent unit (GRU) model in neural forecasting, indicating that a linear dynamical backbone, when paired and jointly trained with nonlinear neural networks, can effectively describe the dynamics of iEEG signals while also enabling flexible inference. Additionally, DFINE handles missing observations more robustly than the baselines, demonstrating its flexible inference and utility for BCIs. Finally, DFINE's advantage over LSSMs is more pronounced in high gamma spectral bands. Taken together, these findings highlight DFINE as a strong and flexible framework for modeling human iEEG dynamics, with potential applications in next-generation BCIs.
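As a rough illustration of the kind of architecture this abstract describes (and not the authors' implementation), the sketch below pairs a nonlinear autoencoder with a linear state-space backbone whose Kalman-filter inference simply skips the update step when a sample is missing. All module names, layer sizes, and noise parameterizations are assumptions made for the example.

```python
# Minimal DFINE-style sketch (illustrative assumptions, single-sequence inputs):
# a nonlinear encoder maps iEEG features to a manifold latent, a linear
# state-space model governs the dynamic latent, and a Kalman filter performs
# flexible inference that tolerates missing samples.
import torch
import torch.nn as nn

class DFINESketch(nn.Module):
    def __init__(self, n_channels, dyn_dim=8, man_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_channels, 64), nn.ReLU(),
                                     nn.Linear(64, man_dim))
        self.decoder = nn.Sequential(nn.Linear(man_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_channels))
        # Linear dynamical backbone: x_{t+1} = A x_t + w_t,  a_t = C x_t + v_t
        self.A = nn.Parameter(torch.eye(dyn_dim) * 0.95)
        self.C = nn.Parameter(torch.randn(man_dim, dyn_dim) * 0.1)
        self.log_q = nn.Parameter(torch.zeros(dyn_dim))   # process noise (diag)
        self.log_r = nn.Parameter(torch.zeros(man_dim))   # observation noise (diag)

    def kalman_filter(self, a, mask):
        # a: (T, man_dim) manifold latents; mask[t] == False marks a missing sample.
        T, _ = a.shape
        dyn_dim = self.A.shape[0]
        x, P = torch.zeros(dyn_dim), torch.eye(dyn_dim)
        Q, R = torch.diag(self.log_q.exp()), torch.diag(self.log_r.exp())
        filtered = []
        for t in range(T):
            # Predict
            x = self.A @ x
            P = self.A @ P @ self.A.T + Q
            if mask[t]:  # Update only when a sample is available
                S = self.C @ P @ self.C.T + R
                K = P @ self.C.T @ torch.linalg.inv(S)
                x = x + K @ (a[t] - self.C @ x)
                P = (torch.eye(dyn_dim) - K @ self.C) @ P
            filtered.append(x)
        return torch.stack(filtered)

    def forward(self, y, mask):
        # y: (T, n_channels) neural features; returns their reconstruction.
        a = self.encoder(y)                 # nonlinear manifold latent
        x = self.kalman_filter(a, mask)     # flexible linear inference
        return self.decoder(x @ self.C.T)   # decode back to neural features
```

Because the dynamic latent is linear, the same recursion can be run causally at deployment time for forecasting or for filling in dropped wireless samples.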
Abstract: Neural population activity often exhibits regime-dependent non-stationarity in the form of switching dynamics. Learning accurate switching dynamical system models can reveal how behavior is encoded in neural activity. Existing switching approaches have primarily focused on learning models from a single neural modality, either continuous Gaussian signals or discrete Poisson signals. However, multiple neural modalities are often recorded simultaneously to measure different spatiotemporal scales of brain activity, and all these modalities can encode behavior. Moreover, regime labels are typically unavailable in training data, posing a significant challenge for learning models of regime-dependent switching dynamics. To address these challenges, we develop a novel unsupervised learning algorithm that learns the parameters of switching multiscale dynamical system models using only multiscale neural observations. We demonstrate our method using both simulations and two distinct experimental datasets with multimodal spike-LFP observations during different motor tasks. We find that our switching multiscale dynamical system models more accurately decode behavior than switching single-scale dynamical system models, showing the success of multiscale neural fusion. Further, our models outperform stationary multiscale models, illustrating the importance of tracking regime-dependent non-stationarity in multimodal neural data. The developed unsupervised learning framework enables more accurate modeling of complex multiscale neural dynamics by leveraging information in multimodal recordings while incorporating regime switches. This approach holds promise for improving the performance and robustness of brain-computer interfaces over time and for advancing our understanding of the neural basis of behavior.
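The unsupervised learning algorithm itself is not reproduced here, but the sketch below illustrates the generative structure being modeled: a Markov chain over regimes selects regime-specific linear latent dynamics that drive both continuous Gaussian (LFP-like) and discrete Poisson (spike-like) observations. All parameter values, dimensions, and the two-regime setup are hypothetical choices for illustration.

```python
# Illustrative simulation of a switching multiscale dynamical system
# (hypothetical parameters; not the paper's model or learning code).
import numpy as np

rng = np.random.default_rng(0)
T, latent_dim, n_lfp, n_units = 500, 2, 4, 10

def rotation(theta):
    # Stable rotational latent dynamics; regimes differ in rotation speed.
    return 0.98 * np.array([[np.cos(theta), -np.sin(theta)],
                            [np.sin(theta),  np.cos(theta)]])

A = [rotation(0.05), rotation(0.3)]             # regime-specific dynamics
Pi = np.array([[0.98, 0.02], [0.02, 0.98]])     # regime transition matrix
C_lfp = rng.normal(size=(n_lfp, latent_dim))    # Gaussian (LFP) emission matrix
C_spk = rng.normal(size=(n_units, latent_dim))  # Poisson (spike) log-rate loading

x, s = np.zeros(latent_dim), 0
lfp, spikes, regimes = [], [], []
for t in range(T):
    s = rng.choice(2, p=Pi[s])                                 # regime switch
    x = A[s] @ x + rng.normal(scale=0.1, size=latent_dim)      # latent update
    lfp.append(C_lfp @ x + rng.normal(scale=0.2, size=n_lfp))  # continuous LFP
    rate = np.exp(np.clip(C_spk @ x - 1.0, -10, 3))            # spike rates
    spikes.append(rng.poisson(rate))                           # discrete spikes
    regimes.append(s)

lfp, spikes, regimes = np.array(lfp), np.array(spikes), np.array(regimes)
print(lfp.shape, spikes.shape, regimes.shape)   # (500, 4) (500, 10) (500,)
```

The learning problem described in the abstract is to recover the regime-dependent parameters and regime sequence from the spike-LFP observations alone, without access to the regime labels generated here.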
Abstract: Real-time decoding of target variables from multiple simultaneously recorded neural time-series modalities, such as discrete spiking activity and continuous field potentials, is important across various neuroscience applications. However, a major challenge for doing so is that different neural modalities can have different timescales (i.e., sampling rates) and different probabilistic distributions, or can even be missing at some time-steps. Existing nonlinear models of multimodal neural activity do not address different timescales or missing samples across modalities. Further, some of these models do not allow for real-time decoding. Here, we develop a learning framework that enables real-time recursive decoding while nonlinearly aggregating information across multiple modalities with different timescales and distributions, and with missing samples. This framework consists of 1) a multiscale encoder that nonlinearly aggregates information after learning within-modality dynamics to handle different timescales and missing samples in real time, 2) a multiscale dynamical backbone that extracts multimodal temporal dynamics and enables real-time recursive decoding, and 3) modality-specific decoders to account for different probabilistic distributions across modalities. In both simulations and three distinct multiscale brain datasets, we show that our model can aggregate information across modalities with different timescales, distributions, and missing samples to improve real-time target decoding. Further, our method outperforms various linear and nonlinear multimodal benchmarks in doing so.
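A minimal sketch of the three-component architecture this abstract describes is given below: modality-specific encoders, a shared recurrent backbone that is updated causally at each time-step, and modality-specific decoder heads, with masks zeroing out missing samples. Module names, layer sizes, and the choice of GRUs are illustrative assumptions rather than the authors' implementation.

```python
# Sketch of multimodal real-time decoding with missing-sample masks
# (illustrative architecture; names and sizes are assumptions).
import torch
import torch.nn as nn

class MultimodalDecoderSketch(nn.Module):
    def __init__(self, n_units, n_lfp, latent_dim=32, target_dim=2):
        super().__init__()
        # Within-modality encoders (each modality keeps its own timescale)
        self.spike_enc = nn.GRU(n_units, latent_dim, batch_first=True)
        self.lfp_enc = nn.GRU(n_lfp, latent_dim, batch_first=True)
        # Shared dynamical backbone for recursive, causal decoding
        self.backbone = nn.GRUCell(2 * latent_dim, latent_dim)
        # Behavior readout plus modality-specific heads; the latter would carry
        # Poisson/Gaussian reconstruction losses during training (omitted here).
        self.behavior_head = nn.Linear(latent_dim, target_dim)
        self.spike_head = nn.Linear(latent_dim, n_units)   # Poisson log-rates
        self.lfp_head = nn.Linear(latent_dim, n_lfp)       # Gaussian means

    def forward(self, spikes, lfp, spike_mask, lfp_mask):
        # spikes: (B, T, n_units), lfp: (B, T, n_lfp);
        # masks: (B, T, 1), with 0 marking a missing sample for that modality.
        h_spk, _ = self.spike_enc(spikes * spike_mask)
        h_lfp, _ = self.lfp_enc(lfp * lfp_mask)
        B, T, _ = h_spk.shape
        z = torch.zeros(B, self.backbone.hidden_size, device=spikes.device)
        outputs = []
        for t in range(T):
            # Drop the contribution of missing modalities at this time-step
            fused = torch.cat([h_spk[:, t] * spike_mask[:, t],
                               h_lfp[:, t] * lfp_mask[:, t]], dim=-1)
            z = self.backbone(fused, z)          # recursive real-time update
            outputs.append(self.behavior_head(z))
        return torch.stack(outputs, dim=1)       # (B, T, target_dim)
```

The per-step loop makes the causal structure explicit: at deployment, each new sample (from whichever modalities arrive) updates the shared state once, so decoding latency stays at one step regardless of sampling rates.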
Abstract: Intracranial recordings have opened a unique opportunity to simultaneously measure activity across multiregional networks in the human brain. Recent works have focused on developing transformer-based neurofoundation models of such recordings that can generalize across subjects and datasets. However, these recordings exhibit highly complex spatiotemporal interactions across diverse spatial scales, from the single-channel scale to the scale of brain regions. As such, there remain critical open questions regarding how best to encode spatial information and how to design self-supervision tasks that enable the learning of brain network patterns and enhance downstream decoding performance using such high-dimensional, multiregional recordings. To enable exploration of these questions, we propose a new spatiotemporal transformer model of multiregional neural activity and a corresponding self-supervised masked latent reconstruction task, designed to enable flexibility in the spatial scale used for token encoding and masking. Applying this model to publicly available multiregional intracranial electroencephalography (iEEG) data, we demonstrate that adjusting the spatial scale for both token encoding and masked reconstruction significantly impacts downstream decoding. Further, we find that spatial encoding at larger scales than channel-level encoding, which is commonly used in existing iEEG transformer models, improves downstream decoding performance. Finally, we demonstrate that our method allows for region-level token encoding while also maintaining accurate channel-level neural reconstruction. Taken together, our modeling framework enables exploration of the spatial scales used for token encoding and masking, reveals their importance for self-supervised pretraining of neurofoundation models of multiregional human brain activity, and enhances downstream decoding performance.
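The sketch below shows one way region-level token encoding with masked reconstruction could look, in the spirit of the model this abstract describes; it is a simplification, not the proposed method. The channel-to-region grouping, mask ratio, and all layer sizes are assumptions, and positional/temporal embeddings are omitted for brevity.

```python
# Sketch of region-level token encoding with masked reconstruction
# (illustrative; positional embeddings and the full pretraining loop omitted).
import torch
import torch.nn as nn

class RegionTokenMAE(nn.Module):
    def __init__(self, channels_per_region, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        # One tokenizer per region: all of a region's channels -> one token
        self.tokenizers = nn.ModuleList(
            [nn.Linear(c, d_model) for c in channels_per_region])
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Channel-level heads keep channel-resolution reconstruction outputs
        self.recon_heads = nn.ModuleList(
            [nn.Linear(d_model, c) for c in channels_per_region])

    def forward(self, region_signals, mask_ratio=0.5):
        # region_signals: list of (B, T, n_channels_r) tensors, one per region.
        tokens = torch.stack([tok(x) for tok, x in
                              zip(self.tokenizers, region_signals)], dim=2)
        B, T, R, D = tokens.shape
        # Randomly mask region-level tokens (the self-supervision target)
        mask = torch.rand(B, T, R, device=tokens.device) < mask_ratio
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token, tokens)
        latent = self.encoder(tokens.reshape(B, T * R, D)).reshape(B, T, R, D)
        # Reconstruct channel-level activity for each region from its token
        recon = [head(latent[:, :, r]) for r, head in enumerate(self.recon_heads)]
        return recon, mask
```

Changing `channels_per_region` (for example, one channel per "region" versus anatomically grouped channels) is how a sketch like this would vary the spatial scale of token encoding while keeping the reconstruction targets at the channel level.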
Abstract: Local field potentials (LFPs) can be routinely recorded alongside spiking activity in intracortical neural experiments, can measure a larger, complementary spatiotemporal scale of brain activity for scientific inquiry, and can offer practical advantages over spikes, including greater long-term stability, robustness to electrode degradation, and lower power requirements. Despite these advantages, recent neural modeling frameworks have largely focused on spiking activity since LFP signals pose inherent modeling challenges due to their aggregate, population-level nature, often leading to lower predictive power for downstream task variables such as motor behavior. To address this challenge, we introduce a cross-modal knowledge distillation framework that transfers high-fidelity representational knowledge from pretrained multi-session spike transformer models to LFP transformer models. Specifically, we first train a teacher spike model across multiple recording sessions using a masked autoencoding objective with a session-specific neural tokenization strategy. We then align the latent representations of the student LFP model to those of the teacher spike model. Our results show that the Distilled LFP models consistently outperform single- and multi-session LFP baselines in both fully unsupervised and supervised settings, and can generalize to other sessions without additional distillation while maintaining superior performance. These findings demonstrate that cross-modal knowledge distillation is a powerful and scalable approach for leveraging high-performing spike models to develop more accurate LFP models.
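The following sketch illustrates the core distillation step under assumed teacher/student interfaces: a frozen, pretrained spike teacher that returns latents, and a student LFP model that returns latents plus a reconstruction. The cosine alignment loss and its weighting are illustrative choices for the example, not necessarily those used in the work above.

```python
# Sketch of one cross-modal distillation step: align student LFP latents to
# frozen teacher spike latents while keeping an LFP reconstruction objective.
# Interfaces (teacher(spikes) -> latents; student(lfp) -> (latents, recon))
# are assumptions for this example.
import torch
import torch.nn as nn

def distillation_step(teacher, student, spikes, lfp, optimizer, alpha=1.0):
    # spikes: (B, T, n_units), lfp: (B, T, n_lfp); both from the same trials.
    teacher.eval()
    with torch.no_grad():
        z_teacher = teacher(spikes)          # (B, T, D) frozen spike latents
    z_student, lfp_recon = student(lfp)      # student latents and reconstruction
    # Latent alignment (cosine here; an MSE alignment is a reasonable alternative)
    align = 1 - nn.functional.cosine_similarity(z_student, z_teacher, dim=-1).mean()
    recon = nn.functional.mse_loss(lfp_recon, lfp)
    loss = recon + alpha * align
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Keeping the teacher frozen means the student inherits the spike model's representational structure while only LFP inputs are needed at inference time.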