Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Erdrin Azemi

Learning the relative composition of EEG signals using pairwise relative shift pretraining

Nov 14, 2025

Christopher Sandino, Sayeri Lala, Geeling Chau, Melika Ayoughi, Behrooz Mahasseni, Ellen Zippi, Ali Moin, Erdrin Azemi, Hanlin Goh

Abstract:Self-supervised learning (SSL) offers a promising approach for learning electroencephalography (EEG) representations from unlabeled data, reducing the need for expensive annotations for clinical applications like sleep staging and seizure detection. While current EEG SSL methods predominantly use masked reconstruction strategies like masked autoencoders (MAE) that capture local temporal patterns, position prediction pretraining remains underexplored despite its potential to learn long-range dependencies in neural signals. We introduce PAirwise Relative Shift or PARS pretraining, a novel pretext task that predicts relative temporal shifts between randomly sampled EEG window pairs. Unlike reconstruction-based methods that focus on local pattern recovery, PARS encourages encoders to capture relative temporal composition and long-range dependencies inherent in neural signals. Through comprehensive evaluation on various EEG decoding tasks, we demonstrate that PARS-pretrained transformers consistently outperform existing pretraining strategies in label-efficient and transfer learning settings, establishing a new paradigm for self-supervised EEG representation learning.

* Foundation Models for the Brain and Body NeurIPS 2025 Workshop

Via

Access Paper or Ask Questions

CPEP: Contrastive Pose-EMG Pre-training Enhances Gesture Generalization on EMG Signals

Sep 04, 2025

Wenhui Cui, Christopher Sandino, Hadi Pouransari, Ran Liu, Juri Minxha, Ellen L. Zippi, Aman Verma, Anna Sedlackova, Behrooz Mahasseni, Erdrin Azemi

Figure 1 for CPEP: Contrastive Pose-EMG Pre-training Enhances Gesture Generalization on EMG Signals

Figure 2 for CPEP: Contrastive Pose-EMG Pre-training Enhances Gesture Generalization on EMG Signals

Figure 3 for CPEP: Contrastive Pose-EMG Pre-training Enhances Gesture Generalization on EMG Signals

Figure 4 for CPEP: Contrastive Pose-EMG Pre-training Enhances Gesture Generalization on EMG Signals

Abstract:Hand gesture classification using high-quality structured data such as videos, images, and hand skeletons is a well-explored problem in computer vision. Leveraging low-power, cost-effective biosignals, e.g. surface electromyography (sEMG), allows for continuous gesture prediction on wearables. In this paper, we demonstrate that learning representations from weak-modality data that are aligned with those from structured, high-quality data can improve representation quality and enables zero-shot classification. Specifically, we propose a Contrastive Pose-EMG Pre-training (CPEP) framework to align EMG and pose representations, where we learn an EMG encoder that produces high-quality and pose-informative representations. We assess the gesture classification performance of our model through linear probing and zero-shot setups. Our model outperforms emg2pose benchmark models by up to 21% on in-distribution gesture classification and 72% on unseen (out-of-distribution) gesture classification.

Via

Access Paper or Ask Questions

Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation

May 29, 2025

Jingping Nie, Dung T. Tran, Karan Thakkar, Vasudha Kowtha, Jon Huang, Carlos Avendano, Erdrin Azemi, Vikramjit Mitra

Figure 1 for Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation

Figure 2 for Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation

Figure 3 for Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation

Figure 4 for Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation

Abstract:Auscultation, particularly heart sound, is a non-invasive technique that provides essential vital sign information. Recently, self-supervised acoustic representation foundation models (FMs) have been proposed to offer insights into acoustics-based vital signs. However, there has been little exploration of the extent to which auscultation is encoded in these pre-trained FM representations. In this work, using a publicly available phonocardiogram (PCG) dataset and a heart rate (HR) estimation model, we conduct a layer-wise investigation of six acoustic representation FMs: HuBERT, wav2vec2, wavLM, Whisper, Contrastive Language-Audio Pretraining (CLAP), and an in-house CLAP model. Additionally, we implement the baseline method from Nie et al., 2024 (which relies on acoustic features) and show that overall, representation vectors from pre-trained foundation models (FMs) offer comparable performance to the baseline. Notably, HR estimation using the representations from the audio encoder of the in-house CLAP model outperforms the results obtained from the baseline, achieving a lower mean absolute error (MAE) across various train/validation/test splits despite the domain mismatch.

* 5 pages, Interspeech 2025 conference

Via

Access Paper or Ask Questions

Promoting cross-modal representations to improve multimodal foundation models for physiological signals

Oct 21, 2024

Ching Fang, Christopher Sandino, Behrooz Mahasseni, Juri Minxha, Hadi Pouransari, Erdrin Azemi, Ali Moin, Ellen Zippi

Figure 1 for Promoting cross-modal representations to improve multimodal foundation models for physiological signals

Figure 2 for Promoting cross-modal representations to improve multimodal foundation models for physiological signals

Figure 3 for Promoting cross-modal representations to improve multimodal foundation models for physiological signals

Figure 4 for Promoting cross-modal representations to improve multimodal foundation models for physiological signals

Abstract:Many healthcare applications are inherently multimodal, involving several physiological signals. As sensors for these signals become more common, improving machine learning methods for multimodal healthcare data is crucial. Pretraining foundation models is a promising avenue for success. However, methods for developing foundation models in healthcare are still in early exploration and it is unclear which pretraining strategies are most effective given the diversity of physiological signals. This is partly due to challenges in multimodal health data: obtaining data across many patients is difficult and costly, there is a lot of inter-subject variability, and modalities are often heterogeneously informative across downstream tasks. Here, we explore these challenges in the PhysioNet 2018 dataset. We use a masked autoencoding objective to pretrain a multimodal model. We show that the model learns representations that can be linearly probed for a diverse set of downstream tasks. We hypothesize that cross-modal reconstruction objectives are important for successful multimodal training, as they encourage the model to integrate information across modalities. We demonstrate that modality dropout in the input space improves performance across downstream tasks. We also find that late-fusion models pretrained with contrastive learning objectives are less effective across multiple tasks. Finally, we analyze the model's representations, showing that attention weights become more cross-modal and temporally aligned with our pretraining strategy. The learned embeddings also become more distributed in terms of the modalities encoded by each unit. Overall, our work demonstrates the utility of multimodal foundation models with health data, even across diverse physiological data sources. We further argue that explicit methods for inducing cross-modality may enhance multimodal pretraining strategies.

* NeurIPS 2024 AIM-FM Workshop

Via

Access Paper or Ask Questions

Generalizable autoregressive modeling of time series through functional narratives

Oct 10, 2024

Ran Liu, Wenrui Ma, Ellen Zippi, Hadi Pouransari, Jingyun Xiao, Chris Sandino, Behrooz Mahasseni, Juri Minxha, Erdrin Azemi, Eva L. Dyer(+1 more)

Figure 1 for Generalizable autoregressive modeling of time series through functional narratives

Figure 2 for Generalizable autoregressive modeling of time series through functional narratives

Figure 3 for Generalizable autoregressive modeling of time series through functional narratives

Figure 4 for Generalizable autoregressive modeling of time series through functional narratives

Abstract:Time series data are inherently functions of time, yet current transformers often learn time series by modeling them as mere concatenations of time periods, overlooking their functional properties. In this work, we propose a novel objective for transformers that learn time series by re-interpreting them as temporal functions. We build an alternative sequence of time series by constructing degradation operators of different intensity in the functional space, creating augmented variants of the original sample that are abstracted or simplified to different degrees. Based on the new set of generated sequence, we train an autoregressive transformer that progressively recovers the original sample from the most simplified variant. Analogous to the next word prediction task in languages that learns narratives by connecting different words, our autoregressive transformer aims to learn the Narratives of Time Series (NoTS) by connecting different functions in time. Theoretically, we justify the construction of the alternative sequence through its advantages in approximating functions. When learning time series data with transformers, constructing sequences of temporal functions allows for a broader class of approximable functions (e.g., differentiation) compared to sequences of time periods, leading to a 26\% performance improvement in synthetic feature regression experiments. Experimentally, we validate NoTS in 3 different tasks across 22 real-world datasets, where we show that NoTS significantly outperforms other pre-training methods by up to 6\%. Additionally, combining NoTS on top of existing transformer architectures can consistently boost the performance. Our results demonstrate the potential of NoTS as a general-purpose dynamic learner, offering a viable alternative for developing foundation models for time series analysis.

Via

Access Paper or Ask Questions

Efficient Source-Free Time-Series Adaptation via Parameter Subspace Disentanglement

Oct 03, 2024

Gaurav Patel, Christopher Sandino, Behrooz Mahasseni, Ellen L Zippi, Erdrin Azemi, Ali Moin, Juri Minxha

Abstract:In this paper, we propose a framework for efficient Source-Free Domain Adaptation (SFDA) in the context of time-series, focusing on enhancing both parameter efficiency and data-sample utilization. Our approach introduces an improved paradigm for source-model preparation and target-side adaptation, aiming to enhance training efficiency during target adaptation. Specifically, we reparameterize the source model's weights in a Tucker-style decomposed manner, factorizing the model into a compact form during the source model preparation phase. During target-side adaptation, only a subset of these decomposed factors is fine-tuned, leading to significant improvements in training efficiency. We demonstrate using PAC Bayesian analysis that this selective fine-tuning strategy implicitly regularizes the adaptation process by constraining the model's learning capacity. Furthermore, this re-parameterization reduces the overall model size and enhances inference efficiency, making the approach particularly well suited for resource-constrained devices. Additionally, we demonstrate that our framework is compatible with various SFDA methods and achieves significant computational efficiency, reducing the number of fine-tuned parameters and inference overhead in terms of MACs by over 90% while maintaining model performance.

Via

Access Paper or Ask Questions

Model-driven Heart Rate Estimation and Heart Murmur Detection based on Phonocardiogram

Jul 25, 2024

Jingping Nie, Ran Liu, Behrooz Mahasseni, Erdrin Azemi, Vikramjit Mitra

Figure 1 for Model-driven Heart Rate Estimation and Heart Murmur Detection based on Phonocardiogram

Figure 2 for Model-driven Heart Rate Estimation and Heart Murmur Detection based on Phonocardiogram

Figure 3 for Model-driven Heart Rate Estimation and Heart Murmur Detection based on Phonocardiogram

Figure 4 for Model-driven Heart Rate Estimation and Heart Murmur Detection based on Phonocardiogram

Abstract:Acoustic signals are crucial for health monitoring, particularly heart sounds which provide essential data like heart rate and detect cardiac anomalies such as murmurs. This study utilizes a publicly available phonocardiogram (PCG) dataset to estimate heart rate using model-driven methods and extends the best-performing model to a multi-task learning (MTL) framework for simultaneous heart rate estimation and murmur detection. Heart rate estimates are derived using a sliding window technique on heart sound snippets, analyzed with a combination of acoustic features (Mel spectrogram, cepstral coefficients, power spectral density, root mean square energy). Our findings indicate that a 2D convolutional neural network (\textbf{\texttt{2dCNN}}) is most effective for heart rate estimation, achieving a mean absolute error (MAE) of 1.312 bpm. We systematically investigate the impact of different feature combinations and find that utilizing all four features yields the best results. The MTL model (\textbf{\texttt{2dCNN-MTL}}) achieves accuracy over 95% in murmur detection, surpassing existing models, while maintaining an MAE of 1.636 bpm in heart rate estimation, satisfying the requirements stated by Association for the Advancement of Medical Instrumentation (AAMI).

* 6 pages, 10 figures

Via

Access Paper or Ask Questions

Pre-Trained Foundation Model representations to uncover Breathing patterns in Speech

Jul 17, 2024

Vikramjit Mitra, Anirban Chatterjee, Ke Zhai, Helen Weng, Ayuko Hill, Nicole Hay, Christopher Webb, Jamie Cheng, Erdrin Azemi

Figure 1 for Pre-Trained Foundation Model representations to uncover Breathing patterns in Speech

Figure 2 for Pre-Trained Foundation Model representations to uncover Breathing patterns in Speech

Figure 3 for Pre-Trained Foundation Model representations to uncover Breathing patterns in Speech

Figure 4 for Pre-Trained Foundation Model representations to uncover Breathing patterns in Speech

Abstract:The process of human speech production involves coordinated respiratory action to elicit acoustic speech signals. Typically, speech is produced when air is forced from the lungs and is modulated by the vocal tract, where such actions are interspersed by moments of breathing in air (inhalation) to refill the lungs again. Respiratory rate (RR) is a vital metric that is used to assess the overall health, fitness, and general well-being of an individual. Existing approaches to measure RR (number of breaths one takes in a minute) are performed using specialized equipment or training. Studies have demonstrated that machine learning algorithms can be used to estimate RR using bio-sensor signals as input. Speech-based estimation of RR can offer an effective approach to measure the vital metric without requiring any specialized equipment or sensors. This work investigates a machine learning based approach to estimate RR from speech segments obtained from subjects speaking to a close-talking microphone device. Data were collected from N=26 individuals, where the groundtruth RR was obtained through commercial grade chest-belts and then manually corrected for any errors. A convolutional long-short term memory network (Conv-LSTM) is proposed to estimate respiration time-series data from the speech signal. We demonstrate that the use of pre-trained representations obtained from a foundation model, such as Wav2Vec2, can be used to estimate respiration-time-series with low root-mean-squared error and high correlation coefficient, when compared with the baseline. The model-driven time series can be used to estimate $RR$ with a low mean absolute error (MAE) ~ 1.6 breaths/min.

* 8 pages, 6 figures, BioKDD workshop paper

Via

Access Paper or Ask Questions

Investigating salient representations and label Variance in Dimensional Speech Emotion Analysis

Dec 17, 2023

Vikramjit Mitra, Jingping Nie, Erdrin Azemi

Figure 1 for Investigating salient representations and label Variance in Dimensional Speech Emotion Analysis

Figure 2 for Investigating salient representations and label Variance in Dimensional Speech Emotion Analysis

Figure 3 for Investigating salient representations and label Variance in Dimensional Speech Emotion Analysis

Figure 4 for Investigating salient representations and label Variance in Dimensional Speech Emotion Analysis

Abstract:Representations derived from models such as BERT (Bidirectional Encoder Representations from Transformers) and HuBERT (Hidden units BERT), have helped to achieve state-of-the-art performance in dimensional speech emotion recognition. Despite their large dimensionality, and even though these representations are not tailored for emotion recognition tasks, they are frequently used to train large speech emotion models with high memory and computational costs. In this work, we show that there exist lower-dimensional subspaces within the these pre-trained representational spaces that offer a reduction in downstream model complexity without sacrificing performance on emotion estimation. In addition, we model label uncertainty in the form of grader opinion variance, and demonstrate that such information can improve the models generalization capacity and robustness. Finally, we compare the robustness of the emotion models against acoustic degradations and observed that the reduced dimensional representations were able to retain the performance similar to the full-dimensional representations without significant regression in dimensional emotion performance.

* ICASSP 2024
* 5 pages

Via

Access Paper or Ask Questions

Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals

Sep 12, 2023

Ran Liu, Ellen L. Zippi, Hadi Pouransari, Chris Sandino, Jingping Nie, Hanlin Goh, Erdrin Azemi, Ali Moin

Figure 1 for Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals

Figure 2 for Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals

Figure 3 for Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals

Figure 4 for Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals

Abstract:Leveraging multimodal information from biosignals is vital for building a comprehensive representation of people's physical and mental states. However, multimodal biosignals often exhibit substantial distributional shifts between pretraining and inference datasets, stemming from changes in task specification or variations in modality compositions. To achieve effective pretraining in the presence of potential distributional shifts, we propose a frequency-aware masked autoencoder ($\texttt{bio}$FAME) that learns to parameterize the representation of biosignals in the frequency space. $\texttt{bio}$FAME incorporates a frequency-aware transformer, which leverages a fixed-size Fourier-based operator for global token mixing, independent of the length and sampling rate of inputs. To maintain the frequency components within each input channel, we further employ a frequency-maintain pretraining strategy that performs masked autoencoding in the latent space. The resulting architecture effectively utilizes multimodal information during pretraining, and can be seamlessly adapted to diverse tasks and modalities at test time, regardless of input size and order. We evaluated our approach on a diverse set of transfer experiments on unimodal time series, achieving an average of $\uparrow$5.5% improvement in classification accuracy over the previous state-of-the-art. Furthermore, we demonstrated that our architecture is robust in modality mismatch scenarios, including unpredicted modality dropout or substitution, proving its practical utility in real-world applications. Code will be available soon.

Via

Access Paper or Ask Questions