Abstract:Recent deep learning methods for fMRI-based diagnosis have achieved promising accuracy by modeling functional connectivity networks. However, standard approaches often struggle with noisy interactions, and conventional post-hoc attribution methods may lack reliability, potentially highlighting dataset-specific artifacts. To address these challenges, we introduce PIME, an interpretable framework that bridges intrinsic interpretability with minimal-sufficient subgraph optimization by integrating prototype-based classification and consistency training with structural perturbations during learning. This encourages a structured latent space and enables Monte Carlo Tree Search (MCTS) under a prototype-consistent objective to extract compact minimal-sufficient explanatory subgraphs post-training. Experiments on three benchmark fMRI datasets demonstrate that PIME achieves state-of-the-art performance. Furthermore, by constraining the search space via learned prototypes, PIME identifies critical brain regions that are consistent with established neuroimaging findings. Stability analysis shows 90% reproducibility and consistent explanations across atlases.
Abstract:Multimodal representation learning aims to construct a shared embedding space in which heterogeneous modalities are semantically aligned. Despite strong empirical results, InfoNCE-based objectives introduce inherent conflicts that yield distribution gaps across modalities. In this work, we identify two conflicts in the multimodal regime, both exacerbated as the number of modalities increases: (i) an alignment-uniformity conflict, whereby the repulsion of uniformity undermines pairwise alignment, and (ii) an intra-alignment conflict, where aligning multiple modalities induces competing alignment directions. To address these issues, we propose a principled decoupling of alignment and uniformity for multimodal representations, providing a conflict-free recipe for multimodal learning that simultaneously supports discriminative and generative use cases without task-specific modules. We then provide a theoretical guarantee that our method acts as an efficient proxy for a global Hölder divergence over multiple modality distributions, and thus reduces the distribution gap among modalities. Extensive experiments on retrieval and UnCLIP-style generation demonstrate consistent gains.
Abstract:Electroencephalography (EEG) analysis is critical for brain-computer interfaces and neuroscience, but the intrinsic noise and high dimensionality of EEG signals hinder effective feature learning. We propose a self-supervised framework based on the Hierarchical Functional Maximal Correlation Algorithm (HFMCA), which learns orthonormal EEG representations by enforcing feature decorrelation and reducing redundancy. This design enables robust capture of essential brain dynamics for various EEG recognition tasks. We validate HFMCA on two benchmark datasets, SEED and BCIC-2A, where pretraining with HFMCA consistently outperforms competitive self-supervised baselines, achieving notable gains in classification accuracy. Across diverse EEG tasks, our method demonstrates superior cross-subject generalization under leave-one-subject-out validation, advancing state-of-the-art by 2.71\% on SEED emotion recognition and 2.57\% on BCIC-2A motor imagery classification.
Abstract:Blind source separation, particularly through independent component analysis (ICA), is widely utilized across various signal processing domains for disentangling underlying components from observed mixed signals, owing to its fully data-driven nature that minimizes reliance on prior assumptions. However, conventional ICA methods rely on an assumption of linear mixing, limiting their ability to capture complex nonlinear relationships and to maintain robustness in noisy environments. In this work, we present deep deterministic nonlinear independent component analysis (DDICA), a novel deep neural network-based framework designed to address these limitations. DDICA leverages a matrix-based entropy function to directly optimize the independence criterion via stochastic gradient descent, bypassing the need for variational approximations or adversarial schemes. This results in a streamlined training process and improved resilience to noise. We validated the effectiveness and generalizability of DDICA across a range of applications, including simulated signal mixtures, hyperspectral image unmixing, modeling of primary visual receptive fields, and resting-state functional magnetic resonance imaging (fMRI) data analysis. Experimental results demonstrate that DDICA effectively separates independent components with high accuracy across a range of applications. These findings suggest that DDICA offers a robust and versatile solution for blind source separation in diverse signal processing tasks.
Abstract:Emotional states manifest as coordinated yet heterogeneous physiological responses across central and autonomic systems, posing a fundamental challenge for multimodal representation learning in affective computing. Learning such joint dynamics is further complicated by the scarcity and subjectivity of affective annotations, which motivates the use of self-supervised learning (SSL). However, most existing SSL approaches rely on pairwise alignment objectives, which are insufficient to characterize dependencies among more than two modalities and fail to capture higher-order interactions arising from coordinated brain and autonomic responses. To address this limitation, we propose Multimodal Functional Maximum Correlation (MFMC), a principled SSL framework that maximizes higher-order multimodal dependence through a Dual Total Correlation (DTC) objective. By deriving a tight sandwich bound and optimizing it using a functional maximum correlation analysis (FMCA) based trace surrogate, MFMC captures joint multimodal interactions directly, without relying on pairwise contrastive losses. Experiments on three public affective computing benchmarks demonstrate that MFMC consistently achieves state-of-the-art or competitive performance under both subject-dependent and subject-independent evaluation protocols, highlighting its robustness to inter-subject variability. In particular, MFMC improves subject-dependent accuracy on CEAP-360VR from 78.9% to 86.8%, and subject-independent accuracy from 27.5% to 33.1% using the EDA signal alone. Moreover, MFMC remains within 0.8 percentage points of the best-performing method on the most challenging EEG subject-independent split of MAHNOB-HCI. Our code is available at https://github.com/DY9910/MFMC.




Abstract:Multimodal regression aims to predict a continuous target from heterogeneous input sources and typically relies on fusion strategies such as early or late fusion. However, existing methods lack principled tools to disentangle and quantify the individual contributions of each modality and their interactions, limiting the interpretability of multimodal fusion. We propose a novel multimodal regression framework grounded in Partial Information Decomposition (PID), which decomposes modality-specific representations into unique, redundant, and synergistic components. The basic PID framework is inherently underdetermined. To resolve this, we introduce inductive bias by enforcing Gaussianity in the joint distribution of latent representations and the transformed response variable (after inverse normal transformation), thereby enabling analytical computation of the PID terms. Additionally, we derive a closed-form conditional independence regularizer to promote the isolation of unique information within each modality. Experiments on six real-world datasets, including a case study on large-scale brain age prediction from multimodal neuroimaging data, demonstrate that our framework outperforms state-of-the-art methods in both predictive accuracy and interpretability, while also enabling informed modality selection for efficient inference. Implementation is available at https://github.com/zhaozhaoma/PIDReg.
Abstract:Decoding motor imagery (MI) electroencephalogram (EEG) signals, a key non-invasive brain-computer interface (BCI) paradigm for controlling external systems, has been significantly advanced by deep learning. However, MI-EEG decoding remains challenging due to substantial inter-subject variability and limited labeled target data, which necessitate costly calibration for new users. Many existing multi-source domain adaptation (MSDA) methods indiscriminately incorporate all available source domains, disregarding the large inter-subject differences in EEG signals, which leads to negative transfer and excessive computational costs. Moreover, while many approaches focus on feature distribution alignment, they often neglect the explicit dependence between features and decision-level outputs, limiting their ability to preserve discriminative structures. To address these gaps, we propose a novel MSDA framework that leverages a pretrained large Brain Foundation Model (BFM) for dynamic and informed source subject selection, ensuring only relevant sources contribute to adaptation. Furthermore, we employ Cauchy-Schwarz (CS) and Conditional CS (CCS) divergences to jointly perform feature-level and decision-level alignment, enhancing domain invariance while maintaining class discriminability. Extensive evaluations on two benchmark MI-EEG datasets demonstrate that our framework outperforms a broad range of state-of-the-art baselines. Additional experiments with a large source pool validate the scalability and efficiency of BFM-guided selection, which significantly reduces training time without sacrificing performance.
Abstract:Recent evidence suggests that modeling higher-order interactions (HOIs) in functional magnetic resonance imaging (fMRI) data can enhance the diagnostic accuracy of machine learning systems. However, effectively extracting and utilizing HOIs remains a significant challenge. In this work, we propose MvHo-IB, a novel multi-view learning framework that integrates both pairwise interactions and HOIs for diagnostic decision-making, while automatically compressing task-irrelevant redundant information. MvHo-IB introduces several key innovations: (1) a principled method that combines O-information from information theory with a matrix-based Renyi alpha-order entropy estimator to quantify and extract HOIs, (2) a purpose-built Brain3DCNN encoder to effectively utilize these interactions, and (3) a new multi-view learning information bottleneck objective to enhance representation learning. Experiments on three benchmark fMRI datasets demonstrate that MvHo-IB achieves state-of-the-art performance, significantly outperforming previous methods, including recent hypergraph-based techniques. The implementation of MvHo-IB is available at https://github.com/zky04/MvHo-IB.
Abstract:Extracting meaningful latent representations from high-dimensional sequential data is a crucial challenge in machine learning, with applications spanning natural science and engineering. We introduce InfoDPCCA, a dynamic probabilistic Canonical Correlation Analysis (CCA) framework designed to model two interdependent sequences of observations. InfoDPCCA leverages a novel information-theoretic objective to extract a shared latent representation that captures the mutual structure between the data streams and balances representation compression and predictive sufficiency while also learning separate latent components that encode information specific to each sequence. Unlike prior dynamic CCA models, such as DPCCA, our approach explicitly enforces the shared latent space to encode only the mutual information between the sequences, improving interpretability and robustness. We further introduce a two-step training scheme to bridge the gap between information-theoretic representation learning and generative modeling, along with a residual connection mechanism to enhance training stability. Through experiments on synthetic and medical fMRI data, we demonstrate that InfoDPCCA excels as a tool for representation learning. Code of InfoDPCCA is available at https://github.com/marcusstang/InfoDPCCA.
Abstract:Multimodal learning with variational autoencoders (VAEs) requires estimating joint distributions to evaluate the evidence lower bound (ELBO). Current methods, the product and mixture of experts, aggregate single-modality distributions assuming independence for simplicity, which is an overoptimistic assumption. This research introduces a novel methodology for aggregating single-modality distributions by exploiting the principle of consensus of dependent experts (CoDE), which circumvents the aforementioned assumption. Utilizing the CoDE method, we propose a novel ELBO that approximates the joint likelihood of the multimodal data by learning the contribution of each subset of modalities. The resulting CoDE-VAE model demonstrates better performance in terms of balancing the trade-off between generative coherence and generative quality, as well as generating more precise log-likelihood estimations. CoDE-VAE further minimizes the generative quality gap as the number of modalities increases. In certain cases, it reaches a generative quality similar to that of unimodal VAEs, which is a desirable property that is lacking in most current methods. Finally, the classification accuracy achieved by CoDE-VAE is comparable to that of state-of-the-art multimodal VAE models.