Abstract:We introduce a reproducible, bias-resistant machine learning framework that integrates domain-informed feature engineering, nested cross-validation, and calibrated decision-threshold optimization for small-sample neuroimaging data. Conventional cross-validation frameworks that reuse the same folds for both model selection and performance estimation yield optimistically biased results, limiting reproducibility and generalization. Demonstrated on a high-dimensional structural MRI dataset of deep brain stimulation cognitive outcomes, the framework achieved a nested-CV balanced accuracy of 0.660\,$\pm$\,0.068 using a compact, interpretable subset selected via importance-guided ranking. By combining interpretability and unbiased evaluation, this work provides a generalizable computational blueprint for reliable machine learning in data-limited biomedical domains.
Abstract:We introduce Multi-scale Adaptive Recurrent Biomedical Linear-time Encoder (MARBLE), the first \textit{purely Mamba-based} multi-state multiple instance learning (MIL) framework for whole-slide image (WSI) analysis. MARBLE processes multiple magnification levels in parallel and integrates coarse-to-fine reasoning within a linear-time state-space model, efficiently capturing cross-scale dependencies with minimal parameter overhead. WSI analysis remains challenging due to gigapixel resolutions and hierarchical magnifications, while existing MIL methods typically operate at a single scale and transformer-based approaches suffer from quadratic attention costs. By coupling parallel multi-scale processing with linear-time sequence modeling, MARBLE provides a scalable and modular alternative to attention-based architectures. Experiments on five public datasets show improvements of up to \textbf{6.9\%} in AUC, \textbf{20.3\%} in accuracy, and \textbf{2.3\%} in C-index, establishing MARBLE as an efficient and generalizable framework for multi-scale WSI analysis.
Abstract:High-resolution spatial transcriptomics platforms, such as Xenium, generate single-cell images that capture both molecular and spatial context, but their extremely high dimensionality poses major challenges for representation learning and clustering. In this study, we analyze data from the Xenium platform, which captures high-resolution images of tumor microarray (TMA) tissues and converts them into cell-by-gene matrices suitable for computational analysis. We benchmark and extend nonnegative matrix factorization (NMF) for spatial transcriptomics by introducing two spatially regularized variants. First, we propose Spatial NMF (SNMF), a lightweight baseline that enforces local spatial smoothness by diffusing each cell's NMF factor vector over its spatial neighborhood. Second, we introduce Hybrid Spatial NMF (hSNMF), which performs spatially regularized NMF followed by Leiden clustering on a hybrid adjacency that integrates spatial proximity (via a contact-radius graph) and transcriptomic similarity through a tunable mixing parameter alpha. Evaluated on a cholangiocarcinoma dataset, SNMF and hSNMF achieve markedly improved spatial compactness (CHAOS < 0.004, Moran's I > 0.96), greater cluster separability (Silhouette > 0.12, DBI < 1.8), and higher biological coherence (CMC and enrichment) compared to other spatial baselines. Availability and implementation: https://github.com/ishtyaqmahmud/hSNMF