Gaussian process regression (GPR) is a non-parametric regression technique that models the relationship between input and output variables.
Microcalcification (MC) analysis is clinically important in screening mammography because clustered puncta can be an early sign of malignancy, yet dense MC segmentation remains challenging: targets are extremely small and sparse, dense pixel-level labels are expensive and ambiguous, and cross-site shift often induces texture-driven false positives and missed puncta in dense tissue. We propose MC-GenRef, a real dense-label-free framework that combines high-fidelity synthetic supervision with test-time generative posterior refinement (TT-GPR). During training, real negative mammogram patches are used as backgrounds, and physically plausible MC patterns are injected through a lightweight image formation model with local contrast modulation and blur, yielding exact image-mask pairs without real dense annotation. Using only these synthetic labeled pairs, MC-GenRef trains a base segmentor and a seed-conditioned rectified-flow (RF) generator that serves as a controllable generative prior. During inference, TT-GPR treats segmentation as approximate posterior inference: it derives a sparse seed from the current prediction, forms seed-consistent RF projections, converts them into case-specific surrogate targets through the frozen segmentor, and iteratively refines the logits with overlap-consistent and edge-aware regularization. On INbreast, the synthetic-only initializer achieved the best Dice without real dense annotations, while TT-GPR improved miss-sensitive performance to Recall and FNR, with strong class-balanced behavior (Bal.Acc., G-Mean). On an external private Yonsei cohort ( n=50 ), TT-GPR consistently improved the synthetic-only initializer under cross-site shift, increasing Dice and Recall while reducing FNR. These results suggest that test-time generative posterior refinement is a practical route to reduce MC misses and improve robustness without additional real dense labeling.
We develop a tensor-network surrogate for option pricing, targeting large-scale portfolio revaluation problems arising in market risk management (e.g., VaR and Expected Shortfall computations). The method involves representing high-dimensional price surfaces in tensor-train (TT) form using TT-cross approximation, constructing the surrogate directly from black-box price evaluations without materializing the full training tensor. For inference, we use a Laplacian kernel and derive TT representations of the kernel matrix and its closed-form inverse in the noise-free setting, enabling TT-based Gaussian process regression without dense matrix factorization or iterative linear solves. We found that hyperparameter optimization consistently favors a large kernel length-scale and show that in this regime the GPR predictor reduces to multilinear interpolation for off-grid inputs; we also derive a low-rank TT representation for this limit. We evaluate the approach on five-asset basket options over an eight dimensional parameter space (asset spot levels, strike, interest rate, and time to maturity). For European geometric basket puts, the tensor surrogate achieves lower test error at shorter training times than standard GPR by scaling to substantially larger effective training sets. For American arithmetic basket puts trained on LSMC data, the surrogate exhibits more favorable scaling with training-set size while providing millisecond-level evaluation per query, with overall runtime dominated by data generation.
This paper addresses the modeling gap between complex dispersive-medium characterization and clutter statistical analysis in single-snapshot frequency diverse array multiple-input multiple-output ground-penetrating radar (FDA-MIMO-GPR). Existing FDA-MIMO clutter studies have rarely incorporated subsurface dispersion, dissipation, and random inhomogeneity in an explicit statistical framework. To bridge this gap, a continuous relaxation spectrum is adopted to describe complex media, and a statistical propagation chain is established from random relaxation-spectrum perturbations to complex permittivity, complex wavenumber, steering-vector perturbation, medium-induced additional clutter covariance, and total clutter covariance. On this basis, the effects of medium randomness on covariance spectral spreading, effective rank, effective clutter-subspace dimension, and target-clutter separability are further characterized. Numerical results show close agreement between the derived theory and Monte Carlo sample statistics across multiple stages of the propagation chain. The results further indicate that medium uncertainty not only changes clutter-covariance entries, but also reshapes its eigenspectrum and effective subspace, thereby influencing the geometric separation between target and clutter. The study provides an explicit and interpretable theoretical interface for embedding complex-medium uncertainty into FDA-MIMO-GPR clutter statistical analysis.
Supervised machine learning describes the practice of fitting a parameterized model to labeled input-output data. Supervised machine learning methods have demonstrated promise in learning efficient surrogate models that can (partially) replace expensive high-fidelity models, making many-query analyses, such as optimization, uncertainty quantification, and inference, tractable. However, when training data must be obtained through the evaluation of an expensive model or experiment, the amount of training data that can be obtained is often limited, which can make learned surrogate models unreliable. However, in many engineering and scientific settings, cheaper \emph{low-fidelity} models may be available, for example arising from simplified physics modeling or coarse grids. These models may be used to generate additional low-fidelity training data. The goal of \emph{multifidelity} machine learning is to use both high- and low-fidelity training data to learn a surrogate model which is cheaper to evaluate than the high-fidelity model, but more accurate than any available low-fidelity model. This work proposes a new multifidelity training approach for Gaussian process regression which uses low-fidelity data to define additional features that augment the input space of the learned model. The approach unites desirable properties from two separate classes of existing multifidelity GPR approaches, cokriging and autoregressive estimators. Numerical experiments on several test problems demonstrate both increased predictive accuracy and reduced computational cost relative to the state of the art.
Epoxy polymers are widely used due to their multifunctional properties, but machine learning (ML) applications remain limited owing to their complex 3D molecular structure, multi-component nature, and lack of curated datasets. Existing ML studies are largely restricted to simulation data, specific properties, or narrow constituent ranges. To address these limitations, we developed an informed Gaussian Process Regression-based Knowledge Distillation (GPR-KD) framework for predicting multiple physical (glass transition temperature, density) and mechanical properties (elastic modulus, tensile strength, compressive strength, flexural strength, fracture energy, adhesive strength) of thermoset epoxy polymers. The model was trained on experimental literature data covering diverse monomer classes (9 resins, 40 hardeners). Individual GPR models serve as teacher models capturing nonlinear feature-property relationships, while a unified neural network student model learns distilled knowledge across all properties simultaneously. By encoding the target property as an input feature, the student model leverages cross-property correlations. Molecular-level descriptors extracted from SMILES representations using RDKit create a physics-informed model. The framework combines GPR interpretability and robustness with deep learning scalability and generalization. Comparative analysis demonstrates superior prediction accuracy over conventional ML models. Simultaneous multi-property prediction further improves accuracy through information sharing across correlated properties. The proposed framework enables accelerated design of novel epoxy polymers with tailored properties.
Spectrum sensing and the generation of 3D Radio Environment Maps (REMs) are essential for enabling spectrum sharing within cognitive radio networks. While Uncrewed Aerial Vehicles (UAVs) offer high-mobility 3D sensing, REM accuracy is challenged by dynamic flight behaviors, where fluctuations in UAV speed and direction introduce measurement inconsistencies. Furthermore, the structural influence of the airframe itself impacts the onboard antenna's radiation characteristics. In this paper, we present a comprehensive analysis of REM reconstruction at various altitudes, using real-world data from a fixed base station tower and a ground-vehicle source. We evaluate diverse reconstruction methodologies, including Kriging (simple, ordinary, and trans-Gaussian), matrix completion, and Gaussian process regression (GPR) for recovery from sparse samples. Our results indicate that simple Kriging and GPR remain more robust under extreme sample sparsity. We also propose a framework to enhance reconstruction accuracy in deep-shadowed regions by decomposing the REM into distinct smooth and deep-shadowed spatial components. We further investigate how REM reconstruction performance is influenced by physical and UAV-related external parameters. First, we demonstrate that the impact of UAV altitude on accuracy follows a tri-phasic trend: an initial performance gain up to $h_1$, a performance dip between $h_1$ and $h_2$, and a final stage of increasing accuracy. Additionally, we show that performance improves with increased spectrum bandwidth. Second, our analysis of UAV trajectories reveals that the variance of shadow fading exhibits a non-monotonic trend, peaking at both very low and mid-high elevation angles. Finally, we demonstrate that antenna pattern calibration from in-field measurements significantly enhances REM reconstruction accuracy by accounting for shadowing induced by the UAV airframe.
Autonomous mobile robots offer promising solutions for labor shortages and increased operational efficiency. However, navigating safely and effectively in dynamic environments, particularly crowded areas, remains challenging. This paper proposes a novel framework that integrates Vision-Language Models (VLM) and Gaussian Process Regression (GPR) to generate dynamic crowd-density maps (``Abstraction Maps'') for autonomous robot navigation. Our approach utilizes VLM's capability to recognize abstract environmental concepts, such as crowd densities, and represents them probabilistically via GPR. Experimental results from real-world trials on a university campus demonstrated that robots successfully generated routes avoiding both static obstacles and dynamic crowds, enhancing navigation safety and adaptability.
Ground-penetrating radar (GPR) has emerged as a prominent tool for imaging internal defects in cylindrical structures, such as columns, utility poles, and tree trunks. However, accurately reconstructing both the shape and permittivity of the defects inside cylindrical structures remains challenging due to complex wave scattering phenomena and the limited accuracy of the existing signal processing and deep learning techniques. To address these issues, this study proposes a migration-assisted deep learning scheme for reconstructing the shape and permittivity of defects within cylindrical structures. The proposed scheme involves three stages of GPR data processing. First, a dual-permittivity estimation network extracts the permittivity values of the defect and the cylindrical structure, the latter of which is estimated with the help of a novel structural similarity index measure-based autofocusing technique. Second, a modified Kirchhoff migration incorporating the extracted permittivity of the cylindrical structure maps the signals reflected from the defect to the imaging domain. Third, a shape reconstruction network processes the migrated image to recover the precise shape of the defect. The image of the interior defect is finally obtained by combining the reconstructed shape and extracted permittivity of the defect. The proposed scheme is validated using both synthetic and experimental data from a laboratory trunk model and real tree trunk samples. Comparative results show superior performance over existing deep learning methods, while generalization tests on live trees confirm its feasibility for in-field deployment. The underlying principle can further be applied to other circumferential GPR imaging scenarios. The code and database are available at: https://github.com/jwqian54/Migration-Assisted-DL.
This paper investigates whether structural econometric models can rival machine learning in forecasting energy--macro dynamics while retaining causal interpretability. Using monthly data from 1999 to 2025, we develop a unified framework that integrates Time-Varying Parameter Structural VARs (TVP-SVAR) with advanced dependence structures, including DCC-GARCH, t-copulas, and mixed Clayton--Frank--Gumbel copulas. These models are empirically evaluated against leading machine learning techniques Gaussian Process Regression (GPR), Artificial Neural Networks, Random Forests, and Support Vector Regression across seven macro-financial and energy variables, with Brent crude oil as the central asset. The findings reveal three major insights. First, TVP-SVAR consistently outperforms standard VAR models, confirming structural instability in energy transmission channels. Second, copula-based extensions capture non-linear and tail dependence more effectively than symmetric DCC models, particularly during periods of macroeconomic stress. Third, despite their methodological differences, copula-enhanced econometric models and GPR achieve statistically equivalent predictive accuracy (t-test p = 0.8444). However, only the econometric approach provides interpretable impulse responses, regime shifts, and tail-risk diagnostics. We conclude that machine learning can replicate predictive performance but cannot substitute the explanatory power of structural econometrics. This synthesis offers a pathway where AI accuracy and economic interpretability jointly inform energy policy and risk management.
Accurate channel estimation with low pilot overhead and computational complexity is key to efficiently utilizing multi-antenna wireless systems. Motivated by the evolution from purely statistical descriptions toward physics- and geometry-aware propagation models, this work focuses on incorporating channel information into a Gaussian process regression (GPR) framework for improving the channel estimation accuracy. In this work, we propose a GPR-based channel estimation framework along with a novel Spatial-correlation (SC) kernel that explicitly captures the channel's second-order statistics. We derive a closed-form expression of the proposed SC-based GPR estimator and prove that its posterior mean is optimal in terms of minimum mean-square error (MMSE) under the same second-order statistics, without requiring the underlying channel distribution to be Gaussian. Our analysis reveals that, with up to 50% pilot overhead reduction, the proposed method achieves the lowest normalized mean-square error, the highest empirical 95% credible-interval coverage, and superior preservation of spectral efficiency compared to benchmark estimators, while maintaining lower computational complexity than the conventional MMSE estimator.