We propose a neural network model for contextual regression, in which contextual features determine the active regression submodel, together with an algorithm to fit the model. The proposed simple contextual neural network (SCtxtNN) separates context identification from context-specific regression, resulting in a structured and interpretable architecture with fewer parameters than a fully connected feed-forward network. We show mathematically that the proposed architecture is sufficient to represent contextual linear regression models using only standard neural network components. Numerical experiments support the theoretical result, showing that the proposed model achieves lower excess mean squared error and more stable performance than feed-forward neural networks with comparable numbers of parameters, while larger networks improve accuracy only at the cost of increased complexity. The results suggest that incorporating contextual structure can improve model efficiency while preserving interpretability.
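As a rough illustration of what a contextual linear regression model is, the sketch below fits one least-squares model per context and selects the active one from a context label. It is not the SCtxtNN architecture or its fitting algorithm: the context is assumed to be known and derived from a simple threshold on a hypothetical context feature `z`, and the function names are invented for this example.

```python
import numpy as np

def fit_contextual_linear(X, y, context_ids, n_contexts):
    """Fit one least-squares linear model (with intercept) per context."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    coefs = np.zeros((n_contexts, Xb.shape[1]))
    for c in range(n_contexts):
        mask = context_ids == c  # rows belonging to context c
        coefs[c] = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)[0]
    return coefs

def predict_contextual_linear(X, context_ids, coefs):
    """Predict with the submodel selected by each row's context id."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.sum(Xb * coefs[context_ids], axis=1)

# Synthetic data. ASSUMPTION: the context is the sign of a scalar feature z.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
z = rng.uniform(-1.0, 1.0, size=200)
context_ids = (z > 0).astype(int)

# Hypothetical ground-truth coefficients: one row [w1, w2, intercept] per context.
true_coefs = np.array([[1.0, -2.0, 0.5], [-3.0, 0.5, 2.0]])
y = predict_contextual_linear(X, context_ids, true_coefs)  # noiseless targets

coefs = fit_contextual_linear(X, y, context_ids, n_contexts=2)
print(np.round(coefs, 3))
```

In the abstract's setting the context-identification step is itself learned by the network; here it is hard-coded only to keep the example short.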
Neural Collapse is a phenomenon that helps identify sparse and low rank structures in deep classifiers. Recent work has extended the definition of neural collapse to regression problems, albeit only measuring the phenomenon at the last layer. In this paper, we establish that Neural Regression Collapse (NRC) also occurs below the last layer across different types of models. We show that in the collapsed layers of neural regression models, features lie in a subspace that corresponds to the target dimension, the feature covariance aligns with the target covariance, the input subspace of the layer weights aligns with the feature subspace, and the linear prediction error of the features is close to the overall prediction error of the model. In addition to establishing Deep NRC, we also show that models that exhibit Deep NRC learn the intrinsic dimension of low rank targets and explore the necessity of weight decay in inducing Deep NRC. This paper provides a more complete picture of the simple structure learned by deep networks in the context of regression.
Multi-channel speech enhancement aims to recover clean speech from noisy multi-channel recordings. Most deep learning methods employ discriminative training, which can lead to non-linear distortions from regression-based objectives, especially under challenging environmental noise conditions. Inspired by ArrayDPS for unsupervised multi-channel source separation, we introduce ArrayDPS-Refine, a method designed to enhance the outputs of discriminative models using a clean speech diffusion prior. ArrayDPS-Refine is training-free, generative, and array-agnostic. It first estimates the noise spatial covariance matrix (SCM) from the enhanced speech produced by a discriminative model, then uses this estimated noise SCM for diffusion posterior sampling. This approach allows direct refinement of any discriminative model's output without retraining. Our results show that ArrayDPS-Refine consistently improves the performance of various discriminative models, including state-of-the-art waveform and STFT domain models. Audio demos are provided at https://xzwy.github.io/ArrayDPSRefineDemo/.
Constrained sampling is an important and challenging task in computational statistics, concerned with generating samples from a distribution under certain constraints. Numerous types of algorithms target this task, ranging from general Markov chain Monte Carlo to unadjusted Langevin methods. In this article we propose a series of new sampling algorithms based on the latter, specifically on kinetic Langevin dynamics. Our algorithms are motivated by advanced numerical methods known as splitting schemes, including the BU and BAO families. Their advantage lies in their favorable strong order (bias) rates and computational efficiency. In particular, we provide a number of theoretical insights, including Wasserstein contraction and convergence results. We demonstrate favorable results, such as improved complexity bounds over existing non-splitting methodologies. Our results are verified through numerical experiments on a range of models with constraints, including a toy example and Bayesian linear regression.
Accurate air quality index (AQI) forecasting is essential for protecting public health in rapidly growing urban regions, yet practical model evaluation and selection are often hampered by the lack of rigorous, region-specific benchmarking on standardized datasets. Physics-guided machine learning and deep learning models are a promising way to address this gap through more accurate and efficient AQI forecasting. This study presents an explainable and comprehensive benchmark of classical time-series, machine-learning, and deep-learning approaches for multi-horizon AQI forecasting in North Texas (Dallas County), yielding model-selection guidelines and a proposed physics-guided best model. Using publicly available U.S. Environmental Protection Agency (EPA) daily air quality observations from 2022 to 2024, we curate city-level time series for PM2.5 and O3 by aggregating station measurements and construct lag-wise forecasting datasets for LAG in {1,7,14,30} days. Linear regression (LR), SARIMAX, multilayer perceptrons (MLP), and LSTM networks are evaluated alongside the proposed physics-guided variants (MLP+Physics and LSTM+Physics), which incorporate the EPA breakpoint-based AQI formulation as a consistency constraint through a weighted loss. Experiments using chronological train-test splits and the error metrics MAE and RMSE show that deep-learning models outperform simpler baselines, while physics guidance improves stability and yields physically consistent pollutant-AQI relationships, with the largest benefits observed for short-horizon prediction and for PM2.5 and O3. Overall, the results provide a practical reference for selecting AQI forecasting models in North Texas and clarify when lightweight physics constraints meaningfully improve predictive performance across pollutants and forecast horizons.
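The EPA breakpoint-based AQI formulation mentioned above is a piecewise-linear map from pollutant concentration to index value. The sketch below shows that interpolation for 24-hour PM2.5. The breakpoint table is an assumption: it follows the EPA table as published before the 2024 revision and should be checked against the table that applies to the data at hand. The function name is illustrative, and the abstract does not specify how the constraint enters the loss.

```python
import math

# ASSUMPTION: 24-hour PM2.5 breakpoints (ug/m3) from the EPA table in use
# before the 2024 revision. Each row is (C_lo, C_hi, I_lo, I_hi).
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]

def aqi_from_concentration(conc, breakpoints=PM25_BREAKPOINTS):
    """Piecewise-linear AQI: I = (I_hi - I_lo) * (C - C_lo) / (C_hi - C_lo) + I_lo."""
    if conc < 0 or conc > breakpoints[-1][1]:
        raise ValueError("concentration outside the breakpoint table")
    # Simplification: pick the first row whose upper bound covers conc, so a
    # value in the small gap between rows is assigned to the higher row.
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if conc <= c_hi:
            value = (i_hi - i_lo) * (conc - c_lo) / (c_hi - c_lo) + i_lo
            return int(math.floor(value + 0.5))  # round half up to an integer
    raise ValueError("unreachable for a well-formed table")

print(aqi_from_concentration(20.0))
```

A physics-guided loss could then, for instance, add a weighted penalty on the gap between a model's predicted AQI and the AQI implied by its predicted concentration. That is one plausible reading of "consistency constraint through a weighted loss", not a description of the authors' implementation.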
Missing covariate data pose a significant challenge to statistical inference and machine learning, particularly for classification tasks like logistic regression. Classical iterative approaches (EM, multiple imputation) are often computationally intensive, sensitive to high missingness rates, and limited in uncertainty propagation. Recent deep generative models based on VAEs show promise but rely on complex latent representations. We propose Amortized Variational Inference for Logistic Regression (AV-LR), a unified end-to-end framework for binary logistic regression with missing covariates. AV-LR integrates a probabilistic generative model with a simple amortized inference network, trained jointly by maximizing the evidence lower bound. Unlike competing methods, AV-LR performs inference directly in the space of missing data without additional latent variables, using a single inference network and a linear layer that jointly estimate regression parameters and the missingness mechanism. AV-LR achieves estimation accuracy comparable to or better than state-of-the-art EM-like algorithms, with significantly lower computational cost. It naturally extends to missing-not-at-random settings by explicitly modeling the missingness mechanism. Empirical results on synthetic and real-world datasets confirm its effectiveness and efficiency across various missing-data scenarios.
Accurate power flow analysis is critical for modern distribution systems, yet classical solvers face scalability issues, and current machine learning models often struggle with generalization. We introduce BOOST-RPF, a novel method that reformulates voltage prediction from a global graph regression task into a sequential path-based learning problem. By decomposing radial networks into root-to-leaf paths, we leverage gradient-boosted decision trees (XGBoost) to model local voltage-drop regularities. We evaluate three architectural variants: Absolute Voltage, Parent Residual, and Physics-Informed Residual. This approach aligns the model architecture with the recursive physics of power flow, ensuring size-agnostic application and superior out-of-distribution robustness. Benchmarked on the Kerber Dorfnetz grid and the ENGAGE suite, BOOST-RPF achieves state-of-the-art results with its Parent Residual variant, which consistently outperforms both analytical and neural baselines in standard accuracy and generalization tasks. While global Multi-Layer Perceptrons (MLPs) and Graph Neural Networks (GNNs) often suffer from performance degradation under topological shifts, BOOST-RPF maintains high precision across unseen feeders. Furthermore, the framework displays linear $O(N)$ computational scaling and significantly increased sample efficiency through per-edge supervision, offering a scalable and generalizable alternative for real-time distribution system operator (DSO) applications.
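The decomposition of a radial network into root-to-leaf paths can be sketched as a plain tree traversal. The example below assumes the feeder is given as a child-to-parent mapping. The bus names and the function `root_to_leaf_paths` are hypothetical and are not taken from BOOST-RPF.

```python
from collections import defaultdict

def root_to_leaf_paths(parent):
    """Enumerate root-to-leaf paths of a tree given as {child: parent}."""
    children = defaultdict(list)
    for node, par in parent.items():
        children[par].append(node)

    # The root is the only node that acts as a parent but has no parent itself.
    roots = [n for n in children if n not in parent]
    if len(roots) != 1:
        raise ValueError("expected exactly one root in a radial network")

    paths = []
    stack = [[roots[0]]]  # each stack entry is a partial path from the root
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            paths.append(path)  # reached a leaf bus
        for kid in kids:
            stack.append(path + [kid])
    return paths

# Hypothetical five-bus feeder: the slack bus feeds b1, which branches to b2 and b3.
feeder = {"b1": "slack", "b2": "b1", "b3": "b1", "b4": "b3"}
for path in sorted(root_to_leaf_paths(feeder)):
    print(" -> ".join(path))
```

Each consecutive (parent, child) pair along a path could then serve as one training sample for a tabular learner, which is one way to read the per-edge supervision described in the abstract.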
We highlight a striking difference in behavior between two widely used variants of coordinate ascent variational inference: the sequential and parallel algorithms. While such differences were known in the numerical analysis literature in simpler settings, they remain largely unexplored in the optimization-focused literature on variational inference in more complex models. Focusing on the moderately high-dimensional linear regression problem, we show that the sequential algorithm, although typically slower, enjoys convergence guarantees under more relaxed conditions than the parallel variant, which is often employed to facilitate block-wise updates and improve computational efficiency.
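The numerical-analysis analogue alluded to above can be illustrated with the classical Gauss-Seidel (sequential) and Jacobi (parallel) iterations for a linear system. For a Gaussian target with precision matrix A, the mean updates of mean-field coordinate ascent reduce to exactly these sweeps. The sketch below uses a small hand-picked matrix and is not the regression model analyzed in the paper.

```python
import numpy as np

def jacobi_sweep(A, b, x):
    """Parallel update: every coordinate is refreshed from the previous iterate."""
    d = np.diag(A)
    return (b - (A @ x - d * x)) / d

def gauss_seidel_sweep(A, b, x):
    """Sequential update: each coordinate uses the freshest values of the others."""
    x = x.copy()
    for i in range(len(b)):
        off_diag = A[i] @ x - A[i, i] * x[i]
        x[i] = (b[i] - off_diag) / A[i, i]
    return x

def run(sweep, A, b, n_iters):
    x = np.zeros_like(b)
    for _ in range(n_iters):
        x = sweep(A, b, x)
    return x

# Symmetric positive definite but strongly coupled: ones on the diagonal and
# 0.8 off the diagonal (eigenvalues 2.6, 0.2, 0.2).
a = 0.8
A = np.full((3, 3), a) + (1 - a) * np.eye(3)
b = np.array([1.0, 2.0, 3.0])
x_star = np.linalg.solve(A, b)

err_seq = np.linalg.norm(run(gauss_seidel_sweep, A, b, 100) - x_star)
err_par = np.linalg.norm(run(jacobi_sweep, A, b, 100) - x_star)
print(f"sequential error: {err_seq:.2e}, parallel error: {err_par:.2e}")
```

Here the Jacobi iteration matrix has spectral radius 1.6, so the parallel scheme diverges, whereas Gauss-Seidel converges for any symmetric positive definite system. This mirrors, in a much simpler setting, the weaker conditions the abstract reports for the sequential algorithm.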
Interpreting complex machine learning models is a critical challenge, especially for tabular data where model transparency is paramount. Local Interpretable Model-Agnostic Explanations (LIME) has been a very popular framework for interpretable machine learning, also inspiring many extensions. While traditional surrogate models used in LIME variants (e.g. linear regression and decision trees) offer a degree of stability, they can struggle to faithfully capture the complex non-linear decision boundaries that are inherent in many sophisticated black-box models. This work contributes toward bridging the gap between high predictive performance and interpretable decision-making. Specifically, we propose the NDT-LIME variant that integrates Neural Decision Trees (NDTs) as surrogate models. By leveraging the structured, hierarchical nature of NDTs, our approach aims at providing more accurate and meaningful local explanations. We evaluate its effectiveness on several benchmark tabular datasets, showing consistent improvements in explanation fidelity over traditional LIME surrogates.
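For orientation, the traditional linear surrogate that NDT-LIME is compared against can be sketched as a locally weighted least-squares fit around the instance being explained. This is a generic LIME-style baseline, not the proposed NDT surrogate. The sampling scale, the kernel, and the function name are assumptions made for the example.

```python
import numpy as np

def local_linear_surrogate(black_box, x0, n_samples=500, scale=0.5,
                           kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear model to black_box around x0."""
    rng = np.random.default_rng(seed)
    # Perturb the instance with Gaussian noise (assumed sampling scheme).
    Z = x0 + scale * rng.standard_normal((n_samples, x0.shape[0]))
    y = black_box(Z)

    # Exponential kernel: nearby perturbations get larger weights.
    dist2 = np.sum((Z - x0) ** 2, axis=1)
    w = np.exp(-dist2 / kernel_width ** 2)

    # Weighted least squares via row scaling; features are centered at x0,
    # so the intercept is the surrogate's prediction at x0.
    Zb = np.hstack([Z - x0, np.ones((n_samples, 1))])
    sw = np.sqrt(w)
    coef = np.linalg.lstsq(Zb * sw[:, None], y * sw, rcond=None)[0]
    return coef[:-1], coef[-1]

# Hypothetical black box that happens to be linear, so the surrogate is exact.
def black_box(Z):
    return 3.0 * Z[:, 0] - 2.0 * Z[:, 1] + 1.0

slopes, value_at_x0 = local_linear_surrogate(black_box, np.array([0.5, -1.0]))
print(np.round(slopes, 3), round(float(value_at_x0), 3))
```

For a black box with a curved decision boundary, such a linear fit can only capture the local slope. That limitation is what motivates the richer surrogates the abstract proposes.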
The automotive industry is under growing pressure to reduce its environmental impact, requiring accurate predictive modeling to support sustainable engineering design. This study examines the factors that determine vehicle fuel consumption in the seminal Motor Trend dataset, identifying the governing physical factors of efficiency through rigorous quantitative analysis. Methodologically, the research uses data sanitization, statistical outlier elimination, and in-depth Exploratory Data Analysis (EDA) to curb multicollinearity between powertrain features. A comparative analysis of machine learning paradigms, including Multiple Linear Regression, Support Vector Machines (SVM), and Logistic Regression, was carried out to assess predictive efficacy. Findings indicate that SVM Regression is most accurate on continuous prediction (R-squared = 0.889, RMSE = 0.326) and is effective in capturing the non-linear relationships between vehicle mass and engine displacement. In parallel, Logistic Regression proved superior for classification (Accuracy = 90.8%) and showed exceptional recall (0.957) when identifying low-efficiency vehicles. These results challenge the current trend toward black-box deep learning architectures for static physical datasets and validate the robust performance of interpretable, well-tuned classical models. The research finds that intrinsic vehicle efficiency is fundamentally determined by physical design parameters, namely weight and displacement, offering a data-driven rationale for manufacturers to focus on lightweighting and engine downsizing to achieve stringent global sustainability goals.