Vahid Tarokh

Duke University

Elliptic Loss Regularization

Mar 04, 2025

Ali Hasan, Haoming Yang, Yuting Ng, Vahid Tarokh

Abstract: Regularizing neural networks is important for anticipating model behavior in regions of the data space that are not well represented. In this work, we propose a regularization technique for enforcing a level of smoothness in the mapping between the data input space and the loss value. We specify the level of regularity by requiring that the loss of the network satisfies an elliptic operator over the data domain. To do this, we modify the usual empirical risk minimization objective such that we instead minimize a new objective that satisfies an elliptic operator over points within the domain. This allows us to use existing theory on elliptic operators to anticipate the behavior of the error for points outside the training set. We propose a tractable computational method that approximates the behavior of the elliptic operator while being computationally efficient. Finally, we analyze the properties of the proposed regularization to understand the performance on common problems of distribution shift and group imbalance. Numerical experiments confirm the utility of the proposed regularization technique.

* ICLR 2025

Parabolic Continual Learning

Mar 03, 2025

Haoming Yang, Ali Hasan, Vahid Tarokh

Abstract: Regularizing continual learning techniques is important for anticipating algorithmic behavior under new realizations of data. We introduce a new approach to continual learning by imposing the properties of a parabolic partial differential equation (PDE) to regularize the expected behavior of the loss over time. This class of parabolic PDEs has a number of favorable properties that allow us to analyze the error incurred through forgetting and the error induced through generalization. Specifically, we do this through imposing boundary conditions where the boundary is given by a memory buffer. By using the memory buffer as a boundary, we can enforce long-term dependencies by bounding the expected error by the boundary loss. Finally, we illustrate the empirical performance of the method on a series of continual learning tasks.

S2TX: Cross-Attention Multi-Scale State-Space Transformer for Time Series Forecasting

Feb 17, 2025

Zihao Wu, Juncheng Dong, Haoming Yang, Vahid Tarokh

Abstract: Time series forecasting has recently achieved significant progress with multi-scale models to address the heterogeneity between long and short range patterns. Despite their state-of-the-art performance, we identify two potential areas for improvement. First, the variates of the multivariate time series are processed independently. Moreover, the multi-scale (long and short range) representations are learned separately by two independent models without communication. In light of these concerns, we propose State Space Transformer with cross-attention (S2TX). S2TX employs a cross-attention mechanism to integrate a Mamba model for extracting long-range cross-variate context and a Transformer model with local window attention to capture short-range representations. By cross-attending to the global context, the Transformer model further facilitates variate-level interactions as well as local/global communications. Comprehensive experiments on seven classic long-short range time-series forecasting benchmark datasets demonstrate that S2TX can achieve highly robust SOTA results while maintaining a low memory footprint.

Teleportation With Null Space Gradient Projection for Optimization Acceleration

Feb 17, 2025

Zihao Wu, Juncheng Dong, Ahmed Aloui, Vahid Tarokh

Abstract: Optimization techniques have become increasingly critical due to the ever-growing model complexity and data scale. In particular, teleportation has emerged as a promising approach, which accelerates convergence of gradient descent-based methods by navigating within the loss invariant level set to identify parameters with advantageous geometric properties. Existing teleportation algorithms have primarily demonstrated their effectiveness in optimizing Multi-Layer Perceptrons (MLPs), but their extension to more advanced architectures, such as Convolutional Neural Networks (CNNs) and Transformers, remains challenging. Moreover, they often impose significant computational demands, limiting their applicability to complex architectures. To this end, we introduce an algorithm that projects the gradient of the teleportation objective function onto the input null space, effectively preserving the teleportation within the loss invariant level set and reducing computational cost. Our approach is readily generalizable from MLPs to CNNs, transformers, and potentially other advanced architectures. We validate the effectiveness of our algorithm across various benchmark datasets and optimizers, demonstrating its broad applicability.
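
To illustrate why projecting onto the input null space keeps the loss unchanged, here is a minimal sketch for a single linear layer and a single input vector. It is not the paper's algorithm: the function names, the single-input setting, and the toy matrices are all assumptions made for illustration. Any weight update whose rows are orthogonal to the input leaves the layer output, and therefore the loss, exactly where it was.

```python
def matvec(w, x):
    # Output of a linear layer y = W x (no bias), using plain lists.
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def project_rows_to_null_space(grad, x):
    # Remove from each gradient row its component along x, so that the
    # projected update P satisfies P x = 0 (hypothetical helper).
    xx = sum(v * v for v in x)
    if xx == 0.0:
        return [row[:] for row in grad]
    projected = []
    for row in grad:
        coeff = sum(g * v for g, v in zip(row, x)) / xx
        projected.append([g - coeff * v for g, v in zip(row, x)])
    return projected


# Toy example: W is 2x3, x is the layer input, G is some update direction.
W = [[1.0, 2.0, 3.0], [0.5, -1.0, 0.0]]
x = [1.0, -2.0, 0.5]
G = [[0.3, 0.1, -0.7], [1.0, 2.0, 3.0]]

P = project_rows_to_null_space(G, x)
W_moved = [[w + 0.5 * p for w, p in zip(wr, pr)] for wr, pr in zip(W, P)]

print(matvec(W, x))
print(matvec(W_moved, x))  # same output up to floating-point error
```

With a batch of inputs the same idea would require projecting onto the orthogonal complement of the span of all inputs, which this single-vector sketch does not cover.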

Score-Based Metropolis-Hastings Algorithms

Dec 31, 2024

Ahmed Aloui, Ali Hasan, Juncheng Dong, Zihao Wu, Vahid Tarokh

Abstract: In this paper, we introduce a new approach for integrating score-based models with the Metropolis-Hastings algorithm. While traditional score-based diffusion models excel in accurately learning the score function from data points, they lack an energy function, making the Metropolis-Hastings adjustment step inaccessible. Consequently, the unadjusted Langevin algorithm is often used for sampling using estimated score functions. The lack of an energy function then prevents the application of the Metropolis-adjusted Langevin algorithm and other Metropolis-Hastings methods, limiting the wealth of other algorithms developed that use acceptance functions. We address this limitation by introducing a new loss function based on the detailed balance condition, allowing the estimation of the Metropolis-Hastings acceptance probabilities given a learned score function. We demonstrate the effectiveness of the proposed method for various scenarios, including sampling from heavy-tail distributions.
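
For context, the sketch below shows the standard Metropolis-adjusted Langevin algorithm (MALA) in the setting where the energy is known, which is exactly the ingredient the abstract says a learned score model lacks. It is not the paper's method: the one-dimensional standard normal target, the step size, and all function names are assumptions chosen to keep the example small.

```python
import math
import random


def energy(x):
    # Assumed target: standard normal, U(x) = x^2 / 2.
    return 0.5 * x * x


def score(x):
    # Score = gradient of the log density = -dU/dx.
    return -x


def log_q(x_to, x_from, step):
    # Log density (up to a constant) of the Langevin proposal x_from -> x_to.
    mean = x_from + step * score(x_from)
    return -((x_to - mean) ** 2) / (4.0 * step)


def mala_step(x, step, rng):
    # Langevin proposal: drift along the score plus Gaussian noise.
    prop = x + step * score(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    # Metropolis-Hastings log acceptance ratio; this line needs the energy.
    log_alpha = (energy(x) - energy(prop)
                 + log_q(x, prop, step) - log_q(prop, x, step))
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return prop, True
    return x, False


def run_chain(n_steps=20000, step=0.5, seed=0):
    rng = random.Random(seed)
    x, samples, accepted = 0.0, [], 0
    for _ in range(n_steps):
        x, ok = mala_step(x, step, rng)
        accepted += ok
        samples.append(x)
    return samples, accepted / n_steps


samples, rate = run_chain()
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"acceptance rate {rate:.2f}, mean {mean:.3f}, variance {var:.3f}")
```

Dropping the accept/reject test in `mala_step` gives the unadjusted Langevin algorithm, which only needs `score`. The acceptance ratio is what requires `energy`, and that is the quantity the abstract proposes to estimate from a learned score instead.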

Learn2Mix: Training Neural Networks Using Adaptive Data Integration

Dec 21, 2024

Shyam Venkatasubramanian, Vahid Tarokh

Abstract: Accelerating model convergence in resource-constrained environments is essential for fast and efficient neural network training. This work presents learn2mix, a new training strategy that adaptively adjusts class proportions within batches, focusing on classes with higher error rates. Unlike classical training methods that use static class proportions, learn2mix continually adapts class proportions during training, leading to faster convergence. Empirical evaluations on benchmark datasets show that neural networks trained with learn2mix converge faster than those trained with classical approaches, achieving improved results for classification, regression, and reconstruction tasks under limited training resources and with imbalanced classes. Our empirical findings are supported by theoretical analysis.
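
The idea of shifting batch composition toward classes with higher error can be sketched as follows. The abstract does not give the update rule, so the interpolation toward normalized class errors, the `rate` parameter, and both function names are assumptions for illustration only.

```python
def update_proportions(alpha, class_errors, rate=0.1):
    # Move the current class proportions a fraction `rate` of the way toward
    # the normalized per-class errors (assumed rule, not from the abstract).
    total = sum(class_errors)
    if total == 0:
        return list(alpha)
    target = [e / total for e in class_errors]
    return [a + rate * (t - a) for a, t in zip(alpha, target)]


def batch_counts(alpha, batch_size):
    # Turn proportions into integer per-class sample counts that sum to
    # batch_size, using largest-remainder rounding.
    raw = [a * batch_size for a in alpha]
    counts = [int(r) for r in raw]
    leftover = batch_size - sum(counts)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


alpha = [1 / 3, 1 / 3, 1 / 3]        # start from uniform class proportions
errors = [0.6, 0.3, 0.1]             # hypothetical per-class error rates
alpha = update_proportions(alpha, errors, rate=0.5)
print(alpha)
print(batch_counts(alpha, 32))
```

Because the update is a convex combination of two probability vectors, the proportions stay non-negative and keep summing to one for any `rate` between 0 and 1.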

Offline Stochastic Optimization of Black-Box Objective Functions

Dec 03, 2024

Juncheng Dong, Zihao Wu, Hamid Jafarkhani, Ali Pezeshki, Vahid Tarokh

Abstract: Many challenges in science and engineering, such as drug discovery and communication network design, involve optimizing complex and expensive black-box functions across vast search spaces. Thus, it is essential to leverage existing data to avoid costly active queries of these black-box functions. To this end, while Offline Black-Box Optimization (BBO) is effective for deterministic problems, it may fall short in capturing the stochasticity of real-world scenarios. To address this, we introduce Stochastic Offline BBO (SOBBO), which tackles both black-box objectives and uncontrolled uncertainties. We propose two solutions: for large-data regimes, a differentiable surrogate allows for gradient-based optimization, while for scarce-data regimes, we directly estimate gradients under conservative field constraints, improving robustness, convergence, and data efficiency. Numerical experiments demonstrate the effectiveness of our approach on both synthetic and real-world tasks.

Indiscriminate Disruption of Conditional Inference on Multivariate Gaussians

Nov 21, 2024

William N. Caballero, Matthew LaRosa, Alexander Fisher, Vahid Tarokh

Abstract: The multivariate Gaussian distribution underpins myriad operations-research, decision-analytic, and machine-learning models (e.g., Bayesian optimization, Gaussian influence diagrams, and variational autoencoders). However, despite recent advances in adversarial machine learning (AML), inference for Gaussian models in the presence of an adversary is notably understudied. Therefore, we consider a self-interested attacker who wishes to disrupt a decisionmaker's conditional inference and subsequent actions by corrupting a set of evidentiary variables. To avoid detection, the attacker also desires the attack to appear plausible, wherein plausibility is determined by the density of the corrupted evidence. We consider white- and grey-box settings such that the attacker has complete and incomplete knowledge about the decisionmaker's underlying multivariate Gaussian distribution, respectively. Select instances are shown to reduce to quadratic and stochastic quadratic programs, and structural properties are derived to inform solution methods. We assess the impact and efficacy of these attacks in three examples, including real estate evaluation, interest rate estimation, and signal processing. Each example leverages an alternative underlying model, thereby highlighting the attacks' broad applicability. Through these applications, we also juxtapose the behavior of the white- and grey-box attacks to understand how uncertainty and structure affect attacker behavior.

* 30 pages, 6 figures; 4 tables
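
The trade-off the abstract describes, between shifting the decisionmaker's conditional inference and keeping the corrupted evidence plausible, can be seen in the simplest bivariate case. The sketch below uses the textbook conditioning formulas for a two-dimensional Gaussian. The numbers and function names are assumptions for illustration and do not reproduce the paper's attack formulation.

```python
import math


def conditional_gaussian(mu, cov, x2):
    # Mean and variance of X1 given X2 = x2 for a bivariate Gaussian with
    # mean (mu1, mu2) and covariance [[s11, s12], [s12, s22]].
    mu1, mu2 = mu
    (s11, s12), (_, s22) = cov
    gain = s12 / s22
    return mu1 + gain * (x2 - mu2), s11 - gain * s12


def evidence_log_density(mu2, s22, x2):
    # Log density of the evidence X2 = x2 under its Gaussian marginal;
    # used here as a stand-in for the attack's plausibility.
    return -0.5 * math.log(2.0 * math.pi * s22) - 0.5 * (x2 - mu2) ** 2 / s22


mu = (0.0, 0.0)
cov = ((1.0, 0.8), (0.8, 1.0))

for x2 in (1.0, 3.0):  # honest evidence, then hypothetical corrupted evidence
    mean, var = conditional_gaussian(mu, cov, x2)
    logp = evidence_log_density(mu[1], cov[1][1], x2)
    print(f"x2={x2}: conditional mean {mean:.2f}, "
          f"variance {var:.2f}, evidence log density {logp:.2f}")
```

Moving the evidence from 1.0 to 3.0 triples the conditional mean while the conditional variance stays fixed, but the evidence log density drops sharply, which is the kind of plausibility cost an attacker would have to weigh.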

Asymptotically Optimal Change Detection for Unnormalized Pre- and Post-Change Distributions

Oct 18, 2024

Arman Adibi, Sanjeev Kulkarni, H. Vincent Poor, Taposh Banerjee, Vahid Tarokh

Abstract: This paper addresses the problem of detecting changes when only unnormalized pre- and post-change distributions are accessible. This situation happens in many scenarios in physics such as in ferromagnetism, crystallography, magneto-hydrodynamics, and thermodynamics, where the energy models are difficult to normalize. Our approach is based on the estimation of the Cumulative Sum (CUSUM) statistics, which is known to produce optimal performance. We first present an intuitively appealing approximation method. Unfortunately, this produces a biased estimator of the CUSUM statistics and may cause performance degradation. We then propose the Log-Partition Approximation Cumulative Sum (LPA-CUSUM) algorithm based on thermodynamic integration (TI) in order to estimate the log-ratio of normalizing constants of pre- and post-change distributions. It is proved that this approach gives an unbiased estimate of the log-partition function and the CUSUM statistics, and leads to an asymptotically optimal performance. Moreover, we derive a relationship between the required sample size for thermodynamic integration and the desired detection delay performance, offering guidelines for practical parameter selection. Numerical studies are provided demonstrating the efficacy of our approach.
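
The role of the log-ratio of normalizing constants can be seen in the classical CUSUM recursion, sketched below for two unnormalized Gaussian densities. This is not LPA-CUSUM: here the log-ratio is known in closed form and simply passed in, whereas the abstract describes estimating it with thermodynamic integration. The densities, threshold, and function names are assumptions for illustration.

```python
import math
import random


def log_f_tilde(x):
    # Unnormalized pre-change log density: N(0, 1) without its constant.
    return -0.5 * x * x


def log_g_tilde(x):
    # Unnormalized post-change log density: N(0, 4) without its constant.
    return -x * x / 8.0


# log(Z_g / Z_f) for the two Gaussians above, available here in closed form.
LOG_Z_RATIO = math.log(2.0)


def cusum_detect(xs, threshold, log_z_ratio):
    # Classical CUSUM: accumulate log-likelihood ratios, reset at zero, and
    # raise an alarm at the first index where the statistic crosses threshold.
    w = 0.0
    for n, x in enumerate(xs, start=1):
        llr = log_g_tilde(x) - log_f_tilde(x) - log_z_ratio
        w = max(0.0, w + llr)
        if w >= threshold:
            return n
    return None


rng = random.Random(1)
stream = ([rng.gauss(0.0, 1.0) for _ in range(200)]      # pre-change samples
          + [rng.gauss(0.0, 2.0) for _ in range(200)])   # post-change samples
print(cusum_detect(stream, threshold=15.0, log_z_ratio=LOG_Z_RATIO))
```

If `log_z_ratio` were dropped or estimated with bias, every increment would be shifted by a constant, which changes both the false-alarm rate and the detection delay. This is why the abstract stresses an unbiased estimate of that term.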

Steinmetz Neural Networks for Complex-Valued Data

Sep 16, 2024

Shyam Venkatasubramanian, Ali Pezeshki, Vahid Tarokh

Abstract: In this work, we introduce a new approach to processing complex-valued data using DNNs consisting of parallel real-valued subnetworks with coupled outputs. Our proposed class of architectures, referred to as Steinmetz Neural Networks, leverages multi-view learning to construct more interpretable representations within the latent space. Subsequently, we present the Analytic Neural Network, which implements a consistency penalty that encourages analytic signal representations in the Steinmetz neural network's latent space. This penalty enforces a deterministic and orthogonal relationship between the real and imaginary components. Utilizing an information-theoretic construction, we demonstrate that the upper bound on the generalization error posited by the analytic neural network is lower than that of the general class of Steinmetz neural networks. Our numerical experiments demonstrate the improved performance and robustness to additive noise afforded by our proposed networks on benchmark datasets and synthetic examples.
