Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Javad Lavaei

Max

Few-Shot Test-Time Optimization Without Retraining for Semiconductor Recipe Generation and Beyond

May 21, 2025

Shangding Gu, Donghao Ying, Ming Jin, Yu Joe Lu, Jun Wang, Javad Lavaei, Costas Spanos

Figure 1 for Few-Shot Test-Time Optimization Without Retraining for Semiconductor Recipe Generation and Beyond

Figure 2 for Few-Shot Test-Time Optimization Without Retraining for Semiconductor Recipe Generation and Beyond

Figure 3 for Few-Shot Test-Time Optimization Without Retraining for Semiconductor Recipe Generation and Beyond

Figure 4 for Few-Shot Test-Time Optimization Without Retraining for Semiconductor Recipe Generation and Beyond

Abstract:We introduce Model Feedback Learning (MFL), a novel test-time optimization framework for optimizing inputs to pre-trained AI models or deployed hardware systems without requiring any retraining of the models or modifications to the hardware. In contrast to existing methods that rely on adjusting model parameters, MFL leverages a lightweight reverse model to iteratively search for optimal inputs, enabling efficient adaptation to new objectives under deployment constraints. This framework is particularly advantageous in real-world settings, such as semiconductor manufacturing recipe generation, where modifying deployed systems is often infeasible or cost-prohibitive. We validate MFL on semiconductor plasma etching tasks, where it achieves target recipe generation in just five iterations, significantly outperforming both Bayesian optimization and human experts. Beyond semiconductor applications, MFL also demonstrates strong performance in chemical processes (e.g., chemical vapor deposition) and electronic systems (e.g., wire bonding), highlighting its broad applicability. Additionally, MFL incorporates stability-aware optimization, enhancing robustness to process variations and surpassing conventional supervised learning and random search methods in high-dimensional control settings. By enabling few-shot adaptation, MFL provides a scalable and efficient paradigm for deploying intelligent control in real-world environments.

Via

Access Paper or Ask Questions

High Probability Complexity Bounds of Trust-Region Stochastic Sequential Quadratic Programming with Heavy-Tailed Noise

Mar 24, 2025

Yuchen Fang, Javad Lavaei, Katya Scheinberg, Sen Na

Abstract:In this paper, we consider nonlinear optimization problems with a stochastic objective and deterministic equality constraints. We propose a Trust-Region Stochastic Sequential Quadratic Programming (TR-SSQP) method and establish its high-probability iteration complexity bounds for identifying first- and second-order $\epsilon$-stationary points. In our algorithm, we assume that exact objective values, gradients, and Hessians are not directly accessible but can be estimated via zeroth-, first-, and second-order probabilistic oracles. Compared to existing complexity studies of SSQP methods that rely on a zeroth-order oracle with sub-exponential tail noise (i.e., light-tailed) and focus mostly on first-order stationarity, our analysis accommodates irreducible and heavy-tailed noise in the zeroth-order oracle and significantly extends the analysis to second-order stationarity. We show that under weaker noise conditions, our method achieves the same high-probability first-order iteration complexity bounds, while also exhibiting promising second-order iteration complexity bounds. Specifically, the method identifies a first-order $\epsilon$-stationary point in $\mathcal{O}(\epsilon^{-2})$ iterations and a second-order $\epsilon$-stationary point in $\mathcal{O}(\epsilon^{-3})$ iterations with high probability, provided that $\epsilon$ is lower bounded by a constant determined by the irreducible noise level in estimation. We validate our theoretical findings and evaluate the practical performance of our method on CUTEst benchmark test set.

* 50 pages, 5 figures

Via

Access Paper or Ask Questions

Subgradient Method for System Identification with Non-Smooth Objectives

Mar 20, 2025

Baturalp Yalcin, Javad Lavaei

Abstract:This paper investigates a subgradient-based algorithm to solve the system identification problem for linear time-invariant systems with non-smooth objectives. This is essential for robust system identification in safety-critical applications. While existing work provides theoretical exact recovery guarantees using optimization solvers, the design of fast learning algorithms with convergence guarantees for practical use remains unexplored. We analyze the subgradient method in this setting where the optimization problems to be solved change over time as new measurements are taken, and we establish linear convergence results for both the best and Polyak step sizes after a burn-in period. Additionally, we characterize the asymptotic convergence of the best average sub-optimality gap under diminishing and constant step sizes. Finally, we compare the time complexity of standard solvers with the subgradient algorithm and support our findings with experimental results. This is the first work to analyze subgradient algorithms for system identification with non-smooth objectives.

* 8 pages, 5 figures

Via

Access Paper or Ask Questions

Reward-Safety Balance in Offline Safe RL via Diffusion Regularization

Feb 18, 2025

Junyu Guo, Zhi Zheng, Donghao Ying, Ming Jin, Shangding Gu, Costas Spanos, Javad Lavaei

Figure 1 for Reward-Safety Balance in Offline Safe RL via Diffusion Regularization

Figure 2 for Reward-Safety Balance in Offline Safe RL via Diffusion Regularization

Figure 3 for Reward-Safety Balance in Offline Safe RL via Diffusion Regularization

Figure 4 for Reward-Safety Balance in Offline Safe RL via Diffusion Regularization

Abstract:Constrained reinforcement learning (RL) seeks high-performance policies under safety constraints. We focus on an offline setting where the agent has only a fixed dataset -- common in realistic tasks to prevent unsafe exploration. To address this, we propose Diffusion-Regularized Constrained Offline Reinforcement Learning (DRCORL), which first uses a diffusion model to capture the behavioral policy from offline data and then extracts a simplified policy to enable efficient inference. We further apply gradient manipulation for safety adaptation, balancing the reward objective and constraint satisfaction. This approach leverages high-quality offline data while incorporating safety requirements. Empirical results show that DRCORL achieves reliable safety performance, fast inference, and strong reward outcomes across robot learning tasks. Compared to existing safe offline RL methods, it consistently meets cost limits and performs well with the same hyperparameters, indicating practical applicability in real-world scenarios.

Via

Access Paper or Ask Questions

Exact Recovery Guarantees for Parameterized Non-linear System Identification Problem under Adversarial Attacks

Aug 30, 2024

Haixiang Zhang, Baturalp Yalcin, Javad Lavaei, Eduardo Sontag

Abstract:In this work, we study the system identification problem for parameterized non-linear systems using basis functions under adversarial attacks. Motivated by the LASSO-type estimators, we analyze the exact recovery property of a non-smooth estimator, which is generated by solving an embedded $\ell_1$-loss minimization problem. First, we derive necessary and sufficient conditions for the well-specifiedness of the estimator and the uniqueness of global solutions to the underlying optimization problem. Next, we provide exact recovery guarantees for the estimator under two different scenarios of boundedness and Lipschitz continuity of the basis functions. The non-asymptotic exact recovery is guaranteed with high probability, even when there are more severely corrupted data than clean data. Finally, we numerically illustrate the validity of our theory. This is the first study on the sample complexity analysis of a non-smooth estimator for the non-linear system identification problem.

* 33 pages

Via

Access Paper or Ask Questions

A Black Swan Hypothesis in Markov Decision Process via Irrationality

Jul 25, 2024

Hyunin Lee, David Abel, Ming Jin, Javad Lavaei, Somayeh Sojoudi

Figure 1 for A Black Swan Hypothesis in Markov Decision Process via Irrationality

Abstract:Black swan events are statistically rare occurrences that carry extremely high risks. A typical view of defining black swan events is heavily assumed to originate from an unpredictable time-varying environments; however, the community lacks a comprehensive definition of black swan events. To this end, this paper challenges that the standard view is incomplete and claims that high-risk, statistically rare events can also occur in unchanging environments due to human misperception of their value and likelihood, which we call as spatial black swan event. We first carefully categorize black swan events, focusing on spatial black swan events, and mathematically formalize the definition of black swan events. We hope these definitions can pave the way for the development of algorithms to prevent such events by rationally correcting human perception

Via

Access Paper or Ask Questions

A CMDP-within-online framework for Meta-Safe Reinforcement Learning

May 26, 2024

Vanshaj Khattar, Yuhao Ding, Bilgehan Sel, Javad Lavaei, Ming Jin

Figure 1 for A CMDP-within-online framework for Meta-Safe Reinforcement Learning

Figure 2 for A CMDP-within-online framework for Meta-Safe Reinforcement Learning

Figure 3 for A CMDP-within-online framework for Meta-Safe Reinforcement Learning

Figure 4 for A CMDP-within-online framework for Meta-Safe Reinforcement Learning

Abstract:Meta-reinforcement learning has widely been used as a learning-to-learn framework to solve unseen tasks with limited experience. However, the aspect of constraint violations has not been adequately addressed in the existing works, making their application restricted in real-world settings. In this paper, we study the problem of meta-safe reinforcement learning (Meta-SRL) through the CMDP-within-online framework to establish the first provable guarantees in this important setting. We obtain task-averaged regret bounds for the reward maximization (optimality gap) and constraint violations using gradient-based meta-learning and show that the task-averaged optimality gap and constraint satisfaction improve with task-similarity in a static environment or task-relatedness in a dynamic environment. Several technical challenges arise when making this framework practical. To this end, we propose a meta-algorithm that performs inexact online learning on the upper bounds of within-task optimality gap and constraint violations estimated by off-policy stationary distribution corrections. Furthermore, we enable the learning rates to be adapted for every task and extend our approach to settings with a competing dynamically changing oracle. Finally, experiments are conducted to demonstrate the effectiveness of our approach.

* ICLR 2023

Via

Access Paper or Ask Questions

Pausing Policy Learning in Non-stationary Reinforcement Learning

May 25, 2024

Hyunin Lee, Ming Jin, Javad Lavaei, Somayeh Sojoudi

Figure 1 for Pausing Policy Learning in Non-stationary Reinforcement Learning

Figure 2 for Pausing Policy Learning in Non-stationary Reinforcement Learning

Figure 3 for Pausing Policy Learning in Non-stationary Reinforcement Learning

Figure 4 for Pausing Policy Learning in Non-stationary Reinforcement Learning

Abstract:Real-time inference is a challenge of real-world reinforcement learning due to temporal differences in time-varying environments: the system collects data from the past, updates the decision model in the present, and deploys it in the future. We tackle a common belief that continually updating the decision is optimal to minimize the temporal gap. We propose forecasting an online reinforcement learning framework and show that strategically pausing decision updates yields better overall performance by effectively managing aleatoric uncertainty. Theoretically, we compute an optimal ratio between policy update and hold duration, and show that a non-zero policy hold duration provides a sharper upper bound on the dynamic regret. Our experimental evaluations on three different environments also reveal that a non-zero policy hold duration yields higher rewards compared to continuous decision updates.

* conference

Via

Access Paper or Ask Questions

Absence of spurious solutions far from ground truth: A low-rank analysis with high-order losses

Mar 10, 2024

Ziye Ma, Ying Chen, Javad Lavaei, Somayeh Sojoudi

Abstract:Matrix sensing problems exhibit pervasive non-convexity, plaguing optimization with a proliferation of suboptimal spurious solutions. Avoiding convergence to these critical points poses a major challenge. This work provides new theoretical insights that help demystify the intricacies of the non-convex landscape. In this work, we prove that under certain conditions, critical points sufficiently distant from the ground truth matrix exhibit favorable geometry by being strict saddle points rather than troublesome local minima. Moreover, we introduce the notion of higher-order losses for the matrix sensing problem and show that the incorporation of such losses into the objective function amplifies the negative curvature around those distant critical points. This implies that increasing the complexity of the objective function via high-order losses accelerates the escape from such critical points and acts as a desirable alternative to increasing the complexity of the optimization problem via over-parametrization. By elucidating key characteristics of the non-convex optimization landscape, this work makes progress towards a comprehensive framework for tackling broader machine learning objectives plagued by non-convexity.

* Accepted by AISTATS 2024

Via

Access Paper or Ask Questions

Algorithmic Regularization in Tensor Optimization: Towards a Lifted Approach in Matrix Sensing

Oct 24, 2023

Ziye Ma, Javad Lavaei, Somayeh Sojoudi

Abstract:Gradient descent (GD) is crucial for generalization in machine learning models, as it induces implicit regularization, promoting compact representations. In this work, we examine the role of GD in inducing implicit regularization for tensor optimization, particularly within the context of the lifted matrix sensing framework. This framework has been recently proposed to address the non-convex matrix sensing problem by transforming spurious solutions into strict saddles when optimizing over symmetric, rank-1 tensors. We show that, with sufficiently small initialization scale, GD applied to this lifted problem results in approximate rank-1 tensors and critical points with escape directions. Our findings underscore the significance of the tensor parametrization of matrix sensing, in combination with first-order methods, in achieving global optimality in such problems.

* NeurIPS23 Poster

Via

Access Paper or Ask Questions