Abstract:The computational cost of stiff chemical kinetics remains a dominant bottleneck in reacting-flow simulation, yet hybrid integration strategies are typically driven by hand-tuned heuristics or supervised predictors that make myopic decisions from instantaneous local state. We introduce a constrained reinforcement learning (RL) framework that autonomously selects between an implicit BDF integrator (CVODE) and a quasi-steady-state (QSS) solver during chemistry integration. Solver selection is cast as a Markov decision process. The agent learns trajectory-aware policies that account for how present solver choices influence downstream error accumulation, while minimizing computational cost under a user-prescribed accuracy tolerance enforced through a Lagrangian reward with online multiplier adaptation. Across sampled 0D homogeneous reactor conditions, the RL-adaptive policy achieves a mean speedup of approximately $3\times$, with speedups ranging from $1.11\times$ to $10.58\times$, while maintaining accurate ignition delays and species profiles for a 106-species \textit{n}-dodecane mechanism and adding approximately $1\%$ inference overhead. Without retraining, the 0D-trained policy transfers to 1D counterflow diffusion flames over strain rates $10$--$2000~\mathrm{s}^{-1}$, delivering consistent $\approx 2.2\times$ speedup relative to CVODE while preserving near-reference temperature accuracy and selecting CVODE at only $12$--$15\%$ of space-time points. Overall, the results demonstrate the potential of the proposed reinforcement learning framework to learn problem-specific integration strategies while respecting accuracy constraints, thereby opening a pathway toward adaptive, self-optimizing workflows for multiphysics systems with spatially heterogeneous stiffness.
Abstract:We present an approach for designing swarm-based optimizers for the global optimization of expensive black-box functions. In the proposed approach, the problem of finding efficient optimizers is framed as a reinforcement learning problem, where the goal is to find optimization policies that require a few function evaluations to converge to the global optimum. The state of each agent within the swarm is defined as its current position and function value within a design space and the agents learn to take favorable actions that maximize reward, which is based on the final value of the objective function. The proposed approach is tested on various benchmark optimization functions and compared to the performance of other global optimization strategies. Furthermore, the effect of changing the number of agents, as well as the generalization capabilities of the trained agents are investigated. The results show superior performance compared to the other optimizers, desired scaling when the number of agents is varied, and acceptable performance even when applied to unseen functions. On a broader scale, the results show promise for the rapid development of domain-specific optimizers.