Abstract:Reinforcement learning (RL) with combinatorial action spaces remains challenging because feasible action sets are exponentially large and governed by complex feasibility constraints, making direct policy parameterization impractical. Existing approaches embed task-specific value functions into constrained optimization programs or learn deterministic structured policies, sacrificing generality and policy expressiveness. We propose a solver-induced \emph{latent spherical flow policy} that brings the expressiveness of modern generative policies to combinatorial RL while guaranteeing feasibility by design. Our method, LSFlow, learns a \emph{stochastic} policy in a compact continuous latent space via spherical flow matching, and delegates feasibility to a combinatorial optimization solver that maps each latent sample to a valid structured action. To improve efficiency, we train the value network directly in the latent space, avoiding repeated solver calls during policy optimization. To address the piecewise-constant and discontinuous value landscape induced by solver-based action selection, we introduce a smoothed Bellman operator that yields stable, well-defined learning targets. Empirically, our approach outperforms state-of-the-art baselines by an average of 20.6\% across a range of challenging combinatorial RL tasks.
Abstract:Operational deployment of a fully automated greenhouse gas (GHG) plume detection system remains an elusive goal for imaging spectroscopy missions, despite recent advances in deep learning approaches. With the dramatic increase in data availability, however, automation continues to increase in importance for natural and anthropogenic emissions monitoring. This work reviews and addresses several key obstacles in the field: data and label quality control, prevention of spatiotemporal biases, and correctly aligned modeling objectives. We demonstrate through rigorous experiments using multicampaign data from airborne and spaceborne instruments that convolutional neural networks (CNNs) are able to achieve operational detection performance when these obstacles are alleviated. We demonstrate that a multitask model that learns both instance detection and pixelwise segmentation simultaneously can successfully lead towards an operational pathway. We evaluate the model's plume detectability across emission source types and regions, identifying thresholds for operational deployment. Finally, we provide analysis-ready data, models, and source code for reproducibility, and work to define a set of best practices and validation standards to facilitate future contributions to the field.