Mahanakorn University of Technology, International College, Bangkok, Thailand
Abstract:Closed-loop decision-making systems (e.g., lending, screening, or recidivism risk assessment) often operate under fairness and service constraints while inducing feedback effects: decisions change who appears in the future, yielding non-stationary data and potentially amplifying disparities. We study online learning of a one-dimensional threshold policy from bandit feedback under demographic parity (DP) and, optionally, service-rate constraints. The learner observes only a scalar score each round and selects a threshold; reward and constraint residuals are revealed only for the chosen threshold. We propose Optimistic Feasible Search (OFS), a simple grid-based method that maintains confidence bounds for reward and constraint residuals for each candidate threshold. At each round, OFS selects a threshold that appears feasible under confidence bounds and, among those, maximizes optimistic reward; if no threshold appears feasible, OFS selects the threshold minimizing optimistic constraint violation. This design directly targets feasible high-utility thresholds and is particularly effective for low-dimensional, interpretable policy classes where discretization is natural. We evaluate OFS on (i) a synthetic closed-loop benchmark with stable contraction dynamics and (ii) two semi-synthetic closed-loop benchmarks grounded in German Credit and COMPAS, constructed by training a score model and feeding group-dependent acceptance decisions back into population composition. Across all environments, OFS achieves higher reward with smaller cumulative constraint violation than unconstrained and primal-dual bandit baselines, and is near-oracle relative to the best feasible fixed threshold under the same sweep procedure. Experiments are reproducible and organized with double-blind-friendly relative outputs.
Abstract:First-order optimizers are reliable but slow in sharp, anisotropic regions. We study a curvature-adaptive method that periodically sketches a low-rank Hessian subspace via Hessian--vector products and preconditions gradients only in that subspace, leaving the orthogonal complement first-order. For L-smooth non-convex objectives, we recover the standard O(1/T) stationarity guarantee with a widened stable stepsize range; under a Polyak--Lojasiewicz (PL) condition with bounded residual curvature outside the sketch, the loss contracts at refresh steps. On CIFAR-10/100 with ResNet-18/34, the method enters the low-loss region substantially earlier: measured by epochs to a pre-declared train-loss threshold (0.75), it reaches the threshold 2.95x faster than Adam on CIFAR-100/ResNet-18, while matching final test accuracy. The approach is one-knob: performance is insensitive to the sketch rank k across {1,3,5}, and k=0 yields a principled curvature-free ablation. We release anonymized logs and scripts that regenerate all figures and tables.