Abstract: In this paper, we study nonconvex constrained stochastic zeroth-order optimization problems in which exact information about the constraints is available, while the objective can be accessed only through noisy function values. We propose a Bregman linearized augmented Lagrangian method that uses stochastic zeroth-order gradient estimators combined with a variance reduction technique. We analyze its oracle complexity, measured as the total number of stochastic function value evaluations required to reach an \(\epsilon\)-KKT point in \(\ell_p\)-norm metrics with \(p \ge 2\), where \(p\) is a parameter associated with the selected Bregman distance. In particular, starting from a near-feasible initial point and using Rademacher smoothing, the oracle complexity is of order \(O(p d^{2/p} \epsilon^{-3})\) for \(p \in [2, 2 \ln d]\) and \(O(\ln d \cdot \epsilon^{-3})\) for \(p > 2 \ln d\), where \(d\) denotes the problem dimension. These results show that, with a properly chosen Bregman distance, the proposed method can achieve a dimensional dependency lower than \(O(d)\) without requiring additional assumptions. This is a significant improvement over existing work in the high-dimensional setting, and it matches the lowest complexity order with respect to the tolerance \(\epsilon\) reported in the literature. Numerical experiments on constrained Lasso and black-box adversarial attack problems demonstrate the promising performance of the proposed method.
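To make the Rademacher-smoothing estimator concrete, here is a minimal sketch of a two-point zeroth-order gradient estimate with Rademacher directions. The function name `rademacher_zo_grad`, the smoothing radius `mu`, and the direction count `num_dirs` are hypothetical, and the paper's variance-reduction scheme and Bregman update are not reproduced. Because a Rademacher vector \(u\) satisfies \(\mathbb{E}[u u^\top] = I\), the averaged estimate approximates \(\nabla f(x)\) up to an \(O(\mu)\) bias for smooth \(f\). For a noisy objective \(F(x, \xi)\), both evaluations would share the same sample \(\xi\); the sketch assumes a deterministic callable for simplicity.

```python
import numpy as np


def rademacher_zo_grad(f, x, mu=1e-4, num_dirs=1, rng=None):
    """Two-point zeroth-order gradient estimate using Rademacher directions.

    Hypothetical helper for illustration. `f` maps a 1-D array to a scalar;
    for a stochastic objective F(x, xi), both calls below should use the
    same sample xi (not modeled here).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    fx = f(x)  # base value, reused for every direction
    g = np.zeros(d)
    for _ in range(num_dirs):
        # i.i.d. +/-1 entries, so E[u u^T] = I
        u = rng.choice([-1.0, 1.0], size=d)
        # forward finite difference along u, scaled back onto u
        g += (f(x + mu * u) - fx) / mu * u
    return g / num_dirs
```

For example, on \(f(x) = \tfrac{1}{2}\|x\|_2^2\) the estimate averaged over many directions concentrates around the true gradient \(x\); in practice, fewer directions per step combined with variance reduction is the usual trade-off.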
Abstract: In this paper, we establish tight lower bounds for Byzantine-robust distributed first-order stochastic optimization methods in both strongly convex and non-convex stochastic optimization. We reveal that when the distributed nodes have heterogeneous data, the convergence error comprises two components: a non-vanishing Byzantine error and a vanishing optimization error. We establish lower bounds both on the Byzantine error and on the minimum number of queries to a stochastic gradient oracle required to achieve an arbitrarily small optimization error. However, we identify significant discrepancies between our established lower bounds and the existing upper bounds. To close this gap, we leverage Nesterov's acceleration and variance reduction techniques to develop novel Byzantine-robust distributed stochastic optimization methods that provably match these lower bounds up to logarithmic factors, which implies that our established lower bounds are tight.
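The abstract does not name a specific robust aggregation rule. As an illustration only (not the authors' method), the sketch below uses coordinate-wise median, one common choice in the Byzantine-robust literature, to show why a robust aggregator tolerates a minority of arbitrary messages yet need not recover the honest average when honest data are heterogeneous, which gives one intuition for a non-vanishing Byzantine error. The function name `coordinate_wise_median` is hypothetical.

```python
import numpy as np


def coordinate_wise_median(vectors):
    """Aggregate worker vectors by taking the median of each coordinate.

    Hypothetical helper for illustration. As long as Byzantine workers are
    fewer than half of all workers, each output coordinate lies between the
    smallest and largest honest value for that coordinate.
    """
    stacked = np.stack(vectors)  # shape: (num_workers, dim)
    return np.median(stacked, axis=0)


# Heterogeneous honest gradients plus two Byzantine workers sending huge values.
honest = [np.array([1.0, -1.0]) + 0.1 * i for i in range(7)]
byzantine = [np.array([1e6, -1e6])] * 2

robust = coordinate_wise_median(honest + byzantine)
naive = np.mean(np.stack(honest + byzantine), axis=0)
honest_mean = np.mean(np.stack(honest), axis=0)
print(robust, naive, honest_mean)
```

The plain mean is dominated by the Byzantine messages, while the median stays within the honest range but differs from the honest mean because the honest gradients are spread out.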