Abstract:In this paper, we design two compressed decentralized algorithms for solving nonconvex stochastic optimization under two different scenarios. Both algorithms adopt a momentum technique to achieve fast convergence and a message-compression technique to save communication costs. Though momentum acceleration and compressed communication have been used in the literature, it is highly nontrivial to theoretically prove the effectiveness of their composition in a decentralized algorithm that can maintain the benefits of both sides, because of the need to simultaneously control the consensus error, the compression error, and the bias from the momentum gradient. For the scenario where gradients are bounded, our proposal is a compressed decentralized adaptive method. To the best of our knowledge, this is the first decentralized adaptive stochastic gradient method with compressed communication. For the scenario of data heterogeneity without bounded gradients, our proposal is a compressed decentralized heavy-ball method, which applies a gradient tracking technique to address the challenge of data heterogeneity. Notably, both methods achieve an optimal convergence rate, and they can achieve linear speedup and adopt topology-independent algorithmic parameters within a certain regime of the user-specified error tolerance. Superior empirical performance is observed over state-of-the-art methods on training deep neural networks (DNNs) and Transformers.
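As a rough illustration of the two ingredients named in this abstract, the sketch below pairs a top-k sparsifier (one common contractive compressor, chosen here as an assumption since the abstract does not name a specific operator) with a plain heavy-ball update. The function names `top_k` and `heavy_ball_step` and the default step sizes are hypothetical. This is not the paper's algorithm, which additionally involves decentralized communication and gradient tracking.

```python
def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec and zero out the rest."""
    if k >= len(vec):
        return [float(v) for v in vec]
    # Indices of the k entries with the largest absolute value.
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [float(v) if i in keep else 0.0 for i, v in enumerate(vec)]


def heavy_ball_step(x, momentum, grad, lr=0.1, beta=0.9):
    """One heavy-ball update: m <- beta * m + grad, then x <- x - lr * m."""
    new_m = [beta * m + g for m, g in zip(momentum, grad)]
    new_x = [xi - lr * mi for xi, mi in zip(x, new_m)]
    return new_x, new_m
```

Top-k satisfies the contraction bound ||C(x) - x||^2 <= (1 - k/n) ||x||^2 for a vector of length n, which is the kind of property compressed methods typically rely on when controlling compression error.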
Abstract:Graph convolutional networks (GCNs) are a powerful tool for graph representation learning. Due to the recursive neighborhood aggregations employed by GCNs, efficient training methods suffer from a lack of theoretical guarantees or are missing important practical elements from modern deep learning algorithms, such as adaptivity and momentum. In this paper, we present several neighbor-sampling (NS) based Adam-type stochastic methods for solving a nonconvex GCN training problem. We utilize the control variate technique proposed by [1] to reduce the stochastic error caused by neighbor sampling. Under standard assumptions for Adam-type methods, we show that our methods enjoy the optimal convergence rate. In addition, we conduct extensive numerical experiments on node classification tasks with several benchmark datasets. The results demonstrate superior performance of our methods over classic NS-based SGD that also uses the control-variate technique, especially for large-scale graph datasets. Our code is available at https://github.com/RPI-OPT/CV-ADAM-GNN .
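The control-variate idea mentioned in this abstract can be sketched generically as follows. The estimator below assumes scalar activations and a stored (possibly stale) copy of each neighbor's activation. The name `cv_neighbor_mean` and the dictionary-based data layout are illustrative assumptions, not the interface of the paper's code.

```python
def cv_neighbor_mean(current, history, neighbors, sampled):
    """Control-variate estimate of the mean of current[j] over all neighbors j.

    current  : dict node -> current activation (only sampled nodes are read)
    history  : dict node -> stored activation, available for every neighbor
    neighbors: list of all neighbor ids
    sampled  : non-empty list of sampled neighbor ids (a subset of neighbors)
    """
    # Exact part: the mean of the stored activations over all neighbors.
    hist_mean = sum(history[j] for j in neighbors) / len(neighbors)
    # Sampled correction: the mean of (current - history) over the sample only.
    correction = sum(current[j] - history[j] for j in sampled) / len(sampled)
    return hist_mean + correction
```

When the stored activations are close to the current ones, the sampled correction is small, so the estimate varies little with the choice of sample. If `sampled` is drawn uniformly from `neighbors`, the estimate is unbiased for the full neighbor mean.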
Abstract:Existing inpainting methods often require extensive retraining or fine-tuning to integrate new content seamlessly, yet they struggle to maintain coherence in both structure and style between inpainted regions and the surrounding background. Motivated by these limitations, we introduce HarmonPaint, a training-free inpainting framework that seamlessly integrates with the attention mechanisms of diffusion models to achieve high-quality, harmonized image inpainting without any form of training. By leveraging masking strategies within self-attention, HarmonPaint ensures structural fidelity without model retraining or fine-tuning. Additionally, we exploit intrinsic diffusion model properties to transfer style information from unmasked to masked regions, achieving a harmonious integration of styles. Extensive experiments demonstrate the effectiveness of HarmonPaint across diverse scenes and styles, validating its versatility and performance.
Abstract:A specific failure mode designated as transient micro-short circuit (TMSC) has been identified in practical battery systems, exhibiting subtle and latent characteristics with measurable voltage deviations. To further improve the safe use of lithium-ion batteries (LIBs), this letter introduces a novel method for the precise detection of TMSC faults within LIBs. The method applies the continuous wavelet transform (CWT) to voltage and current signals, followed by the identification of micro-scale anomalies through the analysis of the coherence in the wavelet spectrum at a specific frequency. Through designed fault experiments, the effectiveness of this method has been verified. Results demonstrate that it can effectively capture micro-faults with a voltage drop as low as 30 mV within just a few seconds. Furthermore, the proposed method is inherently highly robust and is able to effectively detect false faults and hidden faults under varying current loads, which highlights the superiority of this method.
Abstract:Tensor Robust Principal Component Analysis (TRPCA) is a fundamental technique for decomposing multi-dimensional data into a low-rank tensor and an outlier tensor, yet existing methods relying on sparse outlier assumptions often fail under structured corruptions. In this paper, we propose a self-guided data augmentation approach that employs adaptive weighting to suppress outlier influence, reformulating the original TRPCA problem into a standard Tensor Principal Component Analysis (TPCA) problem. The proposed model involves an optimization-driven weighting scheme that dynamically identifies and downweights outlier contributions during tensor augmentation. We develop an efficient proximal block coordinate descent algorithm with closed-form updates to solve the resulting optimization problem, ensuring computational efficiency. Theoretical convergence is guaranteed through a framework combining block coordinate descent with majorization-minimization principles. Numerical experiments on synthetic and real-world datasets, including face recovery, background subtraction, and hyperspectral denoising, demonstrate that our method effectively handles various corruption patterns. The results show improvements in both accuracy and computational efficiency compared to state-of-the-art methods.
Abstract:Recent deep learning models for Long-term Time Series Forecasting (LTSF) often emphasize complex, handcrafted designs, while simpler architectures like linear models or MLPs have often outperformed these intricate solutions. In this paper, we revisit and organize the core ideas behind several key techniques, such as redundancy reduction and multi-scale modeling, which are frequently employed in advanced LTSF models. Our goal is to streamline these ideas for more efficient deep learning utilization. To this end, we introduce TimeCapsule, a model built around the principle of high-dimensional information compression that unifies these techniques in a generalized yet simplified framework. Specifically, we model time series as a 3D tensor, incorporating temporal, variate, and level dimensions, and leverage mode production to capture multi-mode dependencies while achieving dimensionality compression. We propose an internal forecast within the compressed representation domain, supported by the Joint-Embedding Predictive Architecture (JEPA), to monitor the learning of predictive representations. Extensive experiments on challenging benchmarks demonstrate the versatility of our method, showing that TimeCapsule can achieve state-of-the-art performance.
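Reading "mode production" in this abstract as the standard tensor mode-n product (an interpretation, not something the abstract states explicitly), a minimal sketch of compressing one axis of a 3D tensor could look like the following. The function name `mode1_product` and the nested-list representation are illustrative assumptions. The three axes can be thought of as the temporal, variate, and level dimensions.

```python
def mode1_product(tensor, matrix):
    """Mode-1 product: contract the first axis of an I x J x K tensor
    with a P x I matrix, giving a P x J x K tensor."""
    I, J, K = len(tensor), len(tensor[0]), len(tensor[0][0])
    return [
        [
            [sum(row[i] * tensor[i][j][k] for i in range(I)) for k in range(K)]
            for j in range(J)
        ]
        for row in matrix
    ]
```

Choosing P smaller than I shrinks the first axis, which is one simple way to obtain dimensionality compression along a single mode. For instance, `mode1_product([[[1, 2]], [[3, 4]]], [[1, 1]])` sums the two slices along the first axis.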
Abstract:In this paper, we study the inexact Moreau envelope Lagrangian (iMELa) method for solving smooth non-convex optimization problems over a simple polytope with additional convex inequality constraints. By incorporating a proximal term into the traditional Lagrangian function, the iMELa method approximately solves a convex optimization subproblem over the polyhedral set at each main iteration. Under the assumption of a local error bound condition for subsets of the feasible set defined by subsets of the constraints, we establish that the iMELa method can find an $\epsilon$-Karush-Kuhn-Tucker point with $\tilde O(\epsilon^{-2})$ gradient oracle complexity.
Abstract:Applications such as adversarially robust training and Wasserstein Distributionally Robust Optimization (WDRO) can be naturally formulated as min-sum-max optimization problems. While this formulation can be rewritten as an equivalent min-max problem, the summation of max terms introduces computational challenges, including increased complexity and memory demands, which must be addressed. These challenges are particularly evident in WDRO, where existing tractable algorithms often rely on restrictive assumptions on the objective function, limiting their applicability to state-of-the-art machine learning problems such as the training of deep neural networks. This study introduces a novel stochastic smoothing framework based on the log-sum-exp function, efficiently approximating the max operator in min-sum-max problems. By leveraging the Clarke regularity of the max operator, we develop an iterative smoothing algorithm that addresses these computational difficulties and guarantees almost sure convergence to a Clarke/directional stationary point. We further prove that the proposed algorithm finds an $\epsilon$-scaled Clarke stationary point of the original problem, with a worst-case iteration complexity of $\widetilde{O}(\epsilon^{-3})$. Our numerical experiments demonstrate that our approach outperforms or is competitive with state-of-the-art methods in solving the newsvendor problem, deep learning regression, and adversarially robust deep learning. The results highlight that our method yields more accurate and robust solutions in these challenging problem settings.
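The log-sum-exp smoothing of the max operator referred to in this abstract can be illustrated with a small sketch. The function name `smooth_max` and the smoothing parameter `mu` are notation chosen here, not taken from the paper, and the stochastic, iterative parts of the proposed framework are not shown.

```python
import math


def smooth_max(values, mu):
    """Log-sum-exp smoothing of max(values) with smoothing parameter mu > 0."""
    m = max(values)
    # Subtract the max before exponentiating for numerical stability.
    return m + mu * math.log(sum(math.exp((v - m) / mu) for v in values))
```

For n values, the smoothed quantity lies between max(values) and max(values) + mu * log(n), so shrinking `mu` tightens the approximation at the cost of a less smooth function.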
Abstract:We present a framework for generating universal semantic embeddings of chemical elements to advance materials inference and discovery. This framework leverages ElementBERT, a domain-specific BERT-based natural language processing model trained on 1.29 million abstracts of alloy-related scientific papers, to capture latent knowledge and contextual relationships specific to alloys. These semantic embeddings serve as robust elemental descriptors, consistently outperforming traditional empirical descriptors with significant improvements across multiple downstream tasks. These include predicting mechanical and transformation properties, classifying phase structures, and optimizing materials properties via Bayesian optimization. Applications to titanium alloys, high-entropy alloys, and shape memory alloys demonstrate up to 23% gains in prediction accuracy. Our results show that ElementBERT surpasses general-purpose BERT variants by encoding specialized alloy knowledge. By bridging contextual insights from scientific literature with quantitative inference, our framework accelerates the discovery and optimization of advanced materials, with potential applications extending beyond alloys to other material classes.
Abstract:Many real-world problems, such as those with fairness constraints, involve complex expectation constraints and large datasets, necessitating the design of efficient stochastic methods to solve them. Most existing research focuses on cases with no constraint, easy-to-project constraints, or deterministic constraints. In this paper, we consider nonconvex nonsmooth stochastic optimization problems with expectation constraints, for which we build a novel exact penalty model. We first show the relationship between the penalty model and the original problem. Then, to solve the penalty problem, we present a single-loop SPIDER-type stochastic subgradient method, which utilizes the subgradients of both the objective and constraint functions, as well as the constraint function value at each iteration. Under certain regularity conditions (weaker than Slater-type constraint qualification or strong feasibility assumed in existing works), we establish an iteration complexity result of $O(\epsilon^{-4})$ to reach a near-$\epsilon$ stationary point of the penalized problem in expectation, matching the lower bound for such tasks. Building on the exact penalization, an $(\epsilon,\epsilon)$-KKT point of the original problem is obtained. For a few scenarios, our complexity of either the objective sample subgradient or the constraint sample function values can be lower than the state-of-the-art results by a factor of $\epsilon^{-2}$. Moreover, on solving two fairness-constrained problems, our method is significantly (up to 466 times) faster than the state-of-the-art algorithms, including the switching subgradient method and inexact proximal point methods.
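To make the notion of exact penalization concrete, the toy sketch below uses the classical hinge-type penalty on a one-dimensional problem: minimize x^2 subject to 1 - x <= 0. This is a textbook illustration under assumptions chosen here, not the novel penalty model of the paper. The names `penalized` and `grid_argmin` and the grid search are hypothetical.

```python
def penalized(x, rho):
    """Objective x**2 plus rho times the violation of the constraint 1 - x <= 0."""
    return x * x + rho * max(0.0, 1.0 - x)


def grid_argmin(rho, lo=0.0, hi=2.0, steps=200):
    """Minimize the penalized objective over a uniform grid on [lo, hi]."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda x: penalized(x, rho))
```

The constrained minimizer is x = 1 with Lagrange multiplier 2. With a penalty parameter of at least 2 (for example `grid_argmin(3.0)`), the penalized minimizer coincides with the constrained one. With a smaller parameter (for example `grid_argmin(1.0)`), the minimizer drifts to the infeasible point x = 0.5.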