Fanghui Liu

LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization

Jun 03, 2026

Yuanhe Zhang, Yuekai Sun, Taiji Suzuki, Jason D. Lee, Fanghui Liu

Abstract: Long-horizon autoformalization of research mathematics fails not only at hard lemmas, but at scale: statements drift, dependencies tangle, context decays, and local repairs corrupt distant work. We present LeanMarathon, a multi-agent harness for reliable research-level Lean autoformalization. Its core abstraction is an evolving blueprint: a Lean file that serves simultaneously as formal proof skeleton, natural-language proof graph, and shared system of record. Four contract-scoped agents construct, audit, prove, and repair this blueprint. These agents are coordinated by a two-stage orchestrator that first stabilizes target fidelity through adversarial review and then discharges the proof directed acyclic graph (DAG) from its dynamic leaves upward in parallel CI-gated rounds. LeanMarathon turns one brittle multi-hour run into many local, recoverable, parallel transactions. We evaluate LeanMarathon on two recent research papers spanning four Erdős problems (#1051, #1196, #164, #1217). Across three autonomous runs, it formalizes all seven target theorems with no sorry, proving 258 lemmas and theorems. These results show that reliable AI co-mathematics requires not only stronger provers, but durable harnesses that preserve target fidelity across long mathematical developments. The code can be found at https://github.com/YuanheZ/LeanMarathon.

* 26 pages, 9 figures. Comments are welcome
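
The idea of discharging a proof DAG "from its leaves upward in parallel rounds" can be illustrated with a small scheduling sketch. This is not the LeanMarathon orchestrator; it only groups lemmas into rounds so that each round depends solely on lemmas proved in earlier rounds. The function name `proof_rounds` and the lemma names are hypothetical.

```python
def proof_rounds(deps):
    """Group lemmas into rounds; each round depends only on earlier rounds.

    `deps` maps a lemma name to the set of lemma names it relies on.
    """
    remaining = {name: set(d) for name, d in deps.items()}
    proved = set()
    rounds = []
    while remaining:
        # Current leaves: lemmas whose dependencies are all proved already.
        ready = sorted(n for n, d in remaining.items() if d <= proved)
        if not ready:
            raise ValueError("dependency cycle or missing lemma")
        rounds.append(ready)
        proved.update(ready)
        for n in ready:
            del remaining[n]
    return rounds


# Hypothetical blueprint; the names are illustrative only.
blueprint = {
    "main_theorem": {"lemma_a", "lemma_b"},
    "lemma_a": {"lemma_c"},
    "lemma_b": set(),
    "lemma_c": set(),
}
print(proof_rounds(blueprint))
# [['lemma_b', 'lemma_c'], ['lemma_a'], ['main_theorem']]
```

Lemmas within one round have no dependencies on each other, so in principle they could be attempted in parallel; a real harness would also need to handle failed proofs and blueprint changes between rounds.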

Shallow ReLU$^s$ Networks in $L^p$-Type and Sobolev Spaces: Approximation and Path-Norm Controlled Generalization

May 21, 2026

Weizhao Li, Fanghui Liu, Lei Shi

Abstract: We study approximation by shallow ReLU$^s$ networks, $\sigma_s(t)=\max\{0,t\}^s$, and the generalization behavior of such networks under $\ell_1$ path-norm control. For the $L^p$-type integral spaces $\widetilde{\mathcal{F}}_{p,\tau_d,s}$, $1\le p\le2$, we establish approximation bounds for shallow networks using spherical harmonic analysis. In particular, when the parameter measure is the uniform measure $\tau_d$ and $p<p^*=(2d+2)/(d+3)$, we obtain the rate $O(m^{-1/2-d(2-p)/(2d(2-p)+2p(2s+d+1))}\log^{3/2}m)$, which improves the corresponding random-feature rate. We also derive approximation rates for Sobolev spaces $W^{\alpha,p}$ in the range $1\le p<2$ by embedding them into spectral Barron spaces. Finally, for nonparametric regression with sub-Gaussian noise, we prove minimax-optimal generalization bounds for path-norm-regularized shallow ReLU$^s$ networks over Barron and Sobolev spaces, with matching lower bounds up to logarithmic factors.

* 42 pages, 1 figure. Authors are listed in alphabetical order and contributed equally
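
As a concrete reading of the objects named in the abstract, the following minimal sketch evaluates the ReLU$^s$ activation and a shallow network built from it. The path-norm formula used here, sum over neurons of |a_k| (||w_k||_1 + |b_k|)^s, is an assumption chosen for illustration; the paper's exact definition may differ. All function names are hypothetical.

```python
def relu_s(t, s):
    """ReLU^s activation: max(0, t) ** s, for an integer power s >= 1."""
    return max(0.0, t) ** s


def shallow_net(x, outer, inner, biases, s):
    """Evaluate f(x) = sum_k a_k * relu_s(<w_k, x> + b_k, s)."""
    total = 0.0
    for a, w, b in zip(outer, inner, biases):
        pre = sum(wi * xi for wi, xi in zip(w, x)) + b
        total += a * relu_s(pre, s)
    return total


def path_norm(outer, inner, biases, s):
    """Assumed l1 path norm: sum_k |a_k| * (||w_k||_1 + |b_k|) ** s."""
    return sum(
        abs(a) * (sum(abs(wi) for wi in w) + abs(b)) ** s
        for a, w, b in zip(outer, inner, biases)
    )


# Two neurons in dimension two, with s = 2.
outer = [1.0, -2.0]
inner = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, -1.0]
print(shallow_net([2.0, 3.0], outer, inner, biases, 2))  # -4.0
print(path_norm(outer, inner, biases, 2))                # 9.0
```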

Statistical Learning Theory in Lean 4: Empirical Processes from Scratch

Feb 02, 2026

Yuanhe Zhang, Jason D. Lee, Fanghui Liu

Abstract: We present the first comprehensive Lean 4 formalization of statistical learning theory (SLT) grounded in empirical process theory. Our end-to-end formal infrastructure implements content missing from the latest Lean 4 Mathlib library, including a complete development of Gaussian Lipschitz concentration, the first formalization of Dudley's entropy integral theorem for sub-Gaussian processes, and an application to least-squares (sparse) regression with a sharp rate. The project was carried out using a human-AI collaborative workflow, in which humans design proof strategies and AI agents execute tactical proof construction, leading to a human-verified Lean 4 toolbox for SLT. Beyond implementation, the formalization process exposes and resolves implicit assumptions and missing details in standard SLT textbooks, enforcing a granular, line-by-line understanding of the theory. This work establishes a reusable formal foundation and opens the door for future developments in machine learning theory. The code is available at https://github.com/YuanheZ/lean-stat-learning-theory

* 19 pages, 2 figures. Comments are welcome

Re-examining Double Descent and Scaling Laws under Norm-based Capacity via Deterministic Equivalence

Feb 03, 2025

Yichen Wang, Yudong Chen, Lorenzo Rosasco, Fanghui Liu

Abstract: We investigate double descent and scaling laws in terms of weights rather than the number of parameters. Specifically, we analyze linear and random features models using the deterministic equivalence approach from random matrix theory. We precisely characterize how the norm of the weights concentrates around deterministic quantities and elucidate the relationship between the expected test error and the norm-based capacity (complexity). Our results rigorously answer whether double descent exists under norm-based capacity and reshape the corresponding scaling laws. Moreover, they prompt a rethinking of the data-parameter paradigm, from under-parameterized to over-parameterized regimes, by shifting the focus to norms (weights) rather than parameter count.

* 71 pages

One-step full gradient suffices for low-rank fine-tuning, provably and efficiently

Feb 03, 2025

Yuanhe Zhang, Fanghui Liu, Yudong Chen

Abstract: This paper studies how to improve the performance of Low-Rank Adaptation (LoRA) as guided by our theoretical analysis. Our first set of theoretical results shows that, for random initialization and linear models, (i) LoRA aligns with a certain singular subspace of the one-step gradient of full fine-tuning; (ii) preconditioners improve convergence in the high-rank case. These insights motivate us to focus on preconditioned LoRA using a specific spectral initialization strategy for aligning with certain subspaces. For both linear and nonlinear models, we prove that alignment and generalization guarantees can be directly achieved at initialization, and that subsequent linear convergence can also be established. Our analysis leads to the LoRA-One algorithm (using One-step gradient and preconditioning), a theoretically grounded algorithm that achieves significant empirical improvement over vanilla LoRA and its variants on several benchmarks. Our theoretical analysis, based on decoupling the learning dynamics and characterizing how spectral initialization contributes to feature learning, may be of independent interest for understanding matrix sensing and deep learning theory. The source code can be found at https://github.com/YuanheZ/LoRA-One.

* 86 pages

Scalable Learned Model Soup on a Single GPU: An Efficient Subspace Training Strategy

Jul 04, 2024

Tao Li, Weisen Jiang, Fanghui Liu, Xiaolin Huang, James T. Kwok

Abstract: Pre-training followed by fine-tuning is widely adopted among practitioners. The performance can be improved by "model soups" (Wortsman et al., 2022) via exploring various hyperparameter configurations. The Learned-Soup, a variant of model soups, significantly improves the performance but suffers from substantial memory and time costs due to the requirements of (i) having to load all fine-tuned models simultaneously, and (ii) a large computational graph encompassing all fine-tuned models. In this paper, we propose Memory Efficient Hyperplane Learned Soup (MEHL-Soup) to tackle this issue by formulating the learned soup as a hyperplane optimization problem and introducing block coordinate gradient descent to learn the mixing coefficients. At each iteration, MEHL-Soup only needs to load a few fine-tuned models and build a computational graph with one combined model. We further extend MEHL-Soup to MEHL-Soup+ in a layer-wise manner. Experimental results on various ViT models and data sets show that MEHL-Soup(+) outperforms Learned-Soup(+) in terms of test accuracy, and also reduces memory usage by more than $13\times$. Moreover, MEHL-Soup(+) can be run on a single GPU and achieves a $9\times$ speedup in soup construction compared with the Learned-Soup. The code is released at https://github.com/nblt/MEHL-Soup.

* ECCV 2024
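
To make the notion of "mixing coefficients" concrete, here is a minimal sketch of how a soup combines fine-tuned models once the coefficients are given. It is not the MEHL-Soup algorithm (no hyperplane formulation and no block coordinate gradient descent); it only shows the weighted parameter average that such coefficients define. The function name `soup` and the toy parameter dictionaries are hypothetical.

```python
def soup(models, coeffs):
    """Combine parameter dicts as sum_i coeffs[i] * models[i], entry by entry."""
    if len(models) != len(coeffs):
        raise ValueError("need exactly one coefficient per model")
    return {
        name: [
            sum(c * m[name][j] for c, m in zip(coeffs, models))
            for j in range(len(values))
        ]
        for name, values in models[0].items()
    }


# Two toy "fine-tuned models", each with a single flat parameter vector.
model_a = {"w": [1.0, 2.0]}
model_b = {"w": [3.0, 6.0]}
print(soup([model_a, model_b], [0.5, 0.5]))  # {'w': [2.0, 4.0]}
```

With equal coefficients this reduces to a uniform average; a learned soup would instead choose the coefficients by optimizing a validation objective.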

Benign overfitting in Fixed Dimension via Physics-Informed Learning with Smooth Inductive Bias

Jun 16, 2024

Honam Wong, Wendao Wu, Fanghui Liu, Yiping Lu

Abstract: Recent advances in machine learning have inspired a surge of research into reconstructing specific quantities of interest from measurements that comply with certain physical laws. These efforts focus on inverse problems that are governed by partial differential equations (PDEs). In this work, we develop an asymptotic Sobolev norm learning curve for kernel ridge(less) regression when addressing (elliptical) linear inverse problems. Our results show that the PDE operators in the inverse problem can stabilize the variance and even lead to benign overfitting for fixed-dimensional problems, exhibiting different behaviors from regression problems. In addition, our investigation demonstrates the impact of various inductive biases introduced by minimizing different Sobolev norms as a form of implicit regularization. For the regularized least squares estimator, we find that all considered inductive biases can achieve the optimal convergence rate, provided the regularization parameter is appropriately chosen. The convergence rate is in fact independent of the choice of (smooth enough) inductive bias for both ridge and ridgeless regression. Surprisingly, our smoothness requirement recovers the condition found in the Bayesian setting and extends the conclusion to minimum norm interpolation estimators.

High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization

Jun 05, 2024

Yihang Chen, Fanghui Liu, Taiji Suzuki, Volkan Cevher

Abstract: This paper studies kernel ridge regression in high dimensions under covariate shifts and analyzes the role of importance re-weighting. We first derive the asymptotic expansion of high dimensional kernels under covariate shifts. By a bias-variance decomposition, we theoretically demonstrate that the re-weighting strategy allows for decreasing the variance. For the bias, we analyze regularization at an arbitrary or well-chosen scale, showing that the bias can behave very differently under different regularization scales. In our analysis, the bias and variance can be characterized by the spectral decay of a data-dependent regularized kernel: the original kernel matrix associated with an additional re-weighting matrix. The re-weighting strategy can thus be regarded as a data-dependent regularization. In addition, our analysis provides an asymptotic expansion of kernel functions/vectors under covariate shift, which is of independent interest.

* ICML 2024

Revisiting character-level adversarial attacks

May 07, 2024

Elias Abad Rocamora, Yongtao Wu, Fanghui Liu, Grigorios G. Chrysos, Volkan Cevher

Abstract: Adversarial attacks in Natural Language Processing apply perturbations at the character or token level. Token-level attacks, gaining prominence for their use of gradient-based methods, are susceptible to altering sentence semantics, leading to invalid adversarial examples. While character-level attacks easily maintain semantics, they have received less attention as they cannot easily adopt popular gradient-based methods, and are thought to be easy to defend against. Challenging these beliefs, we introduce Charmer, an efficient query-based adversarial attack capable of achieving a high attack success rate (ASR) while generating highly similar adversarial examples. Our method successfully targets both small (BERT) and large (Llama 2) models. Specifically, on BERT with SST-2, Charmer improves the ASR by 4.84 percentage points and the USE similarity by 8 percentage points with respect to prior work. Our implementation is available at https://github.com/LIONS-EPFL/Charmer.

* Accepted in ICML 2024

Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks

Apr 29, 2024

Fanghui Liu, Leello Dadi, Volkan Cevher

Abstract: Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks, as the curse of dimensionality (CoD) cannot be evaded when trying to approximate even a single ReLU neuron (Bach, 2017). In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms (e.g., the path norm, the Barron norm) from the perspective of sample complexity and generalization properties. First, we show that the path norm (as well as the Barron norm) is able to obtain width-independent sample complexity bounds, which allows for uniform convergence guarantees. Based on this result, we derive an improved metric entropy result for $\epsilon$-covering of up to $\mathcal{O}(\epsilon^{-\frac{2d}{d+2}})$ ($d$ is the input dimension and the constant depends at most polynomially on $d$) via the convex hull technique, which demonstrates a separation from kernel methods with $\Omega(\epsilon^{-d})$ when learning a target function in a Barron space. Second, this metric entropy result allows for building a sharper generalization bound under a general moment hypothesis setting, achieving a rate of $\mathcal{O}(n^{-\frac{d+2}{2d+2}})$. Our analysis is novel in that it offers a sharper and refined estimation for metric entropy (with a clear dependence on the dimension $d$) and handles unbounded sampling in the estimation of the sample error and the output error.

* Accepted by JMLR
