Esha Singh

Joy

The Inductive Bias of Convolutional Neural Networks: Locality and Weight Sharing Reshape Implicit Regularization

Mar 05, 2026

Tongtong Liang, Esha Singh, Rahul Parhi, Alexander Cloninger, Yu-Xiang Wang

Abstract: We study how architectural inductive bias reshapes the implicit regularization induced by the edge-of-stability phenomenon in gradient descent. Prior work has established that for fully connected networks, the strength of this regularization is governed solely by the global input geometry; consequently, it is insufficient to prevent overfitting on difficult distributions such as the high-dimensional sphere. In this paper, we show that locality and weight sharing fundamentally change this picture. Specifically, we prove that, provided the receptive field size $m$ remains small relative to the ambient dimension $d$, networks with locality and weight sharing generalize on spherical data at a rate of $n^{-\frac{1}{6} + O(m/d)}$, a regime where fully connected networks provably fail. This theoretical result confirms that weight sharing couples the learned filters to the low-dimensional patch manifold, thereby bypassing the high dimensionality of the ambient space. We further corroborate our theory by analyzing the patch geometry of natural images, showing that standard convolutional designs induce patch distributions that are highly amenable to this stability mechanism, thus providing a systematic explanation for the superior generalization of convolutional networks over fully connected baselines.

* Under Review. Comments welcome!
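
To make the two architectural ingredients named in the abstract concrete, here is a minimal sketch of locality and weight sharing on a one-dimensional input. It is not the paper's architecture or analysis: the helper names `extract_patches` and `shared_filter_layer`, the stride-1 sliding window, the ReLU activation, and the toy values of `x` and `w` are all assumptions chosen for illustration.

```python
def extract_patches(x, m):
    """Return all contiguous length-m windows of x (stride 1, an assumption)."""
    return [x[i:i + m] for i in range(len(x) - m + 1)]


def shared_filter_layer(x, w):
    """Apply one filter w to every patch of x, followed by a ReLU.

    Locality: each output depends on only m = len(w) of the d = len(x) inputs.
    Weight sharing: the same w is reused for every patch.
    """
    m = len(w)
    outputs = []
    for patch in extract_patches(x, m):
        pre_activation = sum(wi * pi for wi, pi in zip(w, patch))
        outputs.append(max(0.0, pre_activation))
    return outputs


# Toy input with ambient dimension d = 6 and receptive field size m = 3.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0, -1.0, 0.5]

patches = extract_patches(x, len(w))
out = shared_filter_layer(x, w)
print(len(patches), out)
```

In this sketch the filter has only `m` parameters and is fitted against `m`-dimensional patches rather than the full `d`-dimensional input, which is the sense in which the abstract describes the learned filters as coupled to the patch distribution.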

Divide and Learn: Multi-Objective Combinatorial Optimization at Scale

Feb 11, 2026

Esha Singh, Dongxia Wu, Chien-Yi Yang, Tajana Rosing, Rose Yu, Yi-An Ma

Abstract: Multi-objective combinatorial optimization seeks Pareto-optimal solutions over exponentially large discrete spaces, yet existing methods sacrifice generality, scalability, or theoretical guarantees. We reformulate it as an online learning problem over a decomposed decision space, solving position-wise bandit subproblems via adaptive expert-guided sequential construction. This formulation admits regret bounds of $O(d\sqrt{T \log T})$ that depend on the subproblem dimensionality $d$ rather than on the size of the combinatorial space. On standard benchmarks, our method reaches 80--98\% of the performance of specialized solvers while improving sample and computational efficiency by two to three orders of magnitude over Bayesian optimization methods. On real-world hardware-software co-design for AI accelerators with expensive simulations, we outperform competing methods under fixed evaluation budgets. The advantage grows with problem scale and objective count, establishing bandit optimization over decomposed decision spaces as a principled alternative to surrogate modeling or offline training for multi-objective optimization.

* Tech report. Code URL coming soon

Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes

Jun 10, 2024

Dan Qiao, Kaiqi Zhang, Esha Singh, Daniel Soudry, Yu-Xiang Wang

Abstract: We study the generalization of two-layer ReLU neural networks in a univariate nonparametric regression problem with noisy labels. This is a problem where kernels (\emph{e.g.} NTK) are provably sub-optimal and benign overfitting does not happen, thus disqualifying existing theory for interpolating (0-loss, globally optimal) solutions. We present a new theory of generalization for local minima that gradient descent with a constant learning rate can \emph{stably} converge to. We show that gradient descent with a fixed learning rate $\eta$ can only find local minima that represent smooth functions with a certain weighted \emph{first order total variation} bounded by $1/\eta - 1/2 + \widetilde{O}(\sigma + \sqrt{\mathrm{MSE}})$, where $\sigma$ is the label noise level, $\mathrm{MSE}$ is short for mean squared error against the ground truth, and $\widetilde{O}(\cdot)$ hides a logarithmic factor. Under mild assumptions, we also prove a nearly-optimal MSE bound of $\widetilde{O}(n^{-4/5})$ within the strict interior of the support of the $n$ data points. Our theoretical results are validated by extensive simulations demonstrating that training with a large learning rate induces sparse linear spline fits. To the best of our knowledge, we are the first to obtain a generalization bound via minima stability in the non-interpolation case and the first to show that ReLU NNs without regularization can achieve near-optimal rates in nonparametric regression.

* 51 pages
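
The notion of a minimum that gradient descent can "stably" converge to can be illustrated on a one-dimensional quadratic. This is only a sketch of the standard linear-stability fact for gradient descent (on $f(x) = \tfrac{1}{2}\lambda x^2$ the iterates contract exactly when $\lambda < 2/\eta$), not the paper's total-variation bound or its proof; the function name `run_gd` and the numeric values are assumptions for illustration.

```python
def run_gd(curvature, lr, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2.

    Each step multiplies x by (1 - lr * curvature), so the minimum at 0 is
    stable exactly when |1 - lr * curvature| < 1, i.e. curvature < 2 / lr.
    Returns |x| after `steps` iterations.
    """
    x = x0
    for _ in range(steps):
        gradient = curvature * x
        x -= lr * gradient
    return abs(x)


lr = 0.1
threshold = 2.0 / lr  # curvature above this value makes the minimum unstable

flat = run_gd(curvature=5.0, lr=lr)    # below the threshold: iterates shrink
sharp = run_gd(curvature=25.0, lr=lr)  # above the threshold: iterates blow up

print(f"threshold={threshold}, flat={flat:.3e}, sharp={sharp:.3e}")
```

A larger learning rate lowers the threshold `2 / lr`, so fewer (only flatter) minima remain stable. The abstract's bound expresses a related constraint for two-layer ReLU networks in terms of a weighted total variation of the learned function.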

A Conversational Agent System for Dietary Supplements Use

Apr 04, 2021

Esha Singh, Anu Bompelli, Ruyuan Wan, Jiang Bian, Serguei Pakhomov, Rui Zhang

Abstract: Dietary supplements (DS) are widely used by consumers, but information about their effectiveness and safety is scattered or incomplete, which makes it difficult for consumers to find what they need. Conversational agent systems have been applied in the healthcare domain, but despite the widespread use of dietary supplements, no such system exists to answer consumers' questions about DS use. In this study, we develop the first conversational agent system for DS use.

Social determinants of health in the era of artificial intelligence with electronic health records: A systematic review

Jan 22, 2021

Anusha Bompelli, Yanshan Wang, Ruyuan Wan, Esha Singh, Yuqi Zhou, Lin Xu, David Oniani, Bhavani Singh Agnikula Kshatriya, Joyce E. Balls-Berry (+1 more)

Abstract: There is growing evidence of the significant role that social determinants of health (SDOH) play in a wide variety of health outcomes. In the era of artificial intelligence (AI), electronic health records (EHRs) have been widely used to conduct observational studies. However, how to make the best use of SDOH information from EHRs has yet to be studied. In this paper, we systematically reviewed recently published papers and provided a methodology review of AI methods using the SDOH information in EHR data. A total of 1250 articles were retrieved from the literature between 2010 and 2020, and 74 papers were included in this review after abstract and full-text screening. We summarized these papers in terms of general characteristics (including publication years, venues, countries, etc.), SDOH types, disease areas, study outcomes, AI methods to extract SDOH from EHRs, and AI methods using SDOH for healthcare outcomes. Finally, we conclude this paper with a discussion of the current trends, challenges, and future directions in using SDOH from EHRs.

* 27 pages, 5 figures
