Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jianfeng Lu

Accelerate Langevin Sampling with Birth-Death process and Exploration Component

May 06, 2023

Lezhi Tan, Jianfeng Lu

Abstract:Sampling a probability distribution with known likelihood is a fundamental task in computational science and engineering. Aiming at multimodality, we propose a new sampling method that takes advantage of both birth-death process and exploration component. The main idea of this method is \textit{look before you leap}. We keep two sets of samplers, one at warmer temperature and one at original temperature. The former one serves as pioneer in exploring new modes and passing useful information to the other, while the latter one samples the target distribution after receiving the information. We derive a mean-field limit and show how the exploration process determines sampling efficiency. Moreover, we prove exponential asymptotic convergence under mild assumption. Finally, we test on experiments from previous literature and compared our methodology to previous ones.

* 23 pages, 10 figures

Via

Access Paper or Ask Questions

Convergence of stochastic gradient descent under a local Lajasiewicz condition for deep neural networks

Apr 18, 2023

Jing An, Jianfeng Lu

Abstract:We extend the global convergence result of Chatterjee \cite{chatterjee2022convergence} by considering the stochastic gradient descent (SGD) for non-convex objective functions. With minimal additional assumptions that can be realized by finitely wide neural networks, we prove that if we initialize inside a local region where the \L{}ajasiewicz condition holds, with a positive probability, the stochastic gradient iterates converge to a global minimum inside this region. A key component of our proof is to ensure that the whole trajectories of SGD stay inside the local region with a positive probability. For that, we assume the SGD noise scales with the objective function, which is called machine learning noise and achievable in many real examples. Furthermore, we provide a negative argument to show why using the boundedness of noise with Robbins-Monro type step sizes is not enough to keep the key component valid.

Via

Access Paper or Ask Questions

DeFeeNet: Consecutive 3D Human Motion Prediction with Deviation Feedback

Apr 14, 2023

Xiaoning Sun, Huaijiang Sun, Bin Li, Dong Wei, Weiqing Li, Jianfeng Lu

Abstract:Let us rethink the real-world scenarios that require human motion prediction techniques, such as human-robot collaboration. Current works simplify the task of predicting human motions into a one-off process of forecasting a short future sequence (usually no longer than 1 second) based on a historical observed one. However, such simplification may fail to meet practical needs due to the neglect of the fact that motion prediction in real applications is not an isolated ``observe then predict'' unit, but a consecutive process composed of many rounds of such unit, semi-overlapped along the entire sequence. As time goes on, the predicted part of previous round has its corresponding ground truth observable in the new round, but their deviation in-between is neither exploited nor able to be captured by existing isolated learning fashion. In this paper, we propose DeFeeNet, a simple yet effective network that can be added on existing one-off prediction models to realize deviation perception and feedback when applied to consecutive motion prediction task. At each prediction round, the deviation generated by previous unit is first encoded by our DeFeeNet, and then incorporated into the existing predictor to enable a deviation-aware prediction manner, which, for the first time, allows for information transmit across adjacent prediction units. We design two versions of DeFeeNet as MLP-based and GRU-based, respectively. On Human3.6M and more complicated BABEL, experimental results indicate that our proposed network improves consecutive human motion prediction performance regardless of the basic model.

* accepted by CVPR2023

Via

Access Paper or Ask Questions

Meta-Auxiliary Learning for Adaptive Human Pose Prediction

Apr 13, 2023

Qiongjie Cui, Huaijiang Sun, Jianfeng Lu, Bin Li, Weiqing Li

Figure 1 for Meta-Auxiliary Learning for Adaptive Human Pose Prediction

Figure 2 for Meta-Auxiliary Learning for Adaptive Human Pose Prediction

Figure 3 for Meta-Auxiliary Learning for Adaptive Human Pose Prediction

Figure 4 for Meta-Auxiliary Learning for Adaptive Human Pose Prediction

Abstract:Predicting high-fidelity future human poses, from a historically observed sequence, is decisive for intelligent robots to interact with humans. Deep end-to-end learning approaches, which typically train a generic pre-trained model on external datasets and then directly apply it to all test samples, emerge as the dominant solution to solve this issue. Despite encouraging progress, they remain non-optimal, as the unique properties (e.g., motion style, rhythm) of a specific sequence cannot be adapted. More generally, at test-time, once encountering unseen motion categories (out-of-distribution), the predicted poses tend to be unreliable. Motivated by this observation, we propose a novel test-time adaptation framework that leverages two self-supervised auxiliary tasks to help the primary forecasting network adapt to the test sequence. In the testing phase, our model can adjust the model parameters by several gradient updates to improve the generation quality. However, due to catastrophic forgetting, both auxiliary tasks typically tend to the low ability to automatically present the desired positive incentives for the final prediction performance. For this reason, we also propose a meta-auxiliary learning scheme for better adaptation. In terms of general setup, our approach obtains higher accuracy, and under two new experimental designs for out-of-distribution data (unseen subjects and categories), achieves significant improvements.

* 10 pages, 6 figures, AAAI 2023 accepted

Via

Access Paper or Ask Questions

Global Optimality of Elman-type RNN in the Mean-Field Regime

Mar 12, 2023

Andrea Agazzi, Jianfeng Lu, Sayan Mukherjee

Figure 1 for Global Optimality of Elman-type RNN in the Mean-Field Regime

Figure 2 for Global Optimality of Elman-type RNN in the Mean-Field Regime

Abstract:We analyze Elman-type Recurrent Reural Networks (RNNs) and their training in the mean-field regime. Specifically, we show convergence of gradient descent training dynamics of the RNN to the corresponding mean-field formulation in the large width limit. We also show that the fixed points of the limiting infinite-width dynamics are globally optimal, under some assumptions on the initialization of the weights. Our results establish optimality for feature-learning with wide RNNs in the mean-field regime

* 31 pages, 2 figures

Via

Access Paper or Ask Questions

A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee

Feb 11, 2023

Mo Zhou, Jianfeng Lu

Abstract:In this work, we consider the stochastic optimal control problem in continuous time and a policy gradient method to solve it. In particular, we study the gradient flow for the control, viewed as a continuous time limit of the policy gradient. We prove the global convergence of the gradient flow and establish a convergence rate under some regularity assumptions. The main novelty in the analysis is the notion of local optimal control function, which is introduced to compare the local optimality of the iterate.

* 54 pages

Via

Access Paper or Ask Questions

On Enhancing Expressive Power via Compositions of Single Fixed-Size ReLU Network

Jan 29, 2023

Shijun Zhang, Jianfeng Lu, Hongkai Zhao

Abstract:This paper studies the expressive power of deep neural networks from the perspective of function compositions. We show that repeated compositions of a single fixed-size ReLU network can produce super expressive power. In particular, we prove by construction that $\mathcal{L}_2\circ \boldsymbol{g}^{\circ r}\circ \boldsymbol{\mathcal{L}}_1$ can approximate $1$-Lipschitz continuous functions on $[0,1]^d$ with an error $\mathcal{O}(r^{-1/d})$, where $\boldsymbol{g}$ is realized by a fixed-size ReLU network, $\boldsymbol{\mathcal{L}}_1$ and $\mathcal{L}_2$ are two affine linear maps matching the dimensions, and $\boldsymbol{g}^{\circ r}$ means the $r$-times composition of $\boldsymbol{g}$. Furthermore, we extend such a result to generic continuous functions on $[0,1]^d$ with the approximation error characterized by the modulus of continuity. Our results reveal that a continuous-depth network generated via a dynamical system has good approximation power even if its dynamics function is time-independent and realized by a fixed-size ReLU network.

* arXiv admin note: text overlap with arXiv:2205.09459

Via

Access Paper or Ask Questions

A Structure-guided Effective and Temporal-lag Connectivity Network for Revealing Brain Disorder Mechanisms

Dec 01, 2022

Zhengwang Xia, Tao Zhou, Saqib Mamoon, Amani Alfakih, Jianfeng Lu

Figure 1 for A Structure-guided Effective and Temporal-lag Connectivity Network for Revealing Brain Disorder Mechanisms

Figure 2 for A Structure-guided Effective and Temporal-lag Connectivity Network for Revealing Brain Disorder Mechanisms

Figure 3 for A Structure-guided Effective and Temporal-lag Connectivity Network for Revealing Brain Disorder Mechanisms

Figure 4 for A Structure-guided Effective and Temporal-lag Connectivity Network for Revealing Brain Disorder Mechanisms

Abstract:Brain network provides important insights for the diagnosis of many brain disorders, and how to effectively model the brain structure has become one of the core issues in the domain of brain imaging analysis. Recently, various computational methods have been proposed to estimate the causal relationship (i.e., effective connectivity) between brain regions. Compared with traditional correlation-based methods, effective connectivity can provide the direction of information flow, which may provide additional information for the diagnosis of brain diseases. However, existing methods either ignore the fact that there is a temporal-lag in the information transmission across brain regions, or simply set the temporal-lag value between all brain regions to a fixed value. To overcome these issues, we design an effective temporal-lag neural network (termed ETLN) to simultaneously infer the causal relationships and the temporal-lag values between brain regions, which can be trained in an end-to-end manner. In addition, we also introduce three mechanisms to better guide the modeling of brain networks. The evaluation results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database demonstrate the effectiveness of the proposed method.

Via

Access Paper or Ask Questions

Regularized Stein Variational Gradient Flow

Nov 15, 2022

Ye He, Krishnakumar Balasubramanian, Bharath K. Sriperumbudur, Jianfeng Lu

Figure 1 for Regularized Stein Variational Gradient Flow

Figure 2 for Regularized Stein Variational Gradient Flow

Figure 3 for Regularized Stein Variational Gradient Flow

Abstract:The Stein Variational Gradient Descent (SVGD) algorithm is an deterministic particle method for sampling. However, a mean-field analysis reveals that the gradient flow corresponding to the SVGD algorithm (i.e., the Stein Variational Gradient Flow) only provides a constant-order approximation to the Wasserstein Gradient Flow corresponding to the KL-divergence minimization. In this work, we propose the Regularized Stein Variational Gradient Flow which interpolates between the Stein Variational Gradient Flow and the Wasserstein Gradient Flow. We establish various theoretical properties of the Regularized Stein Variational Gradient Flow (and its time-discretization) including convergence to equilibrium, existence and uniqueness of weak solutions, and stability of the solutions. We provide preliminary numerical evidence of the improved performance offered by the regularization.

Via

Access Paper or Ask Questions

Improved Analysis of Score-based Generative Modeling: User-Friendly Bounds under Minimal Smoothness Assumptions

Nov 03, 2022

Hongrui Chen, Holden Lee, Jianfeng Lu

Abstract:In this paper, we focus on the theoretical analysis of diffusion-based generative modeling. Under an $L^2$-accurate score estimator, we provide convergence guarantees with polynomial complexity for any data distribution with second-order moment, by either employing an early stopping technique or assuming smoothness condition on the score function of the data distribution. Our result does not rely on any log-concavity or functional inequality assumption and has a logarithmic dependence on the smoothness. In particular, we show that under only a finite second moment condition, approximating the following in KL divergence in $\epsilon$-accuracy can be done in $\tilde O\left(\frac{d^2 \log^2 (1/\delta)}{\epsilon^2}\right)$ steps: 1) the variance-$\delta$ Gaussian perturbation of any data distribution; 2) data distributions with $1/\delta$-smooth score functions. Our theoretical analysis also provides quantitative comparison between different discrete approximations and may guide the choice of discretization points in practice.

Via

Access Paper or Ask Questions