Abstract:Brenier proved that under certain conditions on a source and a target probability measure there exists a strictly convex function such that its gradient is a transport map from the source to the target distribution. This function is called the Brenier potential. Furthermore, detailed information on the H\"older regularity of the Brenier potential is available. In this work we develop the statistical learning theory of generative adversarial neural networks that learn the Brenier potential. As by the transformation of densities formula, the density of the generated measure depends on the second derivative of the Brenier potential, we develop the universal approximation theory of ReCU networks with cubic activation $\mathtt{ReCU}(x)=\max\{0,x\}^3$ that combines the favorable approximation properties of H\"older functions with a Lipschitz continuous density. In order to assure the convexity of such general networks, we introduce an adversarial training procedure for a potential function represented by the ReCU networks that combines the classical discriminator cross entropy loss with a penalty term that enforces (strict) convexity. We give a detailed decomposition of learning errors and show that for a suitable high penalty parameter all networks chosen in the adversarial min-max optimization problem are strictly convex. This is further exploited to prove the consistency of the learning procedure for (slowly) expanding network capacity. We also implement the described learning algorithm and apply it to a number of standard test cases from Gaussian mixture to image data as target distributions. As predicted in theory, we observe that the convexity loss becomes inactive during the training process and the potentials represented by the neural networks have learned convexity.
Abstract:Recently, several deep learning (DL) methods for approximating high-dimensional partial differential equations (PDEs) have been proposed. The interest that these methods have generated in the literature is in large part due to simulations which appear to demonstrate that such DL methods have the capacity to overcome the curse of dimensionality (COD) for PDEs in the sense that the number of computational operations they require to achieve a certain approximation accuracy $\varepsilon\in(0,\infty)$ grows at most polynomially in the PDE dimension $d\in\mathbb N$ and the reciprocal of $\varepsilon$. While there is thus far no mathematical result that proves that one of such methods is indeed capable of overcoming the COD, there are now a number of rigorous results in the literature that show that deep neural networks (DNNs) have the expressive power to approximate PDE solutions without the COD in the sense that the number of parameters used to describe the approximating DNN grows at most polynomially in both the PDE dimension $d\in\mathbb N$ and the reciprocal of the approximation accuracy $\varepsilon>0$. Roughly speaking, in the literature it is has been proved for every $T>0$ that solutions $u_d\colon [0,T]\times\mathbb R^d\to \mathbb R$, $d\in\mathbb N$, of semilinear heat PDEs with Lipschitz continuous nonlinearities can be approximated by DNNs with ReLU activation at the terminal time in the $L^2$-sense without the COD provided that the initial value functions $\mathbb R^d\ni x\mapsto u_d(0,x)\in\mathbb R$, $d\in\mathbb N$, can be approximated by ReLU DNNs without the COD. It is the key contribution of this work to generalize this result by establishing this statement in the $L^p$-sense with $p\in(0,\infty)$ and by allowing the activation function to be more general covering the ReLU, the leaky ReLU, and the softplus activation functions as special cases.