Whereas recovery of the manifold from data is a well-studied topic, approximation rates for functions defined on manifolds are less well understood. In this work, we study a regression problem with inputs on a $d^*$-dimensional manifold that is embedded into a space with potentially much larger ambient dimension. It is shown that sparsely connected deep ReLU networks can approximate a H\"older function with smoothness index $\beta$ up to error $\epsilon$ using on the order of $\epsilon^{-d^*/\beta}\log(1/\epsilon)$ non-zero network parameters. As an application, we derive statistical convergence rates for the estimator minimizing the empirical risk over all possible choices of bounded network parameters.
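
To give a feeling for the size of the parameter bound, the following purely illustrative calculation plugs in hypothetical values $\epsilon = 10^{-2}$, $\beta = 2$, $d^* = 2$ (these numbers are assumptions chosen for the example, the logarithm is taken to be natural, and all constants hidden in the order notation are ignored):
\[
\epsilon^{-d^*/\beta}\log(1/\epsilon) \;=\; (10^{-2})^{-2/2}\,\log(10^{2}) \;=\; 100 \cdot \log(100) \;\approx\; 100 \cdot 4.6 \;=\; 460 .
\]
For comparison, if the exponent involved a hypothetical ambient dimension $D = 100$ in place of $d^*$, the same substitution would give $\epsilon^{-D/\beta} = (10^{-2})^{-50} = 10^{100}$, which indicates why a bound depending only on the manifold dimension $d^*$ can be far smaller.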