A basic problem in machine learning is to find a mapping $f$ from a low-dimensional latent space to a high-dimensional observation space. Equipped with the representation power of non-linearity, a learner can easily find a mapping that perfectly fits all the observations. However, such a mapping is often not considered good, because it is not simple enough and over-fits. How should simplicity be defined? This paper attempts a formal definition of the amount of information imposed by a non-linear mapping. The definition is based on information geometry and is independent of both the observations and any specific parametrization. We prove these basic properties and discuss relationships with parametric and non-parametric embeddings.