In this paper we first analyzed the inductive bias underlying the data scattered across complex free energy landscapes (FEL), and exploited it to train deep neural networks which yield reduced and clustered representation for the FEL. Our parametric method, called Information Distilling of Metastability (IDM), is end-to-end differentiable thus scalable to ultra-large dataset. IDM is also a clustering algorithm and is able to cluster the samples in the meantime of reducing the dimensions. Besides, as an unsupervised learning method, IDM differs from many existing dimensionality reduction and clustering methods in that it neither requires a cherry-picked distance metric nor the ground-true number of clusters, and that it can be used to unroll and zoom-in the hierarchical FEL with respect to different timescales. Through multiple experiments, we show that IDM can achieve physically meaningful representations which partition the FEL into well-defined metastable states hence are amenable for downstream tasks such as mechanism analysis and kinetic modeling.