We introduce the Kernel-Elastic Autoencoder (KAE), a self-supervised generative model based on the transformer architecture with enhanced performance for molecular design. KAE is formulated on two novel loss functions: a modified maximum mean discrepancy (MMD) loss and a weighted reconstruction loss. KAE addresses the long-standing challenge of achieving valid generation and accurate reconstruction at the same time. KAE achieves remarkable diversity in molecule generation while maintaining near-perfect reconstruction on the independent test dataset, surpassing previous molecule-generating models. KAE supports conditional generation and decoding by beam search, resulting in state-of-the-art performance on constrained optimization tasks. Furthermore, KAE can generate molecules conditioned on favorable binding affinities in docking applications, as confirmed by AutoDock Vina and Glide scores, outperforming all existing candidates from the training dataset. Beyond molecular design, we anticipate that KAE could be applied to a wide range of problems that can be solved by generation.
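The abstract does not specify how KAE modifies the maximum mean discrepancy, so the following is only a minimal sketch of the standard (unmodified) biased estimator of squared MMD with a Gaussian kernel. This is the form commonly used in MMD-regularized autoencoders to compare encoded latent vectors against samples from a prior. The kernel bandwidth `sigma`, the helper names, and the toy data are illustrative assumptions, not KAE's actual loss.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(a: List[Vector], b: List[Vector], sigma: float) -> float:
    """Average of k(a_i, b_j) over all pairs (i, j), including i == j."""
    total = sum(rbf_kernel(ai, bj, sigma) for ai in a for bj in b)
    return total / (len(a) * len(b))


def mmd2(x: List[Vector], y: List[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between sample sets x and y (always >= 0)."""
    return (mean_kernel(x, x, sigma) + mean_kernel(y, y, sigma)
            - 2.0 * mean_kernel(x, y, sigma))


if __name__ == "__main__":
    random.seed(0)
    dim = 4
    # Hypothetical encoder outputs (shifted) vs. samples from a standard normal prior.
    latents = [[random.gauss(0.5, 1.0) for _ in range(dim)] for _ in range(64)]
    prior = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(64)]
    print(f"MMD^2(latents, prior) = {mmd2(latents, prior):.4f}")
```

With identical sample sets the estimate is zero, and it grows as the two sets separate. In an autoencoder, a term like this would typically be combined with a reconstruction loss; how KAE weights or modifies either term is not described in the abstract.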
Precise physical descriptions of molecules can be obtained by solving the Schrödinger equation; however, these calculations are intractable, and even approximations can be cumbersome. Force fields, which estimate interatomic potentials from empirical data, are also time-consuming. This paper proposes a new methodology for modeling a set of physical parameters that takes advantage of the fast learning capacity and representational power of restricted Boltzmann machines (RBMs). By training an RBM on ab initio data, we can generate new molecular configurations from a distribution that matches the ab initio distribution. We also introduce a new RBM based on the tanh activation function and compare RBMs with different activation functions, including sigmoid, Gaussian, and (Leaky) ReLU. Finally, we demonstrate the ability of Gaussian RBMs to model small molecules such as water and ethane.
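As a rough illustration of the Gaussian RBMs mentioned above, the sketch below implements one common Gaussian-Bernoulli parameterization (real-valued visible units with per-unit standard deviation `sigma`, binary hidden units) together with a single block-Gibbs sampling step. It is not the paper's model: the class name, the energy parameterization, and the toy parameters are assumptions, and training (for example by contrastive divergence) is omitted.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function used for binary hidden units."""
    return 1.0 / (1.0 + math.exp(-z))


class GaussianBernoulliRBM:
    """Hypothetical minimal RBM: real-valued visible v, binary hidden h in {0, 1}.

    Energy (one common parameterization, assumed here):
        E(v, h) = sum_i (v_i - a_i)^2 / (2 sigma_i^2)
                  - sum_j b_j h_j - sum_{i,j} (v_i / sigma_i) W[i][j] h_j
    """

    def __init__(self, a: Sequence[float], b: Sequence[float],
                 W: List[List[float]], sigma: Sequence[float]) -> None:
        self.a = list(a)          # visible biases
        self.b = list(b)          # hidden biases
        self.W = W                # W[i][j]: weight between visible i and hidden j
        self.sigma = list(sigma)  # per-visible-unit standard deviations

    def energy(self, v: Sequence[float], h: Sequence[float]) -> float:
        quad = sum((vi - ai) ** 2 / (2.0 * si ** 2)
                   for vi, ai, si in zip(v, self.a, self.sigma))
        bias = sum(bj * hj for bj, hj in zip(self.b, h))
        inter = sum(v[i] / self.sigma[i] * self.W[i][j] * h[j]
                    for i in range(len(v)) for j in range(len(h)))
        return quad - bias - inter

    def p_h_given_v(self, v: Sequence[float]) -> List[float]:
        """p(h_j = 1 | v) = sigmoid(b_j + sum_i W[i][j] v_i / sigma_i)."""
        return [sigmoid(self.b[j] + sum(self.W[i][j] * v[i] / self.sigma[i]
                                        for i in range(len(v))))
                for j in range(len(self.b))]

    def mean_v_given_h(self, h: Sequence[float]) -> List[float]:
        """v_i | h ~ N(a_i + sigma_i * sum_j W[i][j] h_j, sigma_i^2); returns the means."""
        return [self.a[i] + self.sigma[i] * sum(self.W[i][j] * h[j]
                                                for j in range(len(h)))
                for i in range(len(self.a))]

    def gibbs_step(self, v: Sequence[float], rng: random.Random) -> List[float]:
        """One block-Gibbs step v -> h -> v' (sample binary h, then Gaussian v')."""
        h = [1.0 if rng.random() < p else 0.0 for p in self.p_h_given_v(v)]
        means = self.mean_v_given_h(h)
        return [rng.gauss(m, s) for m, s in zip(means, self.sigma)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy parameters: 3 visible units, 2 hidden units (illustrative values only).
    rbm = GaussianBernoulliRBM(a=[1.0, 1.0, 1.8], b=[0.0, -0.5],
                               W=[[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2]],
                               sigma=[0.05, 0.05, 0.1])
    v = list(rbm.a)
    for _ in range(100):
        v = rbm.gibbs_step(v, rng)
    print("sample after 100 Gibbs steps:", [round(x, 3) for x in v])
```

The hidden-unit nonlinearity is where activation variants enter. For example, if hidden units take values in {-1, +1} instead of {0, 1}, the conditional mean E[h_j | v] becomes the tanh of the same pre-activation; whether this matches the paper's tanh RBM is not stated in the abstract.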