Molecule generation using deep learning has recently been actively investigated in drug discovery. In this field, the Transformer and the variational autoencoder (VAE) are both widely used as powerful models, but they are rarely combined because of structural and performance mismatches between them. This study proposes a model that combines the two through structural and parameter optimization so that it can handle diverse molecules. The proposed model performs comparably to existing models in generating molecules and far outperforms them in generating molecules with unseen structures. In addition, the proposed model successfully predicted molecular properties using the latent representation of the VAE. Ablation studies suggested that the VAE has an advantage over other generative models, such as language models, in generating novel molecules, and that molecules can be described by latent variables of about 32 dimensions, far fewer than the dimensions used by existing descriptors and models. This study is expected to provide a virtual chemical library containing a wide variety of compounds for virtual screening and to enable efficient screening.