Tatsuya Sagawa

How Well Do Large-Scale Chemical Language Models Transfer to Downstream Tasks?

Feb 17, 2026

Tatsuya Sagawa, Ryosuke Kojima

Abstract: Chemical Language Models (CLMs) pre-trained on large-scale molecular data are widely used for molecular property prediction. However, the common belief that increasing training resources such as model size, dataset size, and training compute improves both pretraining loss and downstream task performance has not been systematically validated in the chemical domain. In this work, we evaluate this assumption by pretraining CLMs while scaling training resources and measuring transfer performance across diverse molecular property prediction (MPP) tasks. We find that while pretraining loss consistently decreases with increased training resources, downstream task performance shows limited improvement. Moreover, alternative metrics based on the Hessian or loss landscape also fail to estimate downstream performance in CLMs. We further identify conditions under which downstream performance saturates or degrades despite continued improvements in pretraining metrics, and analyze the underlying task-dependent failure modes through parameter-space visualizations. These results expose a gap between pretraining-based evaluation and downstream performance, and emphasize the need for model selection and evaluation strategies that explicitly account for downstream task characteristics.

ReactionT5: a large-scale pre-trained model towards application of limited reaction data

Nov 12, 2023

Tatsuya Sagawa, Ryosuke Kojima

Abstract: Transformer-based deep neural networks have revolutionized molecule-related prediction tasks by treating molecules as symbolic sequences. These models have been successfully applied in various organic chemistry applications by pretraining them on extensive compound libraries and subsequently fine-tuning them on smaller in-house datasets for specific tasks. However, many conventional methods focus primarily on single molecules, with limited exploration of pretraining for reactions involving multiple molecules. In this paper, we propose ReactionT5, a novel model that leverages pretraining on the Open Reaction Database (ORD), a publicly available large-scale resource. We further fine-tune this model for yield prediction and product prediction tasks, demonstrating its impressive performance even with limited fine-tuning data compared to traditional models. The pre-trained ReactionT5 model is publicly accessible on the Hugging Face platform.
