Gufeng Yu

Reaction Prediction via Interaction Modeling of Symmetric Difference Shingle Sets

Nov 16, 2025

Runhan Shi, Letian Chen, Gufeng Yu, Yang Yang

Abstract: Chemical reaction prediction remains a fundamental challenge in organic chemistry, where existing machine learning models face two critical limitations: sensitivity to input permutations (molecule/atom orderings) and inadequate modeling of the substructural interactions that govern reactivity. These shortcomings lead to inconsistent predictions and poor generalization to real-world scenarios. To address these challenges, we propose ReaDISH, a novel reaction prediction model that learns permutation-invariant representations while incorporating interaction-aware features. It introduces two innovations: (1) symmetric difference shingle encoding, which extends the differential reaction fingerprint (DRFP) by representing shingles as continuous high-dimensional embeddings, capturing structural changes while eliminating order sensitivity; and (2) geometry-structure interaction attention, a mechanism that models intra- and inter-molecular interactions at the shingle level. Extensive experiments demonstrate that ReaDISH improves reaction prediction performance across diverse benchmarks. It shows enhanced robustness, with an average improvement of 8.76% in R² under permutation perturbations.
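The symmetric-difference idea that ReaDISH builds on can be illustrated with a small sketch. It assumes each molecule has already been reduced to a list of substructure strings ("shingles"); the toy letters below are placeholders, not real DRFP shingles, and every function name is hypothetical. Shingles are pooled into sets per reaction side, the symmetric difference keeps only what changes, and the result is either hashed into a folded bit vector (DRFP-style) or mapped to continuous vectors and sum-pooled (a stand-in for learned embeddings, not the ReaDISH encoder). Because sets and sums ignore order, shuffling molecules or shingles does not change the output.

```python
import hashlib
import random


def shingle_set(molecules):
    """Union of shingles over all molecules on one reaction side (order-free)."""
    shingles = set()
    for mol in molecules:
        shingles.update(mol)
    return shingles


def reaction_sym_diff(reactants, products):
    """Shingles present on exactly one side: the structural change."""
    return shingle_set(reactants) ^ shingle_set(products)


def stable_hash(shingle):
    """Deterministic 64-bit hash (Python's built-in hash() is salted per run)."""
    digest = hashlib.sha256(shingle.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def folded_fingerprint(shingles, n_bits=64):
    """DRFP-style binary fingerprint: fold each shingle hash into n_bits slots."""
    bits = [0] * n_bits
    for s in shingles:
        bits[stable_hash(s) % n_bits] = 1
    return bits


def shingle_embedding(shingle, dim=8):
    """Hypothetical fixed random vector per shingle (a learned table in practice)."""
    rng = random.Random(stable_hash(shingle))
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def pooled_embedding(shingles, dim=8):
    """Sum-pool shingle vectors; sorting makes float summation order fixed."""
    pooled = [0.0] * dim
    for s in sorted(shingles):
        for i, x in enumerate(shingle_embedding(s, dim)):
            pooled[i] += x
    return pooled


# Toy reaction: shingle "c" is consumed and "e" is formed.
reactants = [["a", "b", "c"], ["d"]]
products = [["a", "b", "e"], ["d"]]
diff = reaction_sym_diff(reactants, products)
# Same reaction with molecule and shingle order permuted.
shuffled = reaction_sym_diff([["d"], ["c", "b", "a"]], products)

print(sorted(diff))  # ['c', 'e']
print(folded_fingerprint(diff) == folded_fingerprint(shuffled))  # True
print(pooled_embedding(diff) == pooled_embedding(shuffled))  # True
```

Shingles shared by both sides (here "a", "b", "d") cancel out, which is why the encoding concentrates on the reaction center rather than on unchanged scaffolds.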

RTMol: Rethinking Molecule-text Alignment in a Round-trip View

Nov 15, 2025

Letian Chen, Runhan Shi, Gufeng Yu, Yang Yang

Abstract: Aligning molecular sequence representations (e.g., SMILES notations) with textual descriptions is critical for applications spanning drug discovery, materials design, and automated chemical literature analysis. Existing methodologies typically treat molecular captioning (molecule-to-text) and text-based molecular design (text-to-molecule) as separate tasks, relying on supervised fine-tuning or contrastive learning pipelines. These approaches face three key limitations: (i) conventional metrics such as BLEU prioritize linguistic fluency over chemical accuracy, (ii) training datasets frequently contain chemically ambiguous narratives with incomplete specifications, and (iii) optimizing each generation direction independently leads to bidirectional inconsistency. To address these issues, we propose RTMol, a bidirectional alignment framework that unifies molecular captioning and text-to-SMILES generation through self-supervised round-trip learning. The framework introduces novel round-trip evaluation metrics and enables unsupervised training for molecular captioning without requiring paired molecule-text corpora. Experiments demonstrate that RTMol improves bidirectional alignment performance by up to 47% across various LLMs, establishing an effective paradigm for joint molecule-text understanding and generation.
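The round-trip idea (molecule to caption, caption back to molecule, then compare) can be sketched as a simple exact-match score. This is an assumed illustrative variant, not RTMol's actual metrics, which the abstract does not specify. The captioning model, text-to-SMILES model, and SMILES canonicalizer are passed in as plain callables; the toy lookup tables below stand in for real models and for a real canonicalizer such as RDKit's, and all names are hypothetical.

```python
from typing import Callable, Sequence


def round_trip_exact_match(
    molecules: Sequence[str],
    mol_to_text: Callable[[str], str],
    text_to_mol: Callable[[str], str],
    canonicalize: Callable[[str], str],
) -> float:
    """Fraction of molecules recovered exactly after caption -> regeneration."""
    if not molecules:
        return 0.0
    hits = 0
    for smiles in molecules:
        caption = mol_to_text(smiles)      # molecule-to-text direction
        regenerated = text_to_mol(caption)  # text-to-molecule direction
        # Compare canonical forms so equivalent SMILES strings still match.
        if canonicalize(regenerated) == canonicalize(smiles):
            hits += 1
    return hits / len(molecules)


# Hypothetical toy "models": lookup tables instead of LLMs.
toy_captions = {"CCO": "ethanol", "c1ccccc1": "benzene"}
toy_designs = {"ethanol": "OCC", "benzene": "C1CCCCC1"}  # benzene step is wrong
toy_canonical = {"OCC": "CCO"}  # stand-in for a real SMILES canonicalizer

score = round_trip_exact_match(
    ["CCO", "c1ccccc1"],
    mol_to_text=lambda s: toy_captions[s],
    text_to_mol=lambda t: toy_designs[t],
    canonicalize=lambda s: toy_canonical.get(s, s),
)
print(score)  # 0.5
```

Unlike BLEU on the caption, a score of this kind only rewards captions that carry enough chemical information to reconstruct the molecule, which is the gap in conventional metrics the abstract points to.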
