Abstract: We consider the problem of learning data-driven replicas of stiff systems of ordinary differential equations arising in chemical kinetics that can be evaluated with high computational efficiency. We first focus on training emulators for families of reaction equations under varying reaction rates, using conditional residual networks or long short-term memory architectures. We then apply a recently proposed data-driven framework known as ``inVAErt networks'' to address the ill-posed inverse problem of inferring reaction rates, integration time, and possibly initial conditions from a target set of species concentrations, a problem that has received relatively little attention in the literature. The proposed approach is demonstrated on chemical systems with reversible and irreversible kinetics, spanning 2 to 20 differential equations, 3 to 20 chemical species, and 3 to 25 reaction rate parameters. Relative root mean squared errors produced by the proposed emulators range from $10^{-5}$ for lower-dimensional systems to $10^{-4}$ and $10^{-3}$ for an air pollution model and a hydrogen-air reaction system, respectively. Manifolds of non-identifiable reaction rates recovered by the proposed approach can be analytically verified for simple systems and are consistent with local identifiability analysis in higher dimensions.
Abstract: Molecular Property Prediction (MPP) is a central task in drug discovery. While Large Language Models (LLMs) show promise as generalist models for MPP, their current performance remains below the threshold for practical adoption. We propose TreeKD, a novel knowledge distillation method that transfers complementary knowledge from tree-based specialist models into LLMs. Our approach trains specialist decision trees on functional group features, then verbalizes their learned predictive rules in natural language to enable rule-augmented in-context learning. This enables LLMs to leverage structural insights that are difficult to extract from SMILES strings alone. We further introduce rule-consistency, a test-time scaling technique inspired by bagging that ensembles predictions across diverse rules from a Random Forest. Experiments on 22 ADMET properties from the TDC benchmark demonstrate that TreeKD substantially improves LLM performance, narrowing the gap with SOTA specialist models and advancing toward practical generalist models for molecular property prediction.