Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hongzhe Sun

ChemCLIP: Bridging Organic and Inorganic Anticancer Compounds Through Contrastive Learning

Mar 30, 2026

Mohamad Koohi-Moghadam, Hongzhe Sun, Hongyan Li, Kyongtae Tyler Bae

Abstract:The discovery of anticancer therapeutics has traditionally treated organic small molecules and metal-based coordination complexes as separate chemical domains, limiting knowledge transfer despite their shared biological objectives. This disparity is particularly pronounced in available data, with extensive screening databases for organic compounds compared to only a few thousand characterized metal complexes. Here, we introduce ChemCLIP, a dual-encoder contrastive learning framework that bridges this organic-inorganic divide by learning unified representations based on shared anticancer activities rather than structural similarity. We compiled complementary datasets comprising 44,854 unique organic compounds and 5,164 unique metal complexes, standardized across 60 cancer cell lines. By training parallel encoders with activity-aware hard negative mining, we mapped structurally distinct compounds into a shared 256-dimensional embedding space where biologically similar compounds cluster together regardless of chemical class. We systematically evaluated four molecular encoding strategies: Morgan fingerprints, ChemBERTa, MolFormer, and Chemprop, through quantitative alignment metrics, embedding visualizations, and downstream classification tasks. Morgan fingerprints achieved superior performance with an average alignment ratio of 0.899 and downstream classification AUCs of 0.859 (inorganic) and 0.817 (organic). This work establishes contrastive learning as an effective strategy for unifying disparate chemical domains and provides empirical guidance for encoder selection in multi-modal chemistry applications, with implications extending beyond anticancer drug discovery to any scenario requiring cross-domain chemical knowledge transfer.

* 15 pages

Via

Access Paper or Ask Questions

Theoretical Data-Driven MobilePosenet: Lightweight Neural Network for Accurate Calibration-Free 5-DOF Magnet Localization

Jan 06, 2025

Wenxuan Xie, Yuelin Zhang, Jiwei Shan, Hongzhe Sun, Jiewen Tan, Shing Shin Cheng

Figure 1 for Theoretical Data-Driven MobilePosenet: Lightweight Neural Network for Accurate Calibration-Free 5-DOF Magnet Localization

Figure 2 for Theoretical Data-Driven MobilePosenet: Lightweight Neural Network for Accurate Calibration-Free 5-DOF Magnet Localization

Figure 3 for Theoretical Data-Driven MobilePosenet: Lightweight Neural Network for Accurate Calibration-Free 5-DOF Magnet Localization

Figure 4 for Theoretical Data-Driven MobilePosenet: Lightweight Neural Network for Accurate Calibration-Free 5-DOF Magnet Localization

Abstract:Permanent magnet tracking using the external sensor array is crucial for the accurate localization of wireless capsule endoscope robots. Traditional tracking algorithms, based on the magnetic dipole model and Levenberg-Marquardt (LM) algorithm, face challenges related to computational delays and the need for initial position estimation. More recently proposed neural network-based approaches often require extensive hardware calibration and real-world data collection, which are time-consuming and labor-intensive. To address these challenges, we propose MobilePosenet, a lightweight neural network architecture that leverages depthwise separable convolutions to minimize computational cost and a channel attention mechanism to enhance localization accuracy. Besides, the inputs to the network integrate the sensors' coordinate information and random noise, compensating for the discrepancies between the theoretical model and the actual magnetic fields and thus allowing MobilePosenet to be trained entirely on theoretical data. Experimental evaluations conducted in a $90 \times 90 \times 80$ mm workspace demonstrate that MobilePosenet exhibits excellent 5-DOF localization accuracy ($1.54 \pm 1.03$ mm and $2.24 \pm 1.84^{\circ}$) and inference speed (0.9 ms) against state-of-the-art methods trained on real-world data. Since network training relies solely on theoretical data, MobilePosenet can eliminate the hardware calibration and real-world data collection process, improving the generalizability of this permanent magnet localization method and the potential for rapid adoption in different clinical settings.

* 9 pages, 5 figures

Via

Access Paper or Ask Questions