Abstract:Therapeutic antibodies require not only high-affinity target engagement, but also favorable manufacturability, stability, and safety profiles for clinical effectiveness. These properties are collectively called `developability'. To enable a computational framework for optimizing antibody sequences for favorable developability, we introduce a guided discrete diffusion model trained on natural paired heavy- and light-chain sequences from the Observed Antibody Space (OAS) and quantitative developability measurements for 246 clinical-stage antibodies. To steer generation toward biophysically viable candidates, we integrate a Soft Value-based Decoding in Diffusion (SVDD) Module that biases sampling without compromising naturalness. In unconstrained sampling, our model reproduces global features of both the natural repertoire and approved therapeutics, and under SVDD guidance we achieve significant enrichment in predicted developability scores over unguided baselines. When combined with high-throughput developability assays, this framework enables an iterative, ML-driven pipeline for designing antibodies that satisfy binding and biophysical criteria in tandem.
Abstract:Efficient management of storage resources in big data and cloud computing environments requires accurate identification of data's "cold" and "hot" states. Traditional methods, such as rule-based algorithms and early AI techniques, often struggle with dynamic workloads, leading to low accuracy, poor adaptability, and high operational overhead. To address these issues, we propose a novel solution based on online learning strategies. Our approach dynamically adapts to changing data access patterns, achieving higher accuracy and lower operational costs. Rigorous testing with both synthetic and real-world datasets demonstrates a significant improvement, achieving a 90% accuracy rate in hot-cold classification. Additionally, the computational and storage overheads are considerably reduced.