Chencheng Xu

CFP-Gen: Combinatorial Functional Protein Generation via Diffusion Language Models

May 28, 2025

Junbo Yin, Chao Zha, Wenjia He, Chencheng Xu, Xin Gao

Abstract: Existing PLMs generate protein sequences based on a single-condition constraint from a specific modality, and struggle to satisfy multiple constraints across different modalities simultaneously. In this work, we introduce CFP-Gen, a novel diffusion language model for Combinatorial Functional Protein GENeration. CFP-Gen facilitates de novo protein design by integrating multimodal conditions with functional, sequence, and structural constraints. Specifically, an Annotation-Guided Feature Modulation (AGFM) module is introduced to dynamically adjust the protein feature distribution based on composable functional annotations, e.g., GO terms, IPR domains, and EC numbers. Meanwhile, the Residue-Controlled Functional Encoding (RCFE) module captures residue-wise interactions to ensure more precise control. Additionally, off-the-shelf 3D structure encoders can be seamlessly integrated to impose geometric constraints. We demonstrate that CFP-Gen enables high-throughput generation of novel proteins with functionality comparable to natural proteins, while achieving a high success rate in designing multifunctional proteins. Code and data are available at https://github.com/yinjunbo/cfpgen.

* Accepted at ICML 2025. Code is available at https://github.com/yinjunbo/cfpgen

Acceleration of Federated Learning with Alleviated Forgetting in Local Training

Mar 05, 2022

Chencheng Xu, Zhiwei Hong, Minlie Huang, Tao Jiang

Abstract: Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy by independently training local models on each client and then aggregating parameters on a central server, thereby producing an effective global model. Although a variety of FL algorithms have been proposed, their training efficiency remains low when the data are not independently and identically distributed (non-i.i.d.) across different clients. We observe that the slow convergence rates of the existing methods are (at least partially) caused by catastrophic forgetting during the local training stage on each individual client, which leads to a large increase in the loss on the previous training data at the other clients. Here, we propose FedReg, an algorithm to accelerate FL with alleviated knowledge forgetting in the local training stage by regularizing locally trained parameters with the loss on generated pseudo data, which encode the knowledge of previous training data learned by the global model. Our comprehensive experiments demonstrate that FedReg not only significantly improves the convergence rate of FL, especially when the neural network architecture is deep and the clients' data are extremely non-i.i.d., but also protects privacy better in classification problems and is more robust against gradient inversion attacks. The code is available at: https://github.com/Zoesgithub/FedReg.

* In International Conference on Learning Representations (2021, Sept)
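
The server-side step mentioned in the abstract (aggregating client parameters into a global model) can be illustrated with a minimal sketch. This is not FedReg and not code from the linked repository: it assumes a FedAvg-style average weighted by each client's sample count, and the names `aggregate`, `Params`, `clients`, and `sizes` are hypothetical.

```python
from typing import Dict, List

# Hypothetical parameter container: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def aggregate(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighted by local sample counts.

    Assumption: FedAvg-style weighting. The abstract only says that
    parameters are aggregated on a central server.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    global_params: Params = {}
    for name, reference in client_params[0].items():
        # Weighted mean of each scalar weight across all clients.
        global_params[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(len(reference))
        ]
    return global_params


# Two toy clients; the second holds three times as much data.
clients = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
]
sizes = [1, 3]
global_params = aggregate(clients, sizes)
print(global_params)
```

With non-i.i.d. clients, each local model drifts toward its own data before this averaging step, which is the forgetting effect the abstract attributes the slow convergence to.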

Reinforced Molecular Optimization with Neighborhood-Controlled Grammars

Nov 14, 2020

Chencheng Xu, Qiao Liu, Minlie Huang, Tao Jiang

Abstract: A major challenge in the pharmaceutical industry is to design novel molecules with specific desired properties, especially when the property evaluation is costly. Here, we propose MNCE-RL, a graph convolutional policy network for molecular optimization with molecular neighborhood-controlled embedding grammars through reinforcement learning. We extend the original neighborhood-controlled embedding grammars to make them applicable to molecular graph generation and design an efficient algorithm to infer grammatical production rules from given molecules. The use of grammars guarantees the validity of the generated molecular structures. By transforming molecular graphs to parse trees with the inferred grammars, the molecular structure generation task is modeled as a Markov decision process where a policy gradient strategy is utilized. In a series of experiments, we demonstrate that our approach achieves state-of-the-art performance in a diverse range of molecular optimization tasks and exhibits significant superiority in optimizing molecular properties with a limited number of property evaluations.

* Advances in Neural Information Processing Systems, 33 (2020)
* 12 pages; two figures; NeurIPS 2020
