Recent developments in synthetic biology, next-generation sequencing, and machine learning provide an unprecedented opportunity to rationally design new disease treatments that reprogram cells, based on measured responses to gene perturbations and drugs. The main challenges to seizing this opportunity are incomplete knowledge of the cellular network and the combinatorial explosion of possible interventions, neither of which can be overcome by experiments alone. To address these challenges, we develop a transfer learning approach to controlling cell behavior. The model is pre-trained on transcriptomic data associated with human cell fates, yielding a model of the network dynamics that can then be transferred to specific reprogramming goals. The approach combines transcriptional responses to gene perturbations so as to minimize the difference between a given initial transcriptional state and a given target state. We demonstrate the approach's versatility on two datasets: a microarray dataset comprising more than 9,000 microarrays across 54 cell types and 227 unique perturbations, and an RNA-seq dataset comprising more than 10,000 sequencing runs across 36 cell types and 138 perturbations. Our approach reproduces known reprogramming protocols with an AUROC of 0.91 and, unlike existing methods, pre-trains an adaptable model that can be tailored to specific reprogramming transitions. We show that the number of gene perturbations required to steer one fate to another increases as developmental relatedness decreases, and that fewer genes are needed to progress along developmental paths than to regress along them. These findings establish a proof of concept for computationally designing control strategies with our approach and provide insights into how gene regulatory networks govern phenotype.
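To make the objective of combining perturbation responses to approach a target state concrete, here is a minimal sketch of a much simpler stand-in. It is not the pre-trained transfer learning model described above. It assumes, hypothetically, that perturbation responses add linearly in expression space and greedily picks perturbations that shrink the Euclidean distance to the target. The function name `greedy_perturbation_set` and the parameter `max_perts` are illustrative only.

```python
import numpy as np


def greedy_perturbation_set(x_init, x_target, responses, max_perts=5):
    """Greedily choose perturbations whose summed responses move x_init toward x_target.

    Hypothetical assumption: effects add linearly, i.e.
        x_final = x_init + sum(responses[i] for i in chosen).
    responses has shape (n_perturbations, n_genes).
    Returns (chosen indices in selection order, final Euclidean distance to target).
    """
    current = np.asarray(x_init, dtype=float).copy()
    x_target = np.asarray(x_target, dtype=float)
    responses = np.asarray(responses, dtype=float)
    used = np.zeros(len(responses), dtype=bool)  # each perturbation used at most once
    chosen = []
    best_dist = float(np.linalg.norm(x_target - current))

    for _ in range(max_perts):
        # State that would result from adding each perturbation to the current state.
        candidates = current + responses  # broadcast to (n_perturbations, n_genes)
        dists = np.linalg.norm(x_target - candidates, axis=1)
        dists[used] = np.inf  # exclude perturbations already in the set
        j = int(np.argmin(dists))
        if dists[j] >= best_dist:
            break  # no remaining perturbation brings the state closer to the target
        chosen.append(j)
        used[j] = True
        current = candidates[j].copy()
        best_dist = float(dists[j])

    return chosen, best_dist
```

The greedy search sidesteps the combinatorial explosion by evaluating each remaining perturbation once per step instead of all subsets, at the cost of possibly missing better combinations.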
The relationship between microscopic observations and macroscopic behavior is a fundamental open question in biophysical systems. Here, we develop a unified approach that, in contrast with existing methods, predicts cell type from macromolecular data even when accounting for the scale of human tissue diversity and the limitations of the available data. We achieve this by projecting the data onto the eigenvectors of a correlation matrix inferred from many observations of gene expression or chromatin conformation and then applying a k-nearest-neighbors classifier in the projected space. Our approach identifies variations in epigenotype that impact cell type, thereby supporting the cell type attractor hypothesis and representing a first step toward model-independent control strategies in biological systems.
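The projection-then-classify step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it standardizes features, takes the top eigenvectors of the feature correlation matrix estimated from training samples, and classifies a query by majority vote among its k nearest training samples (Euclidean distance) in the projected space. The names `fit_projection`, `project`, `knn_predict`, `n_components`, and `k` are hypothetical.

```python
from collections import Counter

import numpy as np


def fit_projection(X_train, n_components):
    """Return (mean, std, eigenvectors) for the top eigenvectors of the feature
    correlation matrix estimated from X_train, shape (n_samples, n_features)."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant features
    Z = (X_train - mu) / sd
    C = Z.T @ Z / Z.shape[0]  # correlation matrix of the standardized features
    evals, evecs = np.linalg.eigh(C)  # eigh: C is symmetric; eigenvalues ascending
    order = np.argsort(evals)[::-1][:n_components]  # largest eigenvalues first
    return mu, sd, evecs[:, order]


def project(X, mu, sd, V):
    """Standardize X with the training statistics and project onto V."""
    return ((X - mu) / sd) @ V


def knn_predict(train_proj, train_labels, query_proj, k=3):
    """Majority-vote k-nearest-neighbors classification in the projected space."""
    preds = []
    for q in query_proj:
        d = np.linalg.norm(train_proj - q, axis=1)
        nearest = np.argsort(d)[:k]
        votes = Counter(train_labels[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds
```

Projecting first keeps only the dominant correlated directions, which suppresses noise and makes neighbor searches cheaper; the values of `n_components` and `k` here are arbitrary choices for illustration.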