Abstract: In integrative analyses of omics data, it is often of interest to extract a data embedding from one data type that best reflects its relations with another data type. This task is traditionally fulfilled by linear methods such as canonical correlation analysis and partial least squares. However, the information that one data type carries about the other may not be linear in form. Deep learning provides a convenient alternative for extracting nonlinear information. Here we develop a method, Autoencoder-based Integrative Multi-omics data Embedding (AIME), to extract such information. Using a real gene expression and methylation dataset, we show that AIME extracted meaningful information that the linear approach could not find. The R implementation is available at http://web1.sph.emory.edu/users/tyu8/AIME/.
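The point that relevant information "may not be linear in form" can be illustrated with a toy example. The sketch below is not part of AIME; the data and the helper `pearson` are made up for illustration. It shows a variable `y` that is fully determined by `x` yet has zero linear correlation with it, while a nonlinear transform of `x` recovers the relation.

```python
import math

def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

# Toy data (an assumption for illustration): y depends on x only through x^2.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [a * a for a in x]

# A linear summary of x sees no relation with y ...
linear_r = pearson(x, y)

# ... while a nonlinear feature of x captures it completely.
nonlinear_r = pearson([a * a for a in x], y)

print(linear_r, nonlinear_r)
```

A purely linear method applied to `x` would find nothing to extract here, which is the kind of situation a nonlinear embedding is meant to address.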
Abstract: A unique challenge in predictive model building for omics data has been the small number of samples $(n)$ relative to the large number of features $(p)$. This "$n\ll p$" property makes disease outcome classification with deep learning techniques difficult. Sparse learning that incorporates external gene network information, such as the graph-embedded deep feedforward network (GEDFN) model, has been one solution to this issue. However, such methods require an existing feature graph, and mis-specification of the feature graph can harm both classification and feature selection. To address this limitation and develop a robust classification model that does not rely on external knowledge, we propose a \underline{for}est \underline{g}raph-\underline{e}mbedded deep feedforward \underline{net}work (forgeNet) model, which integrates the GEDFN architecture with a forest feature graph extractor, so that the feature graph is learned in a supervised manner and constructed specifically for a given prediction task. To validate the method's capability, we tested the forgeNet model on both synthetic and real datasets. The resulting high classification accuracy suggests that the method is a valuable addition to sparse deep learning models for omics data.
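The abstract does not spell out how the forest feature graph extractor works. One plausible reading, sketched below purely as an assumption, is that each decision tree in a fitted forest contributes a directed edge from the feature used at a parent split to the feature used at each child split, and the feature graph is the union of these edges over all trees. The tree encoding and the function names `collect` and `forest_feature_graph` are hypothetical; in practice the trees would come from a forest fitted to the class labels, which is what would make the graph supervised.

```python
# Hypothetical tree encoding (an assumption for this sketch):
# an internal node is (feature_index, left_child, right_child); a leaf is None.

def collect(node, features, edges):
    """Walk one tree, recording split features and parent -> child feature edges."""
    if node is None:
        return
    feat, left, right = node
    features.add(feat)
    for child in (left, right):
        if child is not None:
            edges.add((feat, child[0]))
            collect(child, features, edges)

def forest_feature_graph(forest):
    """Union the split features and feature-to-feature edges over all trees."""
    features, edges = set(), set()
    for root in forest:
        collect(root, features, edges)
    return sorted(features), edges

# Two hand-written toy trees standing in for a fitted forest.
forest = [
    (0, (1, None, None), (2, (1, None, None), None)),
    (2, (3, None, None), None),
]
features, edges = forest_feature_graph(forest)
print(features)
print(sorted(edges))
```

Under this reading, only features that appear in some split enter the graph, so the extractor would also act as a feature selector before the graph is passed to the network.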
Abstract: We present a method of variable selection for the sparse generalized additive model. The method does not assume any specific functional form and can select from a large number of candidate variables. It takes the form of incremental forward stagewise regression. Because no functional form is assumed, we devised an approach termed roughening to adjust the residuals across iterations. In simulations, we show that the new method is competitive with popular machine learning approaches. We also demonstrate its performance on several real datasets. The method is available as part of the nlnet package on CRAN at https://cran.r-project.org/package=nlnet.
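For readers unfamiliar with incremental forward stagewise regression, the sketch below shows the classical linear version: at each iteration, the predictor most correlated with the current residual has its coefficient nudged by a small step, and the residual is updated. This is only the textbook linear algorithm, not the method of the abstract, which assumes no functional form and adds a roughening adjustment that is not reproduced here. The function name, step size, and toy data are assumptions for illustration.

```python
def forward_stagewise(X, y, step=0.01, n_iter=2000):
    """Classical linear incremental forward stagewise regression.

    X: list of rows with centered, equally scaled columns; y: centered response.
    Returns the list of fitted coefficients.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    for _ in range(n_iter):
        # Inner product of each predictor with the current residual.
        scores = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        j = max(range(p), key=lambda k: abs(scores[k]))
        if abs(scores[j]) < 1e-12:
            break  # nothing left to explain
        # Move the best predictor's coefficient a small step toward the residual.
        delta = step if scores[j] > 0 else -step
        beta[j] += delta
        for i in range(n):
            resid[i] -= delta * X[i][j]
    return beta

# Toy orthogonal design: y = 2 * x0 - 1 * x2, and x1 is irrelevant.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [1.0, 3.0, -1.0, -3.0]
beta = forward_stagewise(X, y)
print(beta)
```

Because coefficients grow in small increments, irrelevant predictors such as `x1` stay near zero, which is what makes the stagewise scheme usable for variable selection.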
Abstract: Gene expression data represents a unique challenge in predictive model building because of the small number of samples $(n)$ compared to the very large number of features $(p)$. This "$n\ll p$" property has hampered the application of deep learning techniques to disease outcome classification. Sparse learning that incorporates external gene network information is a potential solution to this issue. Still, the problem is very challenging because (1) there are tens of thousands of features and only hundreds of training samples, and (2) the scale-free structure of the gene network is poorly suited to the setup of convolutional neural networks. To address these issues and build a robust classification model, we propose the Graph-Embedded Deep Feedforward Network (GEDFN), which integrates external relational information about features into the deep neural network architecture. The method achieves sparse connections between network layers to prevent overfitting. To validate the method's capability, we conducted both simulation experiments and a real data analysis using a breast cancer RNA-seq dataset from The Cancer Genome Atlas (TCGA). The resulting high classification accuracy and easily interpretable feature selection results suggest that the method is a useful addition to current classification models and feature selection procedures. The method is available at https://github.com/yunchuankong/NetworkNeuralNetwork.
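The abstract does not give the layer construction. A minimal sketch of one way a feature graph could induce sparse connections is shown below, under the assumption that a layer's weight matrix is multiplied elementwise by the graph's adjacency matrix (with self-loops), so that a feature feeds only its own hidden unit and those of its graph neighbors. The function name `graph_embedded_layer` and the toy values are hypothetical.

```python
def graph_embedded_layer(x, W, A, b):
    """ReLU layer whose weights W are masked elementwise by adjacency A.

    x: input vector of length p; W, A: p x p nested lists; b: bias of length p.
    Connection i -> j is active only where A[i][j] == 1.
    """
    p = len(x)
    out = []
    for j in range(p):
        z = b[j] + sum(x[i] * W[i][j] * A[i][j] for i in range(p))
        out.append(max(0.0, z))  # ReLU activation
    return out

# Toy graph on 3 features: features 0 and 1 are linked, feature 2 is isolated.
A = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
dense = [[1, 1, 1] for _ in range(3)]  # all-ones mask, i.e. a fully connected layer
W = [[1.0] * 3 for _ in range(3)]
b = [0.0, 0.0, 0.0]
x = [1.0, 2.0, 4.0]

sparse_out = graph_embedded_layer(x, W, A, b)
dense_out = graph_embedded_layer(x, W, dense, b)
print(sparse_out, dense_out)
```

In this toy case the mask leaves 5 of the 9 possible connections active. With tens of thousands of features and a sparse gene network, the reduction in free parameters would be far larger, which is the intuition behind using the graph to limit overfitting.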