Abstract:An important goal in cancer research is the survival prognosis of a patient based on a minimal panel of genomic and molecular markers such as genes or proteins. Purely data-driven models without any biological knowledge can produce non-interpretable results. We propose a penalized semiparametric Bayesian Cox model with graph-structured selection priors for sparse identification of multi-omics features by making use of a biologically meaningful graph via a Markov random field (MRF) prior to capturing known relationships between multi-omics features. Since the fixed graph in the MRF prior is for the prior probability distribution, it is not a hard constraint to determine variable selection, so the proposed model can verify known information and has the potential to identify new and novel biomarkers for drawing new biological knowledge. Our simulation results show that the proposed Bayesian Cox model with graph-based prior knowledge results in more trustable and stable variable selection and non-inferior survival prediction, compared to methods modeling the covariates independently without any prior knowledge. The results also indicate that the performance of the proposed model is robust to a partially correct graph in the MRF prior, meaning that in a real setting where not all the true network information between covariates is known, the graph can still be useful. The proposed model is applied to the primary invasive breast cancer patients data in The Cancer Genome Atlas project.
Abstract:In this paper, we consider the regularized multi-response regression problem where there exists some structural relation within the responses and also between the covariates and a set of modifying variables. To handle this problem, we propose MADMMplasso, a novel regularized regression method. This method is able to find covariates and their corresponding interactions, with some joint association with multiple related responses. We allow the interaction term between covariate and modifying variable to be included in a (weak) asymmetrical hierarchical manner by first considering whether the corresponding covariate main term is in the model. For parameter estimation, we develop an ADMM algorithm that allows us to implement the overlapping groups in a simple way. The results from the simulations and analysis of a pharmacogenomic screen data set show that the proposed method has an advantage in handling correlated responses and interaction effects, both with respect to prediction and variable selection performance.