A graph-embedded deep feedforward network for disease outcome classification and feature selection using gene expression data

Abstract

Gene expression data represents a unique challenge in predictive model building because of the small number of samples ($n$) compared to the huge number of features ($p$). This "$n \ll p$" property has hampered the application of deep learning techniques to disease outcome classification. Sparse learning that incorporates external gene network information could be a solution to this issue. Still, the problem is very challenging because (1) there are tens of thousands of features and only hundreds of training samples, and (2) the scale-free structure of the gene network is ill-suited to the setup of convolutional neural networks. To address these issues and build a robust classification model, we propose the Graph-Embedded Deep Feedforward Networks (GEDFN), which integrate external relational information of features into the deep neural network architecture. The method achieves sparse connections between network layers to prevent overfitting. To validate the method's capability, we conducted both simulation experiments and a real data analysis using a breast cancer RNA-seq dataset from The Cancer Genome Atlas (TCGA). The resulting high classification accuracy and easily interpretable feature selection results suggest the method is a useful addition to the current classification models and feature selection procedures. The method is available at https://github.com/yunchuankong/NetworkNeuralNetwork.
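
The abstract does not spell out how the gene network enters the architecture. One common way to obtain "sparse connections between network layers" from a feature graph is to mask a fully connected layer's weights with the graph's adjacency matrix plus self-loops, so that each hidden unit only sees one gene and its network neighbours. The sketch below assumes that design and a ReLU activation; the function names `graph_mask` and `graph_embedded_layer` are hypothetical and are not taken from the linked repository.

```python
def graph_mask(num_features, edges):
    """Build a 0/1 mask from an undirected feature (gene) graph.

    mask[i][j] == 1 if i == j (self-loop) or genes i and j share an edge,
    otherwise 0. Assumption: the graph is undirected and unweighted.
    """
    mask = [[1 if i == j else 0 for j in range(num_features)]
            for i in range(num_features)]
    for i, j in edges:
        mask[i][j] = 1
        mask[j][i] = 1
    return mask


def graph_embedded_layer(x, weights, bias, mask):
    """Forward pass for one sample through a graph-masked layer.

    Computes h_j = ReLU(b_j + sum_i x_i * W_ij * M_ij), so hidden unit j
    (one unit per gene) only receives input from gene j and its neighbours.
    ReLU is an illustrative choice, not one stated in the abstract.
    """
    p = len(x)
    hidden = []
    for j in range(p):
        z = bias[j]
        for i in range(p):
            # Masked-out weights contribute nothing, whatever their value.
            z += x[i] * weights[i][j] * mask[i][j]
        hidden.append(max(0.0, z))
    return hidden


if __name__ == "__main__":
    # Toy graph: gene 0 -- gene 1, gene 2 -- gene 3.
    mask = graph_mask(4, [(0, 1), (2, 3)])
    weights = [[1.0] * 4 for _ in range(4)]
    print(graph_embedded_layer([1.0, 2.0, 3.0, 4.0], weights, [0.0] * 4, mask))
    # -> [3.0, 3.0, 7.0, 7.0]
```

In this toy graph only 8 of the 16 possible input-to-hidden connections are active; for a sparse gene network over tens of thousands of genes the active fraction is typically far smaller. Because the mask multiplies the weights inside the forward pass, gradients with respect to masked-out entries are zero, so training cannot reintroduce those connections.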
