Inference of a robust diagnostic signature in the case of melanoma: Gene selection by information gain and gene ontology tree exploration
Date
2013
Abstract
Integrated datasets originating from multi-modal data can be used to identify the causal biological actions that, through a systems-level process, trigger the development of a disease. Here we use an integrated dataset related to cutaneous melanoma, built from two separate sets (microarray and imaging) with the application of data imputation methods. Our goal is to select a subset of genes that comprise candidate biomarkers and to compare these with imaging features, which characterize the disease at a macroscopic level. Using information gain ratio measurements and exploration of the Gene Ontology (GO) tree, we identified a set of 33 genes that are both highly correlated with disease status and central to regulatory mechanisms. The selected genes were used to train various classifiers that could generalize well when discriminating malignant from benign melanoma samples. The classifiers performed better when the selected genes were used as input than when imaging features selected by information gain measurements were used. Thus, genes operating in the background of low-level biological processes were shown to carry higher information content than the macroscopic imaging features. © 2013 IEEE.
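To illustrate the information gain ratio criterion mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that expression values have already been discretized into categories (the abstract does not say how, or whether, this was done), and the names `entropy`, `gain_ratio`, `gene_a`, `gene_b` and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `labels` given `feature`, divided by the
    entropy of `feature` itself (the split information)."""
    n = len(labels)
    # Group the class labels by feature value.
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    # Expected entropy of the labels once the feature value is known.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A constant feature has zero split information; treat it as uninformative.
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: disease status and two discretized gene profiles.
status = ["malignant", "malignant", "benign", "benign"]
gene_a = ["high", "high", "low", "low"]   # tracks the status perfectly
gene_b = ["high", "low", "high", "low"]   # unrelated to the status

# Rank the genes by gain ratio, most informative first.
scores = {"gene_a": gain_ratio(gene_a, status),
          "gene_b": gain_ratio(gene_b, status)}
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy case `gene_a` ranks above `gene_b`. Dividing by the split information penalizes features that fragment the samples into many small groups, which is the usual motivation for preferring the ratio over raw information gain.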