Heterogeneous data fusion and selection in high-volume molecular and imaging datasets
In this work, two disparate datasets concerning the same physiological type of cutaneous melanoma, but derived from different donors, are combined into an integrative dataset with expanded descriptive depth. One dataset is of imaging (dermatoscopic) origin and the other of molecular (transcriptomic expression) origin. Four different imputation methods are employed to derive the unified dataset, prior to the application of backward feature selection together with ensemble classifiers (random forests). The imputation schemes emulate the effect of biological noise on the unified dataset by adding realistic signal variation. Thus, they protect the discovery process in the integrative dataset against false-positive artifacts, i.e. features that do not have a true differential effect. The results suggest that expanding the feature space through data integration, and exploiting elaborate imputation schemes in general, aids the classification task and imparts stability to the derivation of the putative classifiers. © 2012 IEEE.
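The pipeline described above (stack two cohorts measured on disjoint feature sets, impute the unmeasured blocks, then run backward selection with a random forest) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name its four imputation methods, so column-mean imputation and a class-conditional hot-deck (random draw from observed values of the same feature and class) are shown as two hypothetical stand-ins. Scikit-learn's `RFE` (recursive feature elimination driven by forest importances) is used as one form of backward selection. The data are synthetic, and the helper names (`make_block`, `stack_with_gaps`, `impute_mean`, `impute_hot_deck`, `backward_select`) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)

def make_block(n, p, y, signal_cols, rng):
    """Hypothetical synthetic cohort: Gaussian noise plus a class shift on a few columns."""
    X = rng.normal(size=(n, p))
    X[:, signal_cols] += 1.5 * y[:, None]
    return X

def stack_with_gaps(X_img, X_mol):
    """Place both cohorts on one feature axis; blocks never measured become NaN."""
    n_img, p_img = X_img.shape
    n_mol, p_mol = X_mol.shape
    top = np.hstack([X_img, np.full((n_img, p_mol), np.nan)])
    bottom = np.hstack([np.full((n_mol, p_img), np.nan), X_mol])
    return np.vstack([top, bottom])

def impute_mean(X):
    """Fill each missing entry with its column's observed mean (adds no variation)."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def impute_hot_deck(X, y, rng):
    """Fill missing entries with random observed values of the same feature and class,
    so imputed rows carry draw-to-draw variation rather than a constant."""
    X = X.copy()
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        for j in range(X.shape[1]):
            col = X[idx, j]  # fancy indexing returns a copy
            missing = np.isnan(col)
            if missing.any():
                col[missing] = rng.choice(col[~missing], size=missing.sum())
                X[idx, j] = col  # write the filled copy back
    return X

def backward_select(X, y, n_keep, seed=0):
    """Recursive feature elimination with a random forest; returns kept column indices."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    selector = RFE(forest, n_features_to_select=n_keep, step=5)
    selector.fit(X, y)
    return np.flatnonzero(selector.support_)

# Two hypothetical binary-labelled cohorts: 10 "image" and 40 "molecular" features.
y_img = np.repeat([0, 1], 30)
y_mol = np.repeat([0, 1], 30)
X_img = make_block(60, 10, y_img, [0, 1], rng)
X_mol = make_block(60, 40, y_mol, [0, 1, 2], rng)
X = stack_with_gaps(X_img, X_mol)
y = np.concatenate([y_img, y_mol])

for name, X_full in [("mean", impute_mean(X)),
                     ("hot-deck", impute_hot_deck(X, y, rng))]:
    print(name, backward_select(X_full, y, n_keep=5))
```

Comparing the selected indices across imputation schemes (and across random seeds for the hot-deck) is one way to probe the stability the abstract refers to. Note that a class-conditional hot-deck propagates class structure into the imputed blocks; a class-agnostic draw is an equally plausible alternative if that is undesirable.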