Forest classification trees and forest support vector machines algorithms: Demonstration using microarray data

Zintzaras, E.; Kowald, A.

Author

Zintzaras, E.; Kowald, A.

Date

2010

DOI

10.1016/j.compbiomed.2010.03.006

Abstract

Classification into multiple classes when the measured variables far outnumber the samples is a major methodological challenge in -omics studies. Two algorithms that overcome the dimensionality problem are presented: the forest classification tree (FCT) and the forest support vector machines (FSVM). In FCT, a set of variables is randomly chosen and a classification tree (CT) is grown using a forward classification algorithm. The process is repeated and a forest of CTs is derived. Finally, the most frequent variables from the trees with the smallest apparent misclassification rate (AMR) are used to construct a productive tree. In FSVM, the CTs are replaced by SVMs. The methods are demonstrated using prostate gene expression data for classifying tissue samples into four tumor types. With a threshold split value of 0.001 and 100 markers, the productive CT consisted of 29 terminal nodes and achieved perfect classification (AMR=0). When the threshold value was set to 0.01, a tree with 17 terminal nodes was constructed based on 15 markers (AMR=7%). In FSVM, reducing the fraction of the forest that was used to construct the best classifier from the top 80% to the top 20% reduced the misclassification rate to 25% (when using 200 markers). The proposed methodologies may be used for identifying important variables in high-dimensional data. Furthermore, the FCT allows exploring the data structure and provides a decision rule. © 2010 Elsevier Ltd. All rights reserved.
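The FCT steps named in the abstract (random variable subsets, a forest of trees, ranking by AMR, a final tree built from the most frequent variables) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract does not specify the splitting criterion, so Gini impurity is an assumption here, as is reading the "threshold split value" as a minimum impurity decrease. All function and parameter names (`grow_tree`, `forest_classification_tree`, `subset_size`, `top_fraction`, `n_markers`) are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (assumed criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(X, y, features, threshold):
    """Grow a tree greedily; split only if the impurity decrease exceeds threshold."""
    n, parent, best = len(y), gini(y), None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [i for i in range(n) if X[i][f] <= cut]
            right = [i for i in range(n) if X[i][f] > cut]
            if not left or not right:
                continue
            children = (len(left) * gini([y[i] for i in left])
                        + len(right) * gini([y[i] for i in right])) / n
            gain = parent - children
            if gain > threshold and (best is None or gain > best[0]):
                best = (gain, f, cut, left, right)
    if best is None:  # no admissible split: terminal node with the majority class
        return {"leaf": Counter(y).most_common(1)[0][0]}
    _, f, cut, left, right = best
    return {"feature": f, "cut": cut,
            "left": grow_tree([X[i] for i in left], [y[i] for i in left],
                              features, threshold),
            "right": grow_tree([X[i] for i in right], [y[i] for i in right],
                               features, threshold)}

def predict(tree, row):
    """Follow the splits down to a terminal node and return its class."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["cut"] else tree["right"]
    return tree["leaf"]

def amr(tree, X, y):
    """Apparent misclassification rate: error fraction on the data used to grow the tree."""
    return sum(predict(tree, r) != t for r, t in zip(X, y)) / len(y)

def used_features(tree):
    """Set of variables that appear in at least one split of the tree."""
    if "leaf" in tree:
        return set()
    return ({tree["feature"]} | used_features(tree["left"])
            | used_features(tree["right"]))

def forest_classification_tree(X, y, n_trees=50, subset_size=3, top_fraction=0.2,
                               n_markers=2, threshold=0.001, seed=0):
    """Sketch of FCT: forest on random subsets, then a final tree on frequent variables."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        subset = rng.sample(range(len(X[0])), subset_size)
        tree = grow_tree(X, y, subset, threshold)
        forest.append((amr(tree, X, y), tree))
    forest.sort(key=lambda pair: pair[0])  # smallest AMR first
    best = forest[:max(1, int(top_fraction * n_trees))]
    counts = Counter(f for _, tree in best for f in used_features(tree))
    markers = [f for f, _ in counts.most_common(n_markers)]
    return markers, grow_tree(X, y, markers, threshold)

if __name__ == "__main__":
    # Toy data (invented for illustration): 8 samples, 4 variables, 4 classes.
    X = [[0.1, 0.2, 5, 7], [0.2, 0.1, 3, 2], [0.1, 0.9, 4, 6], [0.2, 0.8, 6, 1],
         [0.9, 0.1, 2, 5], [0.8, 0.2, 7, 3], [0.9, 0.9, 1, 4], [0.8, 0.8, 8, 8]]
    y = ["A", "A", "B", "B", "C", "C", "D", "D"]
    markers, tree = forest_classification_tree(X, y, n_trees=20, subset_size=2)
    print("markers:", markers, "AMR:", amr(tree, X, y))
```

Because the AMR is computed on the same samples used to grow each tree, a small threshold lets trees fit the training data closely, which is consistent with the abstract reporting AMR=0 at threshold 0.001 and a smaller tree with AMR=7% at 0.01. An FSVM variant would replace `grow_tree` with an SVM fit on each random subset; that step is omitted here.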

URI

http://hdl.handle.net/11615/34965

Collections

Publications in journals, conferences, book chapters, etc. [19735]