A classification system based on a new wrapper feature selection algorithm for the diagnosis of primary and secondary polycythemia
Primary and Secondary Polycythemia are diseases of the bone marrow that affect the blood's composition and prohibit patients from becoming blood donors. Since these diseases may become fatal, their early diagnosis is important. In this paper, a classification system for the diagnosis of Primary and Secondary Polycythemia is proposed. The proposed system classifies input data into three classes; Healthy, Primary Polycythemic (PP) and Secondary Polycythemic (SP) and is implemented using two separate binary classification levels. The first level performs the Healthy/non-Healthy classification and the second level the PP/SP classification. To this end, a novel wrapper feature selection algorithm, called the LM-FM algorithm, is presented in order to maximize the classifier's performance. The algorithm is comprised of two stages that are applied sequentially: the Local Maximization (LM) stage and the Floating Maximization (FM) stage. The LM stage finds the best possible subset of a fixed predefined size, which is then used as an input for the next stage. The FM stage uses a floating size technique to search for an even better solution by varying the initially provided subset size. Then, the Support Vector Machine (SVM) classifier is used for the discrimination of the data at each classification level. The proposed classification system is compared with various well-established feature selection techniques such as the Sequential Floating Forward Selection (SFFS) and the Maximum Output Information (MOI) wrapper schemes, and with standalone classification techniques such as the Multilayer Perceptron (MLP) and SVM classifier. The proposed LM-FM feature selection algorithm combined with the SVM classifier increases the overall performance of the classification system, scoring up to 98.9% overall accuracy at the first classification level and up to 96.6% at the second classification level. Moreover, it provides excellent robustness regardless of the size of the input feature subset used. (C) 2013 Elsevier Ltd. All rights reserved.