A Covering Classification Rule Induction Approach for Big Datasets
Abstract
With the ever-increasing production of data from heterogeneous sources in modern information societies, the need for scalable data-intensive processing is growing. MapReduce quickly became the de facto framework for large-scale data analysis, owing to its simple, abstract programming model and its efficient underlying execution system. However, this simplicity comes at a price: its unidirectional communication model and lack of support for iterations make repeated querying of datasets difficult and impose limitations in many fields, including Machine Learning. In this paper we describe the implementation of a classification rule induction algorithm based on MapReduce, with the aim of building a classification model in as few iterations as possible. After a thorough description of the algorithm, we evaluate it from three perspectives: its accuracy, its parallel performance, and its communication costs. The evaluations indicate that the approach is scalable and, since it produces a comprehensive human-readable model, that it can prove valuable for a wide range of applications. © 2014 IEEE.