RuleMR: Classification rule discovery with MapReduce
Date
2014
Abstract
The vast amounts of data generated, exchanged and consumed on a daily basis by contemporary networks and devices render their analysis a cumbersome procedure with inherent difficulties. On the one hand, the need for efficient Machine Learning algorithms and tools that scale to large datasets is continuously growing. On the other hand, parallel and distributed solutions have proven to conceal many pitfalls. The MapReduce programming model has quickly emerged as the de facto model for executing simple algorithmic tasks over huge volumes of data, since it is simple, highly abstract and efficient. However, due to its unidirectional communication model and its inherent lack of support for iterative execution, few Machine Learning algorithms can easily be implemented on MapReduce. In this paper, we present a classification rule discovery algorithm, namely RuleMR, which, despite its iterative nature, can capitalize on MapReduce. In order to construct quality rules in fewer iterations, the algorithm exploits the distributed nature of MapReduce to explore only the promising areas of the search space. We conduct a series of experimental evaluations which indicate that the proposed approach not only scales well with respect to the size of the training dataset, but also, in many cases, yields a model that is comparable in accuracy to many well-known algorithms. © 2014 IEEE.
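
The abstract does not describe RuleMR's internals, so the following is only a generic illustration of why iterative rule learning can be phrased as MapReduce rounds, not the authors' algorithm. It sketches a single round in plain Python: each partition counts locally how often every attribute test covers a record and how often it covers a record of the target class (map), the per-partition counts are summed (reduce), and the most precise test is selected. The function names `map_partition`, `reduce_counts` and `best_condition`, the record layout, and the precision-based score are all assumptions made for this sketch.

```python
from collections import Counter
from functools import reduce


def map_partition(records, target_class):
    """Map step (hypothetical): local coverage counts for one data partition.

    Each record is a (features_dict, label) pair. For every attribute test
    (attr == value) we count how many records it covers and how many of
    those covered records belong to the target class.
    """
    counts = Counter()
    for features, label in records:
        for attr, value in features.items():
            counts[(attr, value, "covered")] += 1
            if label == target_class:
                counts[(attr, value, "correct")] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial count tables by summing them."""
    return left + right


def best_condition(partitions, target_class):
    """One round: aggregate counts over all partitions and pick the test
    with the highest precision, breaking ties by the number of correct hits."""
    total = reduce(
        reduce_counts,
        (map_partition(part, target_class) for part in partitions),
        Counter(),
    )
    best, best_score = None, (-1.0, 0)
    for (attr, value, kind), covered in total.items():
        if kind != "covered":
            continue
        correct = total[(attr, value, "correct")]  # 0 if never correct
        score = (correct / covered, correct)
        if score > best_score:
            best, best_score = (attr, value), score
    return best, best_score
```

In a sequential-covering style learner, a round like this would be repeated: the selected test is appended to the rule under construction, the records it no longer covers are filtered out, and the next round runs on what remains. Each repetition costs a full pass over the data, which is why reducing the number of iterations matters on MapReduce.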