A data perturbation approach to sensitive classification rule hiding
This paper focuses on privacy preservation in classification rule mining. It proposes a data perturbation approach for hiding sensitive classification rules in categorical datasets. Such a methodology is needed when the data must be published on the web and remain fully available for public use, whereas other approaches, such as output perturbation or cryptographic techniques, restrict the usability of the data in various ways. The methodology builds on the particular characteristics of sequential covering classification algorithms. It modifies the tuples covered by the sensitive rules of a dataset D so that they are redistributed to the "more important" non-sensitive rules. In addition, it ensures that these tuples are assigned to the non-sensitive rules in proportion to each rule's rank in the ruleset. In this way, not only are the sensitive rules hidden, but the current structure of the ruleset, and thus the information value of the dataset, is also preserved. Moreover, a variant of the basic method that uses an alternative distribution procedure is presented. Finally, a series of experiments is conducted to evaluate the validity and effectiveness of the proposed approaches against existing similar ones. © 2010 ACM.
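The abstract does not specify how "in proportion to their rank" is computed or how a tuple is modified, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes (hypothetically) that the non-sensitive rules are given in ranked order, that the rule at 0-based index i of k receives weight k - i (so higher-ranked rules receive more tuples), that integer quotas are obtained by largest-remainder rounding, and that a tuple is "assigned" to a rule by overwriting its attributes with the rule's antecedent values and its class with the rule's label. The names `Rule`, `proportional_quotas`, `reassign`, and the `"class"` key are illustrative. The sketch also ignores the sequential-covering subtlety that a modified tuple could still be captured by an earlier rule in the ordered ruleset.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    antecedent: dict  # attribute -> required categorical value
    label: str        # class predicted by the rule


def proportional_quotas(n_tuples: int, n_rules: int) -> list:
    """Split n_tuples among n_rules ranked rules (index 0 = most important).

    Assumption: the rule at index i gets weight n_rules - i; fractional shares
    are rounded with the largest-remainder method so quotas sum to n_tuples.
    """
    if n_rules <= 0:
        raise ValueError("need at least one non-sensitive rule")
    weights = [n_rules - i for i in range(n_rules)]
    total = sum(weights)
    exact = [n_tuples * w / total for w in weights]
    quotas = [int(x) for x in exact]
    leftover = n_tuples - sum(quotas)
    # Hand out remaining tuples by largest fractional part, ties to higher rank.
    order = sorted(range(n_rules), key=lambda i: (-(exact[i] - quotas[i]), i))
    for i in order[:leftover]:
        quotas[i] += 1
    return quotas


def reassign(sensitive_tuples: list, ranked_rules: list) -> list:
    """Return perturbed copies of the sensitive tuples, each made to satisfy
    one non-sensitive rule according to the rank-proportional quotas."""
    quotas = proportional_quotas(len(sensitive_tuples), len(ranked_rules))
    source = iter(sensitive_tuples)
    perturbed = []
    for rule, quota in zip(ranked_rules, quotas):
        for _ in range(quota):
            t = dict(next(source))     # copy; the input tuples are left untouched
            t.update(rule.antecedent)  # force the rule's conditions to hold
            t["class"] = rule.label    # give the tuple the rule's class
            perturbed.append(t)
    return perturbed


if __name__ == "__main__":
    print(proportional_quotas(10, 3))  # three ranked rules share ten tuples
```

With ten tuples and three rules, the weights 3:2:1 give exact shares 5, 3.33 and 1.67, which largest-remainder rounding turns into 5, 3 and 2. A different weighting (for example, weights tied to each rule's coverage) would simply replace the `weights` line.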