Hiding sensitive knowledge without side effects
Hiding sensitive knowledge in large transactional databases is one of the major goals of privacy-preserving data mining. However, only recently have researchers identified exact solutions for hiding knowledge expressed as sensitive frequent itemsets and their related association rules. Exact solutions hide the sensitive knowledge without side effects in the sanitized outcome, such as the hiding of nonsensitive patterns or the accidental emergence of previously infrequent itemsets among the frequent ones. In this paper, we highlight the process of border revision, which plays a significant role in the identification of exact hiding solutions, and we provide efficient algorithms for computing the revised borders. Furthermore, we review two algorithms that identify exact hiding solutions, and we extend one of them so that it identifies exact solutions for a wider range of problems than the original. We then introduce a novel framework for decomposing the hiding problems handled by these approaches and solving them in parallel. This framework substantially increases the size of the problems that both algorithms can handle and significantly decreases their runtime. Through experimentation, we demonstrate the effectiveness of these approaches in providing high-quality knowledge hiding solutions.
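
To make the notion of border revision concrete, the following is a minimal sketch under stated assumptions, not the efficient algorithms of the paper. It assumes the common definition in which the positive border is the set of maximal frequent itemsets, and the revised border is the positive border of the frequent itemsets that remain once every sensitive itemset and all of its supersets are removed. The function names, the toy database, and the brute-force enumeration are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of itemsets with support count >= min_support.

    Illustrative only: exponential in the number of items.
    """
    items = sorted(set().union(*transactions))
    frequent = set()
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            if sum(cset <= t for t in transactions) >= min_support:
                frequent.add(cset)
                found = True
        if not found:
            break  # anti-monotonicity: no larger itemset can be frequent
    return frequent


def positive_border(itemsets):
    """Maximal itemsets: those with no proper superset in the collection."""
    return {s for s in itemsets if not any(s < other for other in itemsets)}


def revised_frequent_itemsets(frequent, sensitive):
    """Ideal post-hiding set: drop each sensitive itemset and its supersets."""
    return {s for s in frequent if not any(sens <= s for sens in sensitive)}


# Hypothetical toy database over items a, b, c (assumption for illustration).
transactions = [set("abc"), set("abc"), set("ab"), set("ac"), set("b")]
min_support = 2
sensitive = {frozenset("ab")}

frequent = frequent_itemsets(transactions, min_support)
original_border = positive_border(frequent)
revised_border = positive_border(revised_frequent_itemsets(frequent, sensitive))
```

In this toy case the original positive border is the single itemset {a, b, c}, while hiding {a, b} yields the revised positive border {a, c} and {b, c}. An exact hiding solution, in the sense used above, would be a sanitized database whose frequent itemsets are exactly those lying under the revised border.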