Online clustering of distributed streaming data using belief propagation techniques
Date
2011
Abstract
Extracting patterns from streaming data generated by geographically dispersed devices is a major challenge in data mining. Data become available to the decision maker sequentially and in a distributed fashion, and storage and communication constraints force the decision maker to rely only on recently received data; together, these make keeping track of data evolution nontrivial. We consider a set of distributed nodes that communicate directly with a central location, and we address the problem of clustering distributed streaming data through a two-level clustering approach that uses belief propagation techniques at both levels. At the node level, a batch of data arrives at each time slot, and the goal is to maintain a set of salient data (local exemplars) that best represents the data received up to that slot. At each epoch, the local exemplars from the distributed nodes are sent to the central location, which performs a second-level clustering on them to derive a global data synopsis for the whole system. The local exemplars that emerge from the second-level clustering are fed back to the nodes with modified weights that reflect their importance in the global clustering. In our experiments, the two-level belief propagation-based approach with feedback is well suited to handling data from different nodes: it achieves the same clustering quality as clustering performed at the central location on the raw data sent from the nodes. © 2011 IEEE.
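The two-level flow described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses affinity propagation (a message-passing, exemplar-based clustering method derived from belief propagation) as the clustering step, which the abstract does not name; it handles a single time slot of 1-D data; and the feedback weight (the share of all data points falling in the global cluster a local exemplar joined) is a hypothetical choice. The function names `affinity_propagation`, `local_exemplars`, and `global_feedback` are invented for this sketch.

```python
from statistics import median

def affinity_propagation(points, damping=0.7, iters=200):
    """Return, for each 1-D point, the index of the exemplar it is assigned to."""
    n = len(points)
    if n < 2:
        return list(range(n))
    # Similarity = negative squared distance. The shared preference on the
    # diagonal is the median off-diagonal similarity (an assumed default).
    S = [[-(a - b) ** 2 for b in points] for a in points]
    pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = pref
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k')), damped
        for i in range(n):
            v = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=v.__getitem__)
            second = max(v[k] for k in range(n) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else v[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k});
        # a(k,k) = sum of positive r(i',k), i' != k; both damped
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            others = sum(pos) - pos[k]
            for i in range(n):
                new = others if i == k else min(0.0, R[k][k] + others - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # fallback if the messages did not settle
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]

def local_exemplars(batch):
    """Level 1: summarise one node's batch as (exemplar value, member count)."""
    labels = affinity_propagation(batch)
    return [(batch[k], labels.count(k)) for k in sorted(set(labels))]

def global_feedback(per_node):
    """Level 2: cluster all local exemplars; return (node, value, weight)."""
    pooled = [(node, x, c) for node, exs in per_node.items() for x, c in exs]
    labels = affinity_propagation([x for _, x, _ in pooled])
    total = sum(c for _, _, c in pooled)
    mass = {}  # number of raw data points behind each global cluster
    for lab, (_, _, c) in zip(labels, pooled):
        mass[lab] = mass.get(lab, 0) + c
    # Hypothetical weight: fraction of all points in the exemplar's global cluster.
    return [(node, x, mass[lab] / total)
            for lab, (node, x, _) in zip(labels, pooled)]

# Made-up batches for two nodes, each observing values near 0 and near 10.
batches = {"node_a": [0.0, 0.2, 0.4, 10.0, 10.2, 10.4],
           "node_b": [0.1, 0.3, 0.5, 9.9, 10.1, 10.3]}
per_node = {node: local_exemplars(b) for node, b in batches.items()}
feedback = global_feedback(per_node)
for node, value, weight in feedback:
    print(node, value, round(weight, 3))
```

Only the local exemplars and their counts travel to the central location in this sketch, which mirrors the communication constraint the abstract mentions. A streaming version would also carry the previous slot's weighted exemplars into each new batch.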
Related items
Showing items related by title, author, creator and subject.
- A Scalable Short-Text Clustering Algorithm Using Apache Spark
  Akritidis L., Alamaniotis M., Fevgas A., Bozanis P. (2021)
  Short text clustering deals with the problem of grouping together semantically similar documents with small lengths. Nowadays, huge amounts of text data is being generated by numerous applications such as microblogs, ...
- Distributed clustering in vehicular networks
  Maglaras, L. A.; Katsaros, D. (2012)
  Clustering in vanets is of crucial importance in order to cope with the dynamic features of the vehicular topologies. Algorithms that give good results in Manets fail to create stable clusters since vehicular nodes are ...
- Improving Hierarchical Short Text Clustering through Dominant Feature Learning
  Akritidis L., Alamaniotis M., Fevgas A., Tsompanopoulou P., Bozanis P. (2022)
  This paper focuses on the popular problem of short text clustering. Since the short text documents typically exhibit high degrees of data sparseness and dimensionality, the problem in question is generally considered more ...