Enhancing Clustering of Single-Cell RNA-Seq Data by Proximity Learning on Random Projected Spaces
Date
2019
Language
en
Abstract
Single-cell RNA sequencing technology offers great potential for uncovering cellular differences at higher resolution, shedding light on various complex biological processes and human diseases. However, such studies produce extremely high-dimensional data containing expression profiles for thousands or even millions of cells. Consequently, handling single-cell RNA-seq (scRNA-seq) data is considered a main challenge for unsupervised clustering, which can be used to identify groups of cell types. To this end, we present a framework that enhances hierarchical clustering using Proximity Learning on Random Projected spaces (PLRP). The method's efficiency stems from exploiting distances from multiple spaces of significantly lower dimension, defined by Random Projections, using ensembles of k-nearest neighbor searches. On the transformed data we applied hierarchical agglomerative clustering (HAC), significantly improving its performance compared with using the original space. PLRP was evaluated on a publicly available experimental dataset of scRNA-seq expression profiles against three well-established clustering tools. The results showed that our approach greatly enhances clustering performance, demonstrating its applicability in ultra-high dimensions and motivating further development in this direction. © 2019 IEEE.
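The pipeline the abstract outlines (random projections, an ensemble of k-nearest neighbor searches, then HAC on the learned proximities) can be sketched roughly as follows. This is not the authors' implementation: the abstract does not say how neighbor lists are turned into a proximity, so the vote-counting rule, the average-linkage criterion, every parameter value, and all function names here are assumptions made for illustration, and the input is synthetic rather than real scRNA-seq data.

```python
import math
import random


def random_projection(data, out_dim, rng):
    """Map each row onto `out_dim` Gaussian random directions."""
    in_dim = len(data[0])
    scale = 1.0 / math.sqrt(out_dim)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(out_dim)]
              for _ in range(in_dim)]
    return [[sum(row[i] * matrix[i][j] for i in range(in_dim))
             for j in range(out_dim)] for row in data]


def knn_indices(points, k):
    """Brute-force k nearest neighbours (Euclidean) of every point."""
    neighbours = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j)
                        for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in ranked[:k]])
    return neighbours


def proximity_distance(data, n_proj=20, out_dim=5, k=5, seed=0):
    """Assumed proximity rule (not taken from the paper): count, over all
    projections, how often one cell appears in the other's k-NN list,
    then return 1 - normalised count as a distance."""
    rng = random.Random(seed)
    n = len(data)
    votes = [[0] * n for _ in range(n)]
    for _ in range(n_proj):
        projected = random_projection(data, out_dim, rng)
        for i, js in enumerate(knn_indices(projected, k)):
            for j in js:
                votes[i][j] += 1
                votes[j][i] += 1
    top = 2 * n_proj  # a mutual neighbour pair scores 2 per projection
    return [[0.0 if i == j else 1.0 - votes[i][j] / top for j in range(n)]
            for i in range(n)]


def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering with average linkage (an assumption;
    the abstract only says HAC). Returns one cluster label per point."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j]
                            for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Synthetic stand-in for expression profiles: two groups of 12 "cells"
# with 100 "genes" each, differing only in their mean expression level.
data_rng = random.Random(42)
cells = ([[data_rng.gauss(0.0, 1.0) for _ in range(100)] for _ in range(12)]
         + [[data_rng.gauss(10.0, 1.0) for _ in range(100)] for _ in range(12)])
labels = average_linkage(proximity_distance(cells), n_clusters=2)
print(labels)
```

The brute-force neighbor search and the cubic-time linkage loop keep the sketch dependency-free; at the scale the abstract mentions (thousands to millions of cells) one would presumably swap in approximate nearest-neighbor search and an optimized HAC routine.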
Related items
Showing items related by title, author, creator and subject.
- A Scalable Short-Text Clustering Algorithm Using Apache Spark
  Akritidis L., Alamaniotis M., Fevgas A., Bozanis P. (2021) Short text clustering deals with the problem of grouping together semantically similar documents with small lengths. Nowadays, huge amounts of text data is being generated by numerous applications such as microblogs, ...
- Online clustering of distributed streaming data using belief propagation techniques
  Halkidi, M.; Koutsopoulos, I. (2011) Extraction of patterns out of streaming data that are generated from geographically dispersed devices is a major challenge in data mining. The sequential, distributed fashion in which data become available to the decision ...
- Distributed clustering in vehicular networks
  Maglaras, L. A.; Katsaros, D. (2012) Clustering in vanets is of crucial importance in order to cope with the dynamic features of the vehicular topologies. Algorithms that give good results in Manets fail to create stable clusters since vehicular nodes are ...