Institutional Repository of the University of Thessaly

Confronting Sparseness and High Dimensionality in Short Text Clustering via Feature Vector Projections

Author
Akritidis L., Alamaniotis M., Fevgas A., Bozanis P.
Date
2020
Language
en
DOI
10.1109/ICTAI50040.2020.00129
Keywords
Artificial intelligence
Cluster analysis
Vector spaces
Data clustering algorithm
Experimental evaluation
High dimensional spaces
Normalized mutual information
Real-world datasets
Short-text documents
Split-and-merge operations
Two-stage clustering
Clustering algorithms
IEEE Computer Society
Abstract
Short text clustering is a popular problem that focuses on the unsupervised grouping of similar short text documents, or entitled entities. Since short texts are currently used in a vast number of applications, the problem has become increasingly significant in the past few years. High cluster homogeneity and completeness are two of the most important goals of all data clustering algorithms. However, in the context of short texts, achieving them is particularly difficult, because this type of data is typically represented by sparse vectors that collectively comprise a very high-dimensional space. In this article we introduce VEPHC, a two-stage clustering algorithm designed to confront the sparseness and high dimensionality of short texts. During the first stage (the VEP part), the initial feature vectors are projected onto a lower-dimensional space by constructing and scoring variable-sized combinations of features (that is, terms). In the second stage (the HC part), VEPHC improves the homogeneity and completeness of the generated clusters through split and merge operations that are based on the similarities of all inter-cluster elements. The experimental evaluation of VEPHC on two real-world datasets demonstrates its superior performance over numerous state-of-the-art clustering algorithms in terms of F1 scores and Normalized Mutual Information. © 2020 IEEE.
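The first-stage idea described in the abstract, constructing and scoring variable-sized combinations of terms and projecting each short text onto one of them, can be illustrated with a minimal Python sketch. This is not the authors' VEPHC implementation: the function names `term_combinations` and `project`, the whitespace tokenizer, the `max_size` parameter, and the scoring rule (document frequency of a combination multiplied by its size) are all assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations


def term_combinations(tokens, max_size=2):
    """Yield every combination of 1..max_size distinct terms of one text.

    Terms are sorted first so that the same set of terms always produces
    the same tuple, regardless of word order in the text.
    """
    unique = sorted(set(tokens))
    for size in range(1, max_size + 1):
        yield from combinations(unique, size)


def project(docs, max_size=2):
    """Map each short text to its best-scoring term combination.

    Hypothetical scoring rule (an assumption of this sketch, not taken
    from the paper): score = (number of documents containing the
    combination) * (number of terms in the combination).
    """
    # Naive lowercase/whitespace tokenizer, assumed for illustration.
    tokenized = [doc.lower().split() for doc in docs]

    # Count in how many documents each combination occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(term_combinations(tokens, max_size))

    labels = []
    for tokens in tokenized:
        # max() keeps the first best candidate, so ties are resolved by
        # the deterministic generation order (smaller sizes first).
        best = max(
            term_combinations(tokens, max_size),
            key=lambda combo: doc_freq[combo] * len(combo),
            default=(),
        )
        labels.append(best)
    return labels


if __name__ == "__main__":
    sample = [
        "apple iphone 12 case",
        "iphone 12 apple cover",
        "samsung galaxy s20",
        "galaxy s20 samsung phone",
    ]
    for text, label in zip(sample, project(sample)):
        print(f"{text!r} -> {label}")
```

In this sketch, texts that share a frequent term combination receive the same label, so the sparse bag-of-words representation is replaced by a single low-dimensional key per document. A second stage such as the split and merge refinement mentioned in the abstract would then operate on these initial groups; it is not covered here.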
URI
http://hdl.handle.net/11615/70350
Collections
  • Publications in journals, conferences, book chapters, etc. [19735]

Related items

Showing items related by title, author, creator and subject.

  • A Scalable Short-Text Clustering Algorithm Using Apache Spark

    Akritidis L., Alamaniotis M., Fevgas A., Bozanis P. (2021)
    Short text clustering deals with the problem of grouping together semantically similar documents with small lengths. Nowadays, huge amounts of text data are being generated by numerous applications such as microblogs, ...
  • Online clustering of distributed streaming data using belief propagation techniques

    Halkidi, M.; Koutsopoulos, I. (2011)
    Extraction of patterns out of streaming data that are generated from geographically dispersed devices is a major challenge in data mining. The sequential, distributed fashion in which data become available to the decision ...
  • Distributed clustering in vehicular networks

    Maglaras, L. A.; Katsaros, D. (2012)
    Clustering in VANETs is of crucial importance in order to cope with the dynamic features of the vehicular topologies. Algorithms that give good results in MANETs fail to create stable clusters since vehicular nodes are ...