Confronting Sparseness and High Dimensionality in Short Text Clustering via Feature Vector Projections

Akritidis L., Alamaniotis M., Fevgas A., Bozanis P.

dc.creator	Akritidis L., Alamaniotis M., Fevgas A., Bozanis P.	en
dc.date.accessioned	2023-01-31T07:30:36Z
dc.date.available	2023-01-31T07:30:36Z
dc.date.issued	2020
dc.identifier	10.1109/ICTAI50040.2020.00129
dc.identifier.isbn	9781728192284
dc.identifier.issn	10823409
dc.identifier.uri	http://hdl.handle.net/11615/70350
dc.description.abstract	Short text clustering is a popular problem that focuses on the unsupervised grouping of similar short text documents, or entitled entities. Since short texts are currently utilized in a vast number of applications, the problem has become increasingly significant in the past few years. High cluster homogeneity and completeness are two of the most important goals of all data clustering algorithms. In the context of short texts, however, these goals are particularly difficult to achieve, because this type of data is typically represented by sparse vectors that collectively form a very high dimensional space. In this article we introduce VEPHC, a two-stage clustering algorithm designed to confront the sparseness and high dimensionality of short texts. During the first stage (the VEP part), the initial feature vectors are projected onto a lower dimensional space by constructing and scoring variable-sized combinations of features (that is, terms). In the second stage (the HC part), VEPHC improves the homogeneity and completeness of the generated clusters through split and merge operations that are based on the similarities of all inter-cluster elements. The experimental evaluation of VEPHC on two real-world datasets demonstrates its superior performance over numerous state-of-the-art clustering algorithms in terms of F1 score and Normalized Mutual Information. © 2020 IEEE.	en
dc.language.iso	en	en
dc.source	Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI	en
dc.source.uri	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85098793675&doi=10.1109%2fICTAI50040.2020.00129&partnerID=40&md5=9ed8f68236b20ec369b31ecce52127a4
dc.subject	Artificial intelligence	en
dc.subject	Cluster analysis	en
dc.subject	Vector spaces	en
dc.subject	Data clustering algorithm	en
dc.subject	Experimental evaluation	en
dc.subject	High dimensional spaces	en
dc.subject	Normalized mutual information	en
dc.subject	Real-world datasets	en
dc.subject	Short-text documents	en
dc.subject	Split-and-merge operations	en
dc.subject	Two-stage clustering	en
dc.subject	Clustering algorithms	en
dc.subject	IEEE Computer Society	en
dc.title	Confronting Sparseness and High Dimensionality in Short Text Clustering via Feature Vector Projections	en
dc.type	conferenceItem	en
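The first (VEP) stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the maximum combination size max_k, and the scoring rule (combination length multiplied by the logarithm of its document frequency, and zero for combinations that occur in only one text) are hypothetical choices made for illustration. Each text is projected onto its best-scoring term combination, and texts that share that combination form an initial cluster. The split and merge (HC) stage is not shown.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(text):
    # Hypothetical preprocessing: lowercase, split on whitespace, deduplicate, sort.
    # Sorting makes every combination a canonical tuple, so equal term sets compare equal.
    return sorted(set(text.lower().split()))


def candidate_combinations(tokens, max_k):
    # All term combinations of size 1..max_k ("variable-sized combinations").
    for k in range(1, min(max_k, len(tokens)) + 1):
        yield from combinations(tokens, k)


def project(texts, max_k=3):
    """Map each text to one term combination (a point in the reduced space)."""
    docs = [tokenize(t) for t in texts]

    # Document frequency of every candidate combination across the corpus.
    df = Counter()
    for tokens in docs:
        df.update(set(candidate_combinations(tokens, max_k)))

    def score(combo):
        # Hypothetical score: longer combinations shared by several texts win;
        # a combination found in only one text carries no grouping evidence.
        if df[combo] < 2:
            return 0.0
        return len(combo) * math.log(df[combo])

    projections = []
    for tokens in docs:
        # Ties are broken by comparing the tuples themselves, for determinism.
        best = max(candidate_combinations(tokens, max_k),
                   key=lambda c: (score(c), c), default=())
        projections.append(best)
    return projections


def initial_clusters(texts, max_k=3):
    """Group text indices by their projected combination."""
    clusters = defaultdict(list)
    for i, combo in enumerate(project(texts, max_k)):
        clusters[combo].append(i)
    return dict(clusters)
```

For example, initial_clusters(["apple iphone 11 case", "apple iphone 11 black case", "samsung galaxy s10 case", "samsung galaxy s10 cover"]) groups texts 0 and 1 under ("apple", "case", "iphone") and texts 2 and 3 under ("galaxy", "s10", "samsung"). Enumerating every combination grows combinatorially with the number of terms per text, which stays manageable here only because short texts contain few terms and max_k is small.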

Files in this item

There are no files associated with this item.

This item appears in the following collection(s)

Publications in journals, conferences, book chapters, etc. [19735]

Related items

A Scalable Short-Text Clustering Algorithm Using Apache Spark

Online clustering of distributed streaming data using belief propagation techniques

Distributed clustering in vehicular networks