Improving Hierarchical Short Text Clustering through Dominant Feature Learning

Akritidis L., Alamaniotis M., Fevgas A., Tsompanopoulou P., Bozanis P.

dc.creator	Akritidis L., Alamaniotis M., Fevgas A., Tsompanopoulou P., Bozanis P.	en
dc.date.accessioned	2023-01-31T07:30:36Z
dc.date.available	2023-01-31T07:30:36Z
dc.date.issued	2022
dc.identifier	10.1142/S0218213022500348
dc.identifier.issn	02182130
dc.identifier.uri	http://hdl.handle.net/11615/70351
dc.description.abstract	This paper focuses on the popular problem of short text clustering. Since the short text documents typically exhibit high degrees of data sparseness and dimensionality, the problem in question is generally considered more challenging than the traditional clustering scenarios. Our proposed solution, named VEPH, is based on a novel algorithm that was published recently with the aim of optimally clustering short text documents. VEPH includes two stages: During the first stage, the original text vectors are projected on a lower dimensional space and the documents with projection vectors lying on the same dimensional space are grouped in the same cluster. The second stage is a refinement process which attempts to improve the quality of the clusters that were generated during the previous stage. The quality of a cluster is determined by its homogeneity and completeness and these are the two primary design criteria of this stage. Initially VEPH cleanses the clusters by removing all dissimilar elements, and then, it iteratively merges the similar clusters in a hierarchical agglomerative manner. The proposed algorithm has been experimentally evaluated in terms of F1 and NMI, by employing three datasets with diverse attributes. The results demonstrated its superiority over other state-of-the-art works of the relevant literature. © 2022 World Scientific Publishing Company.	en
dc.language.iso	en	en
dc.source	International Journal on Artificial Intelligence Tools	en
dc.source.uri	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85136201674&doi=10.1142%2fS0218213022500348&partnerID=40&md5=925e43ceb1900432afdde91235ebb3ba
dc.subject	Cluster analysis	en
dc.subject	Iterative methods	en
dc.subject	Clusterings	en
dc.subject	Data dimensionality	en
dc.subject	Data sparseness	en
dc.subject	Feature learning	en
dc.subject	Machine-learning	en
dc.subject	Short text clustering	en
dc.subject	Short texts	en
dc.subject	Short-text documents	en
dc.subject	Text Clustering	en
dc.subject	Traditional clustering	en
dc.subject	Vector spaces	en
dc.subject	World Scientific	en
dc.title	Improving Hierarchical Short Text Clustering through Dominant Feature Learning	en
dc.type	journalArticle	en

Files in this item

There are no files associated with this item.

This item appears in the following collection:

Publications in journals, conferences, book chapters, etc.

Related items

A Scalable Short-Text Clustering Algorithm Using Apache Spark

Online clustering of distributed streaming data using belief propagation techniques

Distributed clustering in vehicular networks