Effective Unsupervised Matching of Product Titles with k-Combinations and Permutations

Akritidis L., Bozanis P.

dc.creator	Akritidis L., Bozanis P.	en
dc.date.accessioned	2023-01-31T07:30:37Z
dc.date.available	2023-01-31T07:30:37Z
dc.date.issued	2018
dc.identifier	10.1109/INISTA.2018.8466294
dc.identifier.isbn	9781538651506
dc.identifier.uri	http://hdl.handle.net/11615/70352
dc.description.abstract	The problem of matching product titles is of particular interest to both users and marketers. Users frequently search the Web to compare prices and characteristics, or to obtain and aggregate information provided by other users. Marketers often require broad knowledge of competitive policies, prices, and features in order to organize a promotional campaign for a group of products. To address this problem, recent studies have attempted to enrich the product titles by exploiting Web search engines. More specifically, these methods submit a query for each product title. After the results have been collected, the most important words appearing in them are identified and appended to the title. Subsequently, each word is assigned an importance score and, finally, a similarity measure is applied to determine whether two or more titles refer to the same product. Nonetheless, these methods suffer from several problems, including limited scalability, slow retrieval of the required additional search results, and lack of flexibility. In this paper, we present a different approach which addresses all these issues and is based on the morphological analysis of the product titles. In particular, our method operates in two phases. In the first phase, we compute the combinations of the words of the titles and record several statistics, such as word proximity and frequency values. In the second phase, we use this information to assign a score to each combination. The highest-scoring combination is then declared the label of the cluster that contains each product. The experimental evaluation of the algorithm on a real-world dataset demonstrated that, compared to three popular string similarity metrics, our approach achieves up to 36% better matching performance and at least 13 times faster execution. © 2018 IEEE.	en
dc.language.iso	en	en
dc.source	2018 IEEE (SMC) International Conference on Innovations in Intelligent Systems and Applications, INISTA 2018	en
dc.source.uri	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85055473467&doi=10.1109%2fINISTA.2018.8466294&partnerID=40&md5=717826b6aaecb27ecdcccc14e8eea38d
dc.subject	Algorithms	en
dc.subject	Costs	en
dc.subject	Intelligent systems	en
dc.subject	Search engines	en
dc.subject	Unsupervised learning	en
dc.subject	Entity matching	en
dc.subject	Experimental evaluation	en
dc.subject	Matching performance	en
dc.subject	Morphological analysis	en
dc.subject	products matching	en
dc.subject	Promotional campaign	en
dc.subject	Similarity measure	en
dc.subject	String similarity	en
dc.subject	Data mining	en
dc.subject	Institute of Electrical and Electronics Engineers Inc.	en
dc.title	Effective Unsupervised Matching of Product Titles with k-Combinations and Permutations	en
dc.type	conferenceItem	en
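
The two-phase method summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenizer, the `max_k` limit, and the scoring formula (squared combination length times the number of titles containing the combination) are assumptions made purely for illustration, and the word-proximity statistics and permutations mentioned in the record are omitted.

```python
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(title):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return title.lower().split()


def title_combinations(tokens, max_k):
    """Yield each k-combination (1 <= k <= max_k) of the distinct tokens.

    Tokens are sorted first, so the same set of words always yields the
    same tuple regardless of the word order in the title.
    """
    distinct = sorted(set(tokens))
    for k in range(1, min(max_k, len(distinct)) + 1):
        yield from combinations(distinct, k)


def cluster_titles(titles, max_k=3):
    """Group titles under their highest-scoring word combination."""
    # Phase 1: enumerate the combinations of every title and count in how
    # many titles each combination occurs.
    frequency = Counter()
    per_title = []
    for title in titles:
        combos = list(title_combinations(tokenize(title), max_k))
        per_title.append(combos)
        frequency.update(combos)

    # Phase 2: score every combination and use the best one as the label.
    def score(combo):
        # Hypothetical score: favour long combinations shared by many titles.
        return len(combo) ** 2 * frequency[combo]

    clusters = defaultdict(list)
    for title, combos in zip(titles, per_title):
        # Ties are broken by tuple order so the choice is deterministic.
        label = max(combos, key=lambda c: (score(c), c))
        clusters[label].append(title)
    return dict(clusters)
```

Calling `cluster_titles` on a list of title strings returns a dictionary that maps each label (a tuple of words) to the titles assigned to it. Because no pairwise title comparison is performed, the cost is driven by the number of combinations per title, which is why `max_k` is kept small in this sketch.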


This item appears in the following collections

Publications in journals, conferences, book chapters, etc. [19705]
