Logo
    • English
    • Ελληνικά
    • Deutsch
    • français
    • italiano
    • español
  • Ελληνικά 
    • English
    • Ελληνικά
    • Deutsch
    • français
    • italiano
    • español
  • Σύνδεση
Προβολή τεκμηρίου 
  •   Ιδρυματικό Αποθετήριο Πανεπιστημίου Θεσσαλίας
  • Επιστημονικές Δημοσιεύσεις Μελών ΠΘ (ΕΔΠΘ)
  • Δημοσιεύσεις σε περιοδικά, συνέδρια, κεφάλαια βιβλίων κλπ.
  • Προβολή τεκμηρίου
  •   Ιδρυματικό Αποθετήριο Πανεπιστημίου Θεσσαλίας
  • Επιστημονικές Δημοσιεύσεις Μελών ΠΘ (ΕΔΠΘ)
  • Δημοσιεύσεις σε περιοδικά, συνέδρια, κεφάλαια βιβλίων κλπ.
  • Προβολή τεκμηρίου
JavaScript is disabled for your browser. Some features of this site may not work without it.
Ιδρυματικό Αποθετήριο Πανεπιστημίου Θεσσαλίας
Όλο το DSpace
  • Κοινότητες & Συλλογές
  • Ανά ημερομηνία δημοσίευσης
  • Συγγραφείς
  • Τίτλοι
  • Λέξεις κλειδιά

Effective Unsupervised Matching of Product Titles with k-Combinations and Permutations

Thumbnail
Συγγραφέας
Akritidis L., Bozanis P.
Ημερομηνία
2018
Γλώσσα
en
DOI
10.1109/INISTA.2018.8466294
Λέξη-κλειδί
Algorithms
Costs
Intelligent systems
Search engines
Unsupervised learning
Entity matching
Experimental evaluation
Matching performance
Morphological analysis
products matching
Promotional campaign
Similarity measure
String similarity
Data mining
Institute of Electrical and Electronics Engineers Inc.
Εμφάνιση Μεταδεδομένων
Επιτομή
The problem of matching product titles is of particular interest for both users and marketers. The former, frequently search the Web with the aim of comparing prices and characteristics, or obtaining and aggregating information provided by other users. The latter, often require wide knowledge of competitive policies, prices and features to organize a promotional campaign about a group of products. To address this interesting problem, recent studies have attempted to enrich the product titles by exploiting Web search engines. More specifically, these methods suggest that for each product title a query should be submitted. After the results have been collected, the most important words which appear in the results are identified and appended in the titles. In the sequel, each word is assigned an importance score and finally, a similarity measure is applied to identify if two or more titles refer to the same product. Nonetheless, these methods have multiple problems including scalability, slow retrieval of the required additional search results, and lack of flexibility. In this paper, we present a different approach which addresses all these issues and is based on the morphological analysis of the titles of the products. In particular, our method operates in two phases. In the first phase, we compute the combinations of the words of the titles and we record several statistics such as word proximity and frequency values. In the second phase, we use this information to assign a score to each combination. The highest scoring combination is then declared as label of the cluster which contains each product. The experimental evaluation of the algorithm, in a real world dataset, demonstrated that compared to three popular string similarity metrics, our approach achieves up to 36% better matching performance and at least 13 times faster execution. © 2018 IEEE.
URI
http://hdl.handle.net/11615/70352
Collections
  • Δημοσιεύσεις σε περιοδικά, συνέδρια, κεφάλαια βιβλίων κλπ. [19735]
htmlmap 

 

Πλοήγηση

Όλο το DSpaceΚοινότητες & ΣυλλογέςΑνά ημερομηνία δημοσίευσηςΣυγγραφείςΤίτλοιΛέξεις κλειδιάΑυτή η συλλογήΑνά ημερομηνία δημοσίευσηςΣυγγραφείςΤίτλοιΛέξεις κλειδιά

Ο λογαριασμός μου

ΣύνδεσηΕγγραφή (MyDSpace)
Πληροφορίες-Επικοινωνία
ΑπόθεσηΣχετικά μεΒοήθειαΕπικοινωνήστε μαζί μας
Επιλογή ΓλώσσαςΌλο το DSpace
EnglishΕλληνικά
htmlmap