dc.creator: Elmagarmid, A. K. [en]
dc.creator: Ipeirotis, P. G. [en]
dc.creator: Verykios, V. S. [en]
dc.date.accessioned: 2015-11-23T10:26:17Z
dc.date.available: 2015-11-23T10:26:17Z
dc.date.issued: 2007
dc.identifier: 10.1109/tkde.2007.250581
dc.identifier.issn: 1041-4347
dc.identifier.uri: http://hdl.handle.net/11615/27362
dc.description.abstract: Often, in the real world, entities have two or more representations in databases. Duplicate records do not share a common key and/or they contain errors that make duplicate matching a difficult task. Errors are introduced as the result of transcription errors, incomplete information, lack of standard formats, or any combination of these factors. In this paper, we present a thorough analysis of the literature on duplicate record detection. We cover similarity metrics that are commonly used to detect similar field entries, and we present an extensive set of duplicate detection algorithms that can detect approximately duplicate records in a database. We also cover multiple techniques for improving the efficiency and scalability of approximate duplicate detection algorithms. We conclude with coverage of existing tools and with a brief discussion of the big open problems in the area. [en]
dc.source.uri: <Go to ISI>://WOS:000242041400001
dc.subject: duplicate detection [en]
dc.subject: data cleaning [en]
dc.subject: data integration [en]
dc.subject: record linkage [en]
dc.subject: data deduplication [en]
dc.subject: instance identification [en]
dc.subject: database hardening [en]
dc.subject: name [en]
dc.subject: matching [en]
dc.subject: identity uncertainty [en]
dc.subject: entity resolution [en]
dc.subject: fuzzy duplicate [en]
dc.subject: detection [en]
dc.subject: entity matching [en]
dc.subject: DECISION-MODEL [en]
dc.subject: LINKAGE [en]
dc.subject: INFORMATION [en]
dc.subject: IDENTIFICATION [en]
dc.subject: INTEGRATION [en]
dc.subject: DOCUMENTS [en]
dc.subject: SEQUENCE [en]
dc.subject: DISTANCE [en]
dc.subject: LINKING [en]
dc.subject: ERRORS [en]
dc.subject: Computer Science, Artificial Intelligence [en]
dc.subject: Computer Science, Information [en]
dc.subject: Systems [en]
dc.subject: Engineering, Electrical & Electronic [en]
dc.title: Duplicate record detection: A survey [en]
dc.type: journalArticle [en]


Files in this item

There are no files associated with this item.
