dc.creator: Kolias V., Anagnostopoulos I., Zeadally S. [en]
dc.date.accessioned: 2023-01-31T08:43:36Z
dc.date.available: 2023-01-31T08:43:36Z
dc.date.issued: 2018
dc.identifier: 10.1093/comjnl/bxx098
dc.identifier.issn: 00104620
dc.identifier.uri: http://hdl.handle.net/11615/74972
dc.description.abstract: The Web has been identified to consist of a large portion of content that cannot be crawled by general-purpose search engines because it is only generated after a valid submission to a search interface. Accessing such content, however, requires the location and identification of search interfaces. Towards the automation of this task, many approaches have been proposed that involve the manual definition of rules for the identification of query interfaces. In this paper, we propose a rule induction approach to automatically construct a set of rules by searching the most promising subspace of all possible rules with a brute-force method and information theoretic criteria. To specify the features for the rules, we initially make a descriptive analysis of Yahoo L11, a specialized dataset containing complex interfaces, which to the best of our knowledge has not been used in previous works. We perform a series of evaluations and present the rules constructed by running the algorithm on a random sample of the Yahoo L11 dataset and another dataset used in similar works. The resulting rules yield high classification accuracy in predicting the functionality of new, previously unseen forms and since humans can easily interpret them, they can be easily ported to any application as-is. © 2018 The British Computer Society. All rights reserved. [en]
dc.language.iso: en [en]
dc.source: Computer Journal [en]
dc.source.uri: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85046767175&doi=10.1093%2fcomjnl%2fbxx098&partnerID=40&md5=459422495522b13bafa25c0fe9cc1d88
dc.subject: Search engines [en]
dc.subject: Classification accuracy [en]
dc.subject: Complex interface [en]
dc.subject: Deep web [en]
dc.subject: Descriptive analysis [en]
dc.subject: General-purpose search engines [en]
dc.subject: Information theoretic criterion [en]
dc.subject: Rule induction [en]
dc.subject: Search interfaces [en]
dc.subject: Information theory [en]
dc.subject: Oxford University Press [en]
dc.title: Structural analysis and classification of search interfaces for the deep web [en]
dc.type: journalArticle [en]


Files in this item

There are no files associated with this item.
