
Structural analysis and classification of search interfaces for the deep web

Author
Kolias V., Anagnostopoulos I., Zeadally S.
Date
2018
Language
en
DOI
10.1093/comjnl/bxx098
Subject
Search engines
Classification accuracy
Complex interface
Deep web
Descriptive analysis
General-purpose search engines
Information theoretic criterion
Rule induction
Search interfaces
Information theory
Oxford University Press
Abstract
The Web has been identified to consist of a large portion of content that cannot be crawled by general-purpose search engines because it is only generated after a valid submission to a search interface. Accessing such content, however, requires the location and identification of search interfaces. Towards the automation of this task, many approaches have been proposed that involve the manual definition of rules for the identification of query interfaces. In this paper, we propose a rule induction approach to automatically construct a set of rules by searching the most promising subspace of all possible rules with a brute-force method and information theoretic criteria. To specify the features for the rules, we initially make a descriptive analysis of Yahoo L11, a specialized dataset containing complex interfaces, which to the best of our knowledge has not been used in previous works. We perform a series of evaluations and present the rules constructed by running the algorithm on a random sample of the Yahoo L11 dataset and another dataset used in similar works. The resulting rules yield high classification accuracy in predicting the functionality of new, previously unseen forms and since humans can easily interpret them, they can be easily ported to any application as-is. © 2018 The British Computer Society. All rights reserved.
URI
http://hdl.handle.net/11615/74972
Collections
  • Δημοσιεύσεις σε περιοδικά, συνέδρια, κεφάλαια βιβλίων κλπ. (Publications in journals, conferences, book chapters, etc.) [19735]