Optimal Web Page Download Scheduling Policies for Green Web Crawling

Authors
Hatzi V., Cambazoglu B.B., Koutsopoulos I.
Date
2016
Language
English
DOI
10.1109/JSAC.2016.2520246
Subjects
Carbon footprint
Energy utilization
Environmental impact
Green computing
HTTP
Optimization
Web services
Websites
Carbon emissions
Computational resources
Crawling
Greenness
Multiple threads
Optimal policies
Real data sets
Scheduling policies
Web crawler
Publisher
Institute of Electrical and Electronics Engineers Inc.
Abstract
A web crawler is responsible for discovering and downloading new pages on the Web as well as refreshing previously downloaded pages. During these operations, the crawler issues a large number of HTTP requests to web servers. These requests increase the energy consumption and carbon footprint of the web servers, because serving them consumes computational resources. In this work, we introduce the problem of green web crawling, where the objective is to devise a page refresh policy that minimizes the total staleness of pages in a web crawler's repository, subject to a constraint on the carbon emissions caused by processing on the web servers. For the case of one web server and one crawling thread, the optimal policy turns out to be greedy: at each iteration, the page to be refreshed is selected based on a metric that considers the page's staleness, its size, and the greenness of the energy consumed at the web server premises. We then extend the optimal policy to the cases of 1) many servers; 2) multiple threads; and 3) pages with variable freshness requirements. We conduct simulations on a real data set involving a large collection of web servers that host around two billion pages. We present experimental results for the optimal page refresh policy as well as for various heuristics, in order to study the effect of different factors on performance. © 2016 IEEE.
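The abstract describes the single-server, single-thread policy only at a high level: a greedy choice driven by staleness, page size, and the greenness of the server's energy. The sketch below illustrates that kind of greedy selection for one scheduling round under a carbon budget. It is not the authors' metric or model. The `Page` fields, the linear cost `size_bytes * carbon_per_byte`, and the score "staleness removed per unit of carbon" are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    url: str
    staleness: float        # hypothetical: staleness removed if the page is refreshed now
    size_bytes: int         # bytes the web server must process to serve the page
    carbon_per_byte: float  # hypothetical: emissions per byte at the hosting server (lower = greener)


def carbon_cost(page: Page) -> float:
    """Assumed model: emissions of one refresh grow linearly with page size."""
    return page.size_bytes * page.carbon_per_byte


def score(page: Page) -> float:
    """Hypothetical greedy metric: staleness removed per unit of carbon emitted."""
    cost = carbon_cost(page)
    return float("inf") if cost == 0 else page.staleness / cost


def greedy_refresh_schedule(pages: List[Page], carbon_budget: float) -> List[str]:
    """One scheduling round (single server, single thread, as a sketch):
    repeatedly refresh the highest-scoring page that still fits in the budget."""
    remaining = carbon_budget
    candidates = list(pages)
    schedule: List[str] = []
    while candidates:
        affordable = [p for p in candidates if carbon_cost(p) <= remaining]
        if not affordable:
            break  # no remaining page can be refreshed within the carbon budget
        best = max(affordable, key=score)
        schedule.append(best.url)
        remaining -= carbon_cost(best)
        candidates.remove(best)
    return schedule


if __name__ == "__main__":
    pages = [
        Page("a", staleness=10.0, size_bytes=100, carbon_per_byte=0.5),  # cost 50, score 0.2
        Page("b", staleness=4.0, size_bytes=10, carbon_per_byte=1.0),    # cost 10, score 0.4
        Page("c", staleness=9.0, size_bytes=200, carbon_per_byte=0.1),   # cost 20, score 0.45
    ]
    print(greedy_refresh_schedule(pages, carbon_budget=35.0))  # ['c', 'b']
```

With a budget of 35, page `c` is chosen first because its green hosting gives it the best staleness-per-carbon ratio, then `b`; page `a` no longer fits. Extending this to many servers, multiple threads, or per-page freshness requirements, as the paper does, would need a richer model than this sketch assumes.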
URI
http://hdl.handle.net/11615/73931
Collections
  • Publications in journals, conferences, book chapters, etc. (Δημοσιεύσεις σε περιοδικά, συνέδρια, κεφάλαια βιβλίων κλπ.)