Optimal Web Page Download Scheduling Policies for Green Web Crawling

Authors
Hatzi V., Cambazoglu B.B., Koutsopoulos I.
Date
2016
Language
en
DOI
10.1109/JSAC.2016.2520246
Keywords
Carbon footprint
Energy utilization
Environmental impact
Green computing
HTTP
Optimization
Web services
Websites
Carbon emissions
Computational resources
Crawling
Greenness
Multiple threads
Optimal policies
Real data sets
Scheduling policies
Web crawler
Institute of Electrical and Electronics Engineers Inc.
Abstract
A web crawler is responsible for discovering and downloading new pages on the Web as well as refreshing previously downloaded pages. During these operations, the crawler issues a large number of HTTP requests to web servers. These requests increase the energy consumption and carbon footprint of the web servers since computational resources are used while serving the requests. In this work, we introduce the problem of green web crawling, where the objective is to devise a page refresh policy that minimizes the total staleness of pages in the repository of a web crawler, subject to a constraint on the amount of carbon emissions due to the processing on web servers. For the case of one web server and one crawling thread, the optimal policy turns out to be a greedy one. At each iteration, the page to be refreshed is selected based on a metric that considers the page's staleness, its size, and the greenness of the energy consumed at the web server premises. We then extend the optimal policy to the cases of 1) many servers; 2) multiple threads; and 3) pages with variable freshness requirements. We conduct simulations on a real data set that involves a large web server collection hosting around two billion pages. We present experimental results for the optimal page refresh policy as well as for various heuristics, in an effort to study the effect of different factors on performance. © 2016 IEEE.
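The greedy selection described in the abstract can be illustrated with a small sketch. The abstract does not state the metric itself, so the score used here (staleness divided by the carbon cost of one download) is only an assumed stand-in, and the names `Page`, `refresh_cost`, `score`, and `greedy_refresh` are hypothetical. The sketch also covers only a single round against a fixed carbon budget, whereas the paper's policy runs over many iterations.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    staleness: float      # how out of date the stored copy is (arbitrary units)
    size_kb: float        # amount of data transferred on a refresh
    carbon_per_kb: float  # emissions per KB at the hosting server (lower is greener)


def refresh_cost(page: Page) -> float:
    """Assumed carbon cost of downloading the page once."""
    return page.size_kb * page.carbon_per_kb


def score(page: Page) -> float:
    """Assumed metric: staleness removed per unit of carbon emitted.

    This is an illustrative choice, not the metric derived in the paper.
    """
    cost = refresh_cost(page)
    return float("inf") if cost == 0 else page.staleness / cost


def greedy_refresh(pages: list[Page], carbon_budget: float) -> tuple[list[str], float]:
    """Pick pages in decreasing score order while the carbon budget allows."""
    remaining = carbon_budget
    chosen = []
    for page in sorted(pages, key=score, reverse=True):
        cost = refresh_cost(page)
        if cost <= remaining:
            chosen.append(page.url)
            remaining -= cost
    return chosen, remaining
```

Under this assumed metric, a small, stale page on a server powered by greener energy is refreshed before a large page on a carbon-intensive server, which matches the trade-off the abstract describes.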
URI
http://hdl.handle.net/11615/73931
Collections
  • Publications in journals, conferences, book chapters, etc. [19735]