
dc.creator: Hatzi V., Cambazoglu B.B., Koutsopoulos I. [en]
dc.date.accessioned: 2023-01-31T08:28:00Z
dc.date.available: 2023-01-31T08:28:00Z
dc.date.issued: 2016
dc.identifier: 10.1109/JSAC.2016.2520246
dc.identifier.issn: 07338716
dc.identifier.uri: http://hdl.handle.net/11615/73931
dc.description.abstract: A web crawler is responsible for discovering and downloading new pages on the Web as well as refreshing previously downloaded pages. During these operations, the crawler issues a large number of HTTP requests to web servers. These requests increase the energy consumption and carbon footprint of the web servers since computational resources are used while serving the requests. In this work, we introduce the problem of green web crawling, where the objective is to devise a page refresh policy that minimizes the total staleness of pages in the repository of a web crawler, subject to a constraint on the amount of carbon emissions due to the processing on web servers. For the case of one web server and one crawling thread, the optimal policy turns out to be a greedy one. At each iteration, the page to be refreshed is selected based on a metric that considers the page's staleness, its size, and the greenness of the energy consumed at the web server premises. We then extend the optimal policy to the cases of 1) many servers; 2) multiple threads; and 3) pages with variable freshness requirements. We conduct simulations on a real data set that involves a large web server collection hosting around two billion pages. We present experimental results for the optimal page refresh policy as well as for various heuristics, in an effort to study the effect of different factors on performance. © 2016 IEEE. [en]
dc.language.iso: en [en]
dc.source: IEEE Journal on Selected Areas in Communications [en]
dc.source.uri: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84976353333&doi=10.1109%2fJSAC.2016.2520246&partnerID=40&md5=4a46ac4c43927020f1245dfb650ca8de
dc.subject: Carbon footprint [en]
dc.subject: Energy utilization [en]
dc.subject: Environmental impact [en]
dc.subject: Green computing [en]
dc.subject: HTTP [en]
dc.subject: Optimization [en]
dc.subject: Web services [en]
dc.subject: Websites [en]
dc.subject: Carbon emissions [en]
dc.subject: Computational resources [en]
dc.subject: Crawling [en]
dc.subject: Greenness [en]
dc.subject: Multiple threads [en]
dc.subject: Optimal policies [en]
dc.subject: Real data sets [en]
dc.subject: Scheduling policies [en]
dc.subject: Web crawler [en]
dc.subject: Institute of Electrical and Electronics Engineers Inc. [en]
dc.title: Optimal Web Page Download Scheduling Policies for Green Web Crawling [en]
dc.type: conferenceItem [en]


Files in this item

There are no files associated with this item.
