Show simple item record

dc.creator: Hatzi, V. [en]
dc.creator: Cambazoglu, B. B. [en]
dc.creator: Koutsopoulos, I. [en]
dc.date.accessioned: 2015-11-23T10:29:59Z
dc.date.available: 2015-11-23T10:29:59Z
dc.date.issued: 2014
dc.identifier: 10.1109/SOFTCOM.2014.7039136
dc.identifier.isbn: 9789532900521
dc.identifier.uri: http://hdl.handle.net/11615/28451
dc.description.abstract: A web crawler is responsible for discovering new pages on the Web and for refreshing the content of pages it has already downloaded. During these operations, it can issue a huge number of page download requests to web servers. These requests, in turn, increase the energy consumption of the servers, since hardware resources are used to serve the requested pages, which has the side effect of increasing the servers' carbon footprint. In this work, we introduce the problem of green web crawling from a set of remote web servers, where the goal is to reduce the carbon footprint incurred by a large-scale web crawler. We consider a scenario in which both the freshness of downloaded pages and the carbon emissions at remote servers need to be taken into account. We present various heuristics for prioritizing page download requests as a means of studying the relative importance of different parameters. We conduct experiments on a real data set comprising a large server collection with two billion pages. The results indicate that the carbon footprint generated by a crawler during its external operations can be considerably reduced without compromising the freshness of pages. Our work provides design guidelines for large-scale commercial search engine companies, which need to comply with certain greenness regulations. © 2014 FESB, University of Split. [en]
dc.source.uri: http://www.scopus.com/inward/record.url?eid=2-s2.0-84933558291&partnerID=40&md5=790d358cddcd7fbac3386c016d1c9b24
dc.subject: Carbon footprint [en]
dc.subject: Energy utilization [en]
dc.subject: Environmental impact [en]
dc.subject: Search engines [en]
dc.subject: Websites [en]
dc.subject: Carbon emissions [en]
dc.subject: Engine companies [en]
dc.subject: Hardware resources [en]
dc.subject: Real data sets [en]
dc.subject: Remote servers [en]
dc.subject: Scheduling policies [en]
dc.subject: Web crawlers [en]
dc.subject: Web Crawling [en]
dc.subject: Social networking (online) [en]
dc.title: Web page download scheduling policies for green web crawling [en]
dc.type: conferenceItem [en]


Files in this item

There are no files associated with this item.

This item appears in the following collections
