• Exploratory analysis of a terabyte scale web corpus 

      Kolias, V.; Anagnostopoulos, I.; Kayafas, E. (2014)
      In this paper we present a preliminary analysis over the largest publicly accessible web dataset: the Common Crawl Corpus. We measure nine web characteristics from two levels of granularity using MapReduce and we comment ...