hosay123 571 days ago | link | parent

Can you tell us a little more about what the 'ranking metadata' is? There's not much to go on from the announcement. It's also not clear whether the data is available only for Common Crawl's operational purposes, or whether it's intended to become an integral part of the public data set.

greglindahl 571 days ago | link

The ranking metadata consists of domain ranks, URL ranks, and booleans for whether blekko considers the domain or URL to be webspam or porn. This list will expand in the future.

The data is currently available for Common Crawl's operational purposes, and is eventually going to be part of Common Crawl's public dataset. We're currently ironing out a useful format for making it efficiently accessible, compatible with some other metadata which Common Crawl is planning on making available.

