Can you tell us a little more about what the 'ranking metadata' is? There's not much to go on from the announcement. It's also not clear whether the data is available only for Common Crawl's operational purposes or whether it's intended to become an integral part of the public data set.

The ranking metadata consists of domain ranks, URL ranks, and booleans for whether blekko considers the domain or URL to be webspam or porn. This list will expand in the future.

The data is currently available for Common Crawl's operational purposes, and will eventually be part of Common Crawl's public dataset. We're ironing out a format that makes it efficiently accessible and compatible with some other metadata that Common Crawl plans to make available.
