
There are ~40k words in English. You don't need to store the full URL, only a hash of it. The words could similarly be hashed, with the most frequent words assigned the smallest values.
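A minimal sketch of that frequency-ranked coding in Python; the counts below are placeholder values, not a real corpus:

    # Toy sketch: rank words by frequency so the most common words get the
    # smallest codes. Any real corpus frequency table would slot in the same way.
    word_freq = {"the": 500, "of": 300, "and": 250, "hash": 3}

    # Sort by descending count and number the words 0, 1, 2, ...
    codes = {w: rank for rank, (w, _) in
             enumerate(sorted(word_freq.items(), key=lambda kv: -kv[1]))}

    print(codes["the"])   # 0 -- the most frequent word gets the smallest value
    print(codes["hash"])  # 3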

There are slightly shy of 2 billion websites worldwide, of which about 200 million are active. A 32-bit integer could index every site, with a further hash for paths within a site.

http://www.internetlivestats.com/total-number-of-websites/
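One way that could look in code, as a sketch only: a 4-byte site index plus a 4-byte hash of the path. The site table and the truncated SHA-256 are assumptions here, and collisions aren't handled.

    import hashlib, struct

    # Hypothetical table mapping hostnames to 32-bit indices; ~200M active
    # sites fit comfortably in the 4 billion values a 32-bit integer allows.
    site_index = {"news.ycombinator.com": 1, "example.org": 2}

    def compact_ref(host, path):
        """Pack a URL into 8 bytes: 4-byte site index + 4-byte path hash."""
        idx = struct.pack(">I", site_index[host])
        path_hash = hashlib.sha256(path.encode()).digest()[:4]
        return idx + path_hash

    print(compact_ref("news.ycombinator.com", "/item?id=123").hex())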

There were 30 trillion unique URLs as of 2012

In August 2012, Amit Singhal, Senior Vice President at Google and responsible for the development of Google Search, disclosed that Google's search engine found more than 30 trillion unique URLs on the Web, crawls 20 billion sites a day, and processes 100 billion searches every month [2] (which translates to 3.3 billion searches per day and over 38,000 per second).

http://www.internetlivestats.com/google-search-statistics/
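The per-day and per-second figures follow directly from the monthly number:

    # Back-of-envelope check of the quoted rates.
    searches_per_month = 100e9
    per_day = searches_per_month / 30          # ~3.3 billion
    per_second = per_day / (24 * 60 * 60)      # ~38,000
    print(f"{per_day:.1e} per day, {per_second:,.0f} per second")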

There are terabyte MicroSD cards, so this looks viable.
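A rough storage check, assuming just a 4-byte index per site and ignoring per-URL path hashes:

    active_sites = 200e6
    all_sites = 2e9
    print(active_sites * 4 / 1e9, "GB for active sites")  # 0.8 GB
    print(all_sites * 4 / 1e9, "GB for all sites")        # 8.0 GB

Either way it fits on a terabyte card with room to spare.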




> There are ~40k words in English.

The Second Edition of the OED had 171k+ full entries for words in current use, 47k+ for obsolete words, and ~9,500 sub-entries.

And there have been a number of supplements since.

40k is low by a significant multiple.


For key-value lookup, the rough magnitudes are 10^4-10^5 (words) vs. 10^8 (active sites).

This is OOM-level analysis, not high-precision estimation.

Though I appreciate the correction.
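For what it's worth, the key width only grows with log2 of the count, which is why the correction doesn't change the picture much. A quick sketch using the figures in this thread:

    import math

    # Bits needed to index each collection, order-of-magnitude style.
    for label, n in [("40k words", 4e4), ("171k+ OED current-use entries", 1.7e5),
                     ("active sites", 2e8)]:
        print(f"{label}: {math.ceil(math.log2(n))} bits")
    # 40k words: 16 bits, 171k+ OED entries: 18 bits, active sites: 28 bits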




