Ask HN: Storing millions and billions of URLs?
12 points by gerenuk on May 4, 2018 | 10 comments
Hello Everyone!

We currently use Elasticsearch to store metadata and other raw data, but only at a small scale: around 500,000 domains.

I have been tasked with scaling it to 20-40 million domains and storing their internal and external links, while also computing a PageRank/domain authority score for each domain we add to our database.

What would you suggest or recommend for storing this data at very large scale? Storing each page's internal and external links will push the database to somewhere between 100M and 1B links.

Any kind of feedback/suggestion would be appreciated.

Thanks.




I don't think that any proper database technology will have issues with that amount of data. It all depends on how you use it.
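A rough back-of-envelope sketch of why the raw volume is manageable. The 8-byte integer IDs and the 80-byte average URL length are assumptions for illustration, not measurements, and real storage engines add index and per-row overhead on top:

```python
# Raw size of a link graph stored as (source_id, target_id) integer pairs,
# plus a lookup table mapping each ID back to its URL.
# All sizes are assumed values chosen for illustration only.

GB = 10**9


def edge_table_bytes(num_links: int, id_bytes: int = 8) -> int:
    """Raw bytes for num_links edges, each stored as two fixed-width IDs."""
    return num_links * 2 * id_bytes


def url_table_bytes(num_urls: int, avg_url_bytes: int = 80, id_bytes: int = 8) -> int:
    """Raw bytes for a URL lookup table: one ID plus the URL text per row."""
    return num_urls * (id_bytes + avg_url_bytes)


if __name__ == "__main__":
    links = 1_000_000_000  # upper end of the range in the question
    print(f"edges: {edge_table_bytes(links) / GB:.1f} GB")
    print(f"urls:  {url_table_bytes(links) / GB:.1f} GB")
```

Under these assumptions the edge list for 1B links is 16 GB raw, so the query pattern is likely to matter more than the byte count.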



I have personally used CouchDB to store tens of millions of documents. If you can find a way to get the data you want using CouchDB views, the number of documents simply doesn't matter (only disk usage grows with additional documents and views), and performance stays excellent.


Elasticsearch should be easily able to handle your scaling needs. Why do you think that it would not? What are your concerns?


The answer will depend primarily on how you expect to query it.

Cassandra can do many orders of magnitude more than 1B, but would limit you in your query patterns.


Have you considered sharding the data across multiple independent ES instances? Each of them would then hold only an amount of data that does not cause problems.
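A minimal sketch of what such application-level sharding could look like, assuming the shard is chosen by hashing the URL's hostname so that all pages of one domain land on the same instance. `NUM_SHARDS` and `shard_for` are hypothetical names, and the shard count is arbitrary:

```python
import hashlib
from urllib.parse import urlsplit

NUM_SHARDS = 8  # hypothetical number of independent instances


def shard_for(url: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a URL to a shard index using a stable hash of its hostname."""
    # urlsplit().hostname is already lowercased; fall back to "" if missing.
    host = urlsplit(url).hostname or ""
    # Use a cryptographic digest rather than built-in hash(), which is
    # randomized per process and would route inconsistently across runs.
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for u in ("https://example.com/a", "http://EXAMPLE.com/b?x=1", "https://example.org/"):
        print(u, "->", shard_for(u))
```

Routing by hostname keeps a domain's internal links on one shard, but external links will cross shards, and changing `NUM_SHARDS` later remaps most hosts, so the count would need to be chosen with headroom.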


We've found Elasticsearch to be quite performant with hundreds of millions of documents. What are your concerns with scaling it?


Building an ahrefs/moz/majestic competitor?


BuzzSumo competitor with a different set of features.


I'm actually interested in hearing more about this if you're willing to share it.



