Hacker Newsnew | comments | show | ask | jobs | submit login

I'd be really curious to know more about Twitter's database setup. There's clearly something that they've done for scale that makes this a difficult thing. Otherwise, implementing a button that SELECT * FROM tweets WHERE user_id = 1; wouldn't take long to implement.



Lots of guessing, hypothesizing, and imagineering for information that is readily available. Here's a start:

http://engineering.twitter.com/2012/04/mysql-at-twitter.html

http://highscalability.com/blog/2011/12/19/how-twitter-store...

http://www.percona.com/live/mysql-conference-2012/sessions/g...

http://www.slideshare.net/yousukehara/introduction-of-twitte...

http://engineering.twitter.com/2010/05/introducing-flockdb.h...

http://www.infoq.com/news/2009/06/Twitter-Architecture

http://engineering.twitter.com/2010/05/introducing-flockdb.h...

https://github.com/twitter

-----


Twitter moved on from having a single table of tweets a long long time ago. I imagine archived tweets (which aren't necessarily that old) are spread out in multiple archive databases and perhaps not even in traditional db formats. And I imagine that, as a rule, no user facing code is allowed to touch any of the databases, but rather goes through services with heavy caching. So that a seemingly simple feature is an engineering challenge really isn't that surprising. Twitter is optimized for very recent content.

-----


I can only guess the format for archived tweets is very different from newer one, maybe just a compressed text file, which the engineers have to automate extracting and parsing at scale. Doesn't seem that hard, but the format was probably a hack job back in the day and there are probably a lot of considerations, e.g. metadata like RTs and favorites.

Plus, CEO's comments sound like it's been set up as a side project. The kind of management style books like Mythical Man Month and Peopleware warned us about.

-----


Indeed, this would also explain why the search function only allows one to view tweets going back seven days.

-----


I interviewed with a Twitter search engineer and he told me that that's because their in-memory search index is not big enough hold all tweets ever posted. On top of that, their efforts to scale up the index are counteracted by the ever-increasing volume of tweets posted per day.

-----


And the reason it's not big enough is because they don't want to invest in it. They found out search is not as monetizable as they thought it would be, and at the same time they cannot take away the functionality. The compromise is to keep what they have.

-----




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: