Flickr Engineers Do It Offline (flickr.com)
48 points by joshwa on Oct 2, 2008 | 14 comments


The point of having a backend queue is that on very large-scale systems, an easy way to boost the "speed" of your site is to never do synchronous writes (writes being loosely defined as anything that needs to be changed or executed, for example inserts, updates, email notifications). If a user updates a title or description, return the result page right away and queue the actual operation to update the value. Locking a row or table for update is relatively slow in large databases (ones measured in TBs, for instance). The extra tens or hundreds of milliseconds you gain from using a queue like this help your users perceive your website as "fast" even though technically it takes longer than you've led them to believe.
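To make the idea above concrete, here is a minimal Python sketch of "return right away, write later". It is only an illustration: the in-memory `titles` dict stands in for a real database, and `handle_update_request`, `start_worker`, and `write_queue` are hypothetical names, not anything from the Flickr post.

```python
import queue
import threading

# Hypothetical stand-in for a real database: photo id -> title.
titles = {1: "old title"}

# Pending writes wait here until the background worker picks them up.
write_queue = queue.Queue()


def worker():
    """Apply queued writes one at a time, off the request path."""
    while True:
        job = write_queue.get()
        if job is None:  # sentinel value: stop the worker
            write_queue.task_done()
            break
        photo_id, new_title = job
        titles[photo_id] = new_title  # the slow, locking write would go here
        write_queue.task_done()


def start_worker():
    """Start the background worker thread and return it."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def handle_update_request(photo_id, new_title):
    """Queue the write and build the response without waiting for it."""
    write_queue.put((photo_id, new_title))
    return {"photo_id": photo_id, "title": new_title, "status": "accepted"}
```

The response is built from the submitted value rather than read back from storage, which is what lets the page come back before the write has actually happened. The trade-off is the one described above: for a short window the stored value lags behind what the user was shown.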


I wonder how they use PHP as a queuing system? It must be a CLI application, what do you think?


You can use PHP pretty much like any other scripting language (without involving the web), so yeah, it could be a command-line tool running in the background, started by cron, etc.


We use a lot of PHP CLI cron jobs at my work. It works fairly well, but isn't as stable as I'd like. Any bug that causes a fatal interpreter error can't be caught by your code, so it's difficult to diagnose problems when they crop up in production.


Agreed, I have the same problem with my backend apps. Maybe PHP5's error handling would do the trick, but I haven't migrated to PHP5 yet anyhow.


We're using PHP5 and the exception handling is better, but interpreter errors just kill the process, full stop. We're writing watchdog processes to monitor the nightly batches so we have better information if one of them goes haywire.
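A watchdog of the kind described above could look roughly like this Python sketch. It is an assumption about the general shape, not the poster's actual tooling: `run_with_watchdog` is a hypothetical helper that runs a batch command as a child process and records how it ended, so that a crash the script itself could not catch still leaves an exit code and the tail of stderr behind.

```python
import subprocess


def run_with_watchdog(cmd, timeout_seconds=3600):
    """Run a batch command as a child process and report how it ended."""
    try:
        proc = subprocess.run(
            cmd,
            capture_output=True,  # keep stdout/stderr for diagnosis
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising on timeout.
        return {"ok": False, "exit_code": None, "stderr": "", "reason": "timeout"}
    return {
        "ok": proc.returncode == 0,
        "exit_code": proc.returncode,
        "stderr": proc.stderr[-2000:],  # last part of stderr only
        "reason": "exited",
    }
```

A cron entry would then call something like `run_with_watchdog(["php", "nightly_batch.php"])` (the script name is made up) and log or alert on any result where `ok` is false.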


You can just use MySQL as the backing store for the queue.

I think pretty much all systems of a sufficient size end up reinventing this...
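For anyone wondering what a database-backed queue looks like in practice, here is a rough Python sketch. It uses the standard library's `sqlite3` as a stand-in for MySQL so it runs anywhere; the `jobs` table, its columns, and the function names are all hypothetical. It also assumes a single worker: with several workers against MySQL you would need row locking (for example `SELECT ... FOR UPDATE` inside a transaction) so that two workers cannot claim the same job.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open the database and create the (hypothetical) jobs table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    conn.commit()
    return conn


def enqueue(conn, payload):
    """Add a job and return its id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    return cur.lastrowid


def claim_next(conn):
    """Claim the oldest pending job, or return None if nothing is pending."""
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM jobs"
            " WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    return row


def mark_done(conn, job_id):
    """Record that a claimed job has finished."""
    with conn:
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
```

Keeping a `status` column instead of deleting rows means a job that was claimed but never marked done stays visible, which helps when a worker dies mid-job.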


We are using BDB, dead simple and insanely fast.

Relational database seems a bit overkill.


It may be overkill, but there is typically a great deal of expertise around setting up and running them.

Also, MySQL etc. have a great deal of network connectivity and concurrency support that is not provided by BDB. (In the mentioned example, they say they used PHP. Can you imagine doing concurrency in PHP?)

So it's more a matter of expediency than aesthetics. At scale, everything is painful and you'd really rather not write anything you don't absolutely have to.


Concurrency is one of BDB's strongest points. (As for network connectivity, BDB has an RPC server which works pretty well, although I'd personally probably roll something higher-level and stick that in front of the actual database.)


Yep. We looked at BDB's RPC system but are probably going to end up going with Thrift for RPC. Other than the lack of documentation, Thrift seems pretty killer.


It seems that key-value databases are still hot. I agree that "relational database seems a bit overkill". In fact, I am considering refactoring a whole site to use a key-value backend.


Sounds like what Erlyweb can do natively in Erlang.


Having small systems that do one or a few related things well is much better for scaling than a "does everything" system.

It's also very generally an architecture that is easier to write/debug/extend/grok.

It is nice though that Erlang gets you into programming for scale from day 1 without making it more work.



