Flickr Engineers Do It Offline

timtrueman · on Oct 3, 2008

The point of having a backend queue is that on very large systems an easy way to boost the "speed" of your site is to never do synchronous writes (writes being loosely defined as anything that needs to be changed or executed, for example inserts, updates, and email notifications). If a user updates a title or description, return the result page right away and queue the actual operation to update the value. Locking a row or table for update is relatively slow in large databases (ones measured in TBs, for instance). The extra tens or hundreds of milliseconds you save by using a queue like this help your users perceive your website as "fast", even though technically the write takes longer than you've led them to believe.
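
To make the idea concrete, here is a minimal sketch in Python rather than PHP. Everything in it is an assumption for illustration: the names `handle_title_update`, `apply_write`, and `write_queue` are made up, the "database" is a plain dict, and the queue lives in the same process. A real setup would use a durable, out-of-process queue so jobs survive a restart.

```python
import queue
import threading

# Hypothetical stand-in for the real datastore; a dict keeps the sketch runnable.
database = {}
write_queue = queue.Queue()

def apply_write(job):
    """Perform the slow write that the request handler skipped."""
    photo_id, field, value = job
    database.setdefault(photo_id, {})[field] = value

def worker():
    """Drain the queue in the background, one job at a time."""
    while True:
        job = write_queue.get()
        if job is None:  # sentinel value: stop the worker
            write_queue.task_done()
            break
        apply_write(job)
        write_queue.task_done()

def handle_title_update(photo_id, new_title):
    """Request handler: enqueue the write and answer immediately."""
    write_queue.put((photo_id, "title", new_title))
    # Echo the new value back so the user sees the change right away,
    # even though the datastore may not have it yet.
    return {"photo_id": photo_id, "title": new_title, "status": "accepted"}

def start_worker():
    """Start the background worker thread and return it."""
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

The handler returns before the write has happened, which is exactly the trade-off described above: the page feels fast, but a read that hits the datastore in the gap can still see the old value.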

schtono · on Oct 2, 2008

I wonder how they use PHP as a queuing system? It must be a CLI application. What do you think?

bigbang · on Oct 3, 2008

You can use PHP pretty much like any other scripting language (without involving the web), so yeah, it could be a command-line tool running in the background, started by cron etc.

LogicHoleFlaw · on Oct 3, 2008

We use a lot of PHP CLI cron jobs at my work. It works fairly well, but isn't as stable as I'd like. A fatal interpreter error can't be caught by your code, so any bug that triggers one is difficult to diagnose when it crops up in production.

schtono · on Oct 4, 2008

Agreed, I have the same problem with my backend apps. Maybe PHP5's error handling would do the trick, but I haven't migrated to PHP5 yet anyhow.

LogicHoleFlaw · on Oct 4, 2008

We're using PHP5 and the exception handling is better, but interpreter errors just kill the process, full stop. We're writing watchdog processes to monitor the nightly batches so we have better information if one of them goes haywire.
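
A rough sketch of what such a watchdog could look like, written in Python purely for illustration (it is not the code described above). The function names `run_watched` and `report`, the one-hour default timeout, and the 20-line stderr tail are all assumptions. The point is that the parent process survives when the child dies, so it can record the exit code and whatever the job managed to print.

```python
import subprocess
import sys

def run_watched(cmd, timeout_seconds=3600):
    """Run a batch job as a child process and describe how it ended."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising on timeout.
        return {"ok": False, "reason": "timeout", "exit_code": None, "stderr_tail": ""}
    return {
        "ok": result.returncode == 0,
        # On POSIX a negative return code means the child was killed by a signal.
        "reason": "exit" if result.returncode >= 0 else "signal",
        "exit_code": result.returncode,
        # Keep only the last lines of stderr as a hint for diagnosis.
        "stderr_tail": "\n".join(result.stderr.splitlines()[-20:]),
    }

def report(outcome, stream=sys.stderr):
    """Write a one-line summary; a real watchdog might email or page instead."""
    stream.write(
        f"[watchdog] ok={outcome['ok']} reason={outcome['reason']} "
        f"exit_code={outcome['exit_code']}\n"
    )
```

Cron would then start the watchdog instead of the batch script itself, for example `report(run_watched(["php", "nightly_batch.php"]))`, where `nightly_batch.php` is a placeholder name.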

joshu · on Oct 3, 2008

You can just use MySQL as the backing store for the queue.

I think pretty much all systems of a sufficient size end up reinventing this...
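
For anyone who hasn't built one, a table-backed queue can be sketched in a few lines. This example uses Python with SQLite as a stand-in so it runs without a server. It is not MySQL, and the `jobs` table, its columns, and the function names are all invented for illustration. On MySQL with InnoDB the claim step would more typically be a transaction around `SELECT ... FOR UPDATE`.

```python
import sqlite3

def open_queue(path=":memory:"):
    """Open the database and create the (hypothetical) jobs table if needed."""
    # isolation_level=None means autocommit; transactions are managed explicitly.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn

def enqueue(conn, payload):
    """Add a job to the back of the queue."""
    conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))

def claim_next(conn):
    """Mark the oldest pending job as running and return (id, payload), or None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs"
            " WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],)
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row

def mark_done(conn, job_id):
    """Record that a claimed job finished."""
    conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
```

Selecting and updating inside one transaction is what keeps two workers from claiming the same job. The `status` column also leaves a trail of jobs stuck in `running` if a worker dies halfway through.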

aschobel · on Oct 3, 2008

We are using BDB, dead simple and insanely fast.

Relational database seems a bit overkill.

joshu · on Oct 3, 2008

It may be overkill, but there is typically a great deal of expertise around setting up and running them.

Also, MySQL etc. have a great deal of network connectivity and concurrency support that is not provided by BDB. (In the mentioned example, they say they used PHP. Can you imagine doing concurrency in PHP?)

So it's more a matter of expediency than aesthetics. At scale, everything is painful and you'd really rather not write anything you don't absolutely have to.

jrockway · on Oct 3, 2008

Concurrency is one of BDB's strongest points. (As for network connectivity, BDB has an RPC server which works pretty well, although I'd personally probably roll something higher-level and stick that in front of the actual database.)

aschobel · on Oct 6, 2008

Yep. We looked at BDB's RPC system but are probably going to end up going with Thrift for RPC. Other than the lack of documentation, Thrift seems pretty killer.

liuliu · on Oct 3, 2008

It seems that key-value databases are still hot. I agree that "relational database seems a bit overkill". In fact, I am considering refactoring a whole site to use a key-value backend.

iamwil · on Oct 2, 2008

Sounds like what Erlyweb can do natively in Erlang.

njharman · on Oct 2, 2008

Having small systems that do one or a few related things well is much better for scaling than a "does everything" system.

It's also generally an architecture that is easier to write/debug/extend/grok.

It is nice though that Erlang gets you into programming for scale from day 1 without making it more work.