This actually rings true with my own experience. Let me share an anecdote:
Last year, we built a search service on EC2, which used SQS[1] and SNS[2] as a way of feeding data from our core architecture into the index.
So, when a user creates/updates/deletes a message, that information gets sent to a particular SNS topic over HTTPS, a subscribed SQS queue receives the message, then the search index polls the queue.
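A minimal sketch of that flow in boto3 terms (boto3 postdates this thread; the topic ARN, queue URL, and `apply_to_index` hook are placeholders, not the poster's actual setup):

```python
import json

import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:message-events"  # placeholder
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/search-feed"  # placeholder

def publish_change(op, message_id):
    # The core app publishes create/update/delete events to the SNS topic.
    sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps({"op": op, "id": message_id}))

def poll_changes(apply_to_index):
    # The search indexer polls the subscribed SQS queue for deliveries.
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        envelope = json.loads(msg["Body"])   # SNS wraps the payload in a JSON envelope
        apply_to_index(json.loads(envelope["Message"]))
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```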
SQS and SNS are cheap, scalable and highly available, so we thought it was a great system. We knew that messages were going to disappear occasionally, though (network partitions, etc.), so we had also decided to spend time building a simple API to reconcile those losses.
We stored a hash "fingerprint" of the data with each entry, and would occasionally poll the API directly to perform a diff: fingerprints were compared, and a full request was made only for entries that differed.
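In code, the reconciliation amounts to something like this (a standard-library sketch; the function names are illustrative, not the actual API):

```python
import hashlib
import json

def fingerprint(entry):
    # A stable hash of the entry's content, stored alongside each entry.
    canonical = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def reconcile(source_fingerprints, index_fingerprints):
    # Compare per-entry fingerprints (dicts of entry id -> fingerprint).
    # Only entries that differ or are missing need a full fetch, so a
    # reconciliation pass stays cheap even over a large data set.
    stale = [
        entry_id
        for entry_id, fp in source_fingerprints.items()
        if index_fingerprints.get(entry_id) != fp
    ]
    deleted = [
        entry_id
        for entry_id in index_fingerprints
        if entry_id not in source_fingerprints
    ]
    return stale, deleted
```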
At some point, we realized that we had obsoleted almost the entire SNS/SQS message queuing mechanism by building a system which made API calls inexpensive, so we removed it from the equation.
If you have a situation where a message simply has to get there (eventually), then, in our case at least, the queue turned out to be an unnecessary extra step.
Would like to hear if anyone else has had a similar (or completely opposite) experience.
I must say I don't completely understand his point. What takes the place of messaging middleware such as RabbitMQ?
Even web "2.0" applications still need to send each other messages, both synchronously and asynchronously. For example, to trigger backend work that should not be handled on the web server, and receive the results. Or to notify when changes in data happened.
Incredibly generalized notion based on questionable assumptions. The OP seems to have identified that distributed state is hard. Not news, and not in any way unique to MOMs.
I guess it depends on the specific architecture: if you use something like Mongrel2, messaging is used for everything, even for routing HTTP requests and responses to the processes/servers that handle them.
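Roughly the handler shape, for anyone unfamiliar (a pyzmq sketch of the pattern only; real Mongrel2 adds tnetstring framing and sender/connection IDs, and the addresses here are made up):

```python
import zmq

ctx = zmq.Context()

requests = ctx.socket(zmq.PULL)   # the server pushes raw HTTP requests to handlers
requests.connect("tcp://127.0.0.1:9997")

responses = ctx.socket(zmq.PUB)   # handlers publish responses back to the server
responses.connect("tcp://127.0.0.1:9996")

while True:
    raw = requests.recv()         # one message per incoming request
    # ...parse the request, do the work, then publish a response...
    responses.send(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi")
```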
Also, hasn't it always been the case that a lot of web applications communicate with the database directly to fetch information? What's so "web 2.0" about that?
The major thing that changed with AJAX/Comet and WebSockets and such is that the JavaScript running on the end user's browser can now be regarded as an endpoint for message-passing. This increases the amount of messaging, doesn't it? Insofar as relying on just polling a database is no longer good practice in our "realtime notification" world...
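To make "the browser as an endpoint" concrete, a server-side push looks something like this (a sketch assuming the third-party `websockets` library, v10.1+; the event payload is made up):

```python
import asyncio
import json

import websockets  # third-party; pip install websockets

CONNECTED = set()

async def handler(websocket):
    # Every connected browser tab becomes a message-passing endpoint.
    CONNECTED.add(websocket)
    try:
        await websocket.wait_closed()
    finally:
        CONNECTED.discard(websocket)

async def notify_clients(change):
    # Push a data-change event to all clients instead of having each
    # of them poll the database for changes.
    if CONNECTED:
        message = json.dumps(change)
        await asyncio.gather(*(ws.send(message) for ws in CONNECTED))

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await notify_clients({"entity": "message", "op": "update", "id": 42})
        await asyncio.Future()  # run forever

asyncio.run(main())
```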
"And if we’re going to use a persistent store, why add a layer of complexity (and cost) to the architecture when Web 2.0 has shown us it is not only viable but inherently more scalable to go directly to that source?"
Well, there's a reason everyone and their mom is using memcached: going directly to the db does not always scale that well...
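i.e. the cache-aside pattern (a sketch with pymemcache; `load_user_from_db` stands in for whatever the real query is):

```python
from pymemcache.client.base import Client  # third-party; pip install pymemcache

cache = Client(("localhost", 11211))

def load_user_from_db(user_id):
    # Stand-in for the real (expensive) database query.
    return f"user-record-{user_id}"

def get_user(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached.decode("utf-8")   # cache hit: the db never sees the read
    value = load_user_from_db(user_id)  # cache miss: hit the db once...
    cache.set(key, value, expire=300)   # ...then serve reads from memory
    return value
```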
I (a novice) got a little confused, because at first I thought you really were into the middleware thing and were sad about web 2.0 killing your star. But the article seems to say that middleware these days is probably not a good solution, and also shouldn't be. So, which one is true now?
[1] http://aws.amazon.com/sqs/
[2] http://aws.amazon.com/sns/