As many people have said, experience at Amazon varies widely depending on the team. I would go further: it depends on where you sit in the service graph. The bottlenecks at the center have more clients, higher TPS, and more stringent latency requirements. Their support burden is worse, and so are their engineers' lives. It's hard to move everyone forward together; once you add enough constraints, the problem gets too hard to solve. But like working at Microsoft, you pay these prices in order to have high impact, a high number of customers, and high influence. A big question for large service federations like Amazon is how to smooth out these bottlenecks. As with Stevey's rant about code size, though, first you have to admit you have the problem: service size.
I joined a team that was not service-oriented. It was more like a collection of cron jobs that ran single-threaded applications directly updating the DB. It was painful and very hard to alter these stateful applications without breaking things.
I moved to a team that ran a collection of services and it was so much better, like night and day better. The path forward for us became obvious when we started thinking about how to migrate between APIs and decompose our services still further (and by the way, our support burden is comparatively low).
What makes service-oriented architecture at Amazon great is that it is cheap. The other two Amazon advantages Steve mentioned are not coincidences; they are what you need to make service rollouts low-friction. They are what make it possible to shoot first and roll back later. With rare exceptions they are used by the entire company.
Remember Sinofsky's "don't ship the org chart"? It is a lie. You cannot avoid it. You always ship the org chart. So the real question is, what is the org going to look like so that we ship something good-looking? That's the real purpose of Steve's rant. No matter how much politicking or boosting you do for this important service-oriented architecture, it doesn't work unless you have a service-oriented org chart. And Google does not, apparently.
The big, big question for the internet over the coming decades is this: you say you're going to organize the world's information, so what is the organization going to look like? I think it'll be more interactive. The API will be there, and there will be writes. It will be less centralized, with the appropriate authorities owning data and providing an interface to their small piece of the world's information. I think that's eventually going to mean you own your identity and provide as much interface as you care to. The arc of the internet is long but it bends toward decentralization (assuming we keep it out of the hands of the fascists).
For me Amazon is a microcosm of that future, and it's going to be interesting to lead the way there.
What I'm wondering next is: what is the practical takeaway for startups and relatively small efforts that are looking to scale? Regardless of tool stack, what should a forward-thinking developer do? Is the answer to design around a RESTful API specification right from the beginning, then build the layers of server-side and client-side code exclusively on that API? And so on.
So first, take that stuff Steve said about extensibility to heart. He has another blog somewhere, oh here it is
about software that is alive because it's extensible. That is true of your startup too. You don't want to be a "site", you want to be a "service". And that means you want to be an authority for a unique kind of data, that you want your users to create and use.
I think the Google+ data is pretty unique and cool. I like the user experience. But you can't call it a service, which is bad news until they get their crap together.
I'm a strong believer that flowing data puts pressure on software to work correctly. You want a public API because you shouldn't assume that you and your team are world-class geniuses who have exhausted the search space of valid use cases for your data. Your customers, collectively, can come close enough. (A very Amazon virtue: start with the customer.)
You want to have a well-designed interface for yourself and your users because it's so painful to scale, migrate, control security, etc. without it. So sure, I would say start with it as early as you can stand. Make it public as soon as you can. Allow your users to contribute and build on your data and service.
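To make "start with the interface" a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only for illustration (the `Item` type, the `ItemService` interface, the in-memory backend, the `render_profile` function); it is not how Amazon or anyone else actually does it. The point is just that your own front-end code goes through the same interface you would later expose to users, and never touches storage directly.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Item:
    """A hypothetical unit of user-created data."""
    item_id: int
    owner: str
    text: str


class ItemService(ABC):
    """The one interface that internal and external callers both use."""

    @abstractmethod
    def create_item(self, owner: str, text: str) -> Item: ...

    @abstractmethod
    def get_item(self, item_id: int) -> Item: ...

    @abstractmethod
    def list_items(self, owner: str) -> List[Item]: ...


class InMemoryItemService(ItemService):
    """Stand-in backend; a real one would sit in front of a database."""

    def __init__(self) -> None:
        self._items: Dict[int, Item] = {}
        self._next_id = 1

    def create_item(self, owner: str, text: str) -> Item:
        # Validate at the interface so every caller gets the same checks.
        if not owner or not text:
            raise ValueError("owner and text are required")
        item = Item(self._next_id, owner, text)
        self._items[item.item_id] = item
        self._next_id += 1
        return item

    def get_item(self, item_id: int) -> Item:
        if item_id not in self._items:
            raise LookupError(f"no item with id {item_id}")
        return self._items[item_id]

    def list_items(self, owner: str) -> List[Item]:
        return [i for i in self._items.values() if i.owner == owner]


def render_profile(service: ItemService, owner: str) -> str:
    """Our own 'site' code: it only knows the interface, not the storage."""
    return "\n".join(f"{owner}: {i.text}" for i in service.list_items(owner))
```

Because `render_profile` depends only on `ItemService`, swapping the in-memory backend for a networked one (or migrating storage) should not require touching the callers, which is most of the benefit being described.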
You'll probably treat your public-facing interfaces with different levels of scrutiny than internal-only ones. This is convenient, but it might be a mistake. You don't want to put off security or user data integrity until it's too late.
Having multiple services means that you can scale them independently. This costs some overhead but you'll be able to right-size your hardware, say with appropriate fleets in EC2.
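As a rough illustration of right-sizing, here is a small Python sketch that sizes each service's fleet from its own peak load. The service names, traffic numbers, per-host capacities, and the 30% headroom are all made up; real capacity planning would start from measurements.

```python
import math


def hosts_needed(peak_tps: float, tps_per_host: float, headroom: float = 0.3) -> int:
    """Hosts required to serve peak_tps with some spare capacity.

    headroom is the fraction of extra capacity kept in reserve
    (0.3 means plan for 130% of peak). Always returns at least 1.
    """
    if tps_per_host <= 0:
        raise ValueError("tps_per_host must be positive")
    return max(1, math.ceil(peak_tps * (1 + headroom) / tps_per_host))


# Hypothetical services: (peak TPS, TPS one host can handle).
services = {
    "search": (1200, 150),
    "checkout": (80, 40),
    "reviews": (300, 200),
}

# Each service gets its own fleet size, independent of the others.
fleet = {name: hosts_needed(peak, per_host) for name, (peak, per_host) in services.items()}

for name, count in fleet.items():
    print(f"{name}: {count} hosts")
```

Each fleet can also use a different instance type, which is where the overhead of running separate services tends to pay for itself.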
Sorry that's all kind of generic, but that's about as deep as I would go without a real-world example to talk about.
The High Scalability blog is one I would recommend at the leading edge of this thing. I see posts on the front page alone that cover all I've been talking about and more.
It would be fantastic if someone - maybe you, since you've got a high-level 10,000' view down to the on-the-ground detailed experience - could consolidate this into an article and provide a guide on how to build software as a platform from the get-go. Any chance of that? :)