One of GAE's biggest downsides is lack of control: unlike EC2 where you get to mold vanilla Linux installations to suit your needs, your application must fit GAE's service model through and through. But by giving up some of those freedoms you receive a whole lot of awesome in return. Khan Academy has successfully leveraged App Engine to scale to millions of users without hiring a single sysadmin or spending too much time worrying about anything ops-related. We're able to handle traffic spikes like our 60 Minutes appearance and the launch of our new computer science curriculum (http://khanacademy.org/cs) with no sweat. To deploy the site, any developer (or our friendly CI bot) can simply run our "deploy.py" and wait a few minutes, then get back to spending time on the product. We haven't had to think once about whether or not the database can handle the write load we throw at it; the App Engine Datastore is uniquely worry-free in that regard. (Well, I'm sure Google SREs worry about it plenty, but we don't have to.)
The GAE platform is a moving target, which is a Very Good Thing because it demonstrates Google's investment in the product and indicates that the App Engine team truly understands developer pain points and is working to solve them. I'll point to backends and Guido van Rossum's work on the NDB datastore interface as features that give GAE developers a little more freedom to use the platform in the most efficient way possible. If you're going to complain about lock-in, you might as well go all the way and take advantage of everything you get in return. You might find that it's not worth your time to replicate all of that elsewhere, especially when doing so on top of an IaaS provider like AWS introduces its own set of inefficiencies and points of failure.
P.S.: If you'd like to come work with great people on great stuff without worrying about keeping the site up, we're hiring: http://khanacademy.org/careers
Of course you guys are already at the millions-of-users use case, but KA is a (the?) great example of an app where GAE worked out from start to finish, which I find encouraging.
If you want more from us about GAE, our lead developer Ben Kamens' blog is full of good stuff: http://bjk5.com
As it would be with any platform, making the wrong assumptions about your data model will bite you, and do so a lot more painfully if you wait too long to correct them.
But it has to be said that setting up a similarly scalable infrastructure on an IaaS would also cost you a lot, even if you are not counting your own hours as the sysadmin.
Finally, if you succeed and your hosting bill starts to cost a fortune, you might as well roll out your own TyphoonAE or AppScale.
The expense of a bit of suboptimal design is nothing compared with hiring employees, and it is relatively easy to refactor code to fix the parts of your app that are costing money. I'm sorry the OP has to actually spend a bit of time thinking about how much implementing this new feature is going to cost, but that sounds like basic sound engineering principles to me.
In the end, I'd rather focus on adding features to my webapp than getting paged in the middle of the night when AWS/Heroku goes down or my site suddenly gets a lot of requests. From that aspect, GAE has been a lifesaver for us. =)
Just curious as a Heroku user who wants to keep tabs on GAE.
Recently I've been doing a fair bit of traffic and I'm still below the $2.10/week minimum. If I get even more traffic I don't have to touch anything to deal with it. Heroku doesn't auto-scale quite the way that GAE does; its scaling isn't transparent. Yes, I know there are tools to help with this, but GAE does it automatically.
A key point for me is that Heroku doesn't have the GAE Datastore. This might be fine for you, but the hope for my company is to get as large as possible and the last thing I want to think about is replication, backups and scaling of databases. I've done that before and it sucks.
People seem to quickly forget about the recent outages on Heroku. That said, the past year on GAE has been pretty darn stable. I think they've really fixed a lot of the issues they were having early on.
I do use Heroku for two apps running in the free tier, and I can't complain because they are free and they wouldn't be possible on GAE. I have a Node.js webpage scraper (which is controlled by RESTful requests from GAE) and an nginx reverse proxy so that I can serve up MapQuest OSM tiles over https. =)
So, maybe it boils down to the needs of your application and your tech stack comfort level (there is no Ruby or PHP on GAE).
On a side note, I love your business! =)
The way I do it: I've set the daily limit to something really high, but something that won't make me go bankrupt (like $100/$200, while I usually spend $5/day).
I don't worry about spikes in payments - as the article says, it's not worth wasting my time analysing all the edge cases. If an edge case happens, I'll deal with it then.
It's much worse with Amazon Web Services, which has no such easily set limits. I've heard about a guy getting slapped with a bill for thousands of bucks because of bandwidth usage due to some bug.
Most people want that killer app that is offered for free to millions of users, where you figure out how to monetize it later, after you get funding and eventually get acquired.
App Engine's pricing model insists on profitability sooner. If you instead built that killer app for millions of users who pay $12/year, the amount you paid to App Engine wouldn't sting nearly as much.
To say the cost is hidden is almost absurd: the cost is staring you in the face every day you log in to the admin console. Every month you get an invoice in your email. I see this as a good thing. A developer who can deliver an application that meets the scalability needs at a low cost would be a tremendous asset for any company looking to make money.
We knew that some of the costs we were incurring were due to bad design on our part (we were all new to NoSQL databases when we started). But after weighing our internal cost of resolving these issues against rewriting our app to use Mongo, we decided on the latter. We haven't looked back since...
If your hosting cost isn't yet an issue, ignore it, then optimise when it is.
App Engine is a premium service -- it's replicated across at least three data centers, scales automatically, and has a world-class team managing it. They provide a free memcached service, and the pricing model is designed so that you use it. If you design your app to use memcached from the start, you can eliminate much of your read costs.
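For anyone who hasn't done this before, here is a minimal sketch of the cache-aside pattern that comment is describing. It is not App Engine code: `FakeMemcache` and `FakeDatastore` are hypothetical in-memory stand-ins so the example runs anywhere, and `cached_get` / `put_and_invalidate` are names made up for illustration. On GAE you would presumably swap in the real memcache and datastore clients, and you'd have to remember that memcached entries can be evicted at any time.

```python
class FakeMemcache:
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Returns None on a miss, like a typical memcached client.
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class FakeDatastore:
    """Hypothetical stand-in for a datastore where every read is billed."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.reads = 0  # number of (billed) reads served so far

    def get(self, key):
        self.reads += 1
        return self._rows.get(key)

    def put(self, key, value):
        self._rows[key] = value


def cached_get(cache, store, key):
    """Try the cache first; fall back to the datastore only on a miss."""
    value = cache.get(key)
    if value is None:
        value = store.get(key)
        if value is not None:
            cache.set(key, value)
    return value


def put_and_invalidate(cache, store, key, value):
    """Write to the datastore, then drop the stale cache entry."""
    store.put(key, value)
    cache.delete(key)


cache = FakeMemcache()
store = FakeDatastore({"video:1": {"title": "Intro to algebra"}})

for _ in range(1000):
    cached_get(cache, store, "video:1")

print(store.reads)  # 1: only the first request touched the datastore
```

The point is just that 1000 requests for the same entity turn into one billed read, as long as writes invalidate the cached copy.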
I found myself implementing a caching layer for my models (this was pre-NDB) instead of the feature I intended to build.
This problem is nothing new in the world of programming. That's what libraries try to solve. GAE just happens to be a platform for which few abstractions exist because some of its technical choices render it slightly different than what the industry has been doing in other areas. If anything, this post is a scream for higher-level libraries targeting GAE.
Besides instant roflscaling, another advantage is that a site (basically) won't go down for the life of the PaaS. I can throw a site up and not worry about: logs filling up my server's HDD, forgetting to update my OS and exposing a 0-day that lets people inject malware into my homepage, DDoS attacks, routing misconfigurations, shared hosting companies rebooting my server without notifying me (yes, I've had this happen), or other transient things. Heroku has a philosophy that an app should effectively run forever without maintenance on their platform, but I can't find a link to that now.