I'm a summer intern at Khan Academy working on various App Engine projects. My uneducated guess is that we're one of the platform's bigger customers.
One of GAE's biggest downsides is lack of control: unlike EC2 where you get to mold vanilla Linux installations to suit your needs, your application must fit GAE's service model through and through. But by giving up some of those freedoms you receive a whole lot of awesome in return. Khan Academy has successfully leveraged App Engine to scale to millions of users without hiring a single sysadmin or spending too much time worrying about anything ops-related. We're able to handle traffic spikes like our 60 Minutes appearance and the launch of our new computer science curriculum (http://khanacademy.org/cs) with no sweat. To deploy the site, any developer (or our friendly CI bot) can simply run our "deploy.py" and wait a few minutes, then get back to spending time on the product. We haven't had to think once about whether or not the database can handle the write load we throw at it; the App Engine Datastore is uniquely worry-free in that regard. (Well, I'm sure Google SREs worry about it plenty, but we don't have to.)
The GAE platform is a moving target, which is a Very Good Thing because it demonstrates Google's investment in the product and indicates that the App Engine team truly understands developer pain points and is working to solve them. I'll point to backends and Guido van Rossum's work on the NDB datastore interface as features that give GAE developers a little more freedom to use the platform in the most efficient way possible. If you're going to complain about lock-in, you might as well go all the way and take advantage of everything you get in return. You might find that it's not worth your time to replicate all of that elsewhere, especially when doing so on top of an IaaS provider like AWS introduces its own set of inefficiencies and points of failure.
P.S.: If you'd like to come work with great people on great stuff without worrying about keeping the site up, we're hiring: http://khanacademy.org/careers
App Engine teaches you to do things right from the start. While it's true you may end up with a horrendously suboptimal design if you don't take care, there are nice profiling tools you can use to optimize the hottest paths.
As it would be with any platform, making the wrong assumptions about your data model will bite you, and do so a lot more painfully if you wait too long to correct them.
But it has to be said that setting up a similarly scalable infrastructure on an IaaS would also cost you a lot, even if you are not counting your own hours as the sysadmin.
Finally, if you succeed and your hosting bill starts to cost a fortune, you might as well roll out your own TyphoonAE or AppScale.
I'm saving $100k+ a year by using GAE and not having to hire a full-time sysadmin (or pretend to be one myself). When I hear 'my company had me switch to AWS/Heroku because of GAE costs'... I keep thinking... well, they could probably stick with GAE and get rid of you (and your manager) and save even more money.
A bit of suboptimal design expense is nothing compared with hiring employees, and it is relatively easy to refactor code to fix the parts of your app that are costing money. I'm sorry the OP has to actually spend a bit of time thinking about how much implementing this new feature is going to cost, but that sounds like basic sound engineering principles to me.
In the end, I'd rather focus on adding features to my webapp than getting paged in the middle of the night when AWS/Heroku goes down or my site suddenly gets a lot of requests. From that aspect, GAE has been a lifesaver for us. =)
Do you feel like GAE and Heroku make different trade-offs in this regard? I ask because the benefits of GAE you describe sound like the reasons I like Heroku, but you seem to be suggesting that Heroku is higher maintenance in some way.
Just curious as a Heroku user who wants to keep tabs on GAE.
Beyond the free tier of Heroku, I feel like it is significantly more expensive than GAE.
Recently I've been doing a fair bit of traffic and I'm still below the $2.10/week minimum. If I get even more traffic I don't have to touch anything to deal with it. Heroku doesn't auto-scale quite the way that GAE does; it isn't transparent. Yes, I know there are tools to help with this, but GAE does it automatically.
A key point for me is that Heroku doesn't have the GAE Datastore. This might be fine for you, but the hope for my company is to get as large as possible and the last thing I want to think about is replication, backups and scaling of databases. I've done that before and it sucks.
People seem to quickly forget about the recent outages on Heroku. That said, the past year on GAE has been pretty darn stable. I think they've really fixed a lot of the issues they were having early on.
I do use Heroku for two apps running in the free tier and I can't complain, because they are free and they wouldn't be possible on GAE. I have a NodeJS webpage scraper (which is controlled via RESTful requests from GAE) and an nginx reverse proxy so that I can serve up MapQuest OSM tiles over https. =)
So, maybe it boils down to the needs of your application and your tech stack comfort level (there is no Ruby or PHP on GAE).
Well, you can't rack up thousands of bucks in costs on GAE if you don't set your limits that high.
The way I do it: I've set the daily limit to something really high, but something that won't make me go bankrupt (like $100/$200, while I usually spend $5/day).
I don't worry about spikes in payments - as the article says, it's not worth wasting my time analysing all the edge cases. If an edge case happens, I'll deal with it then.
It's much worse with Amazon Web Services, which has no such easily set limits. I've heard about a guy getting slapped with a bill for thousands of bucks because of bandwidth usage due to some bug.
while it's true that appengine by nature insists on optimization towards roflscale a lot earlier in development than what we devs would normally prefer, I think that the pricing model is designed for a different class of app than what most people are building.
most people want that killer app that is offered for free to the millions of users and you figure out how to monetize it later after you get funding and eventually acquired.
appengine's pricing model insists on profitability sooner. if you instead built that killer app for millions of users that pay $12/year, the amount you paid to appengine wouldn't sting nearly as bad.
to say the cost is hidden is almost absurd, the cost is staring you in the face every day you log in to the admin console. Every month you get an invoice in your email. I see this as a good thing. A developer that can deliver an application that can meet the scalability needs at a low cost would be a tremendous asset for any company looking to make money.
i don't see it as hidden. if you read the docs, they make it pretty clear what the limitations are on things like the datastore and why. The APIs start from the assumption that you will need roflscale. It can be a pain in the ass, but it's a pain in the ass that you know about going into it.
My company was using AppEngine to host our product, but we switched over to AWS (python, nginx/tornado, mongo) last year because of similar problems. Even at a small amount of load, the cost of the problems you are describing can be high. At a large scale like ours, these problems cost a fortune.
We knew that some of the costs we were incurring were due to bad design on our part (we were all new to NoSQL databases when we started). But after weighing our internal cost of resolving these issues against rewriting our app to use Mongo, we decided on the latter. We haven't looked back since...
He does make a good point about the mental structuring you should do for your projects. I remember reading somewhere the VP of Google had a 'line' of criticality. If it was above the line, work on it until it's below the line. Then even if it's unfinished, so long as it's below the line go and pick something else above the line.
If your hosting cost isn't yet an issue, ignore it, then optimise when it is.
This can be solved with a slightly redesigned data-model. Go back to 2.
App Engine is a premium service -- it's replicated across at least three data centers, scales automatically, and has a world-class team managing it. They provide a free memcached service, and the pricing model is designed so that you use it. If you design your app to use memcached from the start, you can eliminate much of your read costs.
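To make the memcached point concrete, here is a rough sketch of the usual read-through pattern: check the cache first, and only hit the (billed) datastore on a miss. Everything named here is made up for illustration: `FakeMemcache` and `CountingDatastore` are in-process stand-ins so the snippet runs anywhere, and `cached_get` is a hypothetical helper, not part of any App Engine API. On the real platform you'd swap in the memcache and datastore clients.

```python
import time


class FakeMemcache:
    """Hypothetical in-process stand-in for a memcache client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        # Treat expired entries as misses, like a real cache would.
        if expires_at is not None and time.time() >= expires_at:
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl_seconds=None):
        expires_at = time.time() + ttl_seconds if ttl_seconds else None
        self._data[key] = (value, expires_at)


class CountingDatastore:
    """Hypothetical stand-in for the datastore that counts billed reads."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._rows.get(key)


def cached_get(cache, store, key, ttl_seconds=60):
    """Read-through lookup: serve from cache, fall back to the datastore."""
    value = cache.get(key)
    if value is not None:
        return value
    value = store.get(key)
    if value is not None:
        # Only cache hits on real entities; missing keys are not cached here.
        cache.set(key, value, ttl_seconds)
    return value


if __name__ == "__main__":
    store = CountingDatastore({"user:1": {"name": "Ada"}})
    cache = FakeMemcache()
    for _ in range(100):
        cached_get(cache, store, "user:1")
    print(store.reads)  # 1: the other 99 lookups never touched the datastore
```

The catch, as always with caching, is invalidation: whenever you write the entity you need to update or delete the cached copy, otherwise readers see stale data until the TTL expires.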
This problem is nothing new in the world of programming. That's what libraries try to solve. GAE just happens to be a platform for which few abstractions exist because some of its technical choices render it slightly different than what the industry has been doing in other areas. If anything, this post is a scream for higher-level libraries targeting GAE.
I don't see how this is in any way unique to GAE. Sure, GAE requires some specific optimizations to get the most out of it, but so does everything else. Someone with a lot of GAE experience and close to zero classic LAMP experience could similarly complain about how time-consuming it is to learn all the optimization tricks of that stack, compared to how easily everything just works with GAE.
For most of the websites, currently on the web, I don't think AppEngine or Heroku is the right choice - they enable you to grow from nothing to millions of requests in seconds - but I can't think of many sites that have this issue.
> For most of the websites, currently on the web, I don't think AppEngine or Heroku is the right choice
Besides instant roflscaling, another advantage is that a site (basically) won't go down for the life of the PaaS. I can throw a site up and not worry about: logs filling up my server's HDD, forgetting to update my OS and exposing a 0-day letting people inject malware into my homepage, DDoS attacks, routing misconfigurations, shared hosting companies rebooting my server without notifying me (yes, I've had this happen), or other transient things. Heroku has a philosophy that an app should effectively run forever without maintenance on their platform, but I can't find a link to that now.