I, and many others, spent a lot of time figuring out how to write apps the "App Engine way":
* Fast completions (30-second timeout)
* Offloading to task queues when you can't (see the sketch after this list)
* Blobstore's two-phase upload URLs
* Mail eccentricities
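For instance, the task-queue offload looked roughly like this (a minimal sketch against the old Python webapp framework; the /tasks/resize path and blob_key payload are invented for illustration):

```python
from google.appengine.api import taskqueue
from google.appengine.ext import webapp

class UploadHandler(webapp.RequestHandler):
    def post(self):
        # Anything that can't finish inside the 30-second request
        # deadline gets handed off to a task queue worker instead.
        taskqueue.add(url='/tasks/resize',
                      params={'blob_key': self.request.get('blob_key')})
        self.response.out.write('queued')

class ResizeWorker(webapp.RequestHandler):
    def post(self):
        # Task queue requests ran under a longer deadline, and App
        # Engine retries the task automatically if the handler fails.
        blob_key = self.request.get('blob_key')
        # ... do the slow work here ...
```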
We believed them, because it seemed reasonable. We laughed at those who complained that Django would hit the 30-second limit: "It's not general-purpose hosting! Figure out the App Engine way!" And we educated people on how to do it right, and many were happy.
Well, it turns out that it is general-purpose hosting, with all of the costs, and yet also with all of the (once rational, now bullshit) idiosyncrasies.
But that's not the biggest complaint. The biggest complaint is that when my friends and peers objected to App Engine, its strange requirements and its potential lock-in, they were right and I am a fucking naive idiot. And I really don't like being proven a naive idiot. I put my faith in Google's engineers and they have utterly destroyed my credibility. THIS, more than anything, is the cost to me.
But mostly, GAE doesn't make sense for larger apps. You can't buy your way out of trouble by putting your DB on a dedicated server with fast drives and tonnes of RAM. You can't really use relational data without performance and reliability issues.
It's not just about the "App Engine way". It's not like learning C or Haskell and having to find a new way to write the code. You fundamentally cannot do big ad-hoc database operations.
And consider this: it was July last year that they introduced the Mapper API. Before then, I don't think you could do MapReduce without manually re-implementing it yourself (on top of the cantankerous App Engine Datastore). Just think about that for a minute: how were you meant to do stuff the App Engine way without MapReduce?
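For what it's worth, once the Mapper API did arrive, a job was just a function yielding datastore operations, roughly like this (a sketch of the appengine-mapreduce Python library's pattern; the entity kind and the processed flag are invented):

```python
# The job is declared in mapreduce.yaml with DatastoreInputReader
# as the input reader; the framework shards entities across workers.
from mapreduce import operation as op

def touch(entity):
    # Called once per entity. Mutations are yielded as operations
    # rather than applied directly, so the framework can batch
    # the datastore writes.
    entity.processed = True
    yield op.db.Put(entity)
```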
Anyway, I don't think your credibility was "utterly destroyed". It was really hard to know whether or not the learning curve was worth climbing until you had tried. You just had to judge the book by its cover, and the "Google" brand is pretty compelling to an engineer. It's not the first time someone has been fooled into buying something because the provider has a good reputation.
In fact, if you look at the recent comments of certain GAE engineers, they seem to believe that GAE is precisely for scaling, and that's why it now costs so much: it's only for the big boys.
The problem is that I can never become one of the "big boys" on their system, because pretty much as soon as I get any traction, I have to move to EC2 or Heroku or go broke. Their newfound belief in the scalability of their system is just arrogance. Anyone can claim to handle lots of traffic when you require that your customers run 20 times as many frontends as they should reasonably need.
Good luck to them in the Royal Wedding market.
EDIT: on the nature of "relational"
Care to elaborate?
There are no longer, in my view, any situations where a SQL db is the best idea. You either want a giant NoSQL database, or you want a massive in-memory object-graph using pointers. Or you want something for $20m from Oracle or IBM.
Not saying shit can't happen. Now look me in the eye and tell me you never had some noob drop a constraint and forget to put it back.
If Michael Arrington changes his job from "editor-in-chief" to "founder, former editor, occasional contributor, and CEO of Arrington Investments", and his old posts aren't all updated, it's not the end of the world.
It really depends on the problem domain. You wouldn't run a bank's ledger off MongoDB. On the other hand, a bank's ledger should be radically simple, with little need for normalization.
That's obviously an example of something that will practically never happen, which is why it doesn't work all that well as a justification for ditching SQL databases altogether.
I've never used NoSQL for anything, so there must be a lot that I'm missing, and that's why I asked. But it seems to me like you'd be digging up necessary information through quite a few steps if everything is "flat".
Digging kills you. I assert that SQL does the digging automatically, and that's exactly why it doesn't scale.
Your app will most likely have some kind of "entities", and then records to represent them. How much information can and should you cram into records of various "types"?
How much information do you typically end up duplicating across all those "entity records", and is it not a problem?
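To make the duplication question concrete, here is the kind of layout I imagine (a hypothetical sketch; the keys and field names are invented), where a post caches its author's display name so a read never needs a join:

```python
# Hypothetical records in a key-value store: the author's name is
# copied into every post, so rendering a post is a single get.
user_1 = {
    'key': 'user:1',
    'name': 'Alice Example',
}

post_9 = {
    'key': 'post:9',
    'author_key': 'user:1',
    'author_name': 'Alice Example',  # deliberately duplicated
    'title': 'My first post',
}
```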
That included learning NoSQL. At least that part was not a waste. There are no right answers to your questions, only right actions, starting with stepping outside the SQL box and writing an app using NoSQL. I started by thinking of a simple app that would be useful to me personally. I knew Java servlets, I knew SQL, I knew all sorts of things, but after several iterations my app is architected like no app/server I've ever written before. Almost every iteration involved starting out the way I knew how, running into either roadblocks or major cognitive dissonance, and then rewriting to fit these new-fangled constraints. It's been a huge learning experience. You might like to try it.
Could you just give me a brief description of how you arrange things (like "entities") with NoSQL?
> That's obviously an example of something that will practically never happen
Women changing their name when they get married? A tiny assumption like that can make our software brittle. Now every model that caches the old name needs updating, and you need to make sure there are no overlapping saves in any of those models that would overwrite items in your bulk update. If a single linked model still has the wrong old name cached, your data-update process is buggy.
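Here is a sketch of the fan-out a simple rename forces on a denormalized store (the store interface and field names are hypothetical, just to show the shape of the problem):

```python
def rename_user(store, user_key, new_name):
    # Hypothetical key-value store: every record that cached the
    # old name must be rewritten by hand, since there is no join
    # to resolve it at read time.
    user = store.get(user_key)
    user['name'] = new_name
    store.put(user)

    for post in store.query(kind='post', author_key=user_key):
        post['author_name'] = new_name
        # Racy: a concurrent save of this post can clobber the
        # update unless you wrap it in a transaction or retry loop.
        store.put(post)
```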
Well, that sounds like the kind of stuff I'd like the other guy to talk about. How does he avoid the bad sides of having all your data in a key-value store?
How would/do you?
Come to a Microsoft event sometime and let a Silverlight developer buy you a sympathy beer.
From what else I've read, it sounds like engineers who didn't also wear green eye-shades (or good enough ones, or who didn't possess or use good enough crystal balls) set up this debacle. And it was people wearing green eye-shades (who we can sincerely hope are also engineers) who aligned it with reality, causing way too many people way too much pain.
Object lesson: if you're going to sell a service for cash money to others, paying close attention to your costs from the very beginning is not optional.
You might be thinking that in the original measure they did something insane, like counting only the user time of a process, or only the time it's executing a request, not booting or whatever (or fuck, I don't know, because honestly there is no reasonable explanation). That is to say, that the 31 CPU-hours is a misread, and that if the fellow in the article ran his code on EC2 he really would need 879 EC2 instance-hours that day.
But this is not my experience. An extreme example: my app that served 14 pages was rated as taking 0.2 CPU-hours, or 720 CPU-seconds. This is entirely reasonable, if not excessive (looking at the app, it only took about 200 seconds including warmups). Under the new system, it is claimed that these same 14 pages will require 2.8 instance-hours.
0.2 CPU-hours => 2.8 instance-hours
31 CPU-hours => 879 instance-hours
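Converting the same day's work to seconds makes the gap plain (my arithmetic, using the figures above):

```python
old_billed = 0.2 * 3600    # 720 CPU-seconds under the old CPU-hour metering
actual_work = 200          # roughly what the app really used, warmups included
new_billed = 2.8 * 3600    # 10,080 instance-seconds under the new scheme
print(old_billed, actual_work, new_billed)
```

The new metering charges for wall-clock time an instance sits resident, not for CPU actually burned, which is how 200-odd seconds of real work becomes 10,080 billable seconds.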
The app in the article serves 1.5 GB/day and takes 879 instance-hours. What sort of server would you need to push that on EC2: 1 Mbit/s? The hourly cost on GAE is $1.46. Can I do that on a $0.085/hour EC2 instance? Yeah, I think so.
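A back-of-envelope check on that bandwidth figure (my arithmetic; only the 1.5 GB/day comes from the article):

```python
gb_per_day = 1.5
bytes_per_sec = gb_per_day * 1024**3 / 86400  # ~18,600 bytes/s average
mbit_per_sec = bytes_per_sec * 8 / 1e6        # ~0.15 Mbit/s average
print(bytes_per_sec, mbit_per_sec)
```

Even allowing generous headroom for bursts, that average is tiny next to what one small instance can push.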
EDIT: My figures were wrong, as I was comparing a $16 (wrong) figure to a $0.80 EC2 figure. The actual figure is $1.46, not $16. So I looked at the bandwidth/CPU numbers to see whether a $0.80 EC2 instance is what is required, and I don't believe it is. I think a $0.085 instance would be enough. YMMV.
This. We always knew GAE was inefficient; there's no doubt about that. Serving 30 or 40 requests per second would spawn quite a few instances and start producing request errors.
That's a load a four-year-old machine could handle with ease.
Why did we put up with this? Because Google didn't make us pay for the crappiness; the pricing made sense. You don't pay Ferrari prices for a slow car. And during a surge it scales up gracefully: go from 30 rps to 1,000 rps and it'll just work. An old machine co-located someplace won't do that.
Now, under the new pricing gouge, Google is making us pay for their inefficiencies. All appearances are that this is what it really costs (plus some reasonable markup)... well, that's pretty piss-poor, because we're essentially paying to haul cargo in a Ferrari, and it's dumb, dumb, dumb.