Bubblesort is slow on App Engine too, and you'd do much better switching to a different sort algorithm on EC2 as well. Every platform has a way to shoot yourself in the foot. Perhaps the solution isn't to move off, but to just not do that. Or, this blog post could be titled "How we didn't do our due diligence and chose a platform which was unsuitable for our problem space".
That is of little consequence. The important part is that they evaluate their choice and are ready to move to a better platform, without destroying the business.
If you check out the blog, you can see they did it for jQuery as well. Good for them. You could also argue they should have made other decisions from the start. But hey, they learned and it works for them.
What is more, they tell the world about it and try to prevent others from making the wrong choices. Even providing pointers to resources that were useful for them.
What I'm trying to say is that they deserve a bit more credit than an offhand remark like: "our square peg didn't fit into App Engine's round hole".
The comment about bulk uploads/downloads is definitely a problem where GAE needs more work.
On the other hand, GAE forces you to write software that doesn't depend on long-running instances, because real-world servers do die. The limits on request sizes are also important because they help guarantee upper bounds on latencies.
Sometimes it's appealing to just roll your own solution, and you might even get better results while everything works smoothly, until you get stuck.
Of course, not everyone needs a system that is resilient to machine, power, network, and datacenter outages.
Just consider that once you need it, and you design your own solution to accomplish it, you might end up enforcing the same constraints on your application and probably won't get it right for the first couple of iterations.
For instance, I'm 95% confident that the work they did upon instance startup was building the search index, based on reading:
"We HAD to use the search API for the last few months
because we couldn’t keep our indexes in memory, because
of the start up time (discussed earlier)."
Not to mention that this index is most likely a deterministic function. In other words, given input data X, the result is always going to be output data X'. The fact that they recalculate the results of a deterministic function upon each instance startup is 100% wasteful. This is the direct cause of their request timeouts. Additionally, it's also the likely cause of the datastore slowness. As they state:
"Pre-computations at start up kept the new instances
busy, and app engine was creating more and more instances
to handle this."
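A minimal sketch of the alternative, assuming the index really is a deterministic function of the input data: build it once, store the result under a key derived from the input, and have each new instance load the stored copy instead of recomputing it. The function names, the word-to-ids index shape, and the local cache directory are all hypothetical; on App Engine the cache would have to live in some shared store rather than on local disk.

```python
import hashlib
import json
import os
import pickle


def build_index(records):
    """Map each lowercase word to the sorted ids of the records containing it."""
    index = {}
    for record_id, text in records.items():
        for word in set(text.lower().split()):
            index.setdefault(word, []).append(record_id)
    return {word: sorted(ids) for word, ids in index.items()}


def load_or_build_index(records, cache_dir):
    """Return (index, was_cached); rebuild only when the input data has changed."""
    # Hash the input so that identical data always maps to the same cache file.
    digest = hashlib.sha256(
        json.dumps(records, sort_keys=True).encode("utf-8")
    ).hexdigest()
    path = os.path.join(cache_dir, "index-" + digest + ".pickle")
    if os.path.exists(path):
        # Another instance already did the expensive work; just load it.
        with open(path, "rb") as handle:
            return pickle.load(handle), True
    index = build_index(records)
    with open(path, "wb") as handle:
        pickle.dump(index, handle)
    return index, False
```

With this shape, only the first instance to see a given data set pays the build cost; later instances start by reading one file.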
Another thing they criticize is the search API being too slow. The current live application appears to do realtime search as you type. If they were implementing realtime search on top of App Engine's search API, I can see how that would be unsuitable as well. The search API is designed to take a single query and fetch its results. If a search operation takes 500ms, that's not very fast but overall isn't a big deal. If that's 500ms per keystroke, then that will be massively slow. Searching upon hitting ENTER or clicking a button would have solved that. Or they could have re-thought their implementation if realtime is required.
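One way to re-think a search-as-you-type implementation is to debounce: only send a query once the user has paused typing. The sketch below is a simulation over recorded keystroke timestamps rather than a real timer, and the function name, the 300ms quiet period, and the sample input are assumptions for illustration only.

```python
def debounced_queries(keystrokes, quiet_ms):
    """Return the texts that would actually be sent to the search backend.

    keystrokes is a list of (timestamp_ms, text_so_far) pairs in time order.
    A query fires only when no further keystroke arrives within quiet_ms.
    """
    fired = []
    for position, (timestamp, text) in enumerate(keystrokes):
        is_last = position == len(keystrokes) - 1
        if is_last or keystrokes[position + 1][0] - timestamp >= quiet_ms:
            fired.append(text)
    return fired


# Hypothetical typing session: seven keystrokes with one long pause after "bike".
typing = [
    (0, "b"), (90, "bi"), (170, "bik"), (260, "bike"),
    (900, "bike r"), (980, "bike re"), (1050, "bike red"),
]
print(debounced_queries(typing, 300))  # ['bike', 'bike red']
```

Here seven keystrokes turn into two backend queries, so a 500ms search call is paid twice instead of seven times.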
App Engine is definitely not suitable for all problems, but it too deserves more credit than their blog entry gives it.
The problem was that it started giving trouble even before 10K records and at no more than 60 requests per second, a load that even a very low-resource PC would handle without a problem, since each request wouldn't even take a few milliseconds to compute (probably even without an index, just a sequential search). And we had to make changes to fix this.
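As a rough illustration of the sequential search mentioned above, here is a sketch that scans 10,000 synthetic records for a substring and times the scan. The record text, the function name, and the query are made up for the example and are not the real listings data; the timing will vary by machine, so no figure is claimed here.

```python
import time


def sequential_search(records, term):
    """Return the positions of all records containing term, ignoring case."""
    term = term.lower()
    return [i for i, text in enumerate(records) if term in text.lower()]


# Synthetic stand-in for 10K listings (hypothetical data).
records = ["listing %d for a house in city %d" % (n, n % 100) for n in range(10000)]

start = time.perf_counter()
matches = sequential_search(records, "city 42")
elapsed_ms = (time.perf_counter() - start) * 1000
print("%d matches in %.2f ms" % (len(matches), elapsed_ms))
```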
It went on and on; every few weeks, users and content would grow and our app would fail. We didn't want to move away; just like you said, we thought the solution wasn't to avoid it, but to solve it.
Finally, after changing/improving the design a number of times, we considered using app engine backends to do central stuff such as maintaining the main index. At the same time, looking back at what we had been doing so far, it was quite clear that we were spending our time on learning some platform and trying to alter our architecture to fit into it, time which for sure we should have spent on building something that adds value to users. And we were going deeper and deeper into the hole, and we knew it would be hard to move.
Our vision is simple, and it has nothing to do with picking up some technology and figuring out how to make use of it. Instead, we try to start from the customer and work backwards. While on app engine, we once stopped taking new low-paying customers (listings) until we fixed issues. I think this was terrible.
The decision to move away from app engine wasn't easy. We had to literally rewrite everything, and the fear of similar problems coming up was there.
Also, I never recommended that anyone not use it; I was just telling our story. In fact, I still use app engine for some work, and we would switch back to app engine if we were convinced that it's the way to give a better user experience.
About the datastore being slow: they charge us per datastore read. Saying it slows down as the number of reads increases sounds, to me, like saying it's your fault if your calls drop because you are making a lot of calls.
About search API, we didn't use it for auto completion; we maintain a small dictionary for that.
Just to clarify, we were building the index at startup.
About scaling, we have given some thought to it. But not so much since it's not something we will require in the near future. We can create multiple instances and balance the load as long as the index is small enough to fit in memory.
I'm sure there is a way to get this working on app engine. But we are glad that we moved, and it runs smoothly. And more importantly, we have been able to give the users a lot more benefits during the past couple of months than during a year on app engine, because we had more time to focus on users. And if we had moved earlier, we would have been able to do more.
That's likely many orders of magnitude more than the traffic levels you're experiencing. Clearly the platform works, but you need to architect your application to work with it rather than trying to shoehorn App Engine to work with your architecture.
If you don't have the ability to do that (due to time pressures or other factors), then you made the right call to move to a platform you are familiar with.
This occurred at a time when they had Guido van freaking Rossum on payroll.
I've never worked with App Engine, but the constant discussion of working around its problems reminds me a lot of MongoDB.
I'm guessing that Google Compute Engine wasn't generally available at the time of the decision, but now that GCE has formally launched, you'd be able to solve most of the issues described with a well-thought-out front/back-end GAE/GCE architecture.
I think enough people who're reading this have had a good to fair experience with GAE and, like me, are wondering what hiccups other people are finding with apps on app engine. Subsequently, I also think that we like feeling smart because we've read enough docs and articles to be able to identify ways that their app's design may not mesh with the App Engine way of doing things.
The thing I need to keep reminding myself is that there are many right ways to do it. If vpj and his cohorts are spending less time trying to figure out App Engine and more time on their app, then more power to them. At some point you have to just figure out when the effort expended outweighs the potential benefits and cut your losses, and I think this post is written in a clear and fair way.
The thing that makes me upset, though, is that there isn't a central place that clearly explains the pitfalls he fell into, what the best way to do it on App Engine is, and why. Those docs are kind of a mess, amirite?
I applaud the author for making changes to support their business more effectively, but I'm not sure what I'm supposed to take away from this other than that someone successfully changed some stuff.