

The Unofficial Guide to Migrating Off of Google App Engine - jrk
http://www-cs-students.stanford.edu/~silver/gae.html

======
powera
A few points (disclaimer: I work at Google on projects including App Engine)

1) It's not fair at all to say that Google "continue to ignore the two
critical issues of uptime and data store latency". The HR datastore is
specifically designed to address the concerns about variable latency of
datastore operations, and to prevent both planned and unplanned downtime.

2) In my experience, high CPU cost for datastore operations is generally tied
to things like having large numbers of indexes, or doing queries that don't
have the right indexes. Each index written involves a bigtable write; if you
are doing hundreds of these per entity, it can become very expensive. That
said, in general it isn't easy enough to know when you are doing something
that will be expensive.
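
To make the index-cost point concrete, here is a rough sketch in Python. It assumes a deliberately simplified model: one write for the entity itself plus one write per index row, where a composite index gets one row per combination of property values. The function name and the model are illustrative assumptions, not the actual App Engine billing formula.

```python
from math import prod

def estimated_writes(indexes):
    """Simplified estimate of writes for a single entity put.

    `indexes` is a list of indexes; each index is a list giving the
    number of values the entity holds for each indexed property.
    Assumption: 1 write for the entity + 1 write per index row.
    """
    return 1 + sum(prod(value_counts) for value_counts in indexes)

# Three single-property indexes, one value each.
print(estimated_writes([[1], [1], [1]]))
# Same entity plus one composite index over two list properties
# holding 10 and 20 values: 10 * 20 = 200 extra index rows.
print(estimated_writes([[1], [1], [1], [10, 20]]))
```

Under this model, a single composite index over two list properties is enough to reach the "hundreds of writes per entity" range.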

3) The bulkloader is definitely painful to use if you aren't familiar with it.
In particular, casting schemaless data to a format such as CSV is hard because
of the problem that rows can have unexpected keys. An improved "Bulk Datastore
Import and Export tool" is on the 6-month roadmap for the product.
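
The "unexpected keys" problem can be sketched in a few lines of Python. This is a minimal illustration, assuming the entities are already available as plain dicts; `entities_to_csv` is a hypothetical helper, not part of the bulkloader.

```python
import csv
import io

def entities_to_csv(entities):
    """Write dict-like entities with differing keys to CSV text."""
    # First pass: collect every key that appears in any entity,
    # since a schemaless store gives no fixed column list up front.
    fieldnames = sorted({key for entity in entities for key in entity})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    # Second pass: keys missing from an entity are filled with restval.
    for entity in entities:
        writer.writerow(entity)
    return buffer.getvalue()

rows = [
    {"name": "a", "size": 1},
    {"name": "b", "color": "red"},  # unexpected key, and no "size"
]
print(entities_to_csv(rows))
```

The two passes are the painful part: the full column set is not known until every row has been seen, so the export cannot be streamed in a single pass.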

~~~
lyddonb
1) Yep. My company moved to an HRD instance this week and our CS reps and
customers have both noticed a significant difference in stability. That
doesn't imply that it's perfect but to say that Google is ignoring the issue
is completely false.

~~~
joshu
How bad was the cost differential?

~~~
lyddonb
Not sure yet. Even though we are one of the bigger clients on App Engine we
weren't getting charged much at all for our usage anyways. But I believe
Google said it would be around 3x for the HRD which isn't bad at all. Our
executives really didn't care, though, as stability and reliability for our
customers are worth too much to worry about 3x (GAE's non-HRD stability has not
been good recently). Btw we all use EC2 for some of our other services and that is
costing us quite a bit more than GAE is at the moment. We'll see if that
continues to hold true as we grow.

------
davepeck
I've been a heavy App Engine user since the beta days -- even gave a talk at
I/O 2009 about scaling with App Engine. I have clients that run complex GAE
sites that handle millions of daily requests.

I somewhat disagree with the author about scalability. There is a very narrow
sweet spot of apps for which App Engine is a quite natural solution. If you're
in the sweet spot, you'll probably scale well without too much up-front
engineering investment.

The sweet spot _is_ narrow, though. For example, as the OP states: geo data
doesn't really belong on GAE -- you can do a limited set of bounding
box/proximity queries with the third party geomodel library, but wow, it's
expensive and dog slow!

I do agree with the author about both the status dashboard (it only sometimes
reflects my current experience with the system) and the surprising variation
in data store latency. Latency has been much improved of late with 1.4 and
beyond.

~~~
smg
Could you please elaborate more on the "narrow sweet spot"? I would love to
know which applications fit this sweet spot.

~~~
thetrumanshow
I must be in it, because I have rarely seen a downside to using AE. Here's the
nature of my apps:

1) Extremely simple or very flattened data model

2) Few writes, TONs of reads (I peak at 150 requests/second every day)

3) The occasional DeadlineExceededError causes me little or no headache. For
some people this would be frustrating.

Also AE is awesome as a CDN. The latency is very tolerable for static assets.

------
thurn
I found that the AppEngine 1.4 SDK addressed many of these concerns.
Personally, I've managed 50 requests per second on my blog without trouble,
and with minimal CPU overhead. That's probably because everything is in
memcache, so the database almost never gets hit. The pricing structure seems
to actively encourage you to memcache as much as possible, too. Things are
probably pretty different in a more write-intensive app, though.

~~~
StavrosK
Which blogging engine do you use and how much do you pay? I'm looking to
migrate my blog to GAE too. Also, how's the latency?

~~~
thurn
I wrote my own. <http://github.com/thurn/ackbar>. If an article gets popular,
I've paid up to $5, but below 6000 hits, it's all free. Latency could be
better, but there's a lot of factors there (Clojure might be one of them?).

~~~
StavrosK
I see, thanks. If you don't mind, how come you pay? The free tier seems very
large for a simple blog; which resource did you run out of?

~~~
thurn
Bandwidth runs out fast (images are the killer). CPU has gone over a couple
times too.

~~~
iffius
I take it that your images are rather static. If so, you should be adding
caching headers which will cache them for free in Google's front end servers.
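
For reference, a caching header of the kind described is just a standard HTTP `Cache-Control` value. A minimal Python sketch, where the helper name and the one-week lifetime are illustrative assumptions:

```python
def static_cache_headers(max_age_seconds):
    """Return response headers that mark a static file as publicly cacheable."""
    return {
        # "public" allows shared caches (not just the browser) to keep a copy.
        "Cache-Control": "public, max-age=%d" % max_age_seconds,
    }

ONE_WEEK = 7 * 24 * 60 * 60  # example lifetime; pick what suits your images
print(static_cache_headers(ONE_WEEK))
```

Whether a given response is actually served from a front-end cache depends on the platform; the sketch only shows what the header looks like.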

------
hello_moto
I don't quite understand some of the commenters: why do you guys talk
about caching immediately, as if your app needs to scale from the ground up?

Isn't the point of GAE that it scales (as long as you don't do stupid
queries)?

If we have to put everything in memcached, what's the point of using GAE?

I also don't quite understand the push of using memcached for almost
everything (especially for young startups). How do you handle data integrity?
I'm guessing most data models of young startups are fairly simple and only
contain at most 10 models with almost no relationship? Otherwise data
integrity is painful.

~~~
nl
Your site will _scale_ well - as in it will give exactly the same performance
for the millionth user as for the first user.

That doesn't mean you will have a _fast_ site, though.

On GAE the datastore is pretty slow (it's a lot better now, but it used to be
terrible), so a common pattern is to use a read-through cache to improve
performance.
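
A read-through cache can be sketched roughly like this in Python. The dict stands in for memcache and `slow_lookup` stands in for a datastore query; both, along with the class name, are assumptions for illustration rather than App Engine APIs.

```python
class ReadThroughCache:
    """Check the cache first; on a miss, load from the slow store and cache it."""

    def __init__(self, cache, load_from_store):
        self.cache = cache                      # dict-like stand-in for memcache
        self.load_from_store = load_from_store  # stand-in for a datastore read

    def get(self, key):
        value = self.cache.get(key)
        if value is None:
            value = self.load_from_store(key)
            if value is not None:
                self.cache[key] = value
        return value

store_reads = []

def slow_lookup(key):
    store_reads.append(key)  # record each trip to the slow store
    return "post body for %s" % key

posts = ReadThroughCache({}, slow_lookup)
posts.get("hello-world")  # miss: goes to the store
posts.get("hello-world")  # hit: served from the cache
print(len(store_reads))
```

The sketch leaves out invalidation: any write path has to update or delete the cached entry, which is where the data-integrity worry raised above comes in.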

I wrote a thing that touched on this a couple of months back:
<http://nicklothian.com/blog/2010/11/23/a-pragmatic-approach-to-google-appengine/>

~~~
hello_moto
I saw your blog. One question for you if you're using GAE/J: how heavy is
the GAE/J warm-up compared to GAE/Python?

I heard stories where some people that use Spring MVC would have their requests
fail because the call stack is too deep.

~~~
nl
Java is heavy, but there are ways to work around it.

The worst offender is the data access libraries. JPA is bad, JDO slightly
better, but something like Objectify works really well.

Typically, Spring isn't the problem (I guess it could be if it is doing a lot
of classpath scanning stuff or something though)

------
St-Clock
"4) The GAE design patterns in python are ugly. I find our current Sinatra-
based implementation cleaner and easier to understand. Python + django is
verbose, its templating system is obtuse, and its testing framework is, well,
I don't know because I've never seen it. This point is a religious one, so
I'll leave it be"

Should Python programmers write the unofficial guide to migrating off of
heroku?

------
mark_l_watson
A bit off topic: all of my customers but one in the last two years wanted to
deploy to Amazon Web Services. I find this odd, being enthusiastic myself
about AppEngine (no paid work, but I host some of my projects with it and I
have written a few articles on GAE).

Clearly, not every web app is a good candidate for GAE.

I have found objectify-appengine to be nicer to work with than the _official_
Java data store APIs and I think it helps minimize loading request times.

------
bartman
Typhoon App Engine allows you to set up your own GAE compatible environment:
<http://code.google.com/p/typhoonae/>

I never used it, but it sure looks interesting if your application doesn't fit
GAE anymore or you want to make specific infrastructural adjustments. They
support multiple different database/http/.. servers too.

~~~
shimon
This project looks promising. Could be a good basis for a consulting business,
taking apps that have hit some obstacles or design limitations on Google AE,
and getting them running on TyphoonAE-based hosting with app-specific
infrastructure customizations.

------
gcv
I'm starting to notice that App Engine's memcache is incredibly slow. If
serving up a page requires one memcache hit (for, say, caching entire response
objects), it works well. If it starts to require, say, a dozen, the response
time slows to ~700ms at best, and ~4s at worst.

~~~
StavrosK
Oh wow, that's surprising and somewhat disconcerting. Do you have any sort of
reference? If true, it's very serious.

~~~
gcv
I have no exact data at the moment. I looked through my application's logs and
compared the response times in situations when the front page loads from a
full-page cache entry and when it has to build the page from multiple pieces
of memcached data. If I find time, I'll write a small benchmark application
which tests memcache performance. Empirically however, App Engine's "memcache"
is pretty slow.
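
A small benchmark of that kind could start by counting cache round trips per page before measuring real latency. The Python sketch below uses an in-memory `FakeCache` (a hypothetical stand-in, not the App Engine client) and assumes that per-call overhead is what dominates, which the thread has not confirmed. Its `get_multi` is modeled on the batch-get call that memcache clients commonly provide.

```python
class FakeCache:
    """In-memory stand-in that counts round trips instead of using a network."""

    def __init__(self, data):
        self.data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key)

    def get_multi(self, keys):
        self.round_trips += 1
        return {k: self.data[k] for k in keys if k in self.data}

def build_page_one_by_one(cache, keys):
    # One cache call per page fragment.
    return [cache.get(k) for k in keys]

def build_page_batched(cache, keys):
    # One cache call for all fragments.
    found = cache.get_multi(keys)
    return [found.get(k) for k in keys]

keys = ["fragment-%d" % i for i in range(12)]
data = {k: "<div>%s</div>" % k for k in keys}

one_by_one = FakeCache(data)
build_page_one_by_one(one_by_one, keys)
batched = FakeCache(data)
build_page_batched(batched, keys)
print(one_by_one.round_trips, batched.round_trips)
```

Swapping `FakeCache` for the real client and wrapping the two builders in timers would show whether the slowdown tracks the number of calls or something else.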

~~~
marndt
Using appstats, you should be able to get this information without having to
write your own profiling.

------
grandalf
This guy loses credibility when he says that Django templates are ugly and that
there's no unit testing framework.

Those are defensible positions in the first 10 minute impression of app
engine, but building a simple app (and testing it with GAEUnit) should put
both to rest immediately.

Yes, the Datastore is unreliable, but a newer, more reliable version has been
released.

App Engine is still my platform of choice -- just waiting for SSL,
naked domains, and per-entity-group selection of which datastore service level
to use.

------
joelburget
On the other hand, GAE looks like a great fit for a static site built with
Jekyll, like a personal blog. In fact I'm planning to migrate my site over
today. Almost any personal site will be free to host. Even though I would only
be paying a few dollars with S3, I see no reason not to give app engine a
shot.

One thing I worry about is the reported downtime. I'm not sure whether or not
that will affect a static site.

~~~
StavrosK
Our company website (<http://www.stochastictechnologies.com/>) and my resume
(<http://resume.korokithakis.net/>) run on GAE, on _one instance_. However,
the script has a sort of mod_rewrite, so it requires that you declare your
URLs beforehand. If you'd like to check it out, see
<https://github.com/stochastic-technologies/static-appengine-hoster>.

------
realmojo
Django might not be the best tool for GAE.

~~~
jbaker
My experience agrees with this. It should be noted that this is largely due to
cold start time.

~~~
StavrosK
I disagree; this is only an issue if you have almost no visitors, and then you
can pay a nominal fee and keep instances awake. This is definitely worth the
development time you will save and the ability to move off GAE easily if you
need to.

------
bane
Can someone provide some enlightenment about why one would migrate away from
GAE? I only ask because I'm currently building a web application on GAE and
am starting to look down the road, wondering if building on GAE will allow us
to grow as a business in the ways we want to, or if there are some arbitrary
limitations that will come to bite us later.

(knowing of course that we're more or less tied to Google's infrastructure and
the GAE way of doing things)

~~~
shimon
The original article does point out a few reasons, but the way I see it is
this: AE is great if you want to scale and your app can scale in the AE way.
There is a sweet spot and the tools work pretty well if you're in that sweet
spot, but if you stray outside that spot you don't have a lot of options. Even
getting your data out can be difficult, making the "export data and start
over" nuclear option difficult.

If you know what you're building and you want to scale it using the sort of
tools AE provides -- i.e. the datastore, taskqueue, etc. all fit your app
well -- it's quite good. And quite cheap; it's not really fair to compare per-
resource pricing to AWS because on AE you'll only pay for what you use instead
of paying for idle time on a server instance. But it definitely constrains
what you can do, and also locks you to a single hosting provider. If you're
still exploring what you want to do, that level of design and vendor lock-in
can be a pretty severe liability.

~~~
vsiva68
Another way of saying the same thing is that GAE is perfectly fine if you know
all the requirements of your app before you build it. In a perfect world, you
know exactly what you want to build, everything you want is available with
GAE, then you go build it and it scales nicely.

In the real world however, requirements change:

* You come up with a new idea that requires a certain library. Chances are,
the library won't work on GAE out of the box.

* You find out that you need to change your schema. It is pretty hard to
update to the new schema while keeping everything in sync.

Finally, you pay the Google cost. When Google implements a new feature, they
spend enormous time making sure that it scales well. They need to do so since
they could be looking at millions of users on day 1. Most of us however, are
looking to build something as cheap as we can, not knowing whether anyone is
going to bother to look at it. However, you have to do the same performance
optimizations that Google has to do so that your app scales. Chances are, it
will be wasted effort - unless your objective is to just learn. I find it
funny that GAE goes completely against the rule that "Premature Optimization
is the root of all evil". Yes, you should think about your application's
scalability. But your bigger problem should be about finding traction, and
being able to react fast, not optimize for millions of views.

~~~
bane
If I understand correctly then, it would appear that GAE may be poor for
applications that require heavy computation on the server side (say, a
facebook style graph, crawling and computing various metrics on it) but great
for serving tons of dynamic pages?

