
App Engine charges $6,500 to update a ListProperty on 14.1 million entities - branola
http://groups.google.com/group/google-appengine/browse_thread/thread/f85aa58e54ebb8ae#
======
dangrossman
> it cost me a few thousand dollars to delete my millions enities from the
> datastore after a migration job (ikai never replied my post though...) and
> im still paying since the deletion is not completed yet (spending 100-300$ a
> day for the past 2 weeks now!!).

I don't know much about GAE, but a datastore-as-a-service that takes 2 weeks
to delete your data and charges $300 a day to do so just seems... absurd.

~~~
diego
In my opinion Google App Engine is a non-starter for serious applications. I
only know of experiments and toys running there. Look at their gallery of
successful applications:

<http://code.google.com/intl/it-IT/appengine/casestudies.html>

Contrast with AWS:

<http://aws.amazon.com/solutions/case-studies/>

~~~
chucknthem
There are definitely serious applications and businesses built on appengine,
mostly started before their price hike coming out of Beta last November.
<http://www.optimizely.com> (YC W10) and <http://shoesofprey.com> are two
notable examples.

~~~
dextorious
I wouldn't call those two "serious" examples -- especially since the
competitors (Amazon, Heroku, Rackspace etc) have orders of magnitude more
popular, complex and profitable services running on them.

This is like saying, "Famous actors frequent my restaurant" and pointing to a
picture of Ralph Macchio. OK, somewhat known, but De Niro, Brad Pitt, Clooney
and co eat at the joint across the street.

~~~
jay_kyburz
Actually, I think it's great for startups. If you're a small team trying to
work out what your product is, how to sell it and who is going to buy it, you
have more important things to do than learn how to set up and run a server.

Once you've worked out you have a product that people want you can start
thinking about managing your own servers.

People complain about migration, but it's really not that big a deal.

~~~
mgkimsal
So instead of learning how to admin a server - a skill that is transferable
anywhere and is pretty much a commodity skill for most web-based startups when
they're - you know, "starting up" - you suggest people tie themselves to a
proprietary app engine and datastore which limits what you can do in ways that
aren't really transportable anywhere else? Spending time learning about GAE
oddities, pricing limits, language limits/restrictions and such... seems a
pretty horrible waste of time for people to engage in.

~~~
stickfigure
Learning how to admin a server is a total waste of time which brings _zero
value to the end user_. If I can hand that drudgery off to someone else, I'll
be adding features and stealing your customers while you're giddy about
colorizing your bash prompt.

~~~
mgkimsal
Setting up a load balanced system and front-end web cache brings "zero value"
to end users? You may as well make the same argument over design or for that
matter development.

Learning how to program a loop is a total waste of time which brings zero
value to the end user. If I can hand that drudgery off to someone else, I'll
be adding features and stealing your customers while you're giddy about
nesting your loops.

Learning how to use photoshop is a total waste of time which brings zero value
to the end user. If I can hand that drudgery off to someone else, I'll be
adding features and stealing your customers while you're giddy about reducing
your PNGs.

~~~
lmm
And both those statements are true. If there's a third party that you can pay
to do your photoshopping, and they deliver high-quality product, damn right
I'm going to hand my photoshop jobs off to them rather than do it myself.
Likewise if there were a third party development shop _that delivered higher
quality than doing it in house_. Concentrate on your USP; for anything else,
if you can get it done competently externally, hand it off.

------
stickfigure
The details have finally been posted in that thread. And while $6500 is a lot
of money, we have to realize that this is a _lot_ of writing - the headline
doesn't give the full sense of it.

For those unfamiliar with GAE, a ListProperty is really a collection of
properties. The author is using the property as a geohash with a significant
number of values, plus he has additional multiproperty indexes defined, plus
he's doing a rewrite (delete + write). All combined it appears to be ~460
writes per entity.

So what we're talking about is $6500 for 6.5 billion writes... exactly what is
printed on the sales brochure. Is that a lot? Most datastores don't charge by
the operation so I don't have a lot to compare it to. It seems expensive but
not crazy, especially considering that the data is replicated via PAXOS to 3+
datacenters with automatic loadbalancing and failover.
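
The arithmetic behind that figure can be sketched as follows. This is only a back-of-the-envelope check: the 14.1 million entities come from the headline, the ~460 writes per entity from the estimate above, and the $0.10 per 100k writes rate is the one quoted elsewhere in this discussion. The function name is made up for illustration.

```python
# Back-of-the-envelope check of the write bill discussed above.
ENTITIES = 14_100_000          # entities touched by the migration (headline)
WRITES_PER_ENTITY = 460        # estimated index + entity writes per rewrite
PRICE_PER_100K_WRITES = 0.10   # dollars, rate quoted in the discussion


def write_cost(entities, writes_per_entity, price_per_100k=PRICE_PER_100K_WRITES):
    """Return (total write operations, cost in dollars)."""
    total_writes = entities * writes_per_entity
    return total_writes, total_writes / 100_000 * price_per_100k


total, dollars = write_cost(ENTITIES, WRITES_PER_ENTITY)
print(f"{total:,} writes -> ${dollars:,.2f}")
# prints: 6,486,000,000 writes -> $6,486.00
```

So roughly 6.5 billion writes at that rate lands at about $6,500, which matches the bill in the headline.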

~~~
vosper
"Geoboxing is a technique used to search for entities near a point on the
earth in a database that can only perform equality queries (like App Engine)"

So their implementation is a compromise on account of GAE's limitations, and
they have to pay through the nose to use it. This is when I'd be looking at
hosting some features outside of GAE, which is what we do with Full-text
Search.

~~~
stickfigure
No single platform or environment encompasses every use case. GAE doesn't
offer R-trees or other spatial indexes; for that matter neither does MongoDB
or MySQL.

Geohashing is a reasonable solution for some spatial problem domains; it's one
solution along the spectrum of "precalculate a lot up front and make queries
cheap" vs "write in a cheap & easy format but make queries more expensive".
Pre-calculation strategies are usually more scalable when you have large query
loads, but they suck bigtime if you need to fully recalculate a large body of
data (as the original blog author is doing).

Maybe the blogger would be better off using PostGIS; but then, scaling and
synchronizing a large cluster of PostGIS systems is nontrivial. The issues
here are too application-specific to draw any positive or negative conclusions
about appengine.

~~~
jhaglund
FYI, MySQL supports R-trees on MyISAM tables:
<http://dev.mysql.com/doc/refman/5.0/en/creating-spatial-indexes.html>

------
ajross
Reading the thread (I'm curious about GAE, not an expert), it seems like the
details aren't clear at all. Neither the original poster nor the Google rep
seem to have a clear idea of what I/O operations are being generated.

But this bit stood out: $0.10 per 100k writes. That price seems to be far too
high. The poster is doing (something like) a reindex of 10M entries (that kind
of data is pretty small really: it's the kind of database you might use as a
test set on your laptop interactively). Figure each modification is atomic,
and that the b-tree height of the storage is ~4. So that's 40M writes to
create an index, or $400!

Seriously? Again, this is the kind of task you'd expect to do quickly and
interactively on your development box, and it costs a price of the same order
as your day's salary (!) to execute in the cloud?

Looking at this from the perspective of the underlying I/O device: this index
consumes just a tiny, tiny fraction of a hard disk drive's capacity. Yet
creating it costs enough to _buy the device_ several times over?

Something is wrong. Is that a misquote or have I misunderstood?

~~~
Strom
$0.10 per 100k writes is just how they bill you. Sure, it's a lot
compared to your average HDD price, but the data is hosted in at least 3
datacenters at all times, running Google's software stack, and everything is
being constantly monitored by Google's SRE team.

App Engine pricing might seem expensive if you try to do a simple table
comparison with alternatives, but when you get more deeply into it you'll find
that a lot of stuff that is included in the service with GAE will cost you
extra when you use the alternatives.

~~~
nischalshetty
Combine that with the fact that the High Replication Datastore will most
probably never see downtime, and you tend to be OK with the cost (of course,
as an App Engine customer you would always want lower costs, but the current
write cost is not a deal breaker).

The only problem here is that "delete" is considered a write and when you want
to delete data you just cannot accept the fact that you need to pay for
something you do not want to keep. I think GAE should definitely look into
this aspect and try to get some cheaper alternatives for data deletion.

------
latchkey
A lot of these "AppEngine costs too much" notices have been coming up recently,
but upon further inspection, they all tend to boil down to operator error.

Unfortunately, AppEngine isn't forgiving of that and there is a real monetary
value associated with questionable engineering design. Or, design that wasn't
thought through enough in the context of a service like AppEngine.

This leads to a few people getting upset and making a lot of noise when the
reality is that AppEngine is actually an amazing service.

So, to boil down the operator error from a quote in the thread:

"We're running a mapreduce to change the geobox sizes/precision for a large
number of entities."

That is the real source of the problem. Instead of using geoboxes, they should
be using geohashes, which allow arbitrary precision.

<http://code.google.com/apis/maps/articles/geospatial.html>
<http://en.wikipedia.org/wiki/Geohash>

Instead of an indexed property that looks like this (what they currently
have):

[u'37.3411|-121.8940|37.3395|-121.8926',
u'37.3411|-121.8929|37.3395|-121.8916', ...]

They would have an indexed List<String> property that looks like this:

[8, 8f, 8f1, 8f12, 8f12a, 8f12ac, 8f12ac6, 8f12ac60, 8f12ac605, 8f12ac605f,
8f12ac605fb, 8f12ac605fb3, 8f12ac605fb34]

Finding if the location is in a box would be computing the hash from the
lat/lng (there is free code out there to do that) and then doing an indexed
'in' query. The indexes would only need to be updated if the location of the
entity changes, not when they want varying levels of precision.
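
A minimal sketch of that idea, assuming the standard base32 geohash encoding (the strings in the list above look illustrative rather than real geohash output). The function names `geohash_encode` and `geohash_prefixes` are hypothetical, and the datastore itself is replaced by a plain Python list so the example runs anywhere.

```python
# Standard geohash alphabet (base32 without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lng, precision=12):
    """Encode a lat/lng pair as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lng = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits = (bits << 1) | 1
                lng_lo = mid
            else:
                bits <<= 1
                lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def geohash_prefixes(lat, lng, precision=12):
    """All prefixes of the geohash: the value list to store on the entity."""
    full = geohash_encode(lat, lng, precision)
    return [full[:i] for i in range(1, precision + 1)]


# Stored once per entity; only changes if the entity moves.
entity_cells = geohash_prefixes(37.3411, -121.8940)

# At query time, pick the precision you want and test for membership.
query_cell = geohash_encode(37.3411, -121.8940, precision=5)
print(query_cell in entity_cells)
```

The membership test stands in for an equality filter on the indexed list. Points close to a cell boundary can still land in different cells, so a real query would likely need to check neighbouring cells as well.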

~~~
thefool
This is incorrect for several reasons.

First off, they mention that when the initial design decision was made, a
similar operation cost ~$160, which is tenable for an operation that only
happens once in a while. This is in fact a case of them getting bitten by the
pricing structure changing after a reasonable design decision (at the time)
was implemented.

Secondly, they mention that this is part of a larger issue: "In our most
common case we might have to add and delete a couple items to the list
property every once in a while. That would still cost us well over $1,000 each
time. Most of the reasons for this type of data in our product is to
compensate for the fact that there isn't full text search yet. I know they are
beta testing full text, but I'm still worried that that also might be too
expensive per write."

This is a real problem that GAE needs to solve.

Finally, their problem doesn't seem to be that they need arbitrary precision;
it's that they seem to need fast location-centric queries of a large database.

Geoboxes allow you to solve this problem correctly (and quickly), returning
the results in the database that are closest to you. Matching on a geohash can
end up serving the incorrect data unless you resort to hacks involving a
number of queries.

~~~
latchkey
1) You are making an assumption that the original design decision was made
around cost. I bet the more likely fact is that they made the change and found
out that it cost them $160. Remember, that was when AppEngine was a _beta_
product and it seems they got lucky the first time.

2) They seem to have an extreme use case. No one is going to argue that maybe
AppEngine doesn't fit the bill for them. Or, one could argue that doing 6.5
billion writes times a large number of customers, across multiple datacenters
is something that a lot of databases would choke on.

3) Running more queries, while admittedly hacky, is less expensive than doing
more writes.

------
endlessvoid94
GAE has become completely infeasible as a hosting solution for me
(ThatHigh.com). My hosting cost increased by 90x, and I did not get nearly
enough notice.

I don't have the time or resources to move the site, so I'm forced to shut it
down. It really, really sucks.

Personally I'm more disappointed by the lack of notice (1 month is nowhere
near enough time) than the actual increase. I totally understand the need to
charge.

~~~
yahelc
If you implement optimizations, you can really significantly curb the cost.

I spent some time tuning SharedCount's API, which would have cost me
$30-$50/day, and it's now at about $1-$2/day.

- Move to Python 2.7 and enable multithreading.

- Set up Cloudflare (this swallows about half of all my requests).

- Increase minimum latency and reduce the maximum number of idle instances.
(I have 5-8s and 1-2 set, respectively.)

- Set up the semi-undocumented Google edge cache (basically, just a
Cache-Control: public, max-age=[seconds] header).

- Take advantage of memcache.

With this setup, I'm doing 3 million API calls per day at $2.
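
As a rough illustration of the Cache-Control item in the list above, here is a framework-neutral WSGI handler that sets the header. It is a sketch only: the handler name, the response body, and the 300-second lifetime are made up, and whether an edge cache actually honors the header is taken from the description above, not verified here.

```python
from wsgiref.util import setup_testing_defaults

MAX_AGE_SECONDS = 300  # hypothetical cache lifetime; tune per endpoint


def app(environ, start_response):
    """Minimal WSGI app whose responses are marked publicly cacheable."""
    body = b'{"ok": true}'
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
        # The header described in the list above.
        ("Cache-Control", "public, max-age=%d" % MAX_AGE_SECONDS),
    ]
    start_response("200 OK", headers)
    return [body]


def call(wsgi_app):
    """Invoke a WSGI app in-process and return (status, headers, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {}
    setup_testing_defaults(environ)  # fills in a fake GET / request
    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call(app)
print(status, headers["Cache-Control"])
```

Any response served from a cache in front of the app never reaches an instance, which is where the savings would come from.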

~~~
cr4zy
Python 2.7 is experimental and requires migrating to the new datastore (high
replication) which is a lot more expensive. So you might not want to do this
if your costs are coming from datastore writes.

Also, high replication queries can return stale results unless you use
ancestor queries. Ancestor queries require putting entities in groups by
giving them all the same parent (which can never be changed). Basically it's a
very inflexible semaphore and kind of sucks IMO.

Your suggestions in general are very good though. Thanks, I'm switching my DNS
to CloudFlare now.

~~~
stickfigure
The HRD is the same price as the Master/Slave datastore. In the old pricing
regime it was more expensive, but now they are at parity.

It's true that eventual consistency of queries on the HRD can be tricky to
program around. On the other hand... your data is replicated to 3+ datacenters
and failed over in realtime. Pretty rad.

------
tuhin
For those questioning the use of Google App engine as a serious platform for
applications, we at Pulse use Google App Engine:
<http://googleappengine.blogspot.com/2011/11/scaling-with-kindle-fire.html>

So, yes you can build serious applications on GAE but like everything else it
boils down to, it depends on what you really need.

------
6ren
I like the spot price idea (in the comments:
<http://groups.google.com/group/google-appengine/msg/fe9a05c6868e086d>).
It's similar to AdWords' automatic auction
for how close to the top your ad is, with the same benefits of getting the
best market price (for buyers and sellers) of a limited resource. If no one
else is using it, it could become close to free.

It also casts the other users as the opponent, instead of google.

------
j45
I wonder what a managed or dedicated server would cost to perform the same
calculations.

Sometimes it's still cheaper to have your own managed / self-managed gear...
and from the looks of this pricing, even hire someone fulltime/freelancing to
manage it all for you.

~~~
charliesome
s/sometimes/all the time/;

I honestly do not get why people are so fascinated with the cloud. It's a very
expensive way to avoid having to know what you're doing.

~~~
wccrawford
In my experience, all ways to 'avoid knowing what you're doing' are expensive.

It's all about time, and lack of it. If you can spend 1/10th of the time and
still make a good profit, you could spend the other 9/10ths doing other
profitable things.

~~~
charliesome
At the end of the day, you're still going to need to A) learn what you're
doing or B) hire someone who does know what they're doing - and given that one
of the major selling points of these PaaS offerings is that you can get up and
running without needing a dedicated sys admin, this seems particularly silly
to me!

------
tszming
Besides the cost, if your startup can survive without SSL support on your own
domain, go for App Engine.

See:

<http://code.google.com/p/googleappengine/issues/detail?id=792>

People requested custom SSL support in 2008, and it is now 2012. If you still
believe in App Engine, good luck!

~~~
latchkey
<http://googleappengine.blogspot.com/2011/10/app-engine-ssl-for-custom-domains-in.html>

~~~
tszming
See the last comment of my link:

>> the "trusted tester program" is a joke . They never respond so it's just a
waste of time .

Even if they launched this feature TODAY, that would be 4 years for a basic
requirement. What can you expect from them?

~~~
latchkey
I can think of TONS of services I've used over the years, both in and out of
the software field that have talked about adding features and never did. Big
deal, don't play the victim.

If you really wanted onto the trusted tester program, you'd bring it up on the
app engine mailing list or contact someone at google directly (their emails
are all over the place, Ikai is a great guy, and they are very responsive).
I'm sure they'd be happy to have enthusiastic beta testers.

I find great irony in your quote on your G+ profile:

"Do you create anything, or just criticize others work and belittle their
motivations? -- Steve Jobs"

~~~
tszming
Thanks for your reminder, I shouldn't belittle the enthusiasm of GAE's
engineers/supporters, my bad.

But as SSL support is not public yet, my statement above is still valid:

If your startup can survive without SSL support on your own domain, go for App
Engine!

Good luck!

------
seanp2k2
Google has a great platform here with lots of potential if they opened it up
more, but they're pricing themselves into a corner for this already-niche
service.

Sadly, this is kind of "typical Google" -- great product, decent execution,
but a bad identity problem -- it really feels like they're not sure yet what
they want to do with this.

~~~
salimmadjd
This is all because of their new pricing model. Overnight our pricing went up
by 5X and that was after 50% discount, which make it really a 10X hike. Here
is a graph of it:
<https://plus.google.com/114790424055754975707/posts/eUMhYDVf6i5>

------
IgorPartola
GAE is a mistake. By that I mean that it's got a big design flaw that's bound
to cost Google money, which means it'll always be more expensive than alternatives.
Consider shared PHP hosting: a request comes in, apache finds which PHP file
is responsible for it, then directs the request to the PHP interpreter. The
PHP interpreter will parse the file (or more likely load the parsed bytecode
from a cache) and return a response, which apache will then forward to the
user agent. Notice that aside from the cache the PHP interpreter is stateless.
As soon as it is done serving a request from site foo.com it can immediately
jump on a request from bar.com and the context switch doesn't cost anything
(once again disregarding finite cache size issues).

Contrast this with running a stand-alone application server for each site,
which is what GAE does. Here, even if your code is not serving any requests
it's still waiting to get them. Now, GAE has powerful magic in it to retire
request handlers which aren't frequently used. This way if site foo.com is
getting 1 request/minute, it only really needs one process/thread/handler
abstraction at a time. However, it is expensive to start/stop these
"processes", so instead GAE is forced to keep this "process" around for a
while after a request has been served hoping that the cost of keeping it alive
would be justified by a second request. Thus these stateful, slow-to-start
processes are always taking up resources that could be used to serve other
requests.

Disclaimer: all my knowledge of GAE has been from reading their docs/blog, not
from deploying projects to it.

Disclaimer 2: I am not saying that PHP is better/worse than GAE in any way.
However, I am saying that the model that GAE uses is more costly for a typical
application. This can be easily seen by comparing the cost of running a basic
site on GAE vs $2/month shared hosting.

~~~
jackowayed
Heroku uses the same model. They're doing quite well.

GAE has problems, but I think the root is just how unique everything is. That
manifests itself in people using a datastore that they don't understand, with
Google expecting them to know how many writes an action will take and whether
that feels like the right number of writes or two orders of magnitude more
than if they made a different decision about how to store their data and solve
their problems.

It also manifests itself in the lockin that Heroku mostly avoids (which is a
huge problem if some subset of users get to a point where they realize
"whoops, this would be much easier if I could do things Google won't let me
do, time to leave").

I think a good counterexample is Engine Yard and GitHub. Engine Yard had a
somewhat limited offering (especially for what GitHub was willing to pay) that
didn't really fit with GitHub's heavy direct disk I/O. (Most Rails apps almost
entirely read and write from the db, but GitHub does a lot of direct
operations on the git repositories.) But GitHub was still just a Rails app,
not an app for some specially-designed Engine Yard framework. So it was fairly
painless for them to decide to solve the problem in a way that didn't fit with
what Engine Yard would offer them and migrate to their own hardware. It wasn't
easy, especially since they weren't solving an easy problem, but at least they
didn't have to replace their database.

~~~
IgorPartola
The GitHub example is the reason why I never wrote any code against GAE. The
lock-in into their data store is a hard pill to swallow and was an early
warning sign.

I am not familiar with the internals of Heroku and don't know how they solve
the problems I outlined. Maybe someone else can elaborate.

~~~
jackowayed
Heroku runs a process for each application (or multiple). The first one for an
application is free. But just like with GAE, they'll spin it down if it's idle
long enough. So I know that if I go to a site of mine that no one has visited
in months, where I'm not paying for extra instances, it'll take a few seconds
to load. But having to run those free instances doesn't kill them because all
but the first one for an application is paid for at 5 cents/hour, which
presumably is enough margin to cover the free instances, and then some.

But they run normal applications. It started out being any Rack (Ruby web
standard--Rails, Sinatra, etc) app; they've expanded into other languages now,
but it's always some open framework that they're running for you, not
something they own and keep proprietary. They give you a normal Postgres
database. There are a few restrictions that you might not have on your own
hosting (like a read-only filesystem). But you could basically take an app
running on Heroku, install a webserver and Rails, install Postgres, make sure
any config you had was the same, and run it.

Another huge advantage Heroku has is the ability to give other people access
to their datacenters, since they're just running on EC2. So there's lots of
Addon services that can add various pieces of functionality, like hosting a
different database, many of which are only tractable because they're also
hosted on EC2 and thus have very good latency to Heroku servers.

This means that you're not stuck with Postgres. If you think a part of your
data, or all of your data, would be better stored in Mongo or Couch or Redis
or flat files on S3, there are hosted services for that, and you can even
deploy your own solution on EC2 if you'd rather. This leads to nice halfway
solutions where you use Heroku to have super-scalable application servers, and
maybe to manage your main relational db, but then you can tack on other things
where that doesn't fit with your problem. Now, if you're running some of your
own EC2 instances, you're losing some of the "never have to worry about
hosting again" value of Heroku, but at least it's possible. It could be a
temporary solution that keeps you above water while you migrate off of Heroku,
or maybe you decide it really is the best long-term solution.

~~~
IgorPartola
Thanks for the great overview.

It's interesting, since GAE charges $0.08/hour per front-end instance, with 28
front-end instance hours free per day. However, I once helped someone debug an
application that on average gets one hit a minute, and yet manages to max the
28 free hours and then some. Closer examination showed that GAE was having
these instances hang around for much longer than seemed necessary after they
fulfilled the initial request.

So either Heroku has more magical magic than GAE, or there is some other kind
of efficiency that they are tapping into. One thing I can think of is that
possibly Heroku is more conservative with spinning up extra processes,
preferring longer response times.
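
To put rough numbers on the lingering-instance effect described above, here is a toy model. It assumes evenly spaced requests, ignores request handling and startup time, and treats every instance as lingering for the same idle timeout; those are simplifications for illustration, not how GAE's scheduler is documented to work. The $0.08/hour rate and the 28 free hours are the figures quoted above, and the function names are made up.

```python
FREE_HOURS_PER_DAY = 28          # free front-end instance hours per day
PRICE_PER_INSTANCE_HOUR = 0.08   # dollars per front-end instance hour
SECONDS_PER_DAY = 24 * 60 * 60


def instance_hours_per_day(request_interval_s, idle_timeout_s, instances=1):
    """Instance hours consumed per day under the toy model.

    If the idle timeout is at least as long as the gap between requests,
    an instance never shuts down and is alive for the full 24 hours.
    Otherwise it is alive for one idle timeout per request.
    """
    if idle_timeout_s >= request_interval_s:
        alive_s = SECONDS_PER_DAY
    else:
        alive_s = (SECONDS_PER_DAY // request_interval_s) * idle_timeout_s
    return instances * alive_s / 3600.0


def daily_cost(hours):
    """Dollars billed per day after subtracting the free quota."""
    return max(0.0, hours - FREE_HOURS_PER_DAY) * PRICE_PER_INSTANCE_HOUR


# One hit per minute, instance retired 15 seconds after each request.
short_linger = instance_hours_per_day(60, 15)
# Same traffic, but two instances that each linger for 15 minutes.
long_linger = instance_hours_per_day(60, 900, instances=2)

print(short_linger, daily_cost(short_linger))
print(long_linger, round(daily_cost(long_linger), 2))
```

Under these assumptions the same one-hit-a-minute traffic uses a quarter of a day of instance time with a short linger, but blows past the free quota once two instances are kept warm around the clock.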

~~~
alloca
I may be misunderstanding what you mean, but Heroku does not spin up or down
any processes; the number of workers you have is controlled entirely by the
end user, and you will be billed by the worker regardless of how much traffic
you're getting. I'd venture that 95% of Heroku apps have a single process, and
that 95% of the time those workers are in some sort of suspended state.

~~~
malyk
If you have 1 dyno, heroku spins that dyno down after 20-30 minutes of
inactivity.

Also, there are services out there that will monitor your queue depth and
increase your dyno count for you. Or, you could use the heroku gem to do that
yourself.

------
dextorious
With Amazon AWS you can handle the scaling yourself, when you need it, with
components tailored to your use cases, and better latency.

And with Heroku you can have it taken care of for you, following a few simple
rules.

So, why exactly would one use the crippled GAE platform, that constantly
breaks its promises (re: reliability), forces you to code with very little
flexibility (and, no, not every app that needs to automatically and massively
scale "has to be coded exactly like a GAE app anyway"), costs a fortune (and
sometimes an unexpected fortune), and breaks for you as soon as you need a
technology not on offer?

------
salimmadjd
They jacked up our price by 5X, here is a nice graph of it:
<https://plus.google.com/114790424055754975707/posts/eUMhYDVf6i5>

~~~
sologoub
Hate to take it off topic, but your site (<http://www.f8daily.com/>) is
throwing errors: ValueError: Values may not be more than 1000000 bytes in
length; received 1053462 bytes

~~~
troygoode
The image he linked said he was forced to shut down his site, so I'm not sure
that he cares that there is an error.

