

Google App Engine's Datastore Admin is Terribly Inefficient - marram
http://marram.posterous.com/google-app-engines-datastore-admin-is-terribl

======
peterknego
A GAE datastore delete takes multiple operations because it also updates
indexes:

1 entity delete = 2 Writes + 2 Writes per indexed property value + 1 Write per
composite index value

All from this page:
[http://code.google.com/appengine/docs/billing.html#Billable_...](http://code.google.com/appengine/docs/billing.html#Billable_Resource_Unit_Cost)
And more about why it is so:
<http://code.google.com/appengine/articles/life_of_write.html>
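
As a rough illustration of the formula quoted above, here is a minimal Python sketch. The function name and the example entity (9 indexed property values, no composite indexes) are hypothetical and not taken from the linked docs:

```python
def delete_write_ops(indexed_values, composite_values=0):
    """Write ops billed for deleting one entity, per the formula quoted above.

    2 writes for the entity itself, 2 per indexed property value,
    and 1 per composite index value.
    """
    return 2 + 2 * indexed_values + composite_values


# Hypothetical entity: 9 indexed property values, no composite index values.
print(delete_write_ops(9))  # 20
```

Under that formula, an entity with only a handful of indexed properties already costs many write ops to delete.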

Well, the OP is just another coder who can't read docs, but can write a blog.

~~~
theli0nheart
Regardless of how decent Google's AppEngine documentation is, this is indeed a
bug.

The correct behavior would be to recalculate the indices just once, instead of
reindexing after every single delete operation.

It then becomes

    2*entities + 2*indexed property values + composite index values

operations to delete all entities in the datastore, instead of

    2*entities + 2*entities*indexed property values + entities*composite index values

operations.
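
Taking the two expressions in this comment at face value, the gap can be sketched in Python. The function names and the sample numbers are hypothetical, and this only evaluates the arithmetic as written; it says nothing about whether the datastore could actually reindex once:

```python
def per_entity_reindex_total(entities, indexed_values, composite_values):
    """Total ops when indexes are updated after every single delete."""
    return (2 * entities
            + 2 * entities * indexed_values
            + entities * composite_values)


def reindex_once_total(entities, indexed_values, composite_values):
    """Total ops under the comment's proposed 'recalculate once' behavior."""
    return 2 * entities + 2 * indexed_values + composite_values


# Hypothetical workload: 1000 entities, 5 indexed values, 1 composite value each.
print(per_entity_reindex_total(1000, 5, 1))  # 13000
print(reindex_once_total(1000, 5, 1))        # 2011
```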

~~~
tantalor
To delete _all_ entities should be free. Who cares about indexes? `rm -rf`,
done.

------
stickfigure
This is something that a lot of GAE developers misunderstand: put()ing a
datastore entity is not a single write operation. There are indexes to update
- in your case, lots of them - and updating these indexes can require several
write operations. A simple delete is one write per index, but changing a value
can be two operations: one to delete the old index value and one to write the
new one. And since each property has two indexes (ascending and descending),
these numbers are doubled.
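
The index arithmetic described above can be sketched as follows. This is a minimal illustration of the reasoning in this comment only: the names are hypothetical, and it counts index writes, not the write for the entity itself:

```python
INDEXES_PER_PROPERTY = 2  # ascending and descending, as described above


def index_writes_on_delete(indexed_properties):
    """One write per index entry removed."""
    return indexed_properties * INDEXES_PER_PROPERTY


def index_writes_on_change(changed_properties):
    """Two writes per index: delete the old value, write the new one."""
    return changed_properties * INDEXES_PER_PROPERTY * 2


# Hypothetical entity with 3 indexed properties.
print(index_writes_on_delete(3))  # 6
print(index_writes_on_change(3))  # 12
```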

If you create your own bulk delete method, you will find that it takes exactly
as many write ops as the admin console tool.

You probably have defined more indexes on your entities than you need to - you
will likely be able to make your app cheaper by removing unnecessary indexes.
Managing indexes carefully is a critical part of making apps affordable on
GAE.

~~~
marram
I had "vacuumed" all indices referencing those entities before issuing the
delete, although there was only one index per purged entity type. So this
would not explain the 20x write operations.

Also, note that the deletions were done through the "Datastore Admin" app,
which was recently added. It is different from the classic Datastore Viewer.

~~~
stickfigure
You misunderstand how GAE indexes work.

There are two kinds of indexes:

* multi-property indexes which you configure via datastore-indexes.xml (or yaml). You can remove these by removing them from the xml/yaml and vacuuming.

* single-property indexes, which you decide on when you define your data model. You can't vacuum these, and they are defined on a per-entity basis. The only way to make them go away is to re-save the relevant entities without the index defined. Note: multi-property indexes require single-property indexes on all the properties covered.

These single-property indexes are almost certainly causing your high write op
counts. You really should examine your data model with this new understanding;
by removing unnecessary single-property indexes, you may be able to
dramatically reduce your bill.

------
latchkey
Should Google refund developers when they make an uninformed decision that
costs them money?

One could argue it is a bug in GAE that allows developers to make an expensive
mistake when they don't fully understand how something (fairly complicated)
works.

Someone else could argue that we are all developers and we should know the
costs associated with the systems we are building. There is a real cost
associated with PaaS systems like GAE.

What do you think?

------
cr4zy
I'm pretty sure this uses the map reduce API which has a lot of overhead in
the datastore. In principle map reduce is nice because it could make very
large jobs fast. But since Google engineers don't pay for anything, they
optimized for time, not cost.

And with regard to your script, you can't just delete 3k keys in one request.
If you want, I'll send you the script I've adapted for jobs that make large
changes to the datastore.
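
For illustration only (this is not the script mentioned above), a minimal Python sketch of splitting a large key list into smaller batches. `delete_batch` is a hypothetical callable standing in for whatever datastore delete call you use, and the default batch size of 500 is an assumption:

```python
def chunked(keys, batch_size):
    """Yield successive slices of keys, each at most batch_size long."""
    for start in range(0, len(keys), batch_size):
        yield keys[start:start + batch_size]


def delete_in_batches(keys, delete_batch, batch_size=500):
    """Call delete_batch once per chunk and return the number of calls made.

    delete_batch is a hypothetical callable that deletes one list of keys.
    """
    calls = 0
    for batch in chunked(keys, batch_size):
        delete_batch(batch)
        calls += 1
    return calls
```

Batching keeps each request small, but it does not change the number of write ops billed per deleted entity.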

~~~
sirn
In my experience, purging data via the MapReduce API uses a lot less write
quota than the admin interface (with a bit of instance-hour overhead, which
doesn't seem like a problem).

I can't remember the exact number, but it was about 10 times less than
deleting via the admin interface, and it finished in 5 minutes rather than 3
hours.

------
ch0wn
I ran into the same issue. If you want to purge all data from an app, it's
much cheaper (and sometimes even faster) to start over and create a completely
new app with an empty datastore than to use the Datastore Admin and delete
the data from there.

------
ecksor
The number of writes also depends on the number of indexes you have on the data.

------
Maven911
Can somebody explain the article in layman's terms? For those not too familiar
with GAE...

~~~
stickfigure
"Blogger misunderstands how indexes work on App Engine."

------
tnuc
There are plenty of things that are wrong with Google App Engine. And there
are plenty of bugs that exist that have cost me money.

Why don't you try filing a bug report or suggesting a warning, and send an
email requesting a refund? They tend to be a friendly bunch who give refunds
for obvious problems.

Moving to AWS will of course save you lots of money in the longer term,
depending on what your hosting requirements are.

