

AppEngine 1.4.3 released: new file API, concurrent requests & more - yoda_sl
http://googleappengine.blogspot.com/2011/03/announcing-app-engine-143-release_30.html

======
davepeck
The testbed API is exciting -- testing (python) App Engine apps has always
been thorny business.

It appears that support for integration testing, however, is not part of this
release.

How do you integration test your (python) app engine apps?

Tools like nosegae + webtest give the _theoretical_ promise of integration
testing by driving a WSGI app instance. Unfortunately, the _practice_ for any
interesting App Engine app (especially those that call use_library for Django)
is totally different. Basically, integration testing this way is completely
broken: you end up in import error hell. This appears to be at least in part
related to dev_appserver's (mis)use of Python's `imp` module, as described
here: <https://gist.github.com/883676#file_readme.md>
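
For readers unfamiliar with the approach, "driving a WSGI app instance" means calling the application callable directly in-process instead of going through a real HTTP server. A minimal sketch using only the standard library follows; `app` and `call_wsgi` are hypothetical names for illustration, and this is not how nosegae or webtest are implemented.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Tiny stand-in WSGI application (hypothetical, for illustration)."""
    body = b"hello from " + environ["PATH_INFO"].encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_wsgi(application, path="/"):
    """Invoke a WSGI callable in-process and return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path

    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], dict(captured["headers"]), body
```

A test can then assert on the returned status and body without any network involved. The import problems described above arise before this point, when the app module itself is loaded.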

~~~
bslatkin
That's a nice Gist. Not sure if you should send that to the NoseGAE folks or
what. The dev_appserver follows PEP302
(<http://www.python.org/dev/peps/pep-0302/>) and should properly set
attributes on sub-modules (i.e., assert 'subpackage' in
dir(sys.modules['package']) ). The dev_appserver does not set those attributes
explicitly, but the loader driving the PEP302 hook should.

~~~
davepeck
Interesting, thanks!

I'm afraid I'm having trouble parsing your statement on dev_appserver's
behavior. Could you clarify it? It sounds like you're saying that
dev_appserver both does and doesn't set submodule attributes on parent module
instances. Where _should_ NoseGAE be doing this?

As far as I can tell, no code in the App Engine python SDK ever does this. In
today's SDK, dev_appserver.py line 2256 adds the submodule to the sys.modules
dictionary, but shouldn't there also be a line of code along the lines of
setattr(sys.modules[parent_package_name], sub_package_name, submodule)?
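
A minimal sketch of the difference in question, assuming nothing about the SDK's actual code: `register_submodule` is a hypothetical helper that builds empty module objects by hand, so that the effect of the extra `setattr` call can be seen in isolation.

```python
import sys
import types


def register_submodule(parent_name, child_name, bind_on_parent=True):
    """Register an empty submodule in sys.modules (hypothetical helper).

    With bind_on_parent=False only sys.modules is updated, which mirrors
    the behavior described above. With bind_on_parent=True the submodule
    is also set as an attribute on the parent module object.
    """
    parent = sys.modules.setdefault(parent_name, types.ModuleType(parent_name))
    full_name = parent_name + "." + child_name
    child = types.ModuleType(full_name)
    sys.modules[full_name] = child
    if bind_on_parent:
        # The proposed extra step: make `parent.child` resolvable.
        setattr(parent, child_name, child)
    return child
```

Without the binding step, the submodule is present in `sys.modules`, but attribute access on the parent module object fails.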

Thanks for your help...

~~~
bslatkin
IIRC (and it's been a few years) the hook goes on sys.meta_path, which uses
the HardenedModulesHook to load modules, but then the Python runtime binds the
variable names and attributes and whatnot before returning import statements
back to application code.

~~~
davepeck
The python runtime does not appear to do this in the specific case of
submodule attributes. Right now, the Python bug against this indicates that it
is considered a documentation bug... but my hunch is that it's an actual bug.
I'll try manually inserting the setattr() line into my local copy of the SDK
and see how integration testing goes then.

What do you use for GAE python integration testing?

------
ww520
The File API mapping to the Blobstore really simplifies blob support. The Blob
datatype in Datastore has the 1M limit. It was kind of cumbersome to use both
the Blobstore and Datastore API to store large objects.

BTW, the Java+Play!+GAE+Objectify really makes webapp development fun and
fast.

~~~
netmau5
Using Java+Play!+GAE+Twig for Sparkmuse and I'm very happy with the stack
overall. The new Blobstore API will help with handling image uploads, which is
something Play! on GAE fails at horribly (GAE's fault mostly).

------
HowardRoark
GAE Java only handled one request at a time till now?

~~~
teraflop
One request per server instance. The change allows multiple threads to run in
each instance, so now your code has to be threadsafe.
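
The thread concerns Java, but the hazard is language-independent, so here is a rough sketch in Python. `HitCounter` and `run_concurrently` are hypothetical names; the counter stands in for any state shared by all requests in an instance, and the lock is what makes concurrent updates safe.

```python
import threading


class HitCounter:
    """Shared mutable state, guarded by a lock (hypothetical example)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def increment(self):
        # Read-modify-write must happen atomically across threads.
        with self._lock:
            self._count += 1

    def value(self):
        with self._lock:
            return self._count


def run_concurrently(counter, num_threads=8, increments_per_thread=1000):
    """Simulate several request threads updating the same counter."""
    def worker():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value()
```

Code that updates shared state without such a guard can lose updates once more than one request runs in the same instance.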

~~~
eneveu
I was wondering about this in October. I guess the Vosao CMS needs to stop
storing state in static fields now:

[http://stackoverflow.com/questions/4028787/is-it-thread-safe...](http://stackoverflow.com/questions/4028787/is-it-thread-safe-to-store-data-inside-a-static-field-when-deploying-on-google-ap)

Thilo added an answer a few hours ago, pointing out the new threadsafe mode for
GAE. That's why I like StackOverflow.

------
orijing
Not to diminish the value of this release, but are there any improvements to
consistency and availability of the data store?

~~~
mccutchen
Hasn't this mostly been addressed by the High Replication Datastore?

~~~
davepeck
I suspect the answer is "yes" for most users.

My understanding of how it all fits together, please correct if wrong: the
default datastore is (strongly!) consistent and partition tolerant but
sacrifices availability. The new HRD gives you availability and partition
tolerance at the cost, potentially, of consistency. You can get an
intermediate state by wrapping writes in taskqueue tasks, which I do for
writes that can wait but must not fail on the standard datastore.
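
The "can wait but must not fail" pattern can be sketched without the App Engine SDK. The following is only an in-memory stand-in for a task queue, not the taskqueue API itself; `RetryingWriteQueue` and `write_fn` are hypothetical names.

```python
import collections


class RetryingWriteQueue:
    """In-memory stand-in for queued writes that are retried on failure."""

    def __init__(self, write_fn, max_attempts=5):
        self._write_fn = write_fn          # callable that performs the write
        self._max_attempts = max_attempts
        self._tasks = collections.deque()  # (payload, attempts_so_far)

    def enqueue(self, payload):
        """Accept the write immediately; it is performed later."""
        self._tasks.append((payload, 0))

    def run_once(self):
        """Try each queued write once; return payloads that ran out of attempts."""
        retry = collections.deque()
        exhausted = []
        while self._tasks:
            payload, attempts = self._tasks.popleft()
            try:
                self._write_fn(payload)
            except Exception:
                if attempts + 1 < self._max_attempts:
                    retry.append((payload, attempts + 1))
                else:
                    exhausted.append(payload)
        self._tasks = retry
        return exhausted

    def pending(self):
        return len(self._tasks)
```

The caller gets a quick acknowledgement from `enqueue`, and a temporarily unavailable datastore only delays the write instead of failing the request.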

~~~
snewman
To clarify: the HRD sacrifices some consistency, but the sacrifice is limited.
It still maintains consistency within an "entity group", using the Paxos
algorithm for distributed consensus. Consistency is sacrificed only for
queries that span entity groups. Retrieving individual records, or queries
within an entity group, are strongly consistent even in HRD. The primary
tradeoffs for HRD really are cost (because you're maintaining more replicas of
your data), and write latency (because you're updating multiple locations).
For those not familiar with App Engine terminology, each database is
partitioned, by the developer, into shards called Entity Groups -- see
[http://code.google.com/appengine/docs/python/datastore/entit...](http://code.google.com/appengine/docs/python/datastore/entities.html)
for details.

Some additional details on the HRD tradeoffs are at
<http://code.google.com/appengine/docs/python/datastore/hr/>. The technical
underpinnings are described in
<http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf> (warning: this is
not a light read).

[Full disclosure -- I used to work at Google and was involved in some of this
work.]

~~~
bdonlan
How is consistency sacrificed with HRD? Do you mean that, with HRD, you could
potentially see the effects of some write W to entity group A, write some
value dependent on that write to entity group B, then immediately look at
entity group A and see a pre-W value?

~~~
snewman
No, the scenario you describe could not occur -- it would involve a
consistency violation within entity group A. Once you see the effects of write
W to entity group A, all operations on entity group A (across all servers) are
guaranteed to see W.

The reduced consistency guarantees in HRD involve indexes, because indexes
span entity groups. An example: suppose that you set name=foo in record A1,
which is part of entity group A. If you then retrieve record A1, that's an
operation on entity group A, so you're guaranteed to see name=foo. But if you
perform a global query for all records with name == foo, that's an operation
on an index, which is outside the entity group and so is not mediated by the
entity group's Paxos log. Therefore, your query might not return A1. The index
is "eventually" consistent -- it's guaranteed that, eventually, all indexes
will be updated. But AFAIK there's no guaranteed upper bound on "eventually".
In practice, it should usually be very quick, but only usually.
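
The name=foo example can be mimicked with a toy model. This is only an illustration of the visibility rules described above, not of how the datastore or Paxos works; `ToyStore` and its methods are hypothetical, and the index update is triggered by hand to make the delay visible.

```python
class ToyStore:
    """Toy model: gets by key are immediate, the global index lags behind."""

    def __init__(self):
        self._entities = {}    # key -> name, visible as soon as put() returns
        self._name_index = {}  # name -> set of keys, updated later
        self._pending = []     # index updates not yet applied

    def put(self, key, name):
        self._entities[key] = name
        self._pending.append((key, name))

    def get(self, key):
        """Lookup by key: always reflects the latest put()."""
        return self._entities.get(key)

    def query_by_name(self, name):
        """Global query: answered from the (possibly stale) index."""
        return sorted(self._name_index.get(name, set()))

    def apply_index_updates(self):
        """Stand-in for the moment at which the index catches up."""
        for key, name in self._pending:
            for keys in self._name_index.values():
                keys.discard(key)  # drop any stale entry for this key
            self._name_index.setdefault(name, set()).add(key)
        self._pending = []
```

After `put("A1", "foo")`, `get("A1")` returns the new value straight away, while `query_by_name("foo")` does not list A1 until `apply_index_updates()` has run.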

------
herrherr
I really hoped that they would release a full-text search for the datastore.

Guess I'll have to wait for that and use the improvised solution
([http://billkatz.com/2009/6/Simple-Full-Text-Search-for-App-E...](http://billkatz.com/2009/6/Simple-Full-Text-Search-for-App-Engine)).

------
yoda_sl
One area that raises my interest is the File API. I wonder if, with the
'file' support, it will be possible to get Lucene deployed onto GAE.

------
miloco
Still no simple backup/restore feature. All I ask is for them to implement a
dashboard based backup & restore tool!

