

Stop passing naked dictionaries around: Introducing DStruct for Python - dorkitude
https://github.com/dorkitude/dstruct

======
forgotusername
I've also written a similar-ish thing
(<http://py-datakit.googlecode.com/hg/README.html>), which was more focused on
providing useful encodings.

The problem with these styles of library is that their modularity is
relatively poor. Once you start using attribute access for simple data, or
relying on extended features of the container, all your callers, and future
reuses of the code are forced to also use that style of access.

While it is easy to mock up a container sporting attributes that can be
substituted in place of the instance type, e.g.:

    
    
        class Bag(dict):
            __setattr__ = dict.__setitem__
            __getattr__ = dict.__getitem__
    

Another problem will rear its ugly head: namespace collisions. The field
names of the 'user data' share a namespace with the attributes of the
container type, so they must not conflict with any method you might (now, or
most likely in the future, once you've already dug the hole) want to add to
the instance type.
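To make the collision concrete, here is a small sketch using a Bag-style class like the one above (with `__setattr__` as the setter hook). The field names `name` and `items` are made up for illustration.

```python
class Bag(dict):
    """Dict whose keys are also readable and writable as attributes."""
    __setattr__ = dict.__setitem__
    __getattr__ = dict.__getitem__


bag = Bag(name="widget", items=3)

# 'name' does not clash with anything defined on dict, so normal lookup
# fails and __getattr__ falls back to the stored key.
name_via_attr = bag.name

# 'items' clashes with dict.items: normal attribute lookup finds the
# method first, so __getattr__ is never consulted.
items_via_attr = bag.items      # a bound method, not 3
items_via_key = bag["items"]    # 3
```

The same thing happens with any field that shares a name with a dict method (`keys`, `values`, `update`, `copy`, ...), and with any method added to the class later.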

For these reasons, mantra 3 of 'import this' comes to mind:

    
    
        Simple is better than complex.
    

Why not just use a dict, or something that looks almost identical to a dict,
to begin with? Caller code is not polluted with your types, interoperability
with things that expect dicts is ensured, and future extension is eminently
possible since 'user fields' cannot conflict with methods you may wish to add.

I've recently picked up DataKit work again. The local development version has
a dict-like object whose only extension is to ensure the type of assigned
items matches that of the schema type. This replaces an old 'Entity' class
that required attribute access (essentially chosen only because it was
"prettier").

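A minimal sketch of what such a type-checking dict might look like. This is not DataKit's actual code: the `CheckedDict` name, the `schema` class attribute, and the `User` example are all hypothetical.

```python
class CheckedDict(dict):
    """Plain dict, except that assigned values must match a schema.

    Hypothetical illustration: `schema` maps each allowed key to the
    type its value must have.
    """

    schema = {}

    def __init__(self, *args, **kwargs):
        super().__init__()
        # Route initial items through __setitem__ so they are checked too.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value

    def __setitem__(self, key, value):
        if key not in self.schema:
            raise KeyError(f"unknown field: {key!r}")
        expected = self.schema[key]
        if not isinstance(value, expected):
            raise TypeError(
                f"{key!r} must be {expected.__name__}, "
                f"got {type(value).__name__}"
            )
        super().__setitem__(key, value)


class User(CheckedDict):
    schema = {"name": str, "age": int}


user = User(name="alice", age=30)
user["age"] = 31            # fine
try:
    user["age"] = "31"      # wrong type, rejected at assignment time
    rejected = False
except TypeError:
    rejected = True
```

Callers still see an ordinary dict (`isinstance(user, dict)` holds, and item access is unchanged). One caveat: `dict.update` and `dict.setdefault` do not go through an overridden `__setitem__`, so a real implementation would need to cover those as well.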
A hidden benefit of passing a simple type is easier testability: from a unit
test, initializing a class with a dict literal is much easier than needing
some attribute-like stub class that must be declared and initialized first (to
avoid that, you're forced to test your container class at the same time as
whatever caller code is currently under test, which is a broken habit).
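A rough sketch of that testability point, assuming a made-up function `full_name` as the code under test:

```python
import unittest


def full_name(person):
    """Code under test: only requires item access on its argument."""
    return f"{person['first']} {person['last']}"


class FullNameTest(unittest.TestCase):
    def test_joins_first_and_last(self):
        # No stub class to declare: a dict literal is the whole fixture.
        person = {"first": "Ada", "last": "Lovelace"}
        self.assertEqual(full_name(person), "Ada Lovelace")
```

The test exercises only `full_name`; no container class is involved, so a failure can only come from the code under test.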

In case it wasn't clear: please, please continue passing naked dictionaries
around. They're a universal container protocol, evolved over 20+ years, that
wasn't in need of fixing.

~~~
wladimir
I think every Python developer worth their salt has implemented a class like
this at least once in their career (mine was even called DStruct :-) ). And
probably at a certain later point decided it was not that good an idea.

 _(essentially chosen only because it was "prettier")._

That was my reason as well. It is easier/prettier to write a.b.c. instead of
['a']['b']['c']. But that's all.

~~~
dorkitude
(responded in <http://news.ycombinator.com/item?id=3274914>)

------
gbog
As said in another comment, this seems to be a bit redundant with the readily
available namedtuple.

Moreover, it depends on dorkitude_utils and python_memoize, neither of which I
have in my Python install.

~~~
dorkitude
Namedtuple is a very useful tool, but it doesn't enforce typing.
Furthermore, it doesn't expose extensible classes, unless you want to write
the sort of code you see when you pass `verbose=True` into the namedtuple
constructor (<http://drktd.com/C4QF>). So while a game's BaseBuilding may
require the "x", "y", and "building_type_id" fields, your FarmVille Plant may
add "planted_time" in its schema, plus a "harvestable_after_time" computed
property.

In the abstract, tuples simply aren't suited to all problems (any more than are
dicts): order doesn't always matter, and immutability _certainly_ is not
always desired.
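A short sketch of the two namedtuple properties mentioned here (no type enforcement, and immutability), using only the standard library. The field names come from the comment above; the values are made up.

```python
from collections import namedtuple

BaseBuilding = namedtuple("BaseBuilding", ["x", "y", "building_type_id"])

# No type enforcement: a string is accepted where a number was intended.
building = BaseBuilding(x="not a number", y=2, building_type_id=7)

# Immutable: assigning to a field raises AttributeError.
try:
    building.x = 1
    mutated = True
except AttributeError:
    mutated = False

# The only way to "change" a field is to build a new tuple.
moved = building._replace(x=5)
```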

(As for those dependencies, they're submoduled in the repo: try `git clone
--recursive`)

------
RyanMcGreal
If that's what you want, what's wrong with:

    
    
        class DictObj(dict):
            __getattr__ = dict.__getitem__
            __setattr__ = dict.__setitem__
            __delattr__ = dict.__delitem__
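For reference, a sketch of how this three-line class behaves, including one wrinkle worth knowing about: a missing attribute surfaces as `KeyError` instead of `AttributeError`.

```python
class DictObj(dict):
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__


obj = DictObj(a=1)
obj.b = 2       # stored as obj["b"]
del obj.a       # removes obj["a"]

# getattr() with a default only swallows AttributeError, so the KeyError
# raised by dict.__getitem__ propagates instead of returning the default.
try:
    getattr(obj, "missing", None)
    error_name = None
except KeyError:
    error_name = "KeyError"
```

Code that relies on `hasattr` or on `getattr` with a default is affected for the same reason. Wrapping the lookup and re-raising `AttributeError` is one way around it, at the cost of a few more lines.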

------
lucian1900
Yuck. Sorry, but this is worse than dicts or namedtuples in every way.

~~~
dorkitude
Trolls are worse than collaborators in every way.

(Only one of our two statements is true.)

~~~
jrockway
He's not trolling. He's just pointing out that Python is not Java, and the
Python culture is not particularly in favor of using instances of user-defined
classes where built-in types will suffice.

Similarly, you almost never see type checking; code typically just calls the
method it expects to be there and hopes for the best. If you come from Java
culture, this Is Not Cool, but it's Python, not Java. (I personally prefer
Perl's roles for duck typing; I get type checking, but no implementation
requirements.)
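A tiny sketch of that "call it and hope" style. The `shout` function and `FakeFile` class are invented for the example.

```python
import io


def shout(stream):
    """Duck typing: just call .read() and let a missing method fail loudly."""
    return stream.read().upper()


class FakeFile:
    """Unrelated to io.StringIO by inheritance, but has a read() method."""

    def read(self):
        return "hello"


from_stringio = shout(io.StringIO("hi"))
from_fake = shout(FakeFile())

try:
    shout(42)       # an int has no .read()
    failed = False
except AttributeError:
    failed = True
```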

Anyway, I'm sure you worked hard on this, but it's one of those libraries that
feels like it doesn't belong, like Twisted. You can ignore the Python
community and be successful, but don't expect unlimited praise for doing so.

~~~
dorkitude
I agree with you 100% that it would be foolish to "use instances of user-
defined classes where built-in types will suffice."

If we disagree, it's because I don't see "where built-in types will suffice"
and immediately think "always", as perhaps you do.

This miscommunication is largely my fault. I shouldn't have marked the
repository or its corresponding HN post with "Stop passing naked dictionaries
around!" I must admit, my marketing impulse got the best of me there. Naked
dictionaries with untyped values are obviously one of Python's strengths, and
the very _option_ of using them instead of a heavier system is a big part of
why dynamic languages are so important, both to me and to the craft as a
whole. It would be a more accurate reflection of my perspective if I had
rather said "Stop _always_ passing naked dictionaries around, especially in
cases where you have to write schema validators for them!"

Statically typed languages are a drag because you've got type checking always,
and you don't always need it -- in fact, as any dynamic programmer knows, you
almost never need it.[1]

But sometimes you do. It's a fact that data schemas are useful at times. In
instances where your data is in fact naturally schema’d,[2] I’ll invoke some
PEP 20 dogma of my own: “Explicit is better than implicit.” There are
situations in which you _could_ catch a data problem at the time you’re
storing it, but with dictionaries you only catch it later, once it’s being
used (hopefully your test coverage surfaces it, or it might not even happen
until you’ve deployed to production).[3] Recognizing this, you may then write
a one-off schema validator for each of these situations, but that puts the
onus on every developer to call that validator every time he or she is
populating a dictionary of that classification (not very DRY).
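The timing difference can be sketched with plain Python. This is an illustration of the argument, not DStruct's API: `total_price`, `validate_order`, and `Order` are hypothetical names.

```python
def total_price(order):
    """Use site: this is where a bad value in a plain dict finally blows up."""
    return order["quantity"] * order["unit_price"]


def validate_order(order):
    """One-off validator: every caller must remember to call it."""
    for field, expected in (("quantity", int), ("unit_price", float)):
        if not isinstance(order.get(field), expected):
            raise TypeError(f"{field} must be {expected.__name__}")


class Order(dict):
    """Validates on construction, so no caller can forget the check."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        validate_order(self)


bad = {"quantity": 3, "unit_price": None}   # stored without complaint

try:
    total_price(bad)        # fails only here, at use time
    failed_at_use = False
except TypeError:
    failed_at_use = True

try:
    Order(bad)              # fails here, at storage time
    failed_at_store = False
except TypeError:
    failed_at_store = True
```

Both paths end in a `TypeError`; the difference is where it is raised, and whether the check depends on each caller remembering to run it.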

In the six months or so since I wrote the first version, I’ve found DStruct to
be extremely useful for lots of scenarios, like remote resources (I just wrote
a use case here: <https://gist.github.com/1398138>), mock objects (I just
wrote a use case here: <https://gist.github.com/36118077f75eaed8c731>),
future-proofing, backwards-compatibility, and pretty much any time you’re
doing outside-in development (e.g. writing a view or serializer first, and
having it build up its own mock data objects, to be replaced by model
instances once the model layer is complete).

In my heavily content-driven game framework, the game designer imports content
via CSV.[4] With DStruct and a handful of code changes (amounting essentially
to `return Decoration(input_dict)` or `return LevelMilestone(input_dict)`,
instead of just `return input_dict`), we have been able to save enormous human
time, because the server now detects schema-related content problems at
content import time, rather than having to wait for the full content test
suite to run (and hopefully have enough coverage that it can detect a missing
or malformed attribute!).
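A sketch of what import-time checking could look like. The `Decoration` class here is a hypothetical stand-in written with plain Python (it does not use DStruct's API), and the CSV columns are invented.

```python
import csv
import io


class Decoration(dict):
    """Hypothetical stand-in for a schema'd content class."""

    required = ("id", "name", "cost")

    def __init__(self, row):
        missing = [field for field in self.required if not row.get(field)]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        # int() raises ValueError on malformed numbers such as "cheap".
        super().__init__(
            id=int(row["id"]), name=row["name"], cost=int(row["cost"])
        )


def import_decorations(csv_text):
    """Parse CSV content, collecting (line number, error) for every bad row."""
    decorations, errors = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        try:
            decorations.append(Decoration(row))
        except ValueError as exc:
            # reader.line_num is the line of the row just read.
            errors.append((reader.line_num, str(exc)))
    return decorations, errors


content = "id,name,cost\n1,Fountain,250\n2,Statue,\n3,Bench,cheap\n"
decorations, errors = import_decorations(content)
```

With this input, the row with the empty cost and the row with the non-numeric cost are both reported at import time, along with the line they came from.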

I put DStruct out there because it simply _kept on saving_ me time, code,
and technical risk in a variety of projects. Everyone is free to take it or
leave it, but my preference is that they fully understand it before doing so.

-dorkitude

PS: I don't come from Java; I'm a Pythonist born and raised (even way back at
IMSA, where I believe we met once or twice ;) I think in terms of
metaprogramming, multiple inheritance, dynamic typing, and runtime mutation. I
understand the kneejerk "oh god another Java bureaucrat" reaction, because
it's a reaction I myself feel when I see certain patterns being hamfistedly
jammed into Python code. DStruct is not such an occurrence.

[1] The debate about DStruct is not a debate about dynamic vs. static typing in
language design (we’re all pythonists here): it's a debate about the degree of
the qualifier "almost" in this sentence.

[2] Sometimes you’re mocking something that’s schema’d; sometimes the
interface is beyond your control; sometimes you’re retrofitting a class that
used to be ORM and now is not; sometimes you’re importing CSVs; sometimes
you’re writing a feed aggregator; the list of naturally schema’d data sources
goes on and on.

[3] This is particularly true in Ruby, where they always seem to use Hash-as-
argument in an attempt to get the future-proofing power we get for free in
Python via keyword args.

[4] And not just store items, settings, and game objects, but even the rules
and economic structure! The game framework referenced here is the best work of
my career to date. Unfortunately it is a closed source project, but suffice it
to say the concepts baked into DStruct (and a handful of other killer
libraries I will release in the coming year) had a significant positive effect
in terms of code bloat, development time, quality, and risk.

------
orenmazor
passing around naked things in python is one of my favourite things about
python.

I have C# for when I want to pass well-dressed and proper things
around…

~~~
mgrouchy
Agreed, this solves a problem I don't have (at least currently).

------
zacharyvoase
Here's something I implemented a while ago:
<http://zacharyvoase.github.com/urecord/>

There are benefits to using tuples over objects/dictionaries: 1) Lower memory
consumption 2) Faster lookups 3) Can be used as dictionary keys, set members,
anywhere a hashable (immutable) object is required.
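A small sketch of the hashability point, using the standard library's `namedtuple` (not urecord itself). The size comparison at the end is only indicative, and the exact numbers vary by Python version and platform.

```python
import sys
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])

# Hashable, so usable as a dictionary key or set member.
labels = {Point(0, 0): "origin"}
origin_label = labels[Point(0, 0)]
unique_points = {Point(1, 2), Point(1, 2), Point(3, 4)}

# A dict cannot be used the same way.
try:
    labels[{"x": 0, "y": 0}] = "origin"
    dict_is_hashable = True
except TypeError:
    dict_is_hashable = False

# Rough per-instance sizes, for comparison only.
tuple_size = sys.getsizeof(Point(1, 2))
dict_size = sys.getsizeof({"x": 1, "y": 2})
```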

~~~
pthatcherg
That's funny. I implemented almost the exact same thing a few years ago,
although mine is arguably more complete :)

<https://github.com/pthatcher/pyrec>

Here's my original post about it:

[http://www.valuedlessons.com/2009/10/introducing-pyrec-cure-...](http://www.valuedlessons.com/2009/10/introducing-pyrec-cure-to-bane-of-init.html)

And here's my analysis of how much less memory it uses:

<http://www.valuedlessons.com/2008/10/blog-post.html>

------
tantalor
Why not just use a recursive default dict?

<http://kentsjohnson.com/kk/00013.html>
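For readers who have not seen the idiom, one common way to write a recursive default dict with the standard library is sketched below. The linked page may implement it differently, and the `config` keys are made up.

```python
from collections import defaultdict


def recursive_defaultdict():
    """A defaultdict whose missing keys become further recursive defaultdicts."""
    return defaultdict(recursive_defaultdict)


config = recursive_defaultdict()
config["server"]["http"]["port"] = 8080   # intermediate levels appear on demand

# Caveat: merely reading a missing key creates it.
_ = config["typo"]
created_by_read = "typo" in config
```

This gives convenient nested assignment, but it performs no checking of keys or value types, which is the part DStruct is aiming at.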

~~~
tantalor
Here is a simple implementation that I swear by,

[https://github.com/tantalor/megaera/blob/master/megaera/recu...](https://github.com/tantalor/megaera/blob/master/megaera/recursivedefaultdict.py)

------
btubbs
See also <https://github.com/j2labs/dictshield>

------
joshu
Heh, it's like a Model for dicts.

Can we now have a query mechanism for lists of these?

------
zzzeek
Compare to Colander:
<http://docs.pylonsproject.org/projects/colander/en/latest/>

------
marshallp
What does this offer over just using an object?

~~~
andrewcooke
it has type checking.

pytyp has something similar
([http://www.acooke.org/pytyp/pytyp.spec.record.html#module-py...](http://www.acooke.org/pytyp/pytyp.spec.record.html#module-pytyp.spec.record)),
although that is just part of a wider framework for
types. pytyp will also do things like use type annotations to map from json to
python objects, for example, or check types on function args, or work with
isinstance():

    
    
      >>> isinstance([1,2,None,4], Sequence(Option(int)))
      True
    

[although explicit types are not really pythonic (no argument there!), pytyp
tries, within the whole "evil" concept of "adding types to python", to be as
"true to the language" as possible. there's a long explanation at
<http://www.acooke.org/pytyp.pdf>]

~~~
marshallp
Why not move on to using OCaml (or, shudder, Haskell) if you need types? Adding
types to Python seems like a pretty big undertaking.

~~~
dorkitude
Explicit types are neither pythonic nor unpythonic: _always using_ explicit
types is unpythonic, but _explicit is better than implicit_ is an important
piece of 'import this'.

I released DStruct because it has been insanely useful in my work. Take it or
leave it, but do so pragmatically: dogma is a terrible way to make decisions.

~~~
marshallp
I'm not being dogmatic. I'm a proponent of dynamic/typing and oo, they all fit
perfectly in their use cases. I'm just not a fan of mixing things up, the right
tool for the right job. If you need types in python, you're writing a
significant program in which case it's better to switch to java or better yet
ocaml/haskell.

~~~
dorkitude
So you're saying one shouldn't write significant programs in Python?

With that I must disagree :)

~~~
marshallp
Well, either programming language researchers such as Simon Peyton Jones are
wrong, or developers such as Guido van Rossum, a Google employee working on
App Engine, are wrong.

~~~
dorkitude
This is a subject near and dear to my heart, and I'm sure we could have a
lovely chat about this -- got Skype?

~~~
marshallp
i've added my email to my profile. however, i'm an antisocial recluse and
never skype/im. there's lambda-the-ultimate if you're into programming
languages, far more knowledgeable people than me hang out there.

