
Box: Python dictionaries with recursive dot notation access - johnwheeler
https://github.com/cdgriffith/Box
======
jerf
I think every Python programmer implements this at some point.

I think the real reason not to do this is one that as of this writing still
hasn't been mentioned yet, which is that when using dot notation, your key
names have to fit the Python grammar for identifiers, so you can't have a key
"the thing" with a space, or anything else that isn't a Python identifier. So
you can't just say "I can access this dictionary with dots", it has to be "I
can access _some keys_ of this dictionary with dots, but others I have to
access another way". This greatly, _greatly_ reduces its utility. There's a
variety of ways of trying to address this; this package appears to try to
normalize key names, which is very prone to surprising behaviors and makes it
difficult to reason about what will go into what bucket, and also seriously
mitigates the virtue of this entire approach because you, the programmer, must
also run the normalization algorithm yourself in order to use the dot
notation, which rapidly eats away the gains of typing a dot instead of two
brackets and two quote characters. Rewriting keys like that is really icky.

The other problem is that the "dot" namespace, as it were, is as others have
noted here used for method and property resolution, so you also end up with
"dictionaries with random values they can't really contain because they're
being used as method names" or "dictionaries that if you put the wrong key in
them override a method" or something else like that. Also, consider the
ability to subclass these things, making many of the things you might think to
do to hack around this break in subclasses.

It's very superficially tempting but it has a _looooot_ of issues that become
evident over time. This isn't even a complete list.
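
Both problems are easy to reproduce with a toy version of the idea. The `AttrDict` class below is a hypothetical minimal sketch written for illustration; it is not Box's implementation, and it does no key normalization at all:

```python
class AttrDict(dict):
    """Hypothetical minimal dict with attribute access (not Box's code)."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


d = AttrDict({"the thing": 1, "items": [1, 2], "name": "x"})

# A key that is a plain identifier works with dots.
assert d.name == "x"

# "the thing" is not a valid identifier, so only brackets (or getattr) reach it.
assert d["the thing"] == 1
assert getattr(d, "the thing") == 1

# "items" collides with dict.items: the dot finds the method, not the value.
assert callable(d.items)
assert d["items"] == [1, 2]
```

So even in the simplest form, some keys are reachable by dot and others are not, and which is which depends on the method names of the class.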

~~~
cjhanks
There are quite a few nay-votes here, so I'll give a yay vote.

In the past I have had to deal with very complex configuration systems;
several thousands of lines of XML which were supposed to direct many
applications.

And writing:

    
    
        value['this']['that']['the_other']

vs

    
    
        value.this.that.the_other
    

Really starts to matter for code legibility. Of course if you change your XML
configs your Python will break - but it would likely break in any case.

The other case where I have found this applicable is when I have complicated
JSON structures in the form of user input. Typically I augment this style by
using a JSONSchema that ensures attributes _do_ exist and have defaults (or
None).

~~~
smoe
I see the value in the latter for code legibility. My gripe with it is, when
I'm new to a project that uses a dict implementation like that: how do I know
what happens if for example `that` is missing? Does it raise AttributeError,
KeyError or just return None?

Personally I'd prefer a helper function like

    
    
        deep_get(dict_, dotted_path[, default]) -> value
    

You still have to check the docs/source to see what exactly happens, but it is just
one simple function instead of magic methods. And I can keep using plain dicts
everywhere.

EDIT: To clarify a bit. An experienced Python developer should immediately
recognize a call like `deep_get(value,'this.that.the_other')` as something
project/framework specific and not built-in, while `value.this.that.the_other`
is ambiguous.

~~~
CloudYeller
You can get part of the way there with operator.attrgetter:

    import collections
    import operator

    NT = collections.namedtuple('NT', ['x', 'y'])
    nt1 = NT(1, 2)
    nt2 = NT(nt1, 'asdf')
    assert nt2.x.y == 2

    get_two = operator.attrgetter('x.y')
    assert get_two(nt2) == 2

If you wrap the operator.attrgetter call in a try/except, you can force a default
too: [https://pastebin.com/bFxGr22E](https://pastebin.com/bFxGr22E)
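
A rough sketch of that idea, which may or may not match what the pastebin does. The helper name `attr_path_get` and the `_MISSING` sentinel are made up for this example:

```python
import collections
import operator

_MISSING = object()  # sentinel so that default=None is still a usable default


def attr_path_get(obj, dotted_path, default=_MISSING):
    """Follow a dotted attribute path; return default if any step is missing."""
    try:
        return operator.attrgetter(dotted_path)(obj)
    except AttributeError:
        if default is _MISSING:
            raise
        return default


NT = collections.namedtuple("NT", ["x", "y"])
nt2 = NT(NT(1, 2), "asdf")

assert attr_path_get(nt2, "x.y") == 2
assert attr_path_get(nt2, "x.z", default=None) is None
```

Note that this follows attributes only, so it does not descend into plain dicts.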

~~~
smoe
I never thought much about the best implementation because I only used it, I think,
once in ten years. But afaik I used something like this, which does the job and
is easy to read:

    
    
      _empty = object()  # sentinel: distinguishes "no default" from default=None
      def deep_get(dct, dotted_path, default=_empty):
          for key in dotted_path.split('.'):
              try:
                  dct = dct[key]
              except KeyError:
                  if default is _empty:
                      raise
                  return default
          return dct

------
tooker
It used to irritate me having to use alternative grammar for accessing values
in a mapping vs accessing object attributes. However this is an area where I
learned to appreciate Python's hardline on consistency. Guido in particular
has been responsible for keeping the language predictable and concise by
disallowing patterns like this in the language itself. So on one hand it's
beautiful that it's so easy to implement things like this (which I am guilty
of doing too), but on the other it is a perversion of the intentional
distinction between attributes and mappings.

I've since learned to love the distinction between object attributes and items
in a mapping and happily implement ['this style'] accessors without complaint
now.

That being said, if the library just did difflib.get_close_matches() on the
key lookup that would be neat for some use cases where you want fuzzy key
lookup.
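
For what it's worth, a fuzzy lookup like that can be sketched on top of a plain dict subclass. `FuzzyDict` is a hypothetical name, and the matching uses `difflib.get_close_matches` with its default cutoff:

```python
import difflib


class FuzzyDict(dict):
    """Hypothetical dict that falls back to the closest string key on a miss."""

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ only when the exact key is absent.
        if isinstance(key, str):
            candidates = [k for k in self if isinstance(k, str)]
            matches = difflib.get_close_matches(key, candidates, n=1)
            if matches:
                return dict.__getitem__(self, matches[0])
        raise KeyError(key)


config = FuzzyDict({"timeout": 30, "hostname": "example.org"})

assert config["timeout"] == 30   # exact match
assert config["timeuot"] == 30   # typo resolved to the closest key
```

Only bracket lookups go through `__missing__`; `.get()` and `in` still require the exact key.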

~~~
fhood
Honestly Python's hardline on consistency is very nearly my favorite thing
about the language.

------
deathanatos
This feels a lot like Bunch, and a lot of private reimplementations I've seen
elsewhere.

I dislike these structures. Yes, Python lets you pull this trick. Everything,
yes, can be represented as a dict, a list, or one of the primitive
str/int/etc. types. But I find it's a lot clearer in the long run (and even in
the short run), if you have a collection of heterogenous attributes, to define
a class for them. Leave dicts (and the subscript notation) for homogenous
collections of k/v pairs.

A class gives you the benefit of a type: you get a name, so you can recognize
this bag-of-attributes from other, different bags-of-attributes, because it's
been given a name. A class also — usually — gets you a list of attributes, and
_hopefully_ documentation about what those attributes' types and expected
values are.

If you don't like typing (on a keyboard), the attrs package makes things
easier[1]; it has the advantage, however, that you get a real class at the
end.

The only place I've seen something like Box or Bunch work well is in config
files, and even then, only at the uppermost layers of the config (some of the
leaves, esp. when you start having a "list of X" in a config — X needs a
type).

Python lets you do magic, but with great power and all... IMO this is one
of those times: Explicit is better than implicit. Simple is better than
complex.

[1]:
[https://pypi.python.org/pypi/attrs/16.0.0](https://pypi.python.org/pypi/attrs/16.0.0)

~~~
keeganpoppen
thanks for the rec on attrs-- cool stuff!

------
mazatta
I've worked on codebases that use a similar thing, and I have yet to see one
where this wasn't a mistake.

Most of the time, this probably gives you exactly what you want, but then
there are times where you discover bugs in production because data you assumed
is your custom class is a plain old dict, and now you're raising
AttributeError all over the place. Another wart is if you are unfortunate
enough to have keys that match the name of one of dict's methods, then you
have to resort to instance['items'], which defeats the purpose of using this
in the first place.

This is a fun trick, but if someone on my team tried to introduce this, it
wouldn't make it through code review.

~~~
fhood
I agree, but when I saw this, my first assumption was that it was for personal
projects primarily. I very rarely see libraries that change syntax like this
in large scale or long term codebases.

------
fuzzythinker
For those who just want a simple a.b instead of a['b'], use this:

    
    
      class Obj():
        def __init__(self, d):
            self.__dict__ = d
    
      d = Obj({
        'a': 1,
        'b': 2,
      })
      print(d.a)

~~~
theptip
These days (Python >= 3.3) SimpleNamespace is a more concise way of doing
this:

    
    
       from types import SimpleNamespace
    
       d = SimpleNamespace(a=1, b=2)
       print(d.a)

~~~
gshulegaard
Haha neat, that means this works very cleanly as a conversion mechanism:

    
    
        my_dict = {
            'a': 1,
            'b': 2
        }
    
        d = SimpleNamespace(**my_dict)
        print(d.a)
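
One caveat: `SimpleNamespace(**my_dict)` only converts the top level, so nested dicts stay plain dicts. If the data arrives as JSON, a possible way to get recursive conversion is the `object_hook` parameter of `json.loads`. This is a sketch with made-up sample data:

```python
import json
from types import SimpleNamespace

raw = '{"a": 1, "b": {"c": [{"d": 2}]}}'

# object_hook is called for every decoded JSON object, innermost first,
# so nested objects (including those inside lists) become namespaces too.
ns = json.loads(raw, object_hook=lambda obj: SimpleNamespace(**obj))

assert ns.a == 1
assert ns.b.c[0].d == 2
```

Keys that are not valid identifiers still end up on the namespace, but they can only be read back with `getattr`.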

------
falcolas
Seems like a lot of work and magic when compared to, say:

    
    
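        from collections import UserDict

        class Box(UserDict):
            def __getattr__(self, key):
                return self.__getitem__(key)
            def __setattr__(self, key, value):
                if key == 'data':  # UserDict's backing dict must stay a real attribute
                    return super().__setattr__(key, value)
                return self.__setitem__(key, value)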
    

I know a few folks who prefer this method of access, so I can't naysay against
it too much, but personally I just prefer plain dictionaries.

~~~
masklinn
> Box(UserDict):

There's no reason to use UserDict, just extend `dict` directly, or implement
`MutableMapping` instead. UserDict hasn't been useful since the types/class
unification back in… Python 2.2 I think?

------
johnbrodie
In my experience, this always seems like a cool idea until you use it with
keys that an end-user can define. Then you get to watch your code blow up when
they override some other function you've defined on the class.

~~~
wybiral
My thoughts exactly. Wait until someone adds a '__call__', '__iter__',
'__del__' key...

------
coconutrandom
So Django Templates also use dot notation lookups for dict, lists, and
objects[0]

    
    
      Dictionary lookup, attribute lookup and list-index lookups are implemented with a dot notation:
    
      {{ my_dict.key }}
      {{ my_object.attribute }}
      {{ my_list.0 }}
      If a variable resolves to a callable, the 
      template system will call it with no 
      arguments and use its result instead of the callable.
    

Which leads to some interesting and confusing errors if you start iterating
over `.items` and you get a callable and not the list you expect.

    
    
      In [17]: a = {"a": 1, "items": {"b": {"c": {}}}}
        ...: a_box = Box(a)
        ...: a_box
        ...: 
      Out[17]: <Box: {'a': 1, 'items': {'b': {'c': {}}}}>
    
      In [18]: a_box
      Out[18]: <Box: {'a': 1, 'items': {'b': {'c': {}}}}>
    
      In [19]: a_box.items
      Out[19]: <function items>
    
      In [20]: a_box.a
      Out[20]: 1
    
    

[0]
[https://docs.djangoproject.com/en/1.11/topics/templates/#var...](https://docs.djangoproject.com/en/1.11/topics/templates/#variables)

EDIT: This came up because our JSON commonly uses `items` as a key for a list
of items, which I expect to be at `a_dict['items']`, and it has nothing to do
with python's `a_dict.items`.

------
an_hn_reader
I like using NamedTuple for dot notation access, it's a good middle ground
between dicts and custom classes.
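
As a small illustration of that middle ground, here is a sketch using `typing.NamedTuple`. The `Endpoint` type and its fields are invented for the example:

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    host: str
    port: int = 443


ep = Endpoint("example.org")

assert ep.port == 443                    # dot access, with a declared default
assert ep == ("example.org", 443)        # still an ordinary tuple
assert ep._replace(port=80).port == 80   # "changes" return a new tuple
```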

Pandas must use something similar under the hood to provide dot notation
access to columns. I wish h5py did the same for hdf5 objects. In py3, I find
myself needing to type list(X.items()) and then list(X['Y'].items()) and so on
when I'm exploring a new dataset... fairly awkward for interactive use.

~~~
mythrwy
Thumbs up for namedtuple. It is a good middle ground.

------
jredwards
This kind of thing has been implemented a lot of times by a lot of people. It
starts with someone who just thinks that addressing deeply nested dicts is
ugly, and ends with people trying to tack on all kinds of features to justify
their modification of the basic grammar.

Probably not a huge issue if it's an isolated use-case. Not something I would
want to see everywhere in my codebase.

------
simonh
What happens if the dictionary has the keys 'John Doe' and 'John_Doe'?

Can you use 'self' as a key?

I've used a class to provide this kind of dot notation referencing of
hierarchical data. If you could enclose the keys in quotes I might feel better
about it but that's probably not possible.

~~~
joshuamorton
>Can you use 'self' as a key?

Without looking at the code: yes. `self` in python is just convention.

------
grantmc09
How is this different from addict?

[https://github.com/mewwts/addict](https://github.com/mewwts/addict)

~~~
johnwheeler
I wonder if addict converts keys like "Funky Foobar Key!" into
Funky_Foobar_Key? The README seems to use keys that cleanly convert to python
dict keys, so it's not clear.

------
ericfrederich
I got excited by the heading thinking that there was a change to the language
and we could refer to the dictionary itself in dictionary comprehension.

Just yesterday I commented that if generators could refer to themselves in
generator comprehension (recursion) I could express some algorithm as a single
expression.

[http://stackoverflow.com/a/41617394/180464](http://stackoverflow.com/a/41617394/180464)

... anyway, this just turns out to be what looks like an "addict" clone.

~~~
chris_griffith
Something like that would be handy. Just to clarify the addict mention.

It works similarly to addict, but does have important distinctions and IMO
seniority (it was in my 'reusables' pypi project named 'Namespace' before addict
existed. Just finally spun it into its own project).

Biggest differences:

* Box will convert items added to the object after creation, addict does not

* addict only acts as a defaultdict, Box can act as either regular or defaultdict

* Box updates its __dir__ so that attributes (keys) are tab completed in stuff like IPython

* Box repr clearly shows it is an object

------
thesmallestcat
A single class that:

* De/serializes JSON and YAML

* De/mangles, de/encodes keys

* Provides automatic, expensive hashcode

* Blacklists/transforms a bunch of likely keys because they conflict with reserved words

* Overlays attrs (__box_heritage)

* All in pretty complex code that obfuscates what you're really doing (especially to a maintainer)

For the ability to avoid importing json/PyYAML and use clear key lookups? The
author must really like JavaScript syntax, or something, because I'm not
seeing the point in this layer at all. It's the sort of library that you
discover your inherited codebase is using, and just say, "fuckfuckfuck...",
because it implies that the author cares more about pushing round pegs into
square holes than writing the freaking application code; it reeks of
inexperience.

~~~
yeukhon
Why so negative here?

I like Python a lot, and I don't write much Javascript, but one thing I wish I
could do in Python is the dot notation from a dictionary. I sometimes used
namedtuple as a cheap (but "immutable") class, so I can simply use dot
notation when I am passing my object around my functions, instead of always
stuffing the data into a dictionary.

~~~
crdoconnor
"I like Python a lot, and I don't write much Javascript, but one thing I wish
I could do in Python is the dot notation from a dictionary"

Why though?

~~~
yeukhon
Because it saves me a lot of chars.

foo['name']['attr1']['attr2']['morefuckingattr'] vs
foo.name.attr1.attr2.morefuckingattr

More of a personal preference.

~~~
crdoconnor
True, but it saves a relatively small number of chars (~12) and does so at the
same time as:

A) Giving off a signal that you're dealing with an object rather than a dict.

B) Making it cumbersome to swap out some of those selectors with variables.

C) Making it difficult to deal with the attribute not being there (with a dict
you can say .get("attr2", {}) and it returns a default).
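
Points B and C can be shown with a plain dict. This is just a sketch with sample data; the key names are placeholders:

```python
from functools import reduce

foo = {"name": {"attr1": {"attr2": 42}}}

# B) Selectors can be variables: the path is just data.
path = ["name", "attr1", "attr2"]
assert reduce(lambda node, key: node[key], path, foo) == 42

# C) A missing level can fall back to a default without raising.
assert foo["name"].get("missing", {}).get("attr2") is None
```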

~~~
yeukhon
I understand. I have another motivation, which is getting JSON and wanting an
OOP object to manipulate. Take AWS' boto3 response. Well documented, but
the nested response structure gives me a chill and I wish there were a direct
object out of the JSON. I instead had to write my own class for the
conversion. If I could manipulate a dict like manipulating attributes in a class
then I could just write my function's contract as "input is a response object
from X API" instead of "dict of this form."

Quite annoying, and I know some people consider what I wish for a bad
practice.

------
pkghost
Though many reasonable rebuttals have been mentioned, the one case in which I
find this pattern indispensable is when creating mock-instances for a variety
of classes that need to be interleaved amongst genuine instances (say, ORM
objects, if you're unlucky enough to have those in your life) so that code
operating on collections of the latter doesn't get littered with special cases
for the former.

------
nathancahill
This looks really nice. I might be reading the DefaultBox docs wrong, but does
it support this:

    
    
        box.might_exist.might_exist.might_exist.desired_key
    

where desired_key would return a default value if one of the keys doesn't
exist?

I find the bulk of my dict code is checking for keys before access, or using
.get('', default) recursively. Gets really hairy for deeply nested dicts.

~~~
bgschiller
I also found myself doing that pretty often, so I wrote this:

    
    
        def nget(d, *ks, **kwargs):
            for k in ks:
                d = d.get(k)
                if d is None:
                    return kwargs.get('default')
            return d
    
       >>> d = {'a': {'b': {'c': 12 }}}
       >>> nget(d, 'a', 'd', 'c', default='Not Found!')
       'Not Found!'
       >>> nget(d, 'a', 'b', 'c')
       12

~~~
staz
Your implementation suffers from a bug: if the final item is None it will
return the default instead of None.

A better implementation would be:

    
    
        def rget(d, *ks, **kwargs):
            for k in ks:
                if k not in d:
                    return kwargs.get('default')
                d = d[k]
            return d
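
To make the difference concrete, here are both versions side by side on a dict whose stored value really is None (the function bodies are copied from the two comments above):

```python
def nget(d, *ks, **kwargs):
    for k in ks:
        d = d.get(k)
        if d is None:
            return kwargs.get('default')
    return d


def rget(d, *ks, **kwargs):
    for k in ks:
        if k not in d:
            return kwargs.get('default')
        d = d[k]
    return d


d = {'a': {'b': None}}

# The key exists and its value is None.
assert nget(d, 'a', 'b', default='Not Found!') == 'Not Found!'  # value masked
assert rget(d, 'a', 'b', default='Not Found!') is None          # value kept

# A genuinely missing key gives the default in both versions.
assert nget(d, 'a', 'x', default='Not Found!') == 'Not Found!'
assert rget(d, 'a', 'x', default='Not Found!') == 'Not Found!'
```

One thing to keep in mind: `rget` assumes every intermediate value supports `in`, so walking past a None (for example `rget(d, 'a', 'b', 'c')`) raises TypeError instead of returning the default.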

------
tjpaudio
How is the performance on this for very large dictionaries vs. standard python
dictionary lookups?

~~~
johnwheeler
A really good question. I haven't looked much at the code but I'd imagine
still O(1) because it's just converting the keys into a different format.

~~~
bmh100
But that says nothing about whether there is a constant penalty being applied.

~~~
johnwheeler
certainly

------
fintanh
Another similar implementation I've used is:
[http://www.stat.washington.edu/~hoytak/code/treedict/](http://www.stat.washington.edu/~hoytak/code/treedict/)

------
mmerickel
If you want readonly access using dot notation instead of keys then I have a
simple gist [1] I use that works well for tracking nested dicts - even nested
inside lists. It's a good way to pass a dict into a function expecting an
object in some scenarios.

[1]
[https://gist.github.com/mmerickel/ff4c6faf867d72c1f19c](https://gist.github.com/mmerickel/ff4c6faf867d72c1f19c)

------
rcarmo
Neat. I've been using a simpler approach for a few years now:

[https://github.com/rcarmo/python-utils/blob/master/core.py#L...](https://github.com/rcarmo/python-utils/blob/master/core.py#L30)

...but this takes that notion much further.

~~~
ericfrederich
Check out addict

[https://github.com/mewwts/addict](https://github.com/mewwts/addict)

------
StavrosK
I wrote a similar library that is more restricted and "obvious" in what it
does:

[https://github.com/skorokithakis/jsane/](https://github.com/skorokithakis/jsane/)

------
apisarek
How is that different from easydict?
[https://pypi.python.org/pypi/easydict/](https://pypi.python.org/pypi/easydict/)

~~~
chris_griffith
First things off the top of my head: Box does recursion through lists, new dicts (and
lists of dicts) added will also be dot notation without manually converting,
Box can be a recursive default dict, Box will allow keys with spaces or other
issues to be accessible as attributes, and can be frozen, and has YAML and
JSON functions built-in.

------
sampwing
If all you want is the dot and indexing syntactic sugar:

    
    
      class Box(dict):
        def __init__(self, **kwargs):
          super(Box, self).__init__(kwargs)
          self.__dict__.update(kwargs)

------
askvictor
Cute, though the use of x to prepend to integer keys is potentially confusing
with the convention of x as a hexadecimal indicator.

~~~
chris_griffith
That is a very good point, never thought of that. I will have to add a feature
to make that configurable. Thanks for the heads up!

------
safek
I never understood why Python differentiates between key access and attribute
access. Dig down one layer, and they're the same thing anyway: a.b is just
a.__dict__['b'].
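
That equivalence only holds for plain instance attributes, though. A quick sketch (the `Point` class is made up for the example) shows attribute lookup also consulting the class and running descriptors such as properties:

```python
class Point:
    dims = 2                      # class attribute, not in any instance __dict__

    def __init__(self, x):
        self.x = x

    @property
    def doubled(self):            # computed on access via the descriptor protocol
        return self.x * 2


p = Point(3)

assert p.__dict__ == {"x": 3}     # only the instance attribute lives here
assert p.x == p.__dict__["x"]     # the simple case matches
assert p.dims == 2                # found on the class
assert p.doubled == 6             # found via a property
assert "dims" not in p.__dict__ and "doubled" not in p.__dict__
```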

For all its faults, JavaScript has some pretty sensible defaults. Accessing a
missing key returns what is basically its equivalent of nil. Extra parameters
to function calls are simply ignored. Unsupplied parameters default to nil.

~~~
zimablue
I can think of a couple of reasons. Safety: attr access by .<thing> means you
knew what you wanted at write-time, so it's reasonable to assume that you're
gonna want to know if it doesn't exist, and not just want to roll the dice on
passing a null along. Language concision: it seems counterintuitive, but attrs
vs items gives you two options for one class, . access and [] access, which
can very reasonably give different results. So your default mode of writing
classes has an "I think this should be here"/nondynamic mode of access and a
"wonder if this is here"/dynamic mode of access. I think that there are
benefits to that. If you don't eventually do some dynamic attribute access I
think you throw away an advantage of dynamic languages, but it's nice to me
that there's this soft line by default between less and more dynamic code.

------
jps359
Useful. My only concern would be performance.

------
burnbabyburn
no thanks.

------
artursapek
This thread reminds me why I don't use Python anymore

