
One-line Tree In Python - kilovoltaire
https://gist.github.com/2012250
======
abecedarius
Very nice! I've been a bit leery though of defaultdict since it contributed to
two different bugs in <http://norvig.com/spell-correct.html> \-- in less than
a page of code, by a first-rate programmer, which was read by many thousands
of others for years before anyone noticed the second bug. I'm not saying don't
use this, I just have a little warning light blink on when I see 'defaultdict'
anywhere.

~~~
mikeash
Both of those errors seem like fundamental category errors. He's using a
dictionary construct which effectively makes it so that _every_ key has a
value, but then relies on dictionary membership when testing. This happens to
work, but it makes little sense. (One could argue that 'foo in bar' should
_always_ be True if bar is a defaultdict.) Seems to me like he should have
checked the value of the word rather than using 'in'.

You make a good point about this being written by a known intelligent person
and seen by many others who didn't see the bugs, but I think it's ultimately a
lesson in thinking about semantics rather than behavior. This code relies too
much on the _behavior_ of defaultdict rather than just what such a thing
_means_. As long as what it means is kept in mind I think it should be safe.

But that's just, like, my opinion, man.

~~~
abecedarius
At some level it was sloppy thinking, yes; _most_ bugs have some obvious-in-
retrospect reason they were stupid. You may be right about how to think about
this one.

I've seen other defaultdict bugs -- this is just the case with the highest
eyeball-count-times-attention product I can point to.

~~~
mikeash
_If_ this bug was simply due to having the wrong concept of defaultdict, and
_if_ the others you saw were the same, _then_ you can avoid the bugs forever
by just having the right concept. Lots of ifs there, of course.

------
bryanh
This is actually quite elegant and useful. I've done similar things time and
time again with code like so:

    
    
        defaultdict(lambda: defaultdict(int))
    

That allows me to organically build dictionaries (mainly for stat building)
using operands like +=.

~~~
joshu
Also check out collections.Counter.

    
    
      x = collections.defaultdict(collections.Counter)
      x['foo']['bar'] += 1
      X['foo'].most_common(10)
     Etc.

~~~
bryanh
Excellent! I didn't know about that one!

------
rauljara
Neat. Works in Ruby, too.

    
    
      def tree; Hash.new {|h, k| h[k] = tree }; end
      t = tree
      t[:foo][:bar] = "foobar"  # => {:foo=>{:bar=>"foobar"}}
    

Probably more idiomatic to do it as a class, though.

    
    
      class Tree < Hash
        def initialize
          super {|h,k|  h[k] = Tree.new }
        end
      end
    

Btw. How do you post nicely formatted code?

edit - Thanks!

~~~
ajross
It works in Ruby because it works in Perl. :)

A little known fact is that it works in C++ STL too as long as the objects in
your containers have default constructors that make sense. So a map<int,
map<int, value_type> > does what you expect when you try: my_map[12][3] =
some_value;

~~~
scott_s
I was so used to the semantics of std::map that I was surprised it _didn't_
work that way in Python.

~~~
ajross
Autovivification through a rvalue (for clarity: the magic here is that
x[a][b]=c works to create x[a] despite that index never seeing an assignment)
has a long history in perl. The C++ feature is more limited in scope and AFAIK
accidental -- it works because the automatic construction is a side effect of
__reading __the value, which is often considered an undesired feature of STL
(you can't use an expression that contains x[a] to test if x contains a, like
you can in many languages).

And you may be less cynical, but I've had to explain this sort of thing to a
huge number of C++ programmers, most of whom write STL code that looks like
Java, with explicit initialization of all the intermediate containers.

~~~
scott_s
Automatic construction is not really a "side-effect" of reading a value in
std::map, it's an explicit design choice. They want users to be able to write
code like this:

    
    
      std::map<std::string, int> dict;
      dict["one"] = 1;
      dict["two"] = 2;
      dict["three"] = 3;
    

If using operator[] on a key did _not_ implicitly mean "create an entry for
this key if it is not there," the above would not work. Rather, you would have
to write the above as:

    
    
      std::map<std::string, int> dict;
      dict.insert(std::make_pair("one", 1));
      dict.insert(std::make_pair("two", 2));
      dict.insert(std::make_pair("three", 3));
    

The reason being that std::map is purely a library, not a part of the
language. It does not "know" that operator[] is actually being used as a part
of an assignment.

~~~
kragen
Strictly speaking, it's a side effect, because it mutates an existing object.
It's also a _surprising and bug-prone_ side effect (as mentioned upthread;
also, Darius Bacon found a bug or two in Norvig's spell-corrector due to the
corresponding behavior of defaultdict in Python) and an _avoidable_ side
effect, although it's not avoidable in C++. In Ruby, Python, Smalltalk, Common
Lisp, Lua, and many other languages, you can make this (or its weird-syntax
equivalent) work:

    
    
        dict["one"] = 1;
    

without causing this to add "one" to the dictionary:

    
    
        if (dict["one"]) { /* ... */ }
    

even for a "dict" that is purely a library, not a part of the language.

The underlying problem is that C++ calls the _same_ operator[] in both lvalue
and rvalue contexts. Which is pretty odd, really, when you think about it; it
certainly doesn't generate the same code for indexing into native arrays in
lvalue and rvalue contexts. All the other languages distinguish between the
lvalue and rvalue contexts. Ruby calls [] or []=, Python calls __getitem__ or
__setitem__; Smalltalk has #at: and #at:put:; Common Lisp doesn't have
separate names for the two things, but one of them is defined with (defun foo
(dict key) ...) and the other is defined with (defun (setf foo) (dict key)
...); in Lua, these are the metamethods __index and __newindex.

So it's sort of true that it's not std::map's fault, but C++'s. But not
really. Stroustrup fixed several things about the way templates worked to make
STL work better; he should have fixed this one too.

------
anonymoushn
This is pretty good. I prefer the name I used when I needed it, though.

    
    
      dicts_all_the_way_down = lambda:defaultdict(dicts_all_the_way_down)

------
groovy2shoes
A quick Lua version:

    
    
        do
          local mt = {
            __index = function(t, k)
              t[k] = tree()
              return t[k]
            end
          }
        
          function tree()
            return setmetatable({}, mt)
          end
        end
    
        t = tree()
        t.foo.bar = 'foobar'

------
kd0amg
If only Haskell would let me do

    
    
      type MTree t = Map t (MTree t)
    
    

This version works though.

    
    
      data Tree t = Leaf | Node [(t, Tree t)]

~~~
scooty-puff
The problem is that the type would be expanded indefinitely:

    
    
      Map t (Map t (Map t ...
    

Try:

    
    
      newtype MTree t = MTree (Map t (MTree t))

------
afhof
I would be interest to see this using the __getattr__ rather than __getitem__,
so that this is also possible:

    
    
      users = tree()
      users.harold.username = 'hrldcpr'
      users.handler.username = 'matthandlersux'

~~~
albertzeyer
Well, it's as trivial as you say. Just set __getattr__ = __getitem__ in your
tree.

(Note though that there is/was a small bug(?) in CPython:
<http://bugs.python.org/issue14658>)

~~~
afhof
Seems like it may take a little more than one line, this errors for me:

    
    
      >>> a = tree()
      >>> a.__getattr__ = a.__getitem__

~~~
albertzeyer
Yea, sure it becomes more than one line (or at least I don't know a good one-
line-way) but I wouldn't count that as an issue.

It errors for you because it is a defaultdict instance and doesn't allow
attribute overwrites.

This should work:

    
    
        class tree(defaultdict):
            def __init__(self): defaultdict.__init__(self, tree)
            __getattr__ = defaultdict.__getitem__
            __setattr__ = defaultdict.__setitem__
    

Edit: It doesn't work with the simple assignment because of the mentioned bug.
Ofc the fix is trivial. See the code from beagle3.

------
rollypolly
Jebus that was simple! I wish I'd thought of it.

------
raymondh
+1 This is a beautiful little recipe.

------
wicknicks
This is a tree with labelled edges, not just a tree. This tree will not hold
multiple values with the same label. I always imagine my trees to have no
labelled edges.

Very cool though!

------
krosaen
related, but not as clever function to help with nested counters (with doctest
showing usage):

    
    
        def incr_nestedctr(d, *keys, **kwargs):
            """
            >>> a = {}
            >>> incr_nestedctr(a, 'a', 'b', 'c', 'd')
            {'a': {'b': {'c': {'d': 1}}}}
            >>> incr_nestedctr(a, 'a', 'b', 'c', 'd')
            {'a': {'b': {'c': {'d': 2}}}}
            >>> incr_nestedctr(a, 'a', 'b', 'c', 'd', delta = -4)
            {'a': {'b': {'c': {'d': -2}}}}
            >>> incr_nestedctr({u'1.0': {u'0': 1, '5': 1}}, '1.0', '5', delta = 2)
            {u'1.0': {u'0': 1, '5': 3}}
            """
            delta = kwargs.get('delta', 1)
            thed = d
            for k in keys[:-1]:
                thed = thed.setdefault(k, {})
            thed.setdefault(keys[-1], 0)
            thed[keys[-1]] += delta
            return d

------
zbuc
In Perl, this is called autovivification:
<https://en.wikipedia.org/wiki/Autovivification>

I've wanted something like that in Python at different times... thanks!

edit: Ha! The Wiki article even has basically the same code:

def hash(): return defaultdict(hash)

~~~
irahul
> I've wanted something like that in Python at different times... thanks!

Python's auto-vivification doens't allow Perl's hap hazard auto-vivification.
Perl allows you to say:

    
    
        my $foo = {};
        $foo->{'blah'}[0]->{'bar'}++;
    

And after this statement, $foo will refer a hash which has the structure as
accessed in the statement.

I don't think this can be done for a generalized case in Python. Whether I
want is a totally different question.

~~~
jrockway
Why is this haphazard? If you don't want reads to change your data structure
shape in a well-defined manner, use 'exists' or make an immutable hash with
Hash::Util.

