Hacker News new | past | comments | ask | show | jobs | submit login
One-line Tree In Python (gist.github.com)
277 points by kilovoltaire on Apr 23, 2012 | hide | past | web | favorite | 36 comments



Very nice! I've been a bit leery though of defaultdict since it contributed to two different bugs in http://norvig.com/spell-correct.html -- in less than a page of code, by a first-rate programmer, which was read by many thousands of others for years before anyone noticed the second bug. I'm not saying don't use this, I just have a little warning light blink on when I see 'defaultdict' anywhere.


Both of those errors seem like fundamental category errors. He's using a dictionary construct which effectively makes it so that every key has a value, but then relies on dictionary membership when testing. This happens to work, but it makes little sense. (One could argue that 'foo in bar' should always be True if bar is a defaultdict.) Seems to me like he should have checked the value of the word rather than using 'in'.

You make a good point about this being written by a known intelligent person and seen by many others who didn't see the bugs, but I think it's ultimately a lesson in thinking about semantics rather than behavior. This code relies too much on the behavior of defaultdict rather than just what such a thing means. As long as what it means is kept in mind I think it should be safe.

But that's just, like, my opinion, man.


At some level it was sloppy thinking, yes; most bugs have some obvious-in-retrospect reason they were stupid. You may be right about how to think about this one.

I've seen other defaultdict bugs -- this is just the case with the highest eyeball-count-times-attention product I can point to.


If this bug was simply due to having the wrong concept of defaultdict, and if the others you saw were the same, then you can avoid the bugs forever by just having the right concept. Lots of ifs there, of course.


This is actually quite elegant and useful. I've done similar things time and time again with code like so:

    defaultdict(lambda: defaultdict(int))
That allows me to organically build dictionaries (mainly for stat building) using operands like +=.


Also check out collections.Counter.

  x = collections.defaultdict(collections.Counter)
  x['foo']['bar'] += 1
  X['foo'].most_common(10)
 Etc.


Excellent! I didn't know about that one!


Neat. Works in Ruby, too.

  def tree; Hash.new {|h, k| h[k] = tree }; end
  t = tree
  t[:foo][:bar] = "foobar"  # => {:foo=>{:bar=>"foobar"}}
Probably more idiomatic to do it as a class, though.

  class Tree < Hash
    def initialize
      super {|h,k|  h[k] = Tree.new }
    end
  end
Btw. How do you post nicely formatted code?

edit - Thanks!


You can use the Y combinator to do this as well, without having to create a class or a top-level method: http://www.eecs.harvard.edu/~cduan/technical/ruby/ycombinato...


Here's the ruby one-liner:

  tree = Hash.new { |h, k| h[k] = Hash.new(&h.default_proc) }
(explanation is here: http://matthew.mceachen.us/blog/multi-value-hashes-in-ruby-1...)


It works in Ruby because it works in Perl. :)

A little known fact is that it works in C++ STL too as long as the objects in your containers have default constructors that make sense. So a map<int, map<int, value_type> > does what you expect when you try: my_map[12][3] = some_value;


I don't think it's a little known fact that std::map works that way, ie. that operator [] inserts if the key doesn't exist (or at least, it's surely not little known among people using C++?).

Also, I don't think you can have an infinitely expanding std::map in the same way, because all the key/value types have to get rolled into the type signature of the top level map. It's not that hard to write something yourself to achieve it, but it's not going to be the one-liner that it is in Python.


I was so used to the semantics of std::map that I was surprised it didn't work that way in Python.


Autovivification through a rvalue (for clarity: the magic here is that x[a][b]=c works to create x[a] despite that index never seeing an assignment) has a long history in perl. The C++ feature is more limited in scope and AFAIK accidental -- it works because the automatic construction is a side effect of reading the value, which is often considered an undesired feature of STL (you can't use an expression that contains x[a] to test if x contains a, like you can in many languages).

And you may be less cynical, but I've had to explain this sort of thing to a huge number of C++ programmers, most of whom write STL code that looks like Java, with explicit initialization of all the intermediate containers.


Automatic construction is not really a "side-effect" of reading a value in std::map, it's an explicit design choice. They want users to be able to write code like this:

  std::map<std::string, int> dict;
  dict["one"] = 1;
  dict["two"] = 2;
  dict["three"] = 3;
If using operator[] on a key did not implicitly mean "create an entry for this key if it is not there," the above would not work. Rather, you would have to write the above as:

  std::map<std::string, int> dict;
  dict.insert(std::make_pair("one", 1));
  dict.insert(std::make_pair("two", 2));
  dict.insert(std::make_pair("three", 3));
The reason being that std::map is purely a library, not a part of the language. It does not "know" that operator[] is actually being used as a part of an assignment.


Strictly speaking, it's a side effect, because it mutates an existing object. It's also a surprising and bug-prone side effect (as mentioned upthread; also, Darius Bacon found a bug or two in Norvig's spell-corrector due to the corresponding behavior of defaultdict in Python) and an avoidable side effect, although it's not avoidable in C++. In Ruby, Python, Smalltalk, Common Lisp, Lua, and many other languages, you can make this (or its weird-syntax equivalent) work:

    dict["one"] = 1;
without causing this to add "one" to the dictionary:

    if (dict["one"]) { /* ... */ }
even for a "dict" that is purely a library, not a part of the language.

The underlying problem is that C++ calls the same operator[] in both lvalue and rvalue contexts. Which is pretty odd, really, when you think about it; it certainly doesn't generate the same code for indexing into native arrays in lvalue and rvalue contexts. All the other languages distinguish between the lvalue and rvalue contexts. Ruby calls [] or []=, Python calls __getitem__ or __setitem__; Smalltalk has #at: and #at:put:; Common Lisp doesn't have separate names for the two things, but one of them is defined with (defun foo (dict key) ...) and the other is defined with (defun (setf foo) (dict key) ...); in Lua, these are the metamethods __index and __newindex.

So it's sort of true that it's not std::map's fault, but C++'s. But not really. Stroustrup fixed several things about the way templates worked to make STL work better; he should have fixed this one too.



Nice. My friend made a fork along those lines as well: https://gist.github.com/2012439

As for code in comments, see: http://news.ycombinator.com/formatdoc


This is pretty good. I prefer the name I used when I needed it, though.

  dicts_all_the_way_down = lambda:defaultdict(dicts_all_the_way_down)


A quick Lua version:

    do
      local mt = {
        __index = function(t, k)
          t[k] = tree()
          return t[k]
        end
      }
    
      function tree()
        return setmetatable({}, mt)
      end
    end

    t = tree()
    t.foo.bar = 'foobar'


If only Haskell would let me do

  type MTree t = Map t (MTree t)

This version works though.

  data Tree t = Leaf | Node [(t, Tree t)]


The problem is that the type would be expanded indefinitely:

  Map t (Map t (Map t ...
Try:

  newtype MTree t = MTree (Map t (MTree t))


I would be interest to see this using the __getattr__ rather than __getitem__, so that this is also possible:

  users = tree()
  users.harold.username = 'hrldcpr'
  users.handler.username = 'matthandlersux'


    class attrdict(defaultdict):
        def __getattr__(self, key): return self[key]
        def __setattr__(self, key, val): self[key]=val

    def tree(): return attrdict(tree)
attrdict is indeed a useful thing to have around. I usually base it off on the built in "dict", but as this example shows, it is useful on top of "defaultdict" as well.


Well, it's as trivial as you say. Just set __getattr__ = __getitem__ in your tree.

(Note though that there is/was a small bug(?) in CPython: http://bugs.python.org/issue14658)


Seems like it may take a little more than one line, this errors for me:

  >>> a = tree()
  >>> a.__getattr__ = a.__getitem__


Yea, sure it becomes more than one line (or at least I don't know a good one-line-way) but I wouldn't count that as an issue.

It errors for you because it is a defaultdict instance and doesn't allow attribute overwrites.

This should work:

    class tree(defaultdict):
        def __init__(self): defaultdict.__init__(self, tree)
        __getattr__ = defaultdict.__getitem__
        __setattr__ = defaultdict.__setitem__
Edit: It doesn't work with the simple assignment because of the mentioned bug. Ofc the fix is trivial. See the code from beagle3.


Jebus that was simple! I wish I'd thought of it.


+1 This is a beautiful little recipe.


This is a tree with labelled edges, not just a tree. This tree will not hold multiple values with the same label. I always imagine my trees to have no labelled edges.

Very cool though!


related, but not as clever function to help with nested counters (with doctest showing usage):

    def incr_nestedctr(d, *keys, **kwargs):
        """
        >>> a = {}
        >>> incr_nestedctr(a, 'a', 'b', 'c', 'd')
        {'a': {'b': {'c': {'d': 1}}}}
        >>> incr_nestedctr(a, 'a', 'b', 'c', 'd')
        {'a': {'b': {'c': {'d': 2}}}}
        >>> incr_nestedctr(a, 'a', 'b', 'c', 'd', delta = -4)
        {'a': {'b': {'c': {'d': -2}}}}
        >>> incr_nestedctr({u'1.0': {u'0': 1, '5': 1}}, '1.0', '5', delta = 2)
        {u'1.0': {u'0': 1, '5': 3}}
        """
        delta = kwargs.get('delta', 1)
        thed = d
        for k in keys[:-1]:
            thed = thed.setdefault(k, {})
        thed.setdefault(keys[-1], 0)
        thed[keys[-1]] += delta
        return d


In Perl, this is called autovivification: https://en.wikipedia.org/wiki/Autovivification

I've wanted something like that in Python at different times... thanks!

edit: Ha! The Wiki article even has basically the same code:

def hash(): return defaultdict(hash)


> I've wanted something like that in Python at different times... thanks!

Python's auto-vivification doens't allow Perl's hap hazard auto-vivification. Perl allows you to say:

    my $foo = {};
    $foo->{'blah'}[0]->{'bar'}++;
And after this statement, $foo will refer a hash which has the structure as accessed in the statement.

I don't think this can be done for a generalized case in Python. Whether I want is a totally different question.


Why is this haphazard? If you don't want reads to change your data structure shape in a well-defined manner, use 'exists' or make an immutable hash with Hash::Util.



wow thanks i was actually really curious if there was a name for this sort of thing! added a note about it to the gist.

i wonder whether it will be made part of more languages as JSON-esque nested objects proliferate in our code and minds.




Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: