Dictdiffer: Diff and Patch Python dictionaries (github.com/fatiherikli)
65 points by fatiherikli on May 26, 2013 | 10 comments



Some of these capabilities are already built into Python.

See Guido's "dictionary views" (http://www.python.org/dev/peps/pep-3106/). In Python 2.7, those dictionary views are exposed as d.viewkeys(), d.viewitems(), and d.viewvalues(). The key and item views both support set operations such as union, intersection, and difference. Also note that the dict() constructor accepts a sequence of items as input. Those tools make it trivially easy to express diffing and patching in native Python:

    # diff from d to e
    patch = (d.viewitems() - e.viewitems(),   # deletions
             e.viewitems() - d.viewitems())   # additions

    # apply the patch to f
    patched = dict((f.viewitems() - patch[0]) | patch[1])
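
For anyone on Python 3, the same idea can be sketched with d.items(), which returns a set-like view there. The helper names diff_items and apply_patch are made up for this example, and it assumes every value is hashable, since the set operations on item views raise TypeError otherwise.

```python
def diff_items(d, e):
    """Return (deletions, additions) as sets of (key, value) pairs."""
    return (d.items() - e.items(), e.items() - d.items())


def apply_patch(f, patch):
    """Return a new dict: drop the deleted pairs from f, then add the new ones."""
    deletions, additions = patch
    return dict((f.items() - deletions) | additions)


d = {"a": 1, "b": 2, "c": 3}
e = {"a": 1, "b": 20, "d": 4}

patch = diff_items(d, e)
patched = apply_patch(d, patch)  # patching d with the d-to-e diff rebuilds e
```

A changed value shows up twice in the patch, once as a deletion of the old pair and once as an addition of the new pair.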


Well, this is quite different in the sense that the "patch operations" can also operate on the values, not just add/delete/replace them. In the example, the patch pushes and pops elements from the lists, rather than replacing them altogether.

I haven't looked, but I guess it is also recursive on values that are dictionaries themselves. In this case the "patch" is a sort of sequence of edit operations on the dictionary tree.
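
To make the "sequence of edit operations on the dictionary tree" idea concrete, here is a rough sketch of a recursive diff. It is not dictdiffer's actual algorithm or output format; tree_diff and the (op, path, value) tuples are invented for illustration, and it assumes the keys at each level are sortable (e.g. all strings) so the output order is stable.

```python
def tree_diff(old, new, path=()):
    """Yield (op, path, value) edits that describe how `old` becomes `new`."""
    for key in sorted(old.keys() - new.keys()):
        yield ("remove", path + (key,), old[key])
    for key in sorted(new.keys() - old.keys()):
        yield ("add", path + (key,), new[key])
    for key in sorted(old.keys() & new.keys()):
        a, b = old[key], new[key]
        if isinstance(a, dict) and isinstance(b, dict):
            # Descend into nested dictionaries instead of replacing them whole.
            yield from tree_diff(a, b, path + (key,))
        elif a != b:
            yield ("change", path + (key,), (a, b))


old = {"name": "a", "settings": {"x": 1, "y": 2}, "tmp": 0}
new = {"name": "b", "settings": {"x": 1, "y": 3, "z": 4}}
edits = list(tree_diff(old, new))
```

Each edit carries the full key path, so a nested change is reported as a change at ("settings", "y") rather than as a replacement of the whole "settings" value.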


This is one of those odd corners of the Python ecosystem.

I have written at least two dict differs, and I suspect they are like web frameworks - everyone has written a half-working one.

However, we need good tools that get polished and standardised - this looks nice and I like the full circle ability, but in Python's own version of Catch-22, until it succeeds I will stick with writing my own.

I hope it succeeds - my meware sucks


The list diff is really a set diff, if I understand it right, in that it disregards order and multiple instances? That should only happen if the dict member is a set. Members which are lists should be treated using a Levenshtein-distance style patch. For free you then get patching of string members.

I'm getting ambitious now, but what about tree-edit distance (with patch script) for members which are nested lists?

There is Python code out there for these:

https://github.com/timtadh/zhang-shasha

https://code.google.com/p/py-editdist

EDIT: I have only used the above libraries for distances, not for patchable diffs. But the Levenshtein and tree-edit distance algorithms are amenable to outputting the patch scripts.
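
As a rough illustration of an order-aware list patch, the standard library's difflib.SequenceMatcher can already emit an edit script. Note that it is not a true Levenshtein implementation (it matches longest common blocks, so the script is not guaranteed to be minimal), and list_diff and list_patch are names made up for this sketch. It assumes the list elements are hashable.

```python
from difflib import SequenceMatcher


def list_diff(old, new):
    """Return edits (tag, start, end, replacement) that turn `old` into `new`."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [(tag, i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]  # unchanged runs need no edit


def list_patch(old, script):
    """Apply a script produced by list_diff to a copy of `old`."""
    result = list(old)
    # Work from the end of the list so earlier indices stay valid.
    for tag, i1, i2, replacement in reversed(script):
        result[i1:i2] = replacement
    return result


old = [1, 2, 3, 2, 4]
new = [1, 3, 2, 2, 5]
script = list_diff(old, new)
patched = list_patch(old, script)
```

Unlike a set difference, this keeps both order and duplicates. The same two functions work on a string member if it is first converted to a list of characters.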


JSON-patch might also serve: http://tools.ietf.org/html/rfc6902

Here's a Python implementation: https://github.com/stefankoegl/python-json-patch
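
For a feel of what an RFC 6902 patch looks like, here is a deliberately tiny applier. It is not the python-json-patch API; apply_json_patch is a hypothetical helper that only handles the "add", "replace" and "remove" operations on nested dicts, with no array indices, no "move"/"copy"/"test", and no empty path.

```python
import copy


def apply_json_patch(doc, operations):
    """Apply a small subset of RFC 6902 operations to a copy of `doc`."""
    result = copy.deepcopy(doc)
    for op in operations:
        # JSON Pointer: split on "/", then unescape ~1 -> "/" and ~0 -> "~".
        tokens = [t.replace("~1", "/").replace("~0", "~")
                  for t in op["path"].split("/")[1:]]
        parent = result
        for token in tokens[:-1]:
            parent = parent[token]
        key = tokens[-1]
        if op["op"] == "add":
            parent[key] = op["value"]
        elif op["op"] == "replace":
            if key not in parent:
                raise KeyError(key)  # "replace" requires an existing target
            parent[key] = op["value"]
        elif op["op"] == "remove":
            del parent[key]
        else:
            raise ValueError("unsupported op: %r" % op["op"])
    return result


doc = {"a": {"b": 1}, "c": 2}
ops = [
    {"op": "replace", "path": "/a/b", "value": 10},
    {"op": "add", "path": "/d", "value": [1]},
    {"op": "remove", "path": "/c"},
]
patched = apply_json_patch(doc, ops)
```

The patch itself is plain JSON, which is the main attraction: it can be stored or sent over the wire as-is.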


This looks like a perfect complement to a mongo or couch data store. Thanks!


What might be some of the use cases?



Yes, I had already read that.


I'm writing a game where I need to sync state between client and server, I think this might come in handy.



