

Sentdiff: Diff for Writing - jackowayed
http://github.com/jackowayed/sentdiff/tree/master

======
gourneau
Howdy, thanks for posting this.

It would be awesome to visualize Wikipedia edits over time. I don't really care
about what the text says, just how blocks of it change over time. I am after
the aesthetics present in the ever-flowing change of data. I think your script
might be a good starting point. Think of the videos of flowers growing that
compress months into a few seconds. Do something similar with Wikipedia edits.

~~~
keenerd
I've been doing something similar with my blog, except it is currently for
people who do care what the text says. The diff algo is tricky; I should have
built up a larger corpus of material before designing it.

It looks at paragraphs, sentences, sub-sentence structures, words. It even
draws little sparkgraph-ish diagrams. It is not really that long (250 lines by
wc) but it has been a huge time sink for tweaking.

For an example of some heavy editing:
<http://kmkeen.com/inabow/2009-01-07-11-22-00.html>

------
ashr
I was hoping to see a compact implementation of a diff algorithm. However, the
script seems to rely on the 'diff' utility already present. Not a bad thing,
but I was expecting to see something else.

~~~
jackowayed
I went with a don't-reinvent-the-wheel approach. Anything I did would have at
least doubled the time to write the script and probably yielded a diff half as
good.
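
For anyone who wants to see the general shape of the sentence-per-line idea, here is a minimal sketch. It is not the sentdiff script itself: it assumes a naive regex sentence splitter and swaps the external `diff` utility for Python's `difflib`, and the names `split_sentences` and `sentence_diff` are made up for illustration.

```python
import difflib
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_diff(old, new):
    """Return unified-diff lines where each 'line' is one sentence."""
    return list(
        difflib.unified_diff(
            split_sentences(old),
            split_sentences(new),
            fromfile="old",
            tofile="new",
            lineterm="",  # sentences carry no trailing newline
            n=0,          # no context sentences, only the changes
        )
    )


old = "It was a dark night. The rain fell. Nobody stirred."
new = "It was a dark night. The rain fell hard. Nobody stirred."
print("\n".join(sentence_diff(old, new)))
```

Only the edited sentence shows up as a removed/added pair, no matter how the paragraph was wrapped. The splitter will trip over abbreviations like "e.g." and is only there to keep the sketch short.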

~~~
dhotson
This might be worth a look for a decent diff algorithm:

<http://code.google.com/p/google-diff-match-patch/>

~~~
akkartik
Also this: <http://bramcohen.livejournal.com/37690.html>

------
jackowayed
I wrote this as well as submitting it.

Any suggestions, thoughts, etc. would be greatly appreciated.

~~~
boucher
Why sentences and not words?

~~~
anewaccountname
Maybe he didn't design this for phonologists? I doubt anyone else has too much
use for intra-word diffs.

~~~
jrockway
I think it would be useful. Consider this example.

The original sentence is: "He went to the stoar." Person A sends this to
Person B for review. While Person B is reviewing it, Person A changes the
sentence to "Bill went to the stoar." Then Person B sends back the spelling
correction: "He went to the store."

If each sentence were translated to one line for `diff` to work on, this would
be a merge conflict. If each word were its own line, this would merge cleanly.
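
A rough sketch of that scenario, assuming Python's `difflib` in place of `diff` and a deliberately conservative rule that two edits conflict when the regions of the base they touch overlap or sit next to each other. The helpers `edits`, `conflicts` and `merge` are hypothetical names, not part of any tool mentioned here.

```python
from difflib import SequenceMatcher


def edits(base, changed):
    """List (start, end, replacement) for every region of `base` that `changed` modifies."""
    matcher = SequenceMatcher(a=base, b=changed, autojunk=False)
    return [
        (i1, i2, changed[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def conflicts(base, ours, theirs):
    """True if the two edits touch overlapping or adjacent regions of `base`."""
    return any(
        max(a1, b1) <= min(a2, b2)
        for a1, a2, _ in edits(base, ours)
        for b1, b2, _ in edits(base, theirs)
    )


def merge(base, ours, theirs):
    """Apply both sets of edits to `base`; refuse if they conflict."""
    if conflicts(base, ours, theirs):
        raise ValueError("merge conflict")
    merged = list(base)
    # Apply from the end backwards so earlier indices stay valid.
    for i1, i2, replacement in sorted(
        edits(base, ours) + edits(base, theirs), key=lambda e: e[0], reverse=True
    ):
        merged[i1:i2] = replacement
    return merged


base = "He went to the stoar."
person_a = "Bill went to the stoar."
person_b = "He went to the store."

# One sentence per unit: both edits hit the same unit.
print(conflicts([base], [person_a], [person_b]))

# One word per unit: the edits hit different units and merge.
print(" ".join(merge(base.split(), person_a.split(), person_b.split())))
```

With the sentence as the unit, both changes land on the same element and the merge is refused. With words as the unit, the name change and the spelling fix sit four words apart and combine into one sentence.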

~~~
keenerd
You've just described dwdiff. Not the smartest diff algo, though its strong
point is producing diffs of prose that are human readable. I used it as the
core of a wiki with 'perfect' collision resolution. Two people could edit at
the same time, and the 2nd to hit save would have their work merged in with
dwdiff instead of having their work automatically discarded.

------
gcv
For Emacs users: ediff highlights changed words. I've used it to track changes
in text.

------
albertcardona
I could use this as part of git itself for comparing LaTeX document revisions.
The current line-oriented diff has all the problems that sentdiff tries to
solve.

~~~
bbb
Git has something very much like that already built in. Try

`git diff --color-words`

~~~
albertcardona
I know about --color-words; it's reasonable, but not optimal. Sentence diff
plus color words would be great.

(and a pony--I know, why not implement it myself?)
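
A possible starting point for the combination, as a sketch only: diff by sentence first, then mark word-level changes inside each changed sentence. It assumes a naive regex sentence splitter and Python's `difflib`, uses `[-...-]` and `{+...+}` markers as a plain-text stand-in for color, and the function names are invented for this example.

```python
import difflib
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_markup(old_sentence, new_sentence):
    """Mark removed words as [-...-] and added words as {+...+} within one sentence."""
    old_words, new_words = old_sentence.split(), new_sentence.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
            continue
        if i2 > i1:
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if j2 > j1:
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)


def sentence_word_diff(old, new):
    """Return one marked-up string per changed sentence; unchanged sentences are skipped."""
    old_s, new_s = split_sentences(old), split_sentences(new)
    matcher = difflib.SequenceMatcher(a=old_s, b=new_s, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same number of sentences on both sides: pair them up word by word.
            for old_sentence, new_sentence in zip(old_s[i1:i2], new_s[j1:j2]):
                out.append(word_markup(old_sentence, new_sentence))
        else:
            # Whole sentences added or removed: mark them in full.
            out.extend("[-" + s + "-]" for s in old_s[i1:i2])
            out.extend("{+" + s + "+}" for s in new_s[j1:j2])
    return out


old = "He went to the stoar. It was closed."
new = "Bill went to the store. It was closed."
for line in sentence_word_diff(old, new):
    print(line)
```

The unchanged second sentence is dropped, and the first comes out with only the two edited words marked. Pairing replaced sentences by position is a simplification that will mis-pair sentences when an edit also adds or removes one in the same region.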

