Hacker News
Diff So Fancy: make Git diffs look good (github.com)
270 points by aram on Feb 9, 2016 | 107 comments

A personal pet peeve of mine when reading diffs is when a file has some functions and you insert one, and instead of looking like this:

     int someOldFunction() {
         // Function body
     }
    +int newFunction() {
    +    // New function body
    +}
It looks like this:

     int someOldFunction() {
         // Function body
    +}
    +int newFunction() {
    +    // New function body
     }
It's a small thing, but given that these diffs are equivalent, the one that balances the curly braces within added blocks should be favored. But diff utilities seem to get this pretty consistently wrong.
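Choosing between equivalent placements can be made mechanical. A minimal Python sketch, assuming C-like braces and that the insertion is already known as a slice of the new file (the helper names `equivalent_ranges`, `is_balanced` and `pick_insertion` are hypothetical, not taken from any diff tool): slide the added block up and down while the diff stays the same, then prefer the first placement whose added lines balance their braces.

```python
def equivalent_ranges(lines, start, end):
    """All (start, end) placements of an insertion that describe the same edit.

    `lines` is the new file and lines[start:end] are the inserted lines.
    The block can slide up one line when the line above it equals its last
    line, and down one line when the line below it equals its first line.
    """
    ranges = [(start, end)]
    s, e = start, end
    while s > 0 and lines[s - 1] == lines[e - 1]:
        s, e = s - 1, e - 1
        ranges.insert(0, (s, e))
    s, e = start, end
    while e < len(lines) and lines[s] == lines[e]:
        s, e = s + 1, e + 1
        ranges.append((s, e))
    return ranges


def is_balanced(lines):
    """True if braces never close before opening and net out to zero."""
    depth = 0
    for line in lines:
        for ch in line:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth < 0:
                    return False
    return depth == 0


def pick_insertion(lines, start, end):
    """Prefer the first equivalent placement whose added lines are balanced."""
    for s, e in equivalent_ranges(lines, start, end):
        if is_balanced(lines[s:e]):
            return s, e
    return start, end  # nothing balanced: keep what the diff algorithm chose


new_file = [
    "int someOldFunction() {",
    "    // Function body",
    "}",
    "int newFunction() {",
    "    // New function body",
    "}",
]
print(pick_insertion(new_file, 2, 5))  # the "+}" version slides to (3, 6)
```

The brace check is deliberately naive: a `}` inside a string or comment will fool it.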

It is a small thing, but it throws me off every time I see it, and then it takes a few seconds of looking around before it dawns on me what happened.

The "user" in me would love a language aware diffing (and merging) system, but the developer in me is already groaning about how much work that would end up taking for arguably not that much benefit.

Maybe this will help: https://www.semanticmerge.com/

I really like SemanticMerge, it makes total sense, although the diff experience is unlike other diff tools in terms of immediacy. But I think it will be another great tool to add to the arsenal when you need to pull out the big guns on crazy diffs.

Absolutely. I had it installed for some time and mostly thought "this is really neat but I could live without it." And then there was a monster merge with significant conflicts all over the place. I don't think it would have been otherwise possible to pull it off with as little damage as there was if not for this tool.

I really like SemanticMerge, but in its current shape it is little more than a fancy proof of concept.

Its biggest limitation on real projects is that it works on a single-file level, while all the interesting stuff happens on patch level. You may browse their forums to get an idea of what else is missing.

That said, I wish all the best to Codice and I really hope that they continue to invest in this tool.

Looks a lot like Eclipse's semantic diffing. I haven't used Eclipse since 2006, but it was better than anything I have seen since.

I would love to press that "Buy Now" button, but I just can't seem to get past the emboss effect on that CTA.

I think it might be easier to get to a syntax-aware diff if one approached it reusing the language specific syntax highlighting specs used in various editors. I've almost sat down to start that myself a few times.

I don't think it has to be truly language aware. A diff tool that looks for matched braces, quotes, and indents to figure out where the blocks are would do better most of the time.

It doesn't have to be fully language aware, but it has to understand most, if not all, of the syntax. As soon as you start trying to match braces, you need to handle strings, comments, and probably a whole bunch of stuff I'm not thinking of.
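To make that concrete, here is a rough Python sketch (the name `net_braces` is hypothetical) of a brace counter for C-like syntax. Even this needs a small state machine for string and character literals, `//` comments and `/* */` comments, and it still ignores raw strings, preprocessor tricks, and languages where braces mean something else.

```python
def net_braces(source: str) -> int:
    """Net count of { minus } in C-like source, skipping braces inside
    "..." / '...' literals (with backslash escapes), // line comments
    and /* */ block comments."""
    depth = 0
    state = "code"  # one of: code, literal, line_comment, block_comment
    quote = ""      # which quote character opened the current literal
    i = 0
    while i < len(source):
        ch = source[i]
        pair = source[i:i + 2]
        if state == "code":
            if pair == "//":
                state, i = "line_comment", i + 1
            elif pair == "/*":
                state, i = "block_comment", i + 1
            elif ch in "\"'":
                state, quote = "literal", ch
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
        elif state == "literal":
            if ch == "\\":
                i += 1  # skip whatever character is escaped
            elif ch == quote:
                state = "code"
        elif state == "line_comment":
            if ch == "\n":
                state = "code"
        elif state == "block_comment":
            if pair == "*/":
                state, i = "code", i + 1
        i += 1
    return depth


line = 'puts("{"); // stray {'
print(line.count("{") - line.count("}"), net_braces(line))  # naive count vs aware count
```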

Here's an idea: wouldn't it be trivial for <insert your favourite language compiler here> to expose the AST of your code, and solve this problem the easy way?
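For Python specifically, the standard library does expose the parser through the `ast` module. A minimal sketch of a function-level "semantic" diff built on it, assuming only top-level functions matter (the helper names `function_fingerprints` and `semantic_diff` are hypothetical):

```python
import ast


def function_fingerprints(source: str) -> dict:
    """Map each top-level function name to a dump of its AST.

    ast.dump() omits line/column attributes by default, so moving a
    function or changing its whitespace leaves the fingerprint unchanged.
    """
    tree = ast.parse(source)
    return {
        node.name: ast.dump(node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def semantic_diff(old: str, new: str) -> dict:
    """Classify top-level functions as added, removed or changed."""
    before, after = function_fingerprints(old), function_fingerprints(new)
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(
            name for name in before.keys() & after.keys() if before[name] != after[name]
        ),
    }


old_src = "def a():\n    return 1\n\ndef b(x):\n    return x + 1\n"
new_src = "def b(x):\n    return  x + 2\n\ndef a():\n    return 1\n\ndef c():\n    pass\n"
print(semantic_diff(old_src, new_src))
```

Because `a` only moved, it is not reported, which is hard to read off a line-based diff.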

    git diff --patience
might give you better results? I've seen this pattern too, but I can't find a reproduction of normal git diff giving it to me at the moment.

I use this in my ~/.gitconfig

    [diff]
        algorithm = patience

I've experimented with patience diff, but haven't seen it deliver reliably better results than Myers (the current default).

I saw it deliver superior results enough that I spent a while figuring out how to get vimdiff to use patience.

And? What do I need to do to get that? :)

Oh, just saw this. It involves invoking the proper git tools to get the diff, and then converting the diff format from unified to ed format. The latter is actually easier than it might seem, as unified diffs start all of their special information in column 0; IIRC I wrote an awk script to do this. It's on my work machine though, so I don't have it handy.
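A Python sketch of that conversion, assuming a single-file unified diff and taking "ed format" to mean the classic normal diff output (`2c2`, `3a4`, with `<` and `>` lines); the names `span` and `unified_to_normal` are hypothetical, and this is not the awk script mentioned above.

```python
import re

# "@@ -old_start[,old_count] +new_start[,new_count] @@"; a missing count means 1
HUNK = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def span(start, count):
    """Format a 1-based line range the way normal diffs do: '5' or '5,7'."""
    return str(start) if count == 1 else f"{start},{start + count - 1}"


def unified_to_normal(unified):
    """Convert a single-file unified diff into normal diff format."""
    out = []
    old_no = new_no = 0      # next line number on each side (0 = no hunk yet)
    run_old = run_new = 0    # line numbers where the pending run began
    dels, adds = [], []      # pending run of '-' and '+' lines

    def flush():
        if dels and adds:
            out.append(f"{span(run_old, len(dels))}c{span(run_new, len(adds))}")
            out.extend("< " + text for text in dels)
            out.append("---")
            out.extend("> " + text for text in adds)
        elif dels:
            out.append(f"{span(run_old, len(dels))}d{run_new - 1}")
            out.extend("< " + text for text in dels)
        elif adds:
            out.append(f"{run_old - 1}a{span(run_new, len(adds))}")
            out.extend("> " + text for text in adds)
        dels.clear()
        adds.clear()

    for line in unified.splitlines():
        header = HUNK.match(line)
        if header:
            flush()
            o_start, o_count, n_start, n_count = header.groups()
            # an empty range names the line *before* the hunk, so start one later
            old_no = int(o_start) + (1 if o_count == "0" else 0)
            new_no = int(n_start) + (1 if n_count == "0" else 0)
        elif old_no == 0 or line.startswith("\\"):
            continue  # file headers before the first hunk, "\ No newline" markers
        elif line[:1] in ("-", "+"):
            if not dels and not adds:
                run_old, run_new = old_no, new_no
            if line[0] == "-":
                dels.append(line[1:])
                old_no += 1
            else:
                adds.append(line[1:])
                new_no += 1
        else:  # context line ends any pending run
            flush()
            old_no += 1
            new_no += 1
    flush()
    return "".join(entry + "\n" for entry in out)
```

Everything the converter needs is the line-number pair from each `@@` header plus the first character of each line.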

I tried GP's example, and in this particular case both --patience and the default (Myers) work the same, both doing the thing you want them to. Which perplexes me, because I know I've seen the bad case too, but can't seem to find a minimal example of it (I tried a couple variations on the example; they all did the 'right' thing).

That's because they all use greedy O(ND) algorithms or equivalent.

But conceptually, no matter what the algorithm, the greediness is usually a requirement for maintaining the theoretical time bound of LCS based algorithms.

Patience trades the time bound for "better" results (patience is worst-case O(ND^2)).
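The step that gives patience its character can be sketched in a few lines of Python. This shows only the anchor selection: take lines that occur exactly once in each file, then keep the longest run of them that appears in the same order in both, found with patience sorting. The helper names are hypothetical, and the full algorithm (recursing into the gaps between anchors and falling back to another diff there) is left out.

```python
from bisect import bisect_left
from collections import Counter


def unique_common_lines(a, b):
    """(index_in_a, index_in_b) for lines occurring exactly once in each file, in a-order."""
    count_a, count_b = Counter(a), Counter(b)
    pos_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    return [(i, pos_b[line]) for i, line in enumerate(a) if count_a[line] == 1 and line in pos_b]


def patience_anchors(a, b):
    """Longest subsequence of unique common lines that is increasing in both files.

    Patience sorting: each pile's top holds the smallest b-index ending an
    increasing run of that length; back-pointers rebuild the longest run.
    """
    pairs = unique_common_lines(a, b)
    tops, top_idx = [], []          # b-index on top of each pile, and its index in pairs
    prev = [None] * len(pairs)      # back-pointer to the card on the previous pile
    for k, (_, j) in enumerate(pairs):
        pile = bisect_left(tops, j)
        if pile == len(tops):
            tops.append(j)
            top_idx.append(k)
        else:
            tops[pile] = j
            top_idx[pile] = k
        prev[k] = top_idx[pile - 1] if pile > 0 else None
    anchors = []
    k = top_idx[-1] if top_idx else None
    while k is not None:
        anchors.append(pairs[k])
        k = prev[k]
    return anchors[::-1]


old = ["A", "x", "B", "C", "x", "D"]
new = ["B", "x", "A", "C", "D", "x"]
print(patience_anchors(old, new))  # "x" appears twice, so it can never be an anchor
```

Lines such as `}` or blank lines usually occur many times, so they never become anchors; that is one intuition for why patience output can look more natural around braces.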

Histogram is a neatly engineered and extended version of patience with an O(ND) timebound (and in fact, is faster than both myers and patience while providing good patience-like output).

Clojure (lisp) is even more fucky because adding a single outer form can re-indent (and therefore modify) an entire huge block of code.

At least on GitHub we have the ?w=1 URL parameter on PRs which helps a bit.

Protip: ?w=1 on GitHub is synonymous with git-diff's -w flag.
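The effect of `-w` can be imitated with Python's `difflib` by comparing lines after deleting all whitespace. A sketch only (the `changed_lines` helper is hypothetical, and neither git nor GitHub is implemented this way), using a small Lisp form that gets wrapped in a new outer form:

```python
import difflib


def changed_lines(old, new, ignore_whitespace=False):
    """Return the non-equal (tag, old_lines, new_lines) opcodes between two line lists.

    With ignore_whitespace=True, lines are compared after deleting all
    whitespace, similar in spirit to `git diff -w`.
    """
    def key(line):
        return "".join(line.split()) if ignore_whitespace else line

    matcher = difflib.SequenceMatcher(
        a=[key(line) for line in old], b=[key(line) for line in new], autojunk=False
    )
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


before = ["(defn f [x]", "  (inc x))"]
after = ["(when debug", "  (defn f [x]", "    (inc x)))"]
print(changed_lines(before, after))
print(changed_lines(before, after, ignore_whitespace=True))
```

With whitespace ignored, only the new outer line and the line whose closing parens changed are left; without it, every line is reported as replaced.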

I find Araxis Merge especially useful in these times. This application has a feature to "set the synchronization link" at any place you want in the code. It is not automatic (or language aware), but once you realize the difference, you can 'fix' the diff in real time. That helped me a lot for big diff files (unfortunately in my current company we don't use Araxis :( )

diff --patience uses a different algorithm that works to optimise the diffed number of lines (it's less efficient, though), which will probably solve your issue (the standard algorithm essentially finds common features on a first-come-first-served basis).

The two diffs have exactly the same number of lines here, the difference is which lines are selected as part of the insertion.

[I'm the author of diff-so-fancy, Steve helped with shipping it as a standalone script]

NPM?!? :)

A lot of people below are asking why a bash script (that depends on a perl script) is being recommended for installation via NPM. The short reason is that NPM is the most straightforward way to get a script installed as a global binary in a cross-platform manner. This approach has worked quite well with `git-open`[0]. Asking all users to deal with the PATH is not my ideal.

In addition, I wanted a reasonable upgrade path, in case there are necessary bugfixes. It's not a great experience if users identify bugs but the fix means they have to manually find/download/PATH-ify it each time. :/

That said, I'll add some Manual Install instructions to the readme so it's clear how to do this on your own. :) ( Edit: Here they are… https://github.com/stevemao/diff-so-fancy/blob/master/readme... )

[0] https://github.com/paulirish/git-open

> That said, I'll add some Manual Install instructions to the readme so it's clear how to do this on your own

Yeah, that helps a lot :)

I saw this, was like "cool, I want to use this", and then noted that it uses npm. I avoid installing ruby and node apps -- I have nothing against either, just that I currently don't use either language or have a dependency on a major tool written in those; but they pull in a lot of deps which take up space (at least, my experiences have been that many of these tools install way too much -- probably because I don't use either and all the "default libs" aren't on my system). On my previous machine I had lots of issues with this, so as a rule I avoid these things unless absolutely necessary. I know others who are of a similar opinion.

Fortunately I realized that it was just a shell script, and installed it directly :)

> The short reason is that NPM is the most straightforward way to get a script installed as a global binary in a cross-platform manner

Really ? All Unix-like systems (incl OS X) can do this:

    /usr/bin/install myscript.sh /usr/local/bin
PS: Oh, just realized that there is npm uninstall as well, but not /usr/bin/uninstall (though it's just rm -f anyway)

`install` isn't package management, or anything close to that.

Or use FPM (https://github.com/jordansissel/fpm/wiki) and abstract the whole packaging idea. linux and mac packages supported.

I see a lot of projects written in other languages on npm, especially bash. Substack even published C code to npm. I think it makes the script more accessible, especially for nodejs users. I don't mind publishing it to other package managers.

Please do not confuse 'more accessible especially for' with 'less accessible and exclusively for' nodejs users.

Last I tried, npm didn't even work on IPv6, even if there's some transition mechanism in place (eg: NAT64).

It's really pretty awful as a distribution mechanism. Why not a simple Makefile?

I had no reasons to use this package, but now I have one to not use it. I only wanted to see what it was because the name suggested something fun.

Why would someone who can't add $HOME/bin to $PATH be using git?

I realize your question is rhetorical, but there are tons of people. Anyone new to programming, in a CS course that uses git, for example, would be familiar with basic git but many would be unfamiliar with the path (or on Windows).

I concur, but they need not stay unfamiliar with it. The concept is easy: When you type the name of a program and hit enter, I look it up in a list of directories to see the first one that contains a file with that name. That list is $PATH. Any programmer will have to deal with search paths and stuff at some point in their life, and probably very early on, when they'll want to run their own scripts.
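That lookup is short enough to sketch. A minimal Python version, assuming POSIX semantics (an empty entry means the current directory) and ignoring Windows' PATHEXT; `which` here is a hypothetical helper, and the standard library's `shutil.which()` is the real thing:

```python
import os


def which(name, path=None):
    """Return the first executable file called `name` on the search path, or None.

    Walks the directories of $PATH (or `path`) in order and stops at the
    first one containing an executable regular file with that name.
    """
    search = os.environ.get("PATH", "") if path is None else path
    for directory in search.split(os.pathsep):
        candidate = os.path.join(directory or os.curdir, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


print(which("git"))  # e.g. the git binary on your PATH, or None
```

Because the search stops at the first hit, putting `$HOME/bin` before `/usr/bin` in PATH is what lets a personal script win over a system one.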

I agree that most programmers will run into $PATH at some point, but why force an order on them? Maybe they just want to get started using things like fancy diffs provided through package managers like npm.

git is a content tracker.

People who don't need to edit PATH might still need to track content.

Git is a revision control system.

Node.js is a JavaScript runtime.

NPM is a tool to fetch libraries for node programs.

Thus NPM and Git are software development tools.

A software developer or a power user are supposed to know what $PATH or %PATH is.

Wouldn't it be easier for the end user if you used pip/PyPi? Essentially all Linux distros include Python, but there are very few that ship with Node.js installed by default.

I can only speak for myself, but my systems all prohibit installation of packages into the system Python namespace by default.

Yeah, I try to avoid "sudo pip install" for CLI utilities if I can (and discourage its use to others). I put ~/.local/bin on my PATH (nonstandard XDG-like convention) and use "pip install --user" instead.

I've seen too many Python environments hosed by folks who aren't Python experts to keep suggesting that "sudo pip install <CLI tool>" is a thing most users should be doing.

I use ~/.bin/ for what sounds like the same purpose. I'm not sure I'd call that a convention - it's just what made sense to me - but it does ease issues requiring that userspace executables be on my $PATH.

So you need to be root to write to /usr/local/bin/. How does NPM magically solve this? (pip has the `--user` flag to install for just the current user, as tom points out.)

It doesn't solve it, and I don't think I claimed it did; I was simply implying that I don't think installing it into the system (or user) Python is a good idea either.

Ah, I see, I misunderstood your comment. But I think pip gives you better options than NPM, which installs into either /usr/local/(...) or the current directory. The latter sounds like a mess waiting to happen.

And most distros don't come with pip, so we're back at square one.

Python >= 2.7.9 or >= 3.4 ships with pip installed by default. Those versions are more than a year old now. What are you running, Slackware or something?

I would double check that if I was you. My Debian Jessie machine (from the official Vagrant box) reports Python 2.7.9. No pip.

I've many Debian servers without nodejs _and_ without python.

It requires conscious work as any other dependency, but it's possible (and convenient, if you don't depend on them).

So, more than likely, I won't install npm just to test a bash wrapper around a perl script that does something git itself can do without external dependencies.

But obviously, different people have different concepts of the K.I.S.S. principle.

Since it's a perl script, why didn't the author use CPAN? It's available in all vanilla installs of Debian, CentOS, Ubuntu, RedHat, etc...

Strange. This is from the official Python docs:

"pip is the preferred installer program. Starting with Python 2.7.9, it is included by default with the Python binary installers."


My thought would be that "binary installers" as such are considered distinct from distro-managed packages.

> What are you running, Slackware or something?


It looks like the diff-so-fancy script is bash, and the diff-highlight script is perl. Why is it set up in npm? WTF?

There are no easy install methods for small, simple scripts that avoid multiple manual steps, have an automated upgrade path, are cross-platform, and are consistent and familiar to a large subset of developers.

Of the options out there for the above, npm - while a hassle if you don't have it installed already - is probably the closest balance of maintainer and end-developer convenience.

Thanks for the explanation! I went ahead and installed npm to use this, I think it's a pretty nifty little utility and worth it.

make install?

Nothing wrong with make, but to be fair, it doesn't really meet any of the criteria I mentioned

- no built-in remote repos / app directory so it's just the final step of multiple install steps

- no built-in automated upgrade path

- unless you're fiddling with Cygwin or MSys2, it's not really as cross-platform

- likely to be less convenient for the maintainer than the package.json standard

So that it's web-scale, obviously.

And you use maven to build it.

Why do I need NPM to install this? I guess I will just have to manually get and link them...

Because there's been this NPM virus infecting our systems over the past few years. It has been disgusing itself as a useful utility.

I'm going to assume that is a typo for disgusting.

disguising I think.

Might be better to link to the source directly: https://github.com/paulirish/dotfiles/blob/master/bin/diff-s...

If you don't want your diff so fancy (pun intended, and I'm sorry) but you still want the inline highlights, the script comes with git (https://github.com/git/git/tree/master/contrib/diff-highligh...):

    ln -sf "$(brew --prefix)/share/git-core/contrib/diff-highlight/diff-highlight" ~/bin/diff-highlight
and add this to your .gitconfig:

    [pager]
        log = diff-highlight | less
        show = diff-highlight | less
        diff = diff-highlight | less
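The inline highlighting rests on a simple idea, which as far as I know is also the one diff-highlight uses: for a removed line paired with an added line, skip their common prefix and suffix and highlight whatever is left in between. A minimal Python sketch (the function name `highlight_pair` and the reverse-video escape codes are illustrative choices, not the script's):

```python
def highlight_pair(old: str, new: str, start="\033[7m", end="\033[27m"):
    """Mark the differing middle of a removed/added line pair.

    Finds the longest common prefix and suffix (without letting them
    overlap), then wraps what is left of each line in `start`/`end`
    (reverse video on and off by default).
    """
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1

    def mark(line):
        middle_end = len(line) - suffix
        return line[:prefix] + start + line[prefix:middle_end] + end + line[middle_end:]

    return mark(old), mark(new)


print(highlight_pair("total = a + b;", "total = a - b;", "[", "]"))
```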

I know he gave credit in the README, but why does this ~30 line shell script need its own repo? It seems more like a cheap grab for GitHub stars than an attempt to provide actual value.

Edit: Even the screenshot is from Paul...

A bit ago, I had indicated this script should be more accessible than copying two separate files out of my dotfiles repo: https://github.com/paulirish/dotfiles/blob/master/bin/diff-s...

Steve took the initiative to put this in its own repo. Seems okay; I was being rather slow to ship it for real.

All good then ;)

Sheesh. The repo allows things like installing it via `npm install -g diff-so-fancy`. It's not like a public repo costs anything, and GitHub stars don't get you anything either.

I think it makes it easier to add contributors. Git doesn't let you give a user access to only part of a repo, so Paul would have had to add people to the whole repo even if they only contribute to one script. I added some really talented contributors :) The repo being under my name doesn't mean it belongs to me. In fact it belongs to the public. Everyone can contribute to it, so everyone is the "owner". That's how I see open source projects. I don't mind giving it back to Paul. I did it only because I needed it. I believe a lot of people would want it to be more accessible too.

Why wouldn't any code you use need its own repo to track history? What does it matter how short it is?

So, the whole npm thing seems weird to me, then it occurred to me that it could be for malicious purposes. Would it be possible to upload a separate package.json to npm that had e.g. a post-install script? I don't know much about how npm works from the package publication side of things, but I assumed it was similar to pypi, where the code in the git repo doesn't have to be at all related to the code in the package.

    git diff --word-diff=color
provides a really nice word diff with coloring similar to this project. Just set up an alias in your global gitconfig:

    [alias]
        cdiff = diff --word-diff=color
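To show what the word-level markers mean, here is a rough Python approximation using `difflib` on word and whitespace tokens. The `word_diff` helper is hypothetical; git's own tokenizer and alignment differ, and this mimics the `--word-diff=plain` markers rather than the colour mode.

```python
import difflib
import re


def word_diff(old: str, new: str) -> str:
    """Approximate `git diff --word-diff=plain` for a single line pair.

    The text is split into runs of non-whitespace and whitespace, difflib
    aligns those tokens, and changed runs are wrapped in [-removed-] and
    {+added+} markers.
    """
    a = re.findall(r"\S+|\s+", old)
    b = re.findall(r"\S+|\s+", new)
    parts = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append("".join(b[j1:j2]))
            continue
        if i2 > i1:  # something was removed (delete or replace)
            parts.append("[-" + "".join(a[i1:i2]) + "-]")
        if j2 > j1:  # something was added (insert or replace)
            parts.append("{+" + "".join(b[j1:j2]) + "+}")
    return "".join(parts)


print(word_diff("total += price * qty", "total -= price * qty"))
```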

Looks good, but why does it use npm for installation?

Right: the package.json is almost as long as the script itself.

Because Paul says so in the original source on line 34...


Remove contextual +/- in favour of colour highlights? As a person with red-green colourblindness all I can say is: Lol. Nope.

You know you can customize the diff colors, yes?

From the git book itself: `git config --global color.diff.meta "blue black bold"`

That's a helpful perspective, but hopefully you already have workarounds on the color issue. This is simply making diff pieces easier to copy / pasta -- which I for one have needed to do. Change my mind on a refactor midstream, need to restore part of the file, etc.

You might investigate the tools your editor of choice provides for working with diffs. You might be surprised at how easy it can be!

(In Emacs, if you're using git, magit is an amazing package. You can select a commit from the logs, dive into the diff for a file caused by that commit, highlight a region of the diff and revert the change in your working copy. It's wonderful.)

Came here for someone to shout-out magit. It's amazing

what I'd like to see is that a/b in front of the filenames disappear. Getting rid of that would FINALLY allow me to double-click on the filename (which is configured to select the part between the spaces and copy it to the clipboard) and paste it instantly for the next command... or to be able to do git diff > foo.patch and on another system do patch < foo.patch without having to remember the correct -p value.

git diff --no-prefix

What happened to this line from the second file in the screenshot?

  -        var optionsGlassPane = new WebInspector.GlassPane(document);

An important part of viewing diffs for me is seeing what the old code was.

I didn't quite like this, but it does reference diff-highlight, which is part of git-contrib (so it may already be installed on your system, just not in your $PATH!):


For example, here's a diff where it improved readability enormously:


Yes, the code in that sample is horrible triplicate. Please ignore that.

I handle this in a way that is more agnostic to the type of revision control, and fully flexible in coloring (using the most powerful scheme available).

For example, I shouldn't have to put up with basic colors if the terminal can do better.

Here is how it works; starting with:

  if [ -r ".svn" ] ; then
    svn diff ${1+"$@"} | my_colorize_diff
  else
    git diff ${1+"$@"} | my_colorize_diff
  fi
...where the "my_colorize_diff" script at the end of the pipe is as follows:

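  #!/usr/bin/env perl
  # by Kevin Grant (kmg@mac.com)
  # Colourize a unified diff read from stdin (or from files named as arguments).
  use strict;
  use warnings;
  my $term_program = (exists $ENV{'TERM_PROGRAM'} && defined $ENV{'TERM_PROGRAM'}) ? $ENV{'TERM_PROGRAM'} : '';
  my $term = (exists $ENV{'TERM'} && defined $ENV{'TERM'}) ? $ENV{'TERM'} : 'vt100';
  my $is_xterm = ($term =~ /xterm/);
  my $is_24bit = ($term_program =~ /MacTerm/);
  # Banner in DEC double-height text: ESC #3 = top half, ESC #4 = bottom half, ESC #5 = normal size
  print "\033#3BEGIN DIFF\n";
  print "\033#4BEGIN DIFF\n\033#5";
  while (<>) {
    chomp;
    if (/^\+/ && !/^\+\+/) {
      # added line (not the "+++" file header): green background, bold dark text
      if ($is_24bit) {
        print "\033[48:2:150:200:150m", "\033[2K", "\033[38:2::88:m", "\033[1m";
      } elsif ($is_xterm) {
        print "\033[48;5;149m", "\033[2K", "\033[38;5;235m", "\033[1m";
      } else {
        print "\033[42m", "\033[2K", "\033[30m", "\033[1m";
      }
    } elsif (/^\-/ && !/^\-\-/) {
      # removed line (not the "---" file header): red background
      if ($is_24bit) {
        print "\033[48:2:244:150:150m", "\033[2K", "\033[38:2:144:0::m";
      } elsif ($is_xterm) {
        print "\033[48;5;52m", "\033[2K", "\033[38;5;124m";
      } else {
        print "\033[41m", "\033[2K", "\033[37m";
      }
    } else {
      # context and header lines: italic
      print "\033[3m";
    }
    # ESC [2K clears the row (in the current background colour, on terminals
    # that support it); now print the line itself and reset all attributes.
    print $_, "\033[0m\n";
  }
  print "\033#3END DIFF\n";
  print "\033#4END DIFF\n\033#5";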

For what it's worth, there's a lot of 24-bit-capable terminals that aren't MacTerm. Even xterm supports the 24-bit-color sequences, although it picks the closest entry in its 256-colour palette rather than using the 24-bit colour directly.
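Picking "the closest entry" is easy to sketch. This assumes the conventional xterm 256-colour layout (a 6x6x6 cube at levels 0, 95, 135, 175, 215, 255 for indices 16-231, then a 24-step grey ramp from 8 to 238 for 232-255) and plain squared RGB distance; `nearest_xterm256` is a hypothetical helper, and xterm's own matching may differ in details.

```python
CUBE_LEVELS = (0, 95, 135, 175, 215, 255)


def nearest_xterm256(r: int, g: int, b: int) -> int:
    """Closest xterm-256 palette index (16-255) to an RGB colour.

    Searches the colour cube and the grey ramp by squared RGB distance;
    the 16 system colours are skipped because terminals define them differently.
    """
    def dist(c):
        return (c[0] - r) ** 2 + (c[1] - g) ** 2 + (c[2] - b) ** 2

    best_index, best_dist = 16, None
    for ri, rv in enumerate(CUBE_LEVELS):
        for gi, gv in enumerate(CUBE_LEVELS):
            for bi, bv in enumerate(CUBE_LEVELS):
                d = dist((rv, gv, bv))
                if best_dist is None or d < best_dist:
                    best_index, best_dist = 16 + 36 * ri + 6 * gi + bi, d
    for i in range(24):
        v = 8 + 10 * i
        d = dist((v, v, v))
        if d < best_dist:
            best_index, best_dist = 232 + i, d
    return best_index


print(nearest_xterm256(150, 200, 150))  # the script's 24-bit "added" background
```

For that background this picks index 114, which is (135, 215, 135) in the cube.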

Also, you seem to be assuming "xterm" supports 256 colours and everything else doesn't. The best way to figure out how many colours the terminal supports is $(tput colors). tput also looks up other useful sequences; you can "tput bold" to turn on bold mode, "tput setaf 12" to set the foreground to colour 12 (bright blue), "tput sgr0" to zero all active formatting, etc.

Good point. Although, unless there are shells that have "tput" built-in, that means more subprocesses to obtain basic information (which would slow down the result a bit). In my case, the environment is sufficient to figure out what to do.
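On the subprocess concern: in Python at least, the terminfo lookups that `tput` performs are available in-process through the standard `curses` module. A sketch, assuming a Unix-like system with a terminfo database; `terminal_colours` is a hypothetical helper, and the fallback of 8 colours is an arbitrary choice.

```python
import curses
import os


def terminal_colours(term=None, fallback=8):
    """Colour count terminfo reports for `term` (default: $TERM), like `tput colors`.

    Runs in-process without spawning tput. Caveat: CPython only runs
    setupterm once per process, so a later call with a different `term`
    still sees the first terminal's entry.
    """
    fd = os.open(os.devnull, os.O_WRONLY)  # setupterm wants an output descriptor
    try:
        curses.setupterm(term, fd)
    except curses.error:
        return fallback  # unknown terminal type or no terminfo database
    finally:
        os.close(fd)
    colours = curses.tigetnum("colors")  # -1 if absent, -2 if not a numeric capability
    return colours if colours > 0 else fallback


print(terminal_colours())
```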

I have an alias for the diff params below, which gives basically the same visual result without the need to install anything.

  $ git diff --color --color-words --abbrev

Nice improvement to diff but I think I'll still use `git difftool` with Diffmerge https://sourcegear.com/diffmerge/

We've grown and it's out of control. diff-so-fancy moved to an org!!! https://github.com/so-fancy/diff-so-fancy

what are the advantages over colordiff[0]? (or a graphic differ like kdiff3[1])

[0] http://www.colordiff.org/ [1] http://kdiff3.sourceforge.net/

> No pesky + or - at line starts, making for easier copy-paste.

I wonder if easing copypasting is a good or bad thing..

Vim's fugitive plugin also gives a similar split diff view. The command is "Gdiff".

Even without the plugin, you can spawn a diff in vim via git's difftool settings. Here's a basic version that can be used in one's .gitconfig:

    [diff]
      tool = customvim
    [difftool "customvim"]
      cmd = vim -R -f -d \"$LOCAL\" \"$REMOTE\"

I find icdiff [0] (Improved Colored Diff) better.

[0]: https://github.com/jeffkaufman

Is there any language aware diff tool? I think I saw some commercial product for Java but other than that I haven't seen any attempts to do that.

When writing manuscripts in git, my favourite "trick" is git diff --word-diff (of course aliased to git wdiff).

It looks so good and real, I kept hitting `q` whilst looking at the image.

the netbeans.team.diff tool is similar (showing the specific words that changed), allows interactive editing, and does a good job even with large insertions and deletions

So what does this offer over simply using vimdiff?

this doesn't look useful. the left one is just fine. for everything else you wouldn't need a cli diff tool...

when one or 2 words change in a long or dense line, it's nice to have the specific changes highlighted

imagine a for loop in which a variable (used on every line) was renamed, and buried in the loop an assignment changed slightly (eg, + became -). with a standard diff, it's really hard (for me) to pick up the minor change. with word by word diffs, it's pretty easy

(i use netbeans diff, not this tool, but they appear similar)

git has built-in word diffing with --word-diff, though. --word-diff=color does almost exactly the same thing.

thanks. this is very nice and i wasn't aware of it

note: need to pipe it to "less -r"

Hurts. Hurts bad. Tiny type in many primary colors on a black background. 1985 called, it wants its screen layout back.
