
What if you have unreasonable quantities of data? I have yet to come across a really good program that lets me do `bigdiff <(xzcat bigresult-old.xz) <(xzcat bigresult-new.xz)|less` (where the files are gigabytes of text with fairly few differences) in a reasonable amount of time/memory. I've resorted to hacks that only work on a line-by-line basis (or rely on some hardcoded marker in the input) to read both files in parallel and run a real diff on a subsection whenever the markers stop matching, but it's far from trivial to get that working well (and I unfortunately don't have time to shave that yak :/)
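For what it's worth, here's a minimal sketch of that marker-based hack in Python. The marker string, file handling and section splitting are my assumptions, not an existing tool: read both streams in lockstep, buffer one marker-delimited section at a time, and only hand mismatching sections to a real diff.

    import difflib
    import sys
    from itertools import zip_longest

    MARKER = "### SECTION ###"  # hypothetical hardcoded marker assumed to appear in both inputs

    def sections(stream):
        # Yield (marker_line, lines) chunks, splitting on the marker.
        header, buf = None, []
        for line in stream:
            if line.startswith(MARKER):
                yield header, buf
                header, buf = line, []
            else:
                buf.append(line)
        yield header, buf

    def bigdiff(old_path, new_path, out=sys.stdout):
        with open(old_path) as old, open(new_path) as new:
            for (h1, a), (h2, b) in zip_longest(sections(old), sections(new),
                                                fillvalue=(None, [])):
                if a != b:
                    # Only the mismatching section is handed to a real diff,
                    # so memory use is bounded by the largest section.
                    label = (h1 or h2 or "<start>").strip()
                    out.writelines(difflib.unified_diff(
                        a, b, fromfile="old " + label, tofile="new " + label))

    if __name__ == "__main__":
        bigdiff(sys.argv[1], sys.argv[2])

It still works with process substitution (`python3 bigdiff.py <(xzcat old.xz) <(xzcat new.xz)`), since the /dev/fd paths open like ordinary files, but it falls apart as soon as sections get reordered, which is presumably part of the yak you didn't want to shave.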



I always have an old version of the source code of Solaris' bdiff with me (https://github.com/Arkanosis/Arkonf/blob/master/tools-src/bd...), just in case. It might have changed in the meantime in OpenIndiana / Illumos.

It was a very significant speed improvement a few years ago, though over time my RAM has grown faster than the files I need to diff, and I haven't had any trouble with the regular Linux diff in a long time.


Wow, zero memory usage and immediate output on files where GNU diff just sits there eating memory until everything is read! Thanks, that's fantastic.


As far as diff performance goes, concatenating the two large files using diff's "replace all" syntax is faster: it uses O(1) memory and O(n) time, and is only slightly space inefficient.

I'd also say it might beat the OP's algorithm in performance under certain assumptions (e.g. large writes vs. scan/read performance).
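If I'm reading that right, here's a minimal sketch of the idea, assuming by "replace all syntax" you mean an ed-style script that ed or `patch --ed` can apply (the script below is an illustration, not an existing tool):

    import sys

    def replace_all_diff(old_path, new_path, out=sys.stdout):
        # Emit an ed-style script that replaces every line of old with new.
        # Both files are streamed, so memory use is O(1) and time is O(n);
        # the output is roughly the size of the new file ("slightly space
        # inefficient"), but no LCS computation is ever done.
        with open(old_path) as old:            # first pass: count old lines
            n = sum(1 for _ in old)
        out.write(f"1,{n}c\n" if n else "0a\n")
        with open(new_path) as new:
            for line in new:
                if line == ".\n":
                    raise ValueError("a lone '.' line would terminate ed input early")
                out.write(line)
        out.write(".\n")

    if __name__ == "__main__":
        replace_all_diff(sys.argv[1], sys.argv[2])

Applying it should just be `patch -e old.txt < script` (or feed it to ed with a trailing `w`), so the output stays patch-compatible while skipping the expensive part of diff entirely.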



