

Fdupes: A tool to de-duplicate files - evandrix
https://github.com/adrianlopezroche/fdupes

======
wazoox
I personally use fdupes.pl:

[http://www.perlmonks.org/?node_id=85202](http://www.perlmonks.org/?node_id=85202)

Tested on many millions of files, it works like a charm (though it can run out
of memory on a 32-bit machine). I'm using the enhanced version here:
[http://www.perlmonks.org/?node_id=1099194](http://www.perlmonks.org/?node_id=1099194),
which has an autodelete flag and prudently ignores symlinks.

------
TheDong
I personally found fdupes to be slower and more limited than dupfiles [0].

I switched to dupfiles about a year ago and haven't had any problems yet.

[0]: [http://liw.fi/dupfiles/](http://liw.fi/dupfiles/)

------
mmastrac
I used this when I was working on a product that used automated tests to
upload files repeatedly during the day. The volume of test files was so great
that it continually put pressure on the storage -- more pressure than the
uploads from the actual users.

Fortunately the uploads were from a set of a few dozen static files, and de-
duplicating the data via fdupes was able to drop disk usage by a factor of
20-50x.

------
cwilper
I did something similar to this a while back, called qdupe[0], written in
Python. It doesn't do the deleting for you, but it is very fast at identifying
duplicates if you have a lot to compare. It's based on the fastdup algorithm.

[0] [https://github.com/cwilper/qdupe](https://github.com/cwilper/qdupe)

------
DigitalJack
Is this multiplatform? I think it's interesting how many projects forget to
mention what operating system they target.

~~~
analog31
I found myself asking the same thing, and ended up finding a Python program:

[http://www.pythoncentral.io/finding-duplicate-files-with-python/](http://www.pythoncentral.io/finding-duplicate-files-with-python/)

I plan on giving it a whirl, to help me clean up my backup drive.

------
panzi
Yeah, I wrote something similar a long time ago in Python:
[https://bitbucket.org/panzi/finddup/src](https://bitbucket.org/panzi/finddup/src)

------
kylek
It's not exactly clear, but I'm assuming this is some kind of automated hard-
linking utility? Or does it use its own special magic? (filesystem type
restrictions?)

~~~
phireal
It just finds (and optionally deletes) duplicate files. For hardlinking the
duplicates, there's freedup.
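For anyone curious what "hardlinking the duplicates" actually involves, here is a minimal Python sketch of the idea (not freedup's actual implementation): replace a duplicate with a hard link to the original. This assumes both paths are on the same filesystem, since hard links can't cross filesystem boundaries.

```python
import os

def hardlink_duplicate(original, duplicate):
    """Replace `duplicate` with a hard link to `original`.

    Both paths must live on the same filesystem. The new link is
    created under a temporary name and then swapped into place with
    os.replace(), so the duplicate path is never left missing.
    """
    if os.path.samefile(original, duplicate):
        return  # already the same inode; nothing to do
    tmp = duplicate + ".hardlink-tmp"
    os.link(original, tmp)       # create a second link to original's inode
    os.replace(tmp, duplicate)   # atomically replace the duplicate
```

After this runs, both paths point at the same inode, so the duplicate's blocks are freed.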

------
theophrastus
not nearly as fancy, but it gets the job done for me:
[http://www.commandlinefu.com/commands/view/3555/find-duplicate-files-based-on-size-first-then-md5-hash](http://www.commandlinefu.com/commands/view/3555/find-duplicate-files-based-on-size-first-then-md5-hash)
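The size-first-then-hash approach that command takes is roughly what most of the tools in this thread do. A minimal Python sketch of the technique (not any particular tool's code): group files by size, then compute an MD5 digest only for files whose size is shared, since a file with a unique size can't have a duplicate.

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    """Return groups of duplicate files under `root`.

    First pass groups files by size (cheap: a stat per file).
    Second pass hashes only files that share a size, so unique
    files are never read.
    """
    by_size = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):   # skip symlinks
                continue
            by_size[os.path.getsize(path)].append(path)

    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue                   # unique size => no duplicate possible
        for path in paths:
            md5 = hashlib.md5()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    md5.update(chunk)
            by_digest[(size, md5.hexdigest())].append(path)

    return [group for group in by_digest.values() if len(group) > 1]
```

A further refinement (which fdupes itself uses) is to compare candidate files byte-by-byte after the hashes match, guarding against hash collisions.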

------
xenonite
a similar tool is
[https://code.google.com/p/hardlinkpy/](https://code.google.com/p/hardlinkpy/)

