
You could compare with an append-only file (with offline compaction), in which case avoiding tons of "file open" operations would actually be a big win, especially if you ever use a network file system. In some cases, a "simple" log+index can beat both schemes.
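For anyone curious what the "simple log+index" looks like, here is a minimal sketch (the record format, class name, and API are all made up for illustration): every object is appended to one big file, an in-memory index maps keys to byte offsets, reads become a seek instead of an open() per object, and offline compaction just rewrites the live records into a fresh log.

```python
import os
import struct

class AppendLog:
    """Append-only log of (key, value) records plus an in-memory
    index of key -> file offset. One open file handle serves every
    read and write, instead of one open() per stored object."""

    def __init__(self, path):
        self.f = open(path, "a+b")
        self.index = {}
        self._rebuild_index()

    def _rebuild_index(self):
        # Scan the log once at startup to learn where each key lives.
        self.f.seek(0)
        while True:
            offset = self.f.tell()
            header = self.f.read(8)
            if len(header) < 8:
                break
            klen, vlen = struct.unpack(">II", header)
            key = self.f.read(klen)
            self.f.seek(vlen, os.SEEK_CUR)   # skip over the value
            self.index[key] = offset

    def put(self, key, value):
        # Appends only; old versions of a key stay in the file until compaction.
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(struct.pack(">II", len(key), len(value)))
        self.f.write(key)
        self.f.write(value)
        self.f.flush()
        self.index[key] = offset

    def get(self, key):
        offset = self.index[key]
        self.f.seek(offset)
        klen, vlen = struct.unpack(">II", self.f.read(8))
        self.f.seek(klen, os.SEEK_CUR)       # skip over the key
        return self.f.read(vlen)

    def compact(self, new_path):
        # "Offline" compaction: rewrite only the live records into a
        # fresh log, dropping superseded versions of each key.
        new_log = AppendLog(new_path)
        for key in self.index:
            new_log.put(key, self.get(key))
        return new_log
```

The point is that after startup, serving a read is one seek on an already-open descriptor, which is exactly the round trip you save on a network filesystem.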

Also, I've stored lots & lots of photos as individual files in a filesystem, and backing them up is very hard. Backing up big files is just a lot easier.



How is backing up lots of small files harder than a small number of big files? With which software?


Using rsync and 20M files is a good enough example:

- To check the date of each file on UNIX you must run "stat" once per file (unless you have an external log that tells you what to skip), so that's very slow.

- To back up a big file, it's "--append-verify" or something like that, and one streaming read per file.
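To make the asymmetry concrete, a rough sketch (not what rsync literally does internally, just the shape of the work; the function names are mine): the many-small-files case pays one stat() per file before copying a single byte, while the one-big-file case is a single open and a sequential read from wherever the previous backup stopped.

```python
import os

def mtimes_of_tree(root):
    """What an incremental backup of many small files has to do just to
    decide what changed: one stat() per file while building the file
    list. With 20M files that is 20M syscalls before any data moves."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtimes[path] = os.stat(path).st_mtime
    return mtimes

def stream_copy_append(src, dst, chunk_size=1 << 20):
    """What backing up one big append-only file looks like: a single
    open() per side and one sequential read starting where the previous
    backup left off (the idea behind rsync's --append-verify, minus the
    verification pass)."""
    already_copied = os.path.getsize(dst) if os.path.exists(dst) else 0
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(already_copied)
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
```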


This sounds similar to Facebook's Haystack: https://www.usenix.org/legacy/event/osdi10/tech/full_papers/...


There are big files, and then there are Big files. Some files are just too big. At some point, it's easier to deal with smaller files.


Yeah, this doesn't preclude breaking things into chunks (with the chunk count typically limited by how many files a non-root user can keep open).
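A minimal sketch of that chunking (the segment size, file naming, and headroom number are arbitrary choices here, not from any particular system): append into the current segment, roll over to a new one past a size cap, and keep the segment count under the process's open-file limit so every segment could stay open for reads.

```python
import os
import resource  # POSIX-only; used to look up the open-file limit

class SegmentedLog:
    """Append-only data split across numbered segment files, each
    capped in size, so no single file grows without bound."""

    def __init__(self, directory, segment_bytes=1 << 30):
        self.directory = directory
        self.segment_bytes = segment_bytes
        # Stay well under the soft limit on open descriptors
        # (ulimit -n) so every segment could be held open at once.
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        self.max_segments = max(1, soft_limit - 64)
        os.makedirs(directory, exist_ok=True)
        self.segment_id = 0
        self.current = self._open_segment(self.segment_id)

    def _path(self, segment_id):
        return os.path.join(self.directory, "segment-%06d.log" % segment_id)

    def _open_segment(self, segment_id):
        f = open(self._path(segment_id), "ab")
        f.seek(0, os.SEEK_END)  # so tell() reports the segment's current size
        return f

    def append(self, record):
        # Roll over to a fresh segment once the current one is big enough.
        if self.current.tell() + len(record) > self.segment_bytes:
            if self.segment_id + 1 >= self.max_segments:
                raise RuntimeError("would exceed the open-file budget")
            self.current.close()
            self.segment_id += 1
            self.current = self._open_segment(self.segment_id)
        offset = self.current.tell()
        self.current.write(record)
        return self.segment_id, offset
```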

I just dislike unnecessary "open" calls because the cost can get kind of crazy. For instance, if you happen to send a short (8.3-style) filename to a Samba server (which then has to enumerate every file in every folder up to the root to resolve it), or use NFSv4 ACLs, etc.



