
You could compare with an append-only file (with offline compaction), in which case avoiding tons of "file open" operations would actually be a big win, especially if you ever use a network file system. In some cases, a "simple" log+index can beat both schemes.
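For anyone curious what the "simple log+index" looks like, here is a minimal sketch (the record format, class name, and API are all made up for illustration): every object is appended to one big file, an in-memory index maps keys to byte offsets, reads become a seek instead of an open() per object, and offline compaction just rewrites the live records into a fresh log.

```python
import os
import struct

class AppendLog:
    """Append-only log of (key, value) records plus an in-memory
    index of key -> file offset. One open file handle serves every
    read and write, instead of one open() per stored object."""

    def __init__(self, path):
        self.f = open(path, "a+b")
        self.index = {}
        self._rebuild_index()

    def _rebuild_index(self):
        # Scan the log once at startup to learn where each key lives.
        self.f.seek(0)
        while True:
            offset = self.f.tell()
            header = self.f.read(8)
            if len(header) < 8:
                break
            klen, vlen = struct.unpack(">II", header)
            key = self.f.read(klen)
            self.f.seek(vlen, os.SEEK_CUR)   # skip over the value
            self.index[key] = offset

    def put(self, key, value):
        # Appends only; old versions of a key stay in the file until compaction.
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(struct.pack(">II", len(key), len(value)))
        self.f.write(key)
        self.f.write(value)
        self.f.flush()
        self.index[key] = offset

    def get(self, key):
        offset = self.index[key]
        self.f.seek(offset)
        klen, vlen = struct.unpack(">II", self.f.read(8))
        self.f.seek(klen, os.SEEK_CUR)       # skip over the key
        return self.f.read(vlen)

    def compact(self, new_path):
        # "Offline" compaction: rewrite only the live records into a
        # fresh log, dropping superseded versions of each key.
        new_log = AppendLog(new_path)
        for key in self.index:
            new_log.put(key, self.get(key))
        return new_log
```

The point is that after startup, serving a read is one seek on an already-open descriptor, which is exactly the round trip you save on a network filesystem.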

Also, I've stored lots & lots of photos as individual files in a filesystem, and backing them up is very hard. Backing up big files is just a lot easier.



How is backing up lots of small files harder than a small number of big files? With which software?


Using rsync and 20M files is a good enough example:

- To check the date of each file on UNIX you must run "stat" once per file (unless you have an external log that tells you what to skip), so that's very slow.

- To back up a big file, it's "--append-verify" or something like that, and one streaming read per file.
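To make the asymmetry concrete, a rough sketch (not what rsync literally does internally, just the shape of the work; the function names are mine): the many-small-files case pays one stat() per file before copying a single byte, while the one-big-file case is a single open and a sequential read from wherever the previous backup stopped.

```python
import os

def mtimes_of_tree(root):
    """What an incremental backup of many small files has to do just to
    decide what changed: one stat() per file while building the file
    list. With 20M files that is 20M syscalls before any data moves."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtimes[path] = os.stat(path).st_mtime
    return mtimes

def stream_copy_append(src, dst, chunk_size=1 << 20):
    """What backing up one big append-only file looks like: a single
    open() per side and one sequential read starting where the previous
    backup left off (the idea behind rsync's --append-verify, minus the
    verification pass)."""
    already_copied = os.path.getsize(dst) if os.path.exists(dst) else 0
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(already_copied)
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
```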


This sounds similar to Facebook's Haystack: https://www.usenix.org/legacy/event/osdi10/tech/full_papers/...


There are big files, and then there are Big files. Some files are just too big. At some point, it's easier to deal with smaller files.


Yeah, this doesn't preclude breaking things into chunks (with the chunk count typically limited by how many files a non-root user can keep open).
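A minimal sketch of that chunking (the segment size, file naming, and headroom number are arbitrary choices here, not from any particular system): append into the current segment, roll over to a new one past a size cap, and keep the segment count under the process's open-file limit so every segment could stay open for reads.

```python
import os
import resource  # POSIX-only; used to look up the open-file limit

class SegmentedLog:
    """Append-only data split across numbered segment files, each
    capped in size, so no single file grows without bound."""

    def __init__(self, directory, segment_bytes=1 << 30):
        self.directory = directory
        self.segment_bytes = segment_bytes
        # Stay well under the soft limit on open descriptors
        # (ulimit -n) so every segment could be held open at once.
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        self.max_segments = max(1, soft_limit - 64)
        os.makedirs(directory, exist_ok=True)
        self.segment_id = 0
        self.current = self._open_segment(self.segment_id)

    def _path(self, segment_id):
        return os.path.join(self.directory, "segment-%06d.log" % segment_id)

    def _open_segment(self, segment_id):
        f = open(self._path(segment_id), "ab")
        f.seek(0, os.SEEK_END)  # so tell() reports the segment's current size
        return f

    def append(self, record):
        # Roll over to a fresh segment once the current one is big enough.
        if self.current.tell() + len(record) > self.segment_bytes:
            if self.segment_id + 1 >= self.max_segments:
                raise RuntimeError("would exceed the open-file budget")
            self.current.close()
            self.segment_id += 1
            self.current = self._open_segment(self.segment_id)
        offset = self.current.tell()
        self.current.write(record)
        return self.segment_id, offset
```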

I just dislike unnecessary "open" calls because the cost can get kind of crazy. For instance, if you happen to send a short (8.3-style) filename to a Samba server (which then has to enumerate every file in every folder up to the root to resolve it), or use NFSv4 ACLs, etc.



