That sounds great if you've got large fields with lots of redundancy. In fact, we do this.
But small fields won't compress so well on their own. Often there's a lot of redundancy across records for the same fields which is great for compression. This might be a great way to achieve the benefits of field name tokenization too (which is similar to part of how most compressors work). I'd like to see block compression, rather than field compression .
Hmm...interesting. I don't know if this will work, but you could try storing your MongoDB database on a compressed ZFS partition. Since MongoDB uses mmap, this would have the nice side-effect of your working set remaining uncompressed, and only being compressed when written to or read from disk.
you're not the first person to suggest that to me :) although I haven't thought about using ZFS for this. You're not the only one to suggest ZFS. Why that and not compression in btrfs (or something else entirely)?
_hopefully_ the mmap interface would provide the best of all worlds: mongodb continues to be simple with respect to how it handles getting data from disk and the kernel/fs can do it's magic behind the scenes of mmap. Of course, it could be that mmap + compressed filesystem leads to some unexpected (and bad) perf results. But then again, I've never tried :) have you?