
Strange headline, considering I only said it might be ready for inclusion this year, and saying I tried to get it accepted before is a bit of a stretch. I've talked about upstreaming on the list, but decided - not yet. It's still under heavy active development.

I think the last time I talked about upstreaming on the list was before I added snapshots. Snapshots was the last big feature on my roadmap, but for the longest time I hadn't tackled it because some of the algorithmic issues with extents looked unsolvable (many ideas were sketched out in notebooks only to be ultimately discarded).

Then Dave Chinner prodded me again about snapshots after I'd finished reflink, and I realized when I finished reworking handling of extents to be entirely above the proper btree & transaction layer, the extents & snapshots issues that previously looked unsolvable became almost irrelevant. Hooray! But then I had to implement snapshots, which took another year :)

So snapshots is done, and I'm really happy with how it turned out - there's a lot to be said for the slow and incremental way of doing development, of not tackling hard problems until the right time. Snapshots are fast and scalable - I've gotten up to a million snapshots in a test VM. There are some things I need to go back and fix - the code that tears down snapshots in the page cache is sucky - but I'll get to it.

Right now, I'm focusing on scalability stuff (just finished an allocator rewrite, and backpointers are right around the corner, which will address copygc scalability), and fixing everything that my current users complain about. And hardening. And debuggability. Lots of that.

I'm just not in a rush to upstream before it's _rock solid_ and ready for the masses. It'll get there :)




Speaking of snapshots, are you planning to have analogs to ZFS's send/recv and diff? I didn't see any mention of them on the snapshots page or the roadmap, but I could be looking in the wrong place. Already having reflink support is exciting though; a ZFS replacement that can also do reflinks would be amazing.


Yeah but it's not really designed yet. We have the main pieces - all metadata exists as keys in various btrees, and keys all have version numbers so we can easily do a protocol to transfer all keys newer than x - and then your remote filesystem will be up to date.

To make it efficient and fast we'll want to be able to do it without scanning all filesystem metadata, which will require auxiliary indices, and that'll probably need some careful design to make it fast. But we've got transaction layer triggers, so we've got the model for that stuff in place.
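
A rough sketch of the "transfer all keys newer than x" idea, in Python for brevity. Everything here is an assumption made for illustration: the `Key` record, the dict standing in for the remote filesystem's btrees, and the function names are hypothetical and are not bcachefs code or its on-disk format. This naive version scans every key, which is exactly the cost the auxiliary indices mentioned above would avoid, and it ignores deletions entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    """Hypothetical stand-in for a btree key; not the real bcachefs layout."""
    btree: str    # which btree the key lives in, e.g. "inodes" or "extents"
    pos: int      # position of the key within that btree
    version: int  # version number stamped on the key when it was written
    value: bytes


def keys_newer_than(keys, x):
    """Full scan: pick every key whose version is newer than x."""
    return [k for k in keys if k.version > x]


def apply_keys(remote, incoming):
    """Apply received keys to the remote store, keeping the newest version.

    `remote` is a dict mapping (btree, pos) -> Key.
    """
    for k in incoming:
        current = remote.get((k.btree, k.pos))
        if current is None or current.version < k.version:
            remote[(k.btree, k.pos)] = k
    return remote
```

The receiver only needs to remember the highest version it has already applied; the sender then ships everything newer than that.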

Also, reflink support is plenty solid at this point. I can't really recommend snapshots for production use yet - in particular, we're missing per snapshot/subvolume disk space accounting, which you kind of need. But reflink's been in use and getting tested for ~2 years now, and there are no known issues with it anymore.


Okay, good to know it's being considered. For a non-optimized diff then (zfs-diff, which just prints which files have changed, not what about them has changed), it would be the same approach? Scan all the metadata looking for keys with versions between the two snapshots, and print out the associated file names? Would that simple approach be likely to be faster than whatever rsync does to find updated files when doing a backup?


If you just want to know what files have changed, there's no reason to scan all metadata - you'd just want to scan all the inodes (and we'd need to add a version number field to the inode itself that covers all the contents; easily done). And since just prior to snapshots, inodes have had backpointers to dirents, so given an inode we can easily get a path.

So yeah, that would be easy and a lot faster than what rsync does. Dunno about plumbing to anything in userspace, that's outside the scope of what I can think about right now. But if anyone wanted to work on such a feature, I'd happily help them get started.
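
A minimal sketch of that inode scan, again in Python and again under loud assumptions: the `Inode` record, its `version` field (which the comment above says would still need to be added), the single `parent`/`name` backpointer, and `ROOT_INUM` are all hypothetical simplifications, not the real bcachefs structures. It ignores hard links, where one inode can have several names.

```python
from dataclasses import dataclass
from typing import Optional

ROOT_INUM = 1  # hypothetical inode number for the filesystem root


@dataclass
class Inode:
    """Hypothetical inode: a version plus a backpointer to its dirent."""
    inum: int
    version: int            # assumed per-inode version covering all contents
    parent: Optional[int]   # backpointer: inode number of the containing dir
    name: str               # backpointer: name of the dirent in that dir


def path_of(inodes, inum):
    """Walk backpointers from an inode up to the root to build its path."""
    parts = []
    while inum != ROOT_INUM:
        ino = inodes[inum]
        parts.append(ino.name)
        inum = ino.parent
    return "/" + "/".join(reversed(parts))


def changed_paths(inodes, old_version, new_version):
    """Scan only the inodes and report those modified in (old, new]."""
    return sorted(
        path_of(inodes, ino.inum)
        for ino in inodes.values()
        if ino.inum != ROOT_INUM and old_version < ino.version <= new_version
    )
```

The work is proportional to the number of inodes plus the path depth of each changed file, with no need to stat or read any file contents.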


Thanks for the clarification :)

According to https://bcachefs.org/ regarding snapshots:

> Status - Feature status

> Snapshots - Done, still shaking out a few bugs

Any major issues to mention up front, in order to stop someone curious from playing around with it? Or just minor ones that would mostly be unnoticeable? Would obviously not use it for something serious yet, but to get a feel for the filesystem with some data.


I only see references to a BCH2 ECC in ec.c source file. I would have thought a non-binary code like RS would be better suited to filesystem coding since a fault is likely to corrupt more than a single contiguous bit. I don't have any experience with FS ECC, is there a reason BCH2 is desirable for bcachefs?



