
Bcachefs – A general purpose COW filesystem - koverstreet
https://lkml.org/lkml/2015/8/21/22
======
wscott
He doesn't really mention how this relates to the current bcache code. That
one is brilliant. I run a 4-drive RAID with slow spinning 2TB drives and a
100G SSD as a writeback cache on top. It detects strides, so reads and writes
of large files stream directly to the RAID, but small reads and writes go via
the SSD. By filtering out all the little traffic, the actual traffic to the
drives is more efficient and spends less time seeking. In that setup bcache
effectively creates a cached block device, and I format that with ext4.
It seems this now includes the filesystem. Not sure I like that, but it seems
to be the trend in filesystems since ZFS.
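The stride detection described above can be sketched as a toy routing policy: small random I/O goes through the SSD cache, while requests that extend a long sequential run bypass it and stream to the backing disks. The bookkeeping below is illustrative, not bcache's actual heuristics (though 4 MiB matches bcache's default `sequential_cutoff`):

```python
# Toy model of bcache-style sequential bypass: route small random I/O
# to the SSD cache, but let detected sequential runs stream to the
# backing device. Illustrative only - not bcache's real implementation.

SEQ_CUTOFF = 4 * 1024 * 1024  # bypass once a sequential run exceeds 4 MiB

class SequentialDetector:
    def __init__(self):
        self.next_offset = None  # where the last request ended
        self.run_bytes = 0       # length of the current sequential run

    def route(self, offset, length):
        """Return 'ssd' or 'backing' for a request at (offset, length)."""
        if offset == self.next_offset:
            self.run_bytes += length   # continues the previous request
        else:
            self.run_bytes = length    # random access starts a new run
        self.next_offset = offset + length
        return "backing" if self.run_bytes > SEQ_CUTOFF else "ssd"

d = SequentialDetector()
assert d.route(0, 4096) == "ssd"                      # small random read
assert d.route(10**9, 8 * 1024 * 1024) == "backing"   # large streaming write
```

A real implementation tracks runs per task rather than globally, but the routing decision is the same shape.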

~~~
bbrazil
I've been considering using bcache on one of my systems, last time I checked
it seemed a bit unready in terms of all the steps needed to set it up and keep
it working. What has your experience been?

~~~
jfindley
I've run bcache-backed databases in production for about a year or so - it's
been absolutely rock solid, and I'm very happy with it.

We spent a lot of time prototyping basically every SSD caching tech that
existed at the time and bcache was the clear winner (note that dm-cache, the
tech underpinning lvm-cache was pretty immature at the time).

You do need a workload with a fairly reasonable cache hitrate to get the best
out of it, but that's obviously true of all such technologies.

------
webaholic
Another file system pursuing the lofty goals of ZFS/btrfs on Linux (the other
being tux3). It's a catch-22 for these guys: if they don't release it early
there will be no users to test and report bugs; if they do release it early,
it's half baked. I hope the guy has a large stash in his bank. File systems
take notoriously long to stabilize.

------
santiagobasulto
I think it's great to see new FSes targeted at SSDs. There's an alpha version
to try out:
[http://bcache.evilpiepirate.org/](http://bcache.evilpiepirate.org/)

------
ck2
The performance is so close on all filesystems right now on Linux, why not
just contribute to the code for one of the existing solutions and make it more
mature?

[http://www.phoronix.com/scan.php?page=article&item=linux_rai...](http://www.phoronix.com/scan.php?page=article&item=linux_raid_fs4&num=2)

F2FS is looking really promising for SSD

ps. Kent's benchmark numbers struck me for one particular aspect I haven't
seen in other benchmarks - max latency - and EXT4 is looking darn good in that
aspect. I still use EXT4 over XFS; I simply do not trust XFS enough yet.

~~~
wyldfire
> I simply do not trust XFS enough yet

You should give it another look. My team's used it for a mission critical
project for over a decade. It's older and more mature than ext4, IMO. For
sustained write throughput, it does much better than ext4. That said, many
folks have utilization which looks more like random I/O than sequential.

~~~
e12e
I'd also like to hear any rationale for why one would trust XFS _less_ than
EXT4. There might be arguments against using XFS -- but trust/maturity sounds
like a very strange one to pick?

~~~
aidenn0
I don't know how it is now, but XFS was at one point, almost by design, set up
to lose your data on power failure.

~~~
e12e
Referring to this?
[http://xfs.org/index.php/XFS_FAQ#Q:_Why_do_I_see_binary_NULL...](http://xfs.org/index.php/XFS_FAQ#Q:_Why_do_I_see_binary_NULLS_in_some_files_after_recovery_when_I_unplugged_the_power.3F)

Are you sure it was worse than ext2/3?

[ed: Looks like it might be better than or as good as ext2, but somewhat worse
than ext3/4 -- apparently XFS journals only metadata:
[http://superuser.com/questions/84257/xfs-and-loss-of-data-wh...](http://superuser.com/questions/84257/xfs-and-loss-of-data-when-power-goes-down)

Still doesn't seem like it destroys data on power loss -- just the usual: if
data isn't written to disk, then it's not written to disk.]

~~~
aidenn0
Yeah, still better than ext2. I think part of it is that many people expect
the file to either not have grown, or to have the data you wrote to it.

IIRC (and I'm a bit hazy) there were other issues too that have since been
fixed. The long delay between write and commit meant that a lot of bugs that
would otherwise have been vanishingly rare got exposed. The ext filesystems
likely have/had similar bugs that have only happened a single-digit number of
times in the past 20 years.

------
ilurk
> PLANNED FEATURES: erasure coding

Now that is something I'd really like to see in Linux filesystems. AFAIK the
GNU/Linux implementations of ZFS do not support ECC. Only the Oracle version.
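The idea behind erasure coding is easiest to see with the simplest possible code: one XOR parity block, as in RAID 5 (real filesystem implementations use Reed-Solomon variants that tolerate multiple failures). A minimal sketch:

```python
# Simplest erasure code: a single XOR parity block (RAID-5 style).
# Any one lost data block can be rebuilt from the survivors + parity.
# Real filesystems use Reed-Solomon codes for multi-failure tolerance.

def make_parity(blocks):
    """XOR all equal-sized data blocks together into one parity block."""
    parity = bytes(len(blocks[0]))
    for b in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, b))
    return parity

def recover(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and parity."""
    missing = parity
    for b in surviving_blocks:
        missing = bytes(x ^ y for x, y in zip(missing, b))
    return missing

data = [b"AAAA", b"BBBB", b"CCCC"]
p = make_parity(data)
# Lose data[1]; reconstruct it from the other blocks plus parity.
assert recover([data[0], data[2]], p) == b"BBBB"
```

The storage overhead is one parity block per stripe, versus a full copy per replica under mirroring, which is why erasure coding is attractive for big arrays.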

~~~
acqq
The problem with the erasure coding implementations: patents. An example:

[https://www.techdirt.com/articles/20141115/07113529155/paten...](https://www.techdirt.com/articles/20141115/07113529155/patent-troll-kills-open-source-project-speeding-up-computation-erasure-codes.shtml)

~~~
coalescence
There are a few different profiles offered in Ceph's erasure-coded pool
support; AFAIK they are based on research papers. I assume due diligence was
done to ensure they can be used openly (IANAL etc.).

[http://ceph.com/docs/master/rados/operations/erasure-code-pr...](http://ceph.com/docs/master/rados/operations/erasure-code-profile/)

edit: Indeed, I checked the jerasure site and it's gone.

------
tobias3
And just today I was wondering what happened with this btrfs patch set which
introduces some of the benefits of bcache to btrfs:
[http://marc.info/?l=linux-btrfs&m=129622115023547](http://marc.info/?l=linux-btrfs&m=129622115023547)

------
KenCochrane
Sounds like a really cool alternative to aufs, would love to see a Docker
storage driver added to support this, once stable.
[https://docs.docker.com/reference/commandline/daemon/#daemon...](https://docs.docker.com/reference/commandline/daemon/#daemon-storage-driver-option)

Any chance this might get added to the mainline kernel? aufs was never able to
get merged in. OverlayFS was merged in, but IMHO isn't as good as aufs.

~~~
wmf
This isn't a union filesystem; it's a primary filesystem that happens to use
COW internally for better reliability. It's comparable to ZFS, btrfs, or tux3.

------
e12e
Looks very interesting. Would be cool if it was also compared against my
favourite "dark horse" fs: nilfs2[1]. I always thought it'd make a great fs
for flash storage -- but in the end I've generally ended up running ext4 over
LUKS on my SSDs so far.

[1] [http://nilfs.sourceforge.net/en/](http://nilfs.sourceforge.net/en/)

------
mtgx
Does it outperform F2FS on flash storage, too?

~~~
rdtsc
Interested in that too, I was following F2FS progress a bit as well.

------
amenod
> It's taken a long time to get to this point - longer than I would have
> guessed if you'd asked me back when we first started talking about it - but
> I'm pretty damn proud of where it's at now.

This seems true for most of the non-trivial projects. :)

------
mwilcox
The guide for bcache mentions RAID/erasure coding isn't implemented yet -
anyone know if that is up to date and/or when it could be expected to be
integrated?

~~~
insaneirish
> anyone know if that is up to date and/or when it could be expected to be
> integrated?

If btrfs is used as an approximation, approximately three weeks after the heat
death of the universe.

------
danbee
Anybody else just see this? [http://cl.ly/ccI0](http://cl.ly/ccI0)

~~~
coldpie
Unfortunately lkml.org is quite buggy. Try again in the future, or find a
different mailing list archive, subject "[ANNOUNCE] bcachefs - a general
purpose COW filesystem".

------
jeffbe
ext4 seems always to be the best.. no?

~~~
laumars
ext4 has its place - and deservedly so. But picking the right file system
really depends on your requirements. For example, if you're building a storage
server then ZFS would be a better fit.

XFS is also worth taking notice of, since the benchmarks I've read rate it as
having faster read and write speeds than ext4. However, I don't have extensive
first-hand experience running XFS (something I'm currently addressing).

~~~
wazoox
I've set up several hundred storage servers with XFS (up to 250 TB per FS)
over the past 15 years. There were sour times, 10 years ago or more, but in
recent times it's been rock solid and beats the crap out of other FSes. Note
that XFS has always hated crappy hardware, because it pushes it to the limits.

~~~
nickpsecurity
Unsurprising given it was built by the same people who built these:

[https://www.sgi.co.jp/features/2001/dec/fleet_numerical/imag...](https://www.sgi.co.jp/features/2001/dec/fleet_numerical/images/lg_origin.jpg)

And more recently this:

[https://www.sgi.com/pdfs/4555.pdf](https://www.sgi.com/pdfs/4555.pdf)

Because pushing the envelope of HPC awesomeness was just another day in the
office for folks at SGI. :)

------
transfire
Are acl and xattrs active by default?

~~~
koverstreet
Yes

~~~
transfire
Nice. I will be keeping a close eye on this fs for future use.
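For anyone who wants to check xattr support on a mounted filesystem, Python's Linux-only stdlib calls are enough for a quick probe (the attribute name below is just illustrative):

```python
import errno
import os
import tempfile

# Probe whether a directory's filesystem honors user extended
# attributes (Linux-only stdlib calls). Filesystems without xattr
# support raise OSError with ENOTSUP on setxattr.
def supports_user_xattrs(directory):
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        try:
            os.setxattr(f.name, "user.demo", b"hello")
        except OSError as e:
            if e.errno in (errno.ENOTSUP, errno.EOPNOTSUPP):
                return False
            raise
        return os.getxattr(f.name, "user.demo") == b"hello"

print(supports_user_xattrs("."))  # True on ext4/XFS/btrfs with xattrs on
```

ACLs don't have a stdlib binding; `getfacl`/`setfacl` from the acl package cover those.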

------
AnbeSivam
Is O_DIRECT support available or planned?

~~~
koverstreet
Yep - it's been done for awhile, works fine.
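For context, O_DIRECT bypasses the page cache, and the kernel requires the buffer address, file offset, and transfer size to be aligned (typically to the logical block size). A minimal sketch; the 4096-byte block size and padding scheme are illustrative, and the filesystem being written to must support direct I/O:

```python
import mmap
import os

# O_DIRECT skips the page cache; buffers and sizes must be aligned.
BLOCK = 4096  # illustrative; query the device for its real block size

def write_direct(path, data):
    """Write one aligned block with O_DIRECT, zero-padding the data."""
    buf = mmap.mmap(-1, BLOCK)            # anonymous mmap is page-aligned
    buf.write(data.ljust(BLOCK, b"\0"))
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_DIRECT, 0o600)
    try:
        os.write(fd, buf)                 # exactly one aligned block
    finally:
        os.close(fd)
        buf.close()
```

Passing an unaligned buffer or a partial block typically fails with EINVAL, which is why databases doing direct I/O manage their own aligned buffer pools.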

------
nwmcsween
Kent, IMO you should rename bcachefs to something else, maybe bcfs?

------
SFjulie1
This guy should rush to write documentation first... He will probably "build
to budget", exhausting his money and time on coding.

If he does so and finishes his project with a burnout, there is a huge
question about how production people will be able to fix things without a
good knowledge of the design during the recovery time (burn-in =~ burnout
time).

"Read the source, Luke" is nice. But 10 pages of natural language speak
louder than 10K lines of code, even with comments.

I dislike the kind of coders who go for the beef and despise the grunt work
as beneath them.

Documentation should come first. This code will have to be maintained, since
he - like a good drug dealer - is trying to hook people on using his code.

~~~
bizarref00l
You may check
[http://bcache.evilpiepirate.org/BcacheGuide/](http://bcache.evilpiepirate.org/BcacheGuide/)

