ZFS for Linux (linuxjournal.com)
84 points by bcbs 7 months ago | 72 comments



I keep hearing that the GPL and CDDL are incompatible, but repeating something doesn't make it true. I have read both licenses and have come to the conclusion that they're unlikely to be incompatible in the first place. Unfortunately, every time this line of argument comes up, it tends to get buried in discussion.

The incompatibility claim seems to rest on three arguments:

1) Statements from the FSF website, but those have no legal bearing on the actual text of the licenses

2) Claims that the CDDL was engineered to be incompatible with the GPL; again, an interesting hypothesis, but it has little bearing on the actual text of the licenses

3) The derivative-work argument from the GPL. This seems to be the only one that could hold water; however, I doubt that a court would find ZFS to be a derivative work of the Linux kernel.

Furthermore, even if you argue that the "derivative work" clause makes them incompatible, there's no practical way to prosecute such a violation. Copyright infringement is a tort, which means you have to show that someone violated the rules AND caused quantifiable harm. What would our theory of harm be? (https://blog.hansenpartnership.com/are-gplv2-and-cddl-incomp...)


Actual lawyers have also read both licenses and have come to the conclusion that they are incompatible: https://sfconservancy.org/blog/2016/feb/25/zfs-and-linux/

The main debate isn't about GPL vs. CDDL in the abstract; even Canonical doesn't dispute that the licenses themselves are incompatible. It's specifically about the Linux kernel vs. ZFS, i.e., does linking the ZFS module with the Linux kernel make the ZFS module a derivative work of the Linux kernel?

If yes, it has to be possible to apply the GPLv2 to it, which it isn't. If no, then the incompatibility is irrelevant, and the kernel and ZFS can be distributed together under their own separate licences. This is where opinions (by legal experts, not people on the internet) differ: the FSF's and SFC's lawyers say this makes ZFS a derivative work and you can't distribute them together; Canonical's lawyers say it doesn't, and therefore you can.


The crux of the Software Freedom Conservancy's argument hinges on their belief that there is no distinction between a statically linked and a dynamically loaded Linux kernel module. "Module" is, of course, a programming term for concerns that are separated into logically discrete units (modules) that interact through well-defined interfaces.

The Software Freedom Conservancy argues that not only is zfs.ko a derivative work of Linux, but really any dynamically loaded Linux kernel module is a derivative work (of Linux). To me that feels like a bit of a stretch, and one could make the opposite argument.

By virtue of the fact that zfs.ko is an optional module, rather than an integral part of Linux, it's not a derivative work. zfs.ko must be a separate entity from Linux in the first place; since zfs.ko is a module, it must be logically discrete, interacting with Linux through the standard kernel interfaces. Further, it's likely that only if ZFS were distributed as a changeset against the Linux source code (rather than buildable as a distinct module) would it possibly be a derivative work of Linux.

The statically vs. dynamically compiled (linking) distinction doesn't hold water, as it is an arbitrary technical one that only has to do with how things are laid out on the filesystem and/or loaded into memory; modules, plugins, add-ons, and similar things can often be either statically compiled in or dynamically loaded, and there are many examples of both in existing software.


I'm surprised there isn't a one-click way for the user to build their own kernel with ZFS inside it.

As long as I don't distribute the resulting binary there should be no legal implications.


There is (at least on Debian) - enable contrib and use apt.

The problem is that neither your install nor your recovery CD will have it enabled by default, so if you have a problem, your hard drive won't mount.
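
For reference, roughly what the apt route looks like on Debian (a sketch, assuming contrib is enabled in sources.list; package names may differ slightly between releases):

  sudo apt update
  sudo apt install linux-headers-$(uname -r) zfs-dkms zfsutils-linux
  sudo modprobe zfs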


So extend the script to build you a recovery ISO or USB image too, and then insist that you reboot and test the recovery image.


For those interested, it appears the current generation of "running system to live image" is:

https://github.com/Tomas-M/linux-live

While the current generation of "make me a magic shiny debian image" is:

https://github.com/larswirzenius/vmdb2/

(I went looking for some tool I've used in the past, which may, or may not, have been "live-build").

In general though, just pulling down a dkms zfs module into any working recovery image via apt should work.

But not if you don't have net access [ed: or a lan apt mirror..].
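
One hedged workaround, assuming you can reach a connected machine somewhere: pre-fetch the packages there and carry them over.

  # on a machine with network access
  apt download zfs-dkms zfsutils-linux
  # copy the .deb files to the recovery system, then (with matching kernel headers installed)
  sudo dpkg -i ./zfs-dkms_*.deb ./zfsutils-linux_*.deb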


Or install ZFS from your live recovery instance and load the mod. Not a big problem.


DigitalOcean[1] has a one-click option to compile a FreeBSD kernel with ZFS configured (for what it's worth).

edit: including my referral link here will probably result in negative karma, but i'm desperate for points!

[1]: https://m.do.co/c/8c882a721944


FreeBSD isn't GPL-licensed but BSD-licensed, and therefore has no licensing issues.


FreeBSD supports ZFS by default. FreeBSD is not the problem.


FreeBSD already has ZFS enabled by default?


FWIW, Canonical's legal analysis also concluded the licenses were not incompatible in this instance (though potentially for very different reasons than yours).

http://blog.dustinkirkland.com/2016/02/zfs-licensing-and-lin...

(edit: When I wrote this comment, the person I was responding to had only left a relatively short comment, without the argument or the subsequent bullet points.)


They didn't conclude that the licenses were not incompatible, they concluded that ZFS for Linux is not sufficiently based on the Linux kernel for the licenses to interact.


Their conclusion was that zfs.ko, as a self-contained file system module, is not a derivative work of the Linux kernel but rather a derivative work of OpenZFS and OpenSolaris.


This is the biggest weakness of the GPL, IMHO. The fluffy wording is sometimes so vague that only a court can really tell if something is allowed or not.

For example, the GPL talks a lot about linking. Now how does that work with languages where there is no linking at build time (like Java)? The GPL's architects tell us that doesn't matter because it's all in the same spirit. But that's just an opinion; you need a court to rule on it to be sure.


The CDDL release announcement[0] says: """ Thus, it is likely that files released under the CDDL will not be able to be combined with files released under the GPL to create a larger program. """

So the creators of the CDDL didn't think they were compatible.

What do you mean when you say "have come to the conclusion that they're unlikely to be incompatible"?

Do you believe one could combine the Linux kernel and ZFS into a single source tree and distribute binaries under the terms of the GPLv2?

[0] https://lwn.net/Articles/114840/


The judgment to settle the compatibility question would take years and could go either way, costing either company billions of dollars.

Risks are sky high and no sane company should invest in it. It's worse than being just incompatible.


>The judgment to settle the compatibility question would take years and could go either way, costing either company billions of dollars.

Who would the suit be between? Oracle and Canonical?

>Risks are sky high and no sane company should invest in it. It's worse than being just incompatible.

Companies don't seem to have a problem using Nvidia's binary kernel driver so I doubt it.


I don't think the CDDL gets breached when the two get combined, so Oracle is unlikely. Any Linux contributor could sue though.


Oracle, Red Hat, IBM, Google, Facebook. Any telco or embedded hardware manufacturer that has significant revenue and ships their own Linux/Unix derivatives.

Canonical is the smallest company, it has the least to lose.


The author seems very confused about the provenance of ZFS. The ZFS that is in ZFS-for-Linux is OpenZFS, the open-source version, not Oracle's. OpenZFS is what's in FreeBSD and illumos and ZFS-on-Linux and so on and so on, and it is most definitely not under any form of Oracle control.


OpenZFS is a fork of the Oracle code. The (many) contributions to it since the fork are not owned by Oracle, but a good chunk of the code is.


So, just to be clear, the hand-wavy idea I got at some point that there were multiple major implementations is actually wrong?

As an aside, are there any independent implementations?


There's Oracle Solaris ZFS. Since Solaris has been killed off, this original variant of ZFS is finished unless relicensed.

OpenZFS is a fork, and is the upstream for Illumos, FreeBSD, Linux and macOS, all of which share the same on-disk format. As for the exact hierarchy, I think Illumos and FreeBSD are essentially co-equal upstreams, then ZoL (ZFS on Linux), and macOS is downstream of ZoL. I wouldn't call them different implementations.

zfs-fuse I think is based on the original. https://github.com/gordan-bobic/zfs-fuse


The lack of RAIDZ expansion is a real deal killer now that it's possible to expand normal Linux RAID/LVM/filesystems.

RAID/filesystem expansion is probably the #1 thing that used to be a huge PITA. These days it's possible to reshape a Linux array with a single add/grow invocation of mdadm after installing a new disk.
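
Roughly what that looks like (a sketch; device names are hypothetical, and ext4 is assumed for the final filesystem resize):

  # add the new disk and reshape a RAID5 from 3 to 4 members
  sudo mdadm --add /dev/md0 /dev/sdd
  sudo mdadm --grow /dev/md0 --raid-devices=4
  # once the reshape finishes, grow the filesystem on top
  sudo resize2fs /dev/md0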



I don't really understand why this is such a big deal. Just use mirrored disks. What system do you really need to dynamically expand disk by disk?


A couple of small gripes: a. Using fdisk to delete partitions is not the correct way to unpartition a drive. You should tear down the layers of a drive's storage in the reverse order they were created, using wipefs to remove each layer's signature: wipefs each partition, then wipefs the drive to remove the partition map itself (there are two signatures on GPT-partitioned drives). The suggested method leaves all of the signatures intact, both filesystem and partition map, and can cause confusion later.
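
Something like this, assuming a GPT drive at a hypothetical /dev/sdX with two partitions:

  # remove the filesystem signature from each partition first
  sudo wipefs --all /dev/sdX1
  sudo wipefs --all /dev/sdX2
  # then remove the partition-table signatures from the drive itself
  sudo wipefs --all /dev/sdX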

b. Near as I can tell, upstream GRUB supports ZFS. The article cites a 2011 FreeBSD thread. http://git.savannah.gnu.org/cgit/grub.git/tree/grub-core/fs/...


Has there ever been a serious community effort to relicense ZFS? Intuitively, that feels like it would have been a much more worthwhile effort from the start than trying to recreate it from scratch in the form of btrfs.

Is that a reasonable idea? What would it take for someone to go about it today? I would expect that it would be possible if every copyright holder (I'd guess mainly developers and their current/past employers, next of kin, estates, etc.) were asked to sign a document collectively consenting to having ZFS's license changed.

Or is there a reason that some current copyright holders might not want that? I get why Sun would've refused since it was a competitive advantage for Solaris, but what's stopping it from happening now?

This whole situation just seems really silly as I'm not sure what anyone has to gain from it.


Oracle holds the copyright on much of OpenZFS. They'll have to agree to relicensing it first.

With Solaris dead I'd generally agree that it's a little pointless to keep it incompatible. They might as well just place it under something like BSD or Apache 2, that should keep the FreeBSD and Mac OS X ports happy as well as opening up the possibility of inclusion in the Linux kernel. With Oracle Solaris dead, I think there's even a case for Oracle's bottom-line: they could start including ZFS with Oracle Enterprise Linux and boast about all the advantages of using it there.


> "... they could start including ZFS with Oracle Enterprise Linux and boast about all the advantages of using it there."

As the copyright holder, can't they do that anyway?

By not relicensing it, they keep others -- Canonical, Red Hat, etc. -- from being able to do the same thing.


The legal issues are annoying, but ZFS on Linux has been running quite well for years now. Not sure what the 'news' part is here.


The problem is that it isn't upstream, so having it as a rootfs is somewhat dangerous - you need a custom initramfs.


Works automatically on Ubuntu. I'm writing this on a Linux system with ZFS on root which has worked perfectly for nearly two years now.


Perhaps, but the initramfs is custom per distro anyway, and since most of the stuff is package-managed (at least in the RPM and DEB versions), it is no different from, say, a LUKS hook.


If you are running a stable version, it works fine, yes.

However, having ZFS outside the mainline kernel is a bit of a pain.

There is frequent breakage in kernel APIs, and the ZFS on Linux project always plays catch-up with these changes. They also have to maintain compatibility with various versions of the kernel APIs, which is probably not an easy job.

I know it's asking for trouble, but at home I'm running a server with ZFS under Debian Sid (so basically a rolling release), and each kernel update is a little risky. Each time I upgrade, I pray to Saint Linus, but that's not always enough; it happens quite often that the zfs or spl module fails to build.
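
A small sanity check before rebooting (hedged; this assumes the stock Debian zfs-dkms packaging):

  # confirm that spl/zfs actually built for the new kernel
  dkms status
  # if not, force a rebuild by re-running the package configuration
  sudo dpkg-reconfigure zfs-dkms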


See also: TFS: A file system built for performance, space efficiency, and scalability

https://github.com/redox-os/tfs

https://news.ycombinator.com/item?id=14386331


I'm hoping BcacheFS will be the next COW FS to take over this space in Linux.


Has anybody managed to run a natively encrypted root and boot setup with ZFS on Linux yet? That would be really cool.


I tested that out for an embedded product. It works, and while some things could be better, the biggest hurdle is that since it's not stable yet it can't be shipped. A stable release could be supported for years, but native encryption being unstable means issues like this[0] would be showstoppers.

0: https://github.com/zfsonlinux/zfs/issues/6845


I've done ZFS on LUKS, pretty sure native encryption is still in the testing stage on Linux.
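
The LUKS-underneath approach is roughly this (a sketch; the device and pool names are made up):

  sudo cryptsetup luksFormat /dev/sdb1
  sudo cryptsetup open /dev/sdb1 tank_crypt
  sudo zpool create tank /dev/mapper/tank_crypt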


I had a lot of 'fun' trying to get a ZFS root on ubuntu a couple months ago.

Not my brightest idea.

Ended up having a small XFS root partition and putting everything else on ZFS. Works great, but I don't look forward to the day when my root filesystem fails.


Yep. I have a number of ZFS machines, and I always put root on ext4 (or ext4 on md), and just use ZFS for user data partitions.


ZFS root works fine on NixOS.

https://nixos.wiki/wiki/NixOS_on_ZFS


It is possible though; I set up an Ubuntu machine with ZFS root this Christmas.


"Definite legal ambiguity remains with ZFS"

This seems to be the biggest problem with ZFS. Even Apple ditched ZFS on macOS.

ZFS is slowing down and impacting the real alternative: btrfs. IMHO Red Hat should be ashamed for abandoning btrfs.


I'm not sure I blame Red Hat all too much in this.

Both ZFS and btrfs are owned/driven by Oracle. When Oracle bought Sun (and ZFS), it seemed like the pace of btrfs slowed significantly. I'm not sure what Oracle really wants to do in this situation, but they are clearly in the driver's seat.

(This is also alluded to in the article)


It definitely did. btrfs might have been ready years ago otherwise, but instead... it languishes like a sad joke in comparison to ZFS.

The biggest problem btrfs has, really, is the user interface. Nothing is as easy to use as the zfs command!
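
For anyone who hasn't used it, this is the sort of thing people mean (dataset and host names are made up):

  zfs create tank/home
  zfs set compression=lz4 tank/home
  zfs snapshot tank/home@before-upgrade
  zfs send tank/home@before-upgrade | ssh backuphost zfs receive backup/home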

This is still a pretty good summary of the differences: https://rudd-o.com/linux-and-free-software/ways-in-which-zfs... The only part that's outdated now is that btrfs finally has raid5/6 support... kind of. It's still marked experimental and shouldn't be trusted for production data.

Mind, btrfs does have one advantage: online shrink support, which provides much better resizing capabilities for its pools. This has always been on the back burner for ZFS; Sun once had a very slow implementation back around 2005-2006, but the Oracle acquisition pretty much killed any planned work on that front, and I think most ZFS users don't consider it a high priority either. Even in the case of btrfs... it only works great when it's in the mood to. It's a bit easy to get into situations where it will absolutely refuse to remove devices despite there being plenty of space for it.
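
For comparison, the btrfs side of that looks roughly like this (mountpoint and device are hypothetical):

  # shrink a mounted filesystem in place
  sudo btrfs filesystem resize -100g /mnt/pool
  # or remove a whole device, rebalancing its data onto the remaining ones
  sudo btrfs device remove /dev/sdd /mnt/pool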


> Both ZFS and btrfs are owned/driven by Oracle.

While ZFS is owned by Oracle, they certainly don't drive development. Oracle has their closed-source version of ZFS used in the now-dead Solaris. Then there is OpenZFS, which is where the majority of research and development takes place and where Illumos, FreeBSD, macOS and Linux developers upstream their work.


I'm not saying Oracle is driving development (of either btrfs or ZFS). That's the problem! Both projects feel very stagnant. You can't argue that OpenZFS has anywhere close to the resources to put into ZFS that Sun did (pre-Oracle). While I run ZFS (FreeBSD and ZoL) and prefer it over pretty much anything, ZFS is not nearly as popular as it might have been if it had been released with a GPL-compatible license. At the time, this was intentional on Sun's part, who knew Linux was a competitor to Solaris, so I can't say that I blame them...


And, from one of the linked articles, SuSE is the biggest contributor to btrfs.

We've been running btrfs in production for our small shop for a couple years. We've had one incident, but not too many problems, really.

Though, next time, I might not be using btrfs RAID-1, and instead use the regular MD. I'd still want the on-disk checksums for data, and the snapshots, which are awesome.


But when Oracle bought Sun, I'm fairly certain Oracle was the driver. (Didn't they employ the initial developer?)

SuSE just doesn't have the same level of resources that Oracle can bring to bear. At the time, btrfs was a crucial feature for the next version of Linux because of ZFS. When Oracle bought Sun, that competitive pressure was gone.


Optimizing on the filesystem is interesting, but the underlying hardware should not be neglected -- and benchmarks should be done before anything else!

I recently needed a large and fast SQL database. I decided to start by doing some simple tests with dd and some more advanced ones with fio.
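
For reference, the fio numbers further down came from roughly this kind of invocation (a reconstruction from the output; the exact flags may have differed):

  fio --name=randwrite --filename=/dev/md5 --rw=randwrite --bs=4k \
      --ioengine=libaio --iodepth=16 --direct=1 --numjobs=8 \
      --runtime=240 --time_based --group_reporting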

On a very decent AMD EPYC 7401P 24-core with 128G of RAM and 6 NVMe drives, I found that performance started declining once I added the 6th drive.

Based on my benchmarks, the most I can get out of this machine is about 4GB/s in read/write performance. This makes me suspect it is the speed limit from the CPU to the PCH.

So instead of this 1 server, I will deploy a cluster of smaller servers each equipped with 2 NVMe drives, and hope the network drivers do a decent job (just checked dmesg: eth2: PCIe:5.0Gb/s:Width x4).


The AMD Epyc 7401P you’re using has 128 PCIe 3.0 lanes, which can provide way more bandwidth than the 4Gbps figure you’re getting (8 Gbps per lane, so 1024 Gbps on paper). The AMD Epyc platform has been benchmarked at 57 GBps (bytes, not bits) of random storage bandwidth: http://www.legitreviews.com/one-amd-epyc-processor-reaches-5...

There’s simply no way a PCIe x4 NIC is going to be able to compete with the bandwidth available within that server.


I just can't get such results at the moment, so I am trying some creative solutions to get over the limit I'm hitting.

Would you have some suggestions to diagnose the issue better?


>So instead of this 1 server, I will deploy a cluster of smaller servers each equipped with 2 NVMe drives, and hope the network drivers do a decent job (just checked dmesg: eth2: PCIe:5.0Gb/s:Width x4).

What does that output suggest? Unless you have a 10Gbe card, this won't matter.


The output suggests to me that I should do more benchmarks to avoid another bottleneck, as I seem to be hitting PCIe lane issues :-)

I also want to get my hands on a few intel Optanes to test how well VROC performs in practice.

FYI, here are some previous results where I configured partitions of similar sizes, then assembled them like:

(...)

mdadm --create --verbose /dev/md6 --level=stripe --raid-devices=6 /dev/nvme[0-5]n1p6

mdadm --create --verbose /dev/md5 --level=stripe --raid-devices=5 /dev/nvme[0-4]n1p5

(...)

On repeated benchmarks, the best results were consistently obtained with 5 drives:

Testing md5

randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=16

...

fio-2.1.11

Starting 8 processes

Jobs: 8 (f=8): [w(8)] [100.0% done] [0KB/4397MB/0KB /s] [0/1126K/0 iops] [eta 00m:00s]

randwrite: (groupid=0, jobs=8): err= 0: pid=76554: Mon Feb 12 01:47:56 2018

  write: io=966497MB, bw=4027.4MB/s, iops=1030.1K, runt=240002msec

    slat (usec): min=1, max=3852, avg= 3.45, stdev= 1.49

    clat (usec): min=0, max=17533, avg=119.88, stdev=258.75

     lat (usec): min=10, max=17535, avg=123.40, stdev=258.76

    clat percentiles (usec):

     |  1.00th=[   12],  5.00th=[   13], 10.00th=[   15], 20.00th=[   20],

     | 30.00th=[   27], 40.00th=[   36], 50.00th=[   50], 60.00th=[   71],

     | 70.00th=[  106], 80.00th=[  167], 90.00th=[  270], 95.00th=[  366],

     | 99.00th=[ 1020], 99.50th=[ 1992], 99.90th=[ 3536], 99.95th=[ 3952],

     | 99.99th=[ 4960]

    bw (KB  /s): min=402136, max=748520, per=12.50%, avg=515423.89, stdev=44403.70

    lat (usec) : 2=0.01%, 4=0.01%, 10=0.02%, 20=19.43%, 50=30.18%

    lat (usec) : 100=18.95%, 250=19.84%, 500=9.14%, 750=1.12%, 1000=0.32%

    lat (msec) : 2=0.52%, 4=0.45%, 10=0.05%, 20=0.01%

  cpu          : usr=17.59%, sys=50.20%, ctx=50843447, majf=0, minf=71

  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%

     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%

     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%

     issued    : total=r=0/w=247423104/d=0, short=r=0/w=0/d=0

     latency   : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):

  WRITE: io=966497MB, aggrb=4027.4MB/s, minb=4027.4MB/s, maxb=4027.4MB/s, mint=240002msec, maxt=240002msec

Disk stats (read/write):

    md5: ios=0/247193645, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/49484620, aggrmerge=0/0, aggrticks=0/5476932, aggrin_queue=5571618, aggrutil=90.41%

  nvme0n1: ios=0/49483811, merge=0/0, ticks=0/5518696, in_queue=5613760, util=90.41%

  nvme4n1: ios=0/49480722, merge=0/0, ticks=0/5573820, in_queue=5663004, util=90.29%

  nvme1n1: ios=0/49485862, merge=0/0, ticks=0/5390388, in_queue=5483560, util=90.07%

  nvme2n1: ios=0/49488566, merge=0/0, ticks=0/5395484, in_queue=5496916, util=90.12%

  nvme3n1: ios=0/49484143, merge=0/0, ticks=0/5506276, in_queue=5600852, util=90.30%


4027.4MB/s (~32Gbps) with 100GB of random writes isn’t bad at all. Maybe you’re just maxing out the SSDs themselves, not the platform around it?


Unfortunately, with md6 (using 6 drives) the performance starts degrading. It is perfectly reproducible:

(...)

Run status group 0 (all jobs): WRITE: io=963949MB, aggrb=4016.5MB/s, minb=4016.5MB/s, maxb=4016.5MB/s, mint=240001msec, maxt=240001msec

Disk stats (read/write):

    md6: ios=0/246593949, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/41128473, aggrmerge=0/0, aggrticks=0/4307426, aggrin_queue=4365219, aggrutil=89.21%

  nvme2n1: ios=0/41137801, merge=0/0, ticks=0/2860992, in_queue=2909124, util=81.31%

  nvme3n1: ios=0/41125375, merge=0/0, ticks=0/2861640, in_queue=2896800, util=80.94%

  nvme0n1: ios=0/41136785, merge=0/0, ticks=0/7713176, in_queue=7809532, util=89.21%

  nvme4n1: ios=0/41142237, merge=0/0, ticks=0/3141888, in_queue=3185880, util=81.73%

  nvme1n1: ios=0/41116666, merge=0/0, ticks=0/6495928, in_queue=6578204, util=87.12%

  nvme5n1: ios=0/41111977, merge=0/0, ticks=0/2770936, in_queue=2811776, util=81.20%

I would welcome any ideas. Tell me your email if you want to play with the machine (there's nothing on it yet; I was just setting up md5 for the next step: pgsql benchmarks).


It's not degrading, it's the same. You've reached a ceiling on something.

Try increasing the queue from 16 to 64.


Just tried changing a few things with a different benchmark after reading http://middoraid.blogspot.com/2013/01/tweaking.html, no change.


Stopped reading after the title.

RAID5 is long dead. It's too sensitive to data loss and it has performance issues. Use RAID 1 or RAID 10.


I can afford to lose the data. I'm doing RAID0.

The article was interesting for presenting simple tweaks and a nice table of the results.


You say 4 Gb/s here but in the thread below show 4GB/s, i.e. 8-10x faster depending on how you think about link speeds.

What happens when you test each drive individually? Do they all perform identically or is the 6th device slower? An array will be limited by its slowest member even if there is no bus contention.

Also, there are many tunable parameters and I am not sure the defaults are equivalent for RAID5 or RAID6 in the Linux-MD layer. What happens if you use all 6 drives in a RAID5 array? What happens if you use 5 drives in a RAID6 array? Did you try changing the chunk size to match the SSD page size? Are you sure the partitions are all aligned properly to avoid any fragmentation during SSD writes? There are countless variables here...
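
For the chunk-size question, that would be a hedged variant of the earlier stripe command, e.g. with a 4K chunk (the SM961's real page size may well differ):

  mdadm --create --verbose /dev/md6 --level=stripe --chunk=4 --raid-devices=6 /dev/nvme[0-5]n1p6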


Sorry for the typo. I will edit the post to show an uppercase B. This is still at least an order of magnitude below the 57 GB/s reported by praseodym.

This is for a RAID0, as I am only concerned with performance here. The chunk size is the default 512K. I can try 4k, but I haven't found the Samsung NVMe SM961 page size. The partitions are of course 4k-aligned:

Number Start (sector) End (sector) Size Code Name

   1            4096       209719295   100.0 GiB   FD00  Linux RAID

   2       209719296       419434495   100.0 GiB   FD00  Linux RAID

   3       419434496       629149695   100.0 GiB   FD00  Linux RAID

   4       629149696       838864895   100.0 GiB   FD00  Linux RAID

   5       838864896      1048580095   100.0 GiB   FD00  Linux RAID

   6      1048580096      1258295295   100.0 GiB   FD00  Linux RAID

   7      1258295296      1468010495   100.0 GiB   FD00  Linux RAID

   8      1468010496      1677725695   100.0 GiB   FD00  Linux RAID

   9      1677725696      1875384974   94.3 GiB    8300  Linux filesystem

  10              34            4095   2.0 MiB     EF02  BIOS boot partition

No drive is slower on independent benchmarks. I created the various partitions to also test that scenario, testing various combinations of the 6 drives. Performance increases linearly until the 6th drive.

If you want to try, give me your email and I'll send you the root password to play with SSH. I was setting md5 for psql tests but there's nothing on the server yet, you can also play as much as you want with /dev/md6

You can also get an AX160-NVMe at Hetzner.de. If you cancel before 14 days, there is no charge.


That speed is surprising to me considering that the EPYC is supposed to have 128 PCIe lanes. I think that you should check the motherboard manual, if you have one, or contact the manufacturer, and see how the board's PCIe is laid out. An EPYC has no need to share PCIe lanes through a PCH and shouldn't have one for the NVMe drives.

It's possible that by placing cards in different slots you could gain more speed. There may also be some sort of setting to configure PCIe between 8 16-lane slots (accelerator cards) or 32 4-lane slots (NVMe drives).

If you're running a single-threaded application you may be hitting the limits of the DMA and memory controllers.


I am not familiar with the EPYC. I will check that.

I posted some benchmarks above using fio -- I think it is properly multithreaded.


But how do you distribute your SQL db? Read only followers?


If you go from one big server to many smaller ones, it's a different kind of problem (sharding, HA, replication, etc.).




There are some ZFS whitepapers that explain the data structures in a code-free way ;)

Not to mention the GRUB driver. It's GPLv3 but I wouldn't be surprised if the GRUB project is willing to relicense it for a Linux kernel port.



