
ZFS on Linux still has annoying issues with ARC size - protomyth
https://utcc.utoronto.ca/~cks/space/blog/linux/ZFSOnLinuxARCShrinkage
======
twic
There's a good lesson in operability here: log reasons for decisions!

I have had to re-learn this lesson over and over again with my own software.
"Tom, why did the system just do that?!" scream my users. "Er, let me check",
i respond, already feeling that sinking feeling. "EDUNNOMATE" says the log.
So, i add some logging around the decision (the data feeding into it, the
choices made, the actions resulting), redeploy, and wait for my users to start
screaming again, hoping that this time, i will be able to give them an answer.
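
For illustration, a toy sketch of that kind of decision logging in Python.
Everything here is made up (the function, names, and thresholds are
hypothetical); the point is just recording the inputs, the intermediate
choices, and the resulting action in one log line:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cache")

def decide_eviction(cache_bytes, limit_bytes, free_mem_bytes):
    """Hypothetical decision point: log the data feeding in, the choices
    made, and the action taken, so the log can answer
    "why did the system just do that?"."""
    over_limit = cache_bytes > limit_bytes
    low_memory = free_mem_bytes < limit_bytes // 10   # made-up threshold
    evict = over_limit or low_memory
    log.info(
        "eviction decision: cache=%d limit=%d free=%d "
        "over_limit=%s low_memory=%s -> evict=%s",
        cache_bytes, limit_bytes, free_mem_bytes,
        over_limit, low_memory, evict,
    )
    return evict
```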

------
mapgrep
The author calls ARC autotuning “opaque” which surprised me given a well
regarded paper has been published on it:

[https://www.usenix.org/legacy/events/fast03/tech/full_papers...](https://www.usenix.org/legacy/events/fast03/tech/full_papers/megiddo/megiddo.pdf)

[https://youtu.be/F8sZRBdmqc0](https://youtu.be/F8sZRBdmqc0)

...that said the author has been writing on the ARC for more than 10 years
judging from his blog links so perhaps that paper did not answer his
questions.

------
higels
Before I ditched ZoL as persistent storage for a few hundred NGINX caches, I
saw this behavior too.

Setting zfs_arc_min to something like 50% of arc_max stopped it from dumping
the ARC every 10 minutes.

YMMV.
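
For reference, a sketch of how one might pin those module parameters on the
ZFS host (the sizes are purely illustrative: a 16 GiB arc_max with a 50%
floor):

```shell
# Runtime (values in bytes; takes effect immediately):
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max   # 16 GiB cap
echo 8589934592  > /sys/module/zfs/parameters/zfs_arc_min   # 50% floor

# Persistent across reboots:
cat > /etc/modprobe.d/zfs.conf <<'EOF'
options zfs zfs_arc_max=17179869184
options zfs zfs_arc_min=8589934592
EOF
```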

~~~
m11r
Out of curiosity, what did you happen to move to (and why)? Back to ext4/xfs,
or to Btrfs or something else more involved?

~~~
higels
I just mount each SSD as its own XFS filesystem and use NGINX’s split feature
to fill them up.

Not resilient on a system level, but refilling the cache is cheap.
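
The "split feature" is presumably nginx's split_clients directive; a
hypothetical sketch of hashing a cache across two per-SSD XFS mounts (the
paths, zone names, sizes, and upstream are all made up):

```nginx
# One proxy_cache_path per SSD, each its own XFS mount (hypothetical paths).
proxy_cache_path /mnt/ssd0/cache keys_zone=ssd0:100m inactive=7d;
proxy_cache_path /mnt/ssd1/cache keys_zone=ssd1:100m inactive=7d;

# Hash the request URI across the two cache zones.
split_clients $request_uri $cache_zone {
    50%  ssd0;
    *    ssd1;
}

upstream backend {
    server 127.0.0.1:8080;
}

server {
    listen 80;
    location / {
        proxy_cache $cache_zone;   # variable support needs nginx >= 1.7.9
        proxy_pass  http://backend;
    }
}
```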

ZFS was generally pleasant from an operability viewpoint once we ironed out
the quirks, but the perf hit from no sendfile was too much.

~~~
m11r
That’s a good point; I never looked at what the perf impact of disabling
sendfile would be on even a moderately-loaded webserver.

It’d be really nice to see that fixed like the recent DIRECT_IO additions.

~~~
Quekid5
Of course the people around these parts tend to have very particular needs and
use cases, but for anything resembling the "common case" the performance
impact of not using sendfile should be negligible.

(I'll just point out that using sendfile means that traffic is unencrypted...
which is probably fine on an internal network, but I've started adopting the
stance that even internal network traffic should be encrypted unless there's a
very good reason not to do that. An absolute requirement for performance might
be a good reason.)

~~~
atomt
If nginx decided to support kTLS, they could use sendfile for encrypted
traffic as well. Unsure whether it's worth it just to make sendfile work,
however.
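
Since this thread, nginx did gain kTLS support (1.21.4+, when built against
OpenSSL 3.x on a kernel with the tls module loaded); a hedged sketch of what
enabling it looks like (paths are placeholders):

```nginx
server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/cert.pem;
    ssl_certificate_key /etc/nginx/key.pem;
    ssl_conf_command    Options KTLS;   # hand symmetric crypto to the kernel
    sendfile on;                        # zero-copy now works for TLS responses
}
```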

~~~
m11r
I was going to mention kernel TLS hopefully enabling sendfile for mostly-HTTPS
workloads, as that’s the direction everything is heading anyway, and without
it we don’t get zero-copy for those connections.

Now I’m more curious about the actual threshold where not having sendfile
begins causing noticeable performance problems… at what point before you
become Netflix?

~~~
namibj
If your cache can face-tank an HTTP DDoS, you don't need fragile
fingerprinting techniques to distinguish bad traffic from good, which reduces
the user impact (fewer accidentally-blocked users). The less it costs you to
fill that 100 Gbit NIC with your TLS cache traffic, the more boxes you can
afford. Internet exchanges are surprisingly cheap to connect to.

Of course, sharing resources between a couple of services would be good, as
NICs and switch ports are still a way from free.

------
sneak
I have very weird read performance issues using the stable ZoL in current
Ubuntu LTS, on a box with over 200GB of RAM and a few TB of fast L2ARC flash.

The default settings for L2ARC fill rate are also super low.

I haven’t had time to track down exactly why it’s so slow, yet.
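
For reference, the fill-rate knobs in question are the l2arc_write_max /
l2arc_write_boost module parameters (bytes per second; the steady-state cap
has historically defaulted to 8 MiB/s). The raised value below is just an
example:

```shell
# On the ZFS host: current fill-rate caps.
cat /sys/module/zfs/parameters/l2arc_write_max
cat /sys/module/zfs/parameters/l2arc_write_boost   # extra rate while ARC is cold

# Example: raise the steady-state fill rate to 64 MiB/s.
echo 67108864 > /sys/module/zfs/parameters/l2arc_write_max
```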

~~~
blackflame7000
What’s the topology of your array? How many disks? L2ARC doesn't help much
when you have that much RAM, because your main memory will be faster than even
mirrored NVMe caches.

~~~
sneak
8x 10TB HDD, 4x 512GB flash, all 6Gbps SATA. 256GB RAM, 40 cores. The HDDs are
all in raidz2, with the SSDs all as L2ARC.

I have an Ubuntu mirror on the machine that's around 150GB, and doing a `tar
-c $MIRRORPATH | pv > /dev/null` shows lots of reads from the HDDs, even on
second, third, and fourth runs. It confuses me.
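
One way to check whether those reads really are ARC/L2ARC misses is to
snapshot /proc/spl/kstat/zfs/arcstats before and after each run. A small
sketch of parsing that kstat format (the sample text below is fabricated; on a
real system, read the file itself):

```python
def parse_arcstats(text):
    """Parse /proc/spl/kstat/zfs/arcstats-style output into {name: int}.
    Data lines look like 'name  type  value'; header lines are skipped
    because their third column is not numeric."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats

# Fabricated sample for illustration.
sample = """\
name                            type data
hits                            4    123456
misses                          4    7890
l2_hits                         4    1000
l2_misses                       4    6890
"""
s = parse_arcstats(sample)
arc_hit_rate = s["hits"] / (s["hits"] + s["misses"])
```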

------
kissgyorgy
I have the same problem. Sometimes it just drops the whole cache suddenly and
I don't have a clue why:
[https://walkman.cloud/s/zXLp7DF9sDFwr7z](https://walkman.cloud/s/zXLp7DF9sDFwr7z)

~~~
rincebrain
You may find it informative to graph MRU/MFU - I suspect you will find that
the MRU is being dumped. [1]

I personally can't decide whether I think it's a bug or not, since if the MRU
is all old items there is an argument to be had that you don't want it in
cache any more...but dumping 100% of it strikes me as a bug either way. :)

[1] -
[https://github.com/zfsonlinux/zfs/issues/7820](https://github.com/zfsonlinux/zfs/issues/7820)
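
A minimal way to collect that MRU/MFU data over time (mru_size and mfu_size
are real arcstats fields; the CSV path and sampling scheme are arbitrary --
run this from cron or a watch loop, then graph the CSV):

```shell
# Append one timestamped MRU/MFU sample to a CSV, if arcstats is present.
arcstats=/proc/spl/kstat/zfs/arcstats
if [ -r "$arcstats" ]; then
    awk -v ts="$(date +%s)" \
        '/^(mru_size|mfu_size) /{v[$1]=$3}
         END{print ts "," v["mru_size"] "," v["mfu_size"]}' \
        "$arcstats" >> arc_mru_mfu.csv
fi
```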

------
cracauer
I still have this bug.
[https://github.com/zfsonlinux/zfs/issues/8396](https://github.com/zfsonlinux/zfs/issues/8396)

Page faults on the NFS client side aren't served by the server when they
should be (read-only mapping, reading a page). I could imagine this is
related.

