

Improving Linux performance by preserving Buffer Cache State - osivertsson
http://insights.oetiker.ch/linux/fadvise/

======
binarycrusader
Be wary; the POSIX interface here is completely advisory. That is, the
specification doesn't require the implementing OS to actually account for the
advice provided via this interface. As a result, use of this interface may not
result in any actual change in system behaviour.
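
A minimal sketch of what "advisory" means in practice, assuming Python's os.posix_fadvise wrapper on a Unix system. The helper name advise() is hypothetical. A True result only says the kernel accepted the call; it says nothing about whether any cached page was actually touched.

```python
import os

def advise(path, advice_name, offset=0, length=0):
    """Pass one piece of advice to the kernel for a byte range of `path`.

    `advice_name` is the name of an os.POSIX_FADV_* constant.
    A length of 0 means "from offset to the end of the file".
    Returns True if the call was accepted, False otherwise. Acceptance
    is not evidence that the page cache changed.
    """
    advice = getattr(os, advice_name, None)
    if advice is None or not hasattr(os, "posix_fadvise"):
        return False  # platform does not expose this advice at all
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, offset, length, advice)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
```

Treating failure as non-fatal fits the contract: a program that depends on the advice being honoured is relying on something the specification does not promise.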

------
malkia
Btw, on Windows, if you want to purge a specific file out of the cache, then
all you need to do is to reopen the file with FILE_FLAG_NO_BUFFERING and/or
FILE_FLAG_OVERLAPPED, and close it.

This could be verified with SysInternals RamMap.

~~~
ComputerGuru
I don't know that this accomplishes the same thing. It's not a question of
getting this file/data out of the cache so much as it is about not replacing
existing cache contents with this one.

Removing a file from the cache after the fact doesn't address the problem that
it kicked some other data out of the cache to take its place in the first
place.

~~~
malkia
I agree. But let's say that you have a list of lots of files that are to be
copied from the server to your machine. If one "frees" each file after
copying, then at least the biggest harm done would be the size of the biggest
file.

Now the copying itself could've been done using NO_BUFFERING, but that's not
an option if it's done by a program you don't have access to (or if it's not
straightforward copying, but say rsync (DeltaCopy) or something like that).

It's not the same really as you are saying, but related somehow.

We had to do this at our studio: there was a process copying lots of fresh
sound banks for the game, and it was trashing the cache, which is normally
filled with the game assets that are used during level building. Originally we
thought of directly copying files using NO_BUFFERING, but because the app was
written in C#, it was a bit harder (and we didn't want to introduce
insecurities). So the programmer in charge just added one more Open/Close
after the file was copied, done with NO_BUFFERING, and this purged the
file from the cache.

Obviously this is not going to work if, instead of many sound banks, it was
one huge file taking up all the space. But since that was not the case, we
took the opportunity.
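
For comparison, a rough Linux analogue of the copy-then-purge trick described above, assuming os.posix_fadvise is available. This is not the Windows code from our tool; the function name copy_and_drop() is made up for illustration, and the kernel is free to ignore the advice.

```python
import os
import shutil

def copy_and_drop(src, dst):
    """Copy src to dst, then ask the kernel to drop both files' cached pages.

    With one call per file, the cache pollution at any moment is bounded
    by roughly the size of the largest single file.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        # Dirty pages cannot be dropped, so force write-back first.
        os.fsync(fout.fileno())
        if hasattr(os, "posix_fadvise"):
            for f in (fin, fout):
                try:
                    os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
                except OSError:
                    pass  # advice only; a failure here is not an error
```

Note that this drops the source file's pages whether or not they were cached before the copy started, so it can evict data another process wanted to keep.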

~~~
ComputerGuru
Thanks for explaining the rationale behind such a use case. Sound reasoning
indeed.

------
paulsutter
This is a terrific post. I've looked elsewhere for specific information on how
Linux deals with posix_fadvise, and haven't found this clarity before.

~~~
cbsmith
The short answer is "poorly". Linus complains that Linux apps that are
buffer cache sensitive end up using O_DIRECT (which in a lot of ways is worse)
simply because fadvise() and similar functions have never been done properly.

~~~
paulsutter
This post explains some of the subtleties that in the past had made me question
whether fadvise() worked at all. Admittedly, I still find unbuffered io to be
simplest and most predictable. But Linus' basic arguments against unbuffered
io have been reasonable, and I feel more resolved about the matter
understanding that fadvise() can be made to work.

How would you improve fadvise()? And what problems have you found with
O_DIRECT?

~~~
cbsmith
The problem with O_DIRECT is it pretty much puts each app in the business of
doing its own buffer cache, which bypasses the ability of the kernel to look
at the system holistically and make decisions about how to buffer data.
Compound that with fairly inconsistent contracts around the interface and its
semi-synchronous behaviour... ick.
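
To give a feel for the bookkeeping O_DIRECT pushes onto the app, here is a small sketch assuming Linux and Python's os.O_DIRECT. The 4096-byte alignment is an assumption (the real requirement depends on the device and filesystem), and both function names are hypothetical. An anonymous mmap is used only because it yields a page-aligned buffer.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; the actual requirement varies

def _read_all(path, extra_flags):
    """Read a whole file through a page-aligned buffer."""
    buf = mmap.mmap(-1, 64 * BLOCK)  # anonymous mapping, page-aligned
    out = bytearray()
    try:
        fd = os.open(path, os.O_RDONLY | extra_flags)
        try:
            size = os.fstat(fd).st_size
            # Stop at the known size so no read is issued at an
            # unaligned offset after the final short read.
            while len(out) < size:
                n = os.readv(fd, [buf])
                if n == 0:
                    break
                out += buf[:n]
        finally:
            os.close(fd)
    finally:
        buf.close()
    return bytes(out)

def read_maybe_direct(path):
    """Return (data, used_direct). Falls back to buffered I/O if the
    platform or filesystem refuses O_DIRECT."""
    direct = getattr(os, "O_DIRECT", 0)
    if direct:
        try:
            return _read_all(path, direct), True
        except OSError:
            pass  # e.g. EINVAL from a filesystem without O_DIRECT support
    return _read_all(path, 0), False
```

Even this toy has to care about buffer alignment, request sizes, and a fallback path, and it still does no caching of its own.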

As for improving fadvise(): I'd like to see FADV_SEQUENTIAL (or perhaps a
variant) not just double the read ahead buffer, but also dump pages
immediately after they've been read unless there is another FD open somewhere
else (you can keep calling fadvise with FADV_DONTNEED, but that's lame on
several levels). I'd like to see semantics that make it clear to the kernel
that data you are writing to a file (particularly if it is in append mode)
likely won't be read for a very long time, so it can minimize polluting the
buffer cache with freshly written data. I'd like to see fadvise() calls that
specify a portion of a file _only_ affect that portion of the file. What'd be
REALLY nice would be a way to express "buffer part X and Y of the file, but if
you are under pressure, dump Y before you dump X".
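
The "keep calling fadvise with FADV_DONTNEED" workaround mentioned above looks roughly like this. It is a sketch assuming Python's os.posix_fadvise on Linux; stream_once() and its chunk size are invented for the example, and the byte count stands in for whatever real processing the reader does.

```python
import os

def _advise(fd, offset, length, advice_name):
    """Best-effort fadvise: silently skipped where unsupported or refused."""
    advice = getattr(os, advice_name, None)
    if advice is None or not hasattr(os, "posix_fadvise"):
        return
    try:
        os.posix_fadvise(fd, offset, length, advice)
    except OSError:
        pass

def stream_once(path, chunk=1 << 20):
    """Read a file front to back once, releasing each chunk after use.

    Returns the number of bytes read.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Hint that access is sequential (length 0 = whole file).
        _advise(fd, 0, 0, "POSIX_FADV_SEQUENTIAL")
        offset = 0
        while True:
            data = os.read(fd, chunk)
            if not data:
                break
            total += len(data)  # placeholder for real processing
            # Tell the kernel the range just consumed is no longer needed.
            _advise(fd, offset, len(data), "POSIX_FADV_DONTNEED")
            offset += len(data)
    finally:
        os.close(fd)
    return total
```

One extra syscall per chunk, plus the risk of evicting pages some other process still wants, is a fair summary of why this is lame.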

