

Why buffered writes are sometimes stalled - sciurus
http://yoshinorimatsunobu.blogspot.com/2014/03/why-buffered-writes-are-sometimes.html

======
pas
To save others a few minutes with the stable pages patch, it's already in the
mainline kernel, visible in sysfs as /sys/block/*/bdi/stable_pages_required.
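
If you just want to check whether a given disk needs stable pages, reading that sysfs file is enough. A minimal C sketch (the device name "sda" is an assumption; substitute your own):

    /* Check whether the kernel requires stable pages for a block device
     * by reading /sys/block/sda/bdi/stable_pages_required. */
    #include <stdio.h>

    int main(void)
    {
        const char *path = "/sys/block/sda/bdi/stable_pages_required";
        FILE *f = fopen(path, "r");
        if (!f) {
            perror(path);
            return 1;
        }
        int required = fgetc(f) - '0';  /* file contains "0" or "1" */
        fclose(f);
        printf("stable pages %s required\n", required ? "ARE" : "are NOT");
        return 0;
    }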

Also, on a sort of related note, aio is steadily progressing, so maybe we'll
see more reliance on it:
[http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.g...](http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=aio%3A)
and libaio too:
[https://git.fedorahosted.org/cgit/libaio.git/](https://git.fedorahosted.org/cgit/libaio.git/)
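
For anyone who hasn't played with it, submitting a write through libaio looks roughly like this. A sketch only: the filename, the 4 KB size, and the use of O_DIRECT are my assumptions (kernel AIO only behaves truly asynchronously for direct I/O); link with -laio.

    #define _GNU_SOURCE             /* for O_DIRECT */
    #include <fcntl.h>
    #include <libaio.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, 4096, 4096))  /* O_DIRECT needs alignment */
            return 1;
        memset(buf, 'x', 4096);

        io_context_t ctx = 0;
        if (io_setup(1, &ctx) < 0) { fprintf(stderr, "io_setup failed\n"); return 1; }

        struct iocb cb;
        struct iocb *cbs[1] = { &cb };
        io_prep_pwrite(&cb, fd, buf, 4096, 0);

        if (io_submit(ctx, 1, cbs) != 1) { fprintf(stderr, "io_submit failed\n"); return 1; }

        /* The submitting thread is free to do other work here; the write
         * completes in the background. */
        struct io_event ev;
        io_getevents(ctx, 1, 1, &ev, NULL);

        io_destroy(ctx);
        close(fd);
        free(buf);
        return 0;
    }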

~~~
MrBuddyCasino
"When a dirty page is written to disk, write() to the same dirty page is
blocked until flushing to disk is done."

I'm not sure I got this, but it seems similar to vsync - making sure only
complete 4 KB pages are written to disk, then flipping the buffer and
processing the next one.
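
For what it's worth, here's a hand-wavy sketch of the stall as I understand it: dirty a page, kick off writeback without waiting via sync_file_range(), then time a second write() to the same page. Whether it actually stalls depends on the device requiring stable pages; the path and sizes are placeholders.

    #define _GNU_SOURCE             /* for sync_file_range */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/time.h>
    #include <unistd.h>

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    int main(void)
    {
        char buf[4096];
        memset(buf, 'a', sizeof buf);

        int fd = open("testfile", O_WRONLY | O_CREAT, 0644);
        if (fd < 0) return 1;

        pwrite(fd, buf, sizeof buf, 0);          /* dirty the page */
        sync_file_range(fd, 0, sizeof buf,
                        SYNC_FILE_RANGE_WRITE);  /* start writeback, don't wait */

        double t = now();
        pwrite(fd, buf, sizeof buf, 0);          /* may stall here */
        printf("second write took %.3f ms\n", (now() - t) * 1e3);

        close(fd);
        return 0;
    }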

But what error condition does this guard against? It seems this is only useful
in a non-journaled file system.

~~~
TheLoneWolfling
If you don't lock, you can get a situation where the file on disk is in an
inconsistent state until all writes have completed (for example, half of the
second write applied but the other half not).

As far as I know, at least. If (when) I'm wrong, please correct me.

~~~
MrBuddyCasino
Yes, isn't that a situation that a journaled FS should prevent? So for, e.g.,
XFS it should be redundant.

------
userbinator
Large "granularities" on storage devices always suffer this problem - whether
it's sectors on an HDD (which have silently transitioned from 512B to 4KB) or
blocks/pages in flash on an SSD. Perhaps it's become more prominent now that
the granularities have increased while small read/write operations are still
common.
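
You can watch the two granularities diverge by asking the kernel directly. A quick sketch using the BLKSSZGET/BLKPBSZGET ioctls (the device path is an assumption, and opening it needs root); on a 512e Advanced Format drive this typically prints 512 and 4096:

    #include <fcntl.h>
    #include <linux/fs.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/sda", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        int logical = 0;
        unsigned int physical = 0;
        ioctl(fd, BLKSSZGET, &logical);    /* logical (addressable) sector size */
        ioctl(fd, BLKPBSZGET, &physical);  /* physical sector size */

        printf("logical: %d bytes, physical: %u bytes\n", logical, physical);
        close(fd);
        return 0;
    }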

(Aside: The autogenerated spam comments there are also strangely interesting -
they sound almost poetic.)

~~~
meowface
> (Aside: The autogenerated spam comments there are also strangely interesting - they sound almost poetic.)

Given the right corpus and parameters, Markov chains can do a surprisingly
scary job of producing content that seems profound and/or humorous.

------
riobard
If the task is to just overwrite existing files without blocking, why not
mmap()?
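
Something like this, presumably - a minimal sketch of overwriting through a shared mapping (filename and size are placeholders; the file must already exist and be at least that long, or the store faults with SIGBUS):

    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("testfile", O_RDWR);  /* must exist, >= len bytes */
        if (fd < 0) return 1;

        size_t len = 4096;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) return 1;

        memset(p, 'x', len);    /* the "write" is just a store; no syscall */

        /* Caveat: a store into a page under writeback can still block when
         * stable pages are required, so this may not dodge the stall. */
        munmap(p, len);
        close(fd);
        return 0;
    }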

~~~
rdtsc
As others have suggested, if mmap is so fast, wouldn't it be expected that
write/pwrite would just be mapped (pun intended) to mmap inside the standard
library? [Heck, for all I know, that might already be the case.]

I have actually had to write benchmark tests to show some people I work with
that there was no conclusive difference between the two (with our data access
patterns). mmap has the disadvantage that it gives you more ways to shoot
yourself in the foot (ever seen a SIGBUS signal?). Before that, they swore up
and down that mmap was this magic performance hack that had been buried in
there for ages and that only the elites knew about.
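
A stripped-down sketch of that kind of comparison (file name, buffer size, and iteration count are arbitrary; real numbers depend heavily on access pattern):

    #define _GNU_SOURCE             /* for pwrite */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define LEN   (1 << 20)         /* 1 MiB */
    #define ITERS 1000

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    int main(void)
    {
        static char buf[LEN];
        memset(buf, 'x', LEN);

        int fd = open("testfile", O_RDWR | O_CREAT, 0644);
        if (fd < 0 || ftruncate(fd, LEN) < 0) return 1;

        double t = now();
        for (int i = 0; i < ITERS; i++)
            pwrite(fd, buf, LEN, 0);          /* syscall per overwrite */
        printf("pwrite: %.3f s\n", now() - t);

        char *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) return 1;

        t = now();
        for (int i = 0; i < ITERS; i++)
            memcpy(p, buf, LEN);              /* plain stores, no syscall */
        printf("mmap:   %.3f s\n", now() - t);

        munmap(p, LEN);
        close(fd);
        return 0;
    }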

~~~
beagle3
My experience has been that mmap is not magical for writing in most workloads,
but it is mostly magical for reading - I have yet to encounter a real life
workload in which mmaping and using memory was NOT easier and at least as fast
as reading. On 32-bit systems, however, it's easy to run out of usable address
space - 64-bit makes it useful again.
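
The read-side pattern is roughly: map once, then treat the file as an array and let the kernel page data in on demand. A sketch (the checksum loop is just a stand-in workload):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) return 1;

        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) return 1;
        if (st.st_size == 0) return 0;       /* mmap of length 0 fails */

        const unsigned char *p =
            mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) return 1;

        unsigned long sum = 0;
        for (off_t i = 0; i < st.st_size; i++)
            sum += p[i];                     /* kernel pages data in on demand */

        printf("%lu\n", sum);
        munmap((void *)p, st.st_size);
        close(fd);
        return 0;
    }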

~~~
_delirium
One workload where I find it worse: if my access pattern is sequential reads
from the start of a file (no seeking or random access), or could reasonably be
rewritten as such. In that case, using mmap() breaks some expected Unixy
flexibility, because it demands real files, while there's no reason in this
access pattern that your program should die if it finds a named pipe instead.
Of course you could test for pipe and provide an alternate read path, but then
you might as well just use that for real files too, instead of maintaining two
paths.
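
A sketch of the two-path workaround, with hypothetical stand-ins for the real processing code - which illustrates exactly the duplication in question:

    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* hypothetical stand-ins for the two real code paths */
    static void process_mapped(int fd, off_t size)
    {
        (void)fd;
        printf("regular file (%lld bytes): mmap path\n", (long long)size);
    }

    static void process_streaming(int fd)
    {
        (void)fd;
        printf("pipe/socket/tty: plain read() path\n");
    }

    int main(void)
    {
        struct stat st;
        if (fstat(STDIN_FILENO, &st) != 0) return 1;

        if (S_ISREG(st.st_mode))             /* only mmap real files */
            process_mapped(STDIN_FILENO, st.st_size);
        else
            process_streaming(STDIN_FILENO);
        return 0;
    }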

------
jzwinck
Significant typo: the second memset() in each pair needs to be memcpy()
instead.

