
Remove PG_ZERO and zeroidle (page-zeroing) entirely - protomyth
http://lists.dragonflybsd.org/pipermail/commits/2016-August/624202.html
======
jdub
When a program needs memory, the operating system maps a physical page of
memory (4KB in most cases) into its virtual memory space.

That page might have been in use by a different program only moments before,
so the new program could go poking around in it to find interesting stuff.
Like SSH keys or passwords or whatever. So the operating system has a
responsibility to tidy up pre-loved pages before giving them out.

Once upon a time, filling 4KB with zeroes was a costly operation. It took a
bunch of CPU cycles, and worse, would have trashed memory caches closer to the
CPU.

So operating systems tend to have queues of discarded pages, background
threads to zero them (when nothing more important is happening), and queues of
zeroed pages to hand out when a program wants more memory.

This change is DragonFly BSD saying, "Fuck it, we'll do it live". It turns out
that with the speed of modern CPUs and the way memory caches work today, they
reckon it's faster to just zero the memory synchronously, when the program
requests it. No more queues, no more background work, no more weird cache
behaviour.

~~~
vardump
> Once upon a time, filling 4KB with zeroes was a costly operation. It took a
> bunch of CPU cycles, and worse, would have trashed memory caches closer to
> the CPU.

Even an Intel 486 can do that in about 50 microseconds -- _without cache_.
Probably in a few microseconds assuming L1 cache. So I wonder if the
assumption was ever true.

~~~
gumby
> So I wonder if the assumption has ever been true.

This fear goes back to the PDP-11 in Unix and even predates Unix. It was true
even when disks were really, really slow, as CPUs had small caches or none at
all and lacked multiple functional units, much less hardware multithreading. A
big PDP-10 mainframe might still have been .7 VAX MIPS -- the only "high
speed" memory on the first one I programmed was its DTL registers; core was
literally core memory.

You can get a feel for what a big deal this was in the fact that it felt
radical and expensive for Stroustrup to decide that all new objects would be
zeroed. Or you can see in the original posting that there was special memory
handling support for special devices -- finally eliminated by this patch!

~~~
vardump
> You can get a feel for what a big deal this was in the fact that it felt
> radical and expensive for Stroustrup to decide that all new objects would be
> zeroed. Or you can see in the original posting that there was special memory
> handling support for special devices -- finally eliminated by this patch!

That's a pretty different issue. Zeroing freshly allocated memory in usermode
causes all pages to become committed. Actual zeroing won't take much time, but
getting all those pages physically mapped... that's another issue. Under
memory pressure, that can take an arbitrarily long time.

~~~
DSMan195276
I think your concern is valid, and it's something a lot of people aren't aware
of, but it probably doesn't apply in most of the cases people deal with.
Unless you're dealing with entities that take up a page or more, chances are
the rest of the page is being used by something else. If you write to anything
in the page, then the entire page has to be committed, and if the page is
reused by the allocator then it is probably already committed. So for small
objects you likely can't avoid committing the memory anyway, and it's probably
not worth worrying about.

Now, if you're instead allocating large objects, or allocating a large array
at one time, then in theory the allocator would get some fresh memory for that
large entity (Assuming no piece of memory exists that can fit it). It should
be smart enough to know that the OS will give back zeroed memory and avoid
zeroing it a second time, so those pages don't have to be committed right
away.

All that said, if such a thing is actually a worry then you should probably
skip the default allocator and move to an interface like `mmap()` anyway, which
can provide pages of memory that you can be sure are untouched and read as
zeros.

~~~
gumby
That's right; my point was that the time to zero memory was considered
significant overhead regardless of where it occurred.

> I think that your concern is valid, something that a lot of people aren't
> aware of, but it probably doesn't really apply in a lot of cases most people
> deal with. Unless your dealing with entities that take up a page or more,
> then chances are the rest of the page is being used by something else.

vardump's objection, which you're responding to, addresses VM pressure. But
zeroing objects on allocation is likely to have negative implications for the
cache, mitigated only slightly in the case where you immediately initialize
the memory with nonzero values. RAII can in theory alleviate some of that, but
I am unaware of any compiler that does this.

> All that said, if such a thing [large objects] is actually a worry then you
> should probably skip the default allocator and move to an interface like
> `mmap()` anyway, which can provide pages of memory that you can be sure are
> untouched and read as zeros.

... zeroed by the mechanism of this patch!

------
blinkingled
DragonFlyBSD is slowly but steadily becoming more attractive as a
Linux/FreeBSD alternative. 4.6 has good support for various newer Intel GPUs,
HAMMER is stable, and the SMP and networking performance is stellar.

I had a 2.8TB backup archive that I have repeatedly tried to dedup with ZFS
and the realtime while-you-are-writing architecture of ZFS dedup just kills
the performance so badly that I have never actually succeeded going over the
full dataset. This past week I made a 4.6-rc2-based DragonFly VM, attached a 3TB
disk with RDM, formatted it with HAMMER, started rsync of the backed up data
and ran hammer dedup every 30 min - I am now at 1.9TB used after the whole
thing was done and a final dedup ran - no noticeable speed drops and I got
around 900GB back!

~~~
rincebrain
Did you try something like bup? (Of course, depending on your use case, that
might not work out so well.)

Not that I have anything but respect for DFBSD, but I'd have suggested trying
a userland archiving tool with dedup before spinning up a VM + FS in it.

~~~
blinkingled
Yeah I did try it on btrfs IIRC - but it's far too complicated (as opposed to
hammer dedup where everything is included and it just works) and doesn't cater
to my use case too well - I have multiple machines backing files up to a SMB
share with a lot of potential for duplicate data. I also need to be able to
access those backed-up files over SMB. Just making a HAMMER mount point,
running dedup on it via cron job at night and exporting it via Samba does
everything I need.

~~~
rincebrain
Yeah, that's much nicer for your use case than bup.

(I should point out that Windows apparently has similar after-the-fact dedup
capabilities on NTFS in Server 2012/R2 and up [1], though I suspect you'll
find DFBSD much easier to run with low overhead in a VM.)

[1] - [https://msdn.microsoft.com/en-us/library/hh769303%28v=vs.85%29.aspx](https://msdn.microsoft.com/en-us/library/hh769303%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396)

~~~
blinkingled
It's not present in the R2 essentials sku though - so cost is also a major
factor.

------
vardump
The main takeaway from this:

      - Pre-zeroing a page only takes 80ns on a modern cpu.  vm_fault overhead
        in general is ~at least 1 microsecond.

      - Multiple synth and build tests show that active idle-time zeroing of
        pages actually reduces performance somewhat and incidental allocations
        of already-zeroed pages (from page-table tear-downs) do not affect
        performance in any meaningful way.

Not surprising. I've seen so much code with comments explaining how expensive
copying or zeroing memory is (without ever actually measuring it) that reaches
for "zero-copy" techniques, ending up doing so much work (spending even
microseconds) to "save" a cost that turns out to be 100 ns or less.

------
Animats
Fear of zeroing is related to fear of copying. One of the big arguments
against message-passing systems is that there's extra copying. But today,
copying is usually cheap, especially if the data was just created and is in
cache. On the other hand, futzing with the MMU to avoid copying usually
results in lots of cache flushes and is a lose unless you're remapping a huge
memory area.

This is a total reversal from the situation back when, for example, Mach was
designed. Or, for that matter, Linux. It makes message passing microkernels
faster than they were back then.

It helps to do copying right. If a user process does something that causes the
operating system to copy user-created data, the copy should take place on the
same CPU, where the cache is current. These sorts of issues are why message
passing and CPU dispatching have to be integrated to get good performance from
a microkernel.

------
milcron
I always love net-negative commits.

~~~
unixhero
So zen

~~~
milcron
The best code is no code!

Related, one of my favorite programming stories: the 0-byte program
[http://peetm.com/blog/?p=55](http://peetm.com/blog/?p=55)

------
duaneb
I wonder if this has issues with the recent research showing that reducing the
temperature of memory allows reading it after a restart: swap the memory into
another computer to dodge the fault-time zeroing and read away.

------
tener
Interesting, anyone knows if Linux does any of this?

~~~
jdub
Yes, Linux (and Windows for that matter) does everything DragonflyBSD just
removed.

~~~
bboreham
Windows does more of it, in the sense that the heap manager likes to give freed
blocks back to the OS, whereas UNIX programs traditionally never gave memory
back until program exit.

I had a massive production outage a few years back involving this behaviour on
Windows; symptom was 100% CPU usage in-kernel on exactly two threads and no
other processes doing anything much. Still recall that as one of the most
interesting bugs to track down.

~~~
viraptor
How did you track it down? MS's performance debugging toolkit? (Can't remember
the actual name) Or something else?

~~~
bboreham
Hours and hours of narrowing down the symptom from a program that would
recreate it, plus disassembling the Windows heap manager and tracing through
it in the debugger.

------
daveloyall
I don't get it.

Edit: Ahh, now I get it.
[https://news.ycombinator.com/item?id=12228540](https://news.ycombinator.com/item?id=12228540)

~~~
protomyth
I thought it was interesting for:

      Remove the PG_ZERO flag and remove all page-zeroing optimizations,
      entirely.  After doing a substantial amount of testing, these
      optimizations, which existed all the way back to CSRG BSD, no longer
      provide any benefit on a modern system.

~~~
cgag
I think for many people without any context that explanation isn't quite
enough. jdub's comment explains it quite well though.

~~~
protomyth
The joys of linking to the original source. Slashdot does have one interesting
advantage over HN in that putting a large explanation, often with multiple
links, is expected when submitting something. I probably could have found a
blog or article with a fuller explanation, but then you get into accusations
of submitting "blog spam". I figured this is HN and someone like jdub would
come along and write a great explanation if the topic was worth consideration.

------
wolf550e
EDIT: Wrong thread. Meant to post here:
[https://news.ycombinator.com/item?id=12227507](https://news.ycombinator.com/item?id=12227507)

~~~
tomjakubowski
That's "PG" as in "page" not as in "postgres". It's a Linux kernel patch.

~~~
Sanddancer
Not Linux, DragonflyBSD.

------
ryuuchin
If I understand this change correctly it removes background zeroing of freed
pages.

What effect will this have on security? Grsecurity adds a feature that allows
the Linux kernel to sanitize freed pages[1]. I realize that idle-time zeroing
isn't the same as the immediate sanitization grsecurity offers, but I'm
curious whether this change has any effect on security in this respect.

[1]
[https://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity...](https://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#Sanitize_all_freed_memory)

~~~
PDoyle
Nope. I draw your attention to this paragraph:

> Zeroing the page at fault-time...

