
How Bad Can 1 GB Pages Be? - jetlej
http://www.pvk.ca/Blog/2014/02/18/how-bad-can-1gb-pages-be/
======
StillBored
The whole article is based on the idea that the machine is dedicated to one
large process and that the working set fits in RAM. If that is the case,
great: turn on boot-time 1GB page preallocation and don't look back.

If either of those two assumptions is false, then the TLB miss times are
swamped by the page-in time. Paging a 1GB page in is _NOT_ a fast operation,
especially when only a tiny percentage of the data is going to be touched, or
it's promptly going to be paged out again. If he has a machine with 32GB of
RAM he should retest with a 64GB working set.

~~~
wfunction
You seem to be one of those people who thinks whether you have 1G, 2G, 4G, 8G,
16G, 32G, 64G, or 128G... of RAM, you'll still page out to disk eventually.

How much exactly do you consider to be the minimum 'enough' RAM where you
won't need to page out to disk? 128 exabytes?

~~~
StillBored
Paging to/from disk is currently a fundamental part of all the major OSes,
and significantly increases RAM efficiency (even for executable pages). I
still see a fair number of server applications oversubscribing RAM,
particularly in DB or VM'ed environments.

I'm not saying 4K pages are good and 1G ones are bad, but there are a fair
number of applications that probably benefit from something in between. 1GB is
probably on the extreme side of things.

If you have a 50TB database you cannot put it in RAM on anything common. The
largest machine I've seen for sale takes 16TB of RAM
[http://www-03.ibm.com/systems/power/hardware/795/specs.html](http://www-03.ibm.com/systems/power/hardware/795/specs.html).

That doesn't mean you need 50TB of RAM, because 99% of the records could be
inactive. Instead you let the hardware page things in, and the portions of the
database that are regularly used will stay in RAM, while the rest remains on
SSD. Architectures with more page-size choices can actually be a big selling
point for non-x86 servers in certain cases.

In the end, retained data is still growing, and a fair number of applications
don't fit into mapreduce or other partitioning schemes. So, I would say paging
is going to remain useful for some portion of the servers in existence for at
least a few more years.

BTW: I recently worked on an application which would pretty much eat as much
RAM (enormous hash table) as it was given and still ask for more. In the end
shipping a fairly normal machine (32G-64G) with a 4TB PCIe based SSD provided
sufficient performance that we didn't need to spend 100x on a machine that
could take 4TB of RAM. So there are economic arguments as well.

~~~
sakai
Can you share any more about your use case here? Intrigued, as a colleague
and I have recently been working on highly space-efficient hash maps for a
bioinformatics application, and it strikes me that this could be relevant
(similar problem -- a huge reference set that needs to be accessible with
very low access times).

Email is in my profile. Thx!

------
brandonhsiao
Site's down? I thought it was making a point by making the page 1 GB to
download.

------
karangoeluw
Site seems down. Here's the cached version
[http://webcache.googleusercontent.com/search?q=cache:Frz8Fde...](http://webcache.googleusercontent.com/search?q=cache:Frz8FdesJfwJ:www.pvk.ca/Blog/2014/02/18/how-
bad-can-1gb-pages-be/+&cd=1&hl=en&ct=clnk&gl=us)

~~~
marcosdumay
The cache doesn't work for me either...

Ok, from the comments, it's advocating for 1GB pages at the main memory. Not
cache, not disk, not network. Main memory.

To me it looks too big - entire servers would have only about 32 pages, and
swapping will take ages on 400MB/s disks. Current PCs use too small a page
size, but 4MB seems a much more realistic number.

~~~
caf
The available page sizes are set by the hardware. On x86-64, your current
options are 4kB (4-level page tables), 2MB (3-level page tables) and 1GB
(2-level page tables).

------
etep
I think it is much simpler. 4 KB pages are small today, and 1 GB is still very
large for most processes. 2 MB sounds about right (gut check: how much memory
does the average process allocate, and for those small processes, is 2 MB
really that much overhead?). Unless the number of TLB entries drops as page
size increases, larger pages make sense. It's simple: 1 GB risks wasting
memory when the process doesn't need that much. 2 MB is good in 2014.

~~~
msandford
I think it's a bummer that there isn't some option in between. I could see 64
meg pages being really nice.

Ultimately it would be handy to be able to tune page size for the loads that
you see. I could see page sizes jumping by 4x or 16x (2 bits or 4 bits) each
time being reasonable.

The real issue the author is talking about isn't "how much memory does a
process allocate" but rather "how many total pages does the OS have to keep
track of and what percentage of those fit in the TLB at any one time?"

~~~
etep
If the real issue is simply to minimize the number of pages to track (and
thereby maximize TLB hits) then it's very simple: go to 1 GB (or higher!)
pages.

This hints that there is a tradeoff happening here. At the highest level, the
tradeoff is between having efficient use of memory and TLB hits. Big pages
give TLB hits, small pages make efficient use of memory.

Since the TLB is in hardware, it is more difficult to have the fine-grained
tuning you desire.

------
MrBuddyCasino
Interesting, if one ever needs to boost a memcache/Redis instance, this might
actually work.

But what about virtualization? Can I use 1GB pages in a guest OS, or will the
host OS still handle everything with 4k pages, nullifying any advantages?

~~~
etep
Hardware support for virtualization is actually one of the main things driving
huge pages. For a TLB miss in a guest, you end up doing a nested page table
walk. This is much more expensive with 4 KB pages than with 2 MB pages.

Short answer: huge pages are a big win for virtualization.

~~~
MrBuddyCasino
Thanks, but do I have to enable them on the host or the guest OS? Or both?
Sorry, I'm not familiar with the details of HW virtualization.

~~~
etep
Both would be best, but they are actually independent.

------
nickthemagicman
The day I fully understand this entire blog post is the day I will consider
myself in the leet category.

~~~
aaronblohowiak
Given a background in programming, you can ramp up on this in an afternoon.
Here's an intro "crib sheet":

Your application believes that it has all the RAM to itself. This is a lie
that the operating system and hardware tell your application to decouple the
physical RAM addresses and the ones your application uses (virtual RAM
addresses). Learn more about virtual memory here:
[http://en.wikipedia.org/wiki/Virtual_memory](http://en.wikipedia.org/wiki/Virtual_memory)

In order to keep this mirage working, the computer needs to map from virtual
addresses to physical addresses. Instead of tracking every single address, it
tracks spans of addresses. So, the addresses your application sees as 0
through 4095 might map to physical addresses 8192 through 12287. Keeping this
map using fixed-size spans keeps the size of the mapping down and the
performance fast.

This article is about using bigger spans (about 1 billion addresses each)
instead of the standard 4 KB. The advantage is that the mapping from virtual
to physical is stored in memory as a tree, and bigger spans mean you need
fewer nodes in the tree. Fewer nodes means fewer traversals/indirections to
find the node you are looking for. Less work means faster performance.

The details about the caching and the TLB entry counts in the processor have
to do with how much dedicated space there is in different parts of the CPU
for this mapping information.

The details about offsets, and about changing how the memory was accessed in
order to get positive/negative performance in the 4 KB vs. 1 GB tradeoff,
have to do with whether the mapping information was in the cache or not. It
is similar to alignment:
[http://en.wikipedia.org/wiki/Data_structure_alignment](http://en.wikipedia.org/wiki/Data_structure_alignment)

A lot of the obscure parts of the code are just how the author is calculating
addresses to read, using pointer arithmetic
[http://en.wikipedia.org/wiki/Pointer_(computer_programming)#...](http://en.wikipedia.org/wiki/Pointer_\(computer_programming\)#C_and_C.2B.2B)
and bit-shifting
[http://en.wikipedia.org/wiki/Bitwise_operation](http://en.wikipedia.org/wiki/Bitwise_operation)

Finally, in order to use these 1 GB mappings instead of 4 KB ones, the
programmer has to use a special way of allocating memory from the operating
system called mmap
[http://en.wikipedia.org/wiki/Mmap](http://en.wikipedia.org/wiki/Mmap)

------
zippie
Supporting benchmarks that might be interesting alongside the OP:

[https://github.com/johnj/llds#wall-timings-in-
seconds](https://github.com/johnj/llds#wall-timings-in-seconds)

Same concept applies, reducing translations for page lookups reduces latency.

------
blueskin_
I was wondering if they meant pages as in websites. I was thinking "That's a
lot of superfluous javascript rubbish to load...".

The site is loading as if the page was 1GB though.

------
theon144
I admit I was a bit afraid to click this link on mobile...

------
lafar6502
1GB pages are great if you don't mind waiting 10 minutes for a page to be
flushed to the swap file.

------
stickhandle
Answer: Bad. Baffled by the 25 upvotes on the article (?)

~~~
dllthomas
Analyzing points outside typical (or even desired) use can be informative.

------
isaacb
Not too bad, actually. For a moment I was thinking about how it might be
useful to embed large datasets directly in the page, but it wouldn't be even
remotely worth the sacrifice in usability. Just make one extra HTTP request
and give the user a nice spinny icon.

~~~
effn
This article is not about web pages.

~~~
isaacb
Oh, then it just loaded extremely slowly for some other reason... Obviously I
didn't actually read the article before commenting. :P

