
Reducing DRAM Footprint with NVM in Facebook - gwern
https://blog.acolyer.org/2018/06/06/reducing-dram-footprint-with-nvm-in-facebook/
======
bogomipz
>"Facebook swap 80GB of DRAM for 140GB of NVM. The lower latency of NVM is
therefore compensated for by the higher hit rate from having a larger cache."

Should this be "the higher latency of NVM"? I'm assuming the comparison is to
DRAM and not Flash in this sentence. Or am I misunderstanding something?

~~~
Rafuino
Someone pointed that mistake out in the blog comments. Yes, MyNVM has ~20%
higher P99 latency.
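
To make the trade-off in the quoted sentence concrete, here is a minimal sketch of the usual "average access time" arithmetic. It is a simplified single-level cache model, not the design from the paper, and every latency and hit rate in it is a made-up number chosen only to show how a slower but larger cache can still come out ahead.

```python
def effective_latency_us(hit_rate, cache_latency_us, miss_latency_us):
    """Average access latency for a cache sitting in front of slower storage."""
    return hit_rate * cache_latency_us + (1.0 - hit_rate) * miss_latency_us


# All numbers below are hypothetical, for illustration only.
FLASH_US = 100.0  # assumed cost of going to the backing flash on a miss

# Small, fast cache: low hit latency, but more misses.
small_dram = effective_latency_us(hit_rate=0.80, cache_latency_us=0.1,
                                  miss_latency_us=FLASH_US)

# Larger, slower cache: higher hit latency, but fewer misses.
large_nvm = effective_latency_us(hit_rate=0.95, cache_latency_us=10.0,
                                 miss_latency_us=FLASH_US)

print(f"small fast cache:  {small_dram:.2f} us average")
print(f"large slower cache: {large_nvm:.2f} us average")
```

With these assumed inputs the larger cache wins on average latency because misses dominate the cost. Whether that holds in practice depends entirely on the real hit rates and latencies, and it says nothing about tail latency.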

------
theincredulousk
Maybe this is obvious to those working with this, but did something change
about what "NVM" means aside from "non-volatile memory", i.e. what "Flash"
memory is? Is the assumption that "NVM" == "NVMe" or?

~~~
geeio
NVMe is the spec for controllers supporting NVM.

Think PCI vs PCIe
[https://en.m.wikipedia.org/wiki/NVM_Express](https://en.m.wikipedia.org/wiki/NVM_Express)

~~~
theincredulousk
Thanks. I guess that still doesn't clear up why the whole article considers
"Flash" and "NVM" as two distinct technologies/concepts/whatever.

Edit: even on that wikipedia page:

"The acronym NVM stands for non-volatile memory, which is often NAND flash
memory"

I feel like I'm taking crazy pills here

------
Rafuino
I submitted this before as a link to the research itself about a month ago,
but here it is again just in case:
[https://dl.acm.org/citation.cfm?id=3190524](https://dl.acm.org/citation.cfm?id=3190524)

~~~
godelmachine
Even better → [https://research.fb.com/wp-content/uploads/2018/03/reducing-...](https://research.fb.com/wp-content/uploads/2018/03/reducing-dram-footprint-with-nvm-in-facebook.pdf)?

------
ggm
Why not just increase L1 Cache size, and avoid off-chip references? Sure, non-
volatile isn't met, but as bang-for-buck, the win would be many hundreds of ns
better surely?

~~~
z3t4
How do you avoid off-chip references? As MySQL does not use query caching by
default, only indexes, I guess most of the memory is HDD cache managed by the
file system.

~~~
lathiat
That’s not actually true. With InnoDB the data is largely cached specifically
by the database in the Buffer Pool.

Of course they are using MyRocks here and while I don’t actually know I’d be
very surprised if it wasn’t the same. File system cache only really helps you
with reads in an ACID-ish system and write speed ups/journaling require a more
actively managed cache.

------
ksec
That is roughly a $1000 saving per server. But is it really worth the effort,
even for Facebook?

~~~
pixl97
(How many servers they use it on * 1000)

If that is 1000 servers, then it's a million in savings. Also think about it
in the way of all the servers they have now + all the servers they'll use it
on in the future.
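
The back-of-the-envelope math above, as a small sketch. The $1000 per server figure comes from the parent comment; the fleet sizes are hypothetical, since the thread does not say how many servers are involved.

```python
def fleet_savings_usd(servers, savings_per_server_usd=1000):
    """Total savings if every server in the fleet saves the same amount."""
    return servers * savings_per_server_usd


# Hypothetical fleet sizes, for illustration only.
for servers in (1_000, 10_000, 100_000):
    print(f"{servers:>7} servers -> ${fleet_savings_usd(servers):,}")
```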

~~~
StudentStuff
Plus, strategies like what is described in the article will become more useful
as the usable bandwidth to NVMe improves. The write endurance limitations
won't disappear or see massive improvement, but NVMe is apt to keep dropping in
price, and perhaps Intel will figure out how to manufacture XPoint at a
reasonable price, providing flash memory on the low latency, high bandwidth
DDR bus. Not holding my breath on XPoint, but it would be a sea change if
Intel can figure out its production issues (and price it significantly cheaper
than DRAM)!

[https://semiaccurate.com/2018/05/31/intel-dodges-every-quest...](https://semiaccurate.com/2018/05/31/intel-dodges-every-question-at-apache-pass-xpoint-launch/)

~~~
ksec
To answer the above comments.

1 million for 1000 additional servers. And that is precisely my point. A saving
of 1 million is a drop in the bucket for Facebook. They will have to grow out
of their current capacity and ADD another 1000 database servers to
achieve this saving. That is 3PB of data, and I doubt they store video or
images in it. So this is still quite a large amount of data, even for
Facebook.

NVMe isn't going to see a price drop, not in 2019 and 2020. Right now Intel
isn't making much money on Optane, so any yield or other improvement would
only increase Intel's margin. A significant price drop would
require some groundbreaking changes, which I doubt will happen in a two-year
time frame. Testing the NVMe on DIMM would be a much more interesting test.
For the price of a 128GB server you could have 1TB of NVMDIMM. Although I think
the speed would be decreased, the question is, would the speed still be
acceptable?

I think this is merely an exercise for Facebook engineers to play around with.

------
PaulHoule
I find this positioning for Optane to be a little scary.

If you replace one storage tier with one that is unambiguously faster (say HDD
with SSD) that is definitely a win.

If you add a new storage tier which is supposed to be cheaper than another
tier but also slower, you can't take it for granted that real life performance
is better. For Facebook, which has talent on hand and a very large scale over
which to amortize R&D efforts, it is one thing. But for the rest of the market
it is very different.

The use of Optane for an HDD cache is a case in point. Although Mac users seem
to believe that Fusion drives are good, people in the PC space have been
disappointed in hybrid hard drives, to the point that I think they'll be
skeptical about Optane caching solutions.

A big part of it is that it is by no means certain that caching will provide
a perceptible improvement in performance. It is almost the ideology of the
corporation that nine women can have a child in one month, that throughput
matters, latency not so much. However, people perceive that their computer is
fast or slow in terms of latency, not throughput. In particular, it is
99th-percentile or higher latency that causes your computer to "go to lunch",
become unresponsive, and frustrate you.

Caches don't help with 99th-percentile requests as well as they do at the
median. If the cache does anything to waste time, it could make it worse.
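
A toy illustration of that point, as a sketch only. The latencies, the 90% hit rate, and the per-miss lookup overhead are all invented numbers, and the workload is a fixed list rather than a real trace. With a hit rate below 99%, the 99th-percentile request is still a miss, so it pays the full backing-store cost plus whatever the cache lookup added.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical numbers, for illustration only.
N = 1000                  # number of requests
MISS_MS = 10.0            # latency when served by the slow tier
HIT_MS = 0.1              # latency when served by the cache
LOOKUP_OVERHEAD_MS = 0.5  # extra time a miss spends checking the cache first
HITS = 900                # 90% hit rate

no_cache = [MISS_MS] * N
with_cache = [HIT_MS] * HITS + [MISS_MS + LOOKUP_OVERHEAD_MS] * (N - HITS)

for name, samples in (("no cache", no_cache), ("with cache", with_cache)):
    print(f"{name:>10}: p50 = {percentile(samples, 50):5.1f} ms, "
          f"p99 = {percentile(samples, 99):5.1f} ms")
```

Under these assumptions the median drops by two orders of magnitude while the p99 gets slightly worse, which is the pattern described above.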

You could make the case that Optane has advantages over SSD as a cache medium
(for one thing you can write to it a byte at a time so you don't have to deal
with the overhead of an SSD which needs to write in 16 MB or larger pages) and
also Intel has been barking up the "storage cache in microprocessor firmware"
tree long enough to work out the bugs, even if they have discredited the
concept in people's minds so far.

Still, if Intel had decided to put "Intel Inside" stickers on computers
that had Intel SSDs, they would have built up their brand because people would
experience Intel being associated with performance. Instead Intel is
associated in people's minds with logos, stickers, and brands that are
mindless and meaningless -- and with the stagnation of the PC platform even if
they don't understand the many specific own goals that Intel has committed
against it.

(Just for instance, PCs have been starved for PCIe lanes ever since Intel
switched to PCIe. I think the intention behind that was to kill off NVIDIA,
AMD and the PC gaming industry by not giving you enough bandwidth to really
use an advanced graphics card or card(s). Thanks to cryptocurrencies and AI,
NVIDIA and AMD have done well or stayed alive. However, Intel lost mindshare
to phones by removing a reason why you would buy a PC instead of a phone.)

~~~
Rafuino
It looks like the research team has submitted some commits for parts of their
work from the paper to be merged into RocksDB at large. If the work is shared
for others to use, it shouldn't be so scary.

[https://github.com/facebook/rocksdb/pull/2795/commits](https://github.com/facebook/rocksdb/pull/2795/commits)

Personally, I think research papers like this should open source their work
completely so others can replicate their results. That's how science is
supposed to work, after all.

~~~
PaulHoule
I don't think the research is scary, but Optane isn't the droid Intel is
looking for.

~~~
Rafuino
_If you add a new storage tier which is supposed to be cheaper than another
tier but also slower, you can't take it for granted that real life
performance is better. For Facebook, which has talent on hand and a very large
scale over which to amortize R&D efforts, it is one thing. But for the rest of
the market it is very different._

But you said the positioning is scary, seemingly because of the additional
work that's needed to make it work. If it's shared, does that not help?

~~~
lathiat
Facebook’s database team has many smart folks.

