
RAM Is the New Disk - nikhilgarg28
https://medium.com/thought-frameworks/ram-is-the-new-disk-b94df9bbd7b
======
contingencies
Well yes, I think RAM has been the new disk for a while now, and not because
(anecdote about database disk structures) or (any recent change to cost of
RAM).

If you use Linux, the fastest way to test how much faster your application
runs off disk is to simply make a filesystem in RAM and run the whole thing
from there. Because library-chasing to build a chroot is a hassle, I would
recommend simply putting a container on a RAM-backed block device, then
installing your application in the container.
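
For illustration, a minimal sketch of the RAM-backed mount from C via
mount(2); it assumes Linux and root privileges, and the mount point and
size are arbitrary choices (in practice this is usually a one-line mount
command):

```c
/* Sketch: create a RAM-backed filesystem (tmpfs) to run an application
   from. Assumes Linux and root; /mnt/ramdisk and the 2G size are
   arbitrary choices for illustration. */
#include <stdio.h>
#include <sys/mount.h>
#include <sys/stat.h>

int main(void) {
    /* Create the mount point; ignore failure if it already exists. */
    mkdir("/mnt/ramdisk", 0755);

    /* tmpfs lives entirely in the page cache (and swap), so anything
       installed under the mount point is served from RAM. */
    if (mount("tmpfs", "/mnt/ramdisk", "tmpfs", 0, "size=2G") != 0) {
        perror("mount");
        return 1;
    }

    /* From here, install the container/application under /mnt/ramdisk
       and run it as usual. */
    return 0;
}
```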

I have personally designed, built, and managed large clusters of diskless
machines, and I find that the mix of RAM-only operation and PXE[1] boot is an
excellent one for maintaining state (and security) across well-managed
infrastructure. Disks be damned. For permanent storage, consider sharing a
DRBD[2] cluster from dedicated nodes.

[1] [https://en.wikipedia.org/wiki/Preboot_Execution_Environment](https://en.wikipedia.org/wiki/Preboot_Execution_Environment)

[2] [https://en.wikipedia.org/wiki/DRBD](https://en.wikipedia.org/wiki/DRBD)

~~~
old-gregg
I don't recommend this anymore. With a typical developer machine containing
16GB of RAM, and especially on Linux, you will find that all of your
daily-touched files are in the FS cache after a few minutes of work. Even
with default kernel settings, Linux is pretty good at eating up all of your
unused RAM to speed up disk access.

Here's my anecdote, based on a 16GB workstation with an NVMe SSD (Samsung
960 Pro):

Watching my project compile, I occasionally open iotop in another terminal
and don't see anything beyond occasional write flushes. To confirm, I
created a tmpfs volume and did not observe any improvement. `free` reported
my buffers to be at ~4.7GB, which is basically all of /bin, /usr, and all of
the Golang sources+libs.
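
If you want to verify that directly rather than infer it from `free`,
Linux will tell you which pages of a file are already resident in the page
cache. A rough sketch using mmap(2) + mincore(2); the default path is just
a placeholder, pass any file you touch daily:

```c
/* Sketch: count how many pages of a file are resident in the Linux
   page cache, via mmap(2) + mincore(2). */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    const char *path = argc > 1 ? argv[1] : "/bin/ls";  /* placeholder */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror(path); return 1; }

    struct stat st;
    fstat(fd, &st);
    void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    long pagesz = sysconf(_SC_PAGESIZE);
    size_t pages = (st.st_size + pagesz - 1) / pagesz;
    unsigned char *vec = malloc(pages);  /* one status byte per page */
    if (mincore(map, st.st_size, vec) != 0) { perror("mincore"); return 1; }

    size_t resident = 0;
    for (size_t i = 0; i < pages; i++)
        resident += vec[i] & 1;  /* low bit set = page is in RAM */
    printf("%s: %zu of %zu pages resident in page cache\n",
           path, resident, pages);
    return 0;
}
```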

~~~
FeepingCreature
One memory leak and your cache is gone.

[edit] Not sure if ramdisks are pinned though.

~~~
ars
> Not sure if ramdisks are pinned though.

Ramdisks will go to swap. A memory leak will force the entire ramdisk into
swap, and reading it back into memory afterward is 10 to 100 times slower
than reading normal files off a disk.

~~~
majewsky
> Ramdisks will go to swap.

Assuming that you have swap. I don't; I want my SSD to stay alive.

~~~
ars
If you have a memory leak, and a ramdisk, and no swap then the OOM killer will
trigger.

Hopefully it will target the program with the memory leak, but this is not
guaranteed.

Swap is useful because you can shift unused memory onto disk. There are many
programs that allocate (and write) a lot of memory that they never touch
again.

By having swap you make more room for cache in memory.

SSD doesn't matter here - this is not swap thrashing, but rather occasional
writes.

------
arielweisberg
...

I was the third engineer at VoltDB and spent six years making that bet. It's
not a good bet.

Maybe there are other factors, but if VoltDB could page out cold data to
disk, I think it would be at least 2x as successful, if not more. No one
agreed with me, so it never happened.

I saw so many use cases go out the door because, hey, you know what? RAM is
expensive and it's cheaper to page out cold data. The scale at which that
cost starts to matter is not that big.

~~~
AndyNemmity
I've spent six years, I think, working on SAP HANA. The one feature I've
always asked for is seamless paging of even warmish data to disk.

In-memory is fast and awesome, but it doesn't have to be as mind-bogglingly
expensive as it is. Why are we all making the same mistakes?

~~~
Pamar
I would really like to hear something about your experience with SAP HANA. Do
you have a blog or anything you could share?

------
tw04
The price of disk has dropped at nearly the same pace as RAM, as has the
cost of compute. At the same time, data growth has increased faster than
either price has dropped... so I'm not really sure the price argument holds
water. If I can buy RAM at 1/100th the cost but I need to store 500x more
data, that isn't a net win on cost.

From ~$1000.00/GB to $0.03/GB:

[http://www.mkomo.com/cost-per-gigabyte-update](http://www.mkomo.com/cost-per-gigabyte-update)

~~~
runeks
It would be interesting to see that chart updated to 2017 data. It appears the
downward slope becomes significantly less steep around 2009 (looks like the
price dropped as much from 2006-2008 as it did in the five years 2009-2014),
and I’d be interested in seeing how recent SSD prices affect this. As far as I
can see, rotational HDD technology is at the end of its S-curve, whereas SSD
technology is still relatively new.

~~~
notpeter
Backblaze recently updated that data through Q2 2017 [1]. Hard drives have
only dropped to $0.028/GB. For comparison SSDs are still ~$0.40/GB (~14x).

[1]: [https://www.backblaze.com/blog/hard-drive-cost-per-gigabyte/](https://www.backblaze.com/blog/hard-drive-cost-per-gigabyte/)

[2]: [https://pcpartpicker.com/trends/internal-hard-drive/](https://pcpartpicker.com/trends/internal-hard-drive/)

------
sumanthvepa
The problem with asking a programmer to keep track of the locality of their
data is that most modern programming languages make reasoning about
locality hard to do, with the exception of C and C++. Even in those
languages, unless all relevant data is in simple arrays, making assertions
about locality is hard.

For interpreted languages like Python or JavaScript, figuring out the RAM
storage and access patterns of data is very hard. So we probably need
programming-language mechanisms to help with understanding the locality
patterns of our programs, and probably tooling to help change them.
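
As a sketch of why C/C++ are the exception: the language pins down data
layout, so you can read the cache behaviour of a loop straight off the
declarations. The particle type here is purely illustrative:

```c
/* Sketch: two layouts for the same data. C guarantees arrays are
   contiguous, so locality reasoning is mechanical. */
#include <stddef.h>

#define N 1000000

/* Array-of-structs: one particle's fields share a cache line. Good when
   code touches all fields of one particle together. */
struct particle { double x, y, z, mass; } aos[N];

/* Struct-of-arrays: each field is contiguous across particles. Good when
   a loop touches one field of every particle. */
struct { double x[N], y[N], z[N], mass[N]; } soa;

double total_mass_aos(void) {
    double sum = 0;
    /* Strides 32 bytes; only 16 of every 64-byte cache line is used. */
    for (size_t i = 0; i < N; i++) sum += aos[i].mass;
    return sum;
}

double total_mass_soa(void) {
    double sum = 0;
    /* Dense sequential reads; every byte fetched is used. */
    for (size_t i = 0; i < N; i++) sum += soa.mass[i];
    return sum;
}
```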

~~~
humanrebar
Even for C++, most designs are OOP, which treats data layout as an
afterthought.

------
0xbear
And 99% of developers spend their entire careers without giving so much as a
passing thought to cache locality. Quick: how long, in cycles, does it take
to retrieve data from RAM? About 200 cycles. That is a very long time if you
miss cache often. Scattered RAM reads can be _slower_ than sustained linear
disk reads (that is, once the disk actually gets around to reading, which
takes a while).
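
A rough sketch of how to see this on your own machine: time one sequential
pass over a big array against the same reads in shuffled order. Sizes are
arbitrary, and it assumes glibc, where rand() returns 31-bit values:

```c
/* Sketch: sequential vs. scattered reads of the same array. 64 MB of
   ints is far bigger than any CPU cache; on typical hardware the
   shuffled pass is several times slower per element. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (16 * 1024 * 1024)  /* 16M ints = 64 MB */

static double seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void) {
    int *a = malloc(N * sizeof *a);
    size_t *idx = malloc(N * sizeof *idx);
    for (size_t i = 0; i < N; i++) { a[i] = (int)i; idx[i] = i; }

    /* Fisher-Yates shuffle to build a scattered access order. */
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }

    long long sum = 0;
    double t0 = seconds();
    for (size_t i = 0; i < N; i++) sum += a[i];        /* sequential */
    double t1 = seconds();
    for (size_t i = 0; i < N; i++) sum += a[idx[i]];   /* scattered  */
    double t2 = seconds();

    printf("sequential %.3fs, scattered %.3fs (sum=%lld)\n",
           t1 - t0, t2 - t1, sum);
    free(a); free(idx);
    return 0;
}
```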

~~~
flukus
90% of developers are working in languages where you can't really do much
about cache misses, or where doing so will at least involve some very
non-idiomatic code. If you can't do much about a problem, it's not really
helpful to be thinking about it much.

~~~
0xbear
I don't disagree. And for 90% of them worrying about cache locality or branch
mispredictions on a daily basis would be a waste of time. It's fine to
deliberately ignore such concerns. It's somewhat less fine to know absolutely
nothing about how programs are actually executed, and what makes them go fast.

------
adrianratnapala
If a terabyte is a lot of data to you -- and it is for many, many things --
then this post is right; you should buy as much RAM as you have data, and
access it accordingly.

The commenters who are saying disk has a different price/performance
trade-off that is still valuable are also right, but that applies to larger
data sets.

~~~
AndyNemmity
I worked on a petabyte-scale in-memory HANA cluster. It all depends on what
you're doing and how important it is to you.

I don't even know what a large data set is anymore. I think my general
definition is one you won't put into memory, whatever your threshold is for
that.

------
vbezhenar
Maybe with very high-end servers it is, but generally it's not. I can buy a
4TB HDD for $200; I think I'd have to add 2-3 zeros for a 4TB RAM machine,
and that's not even counting the server motherboard and server processor I'd
need, while I can use a 4TB HDD with pretty much any computer. And SSDs
aren't going to reach parity with HDDs on $/byte in the near future either.
So optimizing software for HDDs isn't going away. But, of course, it's
awesome to have some alternatives if you have the money and need more
performance.

~~~
zeusk
NVMe drives, which are approaching RAM speeds, are a good compromise if RAM
and server components are outside of your financial reach.

~~~
dis-sys
Many NVMe drives on the market are useless jokes. Try some from the
now-biggest semiconductor company, test their fsync() performance, and try
not to have a heart attack when you see those ugly numbers. ;)
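
For anyone who wants to reproduce that, a minimal sketch of the kind of
fsync() test being described: time many small write+fsync cycles, the way a
commit log would issue them. The file name and counts are arbitrary:

```c
/* Sketch: measure fsync() latency as a write-ahead log would see it:
   small write, fsync, repeat. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    int fd = open("fsync-test.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];
    memset(buf, 'x', sizeof buf);

    const int iters = 1000;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
            perror("write");
            return 1;
        }
        fsync(fd);  /* force the data through the drive's volatile cache */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d write+fsync cycles in %.2fs (%.0f/s, %.2f ms each)\n",
           iters, secs, iters / secs, 1000 * secs / iters);
    close(fd);
    return 0;
}
```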

~~~
olavgg
This!

[https://forums.servethehome.com/index.php?threads/did-some-write-benchmarks-of-a-few-ssds.15231/](https://forums.servethehome.com/index.php?threads/did-some-write-benchmarks-of-a-few-ssds.15231/)

~~~
dis-sys
Hi olavgg, I was searching for a fast-fsync, low-cost NVMe SSD a few months
ago, so I looked into consumer NVMe SSDs. The Samsung 960 Pro was the first
one I tested, and the results were so shockingly bad that I started to
question whether my kernel/installation was causing the slowness. I searched
online and found a few of your posts describing the exact same problem,
which saved me quite a bit of time. :)

Yes, I totally agree with the conclusion in your link above: consumer SSDs
(NVMe or not, high end or cheap) aren't worth a dime. Cheers!

------
olegkikin
Are you sure latency stayed around 100ns?

[http://pics.crucial.com/wcsstore/CrucialSAS/images/campaigns/c3-speed-vs-latency-table.png](http://pics.crucial.com/wcsstore/CrucialSAS/images/campaigns/c3-speed-vs-latency-table.png)

~~~
nikhilgarg28
That's very interesting. The estimate of 100ns came from here:
[https://people.eecs.berkeley.edu/~rcs/research/interactive_l...](https://people.eecs.berkeley.edu/~rcs/research/interactive_l..).
It's probably not very precise (it may only be capturing the rough order of
magnitude). I have now updated the post. Thanks for the feedback! Specific
constant aside, the point about latency not improving much still holds.

~~~
nayuki
Your link got truncated with ellipses. Here we go:
[http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html](http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html)

------
sand500
I've wondered, as an estimate, how much of "my" data is stored in the
various types of memory: in a CPU cache if I am actively browsing a website,
those last couple of IMs in the RAM of some server? Any content I have ever
uploaded to the internet is probably on a hard disk, ready to be brought
into cache at a moment's notice. Then there are all the backups on tape.

------
WalterBright
Extensive use of "RAM" disks was commonplace in the 1980s.

~~~
slackingoff2017
And what's old is new again :). The ratio of RAM to disk cost has
historically varied wildly. The pendulum will come around again in a few
years when huge SSDs are cheap.

I have a feeling that with multi-terabyte SSDs at cheaper prices, we'll be
shuffling all our data back to "disk" again :).

------
toast0
I think the insight here may be that ram should be optimized for sequential
access, just like the disks of old.

~~~
ajross
It is; that's what "fast page mode" was in the late '90s and what prefetch
queueing is in DDR.

------
godelmachine
This might be the single most important article I have read in the past
week, because Adrian Colyer is on vacation! May I add that even SAP HANA is
designed for in-memory computing? As far as disks are concerned, NVM should
soon replace them.

------
jlebrech
with "serverless" applications you can read the whole app into memory and run
it and then clear it for the next app, which i'm sure speeds things up.

~~~
icebraining
That's not how it works; programs are kept "warm" for some time after each
request, or indefinitely (e.g. in App Engine you can choose dynamic or
resident instances).

~~~
kthejoker2
Just chiming in to say this is also true for Lambda and Azure Functions.

What I'd really like is for them to scale up to full-fledged VMs once some
usage or performance threshold is hit.

------
amelius
If only programming languages supported "offsetted pointers", we could use
mmapped files and store arbitrary data structures in them without hassle.
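
The workaround in C is exactly that: store offsets from the mapping base
instead of raw pointers, so the structure survives being mapped at a
different address next run. A sketch, with names and node offsets chosen
purely for illustration:

```c
/* Sketch of "offsetted pointers": a linked structure in an mmapped file
   keeps byte offsets from the mapping base instead of raw pointers. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

typedef uint64_t rel_ptr;  /* offset from mapping base; 0 means NULL */

struct node {
    int32_t value;
    rel_ptr next;  /* where a normal list would hold struct node * */
};

static void *base;

static struct node *deref(rel_ptr off) {
    return off ? (struct node *)((char *)base + off) : NULL;
}

int main(void) {
    int fd = open("data.bin", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }
    ftruncate(fd, 4096);
    base = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    /* Build a two-node list at fixed offsets inside the file. */
    struct node *a = deref(64), *b = deref(128);
    a->value = 1; a->next = 128;
    b->value = 2; b->next = 0;

    /* Traversal only ever adds offsets to the current base, so it works
       wherever the file happens to be mapped. */
    for (struct node *n = deref(64); n; n = deref(n->next))
        printf("%d\n", n->value);

    munmap(base, 4096);
    close(fd);
    return 0;
}
```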

~~~
icebraining
Many programming languages use references to other objects liberally. Wouldn't
it be hard to keep it all contained so that you could restore it later?

------
sriram_iyengar
Excellent insights. If cost is not a factor, how do improvements in SSD
space stack up against RAM? Thanks

------
jsudhams
Not sure i agree, on Server spec the ram cost 10% more than CPU like if CPU
cost 2000 then ram cost would be like 2200. Also it is not scalable for amount
of data and not sure if I agree on laptop as well , 8gb ddr3 is about $80
while I can get 128gb ssd or 1tb magnetic disk so really can't use memory
instead of disk. Except in few cases

------
drudru11
This post is 10 years late

------
rodgerd
> If anything, I would suspect that the developers have become costlier over
> time, at least in the last 10 years or so.

Really? Have developer costs _actually_ increased in real terms in the last
10 years? Have _your_ developer costs (if you're outside the VC/SV bubble)
increased in real terms? And by how much?

This seems like a terrible assumption.

~~~
jerf
The real point is that the cost of developer time has not fallen at anything
like the rate of the RAM price decrease, and unless you plan on seriously
defending the claim that developers now cost only 1/6000th of what they did
twenty years ago, the points in the blog post stand.

