Memory is slow, Disk is fast – Part 1 (bitflux.ai)
54 points by ashvardanian 8 days ago | 47 comments




The data does not support the premise, much less the grandiose conclusions at the end of the article. By pure construction, memory is always going to be faster than disk, right up to the point where disk is implemented as memory, which makes this a pointless semantic discussion. Such a fact would hardly invalidate the core pillars of computer science as the conclusions imply. Waste of time.

> By pure construction, memory is always going to be faster than disk, right up to the point where disk is implemented as memory

You can get servers with terabytes of RAM these days. How many startups’ and small to medium businesses’ entire production database could fit in memory on 1 server?

Hint: the answer is “a large majority” (numerically, not by market cap)

Yes, RAM is faster than disk, but for most people it may simply not matter anymore. Put everything in RAM, sync to storage periodically in the background.


> right up to the point where disk is implemented as memory, which makes this a pointless semantic discussion.

Yeah, I hope we are going back to the days when developing software meant one needed to understand the underlying hardware. I know that raises the barrier to entry; IMHO that is a good thing. We don't need more people doing software development, just quality people.


A title of "Memory is getting slower, Disk is getting faster" might remove a lot of the initial confusion this post causes.

Except that the article doubles (or rather, triples) down on claiming that reading from disk is faster than reading from memory, despite all of CS (CS, really? That has nothing to do with CS in reality) "dogma" claiming otherwise.

It's a weird article. It looks at an insightful set of numbers, fairly assesses them, and then goes on to conclude the stupidest thing.


Maybe they discovered that DMA transfers from an SSD over PCIe are less bottlenecked than transfers from DRAM, and fell into a rabbit hole from there.

A switching hub always has more aggregate bandwidth than any single port on it, and a modern CPU resides on the PCIe equivalent of an uplink port. It makes sense to my low-paygrade brain that there would be situations where peripheral-to-peripheral DMAs are faster than RAM-to-peripheral transfers.


Yes, there exist situations where peripheral-to-peripheral is faster than RAM-to-peripheral.

But that doesn't make the disk-access optimizations in your OS harmful, nor even useless. Disk-to-CPU access is best done through those optimizations for several reasons, one of them being that the author's conclusion isn't entirely supported by the data.


> Data used for the charts was researched and compiled by ChatGPT, I spot checked it and found it was accurate enough to for the narrative.

Last I checked, that wasn't how citations worked.

I only sampled one data point (the 2017 AMD EPYC Rome clock rate), which was significantly off, because it was the Naples chips that were released in 2017, and, unless 2017 was desperate enough to run a 2.2GHz base clock at 3GHz on the regular (boost up to 3.2GHz), the 'research' was a fair bit off...

Doesn't undermine or contradict the author's (bot's?) point, but it's a strange way to provide 'evidence' for an argument.


I feel like our planet’s long-term academic health would dramatically improve if something resembling the following statement were posted on the wall of every K-12 and college classroom:

ChatGPT is not a primary source. Wikipedia is not a primary source. Google Search is not a primary source. Microsoft Encarta is not a primary source. The Encyclopaedia Britannica is not a primary source.

Information aggregators are not primary sources. Identifiable people are primary sources.


I read the whole thing, but I got a strong AI vibe about halfway through; it felt like it was significantly padded with extra words to extend its length. Kind of annoying that the blurb wasn’t placed at the beginning of the article…

And we don't know what the prompt was. Maybe:

Hey ChatGPT, here's a narrative. Please provide some plausible stats to support it.


Maybe part 2 will be useful, but part 1 absolutely does not answer the question in the headline.


Thanks! Much appreciated!

There's some sleight of hand being perpetrated here through the use of an inappropriate scale to trick the reader into thinking that NVMe latency is roughly zero when it's not. Let's just stipulate that you have an unreleased, hypothetical NVMe device with consistent 10µs read latency. The same device will have a write latency 10-100x higher, which it hides, under transient conditions, with DRAM. It's a very poor mental model to assume that "I/O" latency has gone to zero. It only looks that way if you're polluting your chart with disk access times from 1980, and it assumes a false symmetry between "I" and "O".
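To put rough numbers next to that point: here is a tiny sketch comparing the magnitudes involved. The 10µs NVMe read and the 10-100x write penalty are the figures above; the DRAM latency and the 1980s disk seek time are my own ballpark assumptions, not from the article.

    # Back-of-envelope: why NVMe "looks like zero" next to 1980s disk but not next to DRAM.
    dram_ns = 100                        # assumed ~100 ns DRAM access
    nvme_read_ns = 10_000                # the hypothetical 10 us NVMe read above
    nvme_write_ns = 10 * nvme_read_ns    # 10-100x higher; take the low end
    disk_1980_ns = 50_000_000            # assumed tens of ms for a 1980s disk seek

    print("NVMe read vs DRAM:  ", nvme_read_ns / dram_ns, "x slower")    # ~100x
    print("NVMe write vs DRAM: ", nvme_write_ns / dram_ns, "x slower")   # ~1000x or more
    print("1980s disk vs NVMe: ", disk_1980_ns / nvme_read_ns, "x")      # the scale that flattens the chart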

The article feels like LLM output. Some correct facts strung together to support an incorrect thesis the author asked it to validate, but completely ignoring inconvenient facts and reaching a wild conclusion that isn’t actually true.

That's what it says it is at the very bottom.

Well it says the charts were generated this way. But I feel like the narrative was also done this way.

Relevant to this: memory cost per GB dropped exponentially every year from the days of the earliest computers until 2010 or so, reaching $4/GB in 2011. A decade and a half later it's still in the $2-$4/GB range.

Note also that SSDs started out only slightly cheaper per GB than DRAM - the 80GB Intel X25-M had a list price of about $500 when it was released in 2008, and references I find on the net show a street price of about $240 for the next-gen 80GB device in 2009. Nowadays you can get a 1TB NVMe drive for about the cost of 16GB of RAM, although you might want to spend a few more bucks to get a non-sketchy device.


Disk hardware may be faster relative to RAM, but if you're using typical serverless PaaS offerings to run a hosted application at modest operational scale, it's a heck of a lot cheaper to get a node with "normal" RAM and CPU, than it is to get decent IOPS. If you're a big iron company with big iron problems, you may need to think in different scaling terms, but for the SaaS hoi polloi the economics of hosting have somewhat preserved traditional notions of performance.

AI-generated slop. Constantly summarising various parts of the memory hierarchy, graphs with no x-axis, bad units, no real-world examples, and a final conclusion that doesn't match the previous 10 summaries.

The big problem is that it misses a lot of nuance. If you actually try to treat an SSD like RAM and you randomly read and/or write 4 bytes of data that isn't in a RAM cache, you will get performance measured in kilobytes per second, so literally 1,000,000x worse performance. The only way you get good SSD performance is by reading or writing large enough sequential chunks.

Generally a random read/write of a small number of bytes costs about the same as a large chunk. If you're constantly hammering an SSD for a long time, the performance numbers also tank, and if that happens your application, which was already under load, can stall in truly horrible ways.

This also ignores write endurance: any data that has a lifetime measured in, say, minutes should be in RAM, otherwise you can kill an SSD pretty quickly.
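A rough sketch (not a rigorous benchmark) of the comparison being described: scattered tiny reads vs one big sequential pass over the same file, on Linux/Unix. "testfile" is a hypothetical file, ideally much larger than RAM (or with caches dropped first), since otherwise the page cache hides the SSD entirely.

    import os, random, time

    PATH = "testfile"   # hypothetical test file, ideally much larger than RAM
    N = 100_000         # number of random 4-byte reads

    size = os.path.getsize(PATH)
    fd = os.open(PATH, os.O_RDONLY)

    # Scattered 4-byte reads at random offsets
    t0 = time.perf_counter()
    for _ in range(N):
        os.pread(fd, 4, random.randrange(0, size - 4))
    t_rand = time.perf_counter() - t0
    print(f"random 4B reads:  {N * 4 / t_rand / 1e3:.1f} KB/s")

    # One sequential pass in 8 MiB chunks
    os.lseek(fd, 0, os.SEEK_SET)
    t0 = time.perf_counter()
    total = 0
    while chunk := os.read(fd, 8 * 1024 * 1024):
        total += len(chunk)
    t_seq = time.perf_counter() - t0
    print(f"sequential reads: {total / t_seq / 1e6:.0f} MB/s")

    os.close(fd)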


SSDs have so many cases of odd behaviour. If you limit yourself to writing drive-sector chunks, so 4k, then at some point you will run into erase issues because the flash erase size is considerably larger than the 4k sectors. But you also run into the limits of the memory buffer and the amount of fast SLC, which caps the long-term sustained write speed. There are lots of these barriers you can break through and watch performance drop sharply, and it's all implemented differently in each model.

Yes, it can be quite brand/technology specific, but chunk sizes of 4/8/16/etc MB usually work much better for SSDs. The only data I've found to read/write that easily lines up with those chunk sizes, though, is things like video/textures/etc, or cache buffers you fill in RAM and then write out in chunks.
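For what it's worth, a minimal sketch of that last pattern (fill a RAM buffer, flush to disk in big chunks). The 8 MiB flush size and the file name are assumptions, not a recommendation for any particular drive, and partial writes are ignored for brevity.

    import os

    CHUNK = 8 * 1024 * 1024   # hypothetical 8 MiB flush size; real sweet spots vary per drive

    class ChunkedWriter:
        """Accumulate small writes in RAM, hand the drive big sequential writes."""
        def __init__(self, path):
            self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            self.buf = bytearray()

        def write(self, data: bytes):
            self.buf += data
            while len(self.buf) >= CHUNK:           # flush only full chunks
                os.write(self.fd, bytes(self.buf[:CHUNK]))
                del self.buf[:CHUNK]

        def close(self):
            if self.buf:                            # final partial chunk
                os.write(self.fd, bytes(self.buf))
            os.close(self.fd)

    w = ChunkedWriter("records.bin")                # hypothetical output file
    for i in range(1_000_000):
        w.write(b"small record %d\n" % i)
    w.close()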

Is this from experience, or are there any sources on what's sane to use today? I'm building a niche DB and "larger blocks" has been the design direction, but how "far" to go has been a nagging question. (Also, are log-structured approaches still a benefit?)

You are also going to cause a lot of write amplification with bigger blocks, and at some point that is going to limit your performance as well. What really makes this hard is that it depends on how full the drive is, how heavily the drive is utilised, and for how much of the day. Having time to garbage collect results in different performance than not having it.

When you start trying to design tools to use SSDs optimally, you find it's heavily dependent on usage patterns, which makes it very hard to do this in a portable way, or in one that accounts for changes in the business.


This project is not "business"-bound; it's a DB abstraction, so business concerns are layered outside of it (but it's a worthwhile pursuit, since it rethinks some aspects I haven't seen elsewhere in all the years of DB announcements here and elsewhere).

And yes, write amplification is one major concern, but the question is, considering how hardware has changed, how does one design to avoid it? Our classic 512-byte, 4k, etc. block sizes seem long gone; do the systems "magically" hide that, or do we end up with unseen write amplification instead?


In part 2 https://www.bitflux.ai/blog/memory-is-slow-part2/ the last diagram still shows that memory is faster than disk.

This was largely a waste of time to read. It doesn't really support its own point or honestly even make a point or provide anything actionable. I'm not even sure I can trust the numbers given it's been vibes-researched.

"Which is funny because that describes AI and you’d be doing this kind of work on a GPU which leans entirely into these advantages and not a CPU anyway"

Damn, that's harder to parse than it needed to be.


"Who cares? Produc"

This one is even harder to parse :)


That was the typo that made me bail on reading this article.

That's just philosophy.

who downvoted me for that?

> clocks, IPC, and latency flatlined

IPC has definitely not flatlined. Zen 4 can do something like 50-70% more IPC than a CPU from 10-15 years ago. Zen 5 is capable of ~16% more IPC than Zen 4.


It's all good. The author spot-checked ChatGPT. Many or possibly even most facts in the article might be true.

In relation to other core specs, like transistor and core counts and memory bandwidth, 50% growth in 10 years is pretty much flat-lining, I’d say.

Is there a reason why we are not seeing more use of SRAM (other than cost)?

It costs more because it's less dense than DRAM - the same transistor count that produces 2GB of DRAM can only fit a fraction of that in SRAM because it's 6 transistors per SRAM cell vs. 1 + capacitor for a DRAM cell.

Power usage is also generally worse since the SRAM cells use continuous power to hold their state, while DRAM cells only use power during read/write/refresh (relying on the caps to hold their charge for a short time while not actively powered).


We are seeing more SRAM in the form of CPU caches; this is one of the things that is actually still scaling.

Are you asking why not use SRAM in something like a DIMM? You could do that. Here's why I wouldn't advocate for it. Assume you had zero-latency SRAM in your DIMM. It still takes ~40ns to get out of the processor by the time you go through the memory controller and PHY. So you'd have an incredibly expensive but small DIMM taking up limited pins on the processor package/die. Even then you'd only cut the memory latency in half, and we'd still be stuck at a new, lower flatline.

Incorporating the SRAM on-die is a different story: you get to scale the latency and the bandwidth closer to the other capabilities of the cores.
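Spelling out the "only cut it in half" arithmetic above, using the commenter's rough figures (~40ns of controller/PHY overhead, roughly half of a typical DDR load); these are ballpark assumptions, not measurements:

    fabric_ns = 40       # memory controller + PHY, paid regardless of what's on the DIMM
    dram_array_ns = 40   # roughly the other half of a ~80 ns DDR load
    sram_array_ns = 0    # the hypothetical zero-latency SRAM DIMM

    print("DRAM DIMM:", fabric_ns + dram_array_ns, "ns")   # ~80 ns
    print("SRAM DIMM:", fabric_ns + sram_array_ns, "ns")   # ~40 ns -> only about 2x better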


Do we need another reason?

For interested folks, see also "42 Years of Microprocessor Trend Data" from Karl Rupp.[1]

[1] https://www.karlrupp.net/2018/02/42-years-of-microprocessor-...


> most traditional software will be stuck in the past, missing out on the exponential improvements

Do I need exponential improvements and vector operations in a text editor though?


Well yes! You see, your text editor is now running on top of an entire hidden webstack, which includes a GPU-accelerated compositor and vector-accelerated font rendering, just in case you decide to zoom or rearrange your sidebar, so that reflow and layout and drop shadows and transparency and animation all happen smoothly. You'll need a ton of memory and CPU to crank through all the dynamic optimization that is necessary to make the oodles of JavaScript not crawl.

Kindness would be plotting all of this with a log scale. The plots could be drawn on napkins for how much they explain.

Original author AMA

Why not CXL?

In this context CXL is kinda storage and kinda higher-latency RAM. Its latency is worse than standard DDR5 DIMMs', but the bandwidth is on the same trajectory as storage/networking.

I'm inclined to think of it like storage in this context. It's scaling, but it will require new thinking to take full advantage of.



