
This is a timely analysis. The virtual memory system, with its concept of paging to disk, is obsolete in the sense that hardly anybody who does bigger-than-RAM computations relies on the kernel's algorithms to manage it (https://scholar.google.com.au/scholar?q=out+of+core+algorith...).

The current paging system doesn't have a sensible mechanism for flash-as-core memory (10x RAM latency, e.g. DDR4 12ns for first word, so 120ns), persistent memory in general, or using SSDs as an intermediate cache for data on disk. ZFS has some SSD caching but it is not really taking advantage of the very large and very fast devices now available.

So we do need new paradigms to use this effectively. I'd like to be able to reboot and keep running a program from its previous state, because it all sits in flash-core.

Also there is huge potential to move to more garbage-collected memory storage systems. This goes hand in hand with systems such as parallel Haskell, which can progress concurrently without the overhead of difficult multi-threaded code.

On the negative side, I find the use of the term 'warehouse scale computing' to be stupidly buzzwordy.

From https://gist.github.com/jboner/2841832

L1 cache reference 0.5 ns

Branch mispredict 5 ns

L2 cache reference 7 ns 14x L1 cache

Mutex lock/unlock 25 ns

Main memory reference 100 ns 20x L2 cache, 200x L1 cache

Compress 1K bytes with Zippy 3,000 ns 3 us

Send 1K bytes over 1 Gbps network 10,000 ns 10 us

Read 4K randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD

Read 1 MB sequentially from memory 250,000 ns 250 us

Round trip within same datacenter 500,000 ns 500 us

Read 1 MB sequentially from SSD* 1,000,000 ns 1,000 us 1 ms ~1GB/sec SSD, 4X memory

Disk seek 10,000,000 ns 10,000 us 10 ms 20x datacenter roundtrip

Read 1 MB sequentially from disk 20,000,000 ns 20,000 us 20 ms 80x memory, 20X SSD

Send packet CA->Netherlands->CA 150,000,000 ns 150,000 us 150 ms
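To make the gaps between tiers easier to compare, here is a minimal sketch that computes slowdown ratios from a few of the figures listed above. The dictionary and the `slowdown` helper are my own illustrative names, not part of the gist.

```python
# Latency figures in nanoseconds, copied from the list above.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Main memory reference": 100,
    "Read 4K randomly from SSD": 150_000,
    "Disk seek": 10_000_000,
}


def slowdown(slow, fast):
    """Return how many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


if __name__ == "__main__":
    # Matches the "200x L1 cache" note in the list.
    print(slowdown("Main memory reference", "L1 cache reference"))
    # The gap between DRAM and a random SSD read.
    print(slowdown("Read 4K randomly from SSD", "Main memory reference"))
    # The gap between a random SSD read and a disk seek.
    print(slowdown("Disk seek", "Read 4K randomly from SSD"))
```

By these numbers a random 4K SSD read is about 1500x a main memory reference, which is the gap the rest of the thread is arguing about.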

IMO part of the reason is that DRAM is cheap and you can get a lot of it. How many applications have a working set that is in that relatively small region between DRAM and SSD?

I close the lid of my laptop and my memory is saved to SSD, I open it and it comes back pretty much immediately, what more do I need from that perspective?

One thing that tends to happen with caches is that you get smaller returns as they grow larger. You're saying my 64GB RAM can be another level of cache for my 500GB SSD, but I'm not quite sure what we'd do with that, or why we need more than what we can already do with this SSD at the application layer. I agree that SSD paging can probably be improved. Maybe support can be moved out of the OS into hardware to get better latency. I'd still think that if you're thrashing the SSD you're likely not getting good performance, just like if you're thrashing DRAM you're not doing as well as you could be.

> How many applications have a working set that is in that relatively small region between DRAM and SSD?

Actually I'd say quite a few, because not everyone has a server with 40 cores and 1TB RAM; that is quite expensive. But many people have 8 cores and 32GB RAM, and could conceivably add 512GB of fast flash-core (by which I mean fast flash memory accessible via the memory bus, rather than PCIe, although that may be fast enough). So, your laptop could search a ~fast-as-RAM key value store with 200GB of data, even with only 8GB of RAM.

But I don't think any of this is particularly relevant to desktops/laptops as such. This is more of a programming paradigm change. Main memory is still going to be unbearably slow (many clocks to fill a cache line), but next level storage will only be 10 times slower than main memory, instead of 1000 times slower. What do we do with that? How do we orchestrate inter-processor and inter-chassis cooperation on solving problems? (For example, what if inter-node flash-core IPC is about the same speed as intra-node? Distributed flash core could be hundreds of TB.) What can we do if memory is persistent? How will we adapt algorithms to reduce flash wear problems?
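To put rough numbers on the 10x versus 1000x point, here is a back-of-the-envelope sketch of average access time when some fraction of accesses miss RAM and fall through to the next tier. The 100 ns RAM figure is the "main memory reference" from the list earlier in the thread; the 90% hit rate and the function name are assumptions picked purely for illustration.

```python
def effective_latency_ns(hit_rate, ram_ns, backing_ns):
    """Average access time when `hit_rate` of accesses are served from RAM
    and the rest fall through to the backing tier."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * ram_ns + (1.0 - hit_rate) * backing_ns


if __name__ == "__main__":
    RAM_NS = 100      # main memory reference, from the latency list
    HIT_RATE = 0.9    # assumed: 90% of accesses are served from RAM

    # Backing tier 10x slower than RAM (the flash-core case): roughly 190 ns.
    print(effective_latency_ns(HIT_RATE, RAM_NS, 10 * RAM_NS))
    # Backing tier 1000x slower than RAM: roughly 10,090 ns.
    print(effective_latency_ns(HIT_RATE, RAM_NS, 1000 * RAM_NS))
```

Under these assumptions a 10% miss rate roughly doubles the average access time with a 10x tier, but makes it about 100x worse with a 1000x tier, which is why a working set larger than RAM stops being automatically fatal.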


There are systems like Druid that memory map files and rely on the OS for paging in and out segments: http://druid.io/docs/latest/operations/performance-faq.html
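For anyone who hasn't used the technique, here is a minimal generic sketch of memory-mapping a file and letting the OS page it in on demand. This is not Druid's code or file format; the fixed-width record layout, the file name, and the helper names are all made up for illustration.

```python
import mmap
import os
import struct
import tempfile

# Assumed layout: one little-endian signed 64-bit integer per record.
RECORD = struct.Struct("<q")


def write_records(path, values):
    """Write each value as one fixed-width record."""
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))


def read_record(path, index):
    """Map the whole file read-only and decode a single record.

    Only the pages actually touched need to be resident; the kernel
    decides what to page in and what to evict.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return RECORD.unpack_from(mm, index * RECORD.size)[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "segment.bin")  # hypothetical file name
        write_records(path, range(1000))
        print(read_record(path, 42))
```

The application never issues an explicit read for the record; it just indexes into the mapping, so the quality of the kernel's paging decisions directly determines performance.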
