I don't see the reason for the focus on the amount of memory, given the potential theoretical advantages of a unified memory pool that extends beyond DRAM.
PCIe Gen 4 SSDs have obscene data rates and tons of IOPS. In fact, "the whole SSD is now an extension of your RAM" is a selling point of the new consoles.
Is it as fast as having more DRAM in all possible scenarios? No, but with good memory management the real-world performance might very well be identical to a system with the same capabilities, and even better than traditional memory systems with larger non-unified pools.
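To make the "SSD as an extension of RAM" idea concrete, here is a minimal sketch in C (the file name and access pattern are made-up placeholders): map a big SSD-resident file into the address space and let the OS page pieces in and out on demand, so the application just sees one large pool.

    /* Sketch: treat a large SSD-resident file as part of the address space.
       "dataset.bin" and the access pattern are made-up examples;
       error handling is kept to a minimum. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("dataset.bin", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

        /* The whole file becomes addressable; pages are faulted in from the
           SSD only when touched, and evicted again under memory pressure. */
        const unsigned char *data =
            mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED) { perror("mmap"); return 1; }

        /* Touch one byte per MiB: each access is served from RAM (page cache)
           if resident, or pulled from the SSD transparently if not. */
        unsigned long sum = 0;
        for (off_t off = 0; off < st.st_size; off += 1 << 20)
            sum += data[off];
        printf("touched %lld MiB, sum %lu\n",
               (long long)(st.st_size >> 20), sum);

        munmap((void *)data, st.st_size);
        close(fd);
        return 0;
    }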
Well, DRAM still has 2+ orders of magnitude lower latency than SSDs. Whether that matters clearly depends on your application. On a desktop system, switching browser tabs and swapping the content in from the SSD will be fine. An ML application on a server accessing memory quasi-randomly won't be.
Using SSDs for ML is exactly what vendors like NVIDIA and AMD are trying to solve for, because that is a fallacy: you will never have as much RAM as you want or need relative to the size of your datasets.
It's the same fallacy as "RAM size impacts video editing"; it doesn't.
8K RED raw is 4.374 terabytes per hour, which works out to roughly 1.2 GB/s sustained. At that rate it doesn't matter whether you have 8, 16 or even 256 GB of RAM: if your storage is too slow to support it, you'll drop frames and/or have huge seek times. More memory won't really help you; there is no amount of prefetch that can bridge gaps that big.
There is a huge advantage to having a unified memory architecture where the application can address a single memory address space to get at its data, without having to perform memory copies, allocations and all that bookkeeping between your storage, CPU and GPU...
> It's the same fallacy as "RAM size impacts video editing"; it doesn't.
Computers are used for much more than just video editing... for fluid simulation, RAM definitely has an impact. And talking about (QLC) SSDs: if they run out of SLC cache, they are sometimes slower than HDDs. And I bet your video editing is faster if you have 4 TB of RAM.
Wouldn't having more RAM give the process more runway to prefetch data into RAM before it's needed, without new prefetches evicting older prefetched data that hasn't been processed yet?
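Something like a sliding prefetch window is what I mean; a rough C sketch (file name, window size and the "processing" are made up): hint the kernel about the next chunk while working on the current one, and explicitly drop chunks you are done with, so new prefetches don't push out data you still need.

    /* Sketch: sliding prefetch window over an mmap'd file.
       WINDOW and "frames.raw" are made-up example values;
       error handling omitted for brevity. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define WINDOW ((off_t)64 << 20)   /* work on / prefetch 64 MiB at a time */

    static void process(const uint8_t *chunk, size_t len) {
        (void)chunk; (void)len;        /* stand-in for real work */
    }

    int main(void) {
        int fd = open("frames.raw", O_RDONLY);
        struct stat st;
        fstat(fd, &st);
        uint8_t *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

        for (off_t off = 0; off < st.st_size; off += WINDOW) {
            off_t len = st.st_size - off < WINDOW ? st.st_size - off : WINDOW;

            /* Ask the kernel to start reading the *next* window while we work. */
            if (off + WINDOW < st.st_size) {
                off_t next_len = st.st_size - (off + WINDOW);
                if (next_len > WINDOW) next_len = WINDOW;
                madvise(base + off + WINDOW, (size_t)next_len, MADV_WILLNEED);
            }

            process(base + off, (size_t)len);

            /* Done with this window: mark it droppable, so eviction hits data
               we have already processed rather than data we still need. */
            madvise(base + off, (size_t)len, MADV_DONTNEED);
        }

        munmap(base, st.st_size);
        close(fd);
        return 0;
    }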
OK, I misread the numbers somehow. This is interesting - I wonder what it means for the future of conventional desktop computing, considering peripheral chips like GPUs that already have DMA capability.
GPUs are moving beyond DMA. DMA still needs memory copies and/or address-space translations, which add latency and slow things down. With unified memory, in theory you have a single address space and cache coherency across everything, which means you can operate at the maximum bandwidth of all interfaces without wasting cycles.
It's not just Apple; we have multiple other examples where a unified memory pool outperforms larger non-unified memory pools, purely because of the overhead of copying memory and managing multiple address spaces, with all the translation and lookups that requires.
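As a loose analogy in plain C (this is not any real GPU or driver API; submit_copy/submit_unified and device_buf are made-up names), the overhead being described is essentially the staging copy you pay in the non-unified model versus just handing the same pointer to every agent in the unified one:

    /* Sketch of the copy overhead in a non-unified model vs a single
       shared address space. All names here are illustrative only. */
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Non-unified: data must be staged into a buffer that lives in a
       separate pool/address space before the other agent can use it. */
    static void submit_copy(const uint8_t *src, size_t len) {
        uint8_t *device_buf = malloc(len);  /* allocate in the "other" pool */
        if (!device_buf) return;
        memcpy(device_buf, src, len);       /* the extra copy and its latency */
        /* ... kick off work against device_buf ... */
        free(device_buf);
    }

    /* Unified: one address space, so the same pointer is valid everywhere
       and no staging copy or per-transfer bookkeeping is needed. */
    static void submit_unified(const uint8_t *shared, size_t len) {
        (void)len;
        /* ... kick off work against `shared` directly ... */
        (void)shared;
    }

    int main(void) {
        static uint8_t data[1 << 20];       /* 1 MiB of example data */
        submit_copy(data, sizeof data);     /* pays allocation + memcpy */
        submit_unified(data, sizeof data);  /* just passes the pointer  */
        return 0;
    }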
We have cheap consumer SSDs today offering 5 GB/s of read performance and more than enough I/O to satisfy caching.
This is absolutely a factor, though one I didn't focus on in this post. The sorts of large binary blobs that can take a lot of memory tend to be, well, large and contiguous, so perfect for moving between SSD and RAM at maximum speed.
While I haven't seen benchmarks yet, the claim was 2x relative to their current systems, IIRC, which were already crazy fast at >2 GB/s.
So it would probably do OK even with plain swapping, but really, really well when moving these large blobs in and out, for example using mmap() and madvise().
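Roughly the pattern I have in mind, as a sketch (the file name and the "processing" step are placeholders; error handling trimmed): map the blob, hint the access pattern, flush it back out when done.

    /* Sketch of "move a big blob in, work on it, move it out"
       using mmap()/madvise()/msync(). "blob.bin" is a made-up example. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("blob.bin", O_RDWR);
        struct stat st;
        fstat(fd, &st);

        /* Map the whole blob read/write, backed by the file on the SSD. */
        uint8_t *blob = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);

        /* Hints: we will stream through it once, and we would like it
           read ahead at full sequential speed. */
        madvise(blob, st.st_size, MADV_SEQUENTIAL);
        madvise(blob, st.st_size, MADV_WILLNEED);

        for (off_t i = 0; i < st.st_size; i++)
            blob[i] ^= 0xFF;               /* stand-in for real processing */

        /* "Move it out": flush dirty pages back to the SSD, then unmap. */
        msync(blob, st.st_size, MS_SYNC);
        munmap(blob, st.st_size);
        close(fd);
        return 0;
    }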
PCIe Gen 4 gives you sequential reads at 5 GB/s and even random reads well above 4 GB/s; this isn't a page file on an ATA-133 drive.
In theory this can also be extended over Thunderbolt/USB4, and of course it can be extended over the network (though that's the slowest part). It's going to be quite a big thing, especially for use cases like video editing, if they offer true unified memory, since a lot of the latency doesn't necessarily come from the interface itself but from memory copies and from translating between multiple address spaces.
Does shelf life play a role in this? My understanding was that SSDs have a short lifespan [0], so a focus on memory might be better for long-term planning and reliability.
Also, the big culprit in shortening the life of an SSD is writes. Unified memory doesn't mean you have to write to the SSD more often; in fact it means the opposite. Unified memory isn't a page file.
I'm curious whether the same drawback mentioned for shared memory applies to unified memory:
> A side effect of this is that when some RAM is allocated for graphics, it becomes effectively unavailable for anything else, so an example computer with 512 MiB RAM set up with 64 MiB graphics RAM will appear to the operating system and user to only have 448 MiB RAM installed.
Mixing tracing GC with other forms of memory management is an open research project. That doesn't mean it hasn't been "done"; for example, Google has talked about getting their GCs to interoperate in Chrome, but it's (a) really hard and (b) so far the results are, at best, "mixed".
Let's see when Swift catches up with the .NET 5 improvements.
Apple's talk about RC over GC is just marketing speak for the failure to ship a tracing GC for Objective-C that wouldn't crash left and right when integrated with C-like code.
It was a very sound decision given Cocoa semantics and the difficulty of making anything written in C not fall apart with segfaults, but let's not oversell it.
Likewise, Swift's RC makes sense given the need to integrate with the Objective-C runtime and the existing ecosystem, but again, that is all there is to it.
There is no RC implementation with performance comparable to tracing-GC languages that isn't, given the amount of runtime support needed to make it actually fast, effectively just yet another tracing GC.
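To illustrate the "amount of runtime support" point: naive reference counting in C looks roughly like this (object_t and friends are made-up names), and every retain/release is an atomic read-modify-write on the object itself, which is exactly the per-operation cost a production runtime spends a lot of machinery trying to elide.

    /* Minimal sketch of naive, non-deferred reference counting in C.
       Names are illustrative; a real runtime adds many optimizations on top. */
    #include <stdatomic.h>
    #include <stdlib.h>

    typedef struct object {
        atomic_size_t refcount;
        /* ... payload ... */
    } object_t;

    static object_t *object_create(void) {
        object_t *obj = malloc(sizeof *obj);
        if (obj) atomic_init(&obj->refcount, 1);
        return obj;
    }

    /* Every ownership transfer pays an atomic RMW; on hot, shared objects
       this contends on the cache line, overhead that tracing collectors
       largely avoid because they trace instead of counting. */
    static void object_retain(object_t *obj) {
        atomic_fetch_add_explicit(&obj->refcount, 1, memory_order_relaxed);
    }

    static void object_release(object_t *obj) {
        if (atomic_fetch_sub_explicit(&obj->refcount, 1,
                                      memory_order_acq_rel) == 1)
            free(obj);
    }

    int main(void) {
        object_t *obj = object_create();
        if (!obj) return 1;
        object_retain(obj);   /* another owner appears */
        object_release(obj);  /* ... and goes away     */
        object_release(obj);  /* last owner: freed     */
        return 0;
    }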
And once again, tracing GCs do well in microbenchmarks where you only measure the cost of local operations. They are horrible when you take the global effects into account, with those local benchmarking advantages not translating into real-world use.
No, we don't agree, unless you can make a case for how the integration of tracing GC alongside stack, global and off-heap memory allocation in D, Active Oberon, Go, Modula-3, Eiffel, .NET, Nim, and several others happens to be a research project.
But none of these performance measurements will matter if users don't actually own their devices; without being able to run applications without Apple's permission and knowledge, these devices will be ignored by many privacy-conscious computer scientists from the get-go.