The on-stage benchmark shows a write throughput of about 2GB/s, which is a much more modest improvement over the ~500MB/s of a modern consumer SSD (and perfectly within reason for a PCIe x2 interface).
The second thing is that Intel has been claiming multiple GB/s of throughput as well. They really do believe this will be a replacement for DRAM on some platforms. As in, you read into your L3 cache from this stuff and you flush to it when you write out a dirty cache line. And while that will make the overall system slower, it gives it literally instant stop/start capability if the key parts of your architecture are static (can retain data at 0 clock). What that means is a laptop that can turn itself off while waiting for sectors to read in from the disk, or packets to come off the network, or keys to be pressed by the user. Non-illuminated run times measured in days on a battery rather than hours.
Imagine 1.2TB of this stuff on the motherboard substituting for DRAM. So you've got every application and all the data for your applications already "in memory" as far as the chip is concerned. App switching? Instant. App data availability? Instant. Quite a different experience than what we have today.
Yeah, no. It might be able to turn off the CPU, but everything else would have to keep running. Graphics hardware does not deal well with frequent state changes. Network hardware would still need to run to keep its clock in sync with the signal.
So, you get to turn off a handful of components, which is where we already are today - CPUs spend most of their time in deep sleep states. USB devices and chipsets implement sleep states. Graphics hardware clocks down significantly.
For example, I only have to wait at most one second for an app to load, if it's not already open and minimized. It's comparable to the time it takes me to click a button. And my computer is rebooted only once every few months, so it's always up.
This improvement will be more meaningful for random access database operations.
EDIT: Does not seem to be.
“You could put the cost somewhere between NAND and DRAM. Cost per bit, it’s likely to be in between them somewhere. But actual cost will result from the products we bring to the marketplace.”
CAS latency for DDR3-3300 is less than 1/3ns, according to the article below. If Intel's new Optane memory can achieve addressing on the order of 10ns, it will be very competitive with DRAM, since all of the other overhead along the data path ought to be similar.
On top of that, the CAS latency is given in cycles -- the cited CAS latency for DDR3-3300 is 16 cycles. Multiplying # cycles by time per cycle yields a value closer to 9-10ns. Note that this is the "first word" latency in this table.
- The other thing you may be talking about is this "bit time", which looks like it's simply inverse throughput. It's very important not to conflate throughput with latency.
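The cycles-times-cycle-time arithmetic can be checked directly; a quick sketch using the DDR3-3300 figures cited in this thread:

```python
# Check the DDR3-3300 latency arithmetic from the comments above.
# CAS latency is quoted in cycles; the I/O clock runs at half the
# transfer rate (DDR = two transfers per clock).
transfers_per_sec = 3300e6        # DDR3-3300: 3300 MT/s
clock_hz = transfers_per_sec / 2  # 1650 MHz I/O clock
cycle_ns = 1e9 / clock_hz         # ~0.606 ns per cycle

cas_cycles = 16
first_word_ns = cas_cycles * cycle_ns   # ~9.7 ns, not "1/3 ns"

bit_time_ns = 1e9 / transfers_per_sec   # ~0.303 ns: inverse throughput

print(f"cycle time:     {cycle_ns:.3f} ns")
print(f"first-word CAS: {first_word_ns:.1f} ns")
print(f"bit time:       {bit_time_ns:.3f} ns")
```

The ~0.3ns "bit time" is where the "less than 1/3ns" figure comes from, but it is a throughput number; the actual first-word latency is the ~10ns figure.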
I believe that they are using fairly best-case latency numbers, under the assumption that the memory accesses are close to sequential. For random memory accesses, as you note, the latencies are higher. Unfortunately the article doesn't go into detail on what Optane's worst-case latencies are, nor what the latencies will be like in a real functional system (mostly because Intel has only early prototypes to show).
For sequential access, giving a latency number doesn't make sense - you need to talk throughput. You can get upwards of 20GB/s with a single core for DRAM. Here you're bounded at least by the x4 PCIe interface (more likely by the Optane device itself), so maybe 3GB/s if you're feeling generous.
Assuming we keep the file system model, I'm guessing some kind of direct memory mapping is in order? Anyone know what's ahead of us on the software side, to take advantage of this kind of latency?
It would be fairly trivial to layer a conventional filesystem on top of such a persistent object store to get the best of both worlds, which would - amongst others - give you instantaneous suspend and re-awaken and other goodies.
If true this will vastly change the way we use computers.
A similar approach has been taken with HP's interesting memristor-based "Machine"
There's a lot of interesting stuff in Organick's book on Multics, not all of which was implemented, unfortunately. And a lot of really good stuff was tossed overboard when fitting Unix into a PDP-7 (a lot of overwrought bad stuff was jettisoned too -- don't get the wrong idea!).
Short term, the disk controller becomes a peripheral on the memory bus. On a 64-bit x86-64 system, the top 16 bits of the address are either 0000 or ffff for RAM. Make it so that the prefix 1000 (for example) maps to the disk, so accessing (physical) address 1000000013371000 accesses byte 13371000 on the disk.
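The address carving could be sketched like this (purely illustrative; the 0x1000 prefix and the 48-bit split are just the example values from this comment):

```python
# Carve the 64-bit physical address space as described above: the top
# 16 bits select the device, the low 48 bits address bytes on the disk.
DISK_PREFIX = 0x1000 << 48          # the example prefix from the comment

def disk_phys_addr(disk_byte):
    """Physical address that aliases byte `disk_byte` on the disk."""
    assert disk_byte < (1 << 48)    # 48 bits of disk offset available
    return DISK_PREFIX | disk_byte

addr = disk_phys_addr(0x13371000)
print(hex(addr))                    # 0x1000000013371000
```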
Now processes can just ask the OS to perform a physical memory mapping to obtain a range of virtual addresses directly backed by disk pages, with page protections set based on their filesystem permissions. Such physical address mapping interfaces already exist in most OSes to support memory mapped I/O (for example, mapping /dev/mem in Linux).
This addressing scheme has another advantage: other devices on the system can use e.g. DMA to directly talk to the disk without any CPU intervention. For example, the GPU could load textures straight off of disk, just like John Carmack wants.
Medium term, we start rethinking the filesystem. If we make the address range for a given disk completely persistent, we can just put pointers to disk bytes on the disk itself. Processes will use the same virtual addresses as the physical addresses when talking to the disk. Suddenly "serialization" to disk is no longer required: data structures can be stored in native form directly on the disk. Imagine having a "dmalloc" function call hand you a chunk of persistent storage which you treat the same as any memory, but which can outlive the process. Similar concepts exist in some languages (like MUMPS), and now we bring the idea to all programming environments.
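A toy "dmalloc" in that spirit, faking the persistent region with a memory-mapped file (everything here is made up for illustration; a real persistent allocator would need crash-safe metadata and recovery):

```python
# Toy persistent allocator: hand out chunks of a memory-mapped file so
# data outlives the process without an explicit serialization step.
# This sketch just bump-allocates; the bump pointer lives at offset 0.
import mmap
import os
import struct

POOL = "pool.dat"
SIZE = 1 << 20                         # 1 MiB persistent pool

def open_pool():
    new = not os.path.exists(POOL)
    fd = os.open(POOL, os.O_RDWR | os.O_CREAT)
    os.ftruncate(fd, SIZE)
    mem = mmap.mmap(fd, SIZE)          # MAP_SHARED by default on Unix
    if new:
        struct.pack_into("<Q", mem, 0, 8)  # first 8 bytes: bump pointer
    return mem

def dmalloc(mem, n):
    """Allocate n persistent bytes; returns an offset into the pool."""
    top = struct.unpack_from("<Q", mem, 0)[0]
    struct.pack_into("<Q", mem, 0, top + n)
    return top

mem = open_pool()
off = dmalloc(mem, 16)
mem[off:off + 5] = b"hello"            # a plain store, no write() call
mem.flush()
```

After a restart, reopening the pool hands back the same bytes at the same offsets, which is the "outlives the process" property without any serialization step.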
Long term, RAM ceases to be an independent entity, and merely becomes OS-managed cache for the big persistent storage (assuming it still has any latency/bandwidth advantages by this point). Now you can get rid of the notion of "shutting down" or "starting up" the system: everything is persistent. Without having to constantly refresh DRAM to keep the system alive, devices can "sleep/hibernate" more frequently and readily, saving significant power. Programming models become nearly unrecognizable as old models of memory management and process lifetimes give way to new models of persistent storage management and eternal services.
We're not far off from seeing a potential revolution in computing here.
That's not true. You'd still want Excel to just save your sheets as an xls file and show it to a coworker, as opposed to sharing the entire Excel program state. Upgrading from one version to another also requires a stable persistent data format for things such as configuration.
Not to mention that you still need serialization to communicate over a network.
In my original comment, I simply meant that you would no longer have to serialize your data to your own disk.
From the layperson descriptions of how the old tube-based systems were operated, it seemed to me that some of the old architectures had no notion of files at all; it was all RAM, all the time. I also read stories from when rotating media first started being used: programmers wrote data not sequentially on the media but in an optimized pattern, so that by the time the CPU pulled a value into RAM, the read head was positioned multiple units over. Data was written out according to the demands of the hardware, and now we're talking about taking the CPU out of most of the loop.
We aren't coming full circle with the new memory architectures we're exploring, but kind of a spiral in a "history doesn't repeat but rhymes" manner by dusting off some old techniques and putting new spins on them, pun not intended.
If always-persistent state becomes popular, then I wonder if that will revive the popularity of image-based environments, a la Smalltalk and Lisp Machines, along with the collaboration challenges those introduced.
http://spdk.io - for block device form factors
Obsolete might not be the right term, but there is essentially no way that open/read/write can take proper advantage of this kind of low latency storage. We are going to need a new way to abstract and interface with this persistent storage.
Anyway, what I was thinking of was similar to call gates, which were phased out for being slow. Probably just making syscall faster in the CPU and reducing the kernel-side overhead would be enough.
The mmap() model is good. The current page fault mechanism, not so much.
As a medium for offline, offsite backup, it still reigns supreme. Density, reliability, and durability remain high versus platter-based drives or SSDs.
I know of at least two companies that keep tape going for compliance reasons. At the end of each year, they do a pretty large archival snapshot, and send it off to storage. It will remain there till some lawyer asks for it, and will most likely end up copied by a third party for the sake of integrity.
For one of them, the archive tapes are becoming a concern: the media is fine, but finding hardware is going to become a problem.
My experience with them tells me they are not the most tech-savvy bunch. If they ever need to use the backups with the software they were provided, decades ago or by a modern fail-fast agile company (according to taste), then even if they can get the source, it will be very hard to compile and run it. Reproducible builds are not exactly a solved problem.
Indeed, but we are getting there (over 90% of packages from Debian stretch amd64 and 99.9% of OpenWRT packages):
...I'll see myself out
Plus, since SSDs aren't using the latest process technology (as far as I'm aware), they still have a few more years of Moore's law ahead of them. Shrinking transistor size means increased density (and at lower cost, once you recoup the cost of the mask).
Nope. SSDs have already hit the wall and dug into it a ways. Everybody is doing 16nm/15nm NAND flash and struggling to get ~40nm 3D NAND out the door, but they can't beat the $/GB of planar NAND. Sub-20nm TLC flash has had serious problems with the data just plain leaking out of the memory cell, to say nothing of the scary low program/erase cycle counts. The only thing that really kept the SSD market advancing over the past year was the widespread adoption of better error correcting codes to cope with lower quality flash.
PCI Express is packetized and it takes a little over 1 nanosecond per byte for the 3.0 version. I believe the minimum packet size is 20 bytes, but I'm not positive on that. It seems like the latency could never be less than 2 × 20 = 40 nanoseconds for a random access (request out, completion back), and this is before the latency of the drive itself is factored in. Surely there will be a few clock cycles required for the drive hardware to decode the PCIe transaction and act upon it.
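Taking the ~1 ns/byte and 20-byte minimum packet figures above at face value, the floor works out as (a rough sketch, not exact PCIe accounting):

```python
# Minimum wire time for a random read over PCIe 3.0, per the figures
# in the parent comment: a request packet out plus a completion back.
ns_per_byte = 1.0        # ~1 GB/s per lane direction, i.e. ~1 ns/byte
min_packet_bytes = 20    # assumed minimum TLP size from the comment

round_trip_ns = 2 * min_packet_bytes * ns_per_byte
print(round_trip_ns)     # 40.0 ns before the drive itself responds
```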
That being said, any latency improvement of the drive _will_ directly reduce the effective latency in the end, so it's all good news to me. I'm just curious about the 1000x figure that's being referenced everywhere.
I wonder if they will position these as replacements for your SSD, or if they are going to pitch them as a caching layer, the way they dabbled in hybrid SSD/HDD approaches.
Good read about consumer SSD performance:
Improved 4k random read and write would of course still be a big win.
Intel always does this and Optane won't be any exception.
The "1,000x" claims are either at the chip level (e.g. NAND block erase vs. XPoint word write) or maybe some access pattern that is worst-case for flash. It's physically impossible to get 1,000x improvement on common access patterns.
It is latency.
I think this is going to be amazing for "in-memory" databases: it's cheaper, still very fast, has higher capacity, and is persistent (the article doesn't even mention that). But it won't replace DRAM any time soon imho.
The applications will mostly be on heavy data side like medical research and weather patterns, but it could also involve 3D photogrammetry taking in a trillion points.
What will the 2060 Instagram be?
I wonder how it compares to the memristor, which also has a sub 90 ns switching time and pretty low power. Optane blows that out of the water, if it's not vaporware.
I'm curious, why do you say that the open/read/write model can't work? I suppose that current software would need to be optimized to take advantage of the better performance, but that's about all it is.
500MB/s 1000x faster would be 500GB/s, or about 20x faster than standard RAM.
That puts the claim in perspective :)
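Spelling out that arithmetic (the DRAM bandwidth figure here is my own ballpark for a dual-channel system, not from the article):

```python
# Scale the 500 MB/s consumer-SSD figure by the headline "1000x" claim
# and compare it against a ballpark DRAM bandwidth.
ssd_gb_s = 0.5                   # 500 MB/s consumer SSD
claimed_gb_s = ssd_gb_s * 1000   # the "1000x" headline: 500 GB/s
dram_gb_s = 25.0                 # assumed dual-channel DRAM ballpark

print(claimed_gb_s)              # 500.0
print(claimed_gb_s / dram_gb_s)  # 20.0
```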