Modern DDR4 RAM quad channel: 50,000MB/s
Optane SSD: 2,000MB/s
Also, nothing on the PCIe bus is going to have RAM-like latency as seen by the OS, unless it's mapped as RAM.
I think you could get a lot of work done with a 750GB RAM setup (two of these, or one of the larger ones coming later) and a dual-core Opteron.
Expand the tables here:
High-end server CPUs such as Intel Broadwell support quad-channel DDR4-2400 (PC4-19200), which is 76.8 GB/s, a lot more bandwidth than the Optane.
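For reference, a rough sketch of where that 76.8 GB/s figure comes from (nominal peak numbers; real-world throughput will be lower):

    # DDR4-2400 quad channel, back-of-the-envelope peak bandwidth
    transfers_per_sec = 2400e6   # DDR4-2400 = 2400 MT/s per channel
    bytes_per_transfer = 8       # 64-bit channel width
    channels = 4
    peak = transfers_per_sec * bytes_per_transfer * channels
    print(peak / 1e9)            # -> 76.8 (GB/s)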
The SemiAccurate article looks very fair to me.
And the latency improvement seems to be 2x, not the 10x you claim, or the 1000x Intel originally claimed. (Intel's own numbers compare 20µs for flash versus 10µs for XPoint...) Which isn't nothing, to be sure, but...
It mentions that latency is below 100 microseconds 99.999% of the time, but that's not more than 100x faster than a rotational HDD, albeit with less variance.
We've always made a distinction between memory and disk. Much of computer science is about algorithms that recognize the difference between slow, persistent disk and fast, volatile RAM and account for it somehow.
What happens if the distinction goes away? What if all data is persistent? What if we can perform calculations directly on persisted data without pulling it into RAM first?
My guess is that we'll start writing software very differently. It's hard for me to predict how, though.
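One concrete flavour of "differently": compute directly over the persisted bytes instead of deserializing them into separate in-memory structures first. A minimal sketch of that idea, using Python's mmap as a stand-in for real persistent-memory APIs and a hypothetical data file:

    import mmap, struct

    # Treat a persisted file of little-endian uint64 counters as the working
    # data set and sum it in place; no explicit read() into a separate RAM copy.
    with open("counters.bin", "r+b") as f:          # hypothetical data file
        with mmap.mmap(f.fileno(), 0) as buf:
            total = sum(struct.unpack_from("<Q", buf, off)[0]
                        for off in range(0, len(buf), 8))
    print(total)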
The RAM is no longer fast: unless cached, it takes around 150 CPU cycles to access the RAM.
The RAM is no longer byte addressable. It’s closer to a block device now, the block size being 16 bytes for dual-channel DDR, 32 bytes for quad channel.
Too bad many computer scientists who write books about those algorithms prefer to view RAM in an old-fashioned way, as fast and byte-addressable.
For most practical purposes on x86 computers, I believe the block size to consider should be at least a cache line, so 64 bytes.
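A quick way to see that effect (a rough sketch using numpy; exact numbers vary by machine): traversing a row-major array along rows pulls 8 doubles per 64-byte line, while traversing it down columns pays a fresh line fetch for every element.

    import numpy as np, time

    a = np.ones((4096, 4096))             # row-major (C order), ~128 MB of doubles

    t = time.perf_counter()
    s1 = sum(a[i, :].sum() for i in range(4096))   # row-wise: contiguous 64-byte lines
    rows = time.perf_counter() - t

    t = time.perf_counter()
    s2 = sum(a[:, j].sum() for j in range(4096))   # column-wise: 32 KB stride, new line per element
    cols = time.perf_counter() - t

    print(rows, cols)   # column-wise is typically several times slower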
I now think the next target ought to be defined in terms of latency and power, in particular with IoT requirements.
Microseconds, Milliwatts, Millions of endpoints.
or, for storage: Microseconds, Millions of IOPS, Multi-parity
or, for compute efficiency and IoT goals:
Milliwatts as a constraint for a compute benchmark
Millions of endpoints / processes or cores on a network topology
Microseconds as a generic measure of access to resources: whether the memory sits on another node or in local store, the delay ought to be within the same order of magnitude as a target.
My purpose in all of that is: should there be - in fact, can there be - consideration of architectures and topologies that let us aim for linear cost in addressing complexity at all, as a goal, or has that been lost already?
I mean "that" and "lost" very vaguely, being a non-expert, but my question really is: should I assume there are no effective gains to be had, when designing for an IoT-style or massively networked future, from the way memory is addressed? Or has the complexity we have been introduced out of necessity, and will it stay for practical reasons, so that the idea of low-latency, low-power, ad hoc IoT "grid computing" is smoke in my pipe?
Don't forget the 8-deep prefetch buffer. The real block size is more like 128 bytes for dual channel.
It'll be wonderful if we actually can get some simplifications in our byzantine architectures, although - I wonder if it doesn't all boil down to our "RAM" (fastest storage) becoming "smarter" - as CPUs manage the three levels of cache - and we "only" lose the "slow disk/SSD" on the far end. CPU cache becomes the new RAM, and cache-management microcode becomes our new memory allocator...?
While you can't easily write into other processes anymore, the barrier is more of a suggestion again.
> Tests show that simple ECC solutions, providing single-error correction and double-error detection (SECDED) capabilities, are not able to correct or detect all observed disturbance errors because some of them include more than two flipped bits per memory word
But anyway - even if ECC was perfect at preventing the issue, lots of hosts do not use ECC. That includes practically every desktop.
Sounds like a great use-case for lmdb :)
But ComodoHacker's comment made me think yet again: why was ACID compliance ever an issue?
I was reliving some nostalgia at the weekend, explaining to a friend how excited I got when Microsoft Transaction Server came bundled with NT.
That was my "free" ACID transactions.
It was cross-platform then, too (or at least multi-arch, and advertised to play nice with things like IBM CICS, which may have been much of the point of the bundling: to win deals, even in checkbox tallies).
MTS is one of the few OS-level dependencies SQL Server has. But I think the cross-platform origin I recall was borne out when SQL Server on Linux appeared.
I have all this time been confused.
I get it, if you don't need ACID, and have other design reasons, go ahead.
But, well, I just always saw it as a problem solved, or plug-and-play solvable, thanks to MTS. Getting a book on MTS is what convinced me NT was serious and our business should take heed. If you're a small shop, have margins, can work with fat servers - or, as we did, fundamentally scale through transaction routing in the first place - NT (plus the variations of services for Unix, now for Linux) can be a happy place.
Incidentally, I believe the middleware/driver that Intel is shipping with Optane is the work of ScaleMP.
This is their Flash Expansion description: http://www.scalemp.com/products/flx/
We've used their product very happily - it may be a good fit especially if you are staging oversize DBs you intend to shard, but need them up behind a connection while you test, which was our need.
N.B. ScaleMP has a free tier which may also fit your needs.
Not meaning to shill, but I never know why I don't hear of them more, our experience was absolutely satisfying.
edit: typos; "free"/free
edit: removed "very" from "may be a very good fit" about ScaleMP - it felt so for us, but I can't say why anything that works to meet a specific need is qualitatively better when doing the job is a binary y/n.
And to add: if products like Optane now eliminate the performance cost that has been used to justify discarding ACID transaction compliance, will that not upset a few conceptual applecarts? I mean, I think some very loose and fast arguments have been made around non-ACID compliance and data and databases generally over recent years, and the performance-cost trade-off argument has, to my eye, often masked logic elsewhere that needs attention.
I must admit that until looking into it more closely I figured it was no different from the memory map of a micro. Silly me.
Should devices like tablets, with enclosed batteries, be required to have physical circuit breakers for power, as a security measure, if compromised?
We realized that we had to have a planned action to block MACs from the network, in response to any device having questionable integrity.
Even a small Faraday cage was considered - WiFi isn't the only radio on portables: laptops without removable batteries can hop onto LTE via VPN... so that was another policy, with a script set to shut them down.
Persistence - in this case of processes, or simply not being able to remove a battery - is a threat that, with good reason, shocked us, because it is so ubiquitous. I believe the moment an unscheduled reboot or shutdown occurs, at the very least all LAN/WAN access needs to be automatically cut.
There are workarounds, like adding a layer of faster, cheaper ram, but then this starts to look like a big perf improvement in a rather traditional system.
A while ago (a few years?) there was some buzz about a new kind of machine from HP (maybe based on memristors? which were also in the news around then, IIRC), that was supposed to do away with that distinction. They called it The Machine or something like that (:-). I did not follow it at the time, after the initial read, so don't know what happened to it.
"BUD17-503 The HPE Machine and Gen Z" at Linaro Connect 2017
Googled his name and found this:
A look at The Machine [LWN.net]:
 On a side note, Motif was pretty powerful. I remember a colleague of mine who was also in that course, creating a rudimentary app like MS Paint, in the class, in just half an hour or so - as the instructor was teaching, in fact. (Maybe he had studied it some before, of course.) As he demoed it to us, he went "Foo!" :)
The main obvious advantage of using non-volatile RAM across the board is that it won't consume power all the time, like DRAM.
This page says 100ns: https://gist.github.com/jboner/2841832
This one says 60ns: http://stackoverflow.com/q/4087280/126995
I think the difference comes from the latency of an operation from the CPU's point of view vs. the actual memory/drive latency. In the article they seem to be talking about the drive latency.
On DDR3, the full latency of a miss to RAM is, in a low-bandwidth scenario (a rough worked example follows the list):
latency of the full cache system, typically expressed as the L3 latency
+ latency of opening a row (Trcd)
+ latency for reading from an open row (Tcas)
+ latency of passing data from memory controller to L1 cache
+ latency of closing a row (Trp)
+ latency of opening a row (Trcd)
+ latency of reading from an open row (Tcas)
+ latency of passing data to L1
+ time remaining waiting out the minimum allowed row active time of the previous memory access (Tras)
+ latency of closing a row (Trp)
+ latency of opening a row (Trcd)
+ latency of reading from an open row (Tcas)
+ latency of passing data to L1
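Collapsing the list above into the three common cases, a rough worked example (assumed DDR3-1600 11-11-11 timings and an assumed ~30 ns for the full cache-miss path: L1/L2/L3 lookups, controller queuing and returning the line; real parts vary):

    # Memory clock for DDR3-1600 is 800 MHz -> 1.25 ns per memory clock cycle
    tck  = 1.25
    trcd = 11 * tck      # open a row
    tcas = 11 * tck      # read from the open row
    trp  = 11 * tck      # close (precharge) the row
    overhead = 30.0      # assumed: cache hierarchy + controller + data return, in ns

    open_page_hit    = overhead + tcas                 # right row already open
    closed_page_miss = overhead + trcd + tcas          # row must be opened first
    row_conflict     = overhead + trp + trcd + tcas    # wrong row open: close, open, read

    print(open_page_hit, closed_page_miss, row_conflict)
    # roughly ~44, ~58, ~71 ns -- in the ballpark of the 60-100 ns figures quoted above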
Similarly, the difficulty in large data problems is often more of getting the data in the first place. This is often about coordinating the data creation of many machines and systems.
So, for most of us, this shift will not be huge. Because it is mostly irrelevant. Even systems that do have this much data often benefit from algorithms that don't rely on directly accessing and processing all of it in one shot. If only for resilience. (That is, if a system failing doesn't mean reprocessing all of the data, restarting and recovery is often faster and easier to deal with.)
Sometimes it seems that the diff between a CPU and a cluster is the suffix put on the latency times.
Hm, 67 ns per packet.
So even this local storage would be the drag in many scenarios...
Not sure drive is quite the right word, any more. Has a better nomenclature settled yet?
This is a huge deal. SSDs started it. It always seemed wrong to me to address memory through a disk interface.
The main issue is that the re-design of software for permanent non-volatile storage is huge. It goes back decades, and there are many design decisions that need rethinking.
Why? ORMs are essentially systems to convert data structures. Adding a new layer of memory does nothing to address the problem.
The article linked here says "3D XPoint has about one thousandth the latency of NAND flash" but I don't see any actual evidence for that. The paragraph that says it is followed by a link to actual specs for a "3D XPoint" device, saying: "the Intel flash SSD has a 20-microsecond latency for any read or write operation, whereas the 3D XPoint drive cuts this to below 10 microseconds." which sounds to me more like a 2x latency improvement than a 1000x improvement.
So I ask the following extremely cynical question. Is there any evidence available to us that's inconsistent with the hypothesis that actually there is no genuinely new technology in Optane? In other words, have they demonstrated anything that couldn't be achieved by taking existing flash technology and, say, adding some redundancy and a lot more DRAM cache to it?
[EDITED to add:] I am hoping the answer to my question is yes: I'd love to see genuine technological progress in this area. And it genuinely is a question, not an accusation; I have no sort of inside knowledge here.
Keep in mind that this first Optane product is a NVMe SSD. The latency overhead of PCIe and NVMe is usually about 4 microseconds minimum, as measured by reading from a SSD that has just been secure erased and thus doesn't have to actually touch the non-volatile memory in order to return a block full of zeros. This Optane SSD has a best-case latency that is only a few times better than the best case for NAND flash SSDs. This does not mean that the underlying 3D XPoint memory doesn't have a far bigger latency advantage when accessed directly by a capable memory controller, but 3D XPoint DIMMs will be next year.
That XPoint is byte-addressable is rather impressive, as the circuitry and metal layers (wires) needed for this are a lot more than for page-addressable NAND.
The true test is when they connect it directly as DIMMs versus over the PCIe bus. Latency numbers there may further prove fundamental improvements in the technology.
I have no idea, but some concern how that might affect latency.
For instance, take a look here: http://www.intel.co.uk/content/www/uk/en/architecture-and-te... -- lots of stuff about Optane, always just called Optane. There's a link at the bottom to info about "3D XPoint" but no explicit statement of the relationship between the two.
Also at the bottom of that page, a link to a video called "Revolutionizing the Storage Media Pyramid with 3D XPoint Technology". OK then. What does it say? It talks about DRAM, flash and spinning rust; then it says Intel is introducing new things; first, "DIMMs based on 3D XPoint technology" (OK, but that isn't what they're releasing right now), and then -- these are the exact words -- "Intel Optane SSDs, based on 3D XPoint technology and other Intel storage innovations" (emphasis mine). Hmmmmm.
I really hope my cynicism is misguided. But so far, everything I've seen seems to be consistent with the following story: Intel begin by announcing a new hardware technology called "3D XPoint", which works quite differently from existing flash memory and has amazing performance characteristics, and saying they're going to release products based on it under the "Optane" brand. They work on this technology but can't actually get it to work. But they need to release something. So they make the highest-performance thing they can based on existing technologies, release it under the Optane brand, and tread super-carefully to make sure they never quite say, in so many words, that this thing they're releasing actually uses the new technology they talked about before.
Now, mtdewcmu and you both say that existing tech can't actually deliver the performance Intel say this new product has, in which case it must after all be based on something genuinely new. Again, I really hope you're right. Has this performance profile -- whatever features it has that are impossible to replicate with existing technologies -- actually been demonstrated, or only claimed?
[EDITED to add:] Aha, no, looks like I'm either too cynical or not cynical enough. I found an Intel webpage -- http://www.intel.com/content/www/us/en/solid-state-drives/op... -- that actually does say, in so many words, that the P4800X uses 3D XPoint. So the story two paragraphs up isn't consistent with their current marketing materials, and I now think the story is more likely "3D XPoint doesn't work nearly as well as predicted" than "3D XPoint doesn't work at all yet and they're fudging".
I don't have access to the original planning calculations anymore, but 375GB at $1520 would definitely have been a game changer in terms of performance/$, and I suspect be good enough to make the end user feel like the entire dataset was in memory.
These SSDs have situational uses but unless you want 10+ TB in one server you can get a system with >50% as much actual RAM for the same price.
The probabilistic hyperloglog data type is also a game changer compared to say redshift, but again it's only viable if you are dealing with counting (estimating) unique entities across billions of rows and super-wide dimension sets.
If you are doing a general purpose analytics store, Redshift is hard to beat because of reliability and ease of implementation.
Druid is a purpose-built race car. Redshift is a good cross-over - far less headache, and it can do almost any job well enough, but you won't have the tuning or the performance (when tuned right) at scale. Although, I'm continuously impressed with what Redshift actually can do, despite the humble feature set.
Druid's main weakness is lack of SQL support, so it's not a great analyst datastore. You pretty much have to wrap it into a reporting app.
If I'm going to take on a similar project, I may POC memSQL or Citus DB, and possibly Big Query (if the project is built on Google Cloud as opposed to AWS or raw iron).
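For anyone who hasn't run into it, HyperLogLog is the trick behind those cheap approximate distinct counts mentioned above. A minimal sketch of the idea (no small/large-range corrections; hash choice and register count are arbitrary here):

    import hashlib

    p = 12                      # 2^12 = 4096 registers -> roughly 1.6% relative error
    m = 1 << p
    registers = [0] * m
    alpha = 0.7213 / (1 + 1.079 / m)   # standard bias-correction constant for large m

    def add(item):
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - p)                      # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)         # remaining bits
        rank = (64 - p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        registers[idx] = max(registers[idx], rank)

    def estimate():
        return alpha * m * m / sum(2.0 ** -r for r in registers)

    for i in range(100000):
        add("user-%d" % i)
    print(estimate())   # within a few percent of 100000, using only 4096 small registers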
This is definitely going to change the way you build your computational fabric. Putting that much data that close to the CPU (closeness here measured in ns) makes for some really interesting database and distributed data applications. For example, a common limitation in MMO infrastructure is managing everyone getting loot and changing state (appearance, stats, etc.). The faster you can do those updates consistently, the more people you can have in a (virtual) room at the same time.
EDIT: Don't mistake me, I'm very excited about the potential of Optane devices in the workloads my databases handle (though, since my postgres machines are all in the cloud right now, that remains a purely theoretical question). It's just not a panacea.
Then again, nothing is.
I don't think it works like that. There is always some point where, at a certain number of users, avoiding downtime becomes more important than handling the load, and hardware performance stops mattering as much.
Its blend of memory and storage would be a boon to dynamic indexing and common table expressions.
Maybe this new technology will bring single-level storage, as IBM has employed on some machines, to the public - where all storage is treated as one resource and only the machine knows the difference.
Also SaaS means that one provider holds data of thousands of companies. Salesforce, for example, has more than 100,000 customers. So the scale is still too big for one machine (or two or three for redundancy).
Correct and incorrect at the same time. Separate orgs may not need direct access to each other's rows in the multi-tenant database, but there are plenty of use cases where data does need to be shared between them (companies with multiple orgs, like ours; business partners; etc.), and Salesforce has tooling to handle this (Salesforce to Salesforce, Lightning Connect).
For example, if you want fast query returns, the speed of light will limit you if you're using one machine and your query comes from around the world. Then there's the whole high availability thing.
One single machine just doesn't fit every scenario.
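A rough sketch of the speed-of-light point (the distance is an assumption for illustration; real routes are longer and add switching and queuing delay):

    # Light in fiber travels at roughly 2/3 of c, about 200,000 km/s
    km_one_way = 10000              # assumed: roughly an intercontinental path
    fiber_speed_km_s = 200000.0
    one_way_ms = km_one_way / fiber_speed_km_s * 1000
    print(one_way_ms, 2 * one_way_ms)   # ~50 ms one way, ~100 ms round trip, before any server work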
Wear management could be moved to hardware, though, using a special MMU/wear monitor/remapping device. If you're using this thing as a level of memory below DRAM, viewing DRAM as another level of cache, something like that would be necessary. That's one application.
This device would make a good key/value store. MongoDB in hardware. Then you don't care where the data is physically placed, and it can be moved for wear management.
> This gives the drives much greater endurance than NAND of a comparable density, with Intel saying that Optane SSDs can safely be written 30 times per day, compared to a typical 0.5-10 whole drive writes per day.
I understand this as: to get the normal lifetime out of traditional SSDs you can get away with writing enough data to completely rewrite the device 0.5 to 10 times per day, whereas with the new device it's 30 times per day.
So the new technology is about 3 times better than the most durable SSDs (and far better than the weakest) with regard to write endurance.
I think the article would be better worded if it didn't leave out the most important number: 10 rewrites per day for how long before either the SSD or the new thing fails?
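To put the 30 DWPD figure in perspective, a quick sketch (the service window is my assumption, not a number from the article):

    capacity_bytes = 375e9      # the 375 GB model
    dwpd = 30                   # drive writes per day claimed for Optane
    years = 5                   # assumed warranty/service window
    total = capacity_bytes * dwpd * 365 * years
    print(total / 1e15)         # ~20.5 PB written over the period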
Not to mention the size factor and the lower power consumption of the chip itself.
Also, normal (NAND) NVMe M.2 SSDs are still TWICE as expensive as good old SATA ones, at least in my country… And they want to push Optane into the consumer space later this year… who even needs that much performance at home?
In the next ten years (closer to ten than not) we'll need it for massive and or hyper intricate 4k VR worlds, the assets of which you won't want to download every time you load up the world/s. You'll want to hold as many of the assets locally in something extremely fast, if not ram then the next best thing. That is until we commonly have 10gbps plus to the home.
I think more improvement is needed on the VRAM side than on the disk side, so HBM2 hype > Optane hype :)
This starts to make sense in a single server node with enough x4 PCIe interfaces. AMD Naples?
Memristors can make things more dense, but that applies to both options.
"Initial limited availability starts today, for $1520"
Interesting it seems to be marketed as cheaper memory. You'd think at first they'd try and rip super high margins out of banks/corps by selling it as "persistent" memory.
Although I guess if you're waiting for file writes in multiple locations, the network overhead makes the actual write sort of irrelevant...
But for use cases where data matters, fsync seems like a reasonable idea.
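For reference, the usual pattern (a minimal sketch in Python; the file name is made up): flush user-space buffers, then ask the OS to push the data to stable storage before acknowledging the write.

    import os

    with open("journal.log", "ab") as f:    # hypothetical write-ahead log
        f.write(b"committed txn 42\n")
        f.flush()                # push Python's buffer to the kernel
        os.fsync(f.fileno())     # ask the kernel to push it to stable storage
    # only after fsync returns is it reasonable to report the write as durable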
I think some of them are protected by both a battery and flash memory for long term backup.
I mean, how else would you get 80 MB/sec? http://web.archive.org/web/20010613164621/http://www.mncurti...