All of that means you have to code to it differently: it isn't DRAM, and it isn't Flash. We've done the work to integrate and test at Aerospike ( semi-plug, sorry ), so the system works.
The XPoint tech gets really interesting later in the year, when three things happen: the higher density drives show up, Dell ships their high-NVMe chassis, and more clouds support it.
Regarding cloud - IBM's Bluemix has announced Optane, and the other providers have a wide variety of plans. I can't comment further.
Finally, Intel has been clear about putting this kind of tech on the memory bus, and that really opens the doors to some interesting choices, some data structure changes. That requires new coding, we're on it.
Here's an interesting ComputerWorld article about our experience with Optane:
> That requires new coding, we're on it.
Can you give a rough example to guide my thinking? I understand OSes are considering ways that this has both traditional disk and RAM properties, and are sorting out storage subsystems of drivers, but I assume you're talking about something closer to user-level structures and algorithms?
There are some interesting talks out there about NAND, how it works, and how to optimize for it - I saw something here on HN a few days ago about writing your own time-series database, which got a variety of facts wrong but was an example of how to choose data structures that are NAND-reasonable. You can look up some of my YouTube talks and SlideShare decks, for example - I've been talking about this for a while.
At a high level, NAND has more IOPS than god, because there are no seeks. An old enterprise spindle literally does 200 to 250 seeks per second, while flash can read from 500,000 different random locations per second. Those numbers are so far apart that different user-level approaches are called for.
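To make that concrete, here's a minimal sketch (mine, not from any of the talks) of the user-level pattern flash rewards: keep many small random reads in flight at once instead of issuing them one at a time. It assumes Linux/POSIX and a pre-existing test file; the name data.bin is made up. Compile with gcc -O2 -pthread.

```c
/* Sketch: issue random 4 KiB reads from many threads at once.
 * A spindle serializes these behind one head (~200-250/s);
 * flash services them in parallel across its channels. */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define THREADS 16
#define READS_PER_THREAD 10000
#define BLOCK 4096

static int fd;         /* shared read-only file descriptor */
static off_t nblocks;  /* number of BLOCK-sized blocks in the file */

static void *reader(void *arg) {
    unsigned seed = (unsigned)(long)arg;  /* per-thread RNG seed */
    char buf[BLOCK];
    for (int i = 0; i < READS_PER_THREAD; i++) {
        off_t off = ((off_t)rand_r(&seed) % nblocks) * BLOCK;
        if (pread(fd, buf, BLOCK, off) < 0)  /* positioned read, no shared offset */
            perror("pread");
    }
    return NULL;
}

int main(void) {
    fd = open("data.bin", O_RDONLY);  /* hypothetical test file */
    if (fd < 0) { perror("open data.bin"); return 1; }
    nblocks = lseek(fd, 0, SEEK_END) / BLOCK;
    if (nblocks == 0) { fprintf(stderr, "file too small\n"); return 1; }

    pthread_t t[THREADS];
    for (long i = 0; i < THREADS; i++)
        pthread_create(&t[i], NULL, reader, (void *)i);
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);

    printf("issued %d random %d-byte reads\n", THREADS * READS_PER_THREAD, BLOCK);
    close(fd);
    return 0;
}
```

Time that against a spinning disk and an SSD and you'll see the gap yourself: the spindle caps out near its seek rate no matter how many threads you add.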
In terms of XPoint, let me give you one detail. What does a "commit" look like in XPoint? What do the different kinds of memory barriers look like? What's the best way to validate this kind of persistence on restart, which you don't have to do with DRAM? Does that change your "malloc" free list structure, because you need to validate? Is it a good idea to chop up all the space available, so you can validate different parts independently, or does that mean you end up with the multi-record transaction problem? These are the kinds of things we consider in database design on new hardware ( obligatory: we are hiring ).
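To sketch one concrete answer to the "commit" question - not Aerospike's actual code, just the standard flush-and-fence pattern that persistent-memory libraries like Intel's PMDK use - a commit on byte-addressable XPoint becomes cache-line writebacks plus fences, with a validity flag persisted last so a restart scan can detect torn writes. This assumes x86 with the CLWB instruction (gcc -mclwb) and a record mapped from a DAX/pmem region; record, pmem_persist, and commit are made-up names.

```c
#include <assert.h>
#include <immintrin.h>  /* _mm_clwb, _mm_sfence */
#include <stdint.h>
#include <string.h>

#define CACHELINE 64

/* Flush a range to the persistence domain, then fence so the flushes
 * are ordered before anything that follows. */
static void pmem_persist(void *addr, size_t len) {
    char *p = (char *)((uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1));
    char *end = (char *)addr + len;
    for (; p < end; p += CACHELINE)
        _mm_clwb(p);  /* write the cache line back, but keep it cached */
    _mm_sfence();
}

struct record {
    char payload[256];
    uint64_t valid;  /* commit flag, persisted last */
};

/* Two-step commit: persist the data, then persist the flag that makes it
 * count. A crash between the steps leaves valid == 0, so recovery can
 * detect and discard the torn record. */
void commit(struct record *rec, const void *data, size_t len) {
    assert(len <= sizeof rec->payload);
    memcpy(rec->payload, data, len);
    pmem_persist(rec->payload, len);               /* step 1: data durable */
    rec->valid = 1;
    pmem_persist(&rec->valid, sizeof rec->valid);  /* step 2: commit point */
}
```

That valid flag is exactly why the free-list and chop-up-the-space questions get interesting: every independently validated region needs its own commit point, and anything spanning regions puts you back into multi-record-transaction territory.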
Medium-size companies ( say, Box, or AppLovin, or PayPal ) buy software, have some cost consciousness, and buy Dell.
I expect Dell will come out with some chassis closer to what Supermicro has been offering to the market, and I also see some defection, just like you do. However, that's all speculation.
On a semi-related note, the fact that cloud providers offer managed Postgres databases is great, but things like this keep pushing me to think about bare metal in colo. A $20,000 server/backup combo with a couple of these will give me 5x the performance of a server that costs me $70k/year on AWS before provisioned IOPS. That's a huge gap to play with for taking care of your own maintenance costs.
Also remember that a funky storage card may have a firmware bug that costs you hundreds of hours to track down and get resolved. That time could have been burned by Jeff Bezos' guys instead.
Another thing is the skills one needs to deploy and manage everything efficiently. The sysadmin and devops positions in a startup have vanished, because that work is now AWS and GCE. Sure, AWS/GCE are expensive, but the developers don't need to worry about uptime, and if something happens to their instances, they just hit 'delete instance' and start up a new one.
The new theme seems to be developer productivity rather than cost effectiveness. With the new automatic deploy tools and automatic load balancing, there's not really a need for anyone to worry about managing systems. It might seem like premature optimization, but you're saving the paycheck of the sysadmin who has to be there for 24/7 monitoring.
This is all assuming one of the existing developers on the team isn't also a rockstar sysadmin, security guru and an insomniac who doesn't have a family.
Yes the raw compute cost is much higher (and the performance often less) than bare metal, but software development is really, really expensive. With the public cloud you get the result of literally millions of programmer-hours "for free". To many that's worth it (at least at first, below a certain scale).
This is also one reason why Kubernetes is really exciting, BTW. It's a control plane that you don't have to rent.
DDoS, multi-path, multi-AZ, DR, snapshots, hardware redundancy, elasticity, freedom from datacenter contracts, fiber contracts, driving to rack stuff, jammed fingers, staff costs, firmware bugs, bad hardware batches, CDN, compliance costs. I have more but I'll stop there.
It doesn't make sense at very small or large scales but it captures a hell of a lot of the middle.
Do you have any other recommendations for queue systems that can handle a million messages per minute (our use case)?
Almost all of the systems tested hit 17k/s (aside from NSQ) -- on a laptop -- and a million messages per minute is only about 17k/s. NATS (gnatsd) hits 0.2M per second; that's an order of magnitude above what you need, and that's a brokered system. If your task fits a brokerless one, nanomsg buys yet another order of magnitude (see the sketch below). I don't really see where the problem is.
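If "brokerless" is unfamiliar, here's a minimal sketch of the nanomsg pipeline pattern: the two endpoints talk directly, with no broker process in the middle. It's single-process inproc just to keep it self-contained, and the address string is arbitrary. Link with -lnanomsg.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <nanomsg/nn.h>
#include <nanomsg/pipeline.h>  /* NN_PUSH / NN_PULL */

static void die(const char *what) {
    fprintf(stderr, "%s: %s\n", what, nn_strerror(nn_errno()));
    exit(1);
}

int main(void) {
    /* Consumer binds, producer connects: peer to peer, no broker process. */
    int pull = nn_socket(AF_SP, NN_PULL);
    if (pull < 0 || nn_bind(pull, "inproc://queue") < 0) die("pull setup");

    int push = nn_socket(AF_SP, NN_PUSH);
    if (push < 0 || nn_connect(push, "inproc://queue") < 0) die("push setup");

    const char *msg = "hello";
    if (nn_send(push, msg, strlen(msg) + 1, 0) < 0) die("send");

    char *buf = NULL;
    if (nn_recv(pull, &buf, NN_MSG, 0) < 0) die("recv");  /* NN_MSG: library allocates */
    printf("got: %s\n", buf);
    nn_freemsg(buf);

    nn_close(push);
    nn_close(pull);
    return 0;
}
```

Swap inproc:// for tcp:// and you have the multi-process version; the point is there's no extra hop or broker to saturate.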
If you're really concerned about that though, I'd say go for the cloud option - just make it so you can start small on a single physical server but can scale onto the cloud, then migrate back to physical hardware as needed. With this setup, you save massive cash and avoid vendor lock-in.
While our server costs are probably higher than going bare metal, how many developers would it take you to manage all of that on bare metal hosts?
AWS is specifically designed to maximize instance count (that is, cost). The reason "devops" has exploded since EC2 hit critical mass is that before EC2, people were reasonable with the number of servers they needed. With EC2, it's all water, and it's so easy to press "create instance" that people just do it without thinking, and wonder how they ended up paying Amazon $100k/mo for something that used to cost them $5k/mo.
I'm engaged on a project now that has similar figures to what you've quoted here. It could be done with less than two dozen bare-metal servers. I know because it was done with less than two dozen bare-metal servers before someone in the C suite felt left out and suggested some "modernization" via the cloud.
I think that cloud zealots are mostly people who were either deathly afraid of sysadmin or people who were cutting their teeth as cloud got hyped, because there's no way any competent person who was doing this type of stuff before 2008-2009 can pretend that Amazon is not laughing all the way to the bank.
This is not to say that cloud has no advantages or that its use is always inappropriate, but what you're describing is pure fantasy. Yes, a small team of coders should be more than capable of managing the servers that they need, especially if they are renting dedicated boxes from a professional datacenter facility that handles hardware swaps and similar failures for them.
I'm pretty new to backend design cost/benefit (I have only used AWS since I graduated college), but I'm curious to hear about bare-metal storage solutions for 500 TB of compressed JSON, as I have not read about any compelling solutions for that data requirement (outside of MongoDB, but I have heard good and bad things about that).
There are many build-it-yourself options, which would involve something similar to this SuperMicro JBOD + some other servers/an external RAID controller to handle distribution over the disks. You could also go really barebones and do a couple of home-built "RAIDZilla"-style devices.
For the cost of one month's S3 storage, you can buy a pre-built Backblaze Pod from a third-party supplier. I've found this to be typical with Amazon; the monthly cost is about 50%-100% of the one-time cost of the actual hardware (which will usually last at least 3-5 years). Even if you have to hire a couple of your own hardware jockeys, you're going to be saving 6x.
For a less DIY route, any SAN provider will be able to accommodate 1PB (for redundancy) without breaking a sweat. Of course, this will be a large upfront expenditure, but it should still easily be cheaper than S3 over the long run. There are tons of options and storage engineering is a big field. Look around and I'm sure you'll find something acceptable.
All this said, I really have to be skeptical that you need 500TB of (compressed!) JSON data. I would look into how much of that data you really need to keep, set up some retention policies, and seriously consider reworking your storage format to make these numbers more reasonable (JSON is obviously not space-efficient), which will not only greatly reduce infrastructure costs but also make the project much easier to handle.
You may also wish to look into modern compression codecs if you haven't already. LZMA provides the best ratio, but compression is slow (decompression is fast). Brotli and zstd are newer options that are at least comparable to gzip in ratio and much faster (minimal zstd sketch below). Data deduplication should also help.
https://www.backblaze.com/blog/open-source-data-storage-serv... (estimates the cost of a third-party pod at $12,849.40; the S3 cost calculator says 500TB of storage in us-east-1 is $12,407.17/month before any bandwidth or request costs)
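For a feel of the zstd option mentioned above, a hedged sketch (assumes libzstd is installed; link with -lzstd; the JSON blob is made up). Level 19 trades speed for ratio; the default level 3 is already much faster than gzip:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>

int main(void) {
    /* Stand-in for one of the 500 TB worth of JSON documents. */
    const char *json = "{\"sensor\":\"a1\",\"readings\":[1,1,1,2,2,3,3,3,3]}";
    size_t src_len = strlen(json);

    size_t bound = ZSTD_compressBound(src_len);  /* worst-case output size */
    void *dst = malloc(bound);
    if (!dst) return 1;

    size_t out = ZSTD_compress(dst, bound, json, src_len, 19);
    if (ZSTD_isError(out)) {
        fprintf(stderr, "zstd: %s\n", ZSTD_getErrorName(out));
        free(dst);
        return 1;
    }
    printf("%zu bytes -> %zu bytes\n", src_len, out);
    free(dst);
    return 0;
}
```

A toy blob this small may not shrink at all; the ratios show up on real repetitive data. zstd also supports trained dictionaries, which help a lot when the individual records are small and similar.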
SoftLayer has 'on demand' bare-metal plans. I'm not sure how cost-effective it would be compared to running your own box, but the simplicity is nice, and they had me up and running in under 2 hours (assembled with customizations).
It runs substantially cheaper than our colo here in Irvine, but that's also not apples-to-apples. That said, I don't plan on continuing with colos unless there's a pressing reason, after almost 15 years of them being my default. Bare metal is the sweet spot in my eyes: the benefits of cloud without the downside of shared/inconsistent resources.
It's so much cheaper than AWS, and you get unlimited bandwidth.
Drum magnetic memory has been replaced, but we still have spinning rust, tape, optical, DRAM, SRAM, SSD...
1) Poking holes in things ( tape, DVD, etc )
2) Magnets ( core memory, drum memory, tape, disks )
3) Circuits ( DRAM, NAND, etc )
Examples of all three of these still exist.
What's interesting about XPoint is that it is literally a fourth form that has never been commercially available: melting a substance and cooling it quickly or slowly, forming either a crystalline or amorphous solid, which then has different properties. We don't know what the substance is, but it's cool that we now have this fourth thing.
This is exactly how rewritable optical media works.
4) Delay lines. I would definitely not categorize these as circuits. The bits are stored as pressure waves in motion.
5) Electrostatic charge. Talking about the Williams tube. The bits are stored as residual charge on a phosphor surface.
Also, phase-change optical (as in rewritable CDs/DVDs).
Also also, printing stuff out at high bit-density with good ECC, and scanning it back in again.
Also also also, chemically encoded bits (as in the DNA-based storage that was demonstrated recently).
There are more that aren't in common use or very scalable (e.g., mechanical switches).
I think the problem with paper is that it's not really part of the computer any more. You need a human to take the paper out of the printer, store it, and put it back in the scanner. At least with tape and optical, we have robots to do that for us.
Generally no amount of ECC will help the default failure mode of these.
Unless you're the US gov. According to Wikipedia: "In May 2016 the United States Government Accountability Office released a report that covered the need to upgrade or replace legacy computer systems within Federal Agencies. According to this document, old IBM Series/1 minicomputers running on 8-inch floppy disks are still used to coordinate "the operational functions of the United States’ nuclear forces..." The government plans to update some of the technology by the end of the 2017 fiscal year." 
PS: I'd be curious about a tiny, 2017-laptop-friendly optical disc format. Say, a 2.5" RW BD layer in a cartridge. A modern MiniDisc.
I was mostly wondering about the psychological aspect of form factor. Naked optical discs are non-personal; MiniDiscs weren't. You interacted with them freely and carried them as-is in your pocket. A large-enough yet tiny-enough reincarnation might be "fun".
That's not to say it'd always be possible; I presume there may be regulatory issues in some cases.
There's some speculation these devices are massively overprovisioned due to the, er... cells (or equivalent) wearing out much faster than the early hype/info/etc. suggested.
So, it'd be interesting to get a real-world idea of whether these things are likely to explode badly ;) 6 months after purchase or not (etc). :)
Edit: why the downvotes?
I'm currently doing image processing and using a RAM drive of ~200 GB.
If all you need is a massive amount of scratch space for some algorithm, you'll still want RAM, but if you want a stupidly fast backing store for a huge amount of source and output data, this is pretty incredible.
Let's be clear here. The real reason people used this stuff was that it was 3x/4x cheaper than RAM.
It's extremely early in the technology, and I imagine we will get there. The first SSDs were terrible compared to SSDs now, and 3D XPoint as a technology is extremely scalable and refinable.
Are we looking at the same numbers? "Probably under 10 microseconds" is pretty terrible compared to DRAM, which is on the order of 100 nanoseconds per access -- a roughly 100x gap.
3D XPoint is being considered as the DIMM-module, byte-addressed "memory" tier in front of storage. Saying that it "augments" DRAM is almost meaningless, because we already have multiple levels of DRAM.
Databases do best on drives with reduced latencies. Data access is random and keeps jumping around the drive for joins, so your SELECT statements are going to speed up roughly in proportion to latency improvements.