Intel has screwed up their DC S3500 SSDs (utcc.utoronto.ca)
178 points by luu on Nov 18, 2014 | 38 comments



> "and of course it completely ruins the day of people who are trying to have and maintain a spares pool"

I used to maintain a distributed system of TV recording servers with hundreds of analog TV tuner cards inside and understand this pain all too well. After years of frustration trying to get these cards and all the different revisions to work together on whatever version of Linux I'd adopted for the system (kernel upgrades were a huge risk), I swore off hardware altogether for future projects. Even though all the devices had the same chipset, I couldn't keep it all working at the same time and it sucked all the time and energy I should have been spending on my actual product.

God bless the rise of cloud computing. Seriously.

I can't even imagine what it must be like to maintain the amount of hardware they have at AWS or Google. Speaking of which, how the fk does a startup like Digital Ocean do it?


Honestly it's probably a lot easier at that scale. When you're a single dude maintaining 10 boxes you're fucked if one dies on you. When you're 50 people maintaining tens of thousands of boxes it's not such a big deal.


Also, when you're buying that quantity you qualify the hell out of your hardware. The big boys have groups dedicated to making sure that they get what they want from their suppliers.


DO orders from Dell, which has contracts in place with its hardware suppliers to supply the same part for the life of the product. Someone like Amazon or Google orders enough parts that they have the same setup, just directly with the ODM.


> Speaking of which, how the fk does a startup like Digital Ocean do it?

Carefully with DevOps/Networking/Infrastructure folks.

I used to manage several thousand physical Linux servers. It can be done fairly painlessly when you control the environment.

I noticed you mentioned recording servers with analog tuner cards; today it'd be much easier with SDR hardware streaming RTMP to cloud servers that write the data out to network-attached storage or even S3.


I'm not trying to take the piss here, but why would you use a real-time protocol to stream to an archive? And why SDR instead of a dedicated tuner/decoder?


RTMP is of course not required; you could write locally to your workers, and then push to your storage repo over http, rsync, whatever. My last gig was with video broadcast, so most of my work was with rtmp, Akamai, etc. Different ways to skin the same cat. Almost all hardware video IP encoders support rtmp though, and you can colocate with where your storage is.
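
If you go the write-locally route, the push step can be as simple as an rsync job; a rough sketch, with made-up host and paths:

    # push finished recordings to the storage box, dropping local copies once transferred
    rsync -av --partial --remove-source-files \
        /var/recordings/ archive01:/mnt/recordings/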

As /u/akiselev mentioned, SDR is easier to extend with the hobbyist community that exists around it, especially if you have a custom application and need more control.


Why wouldn't you? RTMP is one of the most common ways to stream video and if you're already receiving live audio/video data from a tuner then you can easily package it into an RTMP stream going out to any number of archive servers running ffmpeg (which does the hard work of any transcoding or remuxing you need to do). All you have to do is package the RTMP stream, build ffmpeg yourself with nonfree and gpl options enabled, and run/monitor off-the-shelf software with an obnoxious number of command line arguments. Chances are you can package the RTMP stream straight from the source using ffmpeg too.
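
On the archive side, that off-the-shelf-software-with-obnoxious-arguments step might look roughly like this (stream URL and output path are made up):

    # pull the RTMP stream and cut it into hourly MPEG-TS segments without re-encoding
    ffmpeg -i rtmp://ingest.example.com/live/channel1 \
        -c copy -f segment -segment_time 3600 -strftime 1 \
        /archive/channel1/%Y-%m-%d_%H-%M-%S.ts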

I don't know about the SDR vs tuner question but many SDRs are made with TV tuners and I'd bet you'd have more luck (especially long term) with the software written by the amateur radio community than whatever company makes your brand of TV tuner.


DTV decoding is really processor intensive, and is really only now becoming possible on the fastest processors. You can do it in an FPGA, but those are expensive. ASICs are still the cheapest way to go. It's like hardware vs. software H.264 decoding: a night and day difference in speed and efficiency.


Pew just released a study that showed people are starting to feel that they've lost control of their private data (which they have):

http://www.pewinternet.org/2014/11/12/public-privacy-percept...

Your vision of the future assumes that folks will remain complacent about how much they trust their doctors, health insurance companies, medical device companies and hospitals with their minute-to-minute activity.


Amazon and Google design their own hardware so it is much less likely to change out from under them.


Do they also design their own components? Because that's what we're talking about here.


Google is believed to assemble its own SSDs, supposedly from Intel/Micron flash memory and Marvell controllers. So even though they're the same components you'd find in Intel/Crucial SSDs, Google would obviously not run into an issue due to some firmware upgrade.


They might do that, but they have also bought this SSD. Everyone large has it and/or the S3700 hanging around somewhere. It's a really solid drive and the price / performance / reliability is hard to beat (as with its predecessor, the 320).

When a giant company talks about building hardware themselves, it means that they thought they could get a better deal than their suppliers could get. At some point you're the source of your supplier's discount, rather than a beneficiary of it.

So they call up an ODM, ask for the same motherboard as always, oh but please hold the serial port to cut the BOM by $0.50. It is a custom design made for them, but it's all still built with the same commercially available stuff that everyone uses.

No one is going to design a motherboard from scratch, when they can use the Intel reference design. Likewise, that Marvell controller that Google might use on their Google-brand SSDs is going to run Marvell's standard firmware with maybe a few changes per Google's request.

I doubt Google would run into this issue in production though, because they should have an agreement that requests an exact firmware revision from their vendor. Any change to the firmware revision would require some sort of re-qualification, which should easily catch something this big.
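
Even without a formal qualification program, a sanity check on incoming spares is easy to script, e.g. with smartctl (device path is hypothetical):

    # dump drive identity and pull out the firmware revision and reported sector sizes
    smartctl -i /dev/da0 | grep -Ei 'firmware|sector size'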

It's still lame that Intel did this. The drive should have just been 4K when it launched, and every other drive since 2011 should be as well.


Giant companies should buy most of their hardware to reduce cost and focus on their main activity. But they should also build a small percentage of their hardware themselves to have the freedom to ditch a supplier. This is also important for maintaining the technical knowledge to negotiate more wisely with suppliers.


they don't. it's the same stuff everyone else is using.

in fact, in some cases it's cheaper/shittier than the stuff everyone else is using, because individual component failure and performance matter less.


> God bless the rise of cloud computing. Seriously.

In other words, god bless that you can pay someone else to deal with it so you can make the shiny service?


Well, yes. That's the whole idea.


I read this at the time and, frankly, my takeaway was more that:

> There are applications where 512b drives and 4K drives are not compatible; for example, in some ZFS pools you can't replace a 512b SSD with a 4K SSD

...the ZFS design was fundamentally fucked up. Intel have merely exposed a core design problem, because sooner or later you aren't going to be able to find 512 byte drives at all.


Zfs sector size (ashift parameter) is set at vdev creation time. That is, when you get a bunch of drives together, and you want to have some redundancy or striping, you create a vdev that is composed of several drives, for your desired mode of redundancy (and / or striping). A pool is composed of multiple vdevs; zfs file systems all allocate from a common pool. So it's generally only a problem if you are replacing an existing drive in a vdev after it failed.
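
For example, on ZFS on Linux you can pin the sector size when the vdev is created (pool and device names here are made up):

    # force 4K sectors (ashift=12) at pool creation time
    zpool create -o ashift=12 tank mirror /dev/sda /dev/sdb
    # the vdev keeps that ashift, so a later replacement with a 4K drive is fine
    zpool replace tank /dev/sda /dev/sdc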

Zfs doesn't support a bunch of things. It has no defragmentation. Filling a zfs pool much north of 90% tends to kill its performance even after you delete stuff to bring it back down again. The usual answer to these things is "wipe the pool and restore from a backup", or "zfs send <snapshot> | zfs receive <filesystem>". The answer to changing the sector size of a vdev is similar, just like it is for removing a disk from a vdev, or reconfiguring your redundancy in most cases.
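
Concretely, the rebuild-onto-a-new-pool route looks roughly like this (pool names are hypothetical):

    # snapshot everything, then replicate the whole hierarchy to the new pool
    zfs snapshot -r oldpool@migrate
    zfs send -R oldpool@migrate | zfs receive -F newpool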

This is just how zfs is currently implemented. It was designed for Sun's customers, for whom having backup for the whole pool, or having a whole second pool to stream to, is not a big deal. Using it in a home or small business context consequently requires more care and forethought.


> Zfs doesn't support a bunch of things.

I am well aware of this, having been running production systems with it since 2008, shortly after it stopped silently and irretrievably corrupting data.

> It was designed for Sun's customers, for whom having backup for the whole pool, or having a whole second pool to stream to, is not a big deal.

The idea that I have to destroy and re-create pools for so many not especially uncommon events is one that runs pretty counter to the way ZFS generally does a good job of being an enterprise filesystem. "Throw it away and restore from backup" is not a good answer.


> "Throw it away and restore from backup" is not a good answer.

Honestly, when you think about the life cycle of many storage systems, it is pretty reasonable. Once the drives get to a certain age, you tend to have to replace them anyway, and after the array is beyond a certain age, you want to replace the whole thing.

It makes a certain sick sense to expect a lot of enterprise customers to have a strategy for fail over to a new storage pool.


I believe 80% is where it switches allocation strategies by default. Ideally you don't want to cross that threshold.


If you know you can't get suitable replacements, it's an easy problem - copy the data to a new pool and retire the old one. In this case, since the part number didn't change, there was no way to know.

Sector size is a pretty fundamental property of a disk drive.


He mentions it's only in some pool configurations. I imagine there might be configurations where 512b or 4k makes a difference and isn't compatible.


For those who wonder why, this seems to be a decent explanation of the issue:

http://lists.freebsd.org/pipermail/freebsd-stable/2014-Septe...

So you can have ZFS pools with 4K blocks; it's just that if you've chosen 512 bytes at the start, you're going to struggle.


Doesn't the current install of FreeBSD default to 4k blocks even on 512-byte drives?


No. There is a vfs.zfs.min_auto_ashift sysctl you can poke now instead of messing about with gnop, but it still defaults to 9 (512b).
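
For anyone looking for it, forcing 4K on newly created vdevs on a recent FreeBSD is just:

    # raise the minimum ashift for new vdevs to 12 (4K sectors)
    sysctl vfs.zfs.min_auto_ashift=12
    # make it persistent across reboots
    echo 'vfs.zfs.min_auto_ashift=12' >> /etc/sysctl.conf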


Yes. As do all of the illumos derived builds.


No they don't. They create the pool based on the ashift that the drives report, unless you override it at pool creation.


I think ZoL does 4k by default now as well.


I think anyone who was on the supplier side of Intel's "Copy Exact" requirements would find this particularly ironic.

http://www.intel.com/content/www/us/en/quality/exact-copy.ht...


Totally off-topic, but I really enjoy this blog. I discovered it while struggling with ZFS and btrfs, and it's a concise, opinionated, no-bullshit, honest sysadmin blog from someone far more knowledgeable than myself. You may not agree, but the presentation and style are really great.


Chris is one of my must-read sysadmin blogs. Smart guy, good writer, lots of interesting things to say.


All that changed are the default settings. It's unfortunate that Intel didn't document the change but the fix is to add a new step to the drive replacement procedure to reconfigure the firmware. This is a scriptable action. Not that big a deal.


It's a minor inconvenience at best. It's not like ZFS lets you add the drive and render the pool unusable. You get an error telling you it's the wrong sector size, at which point you fix the error and move on with life.


You...can't, that's kind of the point.

The statement is that you can't roll back the FW rev and it's not end-user configurable.


Hm, does anyone know if this would mess up hardware RAID (specifically LSI MegaRAID running RAID10)?

We have some of these drives in RAID, along with some spares sitting around. All bought around the same time, but who knows if they were manufactured during the transition.



