So I'm all for scavenging compute bits for future projects, but it is by no means unique to HDDs.
Before I got my hands on vendor-provided error injection, I would have thought this to be of great use, but hacking it in via ARM assembly would be quite a task.
Any sufficiently complex system will exhibit this trait sooner or later.
My understanding is that the effective width of the write head is 10x the width of the read head... So, with the right firmware, it should be possible, if you are okay with a write-once medium, to write the outermost track, move the write head in 1/10th of what you'd normally move it, write the next track, and so on... and get 10x the space out of the drive you normally would. In theory, the read head wouldn't have trouble. (Of course, this would be write-once storage, as the effective width of your write head is still pretty huge; but for a bunch of things? I can totally work with that... if more than X% of a drive was garbage data, I'd copy the good data to a new drive and reformat the old one. Done.)
I hear rumors that both the major drive manufacturers are actually shipping drives with this technology, but are only selling those drives to really big players, for some reason.
Here's a reasonable reference to the 'shingle' technology, and the roadmap for the rest of us:
But that's the thing: with the datasheets (and, well, a lot more skill than I personally have) we should be able to set up something like shingling on the cheap disks we have today.
Of course, from reading the article, I'm not sure I'm any closer to that particular dream.
I read online (probably on HN or similar) that Amazon Glacier is using drives with custom firmware that keeps them spun down to 300rpm so that more drives can fit in a rack without power and cooling concerns.
That's certainly an interesting case, and one that wasn't possible without the drive manufacturers stepping up to Amazon's wishes. Being able to do custom mods like this to your own disks is pretty excellent as well.
For example, most RAID systems don't really care so much about the first error on a disk; if a read fails, we can save a lot of time by not retrying too hard and just rebuilding the data from the rest of the RAID. If by any chance this is the second (RAID5) or third (RAID6) error, then you want much stronger retry logic. Current disk firmware doesn't allow for such logic.
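You can at least bound the damage from the OS side, even when the firmware won't cooperate. A rough sketch, assuming a SCSI/SATA disk at sda (the 7-second value is arbitrary):

$ cat /sys/block/sda/device/timeout      # how long the kernel waits before giving up on a command
$ echo 7 > /sys/block/sda/device/timeout # fail fast and let the RAID layer take over

Note that this caps how long the kernel waits, not how long the drive retries internally; on timeout the kernel resets the device rather than getting a clean error back.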
Huh. I think it's fairly common that companies engage in price discrimination by producing a lot of the same hardware, then crippling the hardware sold to the lower end. Note, my example of hard drive manufacturers doing this has to do with the next bit of your quote:
>They're also working with information you won't have.
So the 'crippling' I whine the most about is the difference between 'consumer' and 'enterprise' hard drives.
If you aren't running a hard drive in a RAID, if it's just one drive in a desktop, then generally speaking, when there's a problem you want the thing to keep retrying, if there's any chance at all that it might be able to resolve it.
If it's just one drive in a desktop, it's almost always better to let the drive go slower than to have it fail outright.
My situation, where drives are sitting in a RAID? Almost the exact opposite.
So yeah; me? I spend twice as much money to get "enterprise" drives that are almost identical, mechanically, but come with slightly better firmware. Firmware that just fails, rather than waking me up in the middle of the night.
(A friend of mine has been telling me: "Luke, a hung drive is just a special case of a slow drive; you need to monitor read/write latency and proactively fail slowish drives. Check out blktrace" - and he's probably right.)
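Something as blunt as iostat gets you most of the way there. A rough sketch (device names are examples, and the exact column layout varies between sysstat versions):

$ iostat -x sda sdb sdc 5   # watch the await column: average ms per I/O, per device
$ # a drive whose await sits wildly above its siblings' is a fail-it-proactively candidate

blktrace gives you the same thing per-request instead of averaged, which is what you'd want for catching the occasional multi-second hang.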
Note, WD has TLER, which they say you can change with WDTLER.exe. In my experience? It works on about half the drives you try, and even then those drives are far more likely to get slow (but not completely hang) than an 'enterprise' drive.
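These days the same knob is often reachable in-band via SCT Error Recovery Control, on drives that support it. A sketch; the values are in tenths of a second, and plenty of consumer drives simply reject the command:

$ smartctl -l scterc /dev/sda        # query current read/write recovery time limits
$ smartctl -l scterc,70,70 /dev/sda  # cap both at 7 seconds: RAID-friendly fail-fast
$ smartctl -l scterc,0,0 /dev/sda    # 0 = no limit: retry forever, desktop-style

The setting is usually volatile across power cycles, so you'd re-apply it at boot.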
Now... let's talk about bad sectors. Filesystems have been handling bad sectors for most of my life now, and they can do it fairly well.
The problem with letting the firmware handle bad sectors is that an OS doing read/write reordering assumes that if you write sectors 559, 560, 561, those are physically sequential. Once the firmware remaps sector 560 off into the fucking boonies, my nice sequential read is now completely random... and way slower. My point is that something like ZFS can handle bad sectors way better than the drive firmware, because it's got a lot more information. Even more so in the case of read errors: all the firmware can do is hang you up retrying; the RAID layer could actively grab that block from another drive.
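For instance, a sketch assuming a redundant pool named tank (pool and device names are made up):

$ zpool status -v tank   # per-device read/write/checksum error counts
$ zpool scrub tank       # read everything, verify checksums, repair from redundancy
$ smartctl -A /dev/sda | grep -i -e reallocated -e pending   # compare with what the firmware admits to

ZFS checksums every block, so it notices a silently corrupted read, fetches a good copy from another device, and rewrites the bad one; the firmware on its own can only retry.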
So yeah, they have information I don't have... and my computer would go dramatically faster if I could have that information. My pager would be a lot quieter, too.
There are also several spare areas along the way where reallocations can go, and the drive tries to pick the one closest to the original track to avoid overly large seeks.
Adds cost, for one thing. But you can arrange for the unit to never run a byte of code (even one loaded from the platter) that didn't come from WD.
As always security is a trade-off. The threat vector of flashing a backdoored BIOS/firmware is irrelevant for 99% of the market: most people will never be targets of such highly-technical attacks.
PS: I tip my hat to Sprite_TM; fascinating research! I love to disassemble firmware myself :) I liked how you were able to reverse-engineer the data structures in RAM.
Adding security to a system imposes costs on the use, maintenance, and support of those systems. Can you imagine the scale issues associated with maintaining PKI over the millions of devices deployed? How about hundreds of millions?
TPM is present in many, many laptops, yet most IT departments leave it un-configured. Why? Because when you replace the hard drive and it changes the boot vector parameters, the machine will no longer boot, and you have to work inside the auth mechanisms of the TPM infrastructure to do simple hard drive replacements. (nb: I'm not an IT guy so I might be a bit off here, but you get the gist)
So the reasons for NOT including security are pretty damn big compared to the risk. As security guys are fond of saying: "If I can get physical access to your machine, all bets are off." Keyloggers, evil-maid BIOS attacks, HID attacks, firmware attacks on peripherals, etc. are all possible ways to compromise a device. If you're wondering how Stuxnet got into an air-gapped security facility in Iran, I'd bet that a method like this was the prime candidate. That's how I'd do it.
Now, security is a process. You can expect breaches of high value hardware, and need to react to them.
The security model for consoles STARTS with the assumption that the attacker has physical possession of the hardware. This makes things interesting; it's certainly a big differentiator for PCs versus tablets, and one of the reasons why the Windows group at Microsoft has had a lot of trouble making their stuff secure on non-PCs.
But for a hard drive, things should be pretty contained. I estimated it would have taken a couple of weeks to secure one major embedded system I worked on; assuming the interfaces are limited, it's not a huge deal.
This lets you play "backups" of games, enabling piracy. It doesn't let you run unsigned code (those required other cracks) or sign anything (such as save files).
These features are required by enterprise customers to prevent just this sort of tampering.
E.g. the FOSS community wasn't a fan of only-trusted-secure-boot when it was Microsoft holding the keys and the source and releasing neither.
Secure boot isn't a technical solution to a software/firmware update problem; it's a control mechanism to solve a management problem. Crazy idea: don't put the interface that has access to the firmware on the standard data interface. Give it its own interface and let motherboard manufacturers support it on enterprise/datacenter-geared systems.
This stuff /is/ hard. I sat next to a bunch of folks doing this on a console platform and the techniques and exploits were, in a word, breathtaking. But if you know what you're doing and your scope is limited -- to a smallish device, for instance -- you can make it very hard for someone to crack.
We think of the system as a holistic entity, but turned on its head, you can see how the inside of a computer is just a network...
Edit: sorry, the chain looks like this: BIOS reads the boot block from the hard drive, checks the digital signature, signature is invalid, boot fails. The BIOS's signature check works in concert with the TPM for key storage, so you're fine. Albeit sleeping with 'evil' technology like UEFI and TPM.
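You can poke at one link of that chain yourself on a UEFI machine. A sketch, assuming the sbsigntools package, and that db.crt is a certificate you've exported from the firmware's signature database (both names are placeholders):

$ sbverify --list /boot/efi/EFI/BOOT/BOOTX64.EFI        # show the signatures embedded in the image
$ sbverify --cert db.crt /boot/efi/EFI/BOOT/BOOTX64.EFI # verify against a trusted certificate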
This sort of attack is usually going to be more trouble than it's worth to execute, but that doesn't mean it's out of reach for a motivated, educated individual.
The point is that it's the same kind of attack. Relying on the BIOS may save you from an attack on the disk firmware but that doesn't much help if the same class of attack is still effective against the BIOS.
I'd bet most systems see different disk controllers more often than they see different BIOS chips. I'd bet (though not at so high odds) that reasonably secure TPM chips are relatively easier to find outside of the high-end server niche. I'd bet that most non-state actors executing this sort of attack wouldn't have equivalent exploits ready for many different types of hardware.
All of those factors shift risk around (again, what little risk there is from this sort of vulnerability). Forgetting about patching a hole here because of an equal-sized hole over there is silly.
"That isn't supposed to happen" doesn't mean it won't happen. The Titanic wasn't supposed to sink.
The point is, you want to be able to recover from attacks. It isn't about security today. The premise here is that you've already been compromised, to the point that the attacker may have been able to screw with the firmware on your hardware. What you need then is not an assurance that the thing that already happened was hard to do; what you need is a way to hard-reset the hardware to a known-good (i.e. factory) state, given the assumption that every piece of EEPROM in the machine has been replaced with malicious code. Something like a jumper on the logic board that does that in hardware would be a welcome security feature.
Virtualised environments that don't pass through vendor-specific commands should be immune to the attack, though. As others have said, encryption would probably allow tampered pages to be detected. I'd be interested to see whether the modified firmware could ignore attempts to flash new firmware over it...
It can, but doesn't always. For example, eCryptfs currently doesn't protect against tampering; it uses Cipher Block Chaining (CBC) mode without an HMAC or other integrity check.
(I'm working with some colleagues to add Galois/Counter Mode (GCM) support to eCryptfs, which does provide some form of tamper-detection.)
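You can see the CBC problem concretely with nothing but the openssl CLI (a sketch; assumes OpenSSL 1.1.1+ for -pbkdf2, and the filenames and passphrase are made up). Flip a byte of ciphertext and CBC decryption still exits cleanly; only the plaintext is silently garbled:

$ head -c 4096 /dev/urandom > plain.bin
$ openssl enc -aes-256-cbc -salt -pbkdf2 -pass pass:demo -in plain.bin -out enc.bin
$ printf '\xff' | dd of=enc.bin bs=1 seek=16 count=1 conv=notrunc  # corrupt the first ciphertext block (bytes 0-15 are the "Salted__" header)
$ openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:demo -in enc.bin -out dec.bin && echo "decrypted without complaint"
$ cmp plain.bin dec.bin   # ...but the first two blocks of plaintext now differ

An authenticated mode like GCM makes that decryption fail outright instead (the openssl enc subcommand doesn't support AEAD modes, so you'd have to demonstrate that half with a library).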
But this is really amazing. I'd love to see whether it could be extended to other OSes, if possible.
One of the SE sites assembled a list. Shockingly, the question isn't closed yet!
However, I will be forwarding this to my wife, who gives me a hard time when, before getting rid of an old computer, I remove the HD and give the whole assembly five or so well-placed hits with a hammer.
right now on page 1: https://news.ycombinator.com/item?id=6146279
It reminds me of a similar proof-of-concept hack on a common network card firmware: http://esec-lab.sogeti.com/post/2010/11/21/Presentation-at-H... (the slides linked from that page give a more technical overview than the blog post).
$ echo 3 > /proc/sys/vm/drop_caches
or, as non-root:
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
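The kernel docs suggest running sync first, since drop_caches only discards clean (already written back) pages:

$ sync; echo 3 | sudo tee /proc/sys/vm/drop_caches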