WD Passport 4TB drives don't support WRITE SAME command (wd.com)
214 points by mschuster91 13 days ago | 103 comments

The SCSI standard only mandates WRITE SAME support for host-managed zoned block devices, so unless the drive reports as one, it is within spec.

(a) source? Only thing I could find is SBC-3 which lists everything as optional, so kinda not helpful [https://t10.org/ftp/t10/document.05/05-344r0.pdf]

(b) doesn't really matter, working reality trumps theoretical spec.

(I looked it up and WRITE SAME seems to be a SCSI command to essentially do a "memset".)

> In any case, it is a huge security issue, because file systems use this command to efficiently clear freed blocks to zeros.

Do file systems directly issue SCSI commands? I would've thought they tell the storage driver to do something and the driver would do it with the most efficient means available.

If not supporting WRITE SAME turns out to be a security issue, it's a bug in the operating system.

And yes, some filesystems do - ESX, for example, uses what they call VAAI, which is a set of optional (standardized) SCSI functionality, like WRITE SAME, COMPARE AND WRITE (iirc), and server-side copy.

Ah, blame tennis, my favorite game!

Is there an alternative non-optional strategy for achieving secure delete (or revocation semantics of some kind)? If not, this is a fundamental capability that you can't paper over by slapping an abstraction layer on top any more than you could turn a 1TB HDD into a 2TB HDD with an abstraction layer. If so, it seems to me like the bug is very much in the hard drive / standards, not in the operating system.

> Is there an alternative non-optional strategy for achieving secure delete

Issue normal data writes of blocks that are filled with zeros. The same way regular data makes it to the drive just fine will also of course work for data that's all zeros.
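A minimal sketch of that in Python, assuming the target is any seekable binary file object. A real filesystem driver would do this in the kernel against the block device; the function name and the 512-byte block size are illustrative assumptions, not anything from the drive or the OS.

```python
import io
import os

BLOCK_SIZE = 512  # assumed logical block size; many devices use 4096


def zero_blocks(dev, first_block, block_count, block_size=BLOCK_SIZE):
    """Overwrite a block range with zeros using ordinary writes."""
    zeros = bytes(block_size)  # one block of 0x00 bytes
    dev.seek(first_block * block_size)
    for _ in range(block_count):
        dev.write(zeros)
    dev.flush()
    # fsync only applies to objects backed by a real file descriptor.
    try:
        os.fsync(dev.fileno())
    except (AttributeError, io.UnsupportedOperation, OSError):
        pass
    return block_count * block_size
```

It sends `block_count` blocks of payload over the bus instead of one, which is the performance cost of not having WRITE SAME, but the end state on the medium is the same.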

Oh, so WRITE SAME doesn't come with "no wear leveling" semantics? That makes emulation much more reasonable.

I think the only way to get "no wear leveling" is the ATA Secure Erase command. You only need that for devices that do wear leveling in the first place, which the drive in question doesn't, so it's a bit moot.

Would that work on a filesystem that supports sparse files?

We're talking about the filesystem driver itself issuing the write.

The above is a discussion about whether the filesystem driver or the block device driver would issue the SCSI commands.

This would never happen from userspace.

Why would you need to overwrite blocks of a sparse file? Which blocks would you be overwriting?

> If not supporting WRITE SAME turns out to be a security issue, it's a bug in the operating system.

Is it though? There is probably a big drawback in terms of resource consumption if this is not supported. Not all environments may be ok with this.

That sounds like a performance issue, not a security issue?

The (potential) security issue is that the operating system fails to clear disk space, possibly incorrectly assuming it has actually been cleared.

Except that it receives an error, so if it's assuming that, it's wrong.

This command is not mandated to be supported. Therefore, if an OS assumes it is supported, that's an OS problem, not the drive.

I don't see how this could occur. Either you do WRITE SAME and it works or it doesn't and you have to manually issue WRITE commands.

It likely flows down from the same kind of code that supports TRIM or other free-space clearing stuff. They won't (usually) issue the commands themselves.

It seems like a bug in a storage driver in that case (if it's actually getting triggered by it)... if a command isn't available, it should be falling back to one that is, right?

Maybe? I don't know if there's a command discovery for SCSI that would let them know if things are supposed to be supported. If there is, maybe the drive advertised support and confused the system when it didn't work.

When you talk to disks via smartctl, the tool reports the specification versions they support. There are ATA Version and SATA Version fields for SATA disks. I was unable to get details on a SAS disk, but it was identified as a SAS drive successfully.

These standards probably define mandatory and optional commands to certify disks as compatible with these specs IMHO.

If the command is optional, then it's OK, but if it's not, then there's a bug that WD needs to fix.

For SAS drives I would recommend sg3_utils [1]. You can basically query what the drive supports via `sg_opcodes`.

smartctl isn't really designed to handle the SCSI protocol, I think. It can do basic things, but for anything deep you'd better use sg3_utils.

[1] http://sg.danny.cz/sg/sg3_utils.html

Thanks for the reply and the utility. I'll take a look into it. Since I'm familiar with smartctl due to my server management roles, it came to my mind, so I shared it. I never thought it should be able to handle anything beyond what it needs to do to get SMART and other diagnostic data.

Thanks again. :)

> I don't know if there's a command discovery for SCSI that would let them know if things are supposed to be supported.

The OP shows errors that are reported to the OS by the drive when it attempts to use the command. Even if it can't pre-determine support for the command, it can fall back upon receiving an error.

Yes, it's called "REPORT SUPPORTED OPERATION CODES". If you have sg3_utils installed, sg_opcodes can be used to get the list of operations supported.

Why do you need discovery? The command itself returns illegal opcode; that's sufficient, right?

It's not always safe to simply try an opcode to see if it's valid, because it might trigger something else... like a firmware update (which has happened).

Thanks, I suppose that answers the question of "why not try the opcode instead of doing command discovery". Though what I was really trying to understand was, "if you've already issued the command {for whatever reason}, and it returns invalid opcode, then shouldn't you fall back to an alternative command?" Because at that point, you have enough information to know you can do so safely. It seems to me that that's what the storage driver needs to do, irrespective of any command discovery or lack thereof beforehand.
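Roughly what I have in mind, as a Python sketch. Everything in it is hypothetical: `FakeDrive`, `UnsupportedOpcode`, and the `state` dict are made up for illustration and are not a real driver API.

```python
class UnsupportedOpcode(Exception):
    """Hypothetical stand-in for an 'invalid opcode' error from the drive."""


class FakeDrive:
    """Toy in-memory drive that may or may not implement WRITE SAME."""

    def __init__(self, blocks, block_size=512, has_write_same=False):
        self.block_size = block_size
        self.has_write_same = has_write_same
        self.data = bytearray(b"\xaa" * (blocks * block_size))
        self.log = []  # commands the drive accepted, for inspection

    def write(self, lba, payload):
        start = lba * self.block_size
        self.data[start:start + len(payload)] = payload
        self.log.append("WRITE")

    def write_same(self, lba, count, block):
        if not self.has_write_same:
            raise UnsupportedOpcode("WRITE SAME")
        start = lba * self.block_size
        self.data[start:start + count * self.block_size] = block * count
        self.log.append("WRITE SAME")


def zero_range(drive, lba, count, state):
    """Zero a range; fall back to plain writes if WRITE SAME is rejected."""
    zeros = bytes(drive.block_size)
    if state.get("write_same_ok", True):
        try:
            drive.write_same(lba, count, zeros)
            return "WRITE SAME"
        except UnsupportedOpcode:
            # Remember the rejection so the command is not retried later.
            state["write_same_ok"] = False
    for i in range(count):
        drive.write(lba + i, zeros)
    return "WRITE"
```

The point is only that the error is handled and the range still ends up zeroed; whether the first attempt should have been made at all is the separate discovery question.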

There can be other reasons for command failure than "opcode not supported", even if that's the error code returned. I wouldn't trust cheaper harddrives to handle that properly either.

What would such a reason be? How likely is this to happen? If you have such a mistrust of the response then you can never trust anything, right? How do you know the drive isn't lying about everything else too? At some point you gotta trust something means what it says...

The trust is in what the drive identifies as supported.

The issue is that some command ops may be doing double duty in a different drive. Famously, a few CDROM drive vendors reused the "clear buffer" command to instead mean "update firmware". Linux used support for "clear buffer" to detect if a drive is a CDROM or CDRW drive. As a result, using one of those CDROM drives under Linux would quickly cause the drive to become permanently bricked.

You can't trust the response because it's likely that at that point, the damage is already done. Even if you get one, you might not know what it means.

That applies to any command that the drive does not advertise support for via appropriate SAS and SATA commands. In some rare cases you might manually have a whitelist of commands supported by drives outside this list but you should never try to automatically discover it during runtime.
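As a sketch of that policy in Python: only pick a command whose opcode is in the set the drive advertised (e.g. via REPORT SUPPORTED OPERATION CODES) or in a hand-maintained whitelist. The opcode values are the usual SBC ones; the function names, the whitelist, and the model string are made up for illustration.

```python
# Standard SBC operation codes.
WRITE_10 = 0x2A
WRITE_SAME_10 = 0x41
WRITE_16 = 0x8A
WRITE_SAME_16 = 0x93

# Hypothetical hand-maintained whitelist: model string -> extra opcodes.
MANUAL_WHITELIST = {"EXAMPLE MODEL 123": {WRITE_SAME_16}}


def may_issue(opcode, advertised, model=""):
    """True only if the drive advertised the opcode or it is whitelisted."""
    return opcode in advertised or opcode in MANUAL_WHITELIST.get(model, set())


def pick_zeroing_opcode(advertised, model=""):
    """Prefer WRITE SAME, but never pick an opcode that was not advertised."""
    for op in (WRITE_SAME_16, WRITE_SAME_10, WRITE_16, WRITE_10):
        if may_issue(op, advertised, model):
            return op
    return None  # nothing usable is known; do not probe by trial
```

Note there is no "try it and see" branch anywhere: an opcode that is neither advertised nor whitelisted is simply never sent.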

> You can't trust the response because it's likely that at that point, the damage is already done. Even if you get one, you might not know what it means.

I still don't get this. If the damage is already done, then how is issuing the fallback going to change things? Again: I'm not arguing about whether discovery should be done or not. All I'm saying is, if the device says invalid opcode, you should use the fallback, whether or not there was any discovery that led you to use the initial opcode.

You don't know what state the drive is in anymore. The safest option is to reset the device entirely and start it back up again. If it comes back, you can use your fallback.

But it is much easier to rely on what is known to work instead of issuing potentially non-working commands to the point that there is no reason to have a fallback other than "rediscover what it supports".

I don't get why you would even want to use a fallback command on a drive that is in a potentially unknown or undefined state.

If discovery led to an invalid opcode the drive is faulty, end of story. The SAS and SATA standards are very clear on what is permitted and what is forbidden and that falls very far on the side of "not allowed".

Is this just a theoretical thing, or have there been actual drives that lied about invalid opcodes on a read and then proceeded to destroy the drive if you issued a fallback read? I have a hard time believing a hard drive would behave like a C compiler if I'm being honest...

As I mentioned earlier, there was a series of CDROM drives that upon receiving an unsupported command (and this was before you could discover it) would lead to all further data being interpreted as firmware data for an update and brick the device. If you issued a fallback read then the device would become bricked, if you reset the bus and reinitialized the device, everything was fine.

Discovery has of course improved this, so we know what a harddrive can and cannot do. Harddrives that lie about what they support shouldn't have the appropriate seals and trademarks of SATA or SAS on them, as they must be certified by those entities.

Oh wow, it would interpret every subsequent command as firmware data? I didn't realize that, that's completely nuts. Thanks for sharing that!

Well, if it's a security concern, perhaps the code author should check the response code rather than blindly assuming it worked.

No, it's indirect via whatever the operating system's disk abstraction is.

I don't know if I am at all surprised that a USB hard-drive is failing to implement all SCSI commands, even if it really should.

You're getting replies about shucking, but WD Passports are 2.5 inch drives that AFAICT are USB-only. The drive controller speaks USB directly, as opposed to SATA chained to a translator chip on a separate board. I've got a pair on my desk which I won't attempt to open, as I'm actually using them as portable external drives (as opposed to my pile of easystore carcasses).

I haven't noticed any abnormal behavior, but I also don't mount/luksOpen them with -o discard (if that's what it takes to trigger this).

You can see it visually too if you have a 2.5-inch drive to compare it to (or know the size of a 2.5-inch drive): drives with separate USB to SATA boards are longer to fit that board, while these are only about the same length as the bare drive plus the plastic shell.


Weird. That's interesting. Why did I think WD and other companies would just keep churning out SATA interface drives forever and ever?

Wonder if the physical hdd part itself is identical to the SATA models, and swapping over the control board from a (eg dead) SATA one of same capacity (& platters) would work?

Unlikely, in a modern (high capacity) drive, the firmware contains quite a bit of calibration data for all the components. You'd probably have to perform a factory reset and reformat of the physical disk blocks.

Good point. Probably not useful for extracting existing data from a (eg dead) drive. :)

Isn't the USB mass storage spec just a pipe for sending SCSI commands?

USB Mass Storage has its own command set. Some newer USB3 adapters do allow SCSI commands to be passed through using UASP, but not all do.

Some, if not most, USB2 to SATA/SAS bridges also allow passing these commands. smartctl can talk with most of the bridges to pass commands through, at least for SMART purposes.

Seagate's 3.5" Backup Plus Hub drives mask the disk itself intentionally to show itself as a different, generic Seagate device, probably to prevent people from messing with the disk settings and identifying the disk itself.

And even if they support it, it isn't always safe to do so. For example, I have a WD Passport sitting on my desk that was bricked when I issued an ATA secure erase command[1]. Don't assume that any SATA/SCSI commands are safe to issue to USB drives unless you have researched it.

[1] https://ata.wiki.kernel.org/index.php/ATA_Secure_Erase

Probably the USB interface keeps some data for its encryption function on a part of the hard drive which it makes inaccessible. You erased the whole hard drive, including that part, and probably caused the USB interface to malfunction.

No, it famously lacks TRIM, for example. They may finally have gotten their act together and updated it in a recent version, but there are plenty of drives on the market still today that will, for lack of TRIM support, eventually grind themselves down to a halt. FireWire and Thunderbolt do support full SCSI, iirc.

I also recently tried a bunch of USB3+SCSI - SATA adapters from Amazon, and only found a single one that supported passing TRIM through. And that's after a firmware upgrade that can only (afaik) be done using an installer running on ARM+Linux and provided by a third-party. Truly amazing.

How can I check that for my adapter?

Either mount the drive and run "fstrim" on the drive's location (it will error without TRIM support) or https://unix.stackexchange.com/a/339684/21419
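On Linux you can also check what the block layer thinks without sending anything to the drive: `/sys/block/<dev>/queue/discard_max_bytes` reads 0 when the kernel will not send discards to that device. A small Python sketch (the function name and the `sysfs_root` parameter are my own, and `sda` below is just a placeholder device name):

```python
from pathlib import Path


def discard_supported(dev, sysfs_root="/sys/block"):
    """Return True/False from discard_max_bytes, or None if it can't be read."""
    path = Path(sysfs_root) / dev / "queue" / "discard_max_bytes"
    try:
        return int(path.read_text().strip()) > 0
    except (OSError, ValueError):
        return None  # no such device, or not a Linux sysfs tree


if __name__ == "__main__":
    print(discard_supported("sda"))
```

A 0 here only tells you the kernel won't issue discards through this adapter; it doesn't say whether the drive behind the bridge could do TRIM on a native SATA port.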

You need to check your adapters for UASP support.

In addition to devices, the host controller has to support UASP too - in the early days of USB3, it was a separately licensed option.

All the adapters I tried claimed UASP support.

If the drive itself doesn't support it, how can the internal USB controller in an external drive compensate? At best it seems it could hide the error or do some hacky compensation for it.

At best perhaps it's a method of enforcing product segmentation - preventing users from buying & shucking what are strangely cheaper external drives (discounted so that WD can sell you (or sell 'you' via telemetry) some value add-on software). I don't see a technical reason for how a drive could physically be unable to support this SCSI command.

Why wouldn't it have the same functionality as an internal HDD?

They were never ever intended to be taken out and used that way. You can mask potential firmware problems and ship faster if you also own the enclosure.

Wouldn't it ship faster if you just plopped in any regular ol HDD the company makes (the cheapest one, probably) instead of developing a custom one just for that product?

If you're trying to bring a new drive to market (and even for many capacities, they get refreshed regularly if only to cut platters or find new ways to cut costs), and your firmware isn't ready for prime time as an internal drive, you can still ship it as an external drive earlier because you have a good idea of how the controller is going to treat the drive. Not only do the external drives tend to be the end of the waterfall in terms of quality, manufacturers also may ship with firmware problems that would be a stop ship if it was packaged for nearline or desktop.

Don't ask me how I know this.

And then consumers use external drives as a backup method...

Yes, if there's a takeaway for me, having seen how the sausage is made, it's that it's far and away worth grabbing a NAS or nearline SATA drive and jamming it in a good SATA enclosure instead of buying a drive that's already in an enclosure, if you want peace of mind.

Didn’t Backblaze do exactly this (shuck drives from external enclosures) when they couldn’t buy enough drives on the open market after the 2011 Thailand floods?

And they themselves said that it's not a good idea, it simply was a time of desperation. They're not shucking externals now, now that proper drives are available.

Also they have a lot more tolerance for disk failure than almost anyone.

They did, but with 3.5" drives, which were not USB-specific. The USB-only design is done to save space, and no one cares about space with a 3.5" drive. If you ship with a converter board, it is impossible to make it this small.

I was always wondering how external drives can be cheaper than internal ones. This explains it. Thank you.

Yeah, and massively cheaper too. I bought a 5TB passport drive for £71 recently, an equivalent capacity desktop drive would be at least £200. The price disparity is insane.

In reality, it is probably exactly that. These are the drives that didn't make the cut to be sold independently.

Yeah pretty common thing across all industries. I would expect things like maybe the vibration is slightly out of normal spec, or an abnormal amount of bad sectors on a fresh unit.

But I don't understand how them maybe not being top quality would make part of the SCSI protocol disappear in the firmware. That indicates that these drives had a different firmware developed for them with missing features which makes no sense.

What you are missing is that WD Passports are 2.5 inch drives that are USB-only. The drive controller speaks USB directly, as opposed to SATA chained to a translator chip on a separate board.

They implement certain parts themselves rather than passing them through, and do indeed have separate USB firmware.

Is there actually a SCSI drive in there or just a SATA drive with a USB bridge chip using the USB mass storage spec and implementing some basic SCSI commands?

It's a regular 2.5" hard drive, but the motherboard has a USB <> SATA bridge and some glue logic on it already, probably to save space or costs. [0]

There's no SATA connector so you can't salvage the drive or the enclosure. But there are SATA test points so you could wire it that way in theory. [1] [2]

Toshiba does the same; I found out the hard way after prying open one of them to salvage a hard drive for my PS4.

[0] https://www.youtube.com/watch?v=wP4l_L81NKw

[1] https://forum.acelaboratory.com/download/file.php?id=999&mod...

[2] https://forum.acelaboratory.com/viewtopic.php?t=9174

In the 3.5" space, "shucking" the enclosures off desktop USB storage devices almost always reveals a SATA 3.5" hard drive.

Kind of surprising that the drive control board in the Passport has the USB connector built right in. It makes me wonder a few things:

1. What are volumes like for 2.5" spinning rust drives? I understand that the vast majority of 3.5" drives go into servers, desktops, or storage devices where they operate on a SATA bus, so the small volume of USB drives are most cheaply made with a housing that uses the economies of scale of that industry and adds a USB conversion motherboard. A decade ago, I would have said most 2.5" drives are used with SATA connectors in laptops, but who's buying laptops that don't use solid state storage anymore?

2. What's the cost difference for a drive control board with optional pads for both SATA and USB, only one installed at a time, vs one that only supports SATA?

3. Can you pull off the control board and replace it with one from the same lineup that uses SATA, like you would in a data recovery operation where some IC on the board burned out? Or is the mechanical component also specialized?

When I disassembled one portable SSD, it was exactly that: ordinary SATA 2.5" drive inside enclosure with tiny adapter.

Sometimes the firmware is optimized for enclosure operation, though. WD is one of the most advanced companies in this regard. My 4TB Passport drive reports a drive model which is not sold separately; it's purpose-built to run in an enclosure.

My first-generation Passport 320GB disk also had different firmware for enclosure-based operation.

IIRC some WD disks don't have SATA ports; instead, the USB ports are soldered directly to their drive boards.

WD is an interesting company.

Well, the output looks like it came from the UAS driver, which is, by name, USB Attached SCSI. I imagine the drive itself is probably some SATA drive and the USB chip just translates SCSI to SATA.

I wonder if it is some kind of market segmentation choice. Does it even make sense?

It is absolutely a market segmentation choice.

How do you know this?

It’s literally nothing more than a piece of firmware. WD is fairly aggressive about market segmentation, and firmware differences (or settings) are a big part of that.

You should read the rest of the thread. It's likely a completely different drive.

A completely different drive which lacks a feature in firmware. WD creates "completely different drives" which may have physical differences, but the firmware is also a huge differentiator in the market.

I learnt not to trust WD Passport drives.

I have a completely unusable 2TB drive at home that for some reason only gets detected by MacBooks, not by Windows or Linux PCs.

I've had similar experiences. Two that I couldn't access with anything, and would've needed to pay a professional data recovery person if I wanted the stuff. But I had two Hitachi ones which were similar too.

I now have a collection of internal drives in enclosures, and the first two, out of old laptops, have now outlasted any external drive I've ever had.

Only 1 external drive I've had has been good in my life. That's a Seagate. Dunno if it's a fluke but I'll just buy that brand in the future until I find out.

If you rely on your USB hard drive to write zeros when you delete data, you must stop and encrypt your data.

Encryption is not future proof (encryption that was previously thought of as secure has been broken). Writing zeros is future proof.

On an SSD, writing zeroes may only trigger a remap of the NAND cells. There is a paper where data has been recovered in such situations... So writing zeroes hasn't been future proof for a decade or longer.

In general, overwriting an LBA on an SSD will not cause a write to the same physical flash memory cells, no matter what the data pattern is. If you want to guarantee that the old version of the data is actually erased from the physical medium, you need to issue a drive sanitize command.
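A toy model of why, as a Python sketch. This is a deliberately simplified flash translation layer (the `ToyFTL` class and its behaviour are illustrative assumptions, not any real controller's algorithm): every logical write lands on a fresh physical page and only the mapping is updated, so the old page keeps its contents until something erases it.

```python
class ToyFTL:
    """Simplified logical-to-physical mapping with out-of-place writes."""

    def __init__(self, physical_pages):
        self.pages = [None] * physical_pages  # raw flash contents
        self.l2p = {}  # logical block address -> physical page index
        self.next_free = 0

    def write(self, lba, data):
        if self.next_free >= len(self.pages):
            raise RuntimeError("no free pages (toy model has no GC)")
        self.pages[self.next_free] = data  # always a fresh page
        self.l2p[lba] = self.next_free  # old page is merely unmapped
        self.next_free += 1

    def read(self, lba):
        return self.pages[self.l2p[lba]]

    def stale_pages(self):
        """Contents still on flash that no LBA points to any more."""
        live = set(self.l2p.values())
        return [p for i, p in enumerate(self.pages)
                if p is not None and i not in live]


ftl = ToyFTL(physical_pages=4)
ftl.write(0, b"secret")
ftl.write(0, b"\x00" * 6)   # "overwrite" LBA 0 with zeros
print(ftl.read(0))          # b'\x00\x00\x00\x00\x00\x00'
print(ftl.stale_pages())    # [b'secret']
```

The host only ever sees `read(0)`; the stale page is what a direct readout of the flash chips would find.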

Pretty sure people have discovered that is not reliable on some drives too, that it may only destroy the logical to physical map, and not change the actual data cells.

It is surprisingly difficult to ensure deliberate data loss.

> Pretty sure people have discovered that is not reliable on some drives too, that it may only destroy the logical to physical map, and not change the actual data cells.

This can be the case for something like an ATA Secure Erase command, which is why the Sanitize commands were introduced to ATA, SCSI and NVMe. Those do explicitly mandate that all user data be erased, including from all caches and any storage media that is not normally accessible to the host system (ie. old blocks that haven't been garbage collected yet).

Absolutely not. Writing random data is future proof.

1) Writing all zeros is generally considered more SSD-friendly than random data. The exact reasons for this are complex in part because behavior of SSD controllers varies significantly with all-zero blocks. But, while absolutely inferior to using TRIM, there is reason to believe that writing all zeroes is less likely to lead to premature wear than random data.

2) While it's been "common knowledge" since Gutmann that data from old writes can be recovered (thus the advice to write multiple passes of random data), this turns out to have been iffy in Gutmann's day and an outright myth today. Multiple university teams have tried and failed to recover data using advanced techniques (such as SEM tomography) after a single zero pass. Generally the success rate for single bits is only slightly better than random chance. Gutmann himself criticized multi-pass overwriting as "a kind of voodoo incantation to banish evil spirits" and unnecessary today.

3) By far the larger concern in data recovery, for platters as well as SSDs, is caches and remapping performed in the firmware. As a result, the ATA secure erase command is the best way to destroy data because it allows the controller to employ its special knowledge of the architecture of the drive. However, ATA SE has been found to be extremely inconsistently implemented, especially on consumer hard drives. The inability to reliably verify good completion of the ATA SE is a major contributor towards preference for "self-encrypting" drives in which ATA SE can be reliably achieved by clearing the internal crypto information, and the US government's recommendation that drives can only reliably be cleared by physical destruction. Physical destruction is probably your best bet as well, because self-encrypting enterprise drives come at a substantial price premium and you still lack insight into the quality of their firmware. In other words, the price of a drive with an assured good ATA SE implementation is probably higher than the price of a cheap drive and the one you'll replace it with after you crush it.

in regards to 2):

It's true that multiple overwrites are overkill. But for SSDs it has been shown that it's possible to read data after a full overwrite [1].

[1] https://static.usenix.org/event/fast11/tech/full_papers/Wei....

The data recovered in this paper, though, was recovered by direct readout of flash chips in order to locate pages which had not actually been overwritten at all. This is a very different kind of problem and attack than the one that led to multiple-pass overwrites and falls into my point 3. The reason that multi-pass overwriting can be effective on SSDs is because the increased number of write operations encourages the SSD controller to remap more blocks in and out of the page space which increases physical coverage of the overwrite.

There is a potential benefit to multi-pass random write to SSDs in this case, but this paper shows exactly why you shouldn't do this: because the improvement in security from random overwrites is stochastic at best and cannot be guaranteed without full knowledge of the behavior of the controller, as can be seen in the paper in the drives which continued to contain remnant data after many passes.

As the paper finds, multi-pass overwrite is not a valid technique to sanitize SSDs, and is still cargo-cult security.

Yes, like I already said, multi-pass is not a good way to sanitize SSDs. However, the paper does directly contradict your stance that data is irrecoverable after a full write. It doesn't really matter that it's done via a direct flash chip readout. Literally anyone can do that. In comparison, the cost of a SEM (which can't read out platters) approaches a million dollars.

Not writing the data in the first place is future proof.

Melting the platters is the next best option.

Writing encrypted zeros is more future proofish

True, but you can do both.

This sounds a lot like the "vitamin juice obviously isn't healthy" argument. Obviously overwriting bytes does not delete files! Stop doing that, use encryption instead!

Future OSes (if using SCSI) should test to see if this works (easy test), especially if using that SCSI to communicate with a VM host's (i.e., ESX's) filesystem...

So, a user can create a new file and get access to previously stored content somehow?
