New hard drive rituals (2018) (linuxserver.io)
119 points by walterbell 3 days ago | 69 comments

This article has an unfortunate omission: it ignores drives' S.M.A.R.T. reporting system, which can give a surprisingly thorough view into drive metadata, like how many blocks have been remapped due to media corruption.

After running `badblocks`, run `smartctl -t long $DEVICE`, which is a "long" self-test. This will take upwards of a day to complete, depending on the drive.

After that completes, check the status by running `smartctl --all $DEVICE`, and verify that there are only zeroes in the RAW_VALUE column for Reallocated_Sector_Ct, Used_Rsvd_Blk_Cnt_Tot, and Uncorrectable_Error_Cnt.
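A minimal Python sketch of that check, assuming the usual 10-column ATA attribute table printed by `smartctl --all`. The function names are hypothetical, and attribute names vary between vendors and models (the three below are the ones named in the comment), so an attribute that is simply absent is ignored here rather than flagged.

```python
import re

# Attributes whose RAW_VALUE should be zero on a healthy new drive, per the comment above.
MUST_BE_ZERO = ("Reallocated_Sector_Ct", "Used_Rsvd_Blk_Cnt_Tot", "Uncorrectable_Error_Cnt")


def raw_values(smartctl_output: str) -> dict[str, int]:
    """Parse the attribute table from `smartctl --all` into {ATTRIBUTE_NAME: raw value}.

    Assumes the layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split(None, 9)  # RAW_VALUE may itself contain spaces
        if len(fields) == 10 and fields[0].isdigit():
            # Keep only the leading integer, e.g. "35 (Min/Max 20/48)" -> 35.
            match = re.match(r"\d+", fields[9])
            if match:
                values[fields[1]] = int(match.group())
    return values


def nonzero_problems(smartctl_output: str) -> dict[str, int]:
    """Return the watched attributes whose raw value is non-zero."""
    values = raw_values(smartctl_output)
    return {name: values[name] for name in MUST_BE_ZERO if values.get(name, 0) != 0}
```

To use it, feed it the text from something like `subprocess.run(["smartctl", "--all", "/dev/sdX"], capture_output=True, text=True).stdout`, with `/dev/sdX` standing in for your device.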

It's also very illuminating to check SMART stats on "new" drives that show hundreds of on/off cycles or other evidence that the drives are not, in fact, new.

Pro tip: Many, many drives sold on Amazon by large, "reputable" suppliers are not new in the way you think they are. They are "new pulls" from systems integrators that received them as part of complete PCs and then pulled them, en masse, to sell in bulk.

Those "new pulls" may have hundreds and hundreds of hours on them ...

We (rsync.net) have started trending towards B&H Photo for larger batches of drives ... they pack them cluefully and there is no "new pull" monkey-business ... unfortunately they don't always have the quantity on-hand that we require, but that's not a knock against them ...

> they pack them cluefully

Yeah, that's why Amazon is a hard no for hard drive buys for me. I don't mind when a lot of things are bouncing around in an oversized shipping box with insufficient padding, or in a box with cat litter, but a hard drive should be shipped responsibly, and I can't reasonably expect that from Amazon.

Wow, so I guess people are buying new PCs and essentially scrapping them for parts? That's....... actually kind of funny

When I bought my present laptop, it was about $200 cheaper to order the min-spec (4 GB RAM / 500 GB hard disk) and add aftermarket RAM and SSD upgrades than to buy it configured the way I wanted from the maker.

I could imagine that a firm that ordered 1000 units and did (or contracted out for) the same upgrades now has 1000 unneeded 500-gig hard drives it can unload cheap.

Probably not new PCs. But I know a lot of enterprise-tier clients that do a 3-5 year hardware refresh. And there's no shortage of businesses that fail or need to offload new-ish hardware.

Rip out the HDDs, RAM DIMMs, etc. and resell as "new" for extra $$$.

Why not just buy direct from the manufacturer?

You need really high quantities.

At moderate quantities, Intel and SSD makers do backend deals to get rebate credits to a PC OEM.

When I bought lots of computers, we would usually get -1-3% margins.

> This article has an unfortunate omission by ignoring drives' S.M.A.R.T. reporting system

TFA states:

I have taken to using a 'burnin' script to wrap smartctl tests before and after in order to have relatively high confidence that the drive will not fail soon.


The script I used can be found on Github here

Agree this could be discussed in a bit more detail, but it is actually there.

Better yet, run `smartctl -t conveyance $DEVICE` first, which will run a short (few minutes) test to check for transport damage. `smartctl -a $DEVICE` also reports self test results, so always check those as well.

I've found that recent drives (WD & HGST) don't support the conveyance test.

On the topic of new rituals for storage devices: whenever I buy a microSD card or a USB flash drive, before I set up the filesystem, I fill it with random data and read it back to ensure it really has the storage capacity it's supposed to have.

I started doing this after reading comments online (probably here) about fraudulent cards and drives circulating on Amazon. Some comments mentioned that some devices even have their firmware set up so that writes don't fail when you've written more than the device can actually hold. They supposedly manage this by reusing previously used blocks for new files; in other words, I suppose, each block has multiple addresses. The addresses would probably loop around the same blocks until the fake size is reached. That would give the most realistic appearance of the reported size while silently corrupting previously written data.
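To make the looping behaviour concrete, here is a small Python simulation. `WrappingFakeDevice` is a hypothetical model of such a card (addresses wrap modulo the real size and writes never fail), not any real firmware. The check writes the whole advertised capacity and then reads it all back. It stamps each block with its own address rather than using the random data described above, only so that an aliased block is easy to recognise; random data would expose the overwrite just as well.

```python
from typing import Optional


class WrappingFakeDevice:
    """Hypothetical fake flash: advertises `claimed_blocks` but only stores `real_blocks`.

    Every address wraps around modulo the real size; writes never fail and earlier
    data is silently overwritten."""

    def __init__(self, claimed_blocks: int, real_blocks: int, block_size: int = 512):
        self.claimed_blocks = claimed_blocks
        self.block_size = block_size
        self._cells = [bytes(block_size)] * real_blocks

    def write(self, lba: int, data: bytes) -> None:
        self._cells[lba % len(self._cells)] = data

    def read(self, lba: int) -> bytes:
        return self._cells[lba % len(self._cells)]


def address_tag(lba: int, block_size: int) -> bytes:
    """Fill a block with its own address, so an aliased block reads back as another tag."""
    return lba.to_bytes(8, "little") * (block_size // 8)


def first_bad_block(dev) -> Optional[int]:
    """Write the whole advertised capacity, then read it all back.

    Returns the first address whose contents changed, or None if every block verified."""
    for lba in range(dev.claimed_blocks):
        dev.write(lba, address_tag(lba, dev.block_size))
    for lba in range(dev.claimed_blocks):
        if dev.read(lba) != address_tag(lba, dev.block_size):
            return lba
    return None
```

On a card that claims 64 blocks but stores 16, block 0 is last overwritten by block 48's tag, so the check fails at address 0. The key point is that verification only starts after everything has been written; checking each block right after writing it would never notice the wrap-around.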

In any case, so far I've only gotten 4 cards/drives since then, and I haven't found them to be frauds.

I've since wondered how much I'm shortening their lifespan with my testing. I imagine that microSD cards and USB flash drives do wear leveling using unused blocks, but without TRIM support, filling them during testing probably eliminates the devices' ability to do that, since every block is now occupied and I have no way to tell the device that the blocks are free again.

As far as I understand, the problem with lacking TRIM support on USB flash drives is not that there's no protocol for it through USB, since there are USB-SATA adapters that specify support for UASP. Admittedly, though, I'm assuming that UASP is enough to get TRIM support. I don't remember testing that.

In any case, I wonder if this new ritual I'm doing is really worth it, and if it's not, when would it be?

EDIT: I wonder if it's realistic to expect that one day we'll get USB flash drives and microSD cards with TRIM support. I don't expect the need to check for fraudulent storage devices to go away.

Filling with random data is advisable anyway if you use encryption for storage, which almost everybody should do. (It's clearly fine to store genuinely public stuff unencrypted, but if you're not sure, probably not everything is public, so just encrypt everything.)

The reason to write random data is that otherwise an adversary inspecting the raw device can determine how much you really stored in the encrypted volume, as the chance of a block being full of zeroes _after_ encryption is negligible so such blocks are invariably just untouched.

The encryption will ensure that random bits and encrypted data are indistinguishable (in practice, anyway; in theory this isn't a requirement of notions like IND-CPA, it just happens to be what you get).
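A rough Python sketch of the inference an adversary could make from a raw image of a device that was not pre-filled. It assumes the never-written space reads back as zeros, which is not guaranteed on every device. The function name and the 4096-byte block size are illustrative choices.

```python
def untouched_fraction(image: bytes, block_size: int = 4096) -> float:
    """Fraction of `block_size` chunks in a raw device image that are entirely zero.

    Encrypted data is essentially never all zeros, so an observer can treat all-zero
    blocks as "never written" and estimate how much of an encrypted volume is in use.
    Pre-filling the device with random data removes that signal."""
    blocks = [image[i:i + block_size] for i in range(0, len(image), block_size)]
    if not blocks:
        return 0.0
    zero_blocks = sum(1 for block in blocks if not any(block))
    return zero_blocks / len(blocks)
```

On an image with three zero blocks followed by one written block, this reports 0.75, i.e. roughly a quarter of the volume holds data. On a device that was filled with random data first, every block looks written.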

What's the cause for concern about someone knowing how much of the drive is unused?

I believe that could be used to determine if encrypted hidden volumes are in use

Apart from under-sizing, there are other issues with fake SD cards too. For example, with a crappy controller you can lose data on the card during power outages, or because it reported a write as successful before it had actually finished, or you can get silent corruption over its lifetime, etc.

There's a tool called `f3` that does this automatically: https://fight-flash-fraud.readthedocs.io/en/latest/index.htm...

TL;DR (for Debian variants):

    $ sudo apt install f3
    $ f3write .
    $ f3read .

> In any case, I wonder if this new ritual I'm doing is really worth it, and if it's not, when would it be?

Depends on the chances and impact of getting a fake/faulty device, yeah? I've heard claims that most online marketplaces have rampant fakes, although you're 4 for 4 with good ones. On the other hand, the failure mode is data loss, which seems pretty bad. So personally I think your approach is a good idea.

I want to think so too, but I wonder how much sooner the devices are going to wear out. I know it depends a lot on usage patterns, though. Luckily, I needed these devices almost exclusively for reading, but I wonder if I reduced the available writes by something massive.

I mean if the write cycles are so low on a device that one drive write is a problem then it was already a problem before you did one drive write. Even a cheap (legitimate) uSD should have hundreds of full drive write cycles.

Yes, I run f3write / f3read on every USB drive I get. So far, I found one instance of a flash drive equipped with a FIFO controller (reporting itself as a large capacity, but only retaining a much smaller amount of data).

Back around 2010 I got a great deal on eight PACER SSDs from China. At the time, SSDs were still quite expensive compared to spinning platters.

My performance benchmarks were fantastic, but I started seeing some corruption issues. I wrote a C# program to fill the entire disk up with pseudorandom bytes (starting with a known seed), then read it back and compare.
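The commenter's program was C#; below is a minimal Python sketch of the same fill-from-a-known-seed-then-verify idea, not their code. Because the stream comes from a seeded PRNG, the verifier regenerates it instead of storing a copy. It assumes Python 3.9+ for `random.Random.randbytes`. Python's Mersenne Twister is not cryptographic, which is fine for test data. The function names and the 1 MiB chunk size are illustrative.

```python
import random
from typing import BinaryIO, Optional

CHUNK = 1 << 20  # bytes per write/read; an arbitrary choice


def fill(f: BinaryIO, size: int, seed: int) -> None:
    """Write `size` pseudorandom bytes, generated from `seed`, starting at offset 0."""
    rng = random.Random(seed)
    f.seek(0)
    remaining = size
    while remaining:
        n = min(CHUNK, remaining)
        f.write(rng.randbytes(n))
        remaining -= n
    f.flush()


def first_mismatch(f: BinaryIO, size: int, seed: int) -> Optional[int]:
    """Regenerate the same stream from `seed` and compare it with what reads back.

    Returns the offset of the first differing (or missing) byte, or None if all match."""
    rng = random.Random(seed)  # same seed and same chunking -> identical stream
    f.seek(0)
    offset = 0
    while offset < size:
        n = min(CHUNK, size - offset)
        expected = rng.randbytes(n)
        actual = f.read(n)
        if actual != expected:
            for i, (a, e) in enumerate(zip(actual, expected)):
                if a != e:
                    return offset + i
            return offset + len(actual)  # short read: the data simply isn't there
        offset += n
    return None
```

Against a real disk you would pass something like `open("/dev/sdX", "r+b")` (placeholder path; this destroys its contents). Reads should also come from the media rather than the OS page cache, for example after unplugging and replugging the drive.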

Nearly all of them gave back different bytes at some point in the test. These were silent errors, and didn't trigger any read warnings or CRC/ECC failures reported by the drive's controller.

I suspect they achieved the performance by simply ignoring any errors and steamrolling right along.

Bad blocks bad blocks what ya gonna do, what ya gonna do when they come for you.

Hard drives can fail hard or gradually; SSDs fail hard! On a hard drive, we can picture how the blocks are stored (sequentially). On an SSD, blocks are remapped in silicon+firmware, and this added layer brings its own bugs [0] [1].

[0] https://blog.elcomsoft.com/2019/01/why-ssds-die-a-sudden-dea...

[1] https://www.anandtech.com/show/6503/second-update-on-samsung...

(edit: added one more reference)

Badblocks writes easily compressible patterns and I assume harddrive manufacturers try to compress and deduplicate data, masking any defects. Especially on SSDs and SMR. So I prefer to fill the drive with AES-encrypted data like:

yes | openssl enc -aes-256-cbc -out /dev/test-drive

and then decrypt and grep. It's almost as fast as badblocks.
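A quick Python sketch of the premise that badblocks' data is easily compressible. It compresses a 1 MiB sample of each fixed byte pattern that `badblocks -w` writes (0xaa, 0x55, 0xff, 0x00) and compares that with random bytes standing in for encrypted output. zlib is only a stand-in for "a compressor". This shows that the patterns compress well, not that any drive actually compresses them (the reply below disputes that).

```python
import os
import zlib

SIZE = 1 << 20  # 1 MiB sample

# The fixed byte patterns badblocks' destructive write mode (-w) cycles through.
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)


def compressed_ratio(data: bytes) -> float:
    """Compressed size as a fraction of the original (smaller = more compressible)."""
    return len(zlib.compress(data)) / len(data)


pattern_ratios = {hex(p): compressed_ratio(bytes([p]) * SIZE) for p in PATTERNS}
random_ratio = compressed_ratio(os.urandom(SIZE))  # stands in for AES output
```

Each single-byte pattern shrinks to well under 1% of its size, while the random sample does not shrink at all.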

> and I assume harddrive manufacturers try to compress and deduplicate data

You assume incorrectly. There have been attempts, like SandForce drives in the early days of SSDs, but there are a bunch of reasons why you don't want to do opportunistic general-purpose compression at the lowest possible level of the storage chain.

The problem with compression* is that instead of having one drive capacity number (say, 500 GB), you have two: the drive's physical capacity and its "logical" capacity. Which is fine and dandy, except that the logical capacity varies wildly with what kind of data is stored on the drive. Highly patterned data (text and most executables) compresses very well. Data which is already compressed (most images, videos, archives, etc.) does not. So how do you report to the user how much space they have left on the drive? Even if you know the existing contents and current compression ratio, you don't know what they're going to put on the drive in the _future_. Your best guess would be, "250 GB or maybe about a terabyte, lol i dunno".

There's also the fact that application-specific compression algorithms tend to do FAR better than general-purpose compression algorithms, which is why almost all of our media storage formats default to using them. JPEG, HEVC, and so on. Plus you get the benefit of having the thing compressed all the time (even over the I/O channel or network) instead of just on disk.

Compressing data which is _already_ compressed often results in additional overhead. So unless the drive is testing the data to see how well it compresses before writing it to disk (which would murder performance), your 500 GB drive could actually end up being a 450 GB drive.
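A small Python check of the "already compressed data gets bigger" point. It assumes random bytes are a fair stand-in for already-compressed media (JPEG, HEVC, archives), which look close to random to a general-purpose compressor, and uses zlib as the compressor.

```python
import os
import zlib

# Random bytes stand in for already-compressed media.
already_compressed = os.urandom(1 << 20)
recompressed = zlib.compress(already_compressed)

# Positive: the "compressed" copy is larger than its input.
growth_bytes = len(recompressed) - len(already_compressed)
```

zlib falls back to stored (uncompressed) blocks when compression doesn't help, so its growth is small. How much capacity a drive-level scheme would lose depends on that scheme's design, which this sketch doesn't measure.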

Further, always-on compression would result in a substantial performance penalty unless special silicon is crafted to handle it. Storage is already an industry with razor-thin margins; companies are not going to add to the BOM cost for a feature that could ultimately make people buy _fewer_ drives.

In the case of deduplication, there's no OS-level standard for it which makes current implementations far less efficient than they could be.

That said, data compression is very popular in the enterprise storage space, but it is typically done at the pool or volume level (large groups of disks) rather than per-disk. These arrays usually combine compression and deduplication with other strategies like thin provisioning to optimize storage to an almost absurd level. It typically requires trained storage engineers to manage them.

* _I lump deduplication into compression because deduplication is actually just one kind of compression strategy, even though lots of things treat it like a separate feature._

SSDs have enough remapping logic and get enough benefit from reduced data size to make it worth it.

SMR maybe, I don't know if it's really worth it.

For a hard drive dealing with 4KB sectors, that's a lot of trouble for almost no benefit.

Per recent postings on HN about /dev/urandom, wouldn't it be fine to use data from that, rather than bothering with AES output? /dev/urandom should be cryptographically safe.

But how would you detect corruption in this case?

If you write random bytes then read them back, you have nothing to compare them to.

tzs' solution is clever, but if you're planning to simply toss a bad drive (and don't particularly care which blocks are bad), something like

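head -c "$(blockdev --getsize64 /dev/the/disk)" /dev/urandom | tee /dev/the/disk | sha256sum

blockdev --flushbufs /dev/the/disk
sha256sum /dev/the/disk

should also work fine, as long as the two hashes match. Limiting the random stream to the device size with `head -c` matters: otherwise `tee` keeps piping /dev/urandom into `sha256sum` after the disk is full, and the hashes can never match. Flushing the buffers makes the second `sha256sum` read from the disk rather than from the page cache.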
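The same write-hash-then-reread-hash idea as a Python sketch, using `os.urandom` and `hashlib.sha256` in place of the shell pipeline. The function names are hypothetical, and they take an already-open binary file object. For a raw device that would be something like `open("/dev/the/disk", "r+b")` (placeholder path, as in the comment), and running it destroys the disk's contents.

```python
import hashlib
import os
from typing import BinaryIO

CHUNK = 1 << 20  # bytes per write/read; an arbitrary choice


def write_random_and_hash(f: BinaryIO, size: int) -> str:
    """Write `size` random bytes to `f` from offset 0, hashing them on the way out."""
    h = hashlib.sha256()
    f.seek(0)
    remaining = size
    while remaining:
        block = os.urandom(min(CHUNK, remaining))
        f.write(block)
        h.update(block)
        remaining -= len(block)
    f.flush()
    return h.hexdigest()


def read_back_hash(f: BinaryIO, size: int) -> str:
    """Hash the first `size` bytes that read back from `f`."""
    h = hashlib.sha256()
    f.seek(0)
    remaining = size
    while remaining:
        block = f.read(min(CHUNK, remaining))
        if not block:
            break  # device shorter than expected: the hashes simply won't match
        h.update(block)
        remaining -= len(block)
    return h.hexdigest()
```

As with the shell version, this only says *whether* something is wrong, not *which* block; that is fine if a failing drive just gets tossed.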

Let b be the number of bytes per block, so one block can hold h = b/16 128-bit (16-byte) hashes.

1. Test N blocks of the disk, where N is small enough that you can keep a 128-bit hash of each block of random test data in memory for the duration of the test. You can use the hashes to verify that read data is correct.

2. Then test another Nh blocks, using the N blocks from #1 to store the 128-bit hashes of those Nh test blocks.

3. Then test N(1+h)h blocks, using the N(1+h) blocks from #1 and #2 to hold the hashes.

4. Then test N(1+h)^2 h blocks, using the N(1+h)^2 blocks from #1, #2 and #3 to hold the hashes.


(Maybe store only h-1 hashes per block, and in each block of hashes include a hash of those h-1 hashes, so you can check that hash blocks are reading back fine?)
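A short Python sketch of how many blocks each tier of that scheme covers. It assumes 128-bit (16-byte) hashes, so a b-byte block holds h = b/16 of them, and it ignores the hash-of-hashes refinement. The function name is hypothetical.

```python
def tier_sizes(n: int, block_bytes: int, tiers: int, hash_bytes: int = 16) -> list[int]:
    """Number of blocks tested in each tier.

    Tier 1 tests n blocks whose hashes stay in RAM. Every later tier stores its hashes
    in all previously verified blocks, each holding h = block_bytes // hash_bytes hashes.
    """
    h = block_bytes // hash_bytes
    sizes = [n]
    verified = n
    for _ in range(tiers - 1):
        tested = verified * h  # one hash slot per newly tested block
        sizes.append(tested)
        verified += tested  # after tier m, n * (1 + h) ** m blocks are verified
    return sizes
```

With 4 KiB blocks (h = 256) and a single in-memory hash, four tiers already cover 1 + 256 + 65,792 + 16,908,544 = 257³ blocks. The coverage grows by a factor of (1 + h) per tier, so only a handful of tiers are ever needed.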

So basically you are demonstrating that the method is more complex than OP's, with absolutely no benefit, which I think was the point.

You're right, I don't know what I was thinking with that comment.

Have you ever tried dd'ing from it? It is slooooow.

Wouldn't doing this sorta thing to SSDs reduce their life significantly?

Consumer SSDs are typically rated for 0.1–0.5 drive writes per day, for a warranty period that's usually 5 years but sometimes just 3 years on low-end drives. So a single pass of sequential writes uses up somewhere between 2 and 10 days of warrantied write endurance, which is a more conservative measure than actual expected functional lifespan.
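The arithmetic above as a tiny Python sketch. It assumes a 365-day year and that a drive rated for `dwpd` drive writes per day can absorb `dwpd × 365 × warranty_years` full-drive writes over its warranty. The function names are illustrative.

```python
def warranty_days_per_full_write(dwpd: float) -> float:
    """Days of rated endurance consumed by one full-drive sequential write.

    At `dwpd` drive writes per day, one full write equals 1/dwpd days of the budget,
    regardless of capacity or warranty length."""
    return 1.0 / dwpd


def fraction_of_budget(dwpd: float, warranty_years: float, full_writes: int = 1) -> float:
    """Fraction of the total warrantied endurance used by `full_writes` full passes."""
    total_drive_writes = dwpd * 365 * warranty_years
    return full_writes / total_drive_writes
```

At 0.5 DWPD one pass costs 2 days; at 0.1 DWPD it costs 10 days. A 0.1 DWPD drive with a 5-year warranty is rated for 182.5 full-drive writes, so one test pass uses about 0.55% of that budget.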


SSD blocks can be written 10k, 100k, maybe a million times. If one block is rewritten every time the most frequently changed directory is changed, that block might be exhaustible. But starting out with writing each block once won't make a difference.

In 1991 or so, Intel was speccing devices with 100K erase cycles. In private conversation we were told "Oh, they'll last millions, it's just that erases take longer and longer to the point where you run out of patience and just declare the block dead." I don't remember the part sizes, but they were in the low megabit range.

Challenge accepted: I tried to wear a device out (my setup had direct bus access to the flash part in question, with no remapping layer). It went for months without an error, and I eventually repurposed that corner of my desk.

30 years later, they've pushed the technology a little :-)

You won't get 10K out of an SLC cell (ten years ago those would suffer around 3,000 erase cycles, and it has to be much worse now). I don't know what QLC endurance is (16 voltage levels in a cell, run away), but I wouldn't be surprised if it was a few hundred cycles.

> You won't get 10K out of an SLC cell (ten years ago those would suffer around 3,000 erase cycles, and it has to be much worse now).

10k P/E cycles is the right ballpark for SLC 3D NAND these days, and it's never been anywhere near as low as 3k; that's more like what good TLC 3D NAND gets. QLC is in the 500-1000 cycle range.

Yup, I had mis-remembered the SLC versus TLC numbers.

I'm surprised QLC gets as high as 1000, frankly. It's like the cells are on a first-name basis with the electrons they hold.

I think it was worse with TLC right before the transition to 3D NAND. They were down to about 8 electrons between voltage states. 3D NAND memory cells are physically much larger than 15nm planar NAND, and most are a lot less leaky.

EIGHT electrons?

Indistinguishable from magic, man.

[Cue future archeologists: "Remember when our grandparents could only get a few gigabytes on a quark? How primitive, how sooo last-femtocentury."]

> last-femtocentury

You nerd-sniped me. I couldn't resist calculating it. A femtocentury is about 3.16 microseconds.
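The calculation, for anyone else who got nerd-sniped. This assumes a century of 100 Julian years (365.25 days each).

```python
SECONDS_PER_JULIAN_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 s

century_s = 100 * SECONDS_PER_JULIAN_YEAR
# femto = 1e-15; multiply by 1e6 to express the result in microseconds
femtocentury_us = century_s * 1e-15 * 1e6
```

That gives about 3.156 microseconds.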

Does anybody sell standalone SLC drives these days? I was bummed when Intel stopped selling theirs several years ago.

Samsung's "Z-NAND" is a latency-optimized 3D SLC NAND, which uses smaller page and block sizes than their capacity-oriented mainstream 3D NAND. The Samsung 983 ZET, SZ983 and SZ985 are drives using that memory.

Kioxia (formerly Toshiba) is sampling their version of the same basic idea, and SSDs using that memory should be coming out during the next few months.

Intel has of course replaced SLC NAND with their 3D XPoint memory, which is about twice the price but can sustain better write speeds because it doesn't have flash memory's slow block erase operations.

> SSD blocks can be written 10k, 100k, maybe a million times.

It's not 2007 anymore.

SSDs also have wear leveling; you cannot wear out a particular block of flash by writing to the same LBA a bunch of times.

For badblocks, I suggest a block size larger than 4KiB; it'll go faster. The legacy reason for 4KiB: badblocks can produce a bad-blocks file that mke2fs can use to avoid known bad sectors, but the block size for badblocks and mkfs must be the same. Since bad sectors are managed by the firmware these days, you're best off not using this legacy feature, so you can raise the badblocks block size well above 4KiB.

Another idea for a generic approach, whether HDD or SSD, is to use f3. It'll check for fake flash (firmware loops to report a bigger device size than real size), corruptions, and read/write errors.

Here is a little program I wrote to stress test disks. It was originally written for HDD but it works well on SSD and Flash too. I test all my media with it for 24 hours. In fact I'm testing two 5TB external Seagate drives as I type!


Well done. Thanks for sharing your efforts!

Thank you for adding Windows support! Some of us still run it :)

My preferred software for scanning a newly purchased disk is MHDD:


Because it times how long it takes to read/write each sector, it can detect "weak" sectors before they even become bad ones. HDDs retry an access multiple times if an error occurs before giving up, so to a program that simply reads and writes data, weak sectors that read successfully after a few retries aren't noticeable. In MHDD they show up as taking longer than usual to read.
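A rough Python sketch of the timing idea. This is not how MHDD is implemented. It times each sequential read and flags reads that took far longer than the median. The 10× threshold is an arbitrary assumption. A real tool would also have to bypass the OS page cache and read-ahead (for example with O_DIRECT on Linux), which this sketch does not attempt.

```python
import statistics
import time
from typing import BinaryIO


def read_latencies(f: BinaryIO, block_size: int, count: int) -> list[float]:
    """Time each of up to `count` sequential reads of `block_size` bytes from `f`."""
    latencies = []
    f.seek(0)
    for _ in range(count):
        start = time.perf_counter()
        data = f.read(block_size)
        elapsed = time.perf_counter() - start
        if not data:
            break  # end of device
        latencies.append(elapsed)
    return latencies


def slow_reads(latencies: list[float], factor: float = 10.0) -> list[int]:
    """Indexes of reads that took more than `factor` times the median read time."""
    if not latencies:
        return []
    median = statistics.median(latencies)
    return [i for i, t in enumerate(latencies) if t > factor * median]
```

A block that keeps showing up in `slow_reads` across several passes is the kind of "weak" sector described above: it still reads back correctly, just only after retries.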

I've been thinking for a while about replacing mdadm's RAID5 with ZFS's RAIDZ in my two NASes:

If I understood correctly, in the case of single/few block failures, ZFS would transparently remap the bad block(s) on the faulty HDD to good blocks on the same HDD (so without having to immediately replace the partially faulty HDD and rebuild the whole array). Is this correct?

I admit that continuing to use the partially faulty HDD might be a bad idea, but at least having a 100% healthy array before replacing the partially faulty HDD gives me a good feeling...

It’s pretty standard procedure for a drive to stop using known bad blocks transparently though.

These are simply called “bad blocks” in S.M.A.R.T - I guess it’s a little similar to overprovisioning and wear in SSDs.

Aha! I wasn't aware of that at all! Thank you!

25 in the last 10 years. Must be nice. I buy that many each year. I often buy new drives that fail to detect their own errors, which ZFS then finds, because a faulty device by definition can't test itself.

SMART doesn't work anymore, if it ever did. Manufacturer firmware blobs have a huge incentive to report a successful test no matter what. Also, as someone else mentioned, there are plenty of used drives being resold as new on the market.

This industry has skated for way too long, in my opinion. It benefits from people's and companies' strong reluctance to return faulty drives, which they avoid in order to protect sensitive company data.

I have seriously considered starting an e-commerce shop that sells only hard drives, but that also does Backblaze-like tracking/reporting of individual drives, has a good return policy, and degausses returned drives.

Speaking as a consumer I have zero interest in the current system, I want a system that lets me take the TCO of a drive/drive class into account before I make a purchasing decision.

There are at least two comments in here about special processes people use to validate Amazon purchases. This is because you cannot trust that what you are buying is new and because you cannot trust that it is the advertised size. If this is true, why not just buy computer equipment somewhere else?

I'm curious about what prevents the manufacturers from running badblocks and eliminating some of the downstream pain. Obviously some users care and will spend the effort to check their drives, and some unlucky users will have a data loss and a bad experience.

Is not running badblocks before shipping a way to exploit users who won't RMA or won't realize their drive is busted? Or are drives getting damaged in transit? Or something else?

I recommend using Flexible I/O tester (fio) to burn in new disks: https://fio.readthedocs.io/en/latest/fio_doc.html#running-fi...

It will give you both performance statistics and catch checksum verification errors.

For hard disk drives I leave the drive aside for 24 hours, before powering it up, if it's likely to have been in below-zero temperatures before it got to me.

I run a short SMART test and check the results, as well as the attributes, in case it's already faulty. Then I do something similar to the OP, and afterwards run short, conveyance, and long tests.

> For hard disk drives I leave the drive aside for 24 hours, before powering it up, if it's likely to have been in below-zero temperatures before it got to me.

Regarding temperatures: it used to be somewhat common practice to freeze drives that were clicking.

Funny to hear of someone waiting for them to thaw, when I have intentionally frozen a bunch of drives before.

The way I figure: they're fragile as it is, so if I can help make their life easier, I do :-)

On every new hard drive, I always run SpinRite when I hook it up to my machine(s). I know that its repair mode is sloooooooooow, but Steve Gibson is working on a newer version that would significantly improve speed. (Not affiliated; it just saved my butt a couple of times.)

Ritual I've had since at least 1994: naming a new hard drive volume under MS-DOS/Windows "PLEASE_WORK"

I'm curious about the efficacy here; how many "bad" drives will this find?

I've found several in the past couple years (I test them more rigorously before I put them into a NAS).

Would it work with SMR drives?

[Wrote a comment on the wrong thread... sorry]

Are you on the right thread?

haha, nope. Ty
