What sucks, however, is that you can transfer 1TB to a USB drive and then a single bit error near the end spoils the entire transfer and might even corrupt the drive. This happened to me several times during testing, on different machines and with different cables and drives (luckily no real data was lost, but USB is now more or less banned from my office for use with external hard drives). Any sane protocol would be able to cope with such a low error rate and finish the transfer without problems.
Use a filesystem that does end-to-end error detection and correction. I use ZFS on all of my USB drives (even "real" SATA and M.2 SSDs via adapters), and it frequently finds checksum errors (and, thanks to its design, can recover completely without any risk of further corruption).
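For what it's worth, the principle is simple enough to sketch in a few lines of Python: checksum every block at the top of the stack and keep a redundant copy, so a silent flip lower down is both detected and repaired. This is only a toy illustration (the ChecksummedStore class and its in-memory "devices" are invented for the example), not how ZFS is actually implemented:

    import hashlib

    class ChecksummedStore:
        def __init__(self):
            self.replicas = [{}, {}]   # two in-memory "devices" holding block copies
            self.checksums = {}        # block id -> expected SHA-256 digest

        def write(self, block_id, data: bytes):
            # The checksum is computed at the top of the stack, so corruption
            # introduced anywhere below (cable, controller, media) is detectable.
            self.checksums[block_id] = hashlib.sha256(data).digest()
            for dev in self.replicas:
                dev[block_id] = data

        def read(self, block_id) -> bytes:
            expected = self.checksums[block_id]
            for dev in self.replicas:
                data = dev[block_id]
                if hashlib.sha256(data).digest() == expected:
                    return data        # first copy that still matches wins
            raise IOError(f"block {block_id}: every copy fails its checksum")

    store = ChecksummedStore()
    store.write(0, b"important data")
    store.replicas[0][0] = b"important dbta"   # simulate a silent bit flip on one device
    assert store.read(0) == b"important data"  # the intact copy is returned instead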
Isn't any filesystem that runs on top of an SSD or flash drive at risk of the SSD/flash controller corrupting the disk or making it inaccessible when the volume is not cleanly unmounted?
I can remember losing a USB drive that way, even though it was using a journaling filesystem.
Potentially yes, realistically no. This is the old "if the storage controller is lying about flushing to disk, there's not a lot you can do about it" problem.
ZFS does have defensive mechanisms, like doing a read after write to check that what was written is actually committed to disk (a sketch of that check follows below), but if the storage controller chooses to serve that read out of its cache, then that could be a lie too. It's the old "trusting trust" predicament: there's no way for the layers higher in the stack to prove that the lower levels aren't simply a tower of lies, only instead of viruses it's flush.
That said, in practice very little hardware is actually a giant tower of lies. Flash drives typically don't have enough of a controller to cache anything, and thus no real capability to lie about writes. SSDs do, but they also generally honor flush commands rather than just acknowledging them.
RAID controllers and the like are the danger zone: they may have a cache and may lie to the CPU about it on the assumption that it will eventually be flushed anyway, and that's the dangerous part.
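To make the read-after-write idea concrete, here's a rough application-level sketch in Python (the function name and the posix_fadvise hint are my own choices for the example, not anything ZFS-specific), and it runs straight into the limitation described above: the device can still answer the read from its own cache.

    import os

    def write_and_verify(path: str, data: bytes) -> bool:
        # Write the data and ask every layer below to commit it to stable storage.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            os.write(fd, data)
            os.fsync(fd)
        finally:
            os.close(fd)

        # Read it back. posix_fadvise only drops the kernel's cached pages; the
        # drive's own controller can still serve the read from its internal cache,
        # which is exactly the "tower of lies" problem.
        fd = os.open(path, os.O_RDONLY)
        try:
            if hasattr(os, "posix_fadvise"):  # Linux/Unix only
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            readback = os.read(fd, len(data))
        finally:
            os.close(fd)

        return readback == data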
I think the person you are replying to understands this, but the point is that "the last few writes" may be executed in a different order by the underlying hardware, and that might confuse the filesystem.
That's great if you only use these drives with your big computers. Sadly, if you want a "universally" portable drive – usable on an Android phone, an iPad, and whatnot – you're stuck with horribly basic filesystems like exFAT :(
Are you sure it's a USB problem and not something else? Did you control for all the other variables - i.e., how scientific were you in determining that USB was the problem rather than something else?
I think I tested quite thoroughly, on very recent Linux versions, with rsync. Of course it is difficult to pinpoint the real culprit with certainty, but USB has no error correction, and that makes it the prime suspect here.
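For anyone who wants to reproduce that kind of test, a minimal sketch of the end-to-end check (independent of rsync's own --checksum mode) is to hash both trees and diff the digests; the two mount paths below are placeholders for whatever source and copy you compare:

    import hashlib
    from pathlib import Path

    def tree_digests(root: str) -> dict:
        # SHA-256 of every file under root, keyed by its path relative to root.
        digests = {}
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                h = hashlib.sha256()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
                digests[str(path.relative_to(root))] = h.hexdigest()
        return digests

    src = tree_digests("/mnt/source")    # placeholder paths
    dst = tree_digests("/mnt/usb-copy")
    bad = [p for p in src if dst.get(p) != src[p]]
    print(bad if bad else "all files match")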
I'm old enough to remember parity bits and FEC, so I know that the number of bits per data byte might be 8, but the number of bits required to transmit a single byte might be more than 8, and the number of bytes needed to transmit a frame of data might also be higher than the number of bytes of payload within the frame.
I know next to nothing about modern serial protocols, but nevertheless as a rule of thumb, and absent any other information, I tend to use 10 bits per byte when converting bits per second to bytes per second.
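As a worked example of that rule of thumb (the signalling rates below are the nominal USB figures, and real throughput is lower still because of protocol framing):

    # "10 bits per byte" rule of thumb for turning a line rate into a payload ceiling.
    def bytes_per_second(line_rate_bps: float, bits_per_byte: int = 10) -> float:
        return line_rate_bps / bits_per_byte

    print(bytes_per_second(480e6) / 1e6)   # USB 2.0, 480 Mbit/s -> ~48 MB/s ceiling
    print(bytes_per_second(5e9) / 1e6)     # USB 3.0, 5 Gbit/s   -> ~500 MB/s ceiling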
Recent standard revisions (USB 3.1 Gen 2, as well as USB 3.2 Gen 2x2) switch to a more efficient 128b/132b encoding (https://en.wikipedia.org/wiki/64b/66b_encoding), with ~3% encoding overhead rather than the historical 25%.
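The efficiency gap is easy to verify from the definitions alone (taking "overhead" to mean extra bits per payload bit, the convention that makes 8b/10b come out at 25%):

    def overhead(payload_bits: int, total_bits: int) -> float:
        return (total_bits - payload_bits) / payload_bits

    print(f"8b/10b:    {overhead(8, 10):.1%}")     # 25.0%
    print(f"128b/132b: {overhead(128, 132):.2%}")  # 3.12%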
A byte has been synonymous with an octet for as long as I can remember. I found a Wikipedia article [1] that describes how "byte" shifted from meaning the number of bits used to encode a character of text to meaning an octet.
A byte also refers to the smallest addressable unit. In most cases that is 8 bits, but many (I guess mostly niche) architectures have much larger characters/bytes, and that meaning is still very much relevant.
Why? Treating a byte as 10 bits instead of 8 would actually make the drive capacity lower in terms of bytes (not physically, of course, but for marketing), which a vendor probably doesn't want.
Yeah, that's the kibibytes vs kilobytes / gibibytes vs gigabytes difference...
Which adds up to a ~7.4% difference in capacity when talking about GiB/GB, and as much as ~12.6% when talking about PiB/PB.
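A quick check of those figures from the prefix definitions themselves:

    # Binary prefix divided by decimal prefix, minus one, gives the capacity gap.
    for name, binary, decimal in [("GiB vs GB", 2**30, 10**9),
                                  ("TiB vs TB", 2**40, 10**12),
                                  ("PiB vs PB", 2**50, 10**15)]:
        print(f"{name}: {binary / decimal - 1:.1%}")
    # GiB vs GB: 7.4%, TiB vs TB: 10.0%, PiB vs PB: 12.6%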
Wow, 90% overhead?