
15TB HDDs: Western Digital Unveils the Ultrastar DC HC620 - Breadmaker
https://www.anandtech.com/show/13523/western-digital-15tb-hdd-ultrastar-dc-hc620
======
chrisper
It's an SMR drive, so it's better suited to archiving than regular IO.

~~~
Hei1Fuya
Copy-on-write filesystems can probably be optimized for SMR by using TRIM
commands to punch holes and rewrite the content sequentially in a new zone.
AFAIK both ZFS and btrfs have plans to do this.

That way they can be useful for more than archival.
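
A minimal sketch of the idea in Python, assuming a made-up zone API (real
host-managed drives expose this through ZBC/ZAC zone commands):

    # Host-side SMR cleaning: copy the live blocks out of a fragmented
    # zone, rewrite them sequentially into an empty zone, then reset
    # (TRIM) the old zone. The Zone class here is hypothetical.
    class Zone:
        def __init__(self, size_blocks):
            self.blocks = []      # writes within a zone are append-only
            self.size = size_blocks

        def append(self, block):
            assert len(self.blocks) < self.size, "zone full"
            self.blocks.append(block)

        def reset(self):
            self.blocks = []      # whole-zone TRIM / reset write pointer

    def clean(old_zone, new_zone, is_live):
        """Compact live data out of old_zone so the zone can be reused."""
        for block in old_zone.blocks:
            if is_live(block):
                new_zone.append(block)  # sequential rewrite in the new zone
        old_zone.reset()                # punch the hole: free the zone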

~~~
zzzcpan
You probably mean log-structured, not copy-on-write. CoW doesn't help make
writes sequential, unlike log-structured filesystems.

~~~
aidenn0
Log-structured is a special case of CoW; specifically, it's CoW where the
allocation strategy is sequential blocks.
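
A toy allocator to illustrate the distinction (all names are made up):

    # Both CoW and log-structured writes go to a fresh block instead of
    # overwriting in place; log-structured additionally pins the
    # allocation strategy to "next block in the log".
    class Device:
        def __init__(self, nblocks):
            self.blocks = [None] * nblocks
            self.head = 0               # log head for sequential allocation

        def alloc_anywhere(self):       # generic CoW: any free block will do
            return next(i for i, b in enumerate(self.blocks) if b is None)

        def alloc_sequential(self):     # log-structured: strictly append
            i = self.head
            self.head += 1
            return i

        def cow_write(self, data, alloc):
            i = alloc()                 # never overwrite the old block
            self.blocks[i] = data
            return i                    # caller repoints metadata to i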

------
post_break
Will we hit a point where the size of the drive is simply too big to get the
data off in any decent amount of time?
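
Back-of-the-envelope, assuming a sustained sequential rate of ~250 MB/s
(typical for current large HDDs; random IO would be far slower):

    def drain_hours(capacity_tb, mb_per_s=250):
        return capacity_tb * 1e6 / mb_per_s / 3600

    print(drain_hours(15))   # ~16.7 hours to stream this 15 TB drive
    print(drain_hours(40))   # ~44 hours for a hypothetical 40 TB drive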

~~~
stephengillie
This is one reason why RAID 0+1 is a best practice and RAID 5 & 6 are no
longer recommended: it takes too long to rebuild the array, leading to a
multi-failed-disk situation.

~~~
chrisper
RAID 01 has its own risk: if the wrong two disks fail, your entire array is
toast.

~~~
throwaway2048
As opposed to RAID 5, where if any two disks fail your array is toast; RAID
6 increases this to three.

However, both RAID 5 and 6 have two huge problems:

Data in flight at write time (power/hardware failures are more likely to
corrupt the array, especially silently, which is the worst outcome).

Parity calculations require you to spin up the whole RAID 5/6 array during a
rebuild, massively increasing the chance of a multi-drive failure and a lost
array. If one close-to-EOL drive dies, putting its sister drives through
what is essentially an all-day, full-tilt stress test is a terrible,
terrible idea, and this idea keeps getting worse (takes longer) as drive
sizes grow.

RAID 0+1 mostly sidesteps these issues at a modest increase in drive count;
it's a no-brainer for most setups.
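
The classic back-of-the-envelope here, assuming the common 1-per-1e14-bit
unrecoverable-read-error spec on consumer drives (enterprise media is
usually rated 1e15):

    # Expected UREs while rebuilding a RAID 5: every surviving drive
    # must be read end to end.
    def expected_ures(drives, tb_each, ure_per_bit=1e-14):
        bits_read = (drives - 1) * tb_each * 1e12 * 8
        return bits_read * ure_per_bit

    print(expected_ures(6, 15))         # ~6.0: a rebuild very likely hits one
    print(expected_ures(6, 15, 1e-15))  # ~0.6 with enterprise-class media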

~~~
StillBored
_Data in flight at write time (power/hardware failures are more likely to
corrupt the array, especially silently, which is the worst outcome)._

How is that? RAID doesn't affect data persistence behavior in any meaningful
way. FUA/SyncCache/etc. are supported by RAID controllers the same as by the
underlying disks in writeback environments, parity updates included. Put
another way, if you FUA or flush the writeback cache, those operations won't
complete in a properly implemented RAID environment until the data is
persisted somewhere, even if that means passing FUA down to the underlying
storage. Granted, there are a number of ways to mess this up, e.g. RMW
cycles in a controller that doesn't have some kind of persistent memory and
flush-on-power-restore. Anyway, none of this is any worse than what happens
in any other WB-cached storage technology.
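
From the host side the contract is simple; a sketch of what "persisted
before completion" means (the path and payload are arbitrary):

    import os

    # A write followed by fsync() must not return until the data (and,
    # behind a RAID controller, the matching parity) is somewhere
    # power-safe: media, or battery/flash-backed cache.
    def durable_write(path, payload):
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            os.write(fd, payload)
            os.fsync(fd)   # maps to a cache flush / FUA down the stack
        finally:
            os.close(fd)

    durable_write("/tmp/example.bin", b"some data")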

Finally, all this fearmongering about loss on rebuild should also be more
fully explored in the context of the fact that decent RAID systems run
background scrub operations on a regular basis. Those operations by
themselves are going to "stress test" the array on a regular basis while
it's consistent and not degraded. I've actually got a fair amount of
experience in this area, and I'm here to tell you that if you think this is
a risk, consider what happens to non-RAIDed, unscrubbed drives that have a
lot of data silently bitrotting on the platters. That latter effect is
nearly always the problem in RAID environments when someone starts a rebuild
on drives/sectors that have been unread for extended periods of time. But in
the case of RAID, a properly implemented system won't fail a drive for a
single read failure during a rebuild; instead it reconstructs from the other
drives, leaves the drive online long enough to complete the rebuild, and
then takes it offline.

Basically, RAID 1 setups don't actually fix any of these problems, except
through massive additional redundant-disk overhead. That overhead can also
be applied to other RAID algorithms to much better effect, e.g. a mirrored
RAID 6 provides far more protection than a mirrored RAID 0. Similar levels
can be had with 6+6 in environments where that is possible, with trivial
capacity overhead.

~~~
throwaway2048
RAID 5/6 requires parity calculations before data can be written to disk.
This is a significant amount of data, especially at high write speeds. That
is what causes the in-flight data problem.
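
For reference, the parity math itself is just XOR across the stripe; a
minimal sketch:

    # RAID 5 parity is the XOR of the data chunks in a stripe; a
    # full-stripe write must compute this before anything is durable.
    def raid5_parity(chunks):
        parity = bytes(len(chunks[0]))
        for c in chunks:
            parity = bytes(a ^ b for a, b in zip(parity, c))
        return parity

    stripe = [b"\x01\x02", b"\x0f\x00", b"\xf0\xff"]
    p = raid5_parity(stripe)
    # Any one lost chunk is recoverable by XOR-ing parity with the rest:
    assert raid5_parity([p, stripe[1], stripe[2]]) == stripe[0]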

Battery and flash backup on controllers doesn't fix the problem of hardware
failure (which is significant, especially on big, hot controllers).

~~~
StillBored
Again, decent controllers have ECC protection and the like, and are
frequently available in HA configurations if your worry is controller
failure (along with redundant/dual data paths to the media via
SAS/NVMe/etc.). Plus, there is a long list of technologies that can be
enabled at the HBA layer and pushed all the way to the media (T10 DIF/DIX
comes to mind).

But much of this micro-level redundancy is overkill, as frequently one uses
some kind of application-level HA/redundancy as well. So, the loss of a
RAID 5/6 disk in a single machine is the functional equivalent of the loss
of any RAID 0/1 combination in the same machine. You still need the
higher-level redundancy as well as a backup plan.

We could start breaking the discussion up into fabric-attached vs
direct-attached vs software RAID, but I think it's sufficient to say that
RAID 5/6 doesn't _increase_ the failure surface in any meaningful way when
you're not using fly-by-night RAID.

Edit: Maybe what you're trying to say is that cache flush/FUA operations for
a given piece of data don't cover the parity calculation and buffers? That
is false: a controller should not be responding to FUA/etc. until the entire
block (including the parity) has been persisted. So if the controller dies
during the operation, the host OS is fully aware that the operation didn't
complete. The given block is of course left in some unknown state in this
case, but that is true of any write operation that fails like this,
regardless of WT/WB/RAID/etc.

------
quasarj
So does anyone know of a way to take a set of files and write them to an
HDD, in NTFS or exFAT format, in a single sequential write? Essentially
building the FS on the fly (because we're talking about datasets that are
much too large to fit into memory)?

~~~
Dylan16807
So basically a zip file?

I will note that you could easily build the MFT before you start transferring
data. That's really your active 'dataset' here, and it's not very big.
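
A crude two-pass sketch of that idea in Python (paths are placeholders, and
whether NTFS actually keeps the data extents contiguous is up to the
filesystem, so treat this as an approximation rather than a guarantee):

    import os, shutil

    def two_pass_copy(src_root, dst_root, bufsize=8 * 1024 * 1024):
        # Pass 0: enumerate the files and create the directory tree.
        pairs = []
        for dirpath, _dirs, files in os.walk(src_root):
            dst_dir = os.path.join(dst_root,
                                   os.path.relpath(dirpath, src_root))
            os.makedirs(dst_dir, exist_ok=True)
            for name in files:
                pairs.append((os.path.join(dirpath, name),
                              os.path.join(dst_dir, name)))
        # Pass 1: touch every destination file so the MFT grows in one burst.
        for _src, dst in pairs:
            open(dst, "wb").close()
        # Pass 2: fill the files sequentially with large buffered copies.
        for src, dst in pairs:
            with open(src, "rb") as fin, open(dst, "wb") as fout:
                shutil.copyfileobj(fin, fout, bufsize)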

~~~
quasarj
Building the MFT first is pretty much what I want to do, but I'm not aware
of any utilities that can handle it, nor where to start with writing it
myself...

I have a project where I routinely need to copy large amounts of data (3 to
8 TB) to a hard drive. The problem is, my files are all 512 KB, so this is
much slower than it could be...

If I write it as a single tar file I get excellent throughput, but the users
who need to work with the drive are unable to handle a tar file. They need
to be able to plug the drive into a Windows computer and have it "just
work", which presents some problems.

~~~
AgentME
Maybe you could use the same type of filesystem that data CDs/DVDs use
(UDF?). They're written sequentially and are commonly supported.

------
bitL
I was about to buy 3x 12TB Toshiba drives for Deep Learning datasets; now I
need to reconsider... Does anyone know what the current reliability stats
are for >10TB drives? My old 6x 4TB HGST drives in a NAS have been running
without a single problem for the past 3 years...

~~~
worldexplorer
How do people use such large HDDs when internet download speeds are still so
low compared to downloading directly on cloud services (like AWS)?

~~~
peterburkimsher
Internet speeds are slow in some places, such as France (specifically the Pays
de Gex near Geneva, where my parents live). My dad uses iCloud, but he drives
to CERN to upload (he just retired).

I have 18 TB: 5 TB Seagate (x2), 4 TB Western Digital, 2 TB Western
Digital, and 2 TB internal.

Backups take the most space - I fix laptops for friends from church, and they
don't back up but still want their files to be safe. I had to shuffle some
files around to free up 650 GB for a recent repair, mostly photos & videos.

Virtual machines use a lot of space too. I made VMware Fusion images of
every Mac OS version 10.5-10.13; Windows 95, 98, 2000, XP, 7, and 10, in
several languages (
[https://peterburk.github.com/i2018n](https://peterburk.github.com/i2018n) );
and some Linux distros.

Another 1 TB is a dataset of Chinese characters from a machine learning
project of mine ( [https://blog.usejournal.com/making-of-a-chinese-
characters-d...](https://blog.usejournal.com/making-of-a-chinese-characters-
dataset-92d4065cc7cc) ).

Music, mostly from repaired iPods back in high school, accounts for a lot as
well. There are some movies too, though I missed a chance to get 2 TB from a
friend because I didn't have enough space at the time. If I upload those,
even the ones that I legally ripped from CDs & DVDs, I'm worried that it'll
trigger content filters.

For these, local disks are more useful than cloud services in my opinion.

~~~
fencepost
I highly recommend looking at drives with WizTree, which gives a very fast
display of what's taking up space based on parsing the MFT rather than
scanning the entire drive.

You may find that there's a massive amount of data where you wouldn't expect
it, such as in the Windows Temp directory - if so and it's a bunch of files
named "cab_something", you can kill all of those and prevent recurrence with a
little housekeeping.

Details: [https://www.computerworld.com/article/3112358/microsoft-
wind...](https://www.computerworld.com/article/3112358/microsoft-
windows/windows-7-log-file-compression-bug-can-fill-up-your-hard-drive.html)
(update log files in windows\logs\cbs get auto-compressed, but compression
breaks and leaves big temp files if the file to compress is >2GB)

------
femto
The 15TB drive packs in 1108 Gbit/inch². That works out to roughly 580 nm²
per bit, a square about 24nm on a side. This is small, but the transistors
in flash are smaller [1]. As mind-blowing as the numbers (for both
technologies) in the referenced article are, that article is now 2.5 years
old. Is anyone aware of more recent numbers?

[1] [https://www.computerworld.com/article/3030642/data-
storage/f...](https://www.computerworld.com/article/3030642/data-
storage/flash-memorys-density-surpasses-hard-drives-for-first-time.html)
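
The unit conversion, for anyone who wants to check the arithmetic:

    # Bit-cell area implied by an areal density in Gbit per square inch.
    NM_PER_INCH = 25.4e6

    def bit_cell(gbit_per_in2):
        area_nm2 = NM_PER_INCH ** 2 / (gbit_per_in2 * 1e9)
        return area_nm2, area_nm2 ** 0.5

    area, side = bit_cell(1108)
    print(round(area), round(side, 1))   # ~582 nm^2, a ~24.1 nm square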

~~~
Lramseyer
From my experience in the HDD industry, I remember that a magnetic bit of
data is about 13-15nm long and about 40-60nm wide (narrower tracks for SMR).
The length of a bit is constrained by the grain size of the magnetic media.
However, the width of a bit (prior to SMR) is actually constrained by the
size of the write head. I don't remember why, but I think it has to do with
the fact that the write current is around 40 mA, and the magnetic flux
density on the write element is around 1.5T (no, that's not a typo).

I'm not an expert on transistor pitches, but here's a chart from Wikipedia
for the 10nm node:
[https://en.wikipedia.org/wiki/10_nanometer](https://en.wikipedia.org/wiki/10_nanometer)
It's kind of impressive for HDDs, considering that a 2-inch-long mechanical
arm is able to move with that level of precision.

------
Latteland
I am going for the 15TB instead of the 14TB so I have extra space for
backups. Says nobody. We are clearly close to the end of spinning rust,
absent some new breakthrough.

~~~
tallanvor
HAMR and then HDMR are expected to allow data densities to increase by 5 to 10
times what is currently achievable. HAMR will probably start showing up in a
year or two.

Spinning drives are definitely not going away anytime soon unless there is a
much more significant drop in the cost of SSDs.

~~~
londons_explore
_Investment_ in new spinning-drive technologies is going away, though.
Nobody wants to spend R&D money on coming up with patents and ideas which
will be worthless in 5 years when SSDs overtake.

Science investment requires a new technology to have a prospect of a return
for most of the ~20-year patent lifespan to look like a good investment, and
spinning bits of metal aren't that right now.

~~~
blihp
In the same way that hard drives didn't kill off tape, SSD won't kill off hard
drives. The price differential is too great for many applications and they
have different operational strengths and weaknesses.

~~~
johngalt
Tapes have a use case that hard drives do not. Tapes are the lowest cost/GB
stored and are more shelf stable than hard drives.

SSDs are higher performance than HDDs and have none of the packaging
constraints. Flash storage is going to be put into everything and the
economies of scale look quite good.

Storage capacity is scaling, but the r/w speeds of HDDs aren't keeping up.
Follow the trend line and we see huge HDDs that are functionally useless due
to how long it takes to do disk operations.

HDDs only exist above tapes because of their performance, and only exist
below SSDs due to cost. Tapes are the floor and SSDs are the quickly
lowering ceiling. HDDs are likely to be crushed between the two.

~~~
StillBored
You might be right, but keep in mind that flash has been scaling due to
shrinking semiconductor feature sizes (and additional layers/etc.), so a
large part of flash's core R&D and production costs are being spread over
all the logic being produced. That has been hitting a wall, so while the
capacity/price curves for flash look nice, they likely won't continue, which
leaves open the possibility that if rust actually gets a 4-5x boost in the
near future, the current market trends will continue: SSDs for
perf/power/size and mechanical hard drives for bulk nearline storage,
leaving tape where it's been for the past 30 years, as an archival
technology.

~~~
wtallis
Horizontal feature sizes for flash memory stopped shrinking years ago. The
continued improvements in density and production cost have been the result of
R&D that is very specifically focused on 3D NAND flash memory and has little
in common with R&D for logic circuit fabrication.

That said, on the horizon of multiple years, I agree that the future
scalability of NAND flash doesn't look quite as promising as HAMR/MAMR for
hard drives. How that translates into actual product demand and adoption will
probably depend on the relatively unexplored question of how much performance
per TB our applications actually need. 40+ TB hard drives might not be fast
enough to actually serve as nearline storage for that volume of data without
e.g. multi-actuator technology that essentially gives you more than one hard
drive sharing a common spindle motor. Meanwhile, there's no question that QLC
NAND flash definitely has adequate read latency and throughput.

~~~
londons_explore
Multi-actuator tech sounds interesting, and I wouldn't be surprised to see
drives with 5, 10, 50 or 100 read heads per platter at some point.

With 100 read heads per platter, typical seek time could be cut by up to a
factor of 100. That won't let them overtake SSDs, but it would at least let
them close the gap.
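
Rough numbers, assuming ~8 ms average access time per head and treating the
heads as fully independent:

    # Random-read IOPS for a drive with N independent actuators.
    def hdd_iops(actuators, access_ms=8.0):
        return actuators * 1000 / access_ms

    print(hdd_iops(1))    # ~125 IOPS: today's single-actuator drive
    print(hdd_iops(100))  # ~12,500 IOPS: better, but still short of the
                          # tens of thousands a cheap SSD delivers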

~~~
wtallis
So far, nobody has announced plans to manufacture hard drives with two read
heads per platter, so speculating about 100 heads per platter seems rather
unrealistic. The multi-actuator technology that is actually being developed by
Seagate still has only one read head per platter, but out of the eight or so
platters in a drive, the read heads for four of them will be controlled by one
actuator and the read heads for the other four platters will be controlled by
the second actuator.

Going all the way to 100 read heads per platter would be insanely expensive
and would massively increase drive failure rates, while still leaving them
about four times slower for random reads than the slowest $35 SSD on the
market. This will never turn into a viable product.

