
Where's my petabyte disk drive? - bit-player
http://bit-player.org/2016/wheres-my-petabyte-disk-drive
======
rgbrenner
The growth in hard drive space is called Kryder's Law [0] (like Moore's law).
There's a paper from 2012 on the cost of long-term storage [1].. and here's a
quote from the researchers blog[2]:

 _Here is a graph I got from Dave Anderson_ [director of strategic planning at
Seagate] _years ago. It shows that what looks like a smooth Kryder 's Law
curve is actually the superposition of a series of S-curves, one for each
successive technology generation. Naturally, because the easy transitions get
done first, the cost of each successive transition increases, perhaps even
exponentially. Since margins are constrained and so, these days, are volumes,
to generate a return on the investment in each transition requires that the
technology be kept in the market longer. The longer interval between
transitions translates to a lower Kryder rate._

[http://4.bp.blogspot.com/-bkuDDrBpcZE/TpMsLTEspsI/AAAAAAAAA9...](http://4.bp.blogspot.com/-bkuDDrBpcZE/TpMsLTEspsI/AAAAAAAAA94/a-sL-
tSO-OQ/s1600/DaveAnderson.png)

0\.
[https://en.wikipedia.org/wiki/Mark_Kryder](https://en.wikipedia.org/wiki/Mark_Kryder)

1\. [http://www.lockss.org/locksswp/wp-
content/uploads/2012/09/un...](http://www.lockss.org/locksswp/wp-
content/uploads/2012/09/unesco2012.pdf)

2\. [http://blog.dshr.org/2012/10/storage-will-be-lot-less-
free-t...](http://blog.dshr.org/2012/10/storage-will-be-lot-less-free-than-
it.html)

~~~
jsprogrammer
So, the rate of increase (in density) is slowed by lack of demand for storage?

~~~
Razengan
> lack of demand for storage

and in turn, demand for storage is slowed by lack of _need_ for storage.

The real question should be: Where's my 4K video? My 10-bit-channel images? My
lossless audio?

Yes, they are here, but nowhere near as common as content delivered in ancient
formats from over a decade ago. For example, go to a wallpapers subreddit: the
vast majority is still 8-bit 1920x1080 JPEG. Video is still mostly 1080p. The
leading music services still deliver 2-channel lossy formats. And this is 2016.

Most of this I suppose is because of the general slowness of the internet and
the usage caps in many areas of the world.

So maybe: improve internet service -> create more detailed content and let
people save it -> people will demand more storage to keep all of it.

~~~
douche
For a lot of things, that extra resolution is not really any benefit.
Particularly for content that wasn't ever recorded at that level of fidelity
anyway - I don't need 4k versions of Seinfeld re-runs, or SpongeBob
SquarePants for the kiddos. Most of that stuff doesn't even need to be 720p.

Besides, pirated video content was really the only thing that most normal
people could fill up their hard drives with (okay, maybe GoPro people or
people with huge Steam libraries, too), but Netflix and YouTube and Amazon
Prime have taken a huge chunk out of that.

~~~
roywiggins
Also people who shoot raw photos. They're like 35MB each, an order of
magnitude bigger than jpegs. They can eat up gigabytes (though to be fair, not
terabytes) pretty fast. I have about 525GB of photos and videos that I've
taken over years. Something like 300GB is jpegs. If I'd exclusively shot raw
files I would need 3-4TB to store everything.
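A rough sketch of that estimate; the 300 GB of JPEGs and the ~10x raw
multiplier come from the figures above, and the remaining ~225 GB of video is
inferred from the 525 GB total:

```python
jpeg_gb = 300            # JPEGs in the collection
other_gb = 525 - 300     # video and everything else, inferred from the total
raw_multiplier = 10      # ~35 MB raw vs ~3.5 MB JPEG: an order of magnitude
raw_gb = jpeg_gb * raw_multiplier + other_gb
print(f"~{raw_gb / 1000:.1f} TB")  # ~3.2 TB
```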

~~~
masterj
And 4TB these days only costs ~$120 [http://www.amazon.com/Seagate-
SATA-3-5-Inch-Desktop-ST4000DM...](http://www.amazon.com/Seagate-
SATA-3-5-Inch-Desktop-ST4000DM000/dp/B00B99JU4S) though it's still a chore to
manage and back up so many photos

~~~
Razengan
> it's still a chore to manage and back up so many photos

I can't wait for a photo manager with some kind of AI that goes through the
pictures I've taken myself and offers them based on "feel good vibes",
"eerie", "cozy" and so on, and learning what I like as it goes.

Apple please?

~~~
roywiggins
Google Photos has some AI tied into it: it tries to autotag photos into
categories like "selfies" and "concerts" and "Christmas" (to name a few at
random), and it isn't completely perfect, but it's not horrible.

Not so useful for me, since I don't care to upload my entire photo collection
to Google. Something like this that either works offline or can progressively
tag photos and insert those tags in a format Lightroom could read would be
awfully useful.

------
PaulHoule
Note that in the early 1990s you could read hard drive content pretty easily
with a scanning probe microscope. In fact the bits were lozenge-shaped, and if
you looked close you might find the edges of bits that had been written before
and weren't perfectly aligned. (I.e., scientists had tools in their labs that
were better at reading a hard drive than the read head was.)

Back then it was possible that you could smash the platters and somebody could
still reassemble some of the data.

Then by 2005 or so the density of the data was high enough that the scanning
probe microscope wasn't much better than the read heads, and at that point
extreme methods of data extraction got much much harder.

~~~
dalke
"in the early 1990s you could read hard drive content pretty easily with a
scanning probe microscope"

I believe that is an urban legend.
[http://all.net/ForensicsPapers/2012-12-07-OverwrittenMagneti...](http://all.net/ForensicsPapers/2012-12-07-OverwrittenMagneticRecovery.pdf)
describes attempts to track down such cases:

> To date I have found no example of any instance in which digital data
> recorded on a hard disk drive and subsequently overwritten was recovered
> from such a drive since 1985, when about 15% of the overwritten data was
> claimed to have been recovered from an modified frequency modulation (MFM)
> disk drive.

It cites "Overwriting Hard Drive Data: The Great Wiping Controversy" at
[http://www.vidarholen.net/~vidar/overwriting_hard_drive_data...](http://www.vidarholen.net/~vidar/overwriting_hard_drive_data.pdf)
which gives a best case example of a pristine hard drive, written once and
then wiped once, and where you know the data is located before hand. Even then
nearly all of the data had disappeared. If the drive was not pristine, it was
not possible to recover the data. Quoting from it (emphasis mine):

> The purpose of this paper was a categorical settlement to the controversy
> surrounding the misconceptions involving the belief that data can be
> recovered following a wipe procedure. _This study has demonstrated that
> correctly wiped data cannot reasonably be retrieved even if it is of a small
> size or found only over small parts of the hard drive. Not even with the use
> of a MFM or other known methods._ The belief that a tool can be developed to
> retrieve gigabytes or terabytes of information from a wiped drive is in
> error.

> Although there is a good chance of recovery for any individual bit from a
> drive, the chances of recovery of any amount of data from a drive using an
> electron microscope are negligible. Even speculating on the possible
> recovery of an old drive, there is no likelihood that any data would be
> recoverable from the drive. The forensic recovery of data using electron
> microscopy is infeasible. _This was true both on old drives_ and has become
> more difficult over time. Further, there is a need for the data to have been
> written and then wiped on a raw unused drive for there to be any hope of any
> level of recovery even at the bit level, which does not reflect real
> situations. It is unlikely that a recovered drive will have not been used
> for a period of time and the interaction of defragmentation, file copies and
> general use that overwrites data areas negates any chance of data recovery.
> _The fallacy that data can be forensically recovered using an electron
> microscope or related means needs to be put to rest._

~~~
tamana
Parent said reading a drive's current data is possible. You said that reading
erased data is impossible.

~~~
jacquesm
You're wrong.

> if you looked close you might find the edges of bits that had been written
> before and weren't perfectly aligned.

GGP spoke about both recovering data that had been overwritten and
reassembling data from destroyed platters without the original drive
mechanism.

~~~
dalke
Thanks. Yes, I should have included the "that had been written before" in my
quote.

I didn't look too deeply into the question of how to recover the contents of a
hard disk with microscopy because I figured it would be possible, but
expensive.
Looking now, I quickly found a MS thesis at
[http://escholarship.org/uc/item/26g4p84b](http://escholarship.org/uc/item/26g4p84b)
which recovered data from a disk using MFM. While the performance was poor,
the author attributes that to the experimental setup.

Ahh, and [http://www.dataclinic.it/magnetic-force-
microscopy.htm](http://www.dataclinic.it/magnetic-force-microscopy.htm)
appears to provide a commercial service to extract data from a hard disk using
magnetic force microscopy.

------
evmar
Massive hard drives are only useful for archival purposes, under this
argument: if your hard drive is being used for live queries, you want to
access all the data on it. Even if you have all the pieces in place to stream
1 GB per second off of the drive, a 1 PB drive would still take 1 million
seconds = ~11.5 days to read.
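A quick sketch of that arithmetic, assuming decimal units (1 PB = 10^15
bytes, 1 GB/s = 10^9 bytes/s):

```python
drive_bytes = 10**15      # a hypothetical 1 PB drive
throughput = 10**9        # 1 GB/s sustained sequential read
seconds = drive_bytes / throughput
days = seconds / 86_400   # 86,400 seconds per day
print(f"{seconds:.0f} s = {days:.1f} days")  # 1000000 s = 11.6 days
```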

So in practice in production it's more useful to have smaller hard drives in
more places to work on the data in parallel. And in the truly archival cases
there are other concerns (like redundancy) that mean there isn't as much
demand for a single massive drive.

~~~
digi_owl
I seem to recall reading some worries that current sizes are already pushing
the limits of RAID setups, in that it takes so much time to rebuild a drive,
given current interfaces, that you risk one of the other drives in the array
failing during the process, thus making recovery impossible.

~~~
Viper007Bond
My consumer storage RAIDs already take 1-2 days to rebuild. I can't imagine
larger enterprise ones!

~~~
BrainInAJar
your consumer storage RAID is a lot dumber than enterprise ones, which can
resilver more intelligently

------
drostie
We appear to be slashdotting the server; here's some cached copies:

Just text (loads instantly):

[http://webcache.googleusercontent.com/search?q=cache:Jr34hZX...](http://webcache.googleusercontent.com/search?q=cache:Jr34hZXtEYgJ:bit-
player.org/2016/wheres-my-petabyte-disk-
drive&num=1&hl=en&gl=us&strip=1&vwsrc=0)

Images etc. loaded from the site (seems to work, albeit slowly):

[http://webcache.googleusercontent.com/search?q=cache:Jr34hZX...](http://webcache.googleusercontent.com/search?q=cache:Jr34hZXtEYgJ:bit-
player.org/2016/wheres-my-petabyte-disk-
drive&num=1&hl=en&gl=us&strip=0&vwsrc=0)

~~~
lucb1e
> _< meta name="generator" content="WordPress 3.4.2" />_

I found the problem.

~~~
icelancer
How people use Wordpress without a caching plugin today... sigh.

~~~
ChrisDutrow
What caching plugin do you recommend?

~~~
icelancer
W3 Total Cache generally works fine. I use it with memcached and PHP's new
built-in caching module (replacement for APC) and get my site blitzed from
time to time due to major mentions in the media with few problems. SSD on
Digital Ocean probably helps with disk caching too.

I use Apache, but falcolas' point about nginx (often in combination with
Varnish and potentially HHVM; I've used this combo before with great success)
is worth considering as well.

No replacement for good configuration of your database (try MySQLtuner.pl if
you use MySQL/MariaDB) of course.

------
archiebunker
Excellent post. Great information. I have a question about SSDs, though.
Google has published their information about hard drive and SSD survival in
their data centers. It can be viewed here:
[http://www.datacenterdynamics.com/servers-storage/googles-
ss...](http://www.datacenterdynamics.com/servers-storage/googles-ssd-
experience-contradicts-flash-lab-results/95789.fullarticle) So my question is
how we mere mortals can deal with all the maintenance. It may well be that
spinning hard drives are best for us.

~~~
brianwawok
Assume HDs will die. Keep 3 copies of all data, at least 1 far away, and diff
regularly. Whether an HD lasts 2 years or 10 on average doesn't really change
the best practice for keeping your data.

~~~
jfoutz
Just to elaborate a bit, it's pretty easy to keep three copies of stuff you
care about. First, you have the working copy, on the machine itself. A big
backup disk is pretty straightforward. I use time machine, which isn't super
reliable, but it's very very easy.

Now, when it comes down to it, do you really need to back up the OS? Or your
installed software? If the machine and the backups fail, you're going to be
reinstalling anyway (probably). So, for the third copy I rely on 3rd parties.
Different people have different needs; you might want to do something fancy in
house.

It pretty much boils down to finding a service for your stuff. I have a couple
of private GitHub repos. Photos go to iCloud and whatever Alphabet is calling
Picasa these days. 20 gigs of music go to Amazon or Alphabet (or both).
Administrative stuff, like taxes, I just email to myself. It's probably
smarter to keep that in Dropbox or something along those lines.

The key point is: there are the things you make or capture that are
irreplaceable; save those in lots of places. There's a bunch of other crap on
your computer that makes it useful, and that stuff is trivial to reinstall.
Well, OK, it might cost you a day or two to redownload and reconfigure emacs
just so, but with a little planning you can put that config in git, so it's
easy to restore or set up on a new machine.

It's almost better to think in terms of: if I had to upgrade tomorrow, what
would I need to copy over? That's the stuff to be really fussy about.

~~~
simcop2387
> Now, when it comes down to it, do you really need to back up the OS? Or your
> installed software? If the machine and the backups fail, you're going to be
> reinstalling anyway (probably)

I'd counter this with what I do with my laptop. The OS is considerably smaller
than the data I actually care about (<20 GB); it takes almost no time to back
up, and so it leaves me with a very quick way to restore the system to a known
good state in the event of some kind of failure. I don't do constant backups
of it, but maybe once a month I'll update the backup I have of the OS.

~~~
jfoutz
Yeah, I was trying to point out that you don't really need 3 copies of
everything, and really 1 is enough for some stuff that's easy to replace. But
the stuff that matters, you should have lots of copies of. Two backup disks
are another way to go; just swap them, say, weekly. Photos from a
once-in-a-lifetime trip? Make a bunch of copies, local and remote.

------
edward
You can buy an 8TB HDD for $219.99; that's $27.50 per TB.

I made a page that shows hard drives and SSDs sorted by price per TB:
[https://edwardbetts.com/price_per_tb/](https://edwardbetts.com/price_per_tb/)
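For reference, the price-per-TB arithmetic behind such a page is just:

```python
price_usd = 219.99       # the 8TB drive mentioned above
capacity_tb = 8
print(f"${price_usd / capacity_tb:.2f} per TB")  # $27.50 per TB
```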

~~~
chris11
PCPartPicker is also usually a really good source for comparing part prices.
[https://pcpartpicker.com/parts/internal-hard-
drive/#sort=a7&...](https://pcpartpicker.com/parts/internal-hard-
drive/#sort=a7&page=1)

------
spullara
He is wrong about 6TB being the biggest hard drive on the market. Now that
SSDs have become the standard, they are taking over the curve. Samsung
announced a 16TB SSD last year. I don't expect people to devote as much effort
to making spinning disks bigger.

[http://arstechnica.com/gadgets/2015/08/samsung-
unveils-2-5-i...](http://arstechnica.com/gadgets/2015/08/samsung-
unveils-2-5-inch-16tb-ssd-the-worlds-largest-hard-drive/)

~~~
pilsetnieks
He mentioned that:

> As the pace of magnetic disk development slackens, an alternative storage
> medium is coming on strong. Flash memory, a semiconductor technology, has
> recently surpassed magnetic disk in areal density; Micron Technologies
> reports a laboratory demonstration of 2.7 terabits per square inch. And
> Samsung has announced a flash-based solid-state drive (SSD) with 15
> terabytes of capacity, larger than any mechanical disk drive now on the
> market. SSDs are still much more expensive than mechanical disks—by a factor
> of 5 or 10—but they offer higher speed and lower power consumption. They
> also offer the virtue of total silence, which I find truly golden.

------
abecedarius
> _Is this notion of merging memory and storage an attractive prospect or a
> nightmare? I’m not sure. There are some huge potential problems. For safety
> and sanity we generally want to limit which programs can alter which
> documents. Those rules are enforced by the file system, and they would have
> to be re-engineered to work in the memory-mapped environment._

This was done back in the 80s in
[http://www.cis.upenn.edu/~KeyKOS/](http://www.cis.upenn.edu/~KeyKOS/) . A
favorite demo reportedly was to pull the plug on a running computer then start
up again. They took the need to redesign security as an opportunity to make it
better.

~~~
capitalsigma
I'm not sure his idea about "merging memory and storage" really makes sense.
He says that he wants load instructions to be able to hit the disk in order to
avoid "calls to input/output routines in the operating system." But you can't
avoid the input/output routines --- he's effectively saying that we should
hardcode our filesystems into a single machine instruction and let the
processor figure it out. If anything, we're moving _farther_ from this model,
since VMs give us virtual address spaces inside virtual address spaces.

I don't think this is just a security issue; it really breaks all of the
assumptions that we like to make in modern programming languages.

~~~
abecedarius
_saying that we should hardcode our filesystems into a single machine
instruction and let the processor figure it out_

I think he was rather saying that the _OS_ could do it: persistent virtual
memory as the primary abstraction. In Unix, files and processes are different
kinds of things; in KeyKOS there were only processes; RAM was effectively a
cache. As Unix directories have links to files, KeyKOS processes could be
given capabilities to invoke other processes (passing capabilities and data as
arguments). The different security model makes this analogy misleading, but
you can see how you could emulate a filesystem.

What assumptions do you mean?

------
jimothyhalpert7
At least you don't have to worry about erasing your childhood to store 80GB of
data, when you apply for that courier position in 4 years. [1][2]

[1] Johnny Mnemonic - Official Trailer -
[https://www.youtube.com/watch?v=Uwl5MBzTCRQ](https://www.youtube.com/watch?v=Uwl5MBzTCRQ)

[2] Also, I guess Toshiba is in a lawsuit with Apple over their new VR headset
- the EyePhone -
[https://www.youtube.com/watch?v=vXSqN7qXwpU](https://www.youtube.com/watch?v=vXSqN7qXwpU)

------
animex
Could the reality be that storage needs for the average user have plateaued
for a combination of reasons (network/internet speed limitations, the rise of
cloud computing, the rise of streaming movie subscriptions like Netflix), thus
muting the financial incentive to develop new technology that would drive
storage capacity forward exponentially? I think once we have a reason to grow
our storage again, we could see storage technology pick up again.

------
tempestn
One way you could fill up a hard drive orders of magnitude larger than the
ones we have now is with fully immersive recorded experiences, of which the
recent 'holoportation' demo from Microsoft gave an early example. Look at how
much space regular ultra-HD video consumes, then imagine further increasing
the resolution and recording not just one viewpoint but the entire 3D scene.

This is similar to how a cellphone camera of today can shoot video in a single
minute that would completely fill hard drives from the 90s. (And even a single
high-res image from today's digital cameras is larger than the install size of
Windows 3.1.) Back then it would have been difficult to imagine these uses for
storage.

------
Kenji
The reason spinning disks still stick around is not just the price. Often you
do not need 200-500 MB/s, and ~100 MB/s is enough. I have my OS on a (small)
SSD and my spinning disks filled with my documents, music, and movies. None of
this needs high speed, so it would be wasted money to buy an SSD for it. I'd
rather buy twice the storage and do RAID1 (which I did).

~~~
13of40
It's a mystery to me that hybrid SSD+HD drives aren't more ubiquitous. I can
guess what data is going to be read off the drive more frequently, but the
computer can collect statistics and make a way more accurate prediction than I
can.

~~~
mrob
I don't think statistics will make better predictions than I can. For example,
consider a moderately large file (eg. 100MB), which is only ever read
occasionally, by sequential low speed streaming. It seems reasonable to place
it on the HD, but it's music, and I want it on the SSD so I can have the HD
powered down for lower background noise when I'm listening to it. And I could
have another almost identical file which is a podcast instead, and that should
go on the HD because I don't care about minimum background noise when
listening to podcasts, so even looking at the file type won't help. The
correct place for a file depends on what you intend to do with it, not any
measurable property.

~~~
13of40
I'd assert that it could make a better prediction for the majority of the
files stored on your drive. For example, even if it was technically feasible
to do with separate drives, do you know which files in the global assembly
cache should go on SSD versus HD? Which pieces of your registry hive files?
Which bits of your browser cache?

~~~
mrob
I don't use Windows so I don't have GAC or registry files. Browser cache
always goes on the SSD because it's highly latency sensitive.

~~~
Kenji
You say browser cache always goes on the SSD (I do that too, whenever there's
no configuration to put it into RAM, which I prefer). Aren't you concerned
about wear? If you stream high resolution video files from, say, youtube or
twitch, you write gigabyte after gigabyte into that SSD.

~~~
mrob
Modern SSDs will survive hundreds of TBs of writes, sometimes even PBs.
Techreport did a long endurance test with multiple drives:

[http://techreport.com/review/27909/the-ssd-endurance-
experim...](http://techreport.com/review/27909/the-ssd-endurance-experiment-
theyre-all-dead) "All of the drives surpassed their official endurance
specifications by writing hundreds of terabytes without issue."

I expect to upgrade for increased capacity long before I reach that.

------
DannyBee
"Is this notion of merging memory and storage an attractive prospect or a
nightmare? "

Well, it's coming either way, in the form of large-scale NVRAM.

------
mikhail-g-kan
I suppose the need for volume in consumer disks will depend on the quality of
future networks. With low latency and high speed, users won't need big local
disks; services like cloud storage/streaming will be enough. Still, we will
always find ways to use higher-capacity disks, e.g. high-def VR content, or
biochemistry data from our bodies for personal healthcare/fitness.

------
ksec
I often think people underestimate the demand for storage. Let's pick an
example: if Apple were to offer 30GB of free iCloud space to every iOS
customer, 650 million customers would equal roughly 20EB of data. That is 40
Dropboxes! And you will need multiple copies to safeguard it.
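The back-of-the-envelope math, using decimal units (1 GB = 10^9 bytes, 1 EB =
10^18 bytes):

```python
users = 650_000_000      # iOS customers, per the comment
quota_gb = 30            # hypothetical free iCloud quota
total_eb = users * quota_gb * 10**9 / 10**18
print(f"{total_eb:.1f} EB")  # 19.5 EB
```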

------
aab0
You could also ask what caused the kink. The timing looks like it matches up
with the Thai floods, which killed price decreases for years afterwards. The
experience curve means you need bulk to achieve improvements, and the floods
shattered a key part of the supply chain.

------
RubyPinch
The main speed boost back then came from innovations landing in usable drives.
We are currently in an innovation phase with a whole new space to explore;
once the winners of that innovation get picked, refining happens, cost savings
are found, etc., and I think we'll probably see some return to the old
expectations.

That being said, because of the slashdotting issue, I can't see how bad the
graph is.

------
aaron695
The concept that a petabyte is a lot of storage is very strange.

At 50 gigs per hour for a cinema-screen-quality setup (most houses in the next
20 years...), a petabyte would be 20,000 hours of entertainment. Meh, I might
want access to that in a lifetime.
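The arithmetic, taking the 50 GB/hour figure above and a decimal petabyte:

```python
petabyte_gb = 1_000_000  # 1 PB expressed in GB (decimal units)
gb_per_hour = 50         # assumed cinema-quality bitrate from the comment
hours = petabyte_gb / gb_per_hour
print(f"{hours:.0f} hours")  # 20000 hours
```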

Also remembering we might be heading towards an environment where we record
everything at all times.

Certainly, at the moment I'm buying a hard disk every year, as quality goes up
and it's easier than throwing stuff out.

The closer we get to a petabyte HD the better, I say.

------
Paradigma11
Basically the Internet is my HD now, so I don't see the need to upgrade any
time soon.

------
lazyjones
It's in the "Cloud" ...

~~~
samstave
I thought it was in my quantum computer...

------
madflame991
> “Data that comes off a mechanical disk has a subtle warmth and presence that
> no solid-state drive can match,” the cognoscenti will tell us.

Probably not

------
wtbob
Why did he write '\\(2\frac{1}{2}\\) or \\(3\frac{1}{2}\\)' (which presumably
uses JavaScript to be rendered), when he could have just written '2½ or 3½'?
It'd have taken 10 bytes to store instead of 36, and that's not even counting
the size of the JavaScript!

As long as folks keep on reinventing the wheel, only bigger, hard drives are
going to have to keep increasing in size.

~~~
msbarnett
I'm not sure that using LaTeX markup, a format that predates HTML, counts as
"reinventing the wheel".

~~~
wtbob
Oh, I love LaTeX dearly — but it's not part of HTML, and in this instance
UTF-8 handles it perfectly well.

