
How would you store 500TB of stuff you don't need every day? - Corrado
My company has vast amounts (>500TB) of media (mostly pictures) that needs to be stored as cheaply as possible. Currently we're using spinning hard drives in a colo, as well as S3 for backup, but that's expensive. Can we trade speed for cheap? In the past the cheapest thing around was tape, or possibly CD-ROM/WORM drives; however, with the dropping prices of hard drives, I'm not sure those solutions make sense anymore.

How would you store 500TB, and growing, of stuff you don't need every day?
======
wazoox
This is definitely right in my area of expertise :) It largely depends upon
how frequently you use the data and what your preferred access time is.

It's possible to build RAID arrays with 2 TB drives (the sweet spot right now
in price/capacity), or with 1 TB 2.5" drives. With a RAID controller that
knows how to spin down idle arrays, it can be pretty cheap. Let's see:

- Using 2 TB 3.5" drives in 24-disk arrays: one 24-disk Supermicro-based server (21 usable disks) + three 45-drive Supermicro JBODs gives you 240 TB of usable capacity.

Two similar systems will give you 480 TB in 8 x 4U = 32U of rack space. The
setup will draw about 8 kW of power when running and weigh about 300 kg.

If you'd rather go with 2.5" drives, one 24-slot Supermicro server chassis +
three 88-slot Supermicro JBODs will give you 260 TB of usable capacity.

Two such systems will give you 520 TB in 28U, draw about 3.5 kW and weigh
about 200 kg, so I'd definitely go for the 2.5" version. Furthermore, 2.5"
drives tolerate being spun down frequently much better.
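
For instance, here is one layout that lands on those usable-capacity figures; a quick Python sketch, where the RAID-6 group sizes and hot-spare counts are assumptions chosen for illustration rather than a fixed Supermicro configuration:

    # Back-of-the-napkin capacity check for the two builds above. The RAID-6
    # group sizes and hot-spare counts are assumptions; your exact usable
    # capacity depends on how you carve up the arrays.
    configs = [
        # (description, drive slots, TB/drive, RAID-6 group size, hot spares)
        ('3.5" 2 TB: 24-bay head (21 usable) + 3 x 45-bay JBOD', 21 + 3 * 45, 2, 10, 6),
        ('2.5" 1 TB: 24-slot head + 3 x 88-slot JBOD',           24 + 3 * 88, 1, 22, 2),
    ]
    for name, slots, tb, group, spares in configs:
        parity = 2 * (slots // group)          # 2 parity drives per RAID-6 group
        usable = (slots - parity - spares) * tb
        print(f"{name}: {slots} drives, {slots * tb} TB raw, ~{usable} TB usable per system")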

Please note that this volume is also squarely in the realm of tape archives.
Current tape libraries are fast, and LTO-5 tapes hold 1.5 TB uncompressed.
Tape can definitely be cheaper in the long run.

Caveat lector! This is a quick-and-dirty, back-of-the-napkin calculation. I'm
on my way to install a 96 TB machine at a customer's premises and just threw
the numbers together before leaving.

~~~
illumin8
I would caution against this. If you need the data to truly be protected,
don't build it yourself. Here are some questions and issues that can come up:

* Using desktop drives, when 24 are mounted in a single chassis, the vibration from those drives will cause a higher rate of failure. Desktop drives were not designed for this purpose and are not made to handle the vibration and heat created by 23 other drives in the same chassis.

* When a single drive in your 24 drive chassis fails, how do you know which one it is? On an enterprise storage array, you have the ability to flash an LED so you know which drive to replace, or you have a direct mapping (shelf 2, bay 16) in the error message. If you built it yourself, how do you know which physical drive is /dev/sdzg?

* Purchasing a large number of drives from the same batch means they are very likely to start failing around the same date. Can you handle a double disk failure without losing data? How about a triple disk failure? Do you have hot spares capability?

* Are you scrubbing your data regularly to make sure those blocks on disk aren't getting silently corrupted over time? If you never read your data or scrub the entire disk regularly, the only time you'll find out you've lost data is during a RAID rebuild, when a read error on another drive (effectively a double disk failure) means complete loss of the array.

* Are you able to use RAID 6 or additional-parity technology to protect against double disk failures? What if I want 3 parity disks to protect against a triple disk failure?

All of these considerations will come up when you're designing a storage
system; they are things that enterprise storage vendors spend months of
engineering time on, and failure modes have to be considered up front. In my
experience with hand-built PC storage arrays, the time you are most likely to
lose data is when you have a disk failure and are rebuilding parity from
another disk. If you have many disks from the same batch, a rebuild can stress
the disks more than normal and cause a second disk to fail. This is why RAID 6
is becoming more and more popular.

I would caution you that cheap hand-built storage usually ends up expensive in
the long run, especially when you lose data that you thought was protected.

~~~
wazoox
> _don't build it yourself._

Of course not. I have, on the other hand, built several petabytes' worth of
storage systems over the past 15 years.

> _Using desktop drives, when 24 are mounted in a single chassis, the
> vibration from those drives will cause a higher rate of failure._

Absolutely. However, this does not apply equally to all desktop drives. In
fact, I'm going to reveal a secret: physically, desktop SATA drives are
exactly the same as professional-grade, 24/7-operation-rated SATA drives. When
they come off the assembly line, they're tested for reliability and
performance. Above a certain threshold, a drive is deemed professional and
receives the pro firmware; below the bar, it's a desktop drive.

What is the most important difference between the desktop and pro firmware?
The desktop drive is assumed to be used alone: in case of an error, it will
retry, retry and retry again endlessly to ECC-correct it. The pro drive, on
the other hand, "knows" it's part of a RAID array; at the slightest fault, it
simply reports itself as failed and gives up. So in case of a transient error,
your IOs won't be blocked.

There are other differences depending upon the maker: for instance, all
Hitachi GST drives carry an accelerometer, but the desktop firmware doesn't
know how to use it to compensate for low-frequency vibrations.

WD desktop drives have particular settings that make them extremely dangerous
to use in RAID arrays.

Note that the "pro/consumer" threshold can be shifted to hit volume targets
regardless of actual drive quality. This has happened; I've been bitten.

Anyway, prefer the pro drives even if they're 50% more expensive. It will save
you some headaches.

> _When a single drive in your 24 drive chassis fails, how do you know which
> one it is?_

First, I assume the maker (in my case, my team) carefully checked that all
drives are properly labeled (so that drive 1 is labeled 1, and so on); second,
decent RAID controllers can flash a drive LED (the Adaptec 5xx5 or 6xx5 series
come to mind).

> _Can you handle a double disk failure without losing data?_

With drives of 1 TB or more, RAID-5 is definitely out and has been so for
several years. RAID-6, RAID-6, RAID-6 is the way to go.
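
To put numbers on it: a crude sketch of the unrecoverable-read-error arithmetic, assuming the usual 1-in-10^14 bits figure from consumer SATA spec sheets (a simplistic model, but it shows the order of magnitude):

    # Crude model: a RAID-5 rebuild has to read every surviving drive end to
    # end, and a single unrecoverable read error (URE) aborts it. 1e-14
    # errors/bit is the typical consumer SATA spec-sheet figure; treat this as
    # an illustration, not a guarantee.
    def clean_rebuild_probability(drives_to_read, drive_tb, ure_per_bit=1e-14):
        bits = drives_to_read * drive_tb * 1e12 * 8
        return (1 - ure_per_bit) ** bits

    # 12 x 2 TB RAID-5: after one failure, the rebuild reads the 11 survivors.
    print(f"RAID-5 rebuild completes cleanly: {clean_rebuild_probability(11, 2):.0%}")
    # With RAID-6, the same URE is recoverable from the second parity drive.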

> _These are things that enterprise storage vendors spend months with
> engineers designing._

E-xact-ly. BTW, I'm an enterprise storage vendor. However, I'm really too far
from Louisville to ship there and support it. I just mentioned what is
possible; don't forget that nowadays, when you're buying an SGI (or many other
big-name) storage system, you're buying Supermicro hardware.

~~~
originalgeek
> In fact I'm going to reveal a secret : physically, desktop SATA drives are
> exactly the same as professional grade 24/24h operation-guaranteed SATA
> drives. When they fall off the assembly line, they're tested for reliability
> and performance. Over a certain threshold, a drive is deemed professional
> and receive the pro firmware; under the bar, it's a desktop drive.

This may be true in some circumstances; however, I have observed that
enterprise drives from some vendors have larger heatsinks than the consumer
drives and also weigh a little more.

~~~
wazoox
Possibly; however, the Seagate Barracuda 7200.11 and ES.2, the Constellation 1
and 2 TB, the Hitachi Deskstar and Ultrastar 1 and 2 TB, and the Western
Digital Caviar Green 2 TB desktop and pro models are all physically identical.

------
cmsj
The print media company I used to work for, which had an almost bottomless
collection of massive image files, used a large tape robot and had its office
across the street from an Iron Mountain storage depot. The Mac guys would work
with small images, and then the workflow tools (TWIST, AFAIR) would issue
requests for the huge versions, which produced a daily request for Iron
Mountain to bring us the right tapes. The next day we'd feed those tapes into
the robot; the full-resolution jobs would be completed, rendered and uploaded
to the appropriate FTP servers, and the tapes would be returned to storage in
the next daily Iron Mountain pickup.

Poor latency, but massive throughput and storage :)

------
chrismiller
I know you said you are currently using colocated servers, but have you had a
look at <http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/> ?

A few of these replicating between each other might be cheaper than the less
dense server + S3 combo you have at the moment.

[edit] The cost of the Backblaze units is ~$117 per TB, so it would cost
~$59,000 to store 500TB, or ~$118,000 for two redundant copies.

$118,000 would be equal to only a few months of hosting 500TB on S3.
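
The arithmetic, for anyone who wants to play with the assumptions (both the ~$117/TB pod figure and the ~$0.095/GB-month S3 price quoted elsewhere in this thread are rough 2011-era numbers, not quotes):

    # Rough cost comparison using the ~$117/TB Backblaze pod figure and the
    # ~$0.095/GB-month S3 price mentioned in this thread.
    tb = 500
    pod_per_tb = 117
    s3_per_gb_month = 0.095

    one_copy = tb * pod_per_tb
    two_copies = 2 * one_copy
    s3_month = tb * 1024 * s3_per_gb_month

    print(f"Pods, one copy:   ${one_copy:,.0f}")
    print(f"Pods, two copies: ${two_copies:,.0f}")
    print(f"S3, per month:    ${s3_month:,.0f}")
    print(f"Two copies of pods ~= {two_copies / s3_month:.1f} months of S3")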

~~~
pstack
Looks like we were doing the math at the same time (see my post elsewhere in
this thread). You'd be looking at 15 of these servers, though I have no idea
how much the expense of maintaining them would be. That's also the cost of
storing their current data plus redundancy. It may be the cheap solution
today, but I wonder whether it would scale the way a seriously high-end tape
system does (I linked one in this thread that potentially stores 91 archived
petabytes).

I couldn't find pricing information, and since this isn't something I ever
deal with, I don't know what the industry standard would be. I would not be
surprised if the cost of setting up a very high-capacity tape system came out
as the better deal in the long run. Perhaps the Backblaze solution would be
cheaper at half to a full petabyte, but over the long term, the high-capacity
tape systems might beat the multi-petabyte pants off it.

Of course, that partly depends on how quickly spinning drives grow in size and
drop in price, which could feasibly account for the difference in cost between
the two options over time. The important variable is whether their data
accumulation outpaces the drop in prices of spinning drives.

No matter what, cloud-hosted solutions seem pretty much off the charts. Even
if the storage were reasonably priced, I'm guessing they have to transfer
several terabytes a month to the backup, which would be pretty painful on a
lot of networks.

~~~
chrismiller
Something I just realised (not sure if you took this into account in your
calculations) is that the article we linked was published in 2009.

In all of their calculations they list 1.5TB SATA HDDs as costing $129. Those
same disks can now be purchased for between $50 and $60. So at 45 disks in
each of the servers, that would be a saving of about $3,105 per server, or
~$46,000 in total.

------
pstack
This isn't my area of expertise, but when you factor in redundancy, you're
looking at a petabyte of data just for what you have now. You'll need it to
fail over, plus at least one full backup at a separate location. Even
commodity hardware almost seems prohibitively expensive: 600+ 2TB drives just
to start, plus space, electricity, cooling and maintenance (with that many
drives, you'll probably be replacing them often).

If you went with the solution that Backblaze rolled for themselves, you'd be
looking at 15 of their servers (for redundancy), before adding any more data
in the future, which would run almost $120,000. I have no idea what the
monthly maintenance expenses would be.

<http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/>

You are undertaking a nearly Herculean feat here. I don't think any of the
professional data storage solutions out there will be any cheaper, either. In
fact, a petabyte of storage (especially if you need doubles of all hardware to
place at another location) would probably run nearer to a million bucks or
more.

As someone else mentioned, it might even be worth talking to Backblaze
themselves. At least find out what they could do for you. They offer a
business plan at something like $50/computer for unlimited storage. Obviously
they're not going to store 0.5-1.0 petabytes of data for that price, but at
the scale you need, you might be able to negotiate something that surpasses
what you could accomplish at any reasonable price -- and you have the added
benefit of any catastrophe being their problem, not yours.

If you truly only rarely need to access your data and the speed of that access
is trivial to you, then, as others mentioned, tape could be completely
feasible. I have no idea how much money it would cost, but you're talking a
TON of cartridges and some sort of rotation system. If you expect the amount
of data you need to keep growing at a rapid rate, you might even end up
considering something like what Spectra Logic offers. I have absolutely no
clue how much their solutions cost, but they can store up to 91 petabytes of
data, which I imagine would do you quite well for a long time to come:

<http://www.spectralogic.com/index.cfm?fuseaction=products.displayContent&catID=1990&src=bab>

~~~
wazoox
SpectraLogic tape libraries are insanely great; IMO they're the best
available. Unfortunately they're quite expensive, noticeably more than the
equivalent Overland. An Overland NEO 8000e should cost about $60,000, to which
you must add 350 LTO-5 tapes at $60 apiece. However, the power cost will be
negligible, while 500 TB of spinning drives will burn through more than its
price in power and AC every year.

------
xd
Have you looked into tape libraries, e.g. the HP StorageWorks MSL4048?
Obviously you would be looking at the uncompressed figure (144TB per unit),
assuming your pictures are already compressed.

~~~
khafra
At $5k per unit (first Froogle result), that's only $35k for a petabyte, less
than a third of the price of assembling your own Backblaze-type storage. Not
bad.

~~~
xd
I wish it were that cheap... throw in the cost of the tapes and the price
doubles.

------
jodrellblank
How big is your company? If you have hundreds of employees with desktop
computers, you could put extra terabyte disks in them and work out a redundant
distributed filesystem.

It's still going to cost a lot for disks and a lot of do-it-yourself setup,
but it piggybacks on existing network and power infrastructure, needs no
expensive datacenter kit or new machines/tape drives/libraries/etc., and you
could build it into your existing desktop maintenance and renewal procedures.

Do you want a hacky build-it-yourself solution?

~~~
thwarted
I actually worked on a system to do this back in the late '90s, code-named
"Dark Iron". Our boss, who conceived of it, fancied himself some kind of
old-school mainframe ("Big Iron") guy and talked a lot about using the unused
capacity of all the desktops at companies at night (the "Dark" part, as in
"dark fiber").

I wrote the Linux kernel block device module, and the other guy on the project
wrote the part that ran on all the users' desktops, mainly just a key-value
store indexed by block number (the users liked to kill the process, which made
it terribly unreliable). Back then, we used Linux purely for development,
under the theory that once we "proved" it worked with the rinky-dink Linux
open source operating system, the company could justify the purchase of the
HP-UX and AIX kernel development kits and we'd port it to machines that people
"actually" use.

We all know how that last part turned out.

But really, if you want to do something exactly like this these days (that is,
since about 2003), rather than the VMs that davecampbell suggests, you'd be
best off using md software RAID over Network Block Device. I believe there are
Network Block Device storage ends (is that the client or the server?) for
platforms other than Linux. Even a simplistic AoE service would do.

~~~
jodrellblank
Your first paragraph suggests you look down on the idea - do you?

How far did you get, and what happened to it? I wasn't suggesting a VM-based
idea, more of a daemon/Windows service.

Having said that, there's no real need for it to be a filesystem per se. It
could be any kind of store with a central interface (web-based, for instance),
and then it wouldn't have to deal with parts of a filesystem vanishing
arbitrarily -- as long as the data didn't need to be live all the time, that
is. It could even hook into Wake-on-LAN, and also benefit from any power
savings if machines are switched off overnight.

~~~
thwarted
_Your first paragraph suggests you look down on the idea - do you?_

It was grossly ahead of its time (remember, this was the late '90s). The
assumption was that the network was faster than the disk, which in our
implementation was never true, because it required a reliable backing store,
so it was network latency + desktop disk latency. And we couldn't use a lot of
RAM as cache because desktops didn't have multi-gigabyte memory at the time --
even servers didn't.

 _How far did you get and what happened to it?_

I assume it's on some backup tapes somewhere. Every so often I come across an
old CD-ROM I made during that era and get the idea that the source code is
buried on it somewhere. It was tied pretty closely to early 2.0 Linux kernels
and the block layer that existed at the time (I don't know how much would be
portable to modern kernels).

A certain company that sued Microsoft over a patent related to embedding
executable content in web pages, and that we had a working relationship with
at the time, claimed on their web page that they owned the trademark for it;
but as far as I know, we were the only ones to produce working code.

Both of us who were working on it eventually left for greener pastures (and
that was like 12 years ago at this point).

 _I wasn't suggesting a VM based idea, more of a daemon/windows service._

Nah, davecampbell in this thread suggested the use of VMs. Using a VM might
make deployment easier. There wasn't anything like VMware Player back when I
worked on this, and we had to go around to everyone's machine and install the
new version after hours with every release.

 _there's no real need for it to be a filesystem per-se, it could be any kind
of store with a central interface (web based, for instance) and then it
wouldn't have to deal with parts of a filesystem vanishing arbitrarily._

True. And once you remove that constraint/use case, there are a number of
things today that already provide this; Amazon S3 is one. We implemented it as
a generic block device and put a filesystem on top of that, mainly for
testing. One of the "goals" was to eventually use the block device as an
Oracle tablespace file (once ported to HP-UX, since Oracle didn't exist on
Linux at the time, or had barely come out for Linux).

Like I said, if you want to do this today, _try_ it with something like md on
top of NBD. Although, see the section "Late news - intelligent mirroring" on
<http://enbd.sourceforge.net/> (which is itself old).

But today, I wouldn't even bother. I'd just bite the bullet and buy the disks
(or use some other system that is more directly under sysadmin control than
employees' desktops). I think there's way too much variability in repurposing
desktop clients that were deployed for another use (desktop use).

A friend of mine pointed me to this
<http://portal.acm.org/citation.cfm?id=844130&dl=ACM&coll=DL>

------
portman
How much is it _growing by_ per day? That's more important than the total
volume of data.

The strategies for these situations are very different:

      - 500TB growing by 1GB per day
      - 500TB growing by 1TB per day

In the first case, it's mostly about _preservation_, and optical media is a
very real possibility.

In the second case, unless you have a source of cheap labor near your data
center, optical media probably isn't an option.
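
To make the difference concrete:

    # How much each growth rate adds per year, relative to the 500 TB already there.
    current_tb = 500
    for daily_gb in (1, 1000):                 # ~1 GB/day vs ~1 TB/day
        yearly_tb = daily_gb * 365 / 1000
        print(f"{daily_gb} GB/day -> {yearly_tb:.1f} TB/year "
              f"({yearly_tb / current_tb:.1%} of the current archive)")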

------
swalberg
There's a company called Cleversafe.(com|org) that makes a product that uses
dispersed storage. So in one scenario, you split a block of data up into 8
chunks and store them on separate servers. Only 5 of those servers are needed
to reconstruct the data. It has better reliability than RAID and needs fewer
disks.
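
A quick sketch of why that beats plain replication on overhead (the 2% per-server loss probability below is an arbitrary number picked purely for illustration):

    # Why 5-of-8 dispersal is attractive: overhead is only 8/5 = 1.6x raw, yet
    # the data survives any three simultaneous server losses. The 2% per-server
    # loss probability is an arbitrary illustration, not a measured figure.
    from math import comb

    def survives(n, k, p_loss):
        """Probability that at least k of n independent servers are still alive."""
        return sum(comb(n, i) * (1 - p_loss) ** i * p_loss ** (n - i)
                   for i in range(k, n + 1))

    p = 0.02
    print(f"5-of-8 dispersal (1.6x overhead):  {survives(8, 5, p):.8f}")
    print(f"2-way replication (2.0x overhead): {survives(2, 1, p):.8f}")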

You can either access the data over HTTP as files, or it can be a block store
accessible via iSCSI.

They have a commercial product (a colleague of mine is a reseller, I can
introduce you if you want), and also an open source version if you want to
build the servers yourself.

------
maukdaddy
Since everyone has chimed in with a tech solution, let me offer a business
solution:

Have you determined that you actually _need_ this 500TB of media? If you're
not using the data every day, do you really need it? A lot of businesses hoard
data without ever analyzing why or if they should keep it. Keeping data for
records retention laws is one thing, but keeping data just because you have it
is a poor business decision.

After running a thorough analysis you might determine that you only need to
keep 100TB or so.

~~~
Corrado
We definitely have orphaned data in the mix and are working to purge it.
However, most of the data has to be kept in case it's needed in the future
(however unlikely that may be).

------
JoachimSchipper
This is probably _too_ cheap, but Blu-Ray media is about $2/25GB, or $80/TB
(and a new standard is coming). You'll want a lot of redundancy, so "just"
$40k in media won't get you there, but it is less than the $120k for hard disk
storage quoted by pstack (Blu-Ray discs don't even need power and cooling; on
the other hand, Backblaze Pods don't need interns to keep feeding them blank
media).

You should probably do some RAID-like error correction to keep the number of
extra copies you need reasonable. (E.g. two copies of every disc, every fifth
disc is just the XOR of the previous four; that'd give you two backups for
every disk at a total cost of $90k.)
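
A toy version of that XOR-parity scheme, with disc contents shrunk to a few bytes for illustration (real discs would be ~25 GB images):

    # Toy version of "every fifth disc is the XOR of the previous four".
    def xor_together(blocks):
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                out[i] ^= b
        return bytes(out)

    data_discs = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
    parity_disc = xor_together(data_discs)

    # Lose disc 2? XOR the parity disc with the three survivors to rebuild it.
    rebuilt = xor_together([data_discs[0], data_discs[2], data_discs[3], parity_disc])
    assert rebuilt == data_discs[1]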

~~~
aidenn0
Blu-ray media is around $1 per disc if you buy 25 or 50 packs.

------
bigiain
How long do you need to store it all for?

For the ~$50k per month you're spending on S3, it doesn't take too long to
justify your own big robotic tape library. I'd keep my eyes open for somebody
decommissioning one (if you're in Sydney, Australia, I've heard rumors about a
certain bank that acquired another bank and is building a new data center to
combine the two -- I suspect there'll be a lot of decommissioned gear on the
second-hand market shortly).

------
sidmitra
Just curious: did you do a cost analysis with Amazon S3? I'm guessing that
amount of media might cost a lot. I haven't run the numbers myself.

EDIT: At $0.095/GB, it would cost you over $48K per month! I guess I didn't
internalize how expensive S3 was.

~~~
StavrosK
You probably didn't internalize how much data 500 TB was :)

~~~
sidmitra
True! But then again, I have an external HDD full of sitcoms. That's how I
think of storage space: as 10K seasons of House MD :-)

~~~
webjunkie
Whoa, 10,000 SEASONS! That sure is a lot ;)

~~~
sidmitra
Hah, that figure is of course not accurate! It was just a random number that
popped into my head.

------
kylecordes
As you have already discovered, storing that data on S3 is rather expensive.
S3 is priced in a way that makes sense for hot, important, accessible data;
not for idle permanent archives.

I attacked a problem like this a few years ago, and found that the best
solution at the time was to build servers with lots of big cheap hard drives
and as little else as possible; similar to the BackBlaze idea but less
impressive. I wrote it up here:

<http://kylecordes.com/2008/big-raid>

...but since hardware changes so fast, all of the details there are probably
obsolete.

Extra servers with backup copies of data can be much more economically
deployed in an office closet instead of a colo. There will be more downtime
(power and bandwidth reliability), but if it's an idle backup, that won't
matter.

Servers full of hard drives (20 TB worth or whatever) for long-term, un-
accessed archives can be turned off (no power or heat), but opinions vary as
to whether the on/off cycles for occasional access will reduce the total
service life.

------
kia
I think in the long run for such amounts of storage Tape Libraries would be
the cheapest solution (in terms of cost per GB and management)

<http://en.wikipedia.org/wiki/Tape_library>

------
patrickgzill
I recommend that whatever solution you go with, you ensure you have
checksumming at the filesystem level, to make sure that bit rot doesn't end up
costing you data.

For instance, with ZFS you can issue a "scrub" command, which will trundle
through the whole filesystem, verify the checksums of everything stored on it,
and repair damaged blocks from redundant copies where it can. Running a scrub
every month or two would ensure data is not silently lost.
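
If your filesystem can't do that for you, a minimal application-level stand-in is easy to sketch (it only detects rot; repairing still requires a good second copy):

    # Minimal application-level "scrub": record SHA-256 digests of every file
    # once, then re-verify them periodically. Unlike ZFS this only detects
    # corruption; it cannot repair it.
    import hashlib, json, os, sys

    def sha256_of(path, bufsize=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(bufsize), b""):
                h.update(chunk)
        return h.hexdigest()

    def build_manifest(root):
        return {os.path.relpath(os.path.join(d, name), root):
                sha256_of(os.path.join(d, name))
                for d, _, files in os.walk(root) for name in files}

    if __name__ == "__main__":
        mode, root = sys.argv[1], sys.argv[2]   # "build" once, "verify" monthly
        if mode == "build":
            with open("manifest.json", "w") as f:
                json.dump(build_manifest(root), f)
        else:
            with open("manifest.json") as f:
                stored = json.load(f)
            bad = [p for p, digest in stored.items()
                   if sha256_of(os.path.join(root, p)) != digest]
            print("corrupted or changed files:", bad)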

------
wladimir
AFAIK, currently, hard drives are by far the cheapest solution per GB.

Does it need to be online at all times? If not, why not use offline drives and
store them somewhere?

~~~
xd
I've dealt with hundreds of hard drives over the years. Using them as offline
storage is very risky, as they can and will fail to spin up after sitting idle
for long periods of time.

~~~
wladimir
You want to take some redundancy into account. Even with that, it's still
cheap.

Tapes degrade and break down as well. Everything needs maintenance. If you
want really long-term storage, there is no option but to copy the data to new
media once in a while.

~~~
xd
True, but in the 13+ years I've been in the industry I've not once seen or
heard of a failed tape...

~~~
joss82
Well, I worked for a year at HP phone support, mostly on tapes.

I assure you that tapes fail all the time. Their mechanical nature and the
strain on their delicate components make them susceptible to failure.

EDIT: Unless you're talking about high-end like Ultrium. In that case, it's
solid, but then it's much slower and more expensive than a hard drive of the
same size.

~~~
xd
Most of my experience has been with Ultrium, TBH. But you can pick up 1.5TB
Ultrium tapes for less than the cost of an equivalently sized hard drive, here
in the UK at least.

~~~
beagle3
That's because disks are expensive in the UK, not because tape is cheap.

You can get 2TB of disk storage for <$100 in the US buying retail (if you buy
three hundred like this guy needs, you'll probably be able to get it down to
$30 or $40 per TB).

No tape device required, comparable speed, and disks are random access. A
Backblaze-style setup at today's prices sets you back ~$60/TB for online,
random-access storage, redundancy and other nice things -- 500TB can be built
for $30K or so, with a few hundred dollars a month for electricity (and a few
hundred more for colocation if you don't have the office space).

How cheap can you get a functioning system with reasonable (a few minutes)
retrieval time when you have 400 1.5TB tapes? (You need redundancy here too,
you know.)

Since sometime around '98, tapes have made no economic sense whatsoever for
any kind of storage, and hardly any sense at all, as far as I can tell.

------
EwanToo
The question surely isn't "as cheaply as possible"; after all, you could just
delete it -- that'd be cheap.

Tape is still the cheapest reliable option, and your main vendors are Quantum,
IBM, and Oracle (Sun).

The Quantum Scalar i500 can store up to 600TB of data and is relatively cheap,
or you could buy 4 or 5 smaller models and use tape-management software to
track where the data is.

------
IChrisI
How much of that is duplicates? How much is lower-quality versions of other
images, which could be regenerated if necessary? How much is higher quality
than needed, or stored in the wrong format (PNG vs. JPEG for photographic
pictures)? 500TB is a lot, and perhaps you can reduce that number.

------
ZogStriP
Have you tried <http://www.backblaze.com/> ?

~~~
sidmitra
I'm guessing they would change their terms :-) I just calculated (above) that
it costs a small fortune to store that amount of data on S3.

~~~
moeffju
But Backblaze aren't storing data on S3, they built their own storage pods.
See first-level comments.

------
jacobwhatwhat
You could use cheaper storage, like really dense Supermicro systems, and then
run Gluster across them. Gluster will handle replicating data so that each
file is on two systems. Running RAID underneath will protect you in the event
of a disk failure, and then you have Gluster protecting you from total server
failure. This would be significantly lower cost than EMC or NetApp and provide
you with the same level of data protection. There is nothing wrong with
"building it yourself" if you do it right.

------
cambridgejacob
I do projects like this all the time using an archival file system from a
company called FileTek. FileTek is a software product that presents a tape
library as a NAS. You use a small amount of disk as a cache; it then stores
the data on tape with redundancy and all kinds of media management. I have
several clients using it with multiple PBs of capacity. If you contact the
company, ask for a guy named Diamond and tell him that Jacob sent you:
www.filetek.com

------
jodrellblank
In cases like this, is there any commercial hardware-accelerated compression
tech that would make a useful difference? Not just tape-drive hardware,
something more clever.

I know the latest WinZip can losslessly recompress JPEGs by reworking their
compression, for example.

Or what about deduplication? Is there any deduplication tech which works on
image visual similarity instead of file block matching?
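
On the visual-similarity question, perceptual hashing is the usual starting point. Here's a bare-bones "average hash" sketch using the PIL imaging library (file names are hypothetical, and a production system would want something stronger, like pHash):

    # Bare-bones "average hash": shrink to 8x8 grayscale, threshold each pixel
    # against the mean, and compare hashes by Hamming distance. Visually
    # similar images land a few bits apart even across resizes and re-encodes,
    # which is what block-level dedup can't see.
    from PIL import Image

    def average_hash(path, size=8):
        img = Image.open(path).convert("L").resize((size, size))
        pixels = list(img.getdata())
        mean = sum(pixels) / len(pixels)
        return sum(1 << i for i, p in enumerate(pixels) if p > mean)

    def hamming(a, b):
        return bin(a ^ b).count("1")

    # hamming(average_hash("img1.jpg"), average_hash("img2.jpg")) <= 5 or so
    # flags a likely near-duplicate pair (file names are hypothetical).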

------
Jetlag
Color-separate the pictures and project them onto black-and-white negative
film. Develop the film and put it in a vault. This might be overkill.

------
Andys
I'd use off-the-shelf consumer hardware: high-density Supermicro chassis with
SAS uplinks back to a single server, 2+TB consumer low-power disks, and
software RAID (e.g. ZFS) to manage it all. And finally, host it at a site
office or a low-grade colocation facility.

------
raypace
Take it from someone who's been around the block: call EMC or NetApp. They are
best of breed, and your data will be safe.

If you choose a home-grown solution, YOU will be responsible for the quirks of
the hardware, for making it work with your OS and network, and for every
failure.

~~~
chrisbolt
And how much would that cost you?

------
max2grand
Lots of Drobos, lots of hard drives, and several Pogoplugs. It would take 41
Drobos filled with 328 2 TB drives. I don't think I'd hook up more than four
Drobos per Pogoplug, so 11 Pogoplugs.

328 HDs at $84.99 = $27,876.72
41 Drobos at $1,511.11 = $61,955.51
11 Pogoplugs at $98.75 = $1,086.25
Total = $90,918.48, including shipping.

<http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413>

<http://www.newegg.com/Product/Product.aspx?Item=N82E16822240016>

<http://www.newegg.com/Product/Product.aspx?Item=N82E16822601002>

------
mathnode
A Massive Array of Idle Disks (MAID). Speak to your favourite SGI or IBM sales rep!

------
Mankhool
LTO-5, two copies. I'm archiving TBs of video in a Final Cut Server
environment this way. My 2 cents.

------
idknow1
artificially re-encoded [DR]NA sequences with redundant branching;

OR:

uuencode and compress with one of the newer high-end compressors like lrzip.

------
martin1b
You need a Chinese magical hard drive
(<http://blog.jitbit.com/2011/04/chinese-magic-drive.html>).

------
wakes
Encode in a laser and bounce off the moon.

