A backup copy of the Library of Alexandria would have been a nice-to-have before it burned down, but it would have been priceless afterwards.
So please, make all of the archive available in some form. It will be an insane amount of data but at least there will be some institutions that will be able to insure this precious resource against various disasters.
Ironically enough, it is hosted at the New Library of Alexandria, Egypt. :)
The internet is flooded with pop culture bullshit. The Library contained precious works from some of the greatest geniuses in history--works that we know existed but are lost forever.
My point is that it's not like there's any shortage of great reading materials. Unless you want some specific materials from the library of Alexandria, it does not make much sense to miss it.
More here: http://www.gwern.net/Culture%20is%20not%20about%20Esthetics
Electricity was discovered and even used in the ancient middle east. Steam powered perpetual motion devices were constructed, but never applied to locomotion.
Can you imagine where we would be now as a species if ideas like these had been allowed to propagate across the Mediterranean thousands of years ago? Steam powered devices are only 350+ years old, and your grandpa's grandpa probably didn't have electricity in his house.
But without positive proof of a series connection I don't see how that is a sustainable theory. Wiring is a necessity.
'If you would like access to this set of crawl data, please contact us at info at archive dot org and let us know who you are and what you’re hoping to do with it. We may not be able to say “yes” to all requests, since we’re just figuring out whether this is a good idea, but everyone will be considered.'
This is annoying, that they're using the enterprise sales model for distribution. Just put it on S3.
They already know how to store massive amounts of data, and how to send it over the network. Assuming $100/TB for their own media means it would only cost them about $4000 to store it themselves.
Assuming you have a 1Gb/s connection, that would take you over 7 days to download. It's probably both cheaper and faster to write the data to disk and ship the disk than to do an S3 download.
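The "over 7 days" figure can be checked with a quick back-of-the-envelope script. This is only a sketch: it assumes the 80 TB size quoted elsewhere in the thread, decimal terabytes, and a link that is fully saturated with no protocol overhead. The function name `download_days` is made up for illustration.

```python
# Rough transfer-time estimate. Assumptions: decimal units (1 TB = 10**12 bytes),
# a fully saturated link, and no protocol overhead.
BYTES_PER_TB = 10**12
SECONDS_PER_DAY = 86_400


def download_days(size_tb: float, link_gbps: float) -> float:
    """Days needed to move size_tb terabytes over a link_gbps gigabit/s link."""
    bits = size_tb * BYTES_PER_TB * 8
    seconds = bits / (link_gbps * 10**9)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # 80 TB is the archive size mentioned elsewhere in this thread.
    print(f"{download_days(80, 1):.1f} days at 1 Gb/s")
    print(f"{download_days(80, 0.1):.1f} days at 100 Mb/s")
```

At 1 Gb/s this comes out a little over a week; on a slower link the case for shipping disks only gets stronger.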
It reads more like they don't know if or how people want to use this. (The "are interested in exploring how others might be able to interact with or learn from this content if we make it available in bulk.") Simply making the data available doesn't give them feedback.
For example, is it sufficiently worthwhile for them to go through the effort of providing the data on S3, given the costs?
edit: Just saw dalke's response. Great minds think alike!
Here's the "Everything will always work" math:
~ 27 3TB SATA drives @ $129.95 
~ 7 machines @ $60.17  
~ 8-port switch @ $11.99 
~ 100ft cat5 cable @ $15.95 
~ 14 cat5 connectors @ ~$5 total.
~ 2 6-prong power strips @ $5 each.
Total: 27 * $129.95 + 7 * $60.17 + $11.99 + $15.95 + $5 + 2 * $5 = $3972.78.
With decent redundancy: ~$5000.
Monthly power bill: ~$138.
Labor: $0.
You can store basically a copy of "The entire internet" for 1/4th the cost of a new sedan and power it at 1/5th the cost of using that sedan.
I officially christen this the future.
Machines: the cheapest all-in-one with SATA I found was the $49 cubieboard (http://www.linuxbsdos.com/2012/09/11/cubieboard-raspberry-pi...). 4 of them would run you $200 ... putting it at higher expense.
$24.99 Motherboard: http://www.ascendtech.us/asus-p5rc-le-lga775-ddr-motherboard...
$3.50 256MB RAM: http://txmicro.com/256MB-DDR-RAM-PC3200-184-Pin-DIMM-Name-Br...
$4.00 celeron: http://starmicroinc.net/intel-celeron-325j-253ghz-256k-533mh...
$8.95 CPU fan: http://3btech.net/inorlga775co.html
$21.99 550 W PSU: http://3btech.net/24pinch550wa.html
$0.74 Molex -> SATA: http://www.amazon.com/Syba-SY-CAB40007-Molex-Power-Inches/dp...
Redundancy: using base price * 1.25.
Power bill: http://www.bls.gov/ro9/cpilosa_energy.htm Assuming $0.15/kWh and 180 W average draw per machine, we have: 7 * 0.180 kW * $0.15/kWh * 24 hr * 365.25 d / 12 mo = $138.06/month.
Labor: I mean equity, cough cough.
Sedan price: based on KBB value for a baseline 2013 Nissan Altima (http://www.kbb.com/nissan/altima/2013-nissan-altima/25/?vehi...)
Cost of using the sedan: based on 15,291 miles per year average (http://www.fhwa.dot.gov/ohim/onh00/bar8.htm) * IRS mileage rate of $0.55 (http://www.irs.gov/uac/IRS-Announces-2012-Standard-Mileage-R...) / 12 months = ~$700/month.
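For anyone who wants to re-run the numbers, here is a small sketch of the arithmetic above. All prices and rates are copied from this comment; the variable names are just illustrative, and nothing here accounts for failures, shipping, or tax.

```python
# "Everything will always work" math, using only figures quoted in the comment.
parts = [
    (27, 129.95),  # 3TB SATA drives
    (7, 60.17),    # machines
    (1, 11.99),    # 8-port switch
    (1, 15.95),    # 100ft cat5 cable
    (1, 5.00),     # 14 cat5 connectors, ~$5 total
    (2, 5.00),     # power strips
]

hardware = sum(qty * price for qty, price in parts)
with_redundancy = hardware * 1.25  # the "base price * 1.25" assumption
raw_capacity_tb = 27 * 3           # drives * TB per drive

# Power: 7 machines at 180 W average draw, $0.15 per kWh.
hours_per_month = 24 * 365.25 / 12
power_monthly = 7 * 0.180 * 0.15 * hours_per_month

# Sedan usage: 15,291 miles/year at the $0.55/mile IRS rate, per month.
car_monthly = 15_291 * 0.55 / 12

if __name__ == "__main__":
    print(f"raw capacity:      {raw_capacity_tb} TB")
    print(f"hardware:          ${hardware:.2f}")
    print(f"with redundancy:   ${with_redundancy:.0f}")
    print(f"power per month:   ${power_monthly:.2f}")
    print(f"sedan per month:   ${car_monthly:.0f}")
    print(f"power/sedan ratio: {power_monthly / car_monthly:.2f}")
```

The power-to-sedan ratio lands right around one fifth, which is where the "1/5th the cost of using that sedan" line comes from.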
You've omitted the labor cost to assemble, debug and maintain your MacGyver device. That's easily another $2500/mo (amortized).
Secondly you don't really want to store 80T on the cheapest components you can possibly get without a lot of testing and planning. This $22 PSU, trust me, it will come back to haunt you.
Thirdly, "decent redundancy" starts at factor 2.5, not 1.25.
And finally: If you want to put this stuff online and have people actually download it then you'll soon notice that redundancy is not only needed for availability but also for performance.
A reasonable ballpark figure for low-end networked storage nowadays is $0.05/GB per month (it gets much cheaper above 500T). Thus hosting those 80T should cost roughly $4000/mo, give or take a few.
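The $4000/mo ballpark follows directly from the per-GB rate. A minimal sketch, assuming decimal units (1 TB = 1000 GB) and a flat rate with no volume discount; `hosted_monthly_cost` is a made-up helper name:

```python
def hosted_monthly_cost(size_tb: float, usd_per_gb_month: float = 0.05) -> float:
    """Monthly cost of networked storage at a flat per-GB rate (1 TB = 1000 GB)."""
    return size_tb * 1000 * usd_per_gb_month


if __name__ == "__main__":
    print(f"80 TB at $0.05/GB/month: ${hosted_monthly_cost(80):.0f}/mo")
```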
I'd be doing this myself, so I'll charge myself $0.
> Secondly you don't really want to store 80T on the cheapest components you can possibly get without a lot of testing and planning. This $22 PSU, trust me, it will come back to haunt you.
Sure. Of course. Bump that to $45. Ok, another $200. Not huge.
> Thirdly, "decent redundancy" starts at factor 2.5, not 1.25.
If you are serving it to the internet at large. But for personal use, 1.25 is fine unless you are saying the proper RAID setup is Number of disks * 2.5; which would be something new to me, for sure.
> And finally: If you want to put this stuff online and have people actually download it then you'll soon notice that redundancy is not only needed for availability but also for performance.
I don't. The presumption is that it's a copy (my copy, actually), not the original.
> A reasonable ballpark figure for low-end networked storage nowadays is $0.05/GB per month (it gets much cheaper above 500T). Thus hosting those 80T should cost roughly $4000/mo, give or take a few.
You might be getting ripped off :-(.
I can get a half-rack (that's 22U) for $900/month. Even at 2.5 redundancy and if I had to pay for the patch and the switch, it's still way under $4000/month.
Besides, the thought experiment was to run it from somewhere like my entry-way, near my coat-hanger: "What's this? Oh, it's just the internet; the Whole Internet. No no, just a copy."
Yes, of course you can cobble something together when availability does not matter at all (it might blow the fuse in your apt, though;)).
I was just saying that in an application with most basic availability-requirements you're not getting the cost down like that.
I.e. even though you could fit that into one rack, nobody actually would (redundancy is measured in powers of >=2). And even though you might find an ISP who won't bitch about you drawing >10 Amps in "half a rack" (cough), you should still be a little concerned about other tenants screwing around in the same rack as your only copy of 80T of data that you care about... ;)
I still argue 7 machines though. I mean, sure you could do USB + enclosure or have some more expensive board (with 6 SATA connectors, I don't know how cheap those go). Then you also may need more power, depending on how you use the thing. It's true that fewer machines generally = fewer faults, just as a matter of statistics.
But in reality, in practice, the users will probably have some IBM or SGI solution that is a full-height rack with a bunch of SAS drives or something. I'm sure you've seen those things at trade shows.
But my point here was to try to determine how much it would cost with total baseline OTS hardware.