'If you would like access to this set of crawl data, please contact us at info at archive dot org and let us know who you are and what you’re hoping to do with it. We may not be able to say “yes” to all requests, since we’re just figuring out whether this is a good idea, but everyone will be considered.'
It's annoying that they're using the enterprise sales model for distribution. Just put it on S3.
They already know how to store massive amounts of data, and how to send it over the network. Assuming $100/TB for their own media means it would only cost them about $4000 to store it themselves.
Assuming you have a 1 Gb/s connection, it would take you over 7 days to download. It's probably both cheaper and faster to write the data to disks and ship them than to do an S3 download.
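For anyone who wants to rerun the back-of-envelope numbers, here is a rough sketch. The dataset size isn't stated in this thread, so it's a parameter here; the function names are made up for illustration, and the model assumes decimal terabytes and a fully saturated link with no protocol overhead.

```python
SECONDS_PER_DAY = 86_400


def storage_cost_usd(size_tb, usd_per_tb=100.0):
    """Media cost for size_tb terabytes at a flat price per TB."""
    return size_tb * usd_per_tb


def transfer_days(size_tb, link_gbps=1.0):
    """Days to move size_tb decimal terabytes over a saturated link."""
    bits = size_tb * 1e12 * 8            # TB -> bytes -> bits
    seconds = bits / (link_gbps * 1e9)   # Gb/s -> bits per second
    return seconds / SECONDS_PER_DAY


def size_tb_for_days(days, link_gbps=1.0):
    """Inverse of transfer_days: how many TB fit through the link in `days`."""
    bits = days * SECONDS_PER_DAY * link_gbps * 1e9
    return bits / 8 / 1e12


if __name__ == "__main__":
    # size_tb below is an assumed example value, not a figure from the thread.
    size_tb = 40.0
    print(f"storage: ${storage_cost_usd(size_tb):,.0f}")
    print(f"download at 1 Gb/s: {transfer_days(size_tb):.1f} days")
    print(f"7 days at 1 Gb/s moves about {size_tb_for_days(7):.1f} TB")
```

Real transfers will be slower than this, since few links sustain their full rated speed for days, which only strengthens the case for shipping disks.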
It reads more like they don't know if or how people want to use this. (They say they "are interested in exploring how others might be able to interact with or learn from this content if we make it available in bulk.") Simply making the data available doesn't give them feedback.
For example, is it sufficiently worthwhile for them to go through the effort of providing the data on S3, given the costs?
edit: Just saw dalke's response. Great minds think alike!