We gave someone a free account to archive this and when they finished we chflags -R schg'd the directory so it is immutable.
We are not publishing this - there is no web or anonymous access to an rsync.net account - but if you're reading this in 2035, know that we have a copy of this.
Related: If you are involved in any way in preserving/archiving usenet, please be in touch. We will give you whatever (storage) resources you require.
Is there a long-term plan for this archive? For example, if someone is reading this in 2035, would they be able to contact you or the person who archived this for a copy if other copies aren't available?
Off-topic: Can you explain why cloud storage is so expensive?
You charge 1c per gigabyte per month, so $120/TB/year. I can buy a 20TB HDD for around $300-$400 and run it in my bedroom. On your rack, the same drive runs $2,400/year.
Your prices are in line with everyone else's. This is almost the definition of a frictionless commodity market, so if things could be done cheaper, someone would.
What's stopping me from plugging a 20TB HDD into a RaspberryPI, tossing it on a high speed connection (and charging for transfer), and getting 6x annual returns for providing the world's cheapest (if not fastest / most reliable) storage? For backups (as opposed to e.g. web hosting), that's good enough.
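For anyone checking the arithmetic in the question above, here is a minimal sketch. It uses only the figures quoted in the comment (1c/GB/month, a 20TB drive, $300-$400 purchase price) and assumes decimal units (1TB = 1000GB). The function names are made up for illustration. It compares revenue to drive cost only; power, bandwidth, redundancy, and labor are deliberately left out.

```python
# Figures quoted in the comment; decimal units (1 TB = 1000 GB) are an assumption.
PRICE_CENTS_PER_GB_MONTH = 1
GB_PER_TB = 1000
MONTHS_PER_YEAR = 12


def price_per_tb_year_dollars(cents_per_gb_month: int) -> float:
    """Yearly list price for one terabyte, in dollars."""
    return cents_per_gb_month * GB_PER_TB * MONTHS_PER_YEAR / 100


def revenue_to_drive_cost(drive_tb: int, drive_cost_dollars: int) -> float:
    """Yearly revenue from a completely full drive divided by its purchase price."""
    yearly = price_per_tb_year_dollars(PRICE_CENTS_PER_GB_MONTH) * drive_tb
    return yearly / drive_cost_dollars


if __name__ == "__main__":
    print(price_per_tb_year_dollars(PRICE_CENTS_PER_GB_MONTH))  # dollars per TB per year
    print(revenue_to_drive_cost(20, 400))  # ratio at the high drive price
    print(revenue_to_drive_cost(20, 300))  # ratio at the low drive price
```

The "6x" figure corresponds to the $400 drive, and it assumes the drive is fully sold for the whole year.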
How much business do you lose when that single drive fails and you have to tell your customers you lost their data? There’s a baseline cost to managing arrays of disks and the software which runs them, keeping everything patched, paying for the building and power, answering support questions, etc. The reason this can seem massively cheaper on your own is that you aren’t paying yourself wages or rent, or otherwise dealing with the cost of spending your time juggling disks rather than doing something else.
I can speculate too, but the reason I asked OP is because he would know and have actual numbers.
Regarding your questions:
1) Loss of data only matters in some contexts. For cloud backups, it really doesn't matter much since it's already a backup. It only matters if the cloud provider AND my local system go down at the same time.
2) Still, at scale, avoiding loss of data is CHEAP. Something analogous to RAID6, with data striped across data centers, means the overhead can be quite low.
3) Yes, I do pay people salaries. Buying disks -- even stupidly and just having two copies of everything -- works out a lot cheaper than cloud storage. Ask any video shop if they use (1) cloud storage or (2) a storage closet full of hard drives.
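To put a rough number on the overhead claim in point 2, here is a small sketch comparing the extra raw capacity needed by a parity scheme with that of plain duplication. The 10+2 layout is an illustrative assumption, not something any provider in this thread has described, and the function name is hypothetical.

```python
from fractions import Fraction


def storage_overhead(data_shards: int, parity_shards: int) -> Fraction:
    """Extra raw capacity needed, as a fraction of the user data stored."""
    if data_shards <= 0 or parity_shards < 0:
        raise ValueError("need at least one data shard and no negative parity")
    return Fraction(parity_shards, data_shards)


if __name__ == "__main__":
    # RAID6-style: survives any two lost shards (10 data shards is an assumed layout).
    print(storage_overhead(10, 2))
    # Two full copies of everything: survives one lost copy.
    print(storage_overhead(1, 1))
```

Wider stripes lower the overhead further, at the cost of more devices (or sites) involved in every rebuild.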
The numbers don't work out for me. By my estimate, I could provide the same service which rsync does for *much* less money and dominate the cloud storage market. I'm obviously wrong, or someone would have done so already, but I don't know how I'm wrong.
It matters in all contexts where you’re charging people to store their stuff. The iron law of backups is that if you haven’t tested them, you don’t have them and that’s the case here because people will usually only use yours after they’ve found a problem locally.
Consider also the problem of synchronization: if your remote backup fails, someone with the average American anemic upload might need weeks to re-upload everything, especially if they don’t want to interfere with everything else on their network.
Personal feedback: You have a tendency to make obvious, unsubstantiated, often slightly incorrect statements with a very high level of confidence and assuming you're talking to an idiot. I'm not learning anything from you. Let's both move on. This is my last post in this thread.
If you do want to learn something yourself, I'd suggest, as a homework exercise, taking the time to architect this out, and running some numbers on things like:
- How often local devices fail (HDD AFR)
- How often remote devices fail
- Overhead of monitoring / catching each type of failure
- Time for synchronization
If you can, include algorithms along the lines of those in RAID5. I would also include back-of-the-envelope cost estimates.
The goal of backups is to decrease the odds of data loss to some manageable, quantifiable risk (e.g. from ≈1% AFR to e.g. ≈0.01% AFR), and the tolerable level of risk comes from an ROI and expected value calculation: odds of failure * cost of failure should be less than the cost of the backup system.
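One way to read that calculation is to compare the expected loss a backup avoids against what the backup costs. The sketch below does that. The 1% and 0.01% figures come from the comment; the dollar values and function names are hypothetical, and treating local and remote failures as independent events over a whole year is a simplifying assumption.

```python
def expected_annual_loss(p_loss: float, cost_of_loss: float) -> float:
    """Expected yearly loss: probability of losing the data times what losing it costs."""
    return p_loss * cost_of_loss


def backup_pays_off(p_without: float, p_with: float,
                    cost_of_loss: float, backup_cost: float) -> bool:
    """True when the expected loss avoided by the backup exceeds its yearly cost."""
    avoided = (expected_annual_loss(p_without, cost_of_loss)
               - expected_annual_loss(p_with, cost_of_loss))
    return avoided > backup_cost


if __name__ == "__main__":
    p_local = 0.01            # ~1% AFR, from the comment
    p_both = p_local * 0.01   # local AND remote lost, assuming independent failures
    # Hypothetical dollar figures: data worth $50,000, backup costing $120/year.
    print(backup_pays_off(p_local, p_both, 50_000, 120))
    # Same backup protecting data worth only $5,000.
    print(backup_pays_off(p_local, p_both, 5_000, 120))
```

Under these made-up numbers the backup pays for itself for the more valuable data set and not for the cheaper one, which is the sense in which the acceptable risk is an ROI question.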
It's a good homework exercise, and purely for your benefit. E.g. that's the sort of question I might ask in an interview, and answers like yours would not lead to a hire decision. The people I least enjoy working with operate from best practices (often using strong language like "iron laws"), and can't operate from first principles.
In this case, this is all quantifiable.
You don't need to post your response. There's a reason I asked someone who might know, and you cutting in was a pure distraction. I'm done here.
I find it interesting that you are claiming to have known all of this all along, but your original proposal was “plugging a 20TB HDD into a RaspberryPI, tossing it on a high speed connection (and charging for transfer), and getting 6x annual returns”.
If I was asking someone else for free training, I’d want my question to accurately reflect my starting knowledge and willingness to do my own homework.
- You have an uncle with skin cancer. You bump into an oncologist at a party. You ask some technical questions. Annoying lady cuts in and starts talking about essential oils. Oncologist wanders off.
- You're an armchair general (a nerd who has never been in the army, but reads a lot), and you bump into a government intelligence analyst who studies the situation in Ukraine. It's a goldmine! Someone who read an article on Fox News cuts in and starts lecturing you on Putin. It's not even quite wrong, but it's so shallow it's pointless. The analyst wanders off.
That's basically how I read your intervention. I didn't ask you for free training or anything else. I asked rsync a question. You decided to butt into a conversation uninvited, without having anything to contribute, figured you knew more on the topic than a cloud storage provider, and proceeded to deliver a lecture at the level of what my child might learn in their grade school IT class.
I'm not claiming to be brilliant, but I don't get why people like you do that. I think that's a place you could provide a more interesting insight than about anything technical.
(And to answer your question: I wrote a question and not an essay; an extreme example is often just a concise way of phrasing something like this)
> You have an uncle with skin cancer. You bump into an oncologist at a party. You ask some technical questions. Annoying lady cuts in and starts talking about essential oils.
You know, you could have written that without the casual sexism. I also find it interesting how you feel this intense need to simultaneously claim that you already knew about all of the same details I mentioned but also dismiss them as irrelevant: if they’re grade school level, what does that say about your original question not indicating awareness of them? If they’re “essential oils”-level quackery, which is hard to reconcile with them being in grade school curriculum, why are you so concerned about saying you knew everything already?
For an organization that is not well-funded, it has taken some remarkably risky actions that few nonprofits or commercial ventures would ever attempt.
I wonder if there isn't some uber-wealthy Silicon Valley connection waiting in the wings to bail IA out if operating costs, legal fees and financial penalties overwhelm its budget.
Conspiracy theories are always amusing, but I can assure you there isn't some billionaire sitting with a check in case things go incredibly south. I would tell you that the things you are defining as risky are simply steps being taken for a library in the 21st century. Anyone not addressing these issues is going to have a very bad wake-up in 5 to 10 years.
That's a blast from the past. OS/2 and Hobbes were essential back in the 90s when you wanted to avoid Microsoft OSes, as Linux was still pretty raw until the late 90s. I guess some never really felt like it was worth migrating, and fair enough!
There's a subset of the finance world that's very fond of both "commercial support" and "security by obscurity." OS/2, Alpha servers: they kept hold of lots of stuff two decades past its peak.
It's always sad to see stuff like this go. These simple, beautiful sites are like payphones. One day we'll all crowd around the last one as it too goes.
The archive is 18GB. Since it's probably mostly zip files, that's probably about the size of everything extracted. So a static version wouldn't be too bad to just put in S3 and CloudFront or something, and that wouldn't require much work. I used to know someone who helped run it at NMSU in the 90s. (Can't remember which of two acquaintances, or maybe it was both.)
Assuming that some remain, I believe ArcaOS was licensed from IBM to provide support in such circumstances. Not to mention that you aren't going to use software from Hobbes on an ATM.
For what it's worth, I have seen OS/2 used in banks within the past decade (though I believe those banks have since migrated to Windows).