1. There was no rush to pick the latest technologies. Tried and tested was much better than new and shiny. Archive.org was mostly old PHP and shell scripts (at least the parts I worked on).
2. The software was just a necessity. The data was what was valuable. Archive.org itself had tons of kluges and several crude bits of code to keep it going, but the aim was to keep the data secure, and it did that. Someone (maybe Brewster himself) likened it to a ship traveling through time. Several repairs with limited resources have permanently scarred the ship, but the cargo is safe and pristine. When it finally arrives, the ship itself will be dismantled or might just crumble, but the cargo will be there for the future.
3. Everything was super simple. Some of the techniques to run things etc. were absurdly simple and purposely so to help keep the thing manageable. Storage formats were straightforward so that even if a hard disk from the archive were found in a landfill a century from now, the contents would be usable (unlike if it were some kind of complex filesystem across multiple disks).
4. Brewster, and consequently the crew, were all dedicated to protecting the user, e.g. https://blog.archive.org/2011/01/04/brewster-kahle-receives-.... There was code in place to avoid collecting user data even accidentally, so that even if everything was confiscated, the user identities would be safe.
5. There was a mission. A serious social mission. Not just "make money" or "build cool stuff" or anything. There was a buzz that made you feel like you were playing your role in mankind's intellectual history. That's an amazing feeling that I've never been able to replicate.
Archive.org is truly one of the most underappreciated corners of the world wide web. Gives me faith in the positive potential of the internet.
This resonates with me. Sometimes we developers need to get off the "move fast and break stuff" bandwagon (which has been going for over a decade now), and consider that we're the ones responsible for preserving almost all human digital heritage of our epoch. There's a simple and obvious method to implement preservation-friendly content implicit in the web architecture: emit/materialize everything as plain HTML, even dynamic content. This is of course antithetical to most of this decade's SPA web development trends, but I think it's worth drawing a line between web content (worth preserving in the first place) and web apps (which have highly volatile content not worth preserving). I feel like this distinction isn't considered sufficiently in our staged web app architecture discussions, which are all about your latest JS MVw framework, to the degree that newbie web devs really don't learn the fundamentals of HTML etc. anymore, and are led to use React, Vue, etc. for content-based web sites.
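To make "materialize everything as plain HTML" concrete, here is a minimal sketch in Python. It assumes the dynamic content is already available as plain titles and paragraphs; the names `render_page` and `materialize` and the one-file-per-slug layout are made up for illustration, not anything archive.org or a particular framework prescribes.

```python
from html import escape
from pathlib import Path


def render_page(title, paragraphs):
    """Return a complete, script-free HTML document as a string."""
    # Escape all text so the content cannot break the markup.
    body = "\n".join(f"    <p>{escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "  <head>\n"
        '    <meta charset="utf-8">\n'
        f"    <title>{escape(title)}</title>\n"
        "  </head>\n"
        "  <body>\n"
        f"    <h1>{escape(title)}</h1>\n"
        f"{body}\n"
        "  </body>\n"
        "</html>\n"
    )


def materialize(pages, out_dir):
    """Write each page to <out_dir>/<slug>.html and return the written paths.

    `pages` maps a slug (hypothetical naming scheme) to (title, paragraphs).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, (title, paragraphs) in pages.items():
        target = out / f"{slug}.html"
        target.write_text(render_page(title, paragraphs), encoding="utf-8")
        written.append(target)
    return written
```

Calling something like `materialize({"index": ("Home", ["Hello, future reader."])}, "site")` leaves a directory of self-contained files that any crawler, or anyone with a text editor a century from now, can read without executing JavaScript.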
> How does archive.org make money?
> I imagine their storage costs must be quite high.
No, they aren't. Building and hosting your own storage is cheap. Same reason Backblaze and Dropbox built their own storage systems.
Archive.org uses S3 extensively. Not exactly cheap.