I ended up creating a link component that automatically points to an archive.org version of any URL I've marked as "dead". Link rot was so prevalent it had to be automated like that.
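That kind of fallback can be sketched in a few lines. This is a hypothetical helper, not the commenter's actual component; it assumes the Wayback Machine's real `https://web.archive.org/web/<url>` scheme, which redirects to the closest available snapshot:

```python
# Sketch of an archive fallback for links marked "dead".
# The Wayback Machine serves the nearest capture when a URL
# is appended to its /web/ prefix without a timestamp.

WAYBACK_PREFIX = "https://web.archive.org/web/"

def resolve_link(url: str, dead: bool = False) -> str:
    """Return the original URL, or an archive.org fallback if it's dead."""
    return WAYBACK_PREFIX + url if dead else url

print(resolve_link("http://example.com/old-page", dead=True))
# → https://web.archive.org/web/http://example.com/old-page
print(resolve_link("http://example.com/live"))
# → http://example.com/live
```

A real component would probably also pin a snapshot timestamp, since the "closest" capture can change as new crawls land.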
Another reason why I've been contributing $100/year to the Internet Archive for the past 3 years and will continue to do so. They're doing some often unsung but important work.
It's _not_ a video recording service. It saves and can replay all network requests during a session (including authenticated requests). It's open source, you can self-host, and I'm not affiliated, even though I'm very happy that it exists.
If the site goes down later, I just remove the link and don't worry about it. My code from 15 years ago is probably atrocious, so I'll consider it a small blessing :P
It's behind a paywall now, but at least I have a digital copy!
It could potentially be considered fair use, since I'm not making a profit and I provide commentary.
Tangential, but I long for a world where we can all be as candid with each other.
And: Who has been harmed here?
Although people throw that term around willy-nilly, in our current framework it means being sued for a minimum of $100,000 per supposed violation, and making your fair use defense in front of a judge.
Youtubers have reported spending $50,000 just to begin talking with lawyers and preparing briefs.
Maybe our ISPs can start offering us insurance.
Remember to donate to the EFF; they're literally the only thing between you and a world ruled by corporations.
As much as I tried, all I could find is that it's run by one guy in the Czech Republic who's paying $2000/month out of pocket for hosting, and apparently dislikes Finland.
http://archive.is/robots.txt doesn't seem too bad; it looks like you could slowly inhale everything... in theory. The sitemaps exist but are empty placeholders, so you have to already know a site's name to be able to get a workable list.
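For illustration, Python's standard library can already interpret a robots.txt before any slow crawl begins. The rules below are made up for the example, not archive.is's actual file:

```python
# Minimal sketch of checking crawl permission (and politeness delay)
# before "slowly inhaling" a site, using the stdlib robots.txt parser.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 10",
])

print(rp.can_fetch("MyArchiveBot", "http://example.org/snapshot/abc"))  # True
print(rp.can_fetch("MyArchiveBot", "http://example.org/private/x"))     # False
print(rp.crawl_delay("MyArchiveBot"))  # 10
```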
The author hasn't ruled out/blocked archiving the snapshots, but apparently it's... big. http://blog.archive.is/post/154930531126/if-someone-was-will...
Maybe at the fringes, but I feel that the internet today, with my emphasis on the "inter" (different) "net" (networks) part of it, is far less deep or rich than before. What we have has basically been reduced to a bunch of siloed networks such as Facebook.
When I searched for something when Google first came out, I got a mix of results from a variety of sites I had never heard of. Today it's basically Wikipedia at the top, with results from the same list of about 3-4 sites depending on the topic.
I would really like to not have in search results most sites that try to monetize on my attention. I want raw facts and opinions. No click-bait to grab my attention or feed my internal cave man with rage. No ad-networks or data extraction operations. Just pages put there by people that want to share knowledge and ideas. I mostly find it on pages that lack ads and often are pure HTML - no CSS and no JS. At least in areas that interest me.
Maybe there is a place for a search engine that would index only pages like that? It certainly would be easier than competing with Google on indexing whole of the attention-whoring Internet.
Probably not, actually; the kind of pages you describe would almost always be leaf nodes on the web graph, so your spider would need to walk "through" the attention-whoring parts to get to them, whether you kept records of doing so or not. (And it'd be very inefficient to not.)
...and if you actually try to search for more obscure/"fringe" subjects/phrases with Google, you either get no results (despite knowing that there are still active sites with those phrases), or it starts thinking you're a bot sending "automated queries" and blocks you for a while (without even giving you the option of completing a CAPTCHA).
The first time that happened to me, which was within this year, was my realisation that Google had truly changed, and not in a good way.
The breadth and depth of information on the web now vastly surpasses what was available in 1994. Youtube and other video and music streaming sites have provided a media revolution to compare with the transition from radio to television. Social media, whatever its drawbacks are, allows people to communicate and collaborate far more personally than email or basic chatrooms would have.
Sometimes the way people here seem to dismiss the modern web is baffling. I get it, but look at it from the point of view of the mainstream web user. The web offers access to so much more than would even have been possible in 1994, and lets people interact with one another on a much more direct and complex level.
Yes, the added richness and depth comes with a lot of baggage, but it's undeniably there.
But the difference from Wikipedia was that people would maintain a Links section full of interesting stuff and would join web rings for various subjects, interlinking vastly different sites. Finding information often happened through Yahoo! (AltaVista was there too, but it lacked the quality of handpicked results) via a tree-based discovery system, continuing through whatever you could find via links on an interesting page. Exchanging links was something that happened really frequently.
It resulted in an internet where you just kept clicking and discovering and digging. Sometimes also frustrating as browsers lacked tabs and I would navigate all links one by one by loading it and going back. I would forget how I arrived at a certain page sometimes because it was so deep and I never found the breadcrumbs again.
Um excuse me what do you call this masterpiece
I would have said TVTropes, but the core point is the same.
I remember having to restart my computer, because IE lacked tabs and Windows would let you open so many instances of it that the whole OS ground to a halt.
It's weird to think that now, in the absence of Google, I couldn't find my way from anything to anything else.
For a close look at the earlier era, see this very long essay I wrote in 2006 (which was popular back in 2006) in which I summarized the tech world's blog debate about RSS:
"RSS has been damaged by in-fighting among those who advocate for it"
More content density, better S/N ratio.
All the HTML takes 2-4 kB on average, but you might not get much use from the site in a terminal :)
I wish everyone who pined for the 25-years-ago days of mostly text pages would, instead of pining, just go out there and produce that content they want to see.
Instead of pining, start writing. Hosting is cheap or free, browsers still parse simple HTML, there's nothing stopping anyone from creating a return to that simpler form.
# I Bought a Book About the Internet From 1994 and None of the Links Worked - Motherboard
For one, it was largely un-monetised. There were no banner ads, no ad trackers, no giant shopping sites, no market-driven content design.
But many ISPs offered free web space, and HTML 1.0 was so simple almost anyone could hack together a basic site with a plain vanilla text editor.
So people did. For fun. Lots and lots of them.
It looked like crap, but it was weird/hilarious/insane/inspiring in equal measures in a way that's impossible to reproduce today.
What? It's not at all impossible. Get a server, put whatever you want on it. No one is going to force you to monetize or market anything, or use a Node.js backend with AWS and React, or whatever the kids are doing these days. Basic HTML in a plain text editor still works just fine.
Not in Germany. There is a set of laws called "Impressumspflicht" (https://de.wikipedia.org/wiki/Impressumspflicht) which forces you to add a mandatory imprint to your website. If you do something wrong or forget to include some mandatory information (what is mandatory also depends on the kind of website), you can easily get sued (and this has often happened). In other words: it is not easy for a layperson to set up a website in a way that does not carry a real risk of being sued.
It just happens that people add unnecessary complexity to their projects nowadays because 1) they use frameworks and tools that facilitate it and 2) it looks better on their resumes.
The laws introduce lots of complexity, which leads to lots of requirements in the code. So this is no contradiction.
> It just happens that people add unnecessary complexity to their projects nowadays because 1) they use frameworks and tools that facilitate it and 2) it looks better on their resumes.
And 3) because the law requires such complications (in my personal perspective the largest problem that causes the most headaches). Just to give another, "more EU/less German" example:
which requires that the user explicitly opt in before a cookie is set. This of course has to be coded - otherwise you risk being sued.
There were, however, tons of animated 'under construction' banners, which gave the same 'visual noise' problem.
If you very badly want to rip off the theme, it's stealable from my GitHub copy of my site: https://github.com/KeenRivals/brashear.me/tree/master/themes...
That's what you are suggesting.
No need to spend any money.
I get this request a lot actually. The reason I decided to not do it was because webrings, though nice, had a lot of problems. The main issue was that people's sites would go away, and then the ring would break. I also didn't want to introduce any functionality that would make sites depend on Neocities backend APIs to function. Web sites are more long-term and durable if they remain (mostly) static.
I tried using "tags" that could bind sites together on Neocities, but to be honest the idea has largely been a failure. People will tag their site "anime" and their site will have nothing to do with anime... but it's a popular tag so they add it in just so they're on a popular tag. Geocities had this problem to a certain extent too (a tech site being in the non-tech neighborhood). You can get a flavor of the problem here: https://neocities.org/browse?tag=anime
One idea I'm considering is to only allow a site to have one tag, rather than 3 like I do right now. Maybe that will stop people from adding tags that are irrelevant to the content of their site. Or it may compound this problem. I'm on the fence about it.
Another idea I'm considering is allowing people to create curated lists of their favorite sites on Neocities, similar to playlists on Youtube. The "follow site" functionality kind of does this, but in a generic way, and it tends to be a bit... I guess nepotistic (hey you're popular, follow me so my site can get more popular too!)
I'm always happy to hear ideas on how to improve this. I do like the idea of related sites being able to clump together, but in practice it doesn't work as well as I would like it to. But maybe it works well enough and I'm overthinking it.
I just discovered Neocities BTW, it sounds very interesting!
But if the next site went down or didn't link to the next site correctly, you couldn't proceed. That was always my problem with webrings. They depended on each site to embed the ring code properly, and usually they didn't, so you were stuck trying to find a working one. It was a pretty lousy UX overall.
Maybe a modern equivalent would redirect downed sites to the IPFS archive.
The BBC also donated its Networking Club to the Internet Archive: https://archive.org/details/bbcnc.org.uk-19950301
A good demonstration of how far video on the internet has come.
160 × 120 px, 8 frames per second
Did it by hand with MS Notepad, MS Paint, Apple QuickTake 100 camera, Chameleon TCP/IP, and Mosaic for PC.
I did, however, drop the webmaster email address about 20 years ago.
How come the site is hosted at lysator and how come it's still up?
Sidenote: The man hosting the site has a very on-topic profile page.
I was commiserating on Usenet about not being able to find affordable hosting for the website. This was way before hosting-only companies were available or even Geocities. David stepped up and generously offered room on Lysator.
As to why it's still up, I'm not sure. There was a short period where it looked like it was offline, but it's back now. Perhaps because Expo '94 is on a lot of "oldest websites that still work" lists.
CNN's original online coverage of the OJ Simpson murder trial (1995) is still online and mostly intact - http://www.cnn.com/US/OJ/
Welcome to Netscape (94) - http://home.mcom.com/home/welcome.html
It's later than 96 but I don't know who is still paying for hosting for a stadium that was demolished over 16 years ago http://www.3riversstadium.com/index2.html
And this one from CNN (still 1996, but appropriate representation of that 1995/96 era when design had changed a bit from the earlier plain white backgrounds & basic text layouts):
The credits link & page are pretty bad, otherwise no complaints.
And hey, it still works well on small screens. Not flawless, but close.
Curious to know how this works with ad-heavy sites. Do responsive sites display differently then too?
Been flirting with going to a single 32" monitor for a while. Just wondering if you think it's worth it with your experiences since I do development as well and have a similar setup (one monitor for editor and smaller laptop for browser/terminal) and would like to hear your input.
But, generally, it's fine. Most sites are entirely usable. There are a few that are quirky, but generally, modern websites are designed to scale down to tablets and phones, so they don't act too weird in a narrow laptop/desktop browser.
I think the most common problem is sites that switch to hamburger menus at too high a resolution (so I get the mobile hamburger navigation on some sites, even though it's kinda silly looking and slightly less ergonomic). It's not super common, though. Most switch to hamburger a little lower than my browser width.
Monitors might be wider, but screens in general are much narrower.
EMWAC supported TLS 1.2 already back then. Amazing!
Good thing they were usually optional.
Frames were a nightmare. You can't link to a page in frames, you can't bookmark it either. Frames break the back button. Come in via search engine? You're only in the main frame, your navigation frame is missing. Want to print? Lol, good luck with that. You always ended up (either intentionally or unintentionally) with a browsing session within someone else's unrelated frameset.
I also heard they were bad for screen readers.
Pages from 1996 start 102 pages prior to the last page (at present: http://oneterabyteofkilobyteage.tumblr.com/page/10743 )
Also funny that the email address I signed up for with them in 1996 is still active, even though I don't check it for years at a time.
It's an embarrassingly amusing slice of life :)
Also, sites are a very volatile medium. I often bookmark pages with interesting information to read later, and it inevitably happens once in a while that a site went down and I just can't find the information anymore.
I had the same experience and that's why I made a browser extension that archives pages when you bookmark them. (https://github.com/rahiel/archiveror)
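Trigger mechanics aside, the underlying operation is simple: the Internet Archive's "Save Page Now" endpoint takes the target URL appended to `https://web.archive.org/save/`. This hypothetical function (not the extension's actual code) just builds that request URL, with the network call left commented out so the sketch stays self-contained:

```python
# Sketch of archiving a page at bookmark time via the Wayback
# Machine's Save Page Now endpoint.

SAVE_ENDPOINT = "https://web.archive.org/save/"

def archive_on_bookmark(url: str) -> str:
    """Build the Save Page Now request URL for a bookmarked page."""
    save_url = SAVE_ENDPOINT + url
    # urllib.request.urlopen(save_url)  # uncomment to actually trigger a capture
    return save_url

print(archive_on_bookmark("http://example.com/article"))
# → https://web.archive.org/save/http://example.com/article
```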
Happy to discover yours!
Launching soon. We use a browser extension to mine your current tab on a click of a button!
I've recently had this problem with some online fiction that I had bookmarked. Now, I was able to recover thanks to the Wayback Machine, but I really shouldn't depend on that.
I should really put some thought into archiving pages I like or getting a Pinboard account.
On another note, the more dynamic the web becomes, the harder it will be to archive. If you think the 1994 content is a problem, wait until you're living in 2040 and want to read some pages from 2017.
I know I won't be maintaining it forever, but I want it to be accessible through the archive.
That VC money must really be flowing...
It's just as ridiculous as it sounds.
But then again there's WebRecorder for exactly that.
For even more accuracy save a Chromium binary of the version at the time so it'll look exactly as intended.
> The average lifespan of a web page is 100 days. Remember GeoCities? The web doesn't anymore. It's not good enough for the primary medium of our era to be so fragile.
> IPFS provides historic versioning (like git) and makes it simple to set up resilient networks for mirroring of data.
If geocities.com/aoeu isn't popular, then IPFS won't store it unless someone bothered to pin it. And as soon as they stop, it'll disappear.
You need a dedicated host (like archive.org) to retain it, or volunteers willing to coordinate and commit their resources. Otherwise it's just more resilient (a good thing), but not permanent.
It's not "just" more resilient, It's also much more elegant and convenient: with the current web you need to go find some archive version of the dead link you found, while with IPFS the link can simply work, even after the creator stops hosting it.
On the other hand, nobody is talking about the problem of domains: yes, linkrot is a thing, but many are due to dead domains and dead blogging/content silos.
Original Geocities content is probably the oldest large body of internet content that will be preserved for many years to come and it will look roughly how it looked on Geocities because it wasn't relying on much of anything other than basic HTML and some images.
I search for it once a year, hope springs eternal.
Have you checked Jason Scott's torrent of all things geocities as well?
Looks like I even got some of the images on disk here, but no html.
Jacques, I never knew it was you behind Reocities. Thanks for saving a large part of the old Geocities content! The story of how you created Reocities is a worthwhile read on its own.
edit: Just found again (and remembered) web.archive.org has 2 of its pages, so some of it is still out there.
I distinctly remember a misspelling in my URL (/liscensetokill) but I can't be sure of my parent category anymore. I think it was /mi5/.
Haven't seen it in over 15 years, though.
Either that or you can spend an hour clicking :) :
It really stresses the importance of directly quoting / paraphrasing the content you want in your plain text, and not relying on external resources for posterity.
Looking at you Photobucket. And all those useful images now replaced with a meaningless Photobucket placeholder.
More info: https://en.wikipedia.org/wiki/Wikipedia:Link_rot
but redirects are also present in legitimate pages
What's cool isn't how fast some of these technologies become obsolete, such as various Java applets and cgi-bin connected webcams. It's the static content that can survive until the end of time.
Like Nicolas Pioch's Web Museum. Bienvenue!
The MBone was not a "provider", it was an IP multicast network. This was the only way to efficiently stream video content to thousands of simultaneous clients before the advent of CDNs. https://en.wikipedia.org/wiki/Mbone
For years I thought TV stations might connect to the MBone to do simulcasts for people on the Internet once broadband became widespread, but the world moved on before it could become a reality. Part of me still thinks this is a missed opportunity but it's too late to cry over it now.
It was neat, but presumably most of the multicast stuff was abandoned not long after I left.
Someone else worked on a system to help the user schedule their TV around the broadcast times.
There was a vicious circle where the core didn't do multicast so the big iron routers didn't put multicast support in hardware, which made it impossible to support multicast on the core...
How I laughed.
I noticed that the wayback machine no longer lists historical sites if the latest/last revision of robots.txt denies access. Has anyone else experienced this?
In the late '90s I helped build one of the first Fortune 500 e-commerce websites. The website was shut down years ago, but it was viewable on the Wayback Machine as recently as a year ago. The company in question put a deny-all robots.txt on the domain, and now none of the history is viewable.
It's a shame -- I used to use that website (and an easter egg with my name on it) as proof of experience.
Do you know if archive.org still has a copy?
edit: also doesn't load on a desktop here.
But worldwide, several predate that. Botin Restaurant dates back to 1725... The oldest I could find was Stiftskeller St. Peter in Salzburg, Austria, which is still in the same building from ~803.
"Free $tuff From the Internet" from 23 years ago is closer to a restaurant guide than a book containing literary or academic references. I'd imagine most of the references in an 1850s coupon book or almanac would also be long dead.
The "Right to Be Forgotten" in EU law refers to personal data not to works that you have published, it is about privacy not copyright. See the Commision's factsheet on the subject: http://ec.europa.eu/justice/data-protection/files/factsheets...
I had lots of fun reading them as an Internet-addicted kid -- but several of the links were dead even before it was officially published.
Makes me want to try to write a Markdown-only Internet browser, which treats native Markdown documents as the only kind of Web page.
On second thought that wouldn't be so bad considering the bloat we have to deal with nowadays. (1 MB per page for just news from sites like CNN ugh.)
And yes, the way I got on the internet in those days was to dial into a public Sprintlink number, then telnet to a card catalog terminal in the Stanford library, and then send the telnet "Break" command at exactly the right time to break out of the card catalog program and have unfettered internet access. Good times.
That was before I had a local PPP provider, of course.
The web is ephemeral unless somebody archives it. Many companies offer an archive service for your sites for a fee, and archive.org does it to provide a historical record.
Zilch. Nada... couldn't find it anymore. Gone. Something I had easily chanced upon before, I now couldn't find with directed searching. They must have restructured their site.
Answers a question I always had about "Snow Crash" by Neal Stephenson. The main character, Hiro Protagonist (I still giggle at that name), sometimes did work as a kind of data wrangler - "gathering intel and selling it to the CIC, the for-profit organization that evolved from the CIA's merger with the Library of Congress" (Wikipedia).
I always wondered what made that feasible as a sort of profit model, and I guess now I know - that was the state of the internet in 1992, when the book was published. Seems like a way cooler time period for Cyberpunk stuff, I'm almost sad I missed it :(
You think most businesses are less than 24 years old? Or do you think businesses change phone numbers every few years?
(Of course, there would be plenty of attempting to read from the C:\ drive too, but that didn't make a loud, unexpected sound like reading from the floppy drive did)
Also, consider that right-click save as works the vast majority of the time and the exceptions are for content which simply wouldn't be available (e.g. video streams) for direct download due to IP concerns.
People tend to think that our society is very well documented, but if you look at what is left of old societies, it is usually whatever was engraved in stone or, if you're lucky, what remained on paper. In the short term it is true: with the internet replacing most or even all paper storage (and our present-day paper not being acid-free enough to last anyway), we are better documented than ever. But in the longer term it may leave a huge gaping hole in history.
And that's different from cities changing at their regular pace and guidebooks becoming dated. It's as if the visitor guide itself is no longer readable, so you won't even know what was there in the past.