I Bought a Book About the Internet from 1994 and None of the Links Worked (vice.com)
460 points by slyall on Aug 15, 2017 | 318 comments

I've had a similar problem. Updating my portfolio site recently, I noticed the vast majority of links were dead. Not just links to live projects published three or more years ago (I expect those to die), but also links to articles and mentions from barely a year ago, links to award sites, and the like. With a site listing projects going back ~15 years, one can imagine how bad things were.

I ended up creating a link component that automatically points any URL I mark as "dead" to its archive.org version. Link rot was so prevalent it had to be automated like that.

Another reason why I've been contributing $100/year to the Internet Archive for the past 3 years and will continue to do so. They're doing some often unsung but important work.

For portfolios, you should really also look into https://webrecorder.io/

It's _not_ a video recording service. It saves and can replay all network requests made during a session (including authenticated requests). It's open source and you can self-host. I'm not affiliated, just very happy that it exists.

Thanks for mentioning this site. I've done this with mitmproxy but it's complicated, and this was super easy -- I'd recommend it to anyone.

That's impressive. I'll take a look, thanks for the tip.

I've updated my portfolio before and noticed that as well. I usually include a screenshot or two when I first add a project, so at least that remains.

If the site goes down later, I just remove the link and don't worry about it. My code from 15 years ago is probably atrocious, so I'll consider it a small blessing :P

I actually downloaded a copy of the NYT article I was quoted in in 1996 specifically because I feared it would fall off the internet at some point.

It's behind a paywall now, but at least I have a digital copy!

Isn't that illegal, since you don't own the copyright? Or are you not distributing it and keeping it for archive purposes?

It's on my website, which is a blatant copyright violation. So far the NYT hasn't asked me to take it down.

It could potentially be considered fair use, since I'm not making a profit and I provide commentary.

> It's on my website, which is a blatant copyright violation

Tangential, but I long for a world where we can all be as candid with each other.

I figure you're doing the same as someone that cuts an article that they are mentioned in out of a newspaper and frames it on their wall. I've seen plenty of restaurants and businesses do it.

Except that's a physical product that was purchased.

What if OP purchased a copy of that day's paper?

That's exactly the point I'm making.

Isn't the argument for adverts that they're somewhat equivalent to buying the website page?

And: Who has been harmed here?

> It could potentially be considered fair use, since I'm not making a profit and I provide commentary.

Although people throw that term around willy-nilly, in our current framework it means being sued for a minimum of $100,000 per supposed violation and making your fair use defense in front of a judge.

Youtubers have reported spending $50,000 just to begin talking with lawyers and preparing briefs.

Maybe our ISPs can start offering us insurance.

If he is actually providing commentary and it does meet the fair use requirements, the EFF would probably end up representing him.

Remember to donate to the EFF, they're literally the only thing between you and a world where the corporations rule the world.

Don't downvote people for asking questions ffs!

beware: robots.txt can retroactively clear archive.org data

To clear things up: robots.txt can retroactively hide content from the archive. If it's changed back to allowing the archive's crawler, content from before the ban can be accessed again.

another reason to use archive.is

Considering the topic of discussion, how sure can you be that archive.is will still be around in a year? Three years? Ten?

As much as I tried, all I could find about it is that it's run by one guy in the Czech Republic who's paying $2000/month out of pocket for hosting, and apparently dislikes Finland.

It's actually worse - http://blog.archive.is/post/151510917631/how-do-you-guys-kee... says it's $4k/mo.

http://archive.is/robots.txt doesn't seem too bad; it looks like you could slowly inhale everything... in theory. The sitemaps are there, but they're empty placeholders, so you have to know a site's name to be able to get a workable list.

The author hasn't ruled out/blocked archiving the snapshots, but apparently it's... big. http://blog.archive.is/post/154930531126/if-someone-was-will...

I think http://www.webcitation.org/ might be better in that regard since it's a consortium of "editors, publishers, libraries". See "How can I be assured that archived material remains accessible and that webcitation.org doesn't disappear in the future?" in their FAQ (http://www.webcitation.org/faq). Although from my perspective it seems to be more geared towards academic use.

archive.is is very nice, but they're a URL-shortener as well, so their links are utterly opaque strings of alphanumerics, whereas the Wayback Machine preserves both the full original URL and the date and time it was captured in the archival URL.

What is the difference between these two?

archive.is does not crawl automatically, it must be pointed at a page by a user. While this makes it particularly useful for snapshotting frequently-changed pages, it is not a replacement for the proper Internet Archive.

perma.cc is another option as well.

I miss the optimism of the early web, when you could create a simple web page and join a web ring, and going online was an event. It's richer and deeper now, but the rawness and simplicity of it all was enjoyable and novel.

Is it richer and deeper now?

Maybe at the fringes, but I feel that the internet today, with my emphasis on the "inter" (different) "net" (networks) part of it, is far less deep or rich than before. What we've basically been reduced to is a bunch of siloed networks such as Facebook.

When I searched for something when Google first came out, I got a mix of results from a variety of sites I had never heard of. Today it's basically Wikipedia at the top, with results from the same list of about 3-4 sites depending on the topic I searched for.

I'm beginning to think there's a niche for a peculiar kind of search engine: one for static pages with little to no JavaScript. It would penalize pages for ad-network usage.

I would really like to not have in search results most sites that try to monetize on my attention. I want raw facts and opinions. No click-bait to grab my attention or feed my internal cave man with rage. No ad-networks or data extraction operations. Just pages put there by people that want to share knowledge and ideas. I mostly find it on pages that lack ads and often are pure HTML - no CSS and no JS. At least in areas that interest me.

Maybe there is a place for a search engine that would index only pages like that? It certainly would be easier than competing with Google on indexing whole of the attention-whoring Internet.

Awesome, filtered top 10^6, removed sites with ads and e-commerce, typed in "enigma machine" and got some great gems:

http://ciphermachines.com/index.html
http://enigma.louisedade.co.uk/howitworks.html

If you're interested in Enigma machines and find yourself in Maryland, you can play with one at the NSA museum next to Ft. Meade.

http://enigma.louisedade.co.uk is the 3rd result in a Google search for "enigma machine", though

Looks like the site's having some issues right now. Using some of the search criteria redirects me to https://millionshort.com/500

Several search engines have had issues today, I haven't heard anything about a root cause though.


This is terrific, I wanna add this as a ddg bang...

Just filled out ddg's suggestion form to add this as a bang. I'll let you know if it goes live.

This is awesome, never knew about this feature. I may have to give ddg another shot.

I had that feeling of discovering the Internet again when I used Tor and surfed hidden websites for the first time, reading beginner's wikis, opinion pieces such as The Matrix, etc.

I am not interested in most of the "deep web", but what you say sounds interesting. Could you please provide a link to that Matrix thing? And other pieces you found interesting?

http://zqktlwi4fecvo6ri.onion/wiki/index.php/Main_Page is the wiki I stumbled upon when I first accessed hidden websites. The matrix rant is the first link, but it's not in the form I remember. (PS: I do not endorse the content; it's mostly a critique of our society's mechanisms.)

> It certainly would be easier than competing with Google on indexing whole of the attention-whoring Internet.

Probably not, actually; the kind of pages you describe would almost always be leaf nodes on the web graph, so your spider would need to walk "through" the attention-whoring parts to get to them, whether you kept records of doing so or not. (And it'd be very inefficient to not.)

I don't know about that - I find that I get a lot of my information from sites with user-generated content such as Medium, reddit, and of course HN. I think it would be extremely hard to fit sources like that into your search engine without letting in what I will admit is garbage. Would be very cool if it did manage to, though!

Well, there was Yanoff's list which was pretty great. I think 94 was around the time I stopped having to remember a lot of IP addresses.

I would love this

> When I searched for something when Google first came out I got a mix of results from a variety of sites I had never heard about. Today it's basically Wikipedia at the top, with results from the same list of about 3-4 sites depending on the topic of what I searched for.

...and if you actually try to search for more obscure/"fringe" subjects/phrases with Google, you either get no results (despite knowing that there are still active sites with those phrases), or it starts thinking you're a bot sending "automated queries" and blocking you for a while (not even giving you the option of completing a CAPTCHA.)

The first time that happened to me, which was within this year, was my realisation that Google had truly changed, and not in a good way.

I've sort of had this same experience myself. The quality of links for obscure topics is nowhere near as good as it used to be. It's harder and harder to find pages I know exist; sometimes they're buried under lots of irrelevant results, or results I know about aren't there at all. I've not experienced the 'bot' throttling to my knowledge, but it certainly feels like they're doing some kind of query translation when I want exact results. I'm not convinced Advanced Search isn't doing similar translations either.

Could you give a few examples of these queries?

Don't confuse people's tendency to no longer bother looking past the top of the first page of Google with the internet somehow shrinking into whatever fits those slots. Of course the most popular sites now dominate the top of Google's search results, but Google isn't the internet any more than Facebook is.

The breadth and depth of information on the web now vastly surpasses what was available in 1994. Youtube and other video and music streaming sites have provided a media revolution to compare with the transition from radio to television. Social media, whatever its drawbacks are, allows people to communicate and collaborate far more personally than email or basic chatrooms would have.

And let's not even get into the ways that Javascript, HTML5 and Webassembly have transformed, and will continue to transform, the web into a platform in which virtual machines converge to becoming just another content type. I know people here like to rend their garments and scream Javascript Delenda Est[0] into the void and just hope everything that happened to the web in the last 20 years goes away, but the day is coming when all archived and obsolete code will have a URL endpoint that bootstraps a VM and runs it. The best the web of 1994 could do was file downloads, maybe Java applets and Flash.

Sometimes the way people here seem to dismiss the modern web is baffling. I get it, but look at it from the point of view of the mainstream web user. The web offers access to so much more than would even have been possible in 1994, and lets people interact with one another on a much more direct and complex level.

Yes, the added richness and depth comes with a lot of baggage, but it's undeniably there.



The real bonus of the internet of old was that text was most of its content. Today you have video, pictures, emojis and music, and - in my opinion - it reduces the experience.

I remember when browsing the internet was much more of a networked thing. If you want to know what it was like, take a look at Wikipedia, where you can still get lost in a never-ending web of links. However, Wikipedia is a very cleaned-up version of the early web. It lacks animated .gifs for sure.

But the difference from Wikipedia was that people would maintain a Links section full of interesting stuff, and people would join web rings for various subjects, interlinking vastly different sites. Finding information often happened through Yahoo! (AltaVista was there too, but it lacked the quality of handpicked results) via a tree-based discovery system, and continued through whatever you could find through links on an interesting page. Exchanging links was something that happened really frequently.

It resulted in an internet where you just kept clicking and discovering and digging. It was sometimes also frustrating, as browsers lacked tabs and I would navigate links one by one, loading each and going back. I would sometimes forget how I arrived at a certain page because it was so deep and I never found the breadcrumbs again.

> It lacks animated .gifs for sure

Um excuse me what do you call this masterpiece https://upload.wikimedia.org/wikipedia/commons/e/ea/Ellipses...

Wow! That's nice. I started to imagine how it could be used in a geeky clock.

Doesn't count unless it's a spooky skull.


> If you want to know what it was like you could take a look at Wikipedia, where you still can get lost in a never ending deeper web of links.

I would have said TVTropes, but the core point is the same.

I remember having to restart my computer, because IE lacked tabs and Windows would let you open so many instances of it that the whole OS ground to a halt.

It's weird to think that now, in the absence of Google, I couldn't find my way from anything to anything else.

I especially miss the blogosphere of the era 1999-2006, before the emergence of Facebook and Twitter. I miss the era when tech people could debate a new technical protocol by posting thoughtful essays on their blogs, and then other technical people would post rebuttals on their own blogs, and the conversation would go back and forth, among the various blogs, but out in the open, and very democratic. Nowadays a lot of the new protocols are, for all practical purposes, designed inside of Google or Facebook or Apple, and then announced to the world, without much debate.

For a close look at the earlier era, see this very long essay I wrote in 2006 (which was popular at the time) in which I summarized the tech world's blog debate about RSS:


"RSS has been damaged by in-fighting among those who advocate for it"

I just miss that it was mainly text and could be used in a terminal without missing much.

More content density, better S/N ratio.

Recently I came across the NASA Astronomy Picture of the Day website[1], it's old-school HTML, started in 1995 and still updated today.

All the HTML takes 2-4 kB on average, but you might not get much use from the site in a terminal :)

[1]: https://apod.nasa.gov/apod/archivepix.html

Not specifically directed at you, but sort of directed at you:

I wish everyone who pined for the 25-years-ago days of mostly text pages would, instead of pining, just go out there and produce that content they want to see.

Instead of pining, start writing. Hosting is cheap or free, browsers still parse simple HTML, there's nothing stopping anyone from creating a return to that simpler form.

I do, and I test my personal website in lynx. That doesn't change the fact that sites like this are under the misguided impression that it's cheaper to ship 2 MB of JavaScript with each request rather than just responding with the fucking 2 KB of article text, so they end up looking like:

    # I Bought a Book About the Internet From 1994 and None of the Links Worked - Motherboard



So how much would you pay for a bare-bones, text-only version of the content, if you're going to read JS-decorated content?

The question isn't one of us creating it (although, unlike those downvoting you, I understand where you're coming from). The question is who will create a curation of sorts, and how: a Google for the simple web. A one-stop shop that encourages barebones simplicity and fosters a community where people only allow a simplified internet, and adtech is a nonstarter.

Are you suggesting that ads can't appear in text articles? Or that ad blockers don't exist for graphical browsers?

It's impossible to produce that content because the culture was very different.

For one, it was largely un-monetised. There were no banner ads, no ad trackers, no giant shopping sites, no market-driven content design.

But many ISPs offered free web space, and HTML 1.0 was so simple almost anyone could hack together a basic site with a plain vanilla text editor.

So people did. For fun. Lots and lots of them.

It looked like crap, but it was weird/hilarious/insane/inspiring in equal measures in a way that's impossible to reproduce today.

> It's impossible to produce that content because the culture was very different.

What? It's not at all impossible. Get a server, put whatever you want on it. No one is going to force you to monetize or market anything, or use a Node.js backend with AWS and React, or whatever the kids are doing these days. Basic HTML in a plain text editor still works just fine.

> What? It's not at all impossible. Get a server, put whatever you want on it.

Not in Germany. There is a set of laws called "Impressumspflicht" (https://de.wikipedia.org/wiki/Impressumspflicht) which forces you to add a mandatory imprint to your website. If you do something wrong or forget to include some mandatory information (what is mandatory also depends on the kind of website), you can easily get sued (and this has often happened). In other words: it is not easy for a layperson to set up a website in a way that doesn't carry a real risk of being sued.

The context of this conversation seems to have more to do with code and complexity than legal necessities, but point taken. The parent was suggesting it was impossible to create the sort of simple, personal, just-for-fun sites that people used to, but there's no technical reason for that to be the case.

It just happens that people add unnecessary complexity to their projects nowadays because 1) they use frameworks and tools that facilitate it and 2) it looks better on their resumes.

> The context of this conversation seems to have more to do with code and complexity than legal necessities but point taken

The laws introduce lots of complexity, which leads to lots of requirements in the code. So there's no contradiction.

> It just happens that people add unnecessary complexity to their projects nowadays because 1) they use frameworks and tools that facilitate it and 2) it looks better on their resumes.

And 3) because the law requires such complications (from my personal perspective, the largest problem and the one that causes the most headaches). To give another, "more EU/less German" example:

> https://en.wikipedia.org/wiki/Privacy_and_Electronic_Communi...

which requires that the user explicitly opt in before a cookie is set. This of course has to be coded - otherwise you risk being sued.

> There were no banner ads

There were, however, tons of animated 'under construction' banners, which gave the same 'visual noise' problem.

You could stop all animations on a page by pressing escape (or the stop button, I think) in Netscape, but Firefox removed that feature a while back.

All the content I read would still be the way I don't want it, though.

For my site, I created a Pelican theme which is HTML-only. There's no JS at all! I plan on publishing the theme eventually, once I'm a bit happier with it. You can see it at https://brashear.me. I'm very happy with how quickly it loads. I mentioned to one of my friends that it feels like upgrading to DSL back in the dial-up days.

If you very badly want to rip off the theme, it's stealable from my GitHub copy of my site: https://github.com/KeenRivals/brashear.me/tree/master/themes...

Rent is expensive in my city, do you think it is reasonable to tell someone who can't afford rent to just buy their own building?

That's what you are suggesting.

You're not going to be in the Alexa top 10,000. Go run a Docker container for $3/month to host your site, cache it with Cloudflare's free plan and pay $12/year for a domain. It's only $48/yr to host all your content. It's really not that ridiculous.

For the kind of static content we seem to be talking about you can use https://neocities.org/.

No need to spend any money.

It is ridiculous because time is scarce. We have to decide how to spend our time. That doesn't mean we have to shut up and accept everything else.

I don't write on-line content (no blog for example) but if I did I'd test it in (e)links for sure. :)

Maybe Neocities should add a webring feature.

Hi, founder of Neocities.

I get this request a lot actually. The reason I decided to not do it was because webrings, though nice, had a lot of problems. The main issue was that people's sites would go away, and then the ring would break. I also didn't want to introduce any functionality that would make sites depend on Neocities backend APIs to function. Web sites are more long-term and durable if they remain (mostly) static.

I tried using "tags" that could bind sites together on Neocities, but to be honest the idea has largely been a failure. People will tag their site "anime" and their site will have nothing to do with anime... but it's a popular tag so they add it in just so they're on a popular tag. Geocities had this problem to a certain extent too (a tech site being in the non-tech neighborhood). You can get a flavor of the problem here: https://neocities.org/browse?tag=anime

One idea I'm considering is to only allow a site to have one tag, rather than 3 like I do right now. Maybe that will stop people from adding tags that are irrelevant to the content of their site. Or it may compound this problem. I'm on the fence about it.

Another idea I'm considering is allowing people to create curated lists of their favorite sites on Neocities, similar to playlists on Youtube. The "follow site" functionality kind of does this, but in a generic way, and it tends to be a bit... I guess nepotistic (hey, you're popular, follow me so my site can get more popular too!)

I'm always happy to hear ideas on how to improve this. I do like the idea of related sites being able to clump together, but in practice it doesn't work as well as I would like it to. But maybe it works well enough and I'm overthinking it.

What about using machine learning? It sounds like a discovery issue, and recommender systems work very well for that (e.g. the sidebar of YouTube).

I just discovered Neocities BTW, it sounds very interesting!

I've got a fancy 1080Ti and Tensorflow. If you have any particular things I could try or should read about, I'm happy to look into doing some research! Googling for "Tensorflow recommender" gave some interesting starting points.

Is this a joke?

No, I get this request quite a bit from people that sincerely miss web rings. I'm not sure how much of it is anachronistic nostalgia and how much of it is a true desire to bring it back. To give an example of this, I've gotten more than a few requests to add the Gopher protocol to Neocities.

But aren't web rings just composed of vanilla hyperlinks? Why does that need to be added as a feature? What's stopping people from making them now?

IIRC the old webrings had a CGI backend that would collect the addresses and then you would click "next" and it would take you to the next site. But yeah you could just make a vanilla one. You could also just make one outside of the context of Neocities.

But if the next site went down or didn't link to the next site correctly, you couldn't proceed. That was always my problem with webrings. They depended on each site to embed the ring code properly, and usually they didn't, so you were stuck trying to find a working one. It was a pretty lousy UX overall.

It may have been lousy UX, but I suppose it also provided a social convention of not breaking the chain. Weakest links and all that. Relevant to the OP about linkrot!

Maybe a modern equivalent would redirect downed sites to the IPFS archive.

That's interesting; the ones I remember were literally rings of static links, coordinated by the webmasters, presumably over email. I see what you mean now.

those were not webrings. those were "just" link exchanges.

I miss the dial-up BBS boards.

Related to this, I had trouble finding examples of pre-1996 web design. The Internet Archive has a lot from 1997 onwards. The oldest live examples of sites from that era that I know of are:




The BBC also donated its Networking Club to the Internet Archive: https://archive.org/details/bbcnc.org.uk-19950301

The Space Jam (film) website is a golden example and I find it amazing that it is still functional:



The trailers still work!


A good demonstration of how far video on the internet has come.

160px × 120px, 8 frames per second

You could try my site for Pinball Expo 1994.


Did it by hand with MS Notepad, MS Paint, Apple QuickTake 100 camera, Chameleon TCP/IP, and Mosaic for PC.

I did, however, drop the webmaster email address about 20 years ago.

As a student at LiU who had no prior knowledge of Lysator, it's always interesting to see Lysator links in the wild; they seem to pop up when least expected.

How come the site is hosted at Lysator, and how come it's still up?

Sidenote: The man hosting the site has a very on-topic profile page[0].

[0] https://www.ida.liu.se/~davby02/

David was a pinball fan like myself and couldn't attend the expo in Chicago that year.

I was commiserating on Usenet about not being able to find affordable hosting for the website. This was way before hosting-only companies existed, or even Geocities. David stepped up and generously offered room on Lysator.

As to why it's still up, I'm not sure. There was a short period where it looked like it was offline, but it's back now. Perhaps because Expo '94 is on a lot of "oldest websites that still work" lists.

Lysator is incredibly nostalgic for me; when I was just starting on the internet around 1994-1995, a lot of my favorite websites were on lysator, including the gigantic Wheel of Time Index.

Some good screenshots here: http://www.telegraph.co.uk/technology/0/how-25-popular-websi...

CNN's original online coverage of the OJ Simpson murder trial (1995) is still online and mostly intact - http://www.cnn.com/US/OJ/

Welcome to Netscape (94) - http://home.mcom.com/home/welcome.html

It's later than 96 but I don't know who is still paying for hosting for a stadium that was demolished over 16 years ago http://www.3riversstadium.com/index2.html

There's always the famous Dole Kemp '96 campaign site (a fair representation of mid 1990s web design):


And this one from CNN (still 1996, but appropriate representation of that 1995/96 era when design had changed a bit from the earlier plain white backgrounds & basic text layouts):


I actually watched the debates and couldn't believe Dole closed out his first debate with Clinton imploring youngsters to "tap into" his "homepage" (the above link)


Oh, man. Dole/Kemp was definitely designed in the "Make sure everything fits on a 640 × 480 display, and downloads reasonably fast on a 14.4k modem" era.

I absolutely adore that CNN 1996 site. Man, designs like that were great.

I agree--I think that design is wonderful.

The credits link & page are pretty bad, otherwise no complaints.

And hey, it still works well on small screens. Not flawless, but close.

I think it's fascinating how the way we view site navigation has changed. Another commenter linked to an old Microsoft site that had links all over the place. But for the most part, it seemed like sites standardized on vertical navigation on the left side. Now we generally see it horizontally at the top, or in hamburger menus. It's interesting how that paradigm shifted. It seems like vertical side navigation would be more prevalent now, given how much wider monitors are.

With wider monitors, my browser is actually narrowed. I split the screen in half and devote one half to the browser and the other half to a text editor and terminal. I used to have two monitors to do that, but now just one is fine, but the side effect is that I browse in a pretty narrow window. A narrow window also makes reading somewhat nicer on some sites, since it's harder to read very long lines of text.

>> A narrow window also makes reading somewhat nicer on some sites, since it's harder to read very long lines of text.

Curious to know how this works with ad-heavy sites. Do responsive sites display differently then too?

Been flirting with going to a single 32" monitor for a while. Just wondering if you think it's worth it, given your experience, since I do development as well and have a similar setup (one monitor for the editor and a smaller laptop for browser/terminal). I'd like to hear your input.

I use an adblocker, so it's hard to know what ads do (though I disable it on some sites that I believe to be reasonably trustworthy about ads, and that I want to support, like reddit and some major newspaper sites; though I tend to pay for a subscription for sites that I really want to support and continue to block ads).

But, generally, it's fine. Most sites are entirely usable. There are a few that are quirky, but modern websites are generally designed to scale down to tablets and phones, so they don't act too weird in a narrow laptop/desktop browser.

I think the most common problem is sites that switch to hamburger menus at too high a resolution (so I get the mobile hamburger navigation on some sites, even though it's kinda silly looking and slightly less ergonomic). It's not super common, though. Most switch to hamburger a little lower than my browser width.

I recently got the Dell 42.5" 4k monster (P4317Q), and while the dpi and color reproduction aren't good enough for serious design work, I'm able to divide my desktop into thirds for browsing and text editing. I usually keep a browser on the left, documentation in the middle, and a terminal on the right. I still get a little giddy when I turn it on every morning. What I really want is for apple to start producing 42" 8k "retina" screens, but I won't hold my breath. :)

I've used a media query in the past to make the menu horizontal along the top in a narrow window and vertical on the left in a wide one.

> given how much wider monitors are.

Monitors might be wider, but screens in general are much narrower.

God I miss when the web was designed for people who could read.

Then you'll "love" Microsoft's 1994 page: https://www.microsoft.com/en-us/discover/1994/

"If your browser doesn't support images, we have a text menu as well." coool

"WWW.MICROSOFT.COM is running Microsoft's Windows NT Server 3.5 and EMWAC's HTTPS"

EMWAC supported TLS 1.2 already back then. Amazing!

Ha, well it is a recreation and obviously not the original. They didn't have the original code so they had to work out how it was made. There's a readme which explains the process: https://www.microsoft.com/en-us/discover/1994/readme.html

I get infinite redirects to profile.microsoft.com for that link if I load it in a browser with live.com cookies. Kinda sad, really.

I guess HTTPS stood for HTTP Server.

Is microsoft.com currently not loading for anyone else? I'm getting ERR_SSL_UNRECOGNIZED_NAME_ALERT in Chrome, but I'm not seeing it mentioned anywhere else (e.g. Twitter).

Works for me. For future reference, isup.me

I don't miss Flash intros though. Or frames.

Good thing they were usually optional.

Not so sure about frames; they solved 90% of the problems that SPAs are used to solve today. They could have been improved rather than replaced.

Frames have been improved into iframe and then deprecated in HTML 5.

Frames were a nightmare. You can't link to a page in frames, you can't bookmark it either. Frames break the back button. Come in via search engine? You're only in the main frame, your navigation frame is missing. Want to print? Lol, good luck with that. You always ended up (either intentionally or unintentionally) with a browsing session within someone else's unrelated frameset.

I also heard they were bad for screen readers.

Nitpick: iframes haven't been deprecated in HTML5. In fact they've been extended with new attributes like sandbox and srcdoc. Loading content via javascript is often a better approach but iframes allow you to sandbox content from your website's context, e.g. to prevent XSS attacks etc.

A research project presenting pages from Geocities: http://oneterabyteofkilobyteage.tumblr.com/

Pages from 1996 start 102 pages prior to the last page (at present: http://oneterabyteofkilobyteage.tumblr.com/page/10743 )

Ironically, I think you'll have much better luck looking at web design books from that era, instead of links on the web. I just pitched a bunch of my design books from that era, and they were full of "cutting edge" examples.

That O'Reilly GNN site is beautiful. I miss this simplicity.

Need 3MB worth of jQuery and some swooping-in animations.

And a couple-dozen home-phoners, but try to keep the number of per-click and scroll trackers down to under 10.

It probably predates the blink tag, but I still think it needs it. Some marquee might help.

I genuinely like the graphic + nav

It loaded so fast!

There's even a Twitter account (created in 2013) that checks if it's still up, every 3 hours.


/r/abandonedweb. It's also ironically now a victim of time.

If you want to go all the way back, UNC still hosts ibiblio.org, which has links to the first website at CERN http://info.cern.ch/ and TBL's first page.

Hard to believe Softhome.net is still around, and has changed little since 1996:


Also funny that the email address I signed up for with them in 1996 is still active, even though I don't check it for years at a time.

My web page is still around http://homepages.ihug.co.nz/~keithn/ -- it was mostly done pre-'96. Not that it was really well designed; I just spammed bezels and had a play with this new cool Java thing.

It's an embarrassingly amusing slice of life :)

The US Presidential office has archived the many iterations of its 'White House' webpage.


See : zafka.com while a small amount has been added, it was designed around 1994.

This is a very important reason why books, in general, contain better information than websites. On websites, people care a lot less about the correctness of the information. You can just update stuff later (of course, this doesn't always happen).

Also, sites are a very volatile medium. I often bookmark pages with interesting information to read later, and it inevitably happens once in a while that a site went down and I just can't find the information anymore.

> Also, sites are a very volatile medium. I often bookmark pages with interesting information to read later, and it inevitably happens once in a while that a site went down and I just can't find the information anymore.

I had the same experience and that's why I made a browser extension that archives pages when you bookmark them. (https://github.com/rahiel/archiveror)

Maybe something that archives to IPFS would be interesting. As things are marked as interesting, they are both archived and distributed based on interest.

I still have my bookmarks.html file I started building in 1995, but almost everything in it has rotted away. It's a shame too because a lot of the stuff in there would still be useful or interesting, but nobody wants to pay even a nominal fee to keep it online.

I collected a list of ~15+ archival tools on a discussion of Wallabag last month: https://news.ycombinator.com/item?id=14686882

Happy to discover yours!

Here's one more! https://www.pagedash.com

Launching soon. We use a browser extension to mine your current tab at the click of a button!

> I often bookmark pages with interesting information to read later, and it inevitably happens once in a while that a site went down and I just can't find the information anymore.

I've recently had this problem with some online fiction that I had bookmarked. Now, I was able to recover thanks to the Wayback Machine, but I really shouldn't depend on that.

I should really put some thought into archiving pages I like or getting a Pinboard account.

I have this problem too, thankfully archive.org has been able to resurrect most of the text based sites I bookmarked ages ago. Such an invaluable resource.

Linkrot is a real problem. Especially for those sites that disappear before the archive can get to them.

On another note, the more dynamic the web becomes the harder it will be to archive so if you think that the 1994 content is a problem wait until you live in 2040 and you want to read some pages from 2017.

Turns out the solution to every stack overflow post will be

"JavaScript is required"

Content from Stack Overflow has better odds of surviving than that: they've uploaded a data dump of all user-contributed content to archive.org: https://archive.org/details/stackexchange. It's all plaintext. This is really generous of Stack Exchange and shows they care about the long term.
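For anyone wanting to poke at that dump: it's a set of XML files (Posts.xml, Users.xml, and so on) where each record is a `row` element carrying its fields as attributes. A minimal standard-library sketch of reading one -- the attribute names in the sample are illustrative, so check them against the real dump's schema:

```python
import xml.etree.ElementTree as ET

# A hand-written sample in the style of the dump's Posts.xml
# (attribute names here are illustrative; verify against the real files).
sample = """<posts>
  <row Id="1" PostTypeId="1" Title="Why are my links dead?" Score="42"/>
  <row Id="2" PostTypeId="2" ParentId="1" Score="17"/>
</posts>"""

def load_posts(xml_text):
    """Parse a Posts.xml-style document into a list of attribute dicts."""
    root = ET.fromstring(xml_text)
    return [dict(row.attrib) for row in root.iter("row")]

posts = load_posts(sample)
print(posts[0]["Title"])  # -> Why are my links dead?
```

For the real (much larger) files you'd stream with `ET.iterparse` instead of loading everything at once, but the record shape is the same.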

I assume the anonymisation is just on votes? It doesn't seem 100% clear at first glance.

That's actually one of the reasons all my personal stuff gets built as HTML/CSS; I just use JavaScript for quality-of-life stuff (image lightboxes that work without putting #target in the browser history, auto-loading a higher-res image -- that sort of thing).

I know I won't be maintaining it forever, but I want it to be accessible through the archive.

Server-side rendering please save us.

Well, there's now Chrome headless which is slowly edging out PhantomJS for such use cases.

Are you suggesting running an entire web browser on the server-side just to render each client request to HTML?

That VC money must really be flowing...

This article was posted on HN a while back suggesting exactly that: https://hackernoon.com/leaner-alternatives-to-server-side-re...

It's just as ridiculous as it sounds.

No. It's only needed to playback archives of web pages that only work with old JavaScript libraries enabled.

But then again there's WebRecorder for exactly that.


It's actually fairly easy to record web sites despite how dynamic they are; all you have to do is save the response data of each XHR (and similar requests) and the rest of the state (cookies, urls, date/time, localStorage, etc).
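A toy sketch of that record-and-replay idea (the class and method names are my own invention, not any real tool's API): during a live session every request's method and URL are stored with the response body, and on playback requests are answered from the store instead of the network:

```python
# Illustrative record-and-replay store for a dynamic page's traffic.
# A real recorder (e.g. webrecorder) also captures headers, cookies,
# timing, and request bodies; this keeps only the essentials.

class TrafficArchive:
    def __init__(self):
        self._store = {}  # (METHOD, url) -> response body bytes

    def record(self, method, url, response_body):
        """Save one request/response pair during a live session."""
        self._store[(method.upper(), url)] = response_body

    def replay(self, method, url):
        """Answer a request from the archive instead of the network."""
        try:
            return self._store[(method.upper(), url)]
        except KeyError:
            raise LookupError(f"not archived: {method} {url}")

archive = TrafficArchive()
archive.record("GET", "https://example.com/api/items", b'{"items": []}')
print(archive.replay("get", "https://example.com/api/items"))  # -> b'{"items": []}'
```

The hard part in practice isn't the store, it's completeness: any XHR you didn't capture becomes a dead end on playback, which is why session-based recorders replay your actual browsing rather than crawling.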

For even more accuracy save a Chromium binary of the version at the time so it'll look exactly as intended.

Solution: https://ipfs.io

> The average lifespan of a web page is 100 days. Remember GeoCities? The web doesn't anymore. It's not good enough for the primary medium of our era to be so fragile.

> IPFS provides historic versioning (like git) and makes it simple to set up resilient networks for mirroring of data.

IPFS is good and useful, but it only retains what people choose to retain.

If geocities.com/aoeu isn't popular, then IPFS won't store it unless someone bothered to pin it. And as soon as they stop, it'll disappear.

You need a dedicated host (like archive.org) to retain it, or volunteers willing to coordinate and commit their resources. Otherwise it's just more resilient (a good thing), but not permanent.

> Otherwise it's just more resilient (a good thing), but not permanent.

It's not "just" more resilient; it's also much more elegant and convenient: with the current web you need to go find some archived version of the dead link you found, while with IPFS the link can simply keep working, even after the creator stops hosting it.

If literally no one in the world bothers to keep a piece of content, not even archive organizations, what do you suppose could possibly work?

Hello fellow dvorakist!

IPFS is not a real solution at the moment. It's hard to use, the default daemon is so aggressive that Hetzner nearly blocked my server due to its scanning, and your site needs to use relative URLs to be put on IPFS.

On the other hand, nobody is talking about the problem of domains: yes, linkrot is a thing, but many dead links are due to dead domains and dead blogging/content silos.

I've had to deal with Hetzner and IPFS too -- my conclusion is that it's Hetzner who are aggressive here. In one of the cases I had already fixed the dialing-local-networks behaviour, and Hetzner still continued to block the server for about a week. They blocked it on 25-Dec and released it on 31-Dec.

> Remember GeoCities? The web doesn't anymore.

Bad example:


Mixed example. Only a small fraction of Geocities pages / content have been preserved. Most of it is lost permanently.

Rather the opposite, a very large fraction of the original pages have been preserved. Many of them were deleted long before by the owners themselves and/or rehosted elsewhere.

Original Geocities content is probably the oldest large body of internet content that will be preserved for many years to come and it will look roughly how it looked on Geocities because it wasn't relying on much of anything other than basic HTML and some images.

For whatever reason, my site (Hollywood/Academy/4430) disappeared from Geocities before the mass archival happened. I certainly didn't delete it.

I search for it once a year, hope springs eternal.

Ai. I searched a bit for you, as well in the original crawls but to no avail. Most likely there were very few inbound links to that site which meant it wasn't discovered before it got wiped. I was still archiving stuff long after Geocities had officially shut down, for weeks new stuff turned up. 4428 is there, the next one after that is 4460, I suspect that if it had been archived you'd have found it by now.

Have you checked Jason Scott's torrent of all things geocities as well?

LOL, same here.


Looks like I even got some of the images on disk here, but no html.

Jacques, I never knew it was you behind Reocities -- thanks for saving a large part of the old Geocities content! The story of what you did in creating Reocities is alone a worthwhile read. [1]

[1] http://reocities.com/newhome/makingof.html

edit: Just found again (and remembered) web.archive.org has 2 of its pages, so some of it is still out there.

I similarly look for the first page I ever created. I was maybe 14 years old and created a pretty thorough Goldeneye 007 fan site, with walkthroughs etc, on angelfire.

I distinctly remember a misspelling in my url--- /liscensetokill --but I can't be sure of my parent category anymore. I think /mi5/

Haven't seen it in over 15 years, though.

I was in geocities/South Beach/lounge but I can't remember the number and it annoys me. I know for a fact archive.org indexed it but they didn't at all after geocities moved to ~ urls. (That was after the Yahoo buyout IIRC)

Do you remember any text fragment? I can grep for it.

Either that or you can spend an hour clicking :) :


Thanks for the link! It looks like reocities didn't pick it up. (I downloaded all 250 links and grepped myself)

Clever. :( Sorry!

A similarly really annoying thing is when you find old technet articles, stack overflow questions, or blog posts that seem potentially really useful, but that have broken images, broken links, etc... so the content (possibly extremely useful at the time) is completely useless now.

It really stresses the importance of directly quoting / paraphrasing the content you want in your plain text, and not relying on external resources for posterity.

The one I hate is when I find old forum posts explaining how to do something in the physical world and all the embedded photos are broken. Not because the image host went out of business or the user deleted them, but just because they didn't log in for a year and the host deactivated their account. This is why whenever I link a photo I upload it to my own server and I never change the URL.

Or when they're broken because the image host discontinued third party image hosting/started charging for it, despite said feature being the only reason their site caught on in the first place.

Looking at you Photobucket. And all those useful images now replaced with a meaningless Photobucket placeholder.

I apologize in advance as there's no non-morbid way to ask this but... what happens to the images on your server if you die tomorrow? It would be exactly the same situation, right? They would exist until your bill is due in a couple years then your account will be deactivated and your images will linkrot.

I have the server paid up for 10 years and the domain for 15. But it's true, eventually all things must end. I do have credentials for all the things written into my will so I guess it ultimately depends on how much my children care about preserving my helpful forum posts.

He's not advocating people rely on his server, he's saying hosting it himself allows him to quote freely without worry. If someone else wanted to quote him and followed his example, they would not have a problem when he died.

Also 500px (I think--if not a similar image host) has recently banned all 3rd-party images, at least at the free level, which has broken a TON of the old forum posts I want to see. It was the defacto image host, kind of like imgur is now.

Wikipedia is also hit pretty hard by link rot; the nice thing there, at least, is that volunteers can try to fix it.

For anyone curious, you can help fix dead reference links on Wikipedia in just a few seconds. If you find a page that has a dead link (or several), click the "View History" tab at the top, then click "Fix Dead Links" to run the InternetArchiveBot on the page.

More info: https://en.wikipedia.org/wiki/Wikipedia:Link_rot

Genuine question. Why isn't this done automatically? Astronomical (even suspicious) amounts of continuous outgoing bandwidth?

Just speculating but possibly the difficulty of finding a truly dead link. Often they just redirect to a registry asking if you want to buy the domain.

but redirects are also present in legitimate pages
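That's exactly the crux: a parked domain happily answers 200 after a redirect, so status codes alone aren't enough. A deliberately crude heuristic, purely illustrative (real tools like InternetArchiveBot use far more signals), is to treat a redirect that lands on an unrelated domain as suspect:

```python
from urllib.parse import urlparse

def classify_link(status, original_url, final_url):
    """Rough dead-link heuristic: hard HTTP errors are dead; a redirect
    that hops to a different registrable domain is 'suspect' (possibly a
    domain-parking page); everything else is assumed alive. Illustrative
    only -- legitimate redirects to sibling domains would be flagged too."""
    if status >= 400:
        return "dead"
    orig_host = urlparse(original_url).hostname or ""
    final_host = urlparse(final_url).hostname or ""
    if orig_host != final_host and not final_host.endswith("." + orig_host):
        return "suspect"
    return "alive"

print(classify_link(404, "http://a.com/x", "http://a.com/x"))      # -> dead
print(classify_link(200, "http://a.com/x", "http://buy-a.com/"))   # -> suspect
print(classify_link(200, "http://a.com/x", "http://www.a.com/x"))  # -> alive
```

The subdomain check keeps the common `www.` redirect from being flagged, but the false-positive problem the parent comments describe is still there, which is presumably part of why Wikipedia keeps a human in the loop.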

Wikipedia is working with the Internet Archive to automate the prevention of link rot.

I'm reading Raymond Chen's Old New Thing blog articles from 2006. Most of the links that I try (75%?) are dead.

See also: Best of the Web '94 Awards. Presented at the First International Conference on the World-Wide Web, Geneva, Switzerland, May 1994.


What's cool isn't how fast some of these technologies become obsolete, such as various Java applets and cgi-bin connected webcams. It's the static content that can survive until the end of time.

Like Nicolas Pioch's Web Museum. Bienvenue!


I must say the swiss-prot links from then still work. You are redirected to the uniprot.org website but the links work.

> [The Rolling Stones] actually streamed a concert on the platform in November of that year, using an online provider named MBone for some reason.

The MBone was not a "provider", it was an IP multicast network. This was the only way to efficiently stream video content to thousands of simultaneous clients before the advent of CDNs. https://en.wikipedia.org/wiki/Mbone

MBone lost its reason to exist when the internet backbone turned out to be much easier to upgrade than the edges. There's not much need to implement a complicated proxy system to save bandwidth on the backbone when almost everybody is constrained by their last mile link.

For years I thought TV stations might connect to the MBone to do simulcasts for people on the Internet once broadband became widespread, but the world moved on before it could become a reality. Part of me still thinks this is a missed opportunity but it's too late to cry over it now.

My internship, in 2006, involved working on an IPTV prototype for Philips. I remember working on a multicast server and the set-top-box client -- the server had a DVB-T card, and rebroadcast all the BBC channels over the LAN. Since the stream was always there, changing channel was extremely fast. DVB broadcasts information packets regularly, containing the TV schedule and so on. These were also forwarded over IP, and then cached by the STB.

It was neat, but presumably most of the multicast stuff was abandoned not long after I left.

Someone else worked on a system to help the user schedule their TV around the broadcast times.

The problem with using multicast for content delivery is more down to the subscriber end and how ISPs manage their networks. I worked on a (recent) project that uses multicast in the broadcast context at the mezzanine level. When you've got full control of the network, end to end, it works a lot better.

The subscriber end...and the goddamn Internet backbone. You can't route multicast across the core, which is a huge impediment to its adoption.

There was a vicious circle where the core didn't do multicast so the big iron routers didn't put multicast support in hardware, which made it impossible to support multicast on the core...

AT&T U-verse uses multicast for live TV streams. They start off as unicast streams but after about 60s the multicast stream is picked up.

IPTV still uses routed multicast, it just doesn't work on the internet anymore.

I've got a book about JavaScript from 1995. It mentions closures once and says something like "... but you'll never need to use that feature of the language."

How I laughed.

Never say "Never" or "Always".

Ok, it was '96 or '97. But the point still stands.

Somewhat unrelated:

I noticed that the wayback machine no longer lists historical sites if the latest/last revision of robots.txt denies access. Has anyone else experienced this?

In the late '90s I helped build one of the first Fortune 500 e-commerce websites. The website was shut down years ago, but it was viewable on the Wayback Machine as recently as a year ago. The company in question put a deny-all robots.txt on the domain, and now none of the history is viewable.

It's a shame -- I used to use that website (and an easter egg with my name on it) as proof of experience.

Applying denials retroactively makes no sense whatsoever -- it defeats the whole point of archiving.

Do you know if archive.org still has a copy?

I read an internet article from 2017 and none of the stuff worked without access to all sorts of third party scripts and crap.

Yip, the irony is that the article about this doesn't even load on my tablet. Perhaps because of ad blocking at the DNS level, perhaps not; it paints a picture though.

edit: also doesn't load on a desktop here.

I Bought a Book of Restaurant Recommendations from 1957 and None of them would Serve Me Dinner Any More

There are plenty of restaurants that are older than 1957. So, you would probably find at least one that was still open. I think the oldest in the US is Fraunces Tavern. New York, NY. 1762.

But, worldwide, several predate that. Botin Restaurant dates back to 1725... The oldest I could find was Stiftskeller St. Peter in Salzburg, Austria, still in the same building from ~803.

I bought a book from 1850, and I could still look up all the references in a university library, or even find most of them on Google Book Search.

Not all information on the Internet is created equal and not all information needs to be available in perpetuity.

"Free $tuff From the Internet" from 23 years ago is closer to a restaurant guide than a book containing literary or academic references. I'd imagine most of the references in an 1850s coupon book or almanac would also be long dead.

This. I'm really excited about projects like IPFS but I'm not totally comfortable with the "everything persists forever" philosophy as it stands now. Preventing link rot is a very worthwhile cause, but content creators should have control over what persists and for how long (see: "Right to Be Forgotten").

> content creators should have control over what persists and for how long (see: "Right to Be Forgotten").

Strange idea.

The "Right to Be Forgotten" in EU law refers to personal data not to works that you have published, it is about privacy not copyright. See the Commision's factsheet on the subject: http://ec.europa.eu/justice/data-protection/files/factsheets...

True, it's not exactly the same thing. But I think there is room for a conversation about "published" content as well. The internet covers a much broader scope of content than say print media. I think it is interesting to consider what should be considered "published works" online. Some people think it's fair to say anything you put online is fair game to persist forever. Others like myself think maybe we need a bit more of a fine-grained definition of what constitutes "works" and what persistence properties they should have

I think this is the best (but not perfect) analogy for hyperlinks.

I bought a telephone book from 1942 and none of the phone numbers work

The phone numbers accompanying the phat '90s slang scrawled across my 1996 yearbook are no longer a way of contacting my old classmates.

Pennsylvania-6-5000 still works.


Hopefully, one of the advantages of having better technology is to avoid repeating the same mistakes of the past.

My uncle Pat wrote this book (and multiple others in the same series). I'm amazed Vice is talking about it over twenty years later and I'm sure he will be too once I show him the link!

I had lots of fun reading them as an Internet-addicted kid -- but several of the links were dead even before it was officially published.

"It was possible to get on a text-based version of the internet for free in many areas, using only a modem and a phone line. An entire section of Free $tuff From the Internet offers up a lengthy list of free-nets, grassroots phone systems that essentially allowed for free access to text-based online resources."

Makes me want to try to write a Markdown-only Internet browser, which treats native Markdown documents as the only kind of Web page.

You would have to give up all the dynamic convenience we take for granted. Menus would be just links. Basically you would have HN-like sites for the small ones, and the big ones would have images. That's it.

On second thought that wouldn't be so bad considering the bloat we have to deal with nowadays. (1 MB per page for just news from sites like CNN ugh.)

I long for the day that browsers and search engines support RFC 7763.

I owned (and still do own) this book! I would spend many hours as a teenager going through the links and accessing all the cool stuff in the book. This really brings back memories!

And yes, the way I got on the internet in those days was to dial into a public Sprintlink number, then telnet to a card catalog terminal in the Stanford library, and then send the telnet "Break" command at exactly the right time to break out of the card catalog program and have unfettered internet access. Good times.

I was lucky. A public library had a dial-in account with lynx as the shell, and the card catalog and inter-library loan systems were served as web pages. I just had to hit 'g' and type in any URL, including Gopher, WAIS, Archie, or telnet ones. This left me no great way to get things downloaded locally by itself, but I could telnet into a shell elsewhere, download things there, and feed them back via Zmodem through the telnet and the dialup.

That was before I had a local PPP provider, of course.

Ahh telnet, good times, I miss that web...

Did you try looking them up at archive.org? I expect that many of them will work there.

The web is ephemeral unless somebody archives it. Many companies offer an archive service for your sites for a fee, and archive.org does it to provide a historical record.
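The Wayback Machine exposes an availability endpoint for exactly this kind of lookup. A small sketch against it -- the endpoint itself is real, but the sample JSON below is hand-written to show the reply's shape, so verify against a live response:

```python
import json
from urllib.parse import urlencode

API = "https://archive.org/wayback/available"  # Wayback Machine availability API

def availability_url(url, timestamp=None):
    """Build a query URL; an optional timestamp (YYYYMMDD) asks for the
    snapshot closest to that date."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urlencode(params)

def closest_snapshot(response_json):
    """Pull the closest snapshot's URL out of the API's JSON reply, or None."""
    snap = json.loads(response_json).get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap else None

# Hand-written sample of the reply's shape; fetch a real one with
# urllib.request.urlopen(availability_url(...)).
sample = ('{"archived_snapshots": {"closest": {"url": '
          '"http://web.archive.org/web/19970101000000/http://example.com/", '
          '"available": true}}}')
print(closest_snapshot(sample))
```

For the book's dead links this would be the obvious first stop: feed each URL in with a 1994-ish timestamp and see which ones the archive caught.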

Yup. I recently promised a colleague a PDF. I knew what I was looking for, who wrote it, and which site it was on (a regional site of my employer). It even featured highly on Google (it showed up in related searches).

Zilch. Nada... couldn't find it anymore. Gone. Something I had easily chanced upon before, I now couldn't find with directed searching. They must have restructured their site.

The article indicates that the "free" stuff on the internet was hidden away in weird places - ftp servers and the like. No google to find it for you, the only way was by word of mouth, or I guess via published book.

Answers a question I always had about "Snow Crash" by Neal Stephenson. The main character, Hiro Protagonist (I still giggle at that name), sometimes did work as a kind of data wrangler - "gathering intel and selling it to the CIC, the for-profit organization that evolved from the CIA's merger with the Library of Congress" (Wikipedia).

I always wondered what made that feasible as a sort of profit model, and I guess now I know - that was the state of the internet in 1992, when the book was published. Seems like a way cooler time period for Cyberpunk stuff, I'm almost sad I missed it :(

We did have search engines, just not as fast or encompassing as Google. Archie came online in 1990 ( https://en.wikipedia.org/wiki/Archie_search_engine ). Jughead and Veronica followed ( https://en.wikipedia.org/wiki/Jughead_(search_engine) , https://en.wikipedia.org/wiki/Veronica_(search_engine) ). More often than not a search started with a Usenet newsgroup FAQ.

Don't forget we had Gopher and WAIS before the Web, too.

Of course, still, finding the good stuff, rather than whatever has been SEO'd into the Google listings, is difficult in some subjects.

Well, if you purchase a Yellow Pages book from 1994, I doubt you will find that many of the businesses listed there still exist and show correct phone numbers...

??? Huh ???

You think most businesses are less than 24 years old? Or do you think businesses change phone numbers every few years?

Most businesses fail within the first few years. 24 years later much would have changed.

Man, I really miss FTP. I remember when you would just FTP to the site you were using and grab a binary from their /pub/. Mirrors were plentiful, and FXP could distribute files without needing a shell on a file server.

I remember when you could just mount \\ftp.microsoft.com\pub as a drive on your PC. Raw, unauthenticated, unencrypted SMB over the public Internet. At least it was read-only share. Good times.

I remember Geocities pages would always be trying to read from my floppy drive. People would not realize you had to upload your pictures before other people could see them and just would put D:\pictures\goatse.jpg as the src of the img tag because they didn't really understand the web (not that I blame them.) Of course, the browser had no problem trying to load local images from untrusted code on a remote host, security policies were, uh, extremely permissive at the time.

(Of course, there would be plenty of attempting to read from the C:\ drive too, but that didn't make a loud, unexpected sound like reading from the floppy drive did)

Simple times.

Yeah, you had to remember to change all the links in the html because FrontPage Express kept saving the local full path so you could keep working on and previewing your site on a browser. HTTP server software (on Windows anyway) was so obscure that you had no idea how to get a free one running on your desktop unless you had access to Windows NT with IIS. All that changed when I installed Mandrake for the first time...

I don't remember that era working all that well except if you were only talking about personal homepages at a few large universities. FTP was a bit faster at first but the experience of tracing around an unfamiliar directory hierarchy and guessing at naming conventions is not something I get nostalgic about.

Also, consider that right-click save as works the vast majority of the time and the exceptions are for content which simply wouldn't be available (e.g. video streams) for direct download due to IP concerns.

The unfamiliar directory structure was always an issue, but even worse was using ftp on windows, which defaulted the transfer mode to ASCII, so if you forgot to change it, you'd end up with corrupt files most of the time.
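For anyone who never hit this: ASCII mode translates line endings in transit, which is exactly what you don't want for a binary file. A simulation of the damage (this models a download to a CRLF platform like Windows; real FTP additionally normalizes through CRLF on the wire):

```python
# Why forgetting to switch an FTP client to binary mode corrupted
# downloads: ASCII mode rewrites line endings. Here a client on a CRLF
# platform turns every LF into CRLF, silently growing (and breaking)
# any binary that happens to contain the byte 0x0A.

def ascii_mode_transfer(data: bytes) -> bytes:
    """Simulate an ASCII-mode download as stored on a CRLF platform.
    (Simplified: pre-existing CRLF pairs would be double-converted.)"""
    return data.replace(b"\n", b"\r\n")

gif_header = b"GIF89a\x0a\x00\x0a\x00"   # a 10x10 GIF: both dimensions contain 0x0A
received = ascii_mode_transfer(gif_header)

print(len(gif_header), len(received))  # -> 10 12 (two stray bytes inserted)
assert received != gif_header          # the image is now corrupt
```

Text files survive this (that's the whole point of ASCII mode); anything else comes out subtly larger and unusable, which is why "did you remember `binary`?" was the first question on every FTP troubleshooting list.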

I really miss gopher...

I miss that "UH-OH" sound from WS_FTP

It's the same as for an ICQ message?


I mean, this isn't all that surprising. Not unlike buying a twenty-year-old visitor's guide to a city and finding that a number of the shops and restaurants have closed, the stadiums have different names, etc.

I don't know about that.

People tend to think that our society is very well documented, but if you look at what is left of old societies, it is usually whatever was engraved in stone or, if you're lucky, what remained on paper. With the internet replacing most or even all paper storage, in the short term it is true that we are better documented than ever (besides our present-day paper not being acid-free enough to last anyway). But in the longer term it may be a huge gaping hole in history.

And that's different from cities changing at their regular pace and books becoming more dated. It's as if the visitor guide itself is no longer readable, so you won't even know what was there in the past.

This is more like discovering streets don't exist or don't go anywhere anymore

In a way it's convenient. The URLs fail instantly. Trying to find things referenced by an old book can be pretty painful.

Knoxville! Knoxville! Knoxville!

But cool URLs don't change.

