
This Page is Designed to Last - tannhaeuser
https://jeffhuang.com/designed_to_last/
======
shasheene
There's no reason why a web browser's bookmark action couldn't automatically
create a WARC (web archive) file.

Heck, with the cost of storage so low, recording every webpage you ever visit
in a searchable format is also very realistic. Imagine having the last 30
years of your web browsing history saved on your local machine. This would be
especially useful when you're in research mode, deep-diving into a topic.

[1]
[https://github.com/machawk1/warcreate](https://github.com/machawk1/warcreate)

[2] [https://github.com/machawk1/wail](https://github.com/machawk1/wail)

[3]
[https://github.com/internetarchive/warcprox](https://github.com/internetarchive/warcprox)

EDIT: I forgot to mention
[https://github.com/webrecorder/webrecorder](https://github.com/webrecorder/webrecorder)
(the best general purpose web recorder application I have used during my
previous research into archiving personal web usage)
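
For a sense of how little code this takes, here is a minimal sketch using the
warcio library (the URL and filename are placeholders):

    # record an HTTP fetch into a WARC file via warcio's capture_http
    from warcio.capture_http import capture_http
    import requests  # note: requests must be imported after capture_http

    with capture_http('bookmark.warc.gz'):
        requests.get('https://example.com/')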

~~~
tempestn
This was what made me convert from bookmarking to clipping pages into Evernote
around 6-7 years ago. I realized I had this huge archive of reference
bookmarks that were almost useless because 1) I could rarely find what I was
looking for, if I even remembered I'd bookmarked something in the first place,
and 2) if I did, it was likely gone anyway. With Evernote I can full-text
search anything I've clipped in the past (and also add notes or keywords to
make things easier to find, or add reference info).

Since replacing bookmarks, I've moved other forms of reference info in there,
and now have a whole GTD setup there as well, which is
extremely handy since I can search in one place for reference info and
personal tasks (past and future). Only downside is I'm dependent on Evernote,
but hopefully it manages to stick around in some form for a good while, and if
it ever doesn't, I expect I'll be able to migrate to something similar.

~~~
charlesdaniels
Shout out to [https://joplinapp.org/](https://joplinapp.org/)

I was an Evernote user when I was on macOS. When I switched to Linux, a proper
web clipper was something I really missed. I'm now on Joplin and it does
everything I used to use Evernote for and then some.

It even has vim bindings now!

As far as longevity goes, I think they got their archive / backup format right
- it's just a tarball with markdown in it.

~~~
dragonsh
No need for proprietary code and apps; why not build it into browsers? I have
seen that Firefox and Chrome can download web pages. So it would be nicer if
they could download the bookmarked pages and store them in a local folder of
html, css, and images. I think it's pretty easy to achieve.

Also, people need to move away from those esoteric reactjs, angular, and vuejs
setups, and the plethora of CMS-as-API or static site generators relying on
some js framework which won't last even 2-3 years. Use a static site generator
which can generate plain html, like the static site generators built on
pandoc, python docutils or similar.

Personally I like restructuredText as the preferred format for content, as
it's a complete specification and plain text. So the only thing in this
article I would change is that content can also be in rst format, with the
html generated from it. Markdown is not a specification, as each site
implements its own markdown directives; unlike with the restructuredText
specification, most of the markdown parsers and tooling differ a little from
each other.

~~~
dbtx
> Markdown is not a specification

Not by that name... [https://commonmark.org/](https://commonmark.org/)

~~~
dragonsh
It is still not a specification like restructuredText [1]. Also, wikiMarkup
(which really started this markdown trend) is different from GitHub markdown,
which is different from other markdown editors. Many sites also use their own
markdown versions.

If you are in the restructuredText world there is one specification, and all
implementations adhere to it, be it pandoc, sphinx, pelican, or nikola. The
beauty of it is that it has extension mechanisms which provide enough room for
each tool to develop, yet the markup can still be parsed by any tool.

[1]
[https://docutils.sourceforge.io/docs/ref/rst/restructuredtex...](https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html)

~~~
m463
I don't know why markdown is so popular other than maybe "it was easy to get
running" or "works for me".

It's better than "designed by a committee" standards, but it lacks elegance or
maybe craftsmanship.

~~~
setr
Because it's inherently appealing, close to what you wanted intuitively, and
if you're only dealing with a single implementation of it, it works fairly
well.

You don't really get bit by its lack of a standard and extensibility until
after you've bought in.

It's essentially designed by the opposite of a committee -- rather than
including everything but the kitchen-sink, it contains support for almost no
usecases except the one. Which is very appealing, when you only have the one
usecase.

~~~
dragonsh
Well, rst was better than markdown from day one. The only reason markdown
became famous is thanks to wikimarkup.

So markdown needs to thank the popularity of Wikipedia for its success, as rst
did not have any application like Wikipedia behind it. But rst is still used
widely enough, with its killer apps Sphinx and readthedocs, and it is now the
de-facto documentation markup in Python and much of the open source software
world.

------
mark242
I fundamentally agree with the principle -- that pages should be designed to
survive a long time -- however, I completely disagree with the steps the
author lays out.

"The more libraries incorporated into the website, the more fragile it
becomes" is just fundamentally untrue in a world where you're self-hosting all
of your scripts.

"Prefer one page over several" is diametrically opposed to the hypertext
model. Please don't do this.

"Stick with the 13 web safe fonts" assumes that operating systems won't
change. There used to be 3 web safe fonts. Use whatever typography you want,
so long as you self host the woff files.

"Eliminate the broken URL risk" by... signing up for two monitoring services?
Why?

I think this list of suggestions does a great disservice to people who just
want to be able to post their thoughts somewhere. There's an assumption here
that you'll need to be technically capable in order to create a page "designed
to last" and frankly that is not what the internet is about. Yes, Geocities
went away. Yes, Twitter and Facebook and even HN will go away. But the answer
sure as hell isn't "I teach my students to push websites to Heroku, and
publish portfolios on Wix" because that is setting up technical gatekeeping
that is completely unnecessary.

~~~
notatoad
> Use whatever typography you want, so long as you self host the woff files.

or use Google Web Fonts, and set the last option in your font-family to
"serif" or "sans-serif" to let an appropriate typeface be used if your third-
party font is unreachable. That's the beauty of text: the content should still
be readable even if your desired font is unavailable.

~~~
ryandrake
Or don’t specify any font at all and leave it up to the user’s preference. Why
presume you know better than the user?

~~~
hashmal
When you go to a restaurant you let the chef prepare food for you.

Telling him to back off and let you cook because he can't know better than you
(his user) would be absurd.

Same thing with design and typography. It requires skill and taste, and
hopefully people will be delighted or simply consume the content for what it
is, because the design/cooking just reveals that content in a
convenient/useful shape.

~~~
bjoli
Most fonts picked by designers suck. Plain and simple. I override fonts for
most websites I frequent.

~~~
hashmal
Can you elaborate on why/how they suck? Do you have example links, to set a
common ground for the conversation?

I think most fonts that get your attention suck, the best ones are invisible
and get you directly to the meaning of text, without getting in the way. So
maybe there's a kind of bias (selection or sampling bias?) operating here?

~~~
a1369209993
Because they are not the single system default sans-serif and single system
default sans-serif-monospace fonts that all websites MUST use, period, no
discussion. As you put it:

> fonts that get your attention suck

If I can tell the difference between your font and the system default font,
your font sucks; if I _can't_ tell the difference, what's the damned point?

~~~
strenholme
> the single system default sans-serif and single system default sans-serif-
> monospace fonts that all websites MUST use, period, no discussion.

The web standards allow a website to use any WOFF (or WOFF2) font they wish to
use. Please see [https://www.w3.org/TR/css-
fonts-3/](https://www.w3.org/TR/css-fonts-3/)

~~~
a1369209993
The web standards are wrong. This shouldn't be surprising, since they also
allow a website to use javascript and cookies.

~~~
strenholme
Well, if it makes you feel any better, my website renders just fine on Lynx
(no Javascript nor webfonts needed to render the page), complete with me
putting section headings in '==Section heading name==', which is only visible
in browsers without CSS. Browsers with modern CSS support see the section
headings as a larger semibold sans-serif, to contrast with the serif font for
body text. [1]

[1] There are some rendering issues with Dillo, which made the mistake of
trying to support CSS without going all the way and making sure that
[http://acid2.acidtests.org](http://acid2.acidtests.org) renders a smiley
face, but even here I made sure the site can still be read.

[2] Also, no cookies used on my website. No ads, no third party fonts, no
third party javascript, no tracking cookies, nothing. The economic model is
that my website helps me get consulting gigs.

[3] I do agree with the general gist of what you’re trying to say: HTML,
Javascript, and CSS have become too complicated for anything but the most
highly funded of web browsers to render correctly. Both Opera and Microsoft
have given up with trying to make a modern standards compliant browser,
because the standards are constantly updating.

~~~
a1369209993
> Well, if it makes you feel any better, my website renders just fine on Lynx

It doesn't; I only use lynx when someone tricks apt-get into updating part of
my graphics stack (xorg, video drivers, window manager, etc) and research is
needed to figure out how to forcibly downgrade it, and then only because I
_can't_ use a proper browser without a working graphics stack.

> the general gist of what you're trying to say: HTML, Javascript, and CSS
> have become too complicated for anything but the most highly funded of web
> browsers to render correctly.

This is subtly but critically wrong; I am saying that it is _necessary_ that
web browsers _do not_ render websites 'correctly'. The correct behaviour is to
_actively refuse_ to let websites specify hideous fonts, snoop on user viewing
activity, or execute arbitrary malware on the local machine.

> Browsers with modern CSS support see [...] the serif font for body text.

My point exactly.

------
atoav
When I studied media science, one of the most lasting experiences I had was a
talk with a lady from the Viennese film museum (one of the few film museums
that store actual films instead of film props).

As a digital native I had never given it a thought, but she told me that there
is a collective memory gap for films that have been shot or stored digitally.
With stuff that had been stored on film, there was always _some_ copy in some
cellar, and they could make a new working copy from whatever they found. With
digital technology this became much, _much_ harder and more costly for them,
because it often means cobbling together the last working tape players and
maintaining both the machines and the knowledge of how to maintain them. Add
in the stuff on hard drives, with a hundred different codecs that won't run on
just any machine, and all of this combined into something she called the
_digital gap_.

I had never thought about technology in that way. Nowadays this kind of
robustness, archivability and futureproofing has become a factor that drives
many of my decisions when it comes to file formats, software, etc. This is one
of the main reasons why I dislike relying solely on cloud-based solutions for
many applications. What if that fancy startup goes south? What happens to my
data? Even if they allow me to get it in a readable format, couldn't I have
just avoided that by using something reliable from the start?

I grew to both understand and like the unix mantra of small independent units
of organization -- trying as hard as possible not to make software and other
things into an interlinked ball of mud that falls apart once one of the parts
stops working for one reason or another. Thinking about how your notes, texts,
videos, pictures, software, tools etc. would look in a quasi post-apocalyptic
scenario can be a healthy exercise.

~~~
Twisell
On this subject you can dive into the story of the missing "Doctor Who" TV
serials.

Some master tapes were infamously reused to store other content. Basically the
whole archive problem came from the reusable nature and scarcity of the chosen
storage medium. I think I've read something about reusing paper as well in
medieval times.

[https://en.wikipedia.org/wiki/Doctor_Who_missing_episodes](https://en.wikipedia.org/wiki/Doctor_Who_missing_episodes)

~~~
vlz
> I think I've read something about reusing paper as well in medieval times.

This mostly happened with parchment, not paper, but otherwise you are right.
It is called a palimpsest.[1] Sometimes the writing under the writing can be
reconstructed as happened with the oldest copy of Cicero's Republic.[2]

[1]:
[https://en.wikipedia.org/wiki/Palimpsest](https://en.wikipedia.org/wiki/Palimpsest)

[2]:
[https://www.historyofinformation.com/detail.php?entryid=3059](https://www.historyofinformation.com/detail.php?entryid=3059)

------
BiteCode_dev
The author says:

"that formerly beloved browser feature that seems to have lost the battle to
'address bar autocomplete'."

But at least in Firefox, if you type "*" and then your search terms in the URL
bar, it actually queries your bookmarks!

There are many such operators, you can search in your history ("^"), your tags
("+"), you tabs ("%"): [https://support.mozilla.org/en-US/kb/address-bar-
keyboard-sh...](https://support.mozilla.org/en-US/kb/address-bar-keyboard-
shortcuts)

My favorite is "?", which is not documented in this link. It forces search
instead of resolving a domain name.

E.g.: if I type "path.py", looking for the python lib with this name, Firefox
will try to go to [http://path.py](http://path.py), and will show me an error.
I can just add " ?" at the end (with the space) and it will happily search.

It's a fantastic feature I wish more people knew about.

It's very well done too, as you can use it without moving your hands from the
keyboard: Ctrl + L gets you to the URL bar, but Ctrl + K gets you to the URL
bar, clears it, inserts "? ", then lets you type :)

It's my latest FF illumination; the previous one was discovering that Ctrl +
Shift + T reopens the last closed tab.

~~~
abrolhos
Not sure you're aware of this one too... But, you might like the "Ctrl+Tab"
shortcut as well. With it you can alternate between the last few active tabs,
with thumbnails. Really handy.

~~~
chrismorgan
I don’t think I’ve come across a single Firefox user that ever uses keyboard
shortcuts that has left it that way—all have found the “Ctrl+Tab cycles
through tabs in recently used order” preference and turned it off, so that it
goes through tabs in order, like literally every other program I’ve ever
encountered does with tabs. (Yes, _Alt_ +Tab does MRU window switching, but
that has never been the convention for _Ctrl_ +Tab tab switching.)

Mind you, MRU switching is still useful behaviour; Vim has Ctrl+^ to switch to
the alternate file which is much the same concept, and Vimperator _et al._
used to do the same (on platforms where Alt+ _number_ switched to the numbered
tab, rather than Windows’ Ctrl+ _number_ ), no idea whether equivalent
extensions can do that any more. I have a Tree Style Tab extension that makes
Shift+F2 do that, and it suits me.

~~~
ihuman
If you keep Control+Tab set to cycle through tabs in recently used order, you
can use Command-Shift-Left/Right or Control-PageUp/PageDown to cycle through
tabs in tab-bar order instead.

Additionally, you don't need an extension to jump to a tab anymore.
Command-[1-8] goes to that number tab in the current window, where 1 is the
leftmost tab. Command-9 goes to the rightmost tab.

------
joshspankit
I would actually shift this quite a bit to say if you’re designing your page
to last 10 years, put it on the internet archive on day 1.

Invite them to crawl it, verify the crawl was successful, and even talk about
that link _on your page_.

It removes the risk of domain hijacking, hosting platforms shuttering, and the
author losing interest. P.S. The Internet Archive is doing excellent work.
Support them.

~~~
majewsky
And as you give a content donation, please also consider a monetary donation
to keep the lights on at the Internet Archive:
[https://archive.org/donate/](https://archive.org/donate/)

~~~
sudo_rm
I started a recurring donation through your link. Thanks for posting this.

------
PaulRobinson
A shift to independent publishing is needed. I used to have sites that died
because the upkeep became tiresome, and if I - a professional developer with
almost 25 years' experience of writing web applications - find it tiresome,
can we blame people for wanting to use the big platforms?

I think using a static site generator might be OK. Common headers and footers
help, and RSS is definitely a good thing, but it seems to be dying.

One idea from this article I liked was "one page, over many". I don't think he
meant having one single page on your website, but rather one per directory:
like he has with this article, have one directory for a thought or essay or
piece of something you want documented, and just have an index.html in it.

I like this because I think the one thing that has killed off most personal
websites is not the tech tool chain, but that "blogging" created an
expectation of everybody becoming a constant content creator. The pressure to
create content and for it to potentially "go viral" is one of several reasons
I just tore down several sites over the years.

Around this time of year I take a break from work and think about my various
side projects, and sometimes think about "starting a blog again". I often
spend a few hours fiddling with Jekyll or Hugo, both good tools. Then I sit
and think about the relentless pressure to add content to this "thing".

I like this idea instead though. No blogs. No constant "hot takes" or pressure
to produce content all the time. Just build a slowly growing, curated, hand-
rolled website.

I still think there might be a utility in having a static site build flow with
a template function, but a simple enough design could be updated with nothing
more than CSS.

A bit to think about here... interesting.

~~~
lunchables
I use a combination of asciidoc and hugo to generate my static website. It
means that I can easily regenerate the website using whatever tool I want in
the future or even just easily update the template for the existing site. If
something happens to asciidoc, there are lots of converters that would allow
me to move to another format or presumably some format in the future. Markdown
and restructuredtext are also really good options.

------
fyp
I don't think there's any good solution to the dead link problem. For example
there are 11 links in this article:

    
    
      https://jeffhuang.com/
      https://gomakethings.com/the-web-is-not-dying/
      https://archivebox.io/
      https://webmasters.stackexchange.com/questions/25315/hotlinking-what-is-it-and-why-shouldnt-people-do-it
      https://goaccess.io/
      https://victorzhou.com/blog/minify-svgs/
      https://evilmartians.com/chronicles/images-done-right-web-graphics-good-to-the-last-byte-optimization-techniques
      https://caniuse.com/#feat=webp
      https://uptimerobot.com/
      http://www.pgbovine.net/python-tutor-ten-years.htm
      http://jeffhuang.com/designed_to_last/
    

How many of these will still be alive in 10 years? How many times do you have
to fix your page to make it "last"?

~~~
notatoad
I think the Stack Overflow guidelines have "solved" this problem in about the
cleanest way currently possible: expect links to die, and include the relevant
information in your answer.

If the link still works when it gets clicked on that's a bonus, but it
shouldn't need to be available for the content you're reading to be
understandable.

~~~
juststeve
And there are also the HTTP 3xx redirect codes if content has been moved.
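
For instance, Python's requests library exposes the chain of 3xx responses it
followed, so you can see where moved content ended up (the URL here is a
placeholder):

    import requests

    r = requests.get("http://example.com/old-page", allow_redirects=True)
    for hop in r.history:                   # each 3xx response, in order
        print(hop.status_code, hop.url, "->", hop.headers.get("Location"))
    print(r.status_code, r.url)             # the final destination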

~~~
hk__2
These work only if you move stuff around on the same website. If you switch
domains you can't just ask the new domain owner to redirect requests to your
new website.

------
einpoklum
Make sure that archive.org - the Internet Archive - catches your website in
"The Wayback Machine". Catering to that is a pretty good strategy for
archiving for at least the next couple of decades, considering that
institute's staying power.

And on that note - consider donating to them.

~~~
1f60c
The Internet Archive is a fantastic resource. And right now, they happen to
match every donation two to one (so $5 becomes $5 + 2 * $5 = $15!).

~~~
kortilla
How do they match a donation to themselves?

~~~
trishmapow2
They currently have a deal with a donor who will donate $2 for every $1 the
Archive gets in that time period.

~~~
bonoboTP
Just sign such a deal with two donors and boom, feedback loop, exponential
growth, infinite money!!

------
whack
Let's be honest with ourselves. The best way to make your content last for a
_long_ time is to host it on a platform that is free and very successful. For
example, whatever photos I posted on Facebook 12 years ago? Still alive and
kicking. The articles I've published on wordpress.com 7 years ago? Still in
mint condition, with 0 maintenance required.

In comparison, the websites that I've built and hosted or deployed myself,
have constantly required periodic work just to "keep the lights on". I went
out of my way to make this as minimal and cheap as possible, but even then, it
hasn't been nearly as simple as the content I've published on wordpress.

At some point, people's priorities change. Perhaps due to new additions to the
family, medical circumstances, or even prolonged unemployment. And when that
happens, even the smallest amount of upkeep, whether it is financial,
technical or simply logistical, becomes something they have no interest in
engaging with.

If we really want our content to last, not just for 10 years but for a
generation, our best bet is to publish it on a platform like wordpress.com.
One which requires literally zero maintenance, and where all tech
infrastructure is completely abstracted away from you. I know this isn't going
to be a popular idea with the HN crowd, and I do not blame anyone at all for
wanting to keep control over their content. People are free to optimize along
whatever dimensions they wish. But if I had to bet on longevity, I would bet
every time on the wordpress article over the self-hosted one.

~~~
winfred
> Let's be honest with ourselves. The best way to make your content last for a
> long time is to host it on a platform that is free and very successful. For
> example, whatever photos I posted on Facebook 12 years ago? Still alive and
> kicking. The articles I've published on wordpress.com 7 years ago? Still in
> mint condition, with 0 maintenance required.

Your view on timelines is too short. We're not talking about keeping something
online for 7 years, but for 70. If I had followed your advice a few years ago,
I would have deployed on Geocities. Do you know what happened to those
websites?

The question is, is wordpress going to be around in 70 years? No one knows.
But that static HTML page will still render fine, even if it is running in a
backward compatibility mode on your neurolink interface.

~~~
whack
> _The question is, is wordpress going to be around in 70 years? No one knows.
> But that static HTML page will still render fine_

The question isn't whether wordpress will be around in 70 years, but whether
it will outlast your self-hosted website. Anything that is self-hosted
requires significantly more financial/logistical maintenance, and what is the
likelihood of someone continuing to do that for 70 years?

~~~
lunchables
For me it's very easy because those domains are also tied to my email and all
of my other hosted services (gitea, tt-rss, etc.) all use the same domain. So
it's very easy to remember to keep them all alive and active. I've had domain
names active far longer than Wordpress has existed.

------
tgbugs
The issues outlined here are one of the reasons that I am moving as many of my
workflows to org-mode as possible. Everything is text. Any fancy bits that you
need can also be text, and then you tangle and publish to whatever fancy
viewing tool comes along in the future.

I don't have a workflow for scraping and archiving snapshots of external
links, but if someone hasn't already developed one for org I would be very
surprised.

In another context I suggested to the hypothes.is team that they should
automatically submit any annotated web page to the internet archive, so that
there would always be a snapshot of the content that was annotated; I'm not
sure whether that came to fruition.

In yet another context I help maintain a persistent identifier system, and let
me tell you, my hatred for the URI spec for its fundamental failure to
function as a time invariant identifier system is hard to describe in a brief
amount of time. The problem is particularly acute for scholarly work, where
under absolutely no circumstances should people be using URIs or URLs to
reference anything on the web at all. There must be some institution that
maintains something like the URN layer. We aren't there yet, but maybe we are
moving quickly enough that only one generation's worth of work will vanish
into the mists.

~~~
Mediterraneo10
> The issues outlined here are one of the reasons that I am moving as many of
> my workflows to org-mode as possible. Everything is text.

That works for some, even most people. Unfortunately, the content I create
will inevitably cite material in languages other than the main document
language. That means that I have to heavily use HTML span lang="XX" tags to
set the right language for those passages, so that (among other things) users
with screenreaders will get the right output. As far as I know, org-mode lacks
the ability to semantically mark up text in this way.

~~~
tgbugs
If it is for blocks of text then you could use #+BEGIN_VERSE in combination
with #+ATTR_HTML, or possibly create a custom #+BEGIN_FRENCH block type, but I
suspect that you are thinking about inline markup, in which case you have two
options: one is to write a macro {{{lang(french,ju ne parle frances)}}} and
the other would be to hack the export-snippet functionality so you could write
@@french:ju ne parle frances@@ and have it do the right thing when exporting
to html. The macro is certainly easier, and if you know in advance what
languages you need it shortens to {{{fr(ju ne parle frances)}}}, which is
reasonably economical in terms of typing.

------
scotty79
All the points about using simple html don't do anything for website rot.

Sites I built over a decade ago with tables and a custom js display framework,
before people started abusing floats for layout and before js frameworks
happened, still display perfectly today.

Pages die because domains and hosting get abandoned, or because websites get
upgraded without paying attention to the old link format.

If you want your pages to last buy hosting that automatically charges your
credit card and use a company that encourages your cc info to be up to date
(like Amazon).

Also, never revamp your sites; just make new ones in subfolders or on new
(sub)domains. And if you absolutely need to upgrade an existing site, pay very
close attention so that it accepts the old url format and directs users to the
correct content.

------
xwowsersx
Maybe I'm dense, but I'm having trouble understanding what is so difficult
about keeping content around. It seems like the issue of webpack and node and
all the other things he mentions in the article aren't really problems with
content per se. You can just publish your thoughts as a plain text file or
markdown or whatever and you're good to go. I'm having a hard time thinking of
types of content that are really tied to a specific presentation format which
would require a complex scaffolding. A single static page with your thoughts
is sufficient and should require no maintenance to keep around. I do agree
though that even static site generators create workflows that get in the way.
I'd love to see an extreeeemely minimal tool which lets you drop some files in
a folder and then create an index page that links to those. You could argue
that's what static site generators pretty much do, but they do seem to be more
complex than that in practice. Remember deploying a web site with FTP? I have
to say that was simpler for the average person than what we have today. I
think that, in some ways, the complexity is what ends up pushing people
towards FB, Medium, etc as publishing platforms.

~~~
every
"I'd love to see an extreeeemely minimal tool which lets you drop some files
in a folder and then create an index page that links to those."

I use the tree command on BSD to do just that. It has the option of creating
html output with a number of additional options.

An example: tree -P '*.txt' -FC -H http://baseHREF -T 'Your Title' -o index.html

~~~
xwowsersx
Oh wow thanks for sharing. I use tree all the time, but had no idea you could
do this. baseHREF should be the full root domain, e.g. example.com?

~~~
every
Yeah. And play around with it. It's quite flexible. Color, CSS and other
goodies. Takes me about 2 seconds to update my entire site...

------
mellowdream
This is why I've been saving PDFs/HTMLs or even just taking screenshots of
webpages I find especially meaningful or important to me... Archiving things
as files this way can get kind of tedious and definitely feels primitive at
times (we made the LHC, yet here I am not expecting the same pages of cool
interviews, designs, etc. to be up next month), but what can you do?

But it's nice to know you're not alone in wanting nice things to last :)

~~~
whateveracct
Same here. The new iOS full-page capture + iCloud files works great for this.

~~~
saagarjha
"Printing" the page might be higher-fidelity in the amount of information it
contains.

------
saagarjha
I've put a bit of thought into what'll happen to my website if I were to die.
It's hosted on GitHub Pages right now, so at some point GitHub is going to
disappear or stop offering the service. Even before that, I honestly don't
know what happens to renewing payments–I guess my Google Domains payment will
stop and somebody will squat on my domain soon after? archive.org might be the
only thing keeping the information around…I hope I've done a good job of
making it archivable; as a matter of policy, there's no JavaScript. There's a
snapshot from earlier this year and it looks fine, so maybe it'll outlast me?

~~~
lioeters
I recently learned about GitHub Archive, which plans to last "at least 1,000
years".

[https://archiveprogram.github.com/](https://archiveprogram.github.com/)

It's specifically for open source software, but I wonder if it can be spun out
into personal archives or websites.

---

Some years ago, I had an idea for a thought experiment called the Eternal Bit
- various angles on the practicality of preserving a bit state "forever".

------
Brendinooo
This is nice, but that webfont bullet is a weird take. If you're hosting the
font yourself and you're using a modern format, fonts don't add a lot of
overhead, and if they happen to fail then the browser will take the next item
down the stack. It's textbook progressive enhancement. Nothing about adding a
webfont properly will prevent Web content from lasting and being maintained
for 10 years.

"your focus should be about delivering the content to the user effectively and
making the choice of font to be invisible, rather than stroking your design
ego" \- these aren't the only two options?

~~~
Pigo
This definitely reads like something one of my professors would have written.
Someone with a lot of good knowledge, but lacking some context for things you
learn in the field.

------
rodw
I think this article is excellent, but one small nit: isn't it contradictory
to say _don't_ minimize HTML but _do_ minimize SVG?

The justification in the HTML case is that "view source is good" and "it's
compressed over the wire anyway". Don't those arguments apply equally (or
nearly equally) well to SVG?

~~~
nrp
I think the argument is that HTML can and should be human readable and
editable, while you really need a tool to meaningfully create and edit SVGs
anyway, so minifying isn’t a loss.

~~~
chrismorgan
Depends on the SVG. I regularly hand-write SVG diagrams. But I won’t claim to
be normal!

------
lmm
HTML is a terrible authoring format. CSS is a terrible everything. If you want
something to last, the thing that will last is the human-created source -
probably markdown.

I'm not worried about my blog posts sitting in their git repository being
lost. The Jekyll pipeline that adds a Javascript header/footer might go away,
as might the Javascript that prettifies my raw posts, but the markdown is
durable, and a future archivist could always regenerate a pretty version from
the markdown - or even read the raw markdown.

Give browsers a good way to view markdown, give site creators a good way to
link to canonical source for their pages, and then we'll have durable links.

~~~
chrismorgan
No, I trust HTML and CSS to stand the test of time far more than Markdown. Do
you know how many different variants of Markdown there are, how they affect
the interpretation and appearance of the content, how they can break
surprisingly much? And when it comes down to it, as soon as you want to do
anything even _remotely_ interesting in Markdown, you have to drop straight
HTML in there, and pray that the Markdown engine will do the right thing,
since the rules of how it should all work are _insanely_ complex, and vary
widely by engine, for all that there’s a general trend towards CommonMark
which at least _specifies_ the madness and folly. (I’m sad that Markdown won
over reStructuredText, which was actually _designed_.)

HTML is a quite acceptable authoring format, one that readily lets you do
interesting things if you desire—though making it _possible_ can certainly be
a footgun. CSS is a reasonable styling language.

~~~
lmm
> Do you know how many different variants of Markdown there are, how they
> affect the interpretation and appearance of the content, how they can break
> surprisingly much?

One plus a bunch of non-standard extensions. Do you know how many different
variants of _HTML_ there are, how rare it is for real-world HTML to actually
conform to any of them, and how many different interpretations of that there
are? (To say nothing of CSS, the standard so complex that it's never actually
been implemented.)

> And when it comes down to it, as soon as you want to do anything even
> remotely interesting in Markdown, you have to drop straight HTML in there

Just say no. If your goal is writing something intended to last, you should be
able to convey it in mostly-plain text.

~~~
chrismorgan
Nope, _definitely_ not one Markdown. Reddit uses three different engines with
major mutual incompatibilities and mostly forbids all HTML (which frankly I
deem enough to call it not real Markdown); Stack Overflow uses two with what
used to be critical deficiencies and incompatibilities in the one used for
comments, but I think they’re now mostly minor only; some things still use
Markdown.pl which does many bizarre things; some, other weird engines with
idiosyncrasies of their own; most more recent Markdown engines are only
_mostly_ CommonMark-compatible, regularly deviating in important places, and
_very often_ adding incompatible extensions. It’s a _disaster_ at present, and
I don’t expect it to get much better for at least a decade, if ever. (HTML got
better with HTML5, but I don’t think Markdown is likely to unite so firmly,
because people want more than non-HTML Markdown offers, and so will continue
to extend it.)

HTML, though? Since the HTML5 spec about a decade ago, there has only been one
HTML, with all browsers parsing and handling documents identically. CSS is
similarly parsed and interpreted according to well-defined algorithms now.
There are some visual rendering differences between browsers in how CSS is
handled, but it is exceedingly rare for them to affect the content.

And for your complaints about CSS, it’s not _intended_ as a single thing to
implement in one piece—it’s deliberately designed as something that is
extended over time. But if you write CSS that works in browsers now, then
presuming you haven’t used vendor-prefixed stuff, it’s reasonable to expect it
to work the same way indefinitely.

------
fouc
Webpages should be archiver friendly.

Imagine sticking a proxy between your browser and the internet that
automatically archives every webpage you attempt to browse to, and then only
lets you view the archive. How much of the internet would you be able to see?

------
zZorgz
This website needs to scale text better on mobile (on an iPhone) so it's not
hard to read, especially for a post advocating the use of vanilla HTML and how
nice and powerful it is.

I don't think I particularly disagree with any of the post, but I found it a
little long-winded.

~~~
superkuh
Catering to energy limited, heat dissipation limited, UI size and precision
limited, and network limited (random rtt) smart phones is why the web has
become as bad as it has.

~~~
saagarjha
Not really. With a little bit of effort, it's quite possible to make websites
that work on mobile devices as well as desktops.

------
6510
I have to compliment Jeff for first telling me unpopular things I already
believe, then adding sensible things to it (except the link back to his page).

The idea of ForEach person writing just one single web page is really
wonderful. I'm going to have to deeply ponder that and make one.

While I really like the permanent web, and losing centralized control over
data is a price worth paying (.... no wait, I would pay money not to have
this), the winning method of making cars last 100 years is maintenance, not
build quality.

That said, here are some similar ideas of mine:

I often add a torrent magnet uri when I link to youtube. I seed those torrents
myself and sometimes someone helps. If the content was good enough and yt
deletes it for whatever [stooopid] reason, there would be more seeds.

Here [http://dr-lexus.go-here.nl](http://dr-lexus.go-here.nl) I instead try to
"sell" the idea that [besides stuff just breaking] people use and will use a
combination of Adblock Plus, NoScript, RequestPolicy, Ghostery and JSOff to
break your stuff.

You can put a really tiny bit of css inline and at least render the page
properly if the css fails for whatever reason.

I really dislike how our websites merely provide just one location for
everything so I wrote this:

    
    
       <img src="http://example.com/img.jpg" 
         data-x="0" 
         onerror="a=[
           'http://example.com/img.jpg',
           'http:/example.com/img2.jpg', 
           'http://img.go-here.nl/michael-faraday.jpg'
         ];
         this.src=a[this.dataset.x++]" >
    

The page has an example falling back to a data-uri, which imho looks
surprisingly decent for its size. It's way better than having a hole in the
page.

~~~
ShamelessC
Hm, I really like the idea of using torrents as a fail-safe for media. How do
you go about generating a magnet link for your videos?

~~~
genuinebyte
I'd like to chime in here and suggest you have a look at PeerTube.
[https://joinpeertube.org/en/](https://joinpeertube.org/en/)

I haven't tried it yet but it uses the BitTorrent protocol for video
retrieval, so there is your magnet link? Of course it is self-hosted if you
can't find an instance that'll take you, but it might be a good option.

------
simonsarris
Good rules, I follow all of them, making a pure HTML/CSS single page homepage
of uncompressed HTML, except... it's not at all designed to last. I nuked my
last site to make it, and I'll nuke this one one day when I wanna make
something even cooler. Website as art installation plus a few links.

My _content_ is spread about, on medium, on github, twitter, instagram,
wherever. But this too I feel is mostly ephemeral. It's still not clear to me
why that's a bad thing. I dislike hoarding physical objects. I'm not sure
about digital, either, I suppose. So one alternative is to free yourself from
the idea that it needs to be hoarded, that all your works need to be maximally
legible and easy to find, etc. I do suppose if you're a professor your
students should probably be able to find the syllabus from your website (which
he has, though it's hosted on Brown's site).

~~~
JasonFruit
> It's still not clear to me why that's a bad thing.

Human advancement comes from the process of building knowledge on knowledge,
using what our predecessors learned to move our starting point forward.
(You're the predecessor in that sentence.)

------
maple3142
I still don't get why not to minify HTML... When you open your browser's
devtools, all the HTML elements are properly formatted and interactive. If you
mean "view-source:", there are tons of HTML beautifiers out there.

------
dirktheman
This is somewhat of a concern to me: I can imagine a distant future where one
of the great mysteries would be the rise and fall of literacy, since almost no
written communication is left and computers and the internet are long
forgotten, obsolete technology.

"In the beginning of the 21st century people suddenly stopped reading and
writing. In the short timespan of only a few decades almost all form of
written communication has vanished..."

~~~
perlgeek
Even if you count only printed books, the number of books published by year is
still rising in the US: [https://www.theifod.com/how-many-new-books-are-
published-eac...](https://www.theifod.com/how-many-new-books-are-published-
each-year-and-other-related-books-facts/)

It seems that the time spent reading decreases slightly.

------
mmsimanga
> But if not, maybe you are an embedded systems programmer or startup CTO or
> enterprise Java developer or chemistry PhD student, sure you could probably
> figure out how to set up some web server and toolchain, but will you keep
> this up year after year, decade after decade?

This describes me. I started on Drupal and am now on Hugo, but I still have to
retrain myself every time I need to update my site. I don't even understand my
own Hugo template. I once set up a batch file to build and upload the web
site. I upgraded my computer and couldn't get the batch file to work. It was
something on my PC or AWS; most likely my incompetence and lack of time to
find the issue. I am going back to HTML and FTP. The funny thing is I learnt
HTML in college and 20 years later I can jump into it without much fuss.

------
Animats
I have pages on my web site that are 24 years old. Here's one.[1]

[1]
[http://animats.com/papers/articulated/articulated.html](http://animats.com/papers/articulated/articulated.html)

~~~
saagarjha
Older than many Hacker News readers!

------
tannhaeuser
I'd like to point out SGML as a format for very long-term document storage and
sustainable authoring. I've written up a tutorial on using SGML for preserving
content [1] that I presented at this year's ACM DocEng conference.

[1]: [http://sgmljs.net/docs/sgml-html-
tutorial.html](http://sgmljs.net/docs/sgml-html-tutorial.html)

------
vkou
> Well, people may prefer to link to them since they have a promise of working
> in the future.

In a world where 95% of the google search results for <common problem> are
forum threads, with everyone 'answering' the question by saying 'just google
it, dumbass' [1], I don't think people - as in, the common homo sapiens - care
about the longevity of the content they link to as much as the author thinks.

Quality of the content (does it have the information you want?) is king.
Longevity is a 'future me' problem, and 'future me' is incredibly
shortsighted.

[1] Thanks, guys, how did you think _I_ found this forum thread?

------
franze
I wanted to make a pre-MVP leads landing page this week. Researched new static
site builders and templates. Chose one, ran into a build problem, then saw
that the docs were not up to date... all in all, half a day with no hard,
ready-to-launch outcome.

Took a step back: what is the most minimal thing I needed?

Took html5 boilerplate, copy&pasted the index.html, put normalize.css inline.

Got rid of everything else. Minimal Html, an image, form mailer. Done and
launched [https://www.securrr.app](https://www.securrr.app) on Netlify in less
than an hour.

Overengineering is real. Choose a goal, and use the simplest solution to get
there.

------
carlosdp
I was expecting the solution to be mirroring your generated pages on IPFS
([https://ipfs.io](https://ipfs.io)), so they just don't go away at all (as
long as someone has them pinned).

The proposed solution set seems extremely convoluted and doesn't actually
solve the issue.

~~~
jtbayly
That’s quite the caveat, and speaking as somebody who has attempted it, you’ve
introduced quite a bit of complexity. The whole point is that complexity
militates against keeping it online. Keeping it simple, the author’s theory
seems to go, is the single most effective way to make something likely to be
able to be available long term.

I think he’s probably right.

~~~
carlosdp
None of what he wrote mitigates the case where he no longer maintains his site
and stops paying for the server, which is the very reason all those links he
mentioned were dead (defunct websites/hosts).

~~~
chippy
The article was about non-techies not having to maintain a site. WE can
happily maintain a range of things, but a chemistry doctoral student will hit
a hurdle and may give up.

Paying for a server - that's mentioned in the article, around the provider
changing access.

It's all about hurdles.

------
ignoramous
Also see, _how to build a low-tech website_ :
[https://news.ycombinator.com/item?id=18075143](https://news.ycombinator.com/item?id=18075143)

------
Abishek_Muthian
> I don't even know any web applications that have remained similarly
> functioning over 10 years

HN is a good candidate to be included in his example, it's been the same for
12 years?

~~~
kristopolous
It's changed a bit, fairly fundamentally (hiding points, web view, collapsible
conversations, ...). The chan sites haven't; they're probably the closest
interactive ones. Metafilter also hasn't, but it's pretty small.

~~~
jacquesm
> Metafilter also hasn't but it's pretty small

That is why it is still around. Things that grow eventually grow out of
control and then people will move on.

------
alpb
[http://danluu.com/](http://danluu.com/), which frequently appears on HN, is
one of those sites that are absolutely minimal.

~~~
juliend2
On my large screen, the lines are so long that it's difficult to read. But
other than that, it's a nice inspiration for how minimal a website can be.

~~~
taftster
FYI. The content looks real nice in "reader" mode (at least in Firefox).

I kind of think this is the best approach: write your content with as minimal
markup as possible, and let my user-agent render the page in my own styling.

I wish the web was a bit more like the gopher protocol, where the page markup
was very minimal. Just give me the plain text (or close to it) and let my
client render it the way I want to read it.

Markdown (or similar) would be an ideal representational format. This would
push all the rendering decisions to the client.

~~~
saagarjha
Even better (IMO, because that's what I do) is to provide your own markup that
is reasonable, but let the underlying semantic HTML be there too, so people
can discard your styling and use their browser's reader mode instead.

------
Mojah
Re: "Eliminate the broken URL risk"

This approach of monitoring a single URL is useful, especially if you can use
the free version of monitoring services. It doesn't, however, go far enough.

Does your site only consist of a single page? Are there no links between
pages? Or links to external resources?

There are few things as off-putting to your visitors as presenting them a link
that ends in a 404 page.

Forgive the shameless plug, but this is the exact reason we built the Oh Dear
[1] tool. We crawl the site, much like Google, to find those pages and present
them to you so you can fix them.

It's not that expensive and it covers your _entire_ site, not just a single
page that is designed to last. I hope your _entire_ site is designed to last;
we'd like to help make that happen.

[1] [https://ohdear.app/](https://ohdear.app/)
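
The core of such a crawl is simple enough to sketch; the Python below is just
an illustration of the idea, not how Oh Dear is actually implemented, and the
start URL is a placeholder:

    # fetch pages of one site, follow its internal links, report broken ones
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse
    import requests

    class LinkParser(HTMLParser):
        """Collect href values from anchor tags."""
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == 'a':
                self.links += [v for k, v in attrs if k == 'href' and v]

    def crawl(start):
        site = urlparse(start).netloc
        seen, queue = set(), [start]
        while queue:
            url = queue.pop()
            if url in seen:
                continue
            seen.add(url)
            try:
                r = requests.get(url, timeout=10)
            except requests.RequestException:
                print('DEAD', url)
                continue
            if r.status_code >= 400:
                print(r.status_code, url)
                continue
            # only parse and follow links on our own site's html pages
            if urlparse(url).netloc == site and 'html' in r.headers.get('Content-Type', ''):
                p = LinkParser()
                p.feed(r.text)
                queue += [urljoin(url, h) for h in p.links
                          if not h.startswith(('mailto:', 'javascript:', '#'))]

    crawl('https://example.com/')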

------
decasia
"I don't even know any web applications that have remained similarly
functioning over 10 years"

I have a WordPress instance that has functioned flawlessly since 2007. I would
call WordPress a web application (though I realize this is not the bespoke
sense of the term that the OP has in mind).

------
el_cujo
I like the general principle here of trying to keep website design relatively
simple and self-reliant, but I'm not sure I agree with the idea that "link
rot" is some horrible problem we desperately need a solution for. Good
articles will be lost in time, but I think that's OK. New good ones will be
written too. It's OK for content to be ephemeral, we don't have to be so
obsessed with FOMO and making sure every blogpost ever made is etched into
diamond to survive for 100 generations. If you read some article you thought
was great, it's OK to just be happy you read it rather than spamming it to the
four corners of the earth to give it as many eyes as possible.

------
trynewideas
Why minify SVGs, but not HTML? Aren't both gzipped by the server? Is it
because you're not expected to hand-edit SVGs in this scenario?

~~~
mr__y
HTML may contain style and script tags, and sometimes minification may break
CSS or JS. However rare that is, if you want your archive to be reliable, you
don't want functionality that may break something. If you intend to reduce
storage space usage, HTML may be gzipped; this is lossless, so it does not
carry a risk of breaking anything.
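
A tiny Python sketch of the lossless point:

    import gzip

    html = b"<html>\n  <body>\n    <p>Hello</p>\n  </body>\n</html>\n"
    packed = gzip.compress(html)
    assert gzip.decompress(packed) == html  # the exact original bytes come back
    print(len(html), "bytes raw,", len(packed), "bytes gzipped")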

~~~
maple3142
If minification breaks your CSS or JS, isn't that the minification tool's
problem?

------
turkeydonkey
I'm disappointed that it's not served up by the one true http server for
static sites,
[https://acme.com/software/thttpd/](https://acme.com/software/thttpd/)

------
JoelMcCracken
Often when I am trying to find a dead link I can find an archived version on
archive.org. Just a reminder for anyone who it might help.

------
keithnz
bit disappointed, mainly because I was expecting something a bit different. I
was hoping to see something that has some ideas that would make things last
beyond the original author, say for 100 years. Of course that's hard to
predict given tech will change, but what the best guess effort to make your
page last as long as it possibly can into the future

You need to address the problem of domain names... the author has a vanity
domain name that needs to be paid for, and will need to continue to be paid
for, so he may need to piggyback on another domain.

Hosting might be tricky

My best guess is MAYBE something like Github would solve both issues....

~~~
susam
> the author has a vanity domain name that needs to be paid for, and will need
> to continue to be paid for, so he may need to piggyback on another domain

This is a very good point. I host my blog on a vanity domain name [1]. The
domain name was accidentally taken away from me [2] for a while, due to which
my blog became unreachable.

As a result of this experience, I now host a mirror of my blog using GitHub
Pages[3]. Further, I have made sure that the entire blog can be downloaded[4]
to local filesystem and viewed locally, i.e., all inter-linking is done using
relative paths only.

[1]: [https://susam.in](https://susam.in)

[2]: [https://susam.in/blog/sinkholed/](https://susam.in/blog/sinkholed/)

[3]: [https://susam.github.io/](https://susam.github.io/)

[4]:
[https://github.com/susam/susam.github.io/archive/master.zip](https://github.com/susam/susam.github.io/archive/master.zip)

------
Tepix
I agree with all the points except for "Obsessively compress your images". The
same argument as point 2 ("Don't minimize that HTML") applies: It's an extra
step in the queue, and you lose some quality in the result.

Besides, bandwidth and storage will only get cheaper. Unless you have a site
that contains thousands or more images, chances are you won't really notice
the difference the compression makes.

Maybe for preservation we should even go in the other direction instead: Make
every image a link to a full-size non-resized version of the image with
optimum quality.

~~~
chispamed
I disagree, image compression is still highly relevant and will stay relevant
for a long time. It’s not about hosting costs and hosting bandwidth, it’s
about user bandwidth.

I frequently travel between countries by train and bus, and when outside
cities it's almost impossible to get any non-optimized page to load. And this
isn't even limited to the time spent inside the train. Depending on the
country (all developed) and which phone operator my phone decides to switch
to, I have slow mobile internet about half of the time. This is also one of
the reasons why I prefer to read HN over other sites: HN loads, others don't.

Most images on a modern website don’t really contribute anything substantial
but are merely decorative. Image quality loss should therefore not be a
problem for ~90% of images.
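
For scale, the compression step being debated can be as small as this sketch
using Pillow (the filenames and quality setting are placeholders):

    from PIL import Image

    img = Image.open("photo.jpg")
    # lossy JPEG re-encode; quality 85 is usually visually indistinguishable
    img.save("photo-small.jpg", quality=85, optimize=True)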

~~~
Tepix
Compare the available mobile bandwidth and cost today with what we had just 5
years ago. We're talking about regular web pages here. Sure, you don't want to
put full-resolution images straight into your page; that makes no sense. But
if you have a regular page with a couple of images, its size won't matter in
most places in a few more years, just as it doesn't matter today if you live
in a more densely populated area. I have a mobile phone contract where I get
10GB of traffic for less than 20€ per month (which seems a lot to me but is
laughable compared to markets such as Kuwait, where contracts with 1TB/month
are normal), and I only manage to use about 3GB per month most of the time.

------
sergimansilla
It sounds like the perfect case for the Beaker browser
([https://www.beakerbrowser.com](https://www.beakerbrowser.com)), which is
backed solidly by the Dat protocol.

~~~
chippy
An _experimental_ browser is to be considered something which will last a long
time? Or would it need lots of maintenance? I can predict that in the next 5
years there will be some api change and users will have to migrate. And some
users will not. If users get dropped, then it is not a perfect thing for
making pages last.

Yes, it probably is great for getting newbies started, but the article was
about things that _last_.

~~~
sergimansilla
Doesn’t seem like you know what you’re talking about. API changes? Sure, if
you’re referring to the HTML language changing. The content does not depend on
the browser being there. It’s just “dat” addresses. Dat is a public p2p
protocol with good success stories, especially in the scientific community.

------
pergadad
Does Google Web Fonts not do some tracking along the lines of analytics? I'd
never integrate a Google service in my site and sell user info out for
little/no benefit.

~~~
stkdump
Yeah, he does recommemd to host the fonts yourself. I guess that renders the
point that the users already have them cached moot...?

------
hoten
The point on not minimizing HTML doesn't make any sense to me. Who looks at
HTML via View Source to understand a site, when the Elements panel is right
there?

~~~
r3drock
I think it has value for less technically literate people.

------
vikeri
One of the issues highlighted is that the ecosystem moves fast and
dependencies tend to break your setup. One thing that at least partially has
mitigated this problem for me is using a stack that is mature and very careful
about breaking changes. Clojure is a great example of that. My gut feeling
tells me that if you had a repo in Clojure that created your static pages 5
years ago (when Java 8 was released) it will still work today.

------
jerezzprime
I'm confused about the widely held opinion that it is paramount that all web
content persist forever. That seems analogous to expecting my personal
handwritten journal to be archived and available to all of future humanity.
I'm not convinced that is a positive thing, nor a necessary one. How
egotistical do you need to be to think that your personal blog is worth
persisting forever?

~~~
ljosa
TFA is written from a reader's perspective: the author had found a bunch of
links important enough to bookmark, and then most of them disappeared.

------
Loranubi
I am all for archiving important web content, but IMO most web content is not
important, and much of it is impossible to archive anyway. Too many pages use
dynamic content, which makes web archives useless. Also, having too much data
just dilutes the useful stuff.

If you have useful content (the author should), store it in some other way.
Put articles in a simple markup format like Markdown in git, distribute videos
using torrents, and so on. For some types of content I couldn't think of a
good solution yet. For example, JS games or applications should be in some
self-contained single-file "web-executable" format. Images often need to be
embedded, so torrents are not a good medium, but they don't belong in git
either. So I'm not really sure where to put them...

Also make sure to use standardized formats with proper structure. Browsers
should remove quirk modes and flat out refuse to render anything which cannot
be interpreted 100% unambiguously.

Webpages are always just a temporary distribution medium. They should never be
the original storage place.

Sorry if all of that sounded a little bit rant-y, but I think the current web
is in a very bad state.

~~~
saagarjha
> Browsers should remove quirk modes and flat out refuse to render anything
> which cannot be interpreted 100% unambiguously.

Unfortunately the web seems to have de-facto standardized on these quirks and
large swaths of the internet would break if you removed them.

------
ynniv
Don't throw away those bookmarks! Even if they don't work by default, they're
still links to valuable content in the WayBack Machine!

------
biznickman
I was thinking about this exact subject matter the other day. We invest so
much time in our digital content but what happens with death? How long can we
really expect our content to last?

Ultimately we are all dependent on the functionality of the internet and
making a physical copy is the best way to make something last. However if we
intend to not only have it last but be accessible, some of the suggestions are
helpful but the most significant is: where can I host something indefinitely?

Will Netlify, Github Pages, AWS, etc all be around in 50 years? 100 years? 500
years? Heck, the internet hasn't even been around for all that long.

As I write this, my only thought is that you need a system of fallbacks.
Frankly, this seems like a business opportunity in which the cost of the
infrastructure is paid up front based on some formula. DNS is configured
programmatically. File locations are distributed and redundant. I'm not sure
what the best approach is, but one thing is certain: page accessibility is
the least of this author's issues. He just needs the site to be hosted...

------
bad_user
I completely agree with this article, however ...

> _(4) End all forms of hotlinking_

Any advice on what to do if you want to embed videos?

Bandwidth can cost a lot (maybe I'm wrong?) and AFAIK Cloudflare doesn't cache
videos.

I would stick with Vimeo, but they can delete your videos if you're over the
threshold after you stop paying. YouTube at this point is sadly a better bet
if you want to maintain a low-cost website for decades.

Any alternatives I'm missing?

~~~
Double_a_92
If bandwidth (and not storage) is the only issue, you could upload it to
YouTube but still add a direct link to your self-hosted file.

------
metapsj
Here's another option to throw into the mix...

Ward Cunningham's federated wiki
[https://wardcunningham.github.io/](https://wardcunningham.github.io/)

------
lurenjia
In China, online content is expected to disappear at any minute, so
screenshots are the first choice for saving and sharing.

------
panic
To last a long time, you need a lot of people hosting copies. Think about PDFs
-- it's easy to host a local copy of a PDF, so popular PDFs tend to be widely
available. But a site hosting a local copy of every webpage it links to would
feel weird.

------
cellularmitosis
I feel like Philip Greenspun deserves a mention as one of the OGs of this
practice.
[https://philip.greenspun.com/sql/trees.html](https://philip.greenspun.com/sql/trees.html)

------
nichochar
Wonderful write-up. In particular, I appreciated how actionable this was. As a
backend developer that loves the web but isn't part of the whole frontend
mania, I completely relate, yet still learned a couple things that I can apply
easily.

------
tomtomtom777
The problem is clear but the solutions presented don't seem to help all that
much.

This is why I hope for a future with content-based addressing, as used in
bittorrent and IPFS.

BitTorrent has its focus set on files, and IPFS is becoming a bit convoluted,
but the core idea is very powerful.

Immutable content persists by "pinning". Even if the author no longer wants
to maintain the resource, there are likely some "users" that will (e.g. pages
that link to it). And even if there is little interest from users, there are
still archiving services that might want to keep it. Persistent, unless it's
so vile or irrelevant that even the archiving services ditch it.

All addressable by the same content hash.

------
thadk
Some of the classic articles from the Kuro5hin website mentioned are available
at
[https://web.archive.org/web/20061124151022/http://ko4ting.ma...](https://web.archive.org/web/20061124151022/http://ko4ting.maddash.org/cgi-
bin/k4/Classic_Stories)

If you need any particular ones not available on
[http://atdt.freeshell.org/k5/](http://atdt.freeshell.org/k5/), let me know
here and I can pull them out of an old archive covering 2000-2007.

~~~
devurand
Good ol' localroger. I never knew I wanted to read singularity fiction that
included zombie rape and forced incest until Metamorphosis of Prime Intellect.

[https://en.wikipedia.org/wiki/The_Metamorphosis_of_Prime_Int...](https://en.wikipedia.org/wiki/The_Metamorphosis_of_Prime_Intellect)

[http://localroger.com/prime-
intellect/mopiidx.html](http://localroger.com/prime-intellect/mopiidx.html)

------
luxuryballs
This is also why, if you ever want to make a platform or service that will
rely on permanent links, you should consider making configurable/trackable
routing part of a resource's stored data record, not just whatever your
application decides at the time. That way, when you inevitably want to upgrade
the system and change the routing, you can always maintain the old routes,
because they aren't purely defined by the application implementation. This is
the difference between crafting software to do something and crafting software
to build something that manages the thing you want to do.
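
A minimal sketch of the idea, assuming a hypothetical resource record and
using only Node's built-in http module: the record owns its routes (old and
new), so a legacy link keeps resolving after a rewrite.

    const http = require('http');
    
    // Hypothetical record: the resource stores its own routes.
    // routes[0] is the canonical path; the rest are legacy aliases.
    const article = {
      id: 42,
      routes: ['/posts/42', '/blog/2019/my-old-post'],
      body: 'Hello, world.',
    };
    
    http.createServer((req, res) => {
      const canonical = article.routes[0];
      if (req.url === canonical) {
        res.end(article.body);
      } else if (article.routes.includes(req.url)) {
        // Legacy route: permanently redirect to the current canonical path.
        res.writeHead(301, { Location: canonical });
        res.end();
      } else {
        res.writeHead(404);
        res.end();
      }
    }).listen(8080);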

------
akouri
I love this! I was going to write something similar. The increasing prevalence
of sites that shard their content over multiple pages to increase ad
impressions benefits nobody; the consumer, producer, and advertiser all have
degraded experiences. The problem is that the companies paying out for ads
(Google) don't penalize this type of behavior, so it's financially beneficial
to run super-bloated websites that require 15 page refreshes to view all the
content.

wikiHow is a textbook example of this antipattern. That site should be
deindexed from Google entirely.

------
ggm
PURLs are hard. I have one which is an X.509-enforced pointer to a 'terms and
conditions' PDF, and the pain of ensuring the link never dies (it's embedded
in long-lived binary objects, out in the world beyond our control) is
non-zero. Change the publishing model or the CMS, and the PURL is at risk.

Publication models where you pay the cost for somebody to be the archival
reference make some sense. A URI pointing into a store. Buckets in Google? But
if you decided to move on, could you get a retained 302 redirect to point to
the new home?

------
tripzilch
> I think we've reached the point where html/css is more powerful, and nicer
> to use than ever before. Instead of starting with a giant template filled
> with .js includes, it's now okay to just write plain HTML from scratch
> again.

This is so true; I do it for all my personal webpages. Well, I currently have
GitHub Pages in between, but Markdown converts to pretty neat vanilla HTML,
and I wrote the templates myself.

On another note, to his third point, I couldn't access index.20191213.html :)

------
dpcan
The only pages I ever go back to are the ones where I hope, or need, to find
NEW content. Forums, stock art, reddit, hacker news, facebook, twitter, github
(for updates).

There is really only a small subset of Internet content that will ever need to
stay the same - or even should stay the same. So, while slapping an HTML file
on the web is great for, say, the synopsis of a movie, I think the majority of
the web can be discarded when new and relevant content needs to take its
place.

------
jrjrjrjr
MAFF and MHT are the best ways I've found to save snapshots of websites. If
you want to save an entire site, there's PDF.

I keep them in subdirectories marked by year, which is pretty convenient, and
you can rename each MAFF file to be most relevant in a directory view; special
pages go below the year directory.

Currently only Waterfox still has this option, since the snooty Mozilla guys
can't bring themselves to carry on a 15-year-old WORKING method, just because
it was Not Invented Here...

~~~
jrjrjrjr
>> just saved my comment in 10 seconds; it's on the desktop for later filing,
if it is deemed worthy.

Someone needs to bring this feature back.

Waterfox is only 64-bit; I wish it could be a replacement for the rotting
Mozilla Firefoxes that I can't upgrade...

jr

------
juststeve
XML tools should be able to validate the HTML locally, as these pages have
already been "rendered" ahead of time.

And it would be interesting to see some stats on the responses returned for
the broken links. Are they returning HTTP 404s? 30x? 200s or 500s? Do the
sites even allow for 3xx redirection?

But it makes sense to reduce the infrastructure needed to serve static
content; every piece is a point of failure, or at least a maintenance burden.

~~~
tannhaeuser
Not XML but SGML [1], the superset and origin of both XML and HTML.

[1]: [http://sgmljs.net/docs/html5.html](http://sgmljs.net/docs/html5.html)

------
punnerud
Archive.org also supports adding custom pages to be saved. I miss a feature
in the Wayback Machine Chrome extension to automatically check every bookmark
and auto-save it if it's not already saved:
[https://chrome.google.com/webstore/detail/wayback-
machine/fp...](https://chrome.google.com/webstore/detail/wayback-
machine/fpnmgdkabkmnadcjpehmlllkndpkmiak)
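
A rough sketch of what that extension logic could look like, using the public
Wayback availability endpoint and the Save Page Now URL (`bookmarks` is a
hypothetical array of URLs; Node 18+ or a browser for `fetch`):

    // For each bookmark, ask the Wayback Machine if a snapshot exists;
    // if not, request one via Save Page Now.
    async function archiveBookmarks(bookmarks) {
      for (const url of bookmarks) {
        const res = await fetch(
          'https://archive.org/wayback/available?url=' + encodeURIComponent(url));
        const data = await res.json();
        const snap = data.archived_snapshots && data.archived_snapshots.closest;
        if (!snap) {
          await fetch('https://web.archive.org/save/' + url); // ask for a crawl
        }
      }
    }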

------
ageofwant
All my projects have a README.md, some a README.org. These are simple,
uncluttered formats that everyone with the most basic of readers can read. A
few years back I found my decades-old master's thesis, written in plain-text
LaTeX. It compiled first time to a pretty PDF on a Raspberry Pi.

If you care about content, or anything really, keep it simple.

[https://plaintextproject.online/](https://plaintextproject.online/)

------
miguelmota
Is there a browser extension that can automatically add a site you bookmark
to the Wayback Machine, and fetch the bookmarked site from the Wayback
Machine if it 404s? I really hate it when I bookmark something and come back
to it months or years later to find that the page doesn't exist any more, so
having something to automatically pull it from the web archive would be
amazing.

------
artur_makly
Right after we launched, we were pleasantly surprised that a good portion of
our users started using our automated visual site-mapping platform [1] to
archive their sites (while redesigning newer versions of them) and other
public sites of interest.

[1] [https://VisualSitemaps.com](https://VisualSitemaps.com)

------
neilobremski
These points look like the old (timeless) guidelines I remember for creating a
web page... solid, good, and as plain as a saltine cracker. I love it, and it
stores well, but convincing the world of this while we romp through the hype
of technological progress is like following a mob through the streets telling
them to pick up their garbage.

------
nayuki
Jeff's article has similar sentiments and technical recommendations as Tim
Berners-Lee's classic piece "Cool URIs don't change" (1998):
[https://www.w3.org/Provider/Style/URI](https://www.w3.org/Provider/Style/URI)

------
ChrisMarshallNY
One thing that I do, and it does get me a few sneers, is I tend to eschew the
use of a lot of fancy "eye candy."

I invest in fairly basic CSS, and tend to avoid using JS for any site-required
functionality (unless it's a JS site).

I have found my sites age well. There's a couple that I've barely changed in a
decade.

------
zzo38computer
I do believe that, usually, you should make your pages possible to archive.
(In the few cases where that isn't appropriate, consider whether HTTP(S) is
even the correct protocol for what you are doing; sometimes it isn't.)

I agree that you can (almost always) write HTML without JavaScript code (CSS
is not always needed either, but it can nevertheless be helpful). This
improves portability too. If you do want to use JavaScript to generate static
documents, consider using Node.js and having it output a plain HTML document,
which will then be the hosted document, rather than hosting the JavaScript
version. (This way, the code only has to run once.)
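
For instance, a minimal sketch of that run-once approach (the data and markup
here are made up for illustration):

    // build.js — run once with `node build.js`; host only the resulting file.
    const fs = require('fs');
    
    const posts = [{ title: 'Hello', body: 'First post.' }]; // hypothetical data
    
    const html = '<!DOCTYPE html>\n<title>My page</title>\n' +
      posts.map(p => '<h1>' + p.title + '</h1>\n<p>' + p.body + '</p>').join('\n');
    
    fs.writeFileSync('index.html', html); // the hosted document is plain HTML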

About fonts: I think the fonts are usually not an essential part of the
document. (Still, you don't usually need to use so many; often you do not
need to specify the font at all, except perhaps whether it is monospace,
bold, etc.) Also, you do not usually need so many pictures on your web page
(except in picture galleries).

I do think if you need a copy of a web page then you can copy it.

You can also use plain text documents (without HTML); it is what I often do.

~~~
hashmal
"monospace" is a property of a given font but is not (usually) a variant of a
single font family as is "bold" (font weight).

While bold text is often used for emphasis and setting it to "regular" doesn't
change the meaning much, using a monospace font signals specific things
(often, it's used to represent code snippets, but in some contexts it can mean
"work in progress"). Changing that font to a proportional one strips a lot of
meaning and readability from the text.

In that regard, font families _are_ an essential part of the document, and
not just their stylistic/emphasis variants (e.g. size and weight, which can
still convey important meaning on their own; think titles and headings).

~~~
chrismorgan
Honestly, in almost all places that monospaced fonts are used, they’re purely
a stylistic choice. (The remainder is mostly ASCII art, or plain-text
representations of tables—in which case _alignment_ is important, not the
font—or something where tabular figures are desirable for number alignment.) I
don’t say that that makes it useless in any way, but honestly monospaced text
in code editing is overrated. You can live without it easily, and will
probably get used to it very quickly.

~~~
hashmal
I agree about monospaced text in code editing. I was thinking about uses such
as code excerpts in literature (class names, inline bash commands, etc.).
Monospaced fonts[1] play a major role in helping readers understand the
content. This is more than a stylistic choice.

[1] Actually, the font doesn't need to be monospaced, but it must be very
different from the font used for the main copy.

~~~
zzo38computer
While in some cases the font doesn't necessarily need to be monospaced, some
documents might require that a monospaced font be used, so it should be
monospaced anyway. Although, as you say, it should differ from the main copy;
so if all of the text is monospaced, there should be some way to distinguish
it, such as by selecting a different monospaced font, or displaying it in a
different colour.

(One case where a monospace font is required is if you are automatically
including text from another document which uses plain text format; there is no
way to determine what format is needed, but monospace will always work.)

------
imhoguy
We need a common distributed storage protocol: a kind of BitTorrent for web
pages. Dead simple to use and built into (every) browser transparently. Any
time a public page is visited/bookmarked, it is preserved in a swarm of
caches until the last seeder dies.

~~~
sergimansilla
Sounds exactly like what the Dat protocol and the Beaker Browser
[https://www.beakerbrowser.com/](https://www.beakerbrowser.com/) offer.

------
juststeve
Well, a new HTTP status code or header could be created for sites that are
closing. Then if a browser navigates to the website, it could prompt the user
for action, e.g. update bookmark, archive, or email the site admin (if on the
hosting side).
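
Something close to the header half already exists: RFC 8594 defines a
`Sunset` HTTP header announcing when a resource is expected to go away. A
minimal sketch in Node (the date is made up):

    const http = require('http');
    
    http.createServer((req, res) => {
      // Announce the expected retirement date of this resource (RFC 8594).
      res.setHeader('Sunset', 'Thu, 31 Dec 2020 23:59:59 GMT');
      res.end('This site is closing soon; please archive what you need.');
    }).listen(8080);

A browser that noticed the header could then prompt the user, as suggested
above.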

~~~
kortilla
Most sites close quite abruptly (domain didn’t renew, web server crashed, site
was “modernized”, etc). Most link rot is due to the fact that people aren’t
actively trying to keep things alive. An HTTP code won’t help with that.

------
chrisMyzel
I'm using wallabag (an open-source Pocket-like alternative) for bookmarks; it
saves a copy of the website in a reader-mode fashion, so text and images. So
far I have around 2k links in there and they take up no significant space.

------
Pigo
> We can avoid jquery and bootstrap

Amen to that at least. Nothing bums me out more than someone using jQuery
just for a simple DOM selection, or taking a simple layout and using
Bootstrap just because they know its columns.

~~~
epicide
For those that just want to do DOM selection, querySelector(All) is widely
supported at this point.

If you want a simple CSS framework (because you don't want to write CSS) that
includes a grid system, Milligram is a useful alternative.
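
For example, the usual jQuery selections map over directly (plain browser JS;
the selectors are hypothetical):

    // jQuery: $('#menu')        →  one element (or null)
    const menu = document.querySelector('#menu');
    
    // jQuery: $('.item').hide() →  loop over the NodeList instead
    document.querySelectorAll('.item').forEach(el => {
      el.style.display = 'none';
    });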

~~~
Pigo
I haven't seen Milligram yet, I'll check that out.

I've seen people include the full jquery just to select something by ID. But I
guess it's hardly my worst horror story.

------
lucasmullens
HTML is worth minifying just to get rid of comments. I don't want my
"TODO(mullens): " comments to be visible to end users if possible. And gzip
doesn't get rid of comments at all.
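
If comments are the only concern, even a naive build step covers most of it
(a sketch; note this regex also eats conditional comments and anything
comment-shaped inside inline scripts):

    // strip-comments.js — remove HTML comments before publishing
    const fs = require('fs');
    
    const html = fs.readFileSync('index.html', 'utf8');
    const stripped = html.replace(/<!--[\s\S]*?-->/g, ''); // naive, see caveat above
    fs.writeFileSync('index.stripped.html', stripped);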

~~~
saagarjha
I enjoy seeing authentic HTML comments. Sometimes they can provide invaluable
context to the page.

~~~
52-6F-62
I agree. I can understand not wanting it in a business context, but on
personal sites or open projects it’s interesting.

------
markvdb
The solution to designing web pages to last will have huge overlap with the
missing layer of services on top of free and open source software. If that
ever gets kissed into existence...

------
every
With the exception of my root index.html, generated by the tree command, and
the subdirectory listings handled by Apache, my entire site is plain .txt
files. Should be good for a while...

~~~
oneeyedpigeon
Kinda sounds like you don't link to any other content, though (I guess you can
include links in plain text, but that's a bit of a pain for your readers)
which means your site doesn't _really_ fit the use case.

~~~
every
Yes, it's a dead end, a cul de sac that links to nowhere. And that's exactly
what I intended. But if you see something that is of further interest,
highlight it and search. That will almost always produce topical results...

------
incompatible
The number of potential re-hosters is increased by releasing everything with a
free license. Some re-hosters won't violate copyright law, e.g., Wikimedia
Commons for individual media files.

------
ogre_codes
For this exact reason I've been pushing to use static HTML on my personal
blogs, and I spend a lot of time optimizing images for both retina and 1x
resolutions. Basic HTML & CSS are good now, and there is no reason fairly
basic content can't be stored as static pages.

You can convert PHP or other dynamic pages to static fairly easily; it takes
about half a day of scripting and a little time configuring your web server.
The pages load quicker and the load on your server is lower.
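
The scripting half can be as simple as fetching each rendered page from the
live site and saving the response (a sketch; the domain and page list are
hypothetical, Node 18+ for global `fetch`):

    // snapshot.js — save rendered pages from a dynamic (e.g. PHP) site as static files
    const fs = require('fs');
    
    const paths = ['/', '/about', '/blog']; // hypothetical page list
    
    (async () => {
      for (const p of paths) {
        const res = await fetch('https://example.com' + p); // your live site
        const html = await res.text();
        const file = p === '/' ? 'index.html' : p.slice(1) + '.html';
        fs.writeFileSync(file, html);
      }
    })();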

I've long considered the idea of a permanent hosting solution, where people
pay to get their content up and I guarantee its longevity, but the business
model on that is tough.

------
Thorentis
I find it an interesting social phenomenon that the finite nature of content
on the Web is lamented, when content has been temporal for all of human
history.

~~~
makapuf
Carved stone is a bit more durable than that nice Myspace page was, however.
Yes, it was not widespread and was difficult to set up, but millennia of
durability is nice.

I think that the medium, be it a hosting site or your own domain, will fail.
What we need is maybe a naming scheme that can transfer to different people
(replicated files on different servers, indexed by content hash? aka anonymous
FTP mirrors...) and common media, i.e. readable by anyone, gracefully
degrading.

------
BrandoElFollito
Infinite scroll is an abomination, and it requires the JS the author wants to
get rid of.

I prefer a page which loads completely and which I can search (directly or
via Google).

------
Olshansky
I feel like this problem can be solved by Pocket's (getpocket.com) premium
service, where they store and index articles you've read and archived.

------
hsivonen
For preservation, using self-hosted fonts is better than trusting the
"web-safe" fonts to be available on the client (already not true, e.g., on
Android).
------
echelon
HTML was a bad idea that just kept sprouting cancerous growths on top of
itself. Just like the author recommends not using non-web fonts, we honestly
don't need much markup beyond "monospace", "em", "strong", "title 1-5", and
"p". Everything else HTML does is a huge distraction.

In the past thirty years, there have only been a few actual kinds of content:

- Article

- Comment

- Reference manual

Nothing really deviates from this. If it does, then it's probably in the
"application" domain and deserves all that javascript stuff.

I imagine a new semantic web inspired content type that bakes in permanent
content URLs, author signatures, and supports sharing P2P and rich archival:

    
    
       <article>
         <id>
           (Computed immutable content hash and signature. Subsequent revisions
            will invalidate this and require new content hashes. Links can be made
            to the old version.)
         </id>
         <authors>
           (metadata, homepage, public key)
         </authors>
         <contents>
           <title> Semantic Web was actually brilliant </title>
           <body> 
             <h1>The Web was Embraced, Extended, and Extinguished</h1>
             <p>Lorem ipsum dolor sit amet...</p>
           </body>
         </contents>
       </article>
    

It can be a really simple model. No divs or presentational CSS or anything.
(Well, maybe some support for LaTeX and I18N.) The client decides how it
looks, which is how it always should have been.

If every entity in the universe has an ID, they can link to each other in a
distributed fashion. It won't matter if the original source URL goes down as
long as someone somewhere has a cache.

It's also probably better to binary encode these (protobuf or something).
Images and media could be inlined instead of href'd.

Instead of building on the layer cake, we should take a moment to reflect on
what we're trying to do. We have a lot of history and bad evolution that we
can garbage collect and streamline.

If our objective is to _publish and share content_ , a lightweight version of
the semantic web is the way to go.

Oh, and check out my follow-up to the author:

    
    
       <article>
         <id> (my own post's content hash) </id>
         <parent> (the author I'm responding to) </parent>
         <author> me </author>
         <contents>
           Hey you! I disagree!
         </contents>
       </article>
    

Again, these can be distributed on _a web_. It can be p2p, federated, or live
atop the classical WWW. It doesn't matter. The clients can richly use the
semantic data model.

The anger for disappearing content, walled gardens, and AMP will eventually
reach a boil. As technologies go, the pendulum is always swinging. This is the
direction we'll eventually head back to.

~~~
zzo38computer
I agree, it is correct; we do not need much markup other than the five you
specified, plus hyperlinks; those are good enough. Yes, the client should
decide how it looks.

~~~
masswerk
I'd include lists and tables for data presentation. Also, we may want some
(additional) tags to manage footnotes and references (these were in SGMLguid,
but didn't make it into HTML.)

~~~
zzo38computer
O, yes, we forgot that. Footnotes, lists, and tables, are good too.

~~~
tjchear
What about img? It's not inconceivable that an article or a reference manual
should have images of some kind.

~~~
zzo38computer
You could just link to them I think, possibly with a hint to specify inlining;
the client decides whether to obey that hint or to ignore that hint. (This
would be done the same whether the picture is part of the document or is a
separate file, I should think. It makes many considerations easier to work
with.)

~~~
masswerk
This is actually how it worked in the beginning. Browsers also had a "display
images" toggle button in the main control bar.

------
nyolfen
[http://html.energy](http://html.energy)

~~~
saagarjha
Is this an exhortation to write raw HTML?

~~~
nyolfen
it's a celebration of writing html

------
Zamicol
Isn't the obvious missing technical component cryptographic hashes?

A hash can represent any resource.

IPFS, or something like it, is the missing component for long term retrieval.

EDIT: or something like
[https://github.com/google/trillian](https://github.com/google/trillian)

~~~
wcarron
Ugh. I thought blockchain hype was dead by now. And no, it isn't the missing
component: if zero nodes rehost and serve the content, it's not available.

~~~
Zamicol
I had no intention of bringing up "blockchain", only that cryptographic
hashes must play an important part in any archival scheme intended to last
generations.

Otherwise you must trust that an archiver hasn't changed history. Why trust
when you can verify?
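
Verification needs nothing exotic; any client can recompute the hash (a
sketch with Node's built-in crypto and Node 18+ `fetch`; the URL and expected
digest are made up):

    const crypto = require('crypto');
    
    // The link itself carries the expected content hash (hex SHA-256 here).
    const expected =
      'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855';
    
    (async () => {
      const res = await fetch('https://example.com/archived-page');
      const body = Buffer.from(await res.arrayBuffer());
      const actual = crypto.createHash('sha256').update(body).digest('hex');
      console.log(actual === expected ? 'content verified' : 'content was altered');
    })();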

~~~
inimino
Where is someone generations hence going to get the hash from?

~~~
sjy
From the link they clicked on. The article is about link rot, not search.

------
matt_morgan
Everything old is new again.

------
zsgoldberg
This is great, but you may want to fix the color contrast on your links =)

------
seinecle
What would be a lasting server to host this lasting page?

------
boycaught
The HTML/CSS suggestion is key. "KISS"

------
DonHopkins
"Last First Page Design"

------
peter_retief
Web development is way too complex.

------
edisonjoao
i love this

------
tyzerdak
I don't get the "prefer one page over several" point.

Anyway, some points seem reasonable, but I bet most people will use WordPress,
and when they're no longer interested they'll abandon it and it's gone.

IMO it would be better to push the Web Archive to make more archives, and to
donate to it or something.

------
LukeWarmPiss
test comment

------
capnkap
    $ dig jeffhuang.com
    jeffhuang.com. 60 IN A 162.243.124.123
    
    $ whois 162.243.124.123 | grep Organization
    Organization: DigitalOcean, LLC (DO-13)

Designed to last... Hmm... Seems like a tall order for someone forced to
maintain their own host.

~~~
theandrewbailey
As long as the same URL has the same bytes, it doesn't really matter where
they come from.

------
doublement
Disappearing content is a blessing, not a curse. Let it all be replaced by new
people doing the same things slightly differently, instead of constantly
having to confront prior art.

~~~
9dev
This - so, so, much. I don't know where the assumption that everything is
worth preserving comes from. To me, this is even part of the beauty of the
internet: Things appear, then vanish again.

~~~
JohnFen
Often, the antique content is irreplaceable. If it disappears, then valuable
and important reference material is lost forever. Consider, for example, if
you are working with hardware or software created decades ago -- if
information about such systems is lost, you're pretty much hosed.

As another example, I keep every line of code I write (that wasn't written for
an employer) forever. I often pull up code that I've written decades ago to
use in new projects.

The beauty of the internet is that the old and the new aren't mutually
exclusive -- there's plenty of room for it all.

------
gdgtfiend

> _from my high school years, where I first tasted the god-like feeling of
> dominance over software_

This piece of writing killed the rest of the article for me. While his point
is worth making, the way he went about making it just sounds very childish,
especially for a professor.

------
waiseristy
"Return to vanilla HTML/CSS"

This is by and large, impossible. The hoops you have to jump through and
downsides you have to endure are just a death by 10,000 cuts. Try writing a
tabulated container in raw HTML and CSS that flex sizes and behaves nicely
with the browser back button. Partial page reloads, containers with native (no
reload!) sorting, there's so many features in modern web design that are just
downright impossible without JS.

~~~
JohnFen
I am hesitant to put words in the author's mouth, but I suspect he would agree
with my stance here...

The solution is to not require things like partial page reloads, etc. This
shouldn't be a huge hardship -- a properly designed modern site can degrade
gracefully anyway, so users that don't have or allow things like JS can still
use it, even if in a "degraded" form.

