
Make your 404s into 302s - teh_klev
https://4042302.org/
======
Faaak
The idea is interesting and even kind of tech-funny.

But man, you really have to explain how it works a bit better. At first I
thought that we should redirect 404s to your website and I was: "??".

What I understood: With each iteration of the website, you archive the old
one on a specific subdomain. Then you redirect all 404s of the new website to
the old one. That way, no link is broken.

~~~
leethargo
Agreed, I thought this was a service that would provide automatic redirects to
the Archive or similar.

~~~
Heliosmaster
FWIW Brave does that. If you reach a page with a 404, a banner will appear at
the top with a button to try and navigate to the latest version of the page
you're looking for at the archive.

~~~
Jaruzel
There's an official Wayback Machine extension for Chrome:

[https://chrome.google.com/webstore/detail/wayback-machine/fp...](https://chrome.google.com/webstore/detail/wayback-machine/fpnmgdkabkmnadcjpehmlllkndpkmiak)

I use it, it's really useful.

~~~
andrewshadura
Except it very often misdetects a successful load of a page as an error and
redirects you away.

~~~
Jaruzel
Really? It's not done that to me yet. It just pops up a 'see this page on the
wayback machine' dialog, which I can dismiss if I want.

------
fizzizist1
404s are useful though... how annoying would it be if you wanted to get to a
specific part of the site, and you kept getting redirected to somewhere else
without being informed that the part you are trying to access doesn't actually
exist. 404 errors exist for a reason.

~~~
davnicwil
This is right - 404 pages are the right UX when you have a 'stateful' resource
that can be deleted, and you need to show that the URL (or ID param within) is
correct and once pointed to a resource, but that resource has now been
permanently deleted and can't be shown any more.

In a sense this information conveyed by the 404 page _is_ now the immutable
'resource' that will stay permanently at that URL. Doing a redirect breaks
this, it's lossy and usually a bad UX.

~~~
hamilyon2
"Was here but deleted" has its own HTTP code now; that's 410, AFAIK.

~~~
throwanem
Has had since 1997 at least, cf.
[https://tools.ietf.org/html/rfc2068](https://tools.ietf.org/html/rfc2068) -
it's just that nobody uses it.

~~~
OJFord
It's used in APIs, just not user-facing documents (or rather would-be
documents). Too late for that though: while laymen pretty much get '404',
introducing a subtly different numeric code to co-exist would be a bit much, I
think. And also pretty worthless; anecdotally, I think actually displaying big
'404' text is on a downward trend, probably because of the prevalence of
SPAs/webapps in general.

~~~
throwanem
Yeah, I mean, for the lay public to have to know what "404" means feels like a
side effect of the early web being a frontier. What's important to tell the
user is "there's nothing here", and that can and should be done without
invoking any magic numbers at all.

------
simias
The home page is slightly misleading, the nginx config is of course the easy
part, the hard part is to correctly archive all the previous versions of your
website every time you make a change. I thought this website provided an
archival service but it does not.

People who care enough about preserving history and current links probably
already do that. People who don't care aren't going to start now because of
this page. Especially those who have dynamic content and probably don't want
to keep running a million different versions of their backend forever.

If you like something on the web then make a copy.

~~~
vorpalhex
Doing this as a consumer and doing this as a webmaster are different
processes.

As a webmaster, if it's at all possible to go static (whatever your flavor of
that is), then do that. A static website is easy to host and keep _forever_ ,
and it's usually easy for consumers to archive too.

Not to hate on PHP, but keeping older PHP sites around securely has become a
major undertaking. You can't safely run a WordPress site that hasn't been
updated in 5 years, because its security vulnerabilities are exposed to the
wide web. If your static site generator has security flaws... well, that
doesn't affect your current build artifacts, and you can still run the thing
in secure ways.

------
at_a_remove
Remember "Cool URIs Don't Change"? I think about it a lot.

It was written toward the end of the BOFH's reign, when a technical specialist
of the web had quite a lot of sway, when their decisions about a site's
information architecture and how it was run were, if not the law, at least a
very heavy hand on the tiller.

Those days are long past. Now Ted in Marketing wants a URL and who are _you_
not to give it to him? I remember the pain of creating vanity top-level URLs
in SharePoint 2003 because some functionary wanted them, and then they would
promptly forget what they demanded. Yes, I used to use 410 Gones where
appropriate.

That sort of thing has not been in our hands for quite a while, even if it is
probably the best thing to do. After all, has the product URL changed? Or will
it be back? Or has it been discontinued? The correct HTTP response, properly
and widely used, would be very helpful in moving so much of the web forward,
but that is not under our control. Hasn't been for a while.

------
throwanem
This is a good idea, but a bad design.

First, it unduly burdens the server, in sending multiple redirects to cover
the entire search space of possible versions of a URL - for a mature site,
this could be a _lot_ of redirects. It also unduly burdens the client, in
following them, and the network between the two.

Second, 302 is the wrong type of redirect to use here, because it is
temporary; a well-tempered user agent will treat it as such, necessitating the
same cascade of redirects on followup visits. The right way to do this is with
a 301, which has a semantic of permanency, and is treated as such by user
agents. But it's still the wrong thing to do.

Maintaining access to older versions of websites is, again, an entirely
desirable thing to do. But if you're going to do it in a way that requires
work on the server (as this design also does), you're better off just having
the server maintain version information and serve the latest available page at
a given URL, in a 200 response, when the URL is accessed.
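That alternative can be sketched roughly like so, in Python with an invented data model: the server keeps per-URL version history itself and answers with a single 200, no redirect hops:

```python
# Sketch of serving the latest surviving version of a page directly,
# instead of a 302 cascade across versioned subdomains. The paths,
# version labels, and bodies below are invented for illustration.

SITE_HISTORY = {
    # path -> list of (version, body), newest first
    "/about": [("2020", "current about page"), ("1997", "old about page")],
    "/guestbook": [("1997", "sign my guestbook!")],
}

def serve(path):
    """Return (status, body) for a request, with no client round trips."""
    history = SITE_HISTORY.get(path)
    if not history:
        return 404, None          # never existed in any version
    version, body = history[0]    # newest surviving version wins
    return 200, body

print(serve("/guestbook"))  # survives only in the 1997 version
print(serve("/missing"))    # a genuine 404
```

The trade-off versus the 302 chain is that the current server has to hold (or at least index) all the old content, but the client sees exactly one response per request.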

~~~
mudita
I don't have any opinion on whether 302 or 301 is the better choice, I'd just
like to point out that using 302 seems to have been a deliberate decision:

from [https://4042302.org/how/](https://4042302.org/how/) :

>We use a 302 and not a 301 (permanent redirect) because we want the latest
site to have the chance to override the URL in the future.

~~~
Jaruzel
Google treats 302s and 301s differently when it crawls. A 302 is effectively
ignored and the search index isn't updated - the original URL that threw the
302 is left in. Whereas 301s result in the new redirect target URL replacing
the original in the search index.

Filling up your site with nested 302s (following this to its conclusion, in
~10 years' time) is not only a management headache, but may fall foul of
Google (I'm not sure nested 302s will send positive signals to Google) and
result in your whole site being de-indexed.

------
Macha
So I'm aware of at least the BBC using this approach: opening really old
articles reveals they still have their 90s site up. I'm also aware that at my
last employer, an unmaintained wiki with open security issues, which
nonetheless held vital information for still-in-use legacy internal software,
was replaced by saved static HTML grabs so the information wasn't lost.

But for a lot of medium-sized companies with dynamic websites, this isn't
always practical. They may not have the know-how to dump their 2000s Drupal
install to static HTML files, and don't have the IT staff to upgrade and
secure it.

~~~
WJW
I wish people would change their language regarding tech debt. The companies
of your second paragraph _choose_ not to upgrade and secure their websites.
It's not something unavoidable.

~~~
indigo945
The point your parent poster is making is that these companies instead choose
to shut the website down, which is a legitimate alternative to updating and
securing it -- unlike just leaving it up without maintenance, which is not.

~~~
WJW
That's why I specifically indicated the companies referred to in the second
paragraph of the parent post. The BBC and the unnamed company where Macha was
working did it correctly.

------
moimikey
Have to agree with all of the disagreements to this. A 404 is a 404 for a
reason, just like 301 and 302 are different for a reason. It's not uncommon,
though, for WordPress to do things like this, or blogs for that matter. If an
author changes the title or date of their post, and the URL structure is
reliant on those two pieces of data, then the URL will change. The old URL is
preserved in a DB and, if accessed again, 301s to the newly named resource.
Others will throw a 404 and give a cutesy Levenshtein message, "did you mean
x?", at which point the user can decide to go to the new resource. It's all
circumstantial... It shouldn't be enforced.

Re: Google and PageRank, I'm pretty certain they've addressed this and now
recognize 302s and 301s and treat them the same. Previously, this was an issue.

~~~
C1sc0cat
Looks like the OP is trying to solve the problem of broken links but has come
up with a "brave choice", as they say on Yes Minister.

The actual solution is to put in the work and redirect the old missing page to
a relevant new one.

If I had a link from Vogue, the BBC, etc. back in 1996 pointing to a product
page, I'd want to redirect that now-broken link with a 301.

------
Jaruzel
How I've tackled this problem in the past:

    
    
      1. Create a 404.php (or whatever your preferred back-end is).
      2. mod_rewrite real 404s to serve that script.
      3. In that script, have a lookup table/db/file that lists all the redirects you need.
      4. Extract the requested URL from the server variables.
      5. Use the lookup table to find the correct URL, and issue a 302 for it to the user's browser.
    

It's kinda seamless, and I've been doing it for years.
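The same approach, sketched in Python rather than PHP for illustration (the table entries are made up, and in practice the table would live in a DB or file as described above):

```python
# A catch-all 404 handler: mod_rewrite (or equivalent) routes every
# would-be 404 here; we consult a redirect table and 302 when we know
# where the page went. URLs below are invented examples.

REDIRECTS = {
    "/old-blog/hello.html": "/blog/hello",
    "/products.php?id=7":   "/shop/widget",
}

def handle_404(requested_url):
    """Return (status, headers) for a URL that would otherwise 404."""
    target = REDIRECTS.get(requested_url)
    if target:
        return 302, {"Location": target}  # send the browser onward
    return 404, {}                        # a genuine dead end

print(handle_404("/old-blog/hello.html"))
print(handle_404("/never-existed"))
```

The nice property is that only URLs you've explicitly mapped get redirected; everything else still returns an honest 404.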

------
LeonM
There was a post here that was deleted by the author before I could write my
reply. But he ended his comment with this:

> You know why the web is broken? Because nobody cares.

I agree with this.

404s aren't a technical problem, they are a maintenance problem. If there was
time, budget or interest to fix it, the 404 wouldn't have existed in the first
place.

~~~
onion2k
_404s aren't a technical problem, they are a maintenance problem. If there
was time, budget or interest to fix it, the 404 wouldn't have existed in the
first place._

Doesn't that assume that all things on the internet are permanent? Why should
that be the case? If I have a page on my website and I decide to delete it
then I should be able to do that. Having links that pointed to it return a 404
is correct. 404s are useful. They convey real information.

Sites that do things like 302 redirection to the home page when the link is
apparently incorrect are annoying - you can never tell if the page is really
gone or if the website has _incorrectly_ bounced you to the home page.

~~~
cxr
In addition to minikites's comment:

> Sites that do things like 302 redirection to the home page when the link is
> apparently incorrect are annoying

You seem to be operating on the assumption that this thing is doing something
that it doesn't do.

~~~
onion2k
That wasn't a comment about the thing in the article. It was a comment about
why 404s can be useful.

------
romanr
I’ve read all the pages on that site and still don’t understand. Can someone
explain it in one sentence?

~~~
tartrate
I didn't either, but now I do.

The site is suggesting a best practice to 302 FOUND-redirect you from:

        <version x>.site.com

to:

        <version x-1>.site.com

until it goes beyond the oldest version, in which case you end up with a 404.

~~~
sokoloff
I’d like to be able to show any eventual 404 from the _current version_ of the
site though, which means there may need to be a wrapper around the terminal
site (or more reasonably, code that runs locally on the server to find the
right page URL and 302 directly to that rather than a client round trip for
every version searched).

------
ogre_codes
I've maintained link integrity on my personal site for a long time using a
similar tactic. However, for me the new WordPress version of the site had very
little URL overlap with the old version, so I just converted the old PHP
version to static HTML and set up nginx to serve both. When I migrate off
WordPress, I expect I'll maintain a similar approach. There's no reason to
maintain an older version of the site which could have security issues.

------
codingdave
If I'm reading this correctly, this concept works only if you never take down
old sites, and you have a full archive.org copy out there.

This is not realistic for large SaaS apps. I run a SaaS app that has been
online for over a decade, with millions of public documents. While we do
strive to minimize URL changes, and we do have robust redirects in place for
old URL formats... it is simply not feasible to keep old versions online.

------
ptsneves
My apologies if I am repeating what others have said, but just yesterday I
chose to return a 404 for an operation that could not be completed because a
resource was not available. A 404 is not about pages but about resources. It
is perfectly legitimate for a resource to be unavailable, permanently or
temporarily, in a way that is not an internal error.

A REST request to an IoT device can return 404 if the device is not available
at the moment. Any redirect is meaningless and actually breaks the semantics
of 404. My understanding is that 404 has become so associated with file-like
persistent resources that people forgot the elegance of the wording in the
standard.

I can understand the popular usage change, but I think then the standard
actually becomes incomplete because it only fits web pages or document type
resources when an HTTP status code is about Hyper text _Transfer_ Protocol.

An alternate view, disagreeing with what I postulated above, is that HTTP has
actually been abused and is being used in scenarios that have nothing to do
with hypertext; looking at you, JSON REST :D

~~~
jrockway
I don't think 404 is the right code for "resource not available". If you want
to send a request to a device, and it's not online, that is "503 Service
Unavailable". The device exists, after all, it just isn't available to handle
the request.

Ultimately, HTTP has a very weak set of status codes for application layer
concerns. The vast majority of the ones that sound good mean something about
the HTTP layer rather than the application layer -- for example, it makes
sense to return "precondition failed" if you request deletion of a directory
that isn't empty (the precondition for deleting directories is that they are
empty), but "412 Precondition Failed" means that the client supplied an HTTP
precondition like "If-Match: abc123" and it failed. Trap for the unwary.

For this reason, I think REST largely fails to provide what API authors and
users desire. If your API server can successfully convey an error to the
client, you might as well say '200 OK {"error": "The Raspberry Pi you were
looking for exists, but it's turned off or someone used its Ethernet cable to
test their scissors and the test went very well."}' Now the client actually
knows what's going on.

~~~
ptsneves
The Ethernet cable scissor test success made me laugh :D. We agree on the
general weak suitability of HTTP codes for conveying application-layer status,
but I somewhat disagree on the 503, because it implies an overload or an
internal error, which is not the case here. Also, I really believe "resource
not available" is as accurate as it gets. In the end I agree with you that we
may be splitting hairs over a very unsuitable status report and should just
send a 200 with the error. I have done this before, actually, but it tickles
my neck to see an OK, then an error.

------
throwaway2048
If the redirect chain gets too long browsers will just give up and stop
redirecting.

~~~
torh
That has happened to me, trying to log into the Microsoft Partner Network.
Guess they didn't like Chrome, because it worked when using IE.

------
bfred_it
I recently dropped my last shared hosting in favor of Vercel/Netlify and some
content was lost. This solution wouldn’t work for me because the very reason
why the content was lost is that I don’t want to pay for hosting I barely use.

A better solution would be an intermediary part that never changes — say,
CloudFlare — that caches HTML pages forever, automatically adds an “Archived
content” header to the page, and warns the author so that they can either
allow the archived version or make it a 404/410 instead.

Nobody wants to maintain servers forever, but serving static/frozen pages is
much easier and cheaper.

------
djhworld
On a similar note, my blog is hosted on github pages.

Due to laziness and inertia I've kinda just left it there as it just works,
but the lack of control makes me uneasy.

So at some point I want to move it, hosted by myself (with CDN) and under my
own domain. Except all the links that people have used to link to my GH pages
blog are still out there and I'm not sure how to cleanly redirect visitors to
my own hosted version of the blog.

window.location hack? Has anyone else done this? (I appreciate my lack of
foresight is a problem of my own making...)

~~~
karagenit
Well, if it stays as a static site, you could always set your server to push
any changes to GitHub as well, so you effectively keep your GitHub pages site
up as a mirror of your personal domain. You could also make all of your
navigation links absolute links that redirect to your new domain.

Or, like you suggested, make a skeleton site out of the current version of the
site where each page just has a bit of JS to redirect a visitor.
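That skeleton-site idea could be generated with a small script like this (a sketch: the target domain and paths are placeholders, and each stub combines a meta refresh with a window.location fallback):

```python
# Generate one stub page per old URL that bounces visitors to the new
# domain. NEW_DOMAIN and PATHS are invented placeholders; in practice
# PATHS would come from crawling the existing GitHub Pages site.

import pathlib

NEW_DOMAIN = "https://example.com"   # assumed new home of the blog
PATHS = ["index.html", "posts/first.html"]

STUB = """<!DOCTYPE html>
<html><head>
<meta http-equiv="refresh" content="0; url={url}">
<script>window.location.replace("{url}");</script>
<link rel="canonical" href="{url}">
</head><body><a href="{url}">This page has moved.</a></body></html>
"""

def write_stubs(root="stub-site"):
    """Write one redirecting stub per path; return the files written."""
    written = []
    for path in PATHS:
        out = pathlib.Path(root, path)
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(STUB.format(url=f"{NEW_DOMAIN}/{path}"))
        written.append(str(out))
    return written

print(write_stubs())
```

The canonical link is there so search engines have a hint about the move even though a static host can't send a real 301.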

~~~
pbhjpbhj
Or use a meta "redirect",
[https://stackoverflow.com/questions/5411538/redirect-from-an...](https://stackoverflow.com/questions/5411538/redirect-from-an-html-page),
assuming they are able to write to the head tag.

------
StavrosK
This is a good idea, but I'm afraid it's rather unworkable in practice. If
you're going to remove a page, you either didn't want it up, or you moved it
somewhere else, or your entire site is unavailable. If you moved it somewhere
else, you should add a 301, but for all other cases it's not for lack of will
that the link died.

The nice thing about IPFS is that it has this out of the box. Pages can never
die as long as at least one node has them in their cache, even if the original
owner went away.

------
jonnypotty
At least you can look up what 404 means and reason about what's going on. What
does this change solve? I mean, if you moved your site, you would issue a 301
from the old host showing people it had moved. I don't know of cases where
people keep lots of different versions of the same website such that it makes
sense to fail over to the old one when the new one doesn't work. Baffling

------
gmuslera
It has a very specific use case, and will keep giving strong visibility to the
old, unmaintained site for people who have old links to it.

A more comprehensive 404 page (one that says what you were looking for is not
there, and links to the current and the old site), or redirects in the new
site for the most-accessed URLs of the old one, are better approaches in my
opinion.

------
phh
After a very quick look I thought it was 302-ing to archive.org. I'm almost
sad this isn't the case.

------
tomaskafka
The hard thing isn't the redirect, the hard and expensive thing is finding
where to redirect.

------
globular-toast
I remember learning one of the golden rules of the Web is you don't break
links. It's a shame it's regressed this far. Now it's normal to get many 404s
every day.

Shouldn't it be 301 rather than 302, though?

------
badrabbit
At first I thought this was about a service that tries to find the right URL
and redirect back to your site. That would be a nice idea, like a
spell-correcting URL finder for GET requests.

------
based2
[https://1997.4042302.org/whatdidyoudo](https://1997.4042302.org/whatdidyoudo)
Is it a 404?

------
naringas
"but what about SEO?" \--boss' first question

------
veselin
This is as good as CVS's per-file versioning. Its problems are well known, and
it is not widely used now for a reason.

------
germs12
This is neat but hosting costs could be a burden.

~~~
aidenn0
If you crawl the old version of the site and save off static html, then the
hosting costs would be minimal.

If you added a banner at the top saying it's "archived content", that would
also solve the issue other comments have raised of people being confused by
the redirect.

------
lalos
Clever usage of subdomains

------
swiley
Isn’t 302 the permanent redirect? You’re kind of giving up that URL on that
domain forever then.

~~~
_-___________-_
No, 302 Found is the usual redirect, which is not permanent but also not
explicitly temporary.

~~~
throwanem
It is explicitly temporary.
[https://tools.ietf.org/html/rfc7231#section-6.4.3](https://tools.ietf.org/html/rfc7231#section-6.4.3)

~~~
_-___________-_
thanks, TIL!

------
rafaelturk
1997 version is the best!

