Hacker News
Designing a website without 404s (pillser.com)
112 points by lilouartz 5 months ago | 102 comments



I understand why the author thinks this is a good idea, but 404 exists for a reason. Literally "Not Found", not "I will guess at what you are looking for".

The "implications for SEO" will almost certainly not be "positive" for quite a few reasons. As a general rule in my experience and learning, Google's black box algorithm doesn't like anything like this, and I expect you will be penalized for it.

There are many good comments already and my suggestions would merely be repeating them, so I'm just adding my voice that this is likely to be a bad idea. Far, far better to simply have a useful 404 page.

Edit just to add: if you do something like this, make sure you have a valid sitemap and are using canonical tags.


This is absolutely right. The duplicate content flags alone will cancel everything out.

I don't understand why people don't just read Google's own published guidelines ( https://developers.google.com/search/docs/fundamentals/seo-s...) on how to properly SEO one's site.

It's not some dark art. Google themselves tell you exactly how they index and why. All you have to do is read the guidelines.


In that document they explain that duplicate content is not a big deal: just set a canonical, and respond maybe with a 301.

I did this during a migration of a site that used '/<pk-int>/' and was changed to '/<slug>/', with corresponding 301s. The SEO not only didn't punish the migration, it seemed to like the change (except for Bingbot, which after five years still requests the old URLs).

The problem I see with the OP's strategy is that a bot can hit a URL that doesn't exist yet, get 301'ed by that technique, and then the URL is reused later for new content. Following their example: somebody links to '/supplements/otherbrand-spore-probiotic', and a bot following that link gets 301'ed to '/supplements/spore-probiotic'. Later you actually add 'otherbrand-spore-probiotic', but that URL will never be revisited by the bot.


What about using 302?


What about 308?


I mostly agree, but also... it kinda is a dark art? I just read the "Reduce duplicate content" section [1] because my gut reaction was "yeah, I agree, this guy is going to get SEO-penalized for this" but of course I don't actually know. And although my few years of experience dealing with this stuff still want to say "you will get SEO-penalized for this", Google's own guidelines do not: "If you're feeling adventurous, it's worth figuring out if you can specify a canonical version for your pages. But if you don't canonicalize your URLs yourself, Google will try to automatically do it for you. [...] again, don't worry too much about this; search engines can generally figure this out for you on their own most of the time. "

[1] https://developers.google.com/search/docs/fundamentals/seo-s...


> It's not some dark art. Google themselves tell you exactly how they index and why. All you have to do is read the guidelines.

This reminds me of something similar in the mobile world: Apple's App Review guidelines[1]. They're surprisingly clear and to the point. All you have to do is read and follow them. Yet, when I worked for an App developer, the product managers would act as though getting through App Store review was some kind of dark wizardry that nobody understood. Them: "Hey, we need to add Feature X [which is clearly against Apple's guidelines]." Me: "We're going to have trouble getting through app review if we add that." Them: "Nobody knows how app review works. It's a black box. There's no rhyme or reason. It's a maze. Just add the feature and we'll roll the dice as usual!" Me: "OK. I've added the feature and submitted it." App gets rejected. Them: "Shocked Pikachu!"

1: https://developer.apple.com/app-store/review/guidelines/


While these are clear and to the point, they are absolutely not an exhaustive list of rejection reasons.

Go through any forum thread of "reasons you've been rejected" and try to find those reasons in the guidelines; it isn't obvious.

I couldn't find anything in there pertaining to why we were rejected last month (mentioned Android in our patch notes as it's a third party device we integrate with).


I haven't worked with iOS in a few years, so perhaps they are more consistent these days, but in the early days I built a web browser with a custom rendering engine to see if I could get it through review, and it was accepted just fine. It was accepted just fine through several releases, in fact. According to the guidelines at the time, alternative browser engines (i.e. not WebKit) were prohibited. It should have been rejected if the guidelines were followed. But they weren't. It really was a dark art figuring out when they would and wouldn't follow them.

I'm with your colleague on this, at least in a historical context.


Doing something not mentioned by the guidelines and hoping it gets through, I can understand. Doing something explicitly prohibited by the guidelines and hoping it gets through seems like an unwise use of resources.


There seemed to be no illusions about the risk of the gamble in the OP's case. The comment states it was discussed. The consensus simply felt the gamble was worthwhile – even if they ultimately lost.

I was also well aware of and accepting of the risk in my case. I wanted to learn more about certain iOS features, and I was able to do so scratching an itch I felt like scratching. I would have still gained from the process even if it had been rejected. However, as it got approved, contrary to the guidelines, I also got a good laugh at how inconsistent review was and a nice cash bonus on top!

You win some you lose some. Such is life.


If you look at the recently leaked documents, Google often does not tell the truth, or the whole truth.

301s are what you are supposed to do with things like changed URLs, so I cannot see it being a problem in this case. A 301 is not duplicate content - it is one of the things Google likes you to do to avoid duplicate content.

It would probably be good to add a threshold to the similarity to prevent urls redirecting to something very different.
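Something like this, as a rough sketch - assuming Postgres with the pg_trgm extension and a hypothetical supplements(slug) table, which may not be what the site actually uses:

  # Sketch: only redirect when the best fuzzy match clears a threshold,
  # otherwise fall back to a plain 404. Table/column names are made up.
  import psycopg2

  MIN_SIMILARITY = 0.5  # below this, serve a 404 instead of redirecting

  def best_match(requested_slug):
      conn = psycopg2.connect("dbname=example")
      with conn, conn.cursor() as cur:
          cur.execute(
              """
              SELECT slug, similarity(slug, %s) AS score
              FROM supplements
              ORDER BY score DESC
              LIMIT 1
              """,
              (requested_slug,),
          )
          row = cur.fetchone()
      return row[0] if row and row[1] >= MIN_SIMILARITY else None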


I would use 308/307 over 301/302.

Semantically, using a 303 redirect might be the most appropriate signal for what they are doing.

They could redirect to a specific result page if it's a clear, unambiguous match and they could redirect to a search results page if there are other possible matches.


If the pages redirect and have a canonical URL, there should be no problem, right?


There is always a possibility of a problem, even if you do everything right. If you are redirecting multiple URLs to one page, for example, you may believe you are helping visitors reach the most relevant/helpful result, but it could also look like you are trying to artificially boost that one page.

Then there are the vagaries of a search engine becoming ever more dependent on AI, where not everything makes rational sense and sites can get de-indexed at the drop of a hat.


Do you have any evidence that redirects are bad for SEO? Redirects are a normal part of the web and Google tells you they are fine with it https://developers.google.com/search/docs/crawling-indexing/...


No, Google are infallible and never tell lies...


Except there's no duplicate content - the site redirects


The leaked documents are probably more useful than any "official" guide.


Agreed. You could make a very useful 404 page with the list of candidate URLs being used to do the redirect: It's exactly the thing you want to display in a "Did you mean x?" message on that page.
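Something like this, as a minimal sketch (Flask plus difflib here just to illustrate; the real candidate list would come from whatever similarity query the site already runs):

  # Sketch: keep the 404 status, but surface the fuzzy-match candidates as
  # "Did you mean ...?" links. KNOWN_SLUGS stands in for the real slug list.
  import difflib
  from flask import Flask, request

  app = Flask(__name__)
  KNOWN_SLUGS = ["spore-probiotic", "vitamin-d3", "magnesium-glycinate"]

  @app.errorhandler(404)
  def not_found(error):
      slug = request.path.rsplit("/", 1)[-1]
      candidates = difflib.get_close_matches(slug, KNOWN_SLUGS, n=5, cutoff=0.5)
      links = "".join(f'<li><a href="/supplements/{c}">{c}</a></li>' for c in candidates)
      # Still a 404: the page is helpful, but the status code tells the truth.
      return f"<h1>Not found</h1><p>Did you mean:</p><ul>{links}</ul>", 404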


Yes, I think this is a much better implementation of the idea.


I hate websites that redirect URLs that should be 404s to something else like their homepage or whatever they think is relevant. My brain has to slam on the brakes and make sense of the absolute dissonance between what I expected and what I'm seeing, which is taxing.

A 404 doesn't cause nearly as much dissonance, because the website is at least telling me why what I'm seeing is not what I expected.


Yes, especially when you multi-search and open links in background, only to see a bunch of homepages later and no way to restore search context to try searching it deeper. And then there are “special” sites that redirect to google.com.


The last SaaS application I worked on had issues similar to what you're describing. Too many distinct URLs were arriving at the same page content, and Google thinks you're trying to do something naughty unless they all contain metadata for the same canonical URL, which we weren't handling right.

Not only did it improve our page rank once we fixed it, it also reduced the amount of bot traffic we were fielding.


Why stop at fuzzy matching? Just pass the original page to ChatGPT, telling it to generate new content and Google will think you are producing loads of additional pages… no need to worry about duplicate warnings at all :-/


I don't understand why you think this is a bad idea - 301 redirects exist for a reason. Redirecting to what the user probably wants is better than a 404 page.

That said, typically most people are going to find a site via search or clicking a link, which is why I think basically no one bothers doing this.


301 is "Moved Permanently". The content in question wasn't (necessarily) moved. If you control and know for sure that old URL was renamed to something else, then yes - 301 redirect is appropriate. But guessing and redirecting isn't that.


How is this going to affect Google's indexing of the site? They hit the front page and follow links, right? They don't try random perturbations of the URLs and see if they resolve to something else? Or do they?


No, but Google will find and spider the URLs via at least two sources:

1) Users may (inevitably will) share links to the URLs on social media / wherever, and Googlebot et al will find those links and spider them.

2) The URLs will be sent to Google for any Chrome user with "Make searches & browsing better" setting enabled.

https://support.google.com/chrome/answer/13844634?#make_sear...


WordPress does this sort of thing, or at least can (I don’t know, I’ve only ever been involved with a couple of WordPress sites). It’s obnoxiously awful. You end up with people accidentally relying on non-canonical URLs that subsequently change in meaning, or using links that seem to work but actually give the wrong content and you don’t notice. Both of these are very often much worse than a 404. There’s also an issue where articles can become inaccessible from their correct URLs, due to something else taking precedence in some way; I wouldn’t say that’s fundamental to the technique, but I do think that the philosophy encourages system designs where this can happen. (Then again… it’s WordPress, which is just all-round awful, technically; maybe I shouldn’t blame the philosophy.)

So no, please don’t ever do this.

But what you can do is provide a useful 404 page. Say “there’s no page at this URL, but maybe you meant _____?” Although there’s a strong custom of 404 pages being useless static fluff, you are actually allowed to make useful 404 pages. Just leave it as a 404, not a 3xx.

(Also, care about your URLs and make sure that any URL that ever worked continues to work, if the content or an analogue still exists at all. Distressingly few people even attempt this seriously, when making major changes to a site.)


Agree. When it's clear the content is no longer there, one of the big values of a 404 page is that you can at least hit Archive.org straight afterward to see if there's a version there.


Technically the 410 status code would be the correct one in that case.


Agreed, and it was my first thought as well. A useful 404 seems preferable in most cases, although the author notes that the URLs are constantly changing and being rebranded, so this might be the simplest solution for the supplement pages. However, I'd probably __not__ apply it to the rest of the website like the author suggests.


You could even make your 404 page embed your search, knowledge base and FAQs.


There used to be a plugin (or something) that would let you play Zork on a 404 page.


> the implementation is so simple that I am surprised that more websites do not implement it

I might be wrong, but for me it's because in real life this isn't an issue. Users shouldn't be finding/using random URLs to navigate the site. Where is this broken-URL traffic coming from anyway? If you're trying to solve for people who randomly edit the URL and expect it to work, most site owners don't care about those users getting a 404, because they should expect a 404. They're not real users; they're just playing around.

However, if you purposely changed the URL format after a lot of people have the old format bookmarked or indexed on the web, then do a 301 redirect to the new URL.

I'm not sure of the SEO implications of the described solution; however, it seems like all risk and no upside.


Agreed. When I did a mass migration of old-format URLs, I started with 302s to ensure that things worked, then moved to 301s. I actually wrote a script that would capture and collect all URLs that led to a 404 (just in case I missed an old-format link).

301s also serve to inform search engines that your URL should be different and they update their index accordingly. My transition went very smoothly but it took prep work on my part to understand that 404s were inevitable and that I needed a plan to migrate that traffic.
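For anyone doing something similar, the pattern can be small - here is a sketch (not my actual script), assuming a Flask-style app and a hand-maintained old-to-new mapping:

  # Sketch: 302 first while verifying the mapping, 301 once it's trusted,
  # and log anything that slips through so it can be added to the mapping.
  from flask import Flask, redirect, request

  app = Flask(__name__)
  OLD_TO_NEW = {"/1234/": "/spore-probiotic/"}  # hypothetical old -> new URLs
  PERMANENT = False  # flip to True (301) once the migration looks correct

  @app.errorhandler(404)
  def migrate_or_log(error):
      target = OLD_TO_NEW.get(request.path)
      if target:
          return redirect(target, code=301 if PERMANENT else 302)
      app.logger.warning("unmapped 404: %s (referrer: %s)", request.path, request.referrer)
      return "Not Found", 404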


Link rot is a thing. Scroll around some older reddit threads with links and see how many of them are still active. Site redesigns are often the cause of older links rotting to become no longer valid.


Didn’t my comment address that though? That’s not what the author is discussing. The author is trying to make nonexistent URLs resolve. These URLs they’re solving for were never valid.


link rot doesn't happen by the slug in a url subtly changing


Hey everyone! Thank you for your feedback.

Whether it is positive or negative, I do appreciate it as it helps me to learn and improve the product. I really didn't expect this to get any attention, let alone dozens of comments!

To clarify: this was originally designed to help me auto-migrate the URL schema. I am learning as I develop this website, and SEO has been one of those vague topics with few hard rules, so I wanted to leave space for experimentation. As I rolled it out, I became intrigued by how it functions and wanted to share my experiment with you to get feedback.

Based on the feedback, I plan to change the logic such that:

- I will track which URLs are associated with which products

- If user hits 404, I will check if there was previously a product associated with that URL and redirect accordingly

- If it is a new 404, I will display a 404 page which lists products with similar names

I appreciate everyone hopping in to share their perspective!


> If it is a new 404, I will display a 404 page which lists products with similar names

That's the best compromise. It is still a 404, but with a "Did you mean" element to it, which is still useful to end users.


> I am surprised that more websites do not implement it

Maybe because it is not a good idea. Masking errors is harmful. The information "what you're looking for is not there" is very important, because it lets users identify that something is wrong. Smart redirection can be outright dangerous: what if I am buying a medication, and the smart website silently replaces the correct drug with something similar but wrong? Not to mention the pollution of the indexes of search engines with all the permutations of the same thing they may discover. Lastly, the U in URL stands for unique; the web is designed around unique locators and this rule shouldn't be broken without a very good reason.

Show the user (and the crawlers) a 404, suggest your corrected URL in the content of the 404, and let the user know that it's a guess, so they make an informed choice about the situation.


I agree with these points, but the "U" is Uniform:

https://datatracker.ietf.org/doc/html/rfc1738


> Lastly, the U in URL stands for unique; the web is designed around unique locators

It doesn't in a literal sense and not even in a practical sense.

The web in no way is designed around unique locators (not even unique identifiers). And for the search engines you point to there is metadata like `link rel=canonical` to help them around the obvious reality that one webpage may have many different locators (and you will find that used on most major websites)


Most people who use this site would use the search bar, which reports errors, and those who directly type it in aren't exactly laypeople.


Lay people certainly might bookmark a medication they buy often, and if "the smart website silently replaces the correct drug with something similar but wrong" they might end up accidentally ordering and even consuming it.


This is the key to why “nothing is better than a best guess” is the right approach.

Answers that appear to be right can be worse than no answer at all.


Until the link changes and Google or their email link redirects them incorrectly. Without 404s you get hit with a Google duplicate-content penalty, as many URLs resolve to the same page.


Well how about this instead:

- Keep the 404 page

- Use autocorrection only for minor typos



If you're going to do this then make sure you use 302 redirects (temporary) and not 301 (permanent). Otherwise browsers (and Google) will cache the redirect, then if your fuzzy matched URL one day becomes a real URL, people might not be able to access it.

Someone could seriously mess up your site by simply publishing their own page with many invalid links to your site - basically a dictionary attack. If Google were to crawl those links, it would cache all the redirects, and you'd have a hard time rectifying that if you later wanted to publish pages at those URLs.

Also to reiterate other suggestions - your idea is not great for many reasons already stated (even with 302s). As suggested, just simply have a 404 page with a "Did you mean [x]?" instead. Use your same logic to present [x] in that example, rather than redirect to it.


I am still rolling this out to other parts of the website, but here is the new 404 page format https://pillser.com/engineering/2024-06-10-website-without-4...

As described elsewhere, here is how the new logic will work (rough sketch after the list):

- I will track which URLs are associated with which products

- If user hits 404, I will check if there was previously a product associated with that URL and redirect accordingly

- If it is a new 404, I will display a 404 page which lists products with similar names
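Roughly, something like this - hypothetical table and column names (url_history, products), not the final schema:

  # Sketch of the planned logic. url_history(url, product_id) and
  # products(id, slug) are placeholder names, not the real schema.
  import psycopg2
  from flask import Flask, redirect, request

  app = Flask(__name__)

  @app.errorhandler(404)
  def handle_missing(error):
      conn = psycopg2.connect("dbname=example")
      with conn, conn.cursor() as cur:
          cur.execute(
              "SELECT p.slug FROM url_history h "
              "JOIN products p ON p.id = h.product_id WHERE h.url = %s",
              (request.path,),
          )
          row = cur.fetchone()
      if row:
          # The URL previously pointed at a real product: redirect to it.
          return redirect(f"/supplements/{row[0]}", code=301)
      # New 404: keep the 404 status and list products with similar names
      # (built the same way as the "Did you mean" suggestions upthread).
      return "Not Found - similar products would be listed here", 404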


This just seems like it's co-opting the url into a "search". Just have a normal search page where users can type in what they want if they want fuzzy matching. Or have the 404 page contain a search page. I don't think using the url as a "fuzzy search" is the right application.


You can do this and still 404, even. There is rarely a good reason not to 404 when you don’t recognize the URL, even if you want to try to do something helpful with the 404 page.


Yup. I would highly recommend a helpful 404 page rather than "silently" redirecting the user to a different page that they didn't type. Just put a search box, a link to the homepage and "Did you mean these similar pages:" with a list of links.


My 2¢: I've seen this implemented in the past and appreciated it. I think they used keyword matching too, not just the URI.

That said, IIRC it came up because they didn't have a proper 301/302 configured.


This doesn't prevent link rot, but it can definitely cause it. If the author ever so slightly changes his algorithm, URLs that previously worked (but were incorrect) could stop working.

I don't see the problem that this is a solution for but I can see a couple of problems that this solution causes.


Not just the algorithm. If a new entry is added that ranks higher for a short URL, then you will break the old URL.


Yeah. This is the same mentality that causes search engines, AI and many foolish and frustrating humans to avoid at all costs saying the words “I don’t know”


Oops, I fat-fingered a URL for serotonin and ended up with a sleeping pill instead:

https://pillser.com/supplements/merotonin

I am surprised that more websites do not implement this!



Except that URLs are an API surface and congratulations, now you are supporting an infinite API forever.

Vendors change product names, hyperlinks break! Fix bugs or change behavior, hyperlinks break! Do nothing, believe it or not, hyperlinks break!


I half like the idea.

I do think that it's ok for several URLs to point to the same content. In his example all three are fine. I also tried the product code (6066) without any of the text and it worked fine as well.

I've also noticed Lego's site does a version of this as well. https://www.lego.com/en-us/product/10334 will take you to the product page for the Retro Radio. However, I think Lego's site is just keying in on the product id as "retro-radio" doesn't work, but "ret-rad-10334" does.

But there are limits.

I put in the URL "https://pillser.com/supplements/go-fuck-yourself" to see what would happen. Now, I chose an offensive phrase to increase my chances of not coming close to any real product. I believe that URL should 404, but it took me to the page for the supplement "On Your Game" instead. If I had tried a real name and got taken to something with only the barest resemblance to the name I tried, I wouldn't be thinking "This must be the closest match". I'd think the site did something messed up or I typed something wrong or something malicious had happened.


That seems like a solution for a problem that shouldn't exist in the first place.

> a product is renamed, or the logic used to generate the URL changes.

In that case you should store both URLs and have a redirect_to_id or something similar to give search engines and users a proper 301. I don't see a use case for this fuzzy matching, which will just make things less explicit and more unpredictable.


Well, nice proof of concept, but this is probably not what you intended: https://pillser.com/supplements/fuck Maybe it would be better to create a 404 page that shows some "Did you mean..." content?



Bad idea. Better would be to tweak it and instead serve a 404 with a recommendation "Maybe you meant X"


I think this is a good idea for a site like this that is focused on searching for things: if you're going here to look up "revive", you're already expecting a search result, not necessarily a precise "revive" match out of the gate. Adding this to the rest of the site, or to other non-search-focused sites, where it shows something the user didn't ask for just because it's similar, could be a much worse experience than signaling "not found!" - especially since a plainly incorrect URL is rarely the problem compared with an expired resource.


I have to question how many people in 2024 are navigating the internet by going to the address bar and typing “foo.com/thing-im-looking-for”


Every damn time I need to find a ticket in Jira. It's easier and faster to open a random ticket and edit the URL than it is to use the UI.


I, for one, do that quite often where there is a consistent enough URL structure that "thing-im-looking-for" might reasonably be there. I expect I would use that method even more often if the likelihood of success was higher across the vast array of internet resources. It is certainly an ergonomic way to navigate the internet.


This is both a good and bad idea.

It is good because it will help most people and work for them.

It's bad because sometimes it will make people think they've found what they were looking for when what they had doesn't exist at all - but it gave them something that sounds similar.

I would at least have a "redirected from" banner at the top of the page when it triggers.


> It's bad because sometimes it will make people think they've found what they were looking for when what they had doesn't exist at all - but it gave them something that sounds similar.

That would be true if all visitors understood that 404 pages exist and expected one for this website. I bet most people wouldn't know what to answer to the question "What should you see if there is no page for this supplement?"

It is true that there could be some sort of message making the redirect explicit.


Don't listen to people telling you this is a bad idea. 301 similar 404s to the correct page and be done with it. Migrations are all too common. Google's webmaster guidelines have ruled and restricted creativity for long enough, so damn Google. If getting completely deindexed from Google would floor your business, you got it all wrong anyway. Focusing on Google too much will get you exactly into a position where you don't want to be. It's their job to correctly index all sorts of server configurations and assign link value correctly so their algorithms get it right. You are doing it well. Let's just stop Google-pleasing and focus on the user and good marketing.


I only redirect URLs with ids. E.g. the canonical is "/:id-some-slug", so if the URL matches the id but not the slug, I redirect it to the canonical.

For everything else you can just have a nice 404 with suggestions of links that probably are a match.
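Roughly like this, as a sketch (the PRODUCTS dict stands in for a real lookup, and the /:id-some-slug pattern is my convention, not necessarily the OP's):

  # Sketch: redirect only when the numeric id resolves but the slug part is
  # stale, e.g. /supplements/6066-old-name -> /supplements/6066-spore-probiotic.
  import re
  from flask import Flask, abort, redirect

  app = Flask(__name__)
  PRODUCTS = {6066: "spore-probiotic"}  # id -> canonical slug (hypothetical)

  @app.route("/supplements/<raw>")
  def supplement(raw):
      m = re.match(r"(\d+)(?:-.*)?$", raw)
      if not m or int(m.group(1)) not in PRODUCTS:
          abort(404)  # id doesn't resolve: a plain 404, no fuzzy guessing
      pid = int(m.group(1))
      canonical = f"{pid}-{PRODUCTS[pid]}"
      if raw != canonical:
          return redirect(f"/supplements/{canonical}", code=301)
      return f"Product page for {PRODUCTS[pid]}"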


I did something very similar to this, but I have since changed the way my website works. I pulled everything from an RSS feed after page load, then came up with a way to show n results the user might have wanted to navigate to. I think my approach was cool, but ultimately I think it's better to give a 404 error than to redirect someone. Here's the post if anyone cares: https://decode.sh/redirecting-users-to-the-correct-page-afte...


I think this confuses the distinction between a database and a website. What the OP describes is a ~NLP interface to a database of resources, not a "web of resources".

The web by definition is a lazily-materialized query response graph.


This really isn't crowing but we created a dynamic site that used HyperCard as a CGI and we did that in 1984. Not kidding. Mosaic'd up to the hilt.

Still, it's a good idea.

You can further this idea (especially when the slug returns nothing) by having this page also list "Best Bets", or whatever people most often come to your site for (regardless of any search query; perhaps based on their referrer, or the day of the week, etc.).

And additionally, put the slug (bar the dashes) into a search box so it might be amended (but tell them that you didn't find anything and they need to try something else).


Must have been later than 1984 or I'm misunderstanding - HyperCard was released in the late 80s, and NCSA Mosaic wasn't released until 1993.


I'd definitely cache+throttle that if possible. Cache the 404 response, and if using a shared DB, make a custom user/resource group/workload queue and put this service at a lower priority.

Not sure how best to do that in Postgres though; the closest I can find is reserved connections per user. Idk, maybe there's an extension, or it's easy enough to do in the webserver.

https://www.postgresql.org/docs/current/runtime-config-conne...
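At the app layer even a plain memoization of the lookup would blunt repeated bad URLs - a sketch (fuzzy_match is a placeholder for the real similarity query):

  # Sketch: memoize fuzzy-match results per requested slug so repeated bad
  # URLs (crawlers, dictionary-style probing) don't each hit the database.
  from functools import lru_cache

  @lru_cache(maxsize=10_000)
  def fuzzy_match(slug: str):
      # Placeholder for the real similarity query against Postgres.
      # None here would mean "no good match: serve the 404 page".
      return None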


I'm a huge fan of "do what I say" and try to avoid "do what I mean". Sometimes you want magic, but most of the time I just want static, plain logic that I can reason about.

Slightly OT, but this came to mind when thinking about website-design-related stuff: https://www.w3.org/Provider/Style/URI


I dislike it when websites show an "access denied" page after I accidentally click on an inaccessible or non-existent page. It's like a door hitting you on the way out of a store.

Here is a partially fixed issue https://hatonthecat.github.io/Hurl/404.html


On a not-too-outdated, fairly vanilla and permissive Ubuntu/Firefox+uBlock setup, I can see the page load and then everything clears. Not even reader mode fixes it. Probably unrelated, but the very basics not working makes me want to dismiss the design suggestions right off the bat.


Having a small website and querying the database each time some spider hits a wrong URL can be done. But for a large website with millions of different URLs, it's impossible to query the DB with a similarity function.


If spore-probiotic is enough to identify the page, why use an id in the slug at all?


if you're reading this, please don't replicate the idea

this is not going to end well...


I think a more honest solution would be to 404, and display a page with potential matches (like the top 5-10 similar-ly named URLs).

"Hey that page doesn't exist, but there are some similar pages..."



It definitely has a 404 response, though:

https://pillser.com/engineering/failure%20mode


this is a terrible thing to do. not only does it mean a link to the site will unpredictably change its referent, it's also impossible to archive in a way you can restore


I'm struggling to understand the real-world benefit to this. Do most people even manually type URLs (especially long ones which would lend themselves to mistyping)?


No. Do not do this.

You show me the links you think I want, on the 404 page.


Before reading the post, I thought the strategy would be similar to those wikis that let you create a new page when a link is not found.


pillser.com/supplements/vitamin-1973-omg-i-can-type-anything-here-and-it-still-works-i-dont-think-this-is-good-idea

clickable:

https://pillser.com/supplements/vitamin-1973-omg-i-can-type-...


You can do that with many websites that have ids in their SEO slugs. This is usually not an issue, as it's still standardized: the string is just split into the id and the rest, and you can look it up with both parts.

Popular libraries like https://github.com/norman/friendly_id implement it like that too.


Apparently the author also likes websites without content because it just shows a white screen.


If you solve the problem in the database you may bypass restrictions implemented in the server.


Am I the only one who thinks this is a recipe for disaster?


These are 303s then?


> https://pillser.com/engineering/yeah%20right

404 Not Found Nothing to see here.

Pillser


"At the moment, I've applied this logic only to the supplement pages, but I am planning to extend it to the rest of the website."



