
Please see my other comments in this thread. You're making a judgement about what is happening without knowing all of the information involved in the situation. We diligently redirected as much as possible, and I'm currently managing more than 50000 301s.

> I'm currently managing more than 50000 301s.

I feel your pain. I'm assuming you've already got something that works for you, but on the assumption that any problem worth mentioning is probably going to bite someone here eventually:

1) Don't try to manage 301s in server configs. You'll go insane.

2) Make a simple table mapping old URLs to new URLs. You can update this when you make a change that breaks URLs.

3) If you feel like you want to 404 or 501, and your server is not overloaded (if it is, pfft, let that spider eat a 404 and save resources for real people), check memcached to see if you've resolved this recently. If not, check the table and set the cache accordingly. You can then return control to the webserver to serve the cached error page or, alternatively, send the 301 to the proper page.

4) Give the end-user a quick page which shows which URLs are consistently getting 404ed and asks for a best page to 301 them to. Since that page is behind an admin login, you can make it as expensive as you darn well please -- for example, grepping the heck out of a large log file, going row by row, and searching for a "best guess". You can let the users approve them with one click. (Last time I wrote one I put in a little forecast of how much of the marketing budget was saved due to the users' diligence in assigning 301s. Five minutes of work, and it got me more pats on the back than most projects which take 6 months. Apparently the admin staff was fighting over who got to do the URL corrections every day.)
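
For anyone who wants to see steps 2 and 3 spelled out, here is a minimal sketch in Python. Everything named in it is an assumption for illustration: the `redirects` table, the `resolve` function, the example URLs, and a plain dict standing in for memcached (a real memcached client would also want an expiry on each entry).

```python
import sqlite3

NOT_FOUND = ""  # cached marker meaning "no redirect known for this URL"


def make_table(rows):
    """Build an in-memory old_url -> new_url table (stand-in for the real database)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE redirects (old_url TEXT PRIMARY KEY, new_url TEXT NOT NULL)"
    )
    db.executemany("INSERT INTO redirects VALUES (?, ?)", rows)
    return db


def resolve(path, db, cache):
    """Return (status, location) for a path the application was about to 404."""
    target = cache.get(path)
    if target is None:
        # Not resolved recently: consult the table, then cache the answer.
        row = db.execute(
            "SELECT new_url FROM redirects WHERE old_url = ?", (path,)
        ).fetchone()
        target = row[0] if row else NOT_FOUND
        cache[path] = target  # misses are cached too, so spiders stay cheap
    if target == NOT_FOUND:
        return 404, None
    return 301, target


# Hypothetical usage: one known redirect, one URL that never existed.
db = make_table([("/news1.html", "/archive/issue-1")])
cache = {}
print(resolve("/news1.html", db, cache))
print(resolve("/never-existed.php", db, cache))
```

Caching the misses as well as the hits is the part that keeps the database load low: a spider hammering the same nonexistent URL only costs one table lookup until the cache entry goes away.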

Your users will love you, your database load will be low, your SEO will be awesome, and your crusty ol' sysadmin will not tear out your intestines and use them for a necklace the 47th time you ask for him to add a 301 to the config file.
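
The admin report in step 4 can start from something as small as the sketch below. It assumes the access log is in Common Log Format, and the `top_404s` name and `min_hits` threshold are made up for the example; the "best guess" matching and the one-click approval are left out.

```python
import re
from collections import Counter

# Pulls the request path and status code out of a Common Log Format line.
# The log format is an assumption; adjust the pattern to your server's logs.
LINE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" (\d{3}) ')


def top_404s(lines, min_hits=2):
    """Return (url, hits) pairs for URLs that 404ed at least min_hits times."""
    counts = Counter()
    for line in lines:
        m = LINE.search(line)
        if m and m.group(2) == "404":
            counts[m.group(1)] += 1
    # most_common() sorts by hit count, highest first.
    return [(url, n) for url, n in counts.most_common() if n >= min_hits]
```

Going row by row like this is slow on a big log, which is fine here since only the admin page pays for it.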

It sounds like you've done what you can after someone else's negligent work; thanks for that.

If it makes you feel any better, there are approximately 5000 404s from links "in the wild" (according to Google crawl error stats) for which I can improve the quality of the redirect. I may not be able to get 100% accuracy on the redirect, but I can get the user to an issue page of the archive where they can click the story they intended to see. The problem is that there is no metadata associated with those links to get a better idea of what story it was. news1.html isn't terribly descriptive.

To put this in perspective, I get approximately 40000 unique 404s from spiders every day for things that simply don't exist and never did.
