Hacker Newsnew | comments | show | ask | jobs | submit login

I'm currently managing more than 50000 301s.

I feel your pain. I'm assuming you've already got something that works for you, but on the assumption that any problem worth mentioning is probably going to bite someone here eventually:

1) Don't try to manage 301s in server configs. You'll go insane.

2) Make a simple table mapping old URLs to new URLs. You can update this when you make a change that breaks URLs.

3) If you feel like you want to 404 or 501, and your server is not overloaded (if it is, pfft, let that spider eat a 404 and save resources for real people), check memcached to see if you've resolved this recently. If not, check the table and set the cache accordingly. You can then return control to the webserver to serve the cached error page or, alternatively, send the 301 to the proper page.

4) Give the end-user a quick page which returns what URLs are consistently getting 404ed and asks for a best page to 301 them to. Since that page is behind an admin login, you can make it as expensive as you darn well please -- for example, grepping the heck out of a large log file, goign row by row, and searching for a "best guess". You can let the users approve them with one click. (Last time I wrote one I put a little forecast of how much of the marketing budget was saved due to the users' diligence in assigning 301s. Five minutes of work, got me more pats on the back than most project which take 6 months. Apparently the admin staff was fighting over who got to do the URL corrections every day.)

Your users will love you, your database load will be low, your SEO will be awesome, and your crusty ol' sysadmin will not tear out your intestines and use them for a necklace the 47th time you ask for him to add a 301 to the config file.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: