
Have you considered putting a caching reverse proxy in front of the arc app to keep the backend from having to render all of the old pages?

It seems like the only dynamic element of old articles is the "$x days ago" bit, and that'd be pretty easy to make static by putting absolute timestamps in the HTML and using Javascript to turn them into "N hours / days ago" on the client. Then the crawlers would just be pulling cached, pre-rendered HTML.
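
Something like this on the client would do it (just a sketch; the .age class and data-utc attribute are made up for illustration):

  // Rewrite absolute timestamps as "N minutes/hours/days ago" at load time,
  // so the cached HTML itself never goes stale.
  function renderAges() {
    var now = Date.now() / 1000;
    var spans = document.querySelectorAll('span.age');
    for (var i = 0; i < spans.length; i++) {
      var secs = now - Number(spans[i].getAttribute('data-utc'));
      var text;
      if (secs < 3600)       text = Math.floor(secs / 60) + ' minutes ago';
      else if (secs < 86400) text = Math.floor(secs / 3600) + ' hours ago';
      else                   text = Math.floor(secs / 86400) + ' days ago';
      spans[i].textContent = text;
    }
  }
  document.addEventListener('DOMContentLoaded', renderAges);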

There's an example of doing this with nginx here:

http://serverfault.com/questions/30705/how-to-set-up-nginx-a...

With that in place, the arc app would just have to send a Cache-Control (or Expires) header saying that current articles expire immediately and old ones don't.
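
Roughly, the nginx side could look like this (the cache path, zone name, and backend port here are guesses, not HN's actual setup):

  # http {} context: define a cache for proxied responses
  proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=hn:10m max_size=1g;
  server {
    listen 80;
    location / {
      proxy_pass http://127.0.0.1:8080;   # the arc app (port is a guess)
      proxy_cache hn;
      # nginx honours upstream Expires / Cache-Control: max-age headers, so
      # old articles stay cached while pages marked to expire immediately
      # always hit the backend
    }
  }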




I believe Rtm has already set one up.


The conspicuous lack of a "Server:" header inclines me to believe that that's probably not the case (most web servers set one indicating the server software and version). Here are the headers that HN sends out from an old post (20 days ago):

  HTTP/1.1 200 OK
  Content-Type: text/html; charset=utf-8
  Cache-Control: private
  Connection: close
  Cache-Control: max-age=0


My favorite part of HN's headers: the lines are separated by naked LFs instead of CRLF pairs, in violation of the HTTP spec.


This is a common violation that everyone accepts. It's definitely done by 'bad' clients - I'm not sure how often servers send a bare LF.

(I used to telnet to port 80 for testing, and type GET / HTTP/1.0 <enter> <enter>, and that should be LF on Linux & Mac)
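
For the record, the spec wants each of those lines terminated with a CR LF pair on the wire, i.e.:

  GET / HTTP/1.0<CR><LF>
  <CR><LF>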


You don't have a problem with one of the most trafficked sites for programming/web startup-related news implementing HTTP incorrectly?

Do you ignore whether your HTML is valid just because the browser rendered it correctly?


Yup.

I've got real work to do. Making a validator happy is fake work.


By ensuring that your pages are valid, you make it much less likely that you'll have to scramble around, wasting time at the most inopportune moment, when a new browser version comes out that handles your non-standards-compliant tag soup differently than the current one does.

So, do you want to pay the price upfront when you can plan for it or afterwards when the fix must be done immediately because customers are complaining?


In my experience, I've had to scramble to fix browser compatibility issues every bit as often with 100% standards-compliant code as I have with some incredibly, laughably bad HTML.

I'd much rather pay the exact price later than an inflated price now.


Some of us actually care about interoperability, maintainability, and writing good code in general, as opposed to just cowboying stuff together as quickly as possible.


I used to preach the same thing.

Then, working at a startup taught me that it's not black and white. Several quotes come to mind, but Voltaire's is my favorite:

"The perfect is the enemy of the good."


Having worked at startups with both cowboys with a "get 'er done" attitude and people who actually care about software craftsmanship, I'll take the latter any day.

Cowboys may get things "done" quickly, but that doesn't help when things are subtly broken, have interoperability problems, or are nearly impossible to extend without breaking.


Why don't you bother doing your real work right the first time? As long as there's a well defined spec, you might as well follow it instead of being creative and original when it comes to implementing standards.


You don't know that everyone accepts it. Even if they did, it doesn't make it right.


I submitted a patch for this in the pecl_http PHP library:

https://bugs.php.net/bug.php?id=58442


We use varnish for caching and check the User-Agent on requests.

If the cache has a copy of an article that is a few hours old, it will just give that version to Googlebot; if it thinks a human is requesting the page, it will go to the backend and fetch the latest version.

https://www.varnish-cache.org/lists/pipermail/varnish-misc/2...
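
In Varnish 2/3-style VCL the idea looks roughly like this (a sketch of the approach, not the exact config):

  sub vcl_recv {
    if (req.http.User-Agent ~ "Googlebot") {
      # bots may be served a copy up to a few hours stale
      set req.grace = 4h;
    } else {
      # humans only get a briefly stale page while a fresh one is fetched
      set req.grace = 15s;
    }
  }
  sub vcl_fetch {
    # keep expired objects around long enough to serve them stale to bots
    set beresp.grace = 4h;
  }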


+1 for varnish. It's stupidly[1] fast and there shouldn't be much trickery required to deflect most of HN's traffic (e.g. ~10 sec expiry for "live" pages, infinite expiry for archived pages).

[1] 15k reqs/sec on a moderate box



