Hacker News

I never understood why HN has such a peculiar URL for accessing pages. It times out after a while too; is that to stop crawlers?



They are IDs used to look up closures in a database. They time out to stop the database overflowing ;) It's called continuation-based web development [1], popular with Lisp and Smalltalk-based web servers (because who else has continuations?)

[1] http://en.wikipedia.org/wiki/Continuation#In_Web_development


There's no database.

EDIT: To all the people arguing with me: read the source code to Hacker News.

    (= fns* (table) fnids* nil timed-fnids* nil)

    ; count on huge (expt 64 10) size of fnid space to avoid clashes

    (def new-fnid ()
      (check (sym (rand-string 10)) ~fns* (new-fnid)))

    (def fnid (f)
      (atlet key (new-fnid)
        (= (fns* key) f)
        (push key fnids*)
        key))

    (mac afnid (f)
      `(atlet it (new-fnid)
         (= (fns* it) ,f)
         (push it fnids*)
         it))

They are in memory, which is why they expire randomly when the HN process is restarted.
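
For readers who don't read Arc, here is a rough Python sketch of the same idea: a process-local table from random 10-character ids to closures. The alphabet, the `dispatch` helper and all the Python names are assumptions made for illustration; this is not a translation of the actual HN code.

```python
import secrets
import string

# Assumed alphabet; the Arc source only says the id space is huge.
ALPHABET = string.ascii_letters + string.digits

fns = {}    # fnid -> closure; lives only in this process, like fns*
fnids = []  # newest id first, loosely mirroring (push key fnids*)

def new_fnid(length=10):
    """Draw random ids until one is not already in the table."""
    while True:
        key = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if key not in fns:
            return key

def fnid(f):
    """Register closure f and return the id that goes into the URL."""
    key = new_fnid()
    fns[key] = f
    fnids.insert(0, key)
    return key

def dispatch(key):
    """Hypothetical request handler: run the closure if it still exists."""
    f = fns.get(key)
    if f is None:
        return "Unknown or expired link."
    return f()
```

Since `fns` is an ordinary in-process dict, restarting the process empties it and every outstanding id stops resolving.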


That's probably true, John. If I may misquote Greenspun:

"Any sufficiently complicated Lisp program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of an actual database."


Depends on what you consider a database.


The fnids do not expire randomly due to restarts; they expire when there are too many of them or when they time out, so that memory doesn't fill up with these continuations. Personally, I don't like this continuation-based approach since "Unknown or expired link" is a really bad user experience.

Way back I wrote a bunch of documentation on the Arc web server if you want details: http://www.arcfn.com/doc/srv.html Look at harvest-fnids, which expires the fnids.

The fnid is an id that references the appropriate continuation function in an Arc table. The basic idea is that when you click something, such as "more" or "add comment", the server ends up with the same state it had when it generated the page for you, because the state was stored in a continuation. (Note that these are not Scheme's ccc first-class continuations, but basically callbacks with closures.)

(The HN server is written in Arc, which runs on top of Racket (formerly known as PLT Scheme or mzscheme))

Edit: submitted in multiple parts to avoid expired fnids. Even so, I still hit the error during submission, which seems sort of ironic.
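
To make the two expiry rules concrete (a timeout, plus a cap on the number of entries), here is a small Python sketch. It is an illustration under assumptions, not a port of harvest-fnids: the `FnidTable` class, the cap of 3 and the injectable clock are all made up for the example.

```python
import time

class FnidTable:
    """Toy table of callbacks with a per-entry lifetime and a size cap."""

    def __init__(self, max_entries=3, clock=time.monotonic):
        self.max_entries = max_entries  # assumed cap, chosen for the demo
        self.clock = clock              # injectable so expiry can be tested
        self.entries = {}               # fnid -> (callback, expires_at or None)

    def add(self, key, callback, lifetime=None):
        expires_at = None if lifetime is None else self.clock() + lifetime
        self.entries[key] = (callback, expires_at)
        self.harvest()

    def harvest(self):
        now = self.clock()
        # Rule 1: drop entries whose lifetime has passed.
        expired = [k for k, (_, t) in self.entries.items()
                   if t is not None and t <= now]
        for k in expired:
            del self.entries[k]
        # Rule 2: if still over the cap, drop the oldest entries first
        # (dicts preserve insertion order).
        while len(self.entries) > self.max_entries:
            del self.entries[next(iter(self.entries))]

    def call(self, key):
        self.harvest()
        entry = self.entries.get(key)
        if entry is None:
            return "Unknown or expired link."
        return entry[0]()
```

Each callback is a closure, so whatever page state it captured when the link was generated is still there when the link is clicked, provided the entry has not been harvested in the meantime.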


There's always a database.


Racket (Arc's host language) keeps continuations on the filesystem, or you can write your own "stuffer" to do what you want with them (store them in a database or whatever). But you have to keep them somewhere, or else (assuming the server uses continuations) you can't keep track of the user's path through your code as they click through links and such.

Racket does have an option to serialize the continuations, gzip them, sign them with HMAC, and then send all of that to the client so the server doesn't have to keep track of anything, but HN doesn't use it.

See http://docs.racket-lang.org/continue/#(part._.Advanced_.Cont... for a quick introduction.
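
The serialize, gzip, sign and send-to-client idea can be sketched in Python roughly as follows. This is not Racket's stuffer API: it packs a plain JSON-serializable state dict rather than a real continuation, and `SECRET`, `pack` and `unpack` are hypothetical names.

```python
import base64
import gzip
import hashlib
import hmac
import json

SECRET = b"change-me"  # hypothetical server-side signing key

def pack(state):
    """Serialize, gzip and sign state into a URL-safe token for the client."""
    payload = gzip.compress(json.dumps(state, sort_keys=True).encode("utf-8"))
    tag = hmac.new(SECRET, payload, hashlib.sha256).digest()  # 32 bytes
    return base64.urlsafe_b64encode(tag + payload).decode("ascii")

def unpack(token):
    """Verify the signature, then decompress and deserialize the state."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    tag, payload = raw[:32], raw[32:]
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature")
    return json.loads(gzip.decompress(payload).decode("utf-8"))
```

With this shape the server stores nothing per link, so nothing can expire from memory; the price is longer URLs or form fields, and the signing key has to stay secret.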


HN doesn't use Racket. It's a custom Lisp based on Scheme.


Sure it does. HN is written in "Arc":

https://github.com/wting/hackernews

Arc runs on Racket:

http://en.wikipedia.org/wiki/Arc_(programming_language)

See the "OS" section where it says "runs on the Racket compiler"

See also the Arc source code, https://github.com/Pauan/ar/blob/arc/nu/arc Note the "#lang racket" at the top.


Then where is the data that is associated with "b7VO4wED8MRumCeiX5fCnF" stored? How is that data requested? There certainly is a database; it is just most likely not the traditional kind of database that most people think of.


> where is the data that is associated with "b7VO4wED8MRumCeiX5fCnF" stored

In the Racket process that's running the Arc code for news.yc


Oh, wow. I had assumed that people who visited around the same time got the same next page URL, maybe as part of a caching strategy or something.

This way seems impractical, TBH. Certainly for the user: the expiration is a bit of a nuisance, as I'll get it more often than not if I read a couple of stories and then click 'More'.


I believe it's a continuation-style server—hence fnid: function id—and the continuations are only kept around in memory for ~5 minutes.


I never understood why the continuation couldn't instead be addressed by a URL path. It could even get constructed from URL/query data if it moves out of memory, so keeping them in memory would only be a caching mechanism.


Every continuation framework I've looked at radiates an intense desire to treat the web as something other than what it actually is.


Yes, but the way HN works, the continuations map URIs to browser sessions so they can be expired on a more granular basis than all at once or LRU or what have you. I'm guessing here, though; I've not looked through the code.

[addendum] I reread your comment and realized that wasn't what you meant at all. What would be the benefit of path-based URIs over query string params in HN's case? I only see how they would be equal, not better.


I meant that the continuation could be identified by (session_id, URL data), and re-created based on this data if it goes missing.

URL data can come from path or query string, it doesn't matter.
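
A minimal Python sketch of that proposal, with memory acting only as a cache: the continuation is keyed by (session_id, URL data) and rebuilt from that key when it is missing. `build_continuation`, `lookup` and the (view, page) shape of the URL data are assumptions for illustration.

```python
cache = {}  # (session_id, url_data) -> callback; purely a cache

def build_continuation(session_id, url_data):
    """Recreate the page callback from request data alone."""
    view, page = url_data  # assumed shape of the URL data
    def render():
        return f"{view} page {page} for session {session_id}"
    return render

def lookup(session_id, url_data):
    """Return the cached callback, rebuilding it if it was evicted."""
    key = (session_id, url_data)
    if key not in cache:
        cache[key] = build_continuation(session_id, url_data)
    return cache[key]
```

Evicting an entry then costs only the rebuild, and the link itself never becomes invalid. It works only when everything the callback needs can be derived from the session and the URL.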



