

Ask HN: How would you implement a threaded forum? - mixmax

How would you go about making a threaded forum like HN's, assuming you want it to be efficient and to deal well with edge cases? Also assume that you want to load entries from a database rather than keeping the information in a flat file (as I believe HN does).

The obvious and easy way would be to write a small recursive function that loads the posts from the database whose parent is the current post. As I see it, this approach has two disadvantages:

1) It doesn't deal well with deleting posts that have children.

2) You would need a DB call every time the recursive function runs, so showing a thread takes a large number of DB calls.

The first problem could be solved by not actually deleting the DB entry when a post is deleted, but merely deleting its text. When you display the post, you can check whether the text field is empty and display [deleted] or something similar.

The second problem is a bit hairier, I think. You could load all the entries of a thread into an array with one DB call and have the recursive function look through the array every time it runs. The problem is that this would be a Shlemiel the painter's algorithm [1]: every call scans the whole array again, so the work grows quadratically with the number of posts. So maybe the solution is to delete entries from the array as the recursive function uses them.

How would you do it?

[1] http://en.wikipedia.org/wiki/Schlemiel_the_painter%27s_Algorithm and http://www.joelonsoftware.com/printerFriendly/articles/fog0000000319.html
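A minimal sketch of the approach described in the post, assuming the whole thread has already been loaded into a list with one query. `Post` and `render_naive` are hypothetical names. Each recursive call scans the full list, which is where the quadratic (Shlemiel) cost comes from; soft-deleted posts (empty text) still render as `[deleted]`, so their replies keep a visible parent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Post:
    id: int
    parent_id: Optional[int]  # None for top-level posts
    text: str                 # "" means the post was soft-deleted


def render_naive(posts: List[Post], parent_id: Optional[int] = None,
                 depth: int = 0, out: Optional[List[str]] = None) -> List[str]:
    """Recursive render that rescans the whole list on every call: O(n^2)."""
    if out is None:
        out = []
    for post in posts:  # full scan per call: the Shlemiel part
        if post.parent_id == parent_id:
            body = post.text if post.text else "[deleted]"
            out.append("  " * depth + body)
            render_naive(posts, post.id, depth + 1, out)
    return out
```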
======
lacker
You don't have to reflect the threaded nature of the discussion in your
database. Just give every comment a foreign key to the original post or forum
topic. Whenever you display anything about that original post, fetch all of
its comments from the database, and only after the DB part is done worry
about which comments you actually want to display. A single DB call that
fetches at most a few thousand comments should be faster than one DB call
per comment.
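A minimal sketch of this in Python with SQLite, assuming a hypothetical `comments` table whose `story_id` column is the foreign key to the original post and whose `parent_id` column records the reply structure. One query fetches the thread, one pass groups rows by parent in a dict, and the depth-first walk then touches each comment once, which avoids rescanning the array on every recursive call.

```python
import sqlite3
from collections import defaultdict


def fetch_thread(conn, story_id):
    """One query: every comment on the story, regardless of nesting."""
    return conn.execute(
        "SELECT id, parent_id, text FROM comments WHERE story_id = ? ORDER BY id",
        (story_id,),
    ).fetchall()


def group_by_parent(rows):
    """Single O(n) pass: parent_id -> list of child rows (None = top level)."""
    children = defaultdict(list)
    for row in rows:
        children[row[1]].append(row)
    return children


def render(children, parent_id=None, depth=0):
    """Depth-first walk; each comment is visited exactly once."""
    lines = []
    for comment_id, _, text in children.get(parent_id, []):
        lines.append("  " * depth + text)
        lines.extend(render(children, comment_id, depth + 1))
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, story_id INTEGER, "
             "parent_id INTEGER, text TEXT)")
conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?)", [
    (1, 7, None, "first"), (2, 7, 1, "reply"),
    (3, 7, None, "second"), (4, 8, None, "other story"),
])
print("\n".join(render(group_by_parent(fetch_thread(conn, 7)))))
```

The recursion is bounded by thread depth, not thread size; for extremely deep threads an explicit stack would avoid Python's recursion limit.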

------
tom_b
If you need to deal with hierarchies in SQL, look into nested sets. They let
you retrieve a full set of parent/child related items with a single db call.

It's basically an augmented data structure (a left/right value pair is added
to each row) that lets you query somewhat like ErrantX mentions (i.e., to find
the child tree of a thread with left=x and right=y, select all rows where left
> x and right < y). Re: deleted children, I'd just add a column to the table
to store a deleted flag. You can pretty easily move comments to new parents,
but you have to recompute the left/right value pairs. I've done this in SQL
before and it's no biggie, but I wouldn't want to be doing it in an app aiming
for high transaction rates (i.e., human edits to threads would probably be OK;
driving millions of bank transactions would not).

Joe Celko has a whole book out on hierarchies in SQL and I like it. You can
find decent examples out on the web as well.
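A minimal nested-set sketch in Python with SQLite, assuming a hypothetical `comments` table with `lft`/`rgt` columns and the deleted flag suggested above. `assign_nested_set` numbers the tree depth-first (left value on entry, right value on exit), and `subtree` then fetches all descendants of a comment in display order with one self-join query using the `left > x AND right < y` condition.

```python
import sqlite3


def assign_nested_set(children, node, counter=None, out=None):
    """Depth-first numbering: lft when a node is entered, rgt when it is left."""
    if counter is None:
        counter = [0]
    if out is None:
        out = {}
    counter[0] += 1
    lft = counter[0]
    for child in children.get(node, []):
        assign_nested_set(children, child, counter, out)
    counter[0] += 1
    out[node] = (lft, counter[0])
    return out


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE comments (
    id INTEGER PRIMARY KEY, lft INTEGER, rgt INTEGER,
    deleted INTEGER NOT NULL DEFAULT 0, text TEXT)""")

# Hypothetical thread: 1 has replies 2 and 4; 2 has reply 3.
children = {1: [2, 4], 2: [3]}
texts = {1: "root", 2: "a", 3: "a.1", 4: "b"}
for cid, (lft, rgt) in assign_nested_set(children, 1).items():
    conn.execute("INSERT INTO comments (id, lft, rgt, text) VALUES (?, ?, ?, ?)",
                 (cid, lft, rgt, texts[cid]))


def subtree(conn, comment_id):
    """All descendants of comment_id in display order, fetched with one query."""
    return conn.execute(
        """SELECT c.id, CASE WHEN c.deleted THEN '[deleted]' ELSE c.text END
           FROM comments AS c JOIN comments AS p
             ON c.lft > p.lft AND c.rgt < p.rgt
           WHERE p.id = ?
           ORDER BY c.lft""",
        (comment_id,),
    ).fetchall()


# Soft delete: the row stays, so its reply (3) keeps its place in the tree.
conn.execute("UPDATE comments SET deleted = 1 WHERE id = 2")
```

Moving a comment to a new parent would mean renumbering `lft`/`rgt` for a large part of the table, which is the write cost mentioned above.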

------
timmaah
<http://threebit.net/tutorials/nestedset/tutorial1.html>

Using this method on a table with 1.2 million rows, I can get a thread with a
single db call. It hasn't given me a single problem yet, though it is
obviously slower than a standard flat list.

------
ErrantX
Someone posted a thread the other day about good ways to store threaded
discussions.

Store it like:

1
1.1
1.1.1
1.2
2
2.1
2.2

etc.

And load it in one database query (id > x AND id < y, or whatever), then
traverse it. Easy.

EDIT: I would handle deleted posts the way you suggest.
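A minimal materialized-path sketch in Python with SQLite, assuming a hypothetical `comments(path, text)` table. One detail the scheme needs in practice: paths compare as strings, so "1.10" would sort before "1.2". Zero-padding each segment (the 4-digit `WIDTH` here is an arbitrary assumption) fixes that. The range query mirrors "id > x AND id < y": since '/' is the character right after '.', every descendant of `P` falls between `P.` and `P/`.

```python
import sqlite3

WIDTH = 4  # assumption: at most 9999 replies per parent


def make_path(*parts):
    """(1, 10) -> '0001.0010', so string order matches numeric order."""
    return ".".join(f"{n:0{WIDTH}d}" for n in parts)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (path TEXT PRIMARY KEY, text TEXT)")
for parts, text in [((1,), "first"), ((1, 1), "reply"), ((1, 1, 1), "nested"),
                    ((1, 2), "reply 2"), ((1, 10), "reply 10"), ((2,), "second")]:
    conn.execute("INSERT INTO comments VALUES (?, ?)", (make_path(*parts), text))


def subtree(conn, path):
    """Descendants of path in display order; depth is path.count('.')."""
    # '/' sorts right after '.', so this range covers exactly the descendants
    return conn.execute(
        "SELECT path, text FROM comments WHERE path > ? AND path < ? ORDER BY path",
        (path + ".", path + "/"),
    ).fetchall()
```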

~~~
mixmax
There are a few problems with this approach IMHO:

- If the discussion is sorted by something other than the counter, karma for
instance, it isn't quite as easy.

- When you move posts you need to look at all the children and change them
accordingly. This might not be a huge problem in itself, but there are
problematic edge cases. What if someone replies to a thread that has just
been moved, for instance? The reply would get filed in the wrong place.

- If you want to encapsulate a post and its children (for instance, to be able
to hide the children with JavaScript), you need to write out the HTML using a
recursive algorithm.

Besides, it doesn't seem like a very elegant solution ;-)
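On the moving problem, a sketch assuming the zero-padded dotted paths above stored in a hypothetical `comments(path, text)` table: a single `UPDATE` rewrites the prefix of the moved post and all its descendants, and running it inside a transaction means it is applied all at once. Whether this also closes the reply race depends on the app reading the parent's current path inside the same transaction that inserts the reply; that is an assumption about the app's structure, not something the schema guarantees. `move_subtree` and the choice of a free `new_path` slot are hypothetical.

```python
import sqlite3


def move_subtree(conn, old_path, new_path):
    """Re-root old_path and all of its descendants at new_path, atomically."""
    with conn:  # commits on success, rolls back if the UPDATE raises
        conn.execute(
            # substr is 1-indexed: keep everything after the old prefix
            "UPDATE comments SET path = ? || substr(path, ?) "
            "WHERE path = ? OR (path > ? AND path < ?)",
            (new_path, len(old_path) + 1, old_path, old_path + ".", old_path + "/"),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (path TEXT, text TEXT)")
conn.executemany("INSERT INTO comments VALUES (?, ?)", [
    ("0001", "first"), ("0001.0002", "moved"),
    ("0001.0002.0001", "its reply"), ("0002", "second"),
])
move_subtree(conn, "0001.0002", "0002.0001")
rows = conn.execute("SELECT path, text FROM comments ORDER BY path").fetchall()
```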

~~~
swolchok
"- If the discussion is sorted by something else than the counter, karma for
instance, it isn't quite as easy." ORDER BY karma?
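Taken literally, `ORDER BY karma` sorts all comments into one flat list; to keep replies under their parents the sort has to apply among siblings. One way to do that in the database, assuming SQLite and a hypothetical adjacency-list table `comments(id, parent_id, karma)`, is a recursive CTE whose `ORDER BY` turns its work queue into a priority queue: deepest rows first gives depth-first order, and karma breaks ties among siblings. This relies on SQLite returning CTE rows in the order they are produced, as SQLite's own documentation's depth-first example does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, parent_id INTEGER, karma INTEGER)")
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)",
                 [(1, None, 5), (2, None, 10), (3, 1, 1), (4, 1, 7), (5, 2, 3)])

THREAD_BY_KARMA = """
WITH RECURSIVE thread(id, depth, karma) AS (
    SELECT id, 0, karma FROM comments WHERE parent_id IS NULL
    UNION ALL
    SELECT c.id, t.depth + 1, c.karma
    FROM comments AS c JOIN thread AS t ON c.parent_id = t.id
    -- priority queue: deepest first (depth-first walk),
    -- then highest karma first among siblings, id as tie-breaker
    ORDER BY 2 DESC, 3 DESC, 1
)
SELECT id, depth FROM thread
"""
rows = conn.execute(THREAD_BY_KARMA).fetchall()
```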

------
ScottWhigham
I think the first problem could be better solved with a column that says
"Include this row when displaying". You could simply have a DisplayOnsite
column that's a bool. If true, display. In your view/proc/method/whatever (in
SQL), just eliminate any rows WHERE DisplayOnsite=0.
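A small sketch of the `DisplayOnsite` idea in SQLite, assuming a hypothetical `comments` table and a view named `visible_comments`. Putting the filter in a view keeps it in one place, but a hidden comment's replies still pass the filter while their parent does not. The renderer then has to decide whether to show those orphans under a placeholder (as in the original post) or to hide the whole branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE comments (
    id INTEGER PRIMARY KEY, parent_id INTEGER, text TEXT,
    DisplayOnsite INTEGER NOT NULL DEFAULT 1)""")
conn.executemany(
    "INSERT INTO comments (id, parent_id, text, DisplayOnsite) VALUES (?, ?, ?, ?)",
    [(1, None, "root", 1), (2, 1, "hidden", 0), (3, 2, "reply to hidden", 1)],
)
# The filter lives in one place; callers only ever query the view.
conn.execute("CREATE VIEW visible_comments AS "
             "SELECT id, parent_id, text FROM comments WHERE DisplayOnsite = 1")

visible = conn.execute("SELECT id, parent_id, text FROM visible_comments ORDER BY id").fetchall()
visible_ids = {row[0] for row in visible}
# Visible rows whose parent was filtered out: the renderer must handle these.
orphans = [row for row in visible if row[1] is not None and row[1] not in visible_ids]
```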

One thing comes to mind: do you ever want to change the order of the threaded
comments? For example, if you are running a Q&A forum, do you want people to
be able to "Mark as Correct Answer" and then sort the answer with the most
correct-answer votes first? That would affect the hierarchy.
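One way to keep such reordering from touching the stored hierarchy is to sort only among siblings at read time, so replies stay attached to their answer. A sketch assuming a hypothetical in-memory `Answer` tree (for example, one built from a single query as described earlier): the accepted answer comes first, then answers by votes, applied level by level.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Answer:
    id: int
    votes: int
    accepted: bool = False
    replies: List["Answer"] = field(default_factory=list)


def order_answers(answers: List[Answer]) -> List[Answer]:
    """Accepted first, then most votes; each reply list is sorted the same way."""
    ordered = sorted(answers, key=lambda a: (not a.accepted, -a.votes))
    for a in ordered:
        a.replies = order_answers(a.replies)  # reorder siblings, never re-parent
    return ordered
```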

