

Ask HN: What's a good database schema for "Top comments in last 24 hours" - jmtame

So my friends and I are hacking away at a project, and we're trying to think of a way to display a list of the top comments in the last 24 hours.  What we have now seems a bit too hackish, so any help would be great.<p>What we have so far is a MySQL table with 25 columns.  1 column is ID (arbitrary), and the next 24 are the 24 hours of a given day (12am, 1am, 2am ... 10pm, 11pm, 12pm).  Every comment has a comment_id and points.  When the comment gets a point, its comment_id is inserted into whatever the hour is.<p>So when comment_id 84 gets a point, and the time is 2:30pm, we stick 84 into a CSV list at the 2pm column.  If the comment racks up 3 points, it will end up looking like this:<p>2pm -&#62; 84, 84, 84, ... other comment_ids<p>When we want to display this list, we basically create a list (descending) looking for duplicate comment_ids.  Is there a better way to do this?  I feel like we're doing something wrong...
======
aristus
Um, very. 25 columns?

Just record each commentid and vote like so:

    
    
      create table vote_history (
        commentid  int,
        ts         int
        primary_key (commentid, ts)
      );
    

Where ts is a unix timestamp. Now you can get rankings for any time slice:

select commentid, count(commentid) as points from leaderboard where ts >
<TIMESTAMP> group by commentid order by points desc;

That gives you to-the-second resolution, which is probably overkill. So
instead you can try this:

    
    
       create table votes_per_hour (
        commentid  int,
        hourid     int,
        score      smallint,
        primary_key (commentid, hourid)
      );
    

Where the value of hourid is the unix timestamp divided by 3600.

~~~
jmtame
thanks =] sometimes my mind jumps to the worst solution first, not sure why.
after you look at something like that, it's sort of a "wow.. duh.." moment.

------
paulgb
SELECT * FROM comments WHERE posted_at > SUBDATE(NOW(), INTERVAL 1 DAY) ORDER
BY karma DESC LIMIT 50;

My SQL might be a bit rusty, but you get the gist of it. Why not do that?

Your question is a little confusing without knowing the schema you are using
already. Do you want the top comments of _each_ hour?

------
russell
I would add a time stamp for the last update and a total column. Update the
total and timestamp every time you update a bucket. Zero any buckets between
the saved timestamp and now. To get your list select comments updated in the
last 24 hours and ordered by total. You will have some rows with artificially
high counts, because of numbers older than 24 hours. Adjust the totals for the
row and keep reading until you have the number you need. If you cache the
list, you dont even need to do the select except on restart. Email me if you
want clarification.

------
bprater
Sometimes denormalizing a database is okay. But when you get into
programmatically mucking with CSV data in a field, you probably have pushed
things a bit too far.

