Having written forum stuff a few times over, I’m not sure you realize what you s...

mod · on Sept 2, 2022

> Be sure the dev separates out the messages from ordering.

Can you explain? I'm not sure what you mean by this.

sroussey · on Sept 2, 2022

The temptation is to have the message date and/or ranking number in the message table. This is fine and good.

The next temptation is to have the index on that date or (date, rank) and use that for displaying the list of messages. This is bad.

For the list of threads (messages without a parent) you want to know which to show… but answering that takes a lot of per-row checks, which makes it inefficient. Is it a root message? Is it deleted? Has it been flagged by spam or other content filters? Is it approved?

You can start to see how this gets messy. How do you show messages that are approved for one group of readers and not for others? And what is their order? So separate the concerns… in the db. SQL is your friend if the db has the right data packaged and indexed the right way.
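A rough sketch of what "separate the concerns in the db" could look like, using Python's built-in `sqlite3` so it runs anywhere. The table and column names (`message`, `thread_listing`, `audience`) and the helper functions are made up for illustration, not what we actually ran. The idea is that the listing table only ever holds rows that should be shown to a given audience, so the list query never has to re-check root/deleted/spam/approved.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE message (
            id        INTEGER PRIMARY KEY,
            parent_id INTEGER,            -- NULL for a root message (thread)
            body      TEXT NOT NULL,
            created   INTEGER NOT NULL    -- Unix seconds
        );
        -- Hypothetical ordering table: one row per (audience, visible thread).
        -- WITHOUT ROWID stores the rows in primary key order.
        CREATE TABLE thread_listing (
            audience   TEXT    NOT NULL,
            created    INTEGER NOT NULL,
            message_id INTEGER NOT NULL,
            PRIMARY KEY (audience, created, message_id)
        ) WITHOUT ROWID;
    """)
    return conn

def post(conn, body, created, parent_id=None, audiences=()):
    """Store a message; list it only if it is a root message."""
    cur = conn.execute(
        "INSERT INTO message (parent_id, body, created) VALUES (?, ?, ?)",
        (parent_id, body, created),
    )
    msg_id = cur.lastrowid
    if parent_id is None:
        # Visibility is decided once, at write time, per audience.
        conn.executemany(
            "INSERT INTO thread_listing VALUES (?, ?, ?)",
            [(a, created, msg_id) for a in audiences],
        )
    return msg_id

def list_threads(conn, audience, limit=20):
    """Newest-first thread ids for one audience, read from the listing table only."""
    rows = conn.execute(
        "SELECT message_id FROM thread_listing WHERE audience = ? "
        "ORDER BY created DESC, message_id DESC LIMIT ?",
        (audience, limit),
    )
    return [r[0] for r in rows]
```

Deleting, un-approving, or spam-flagging a thread then becomes a delete from `thread_listing` for the affected audiences, while the message row itself stays untouched.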

It’s a different problem and has a different solution. Now mind you, I cut my teeth on this in the late 1990s and early 2000s when we grew quickly into the Alexa 1000 and farther up. China blocked us as one of the original dozen sites on the Great Firewall (via DNS-resolved IPs, not the domain itself… hahaha, oh the chaos they empowered!). Databases were more expensive than engineering time back then, so efficiency mattered. People do dumb things now and use giant clusters when a single machine would do, but I digress…

evandwight · on Sept 3, 2022

Why not an index on created for speeding up the "new" sort?

What indexes should you create? One for each sort you have?

(I'm just playing around cloning Reddit and running into all sorts of performance problems. Mostly due to celery taking all my memory)

sroussey · on Sept 3, 2022

Yeah, these are the solutions to those performance issues. Based on the way open source databases work (clustered indexes), move your indexes to a table whose sole purpose is indexing, and index everything you sort by, with the columns in this order: search constants, then sorting columns, then id. Test with your own workload, but I found that fully covering indexes sat in RAM better (so I included the id even though it wasn't strictly needed).
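To make the column order concrete, here is a small sketch for the "new" sort, again with Python's `sqlite3`. The names `listing_new`, `board_id`, and `idx_listing_new` are assumptions for the example, and SQLite's planner is only a stand-in for whatever database you actually use. The index starts with the equality filter, then the sort column, then the id, so the query can be answered from the index alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical index-only table: it holds nothing but what the list needs.
    CREATE TABLE listing_new (
        board_id   INTEGER NOT NULL,  -- search constant (equality filter)
        created    INTEGER NOT NULL,  -- sorting column, Unix seconds
        message_id INTEGER NOT NULL   -- id, included so the index covers the query
    );
    CREATE INDEX idx_listing_new
        ON listing_new (board_id, created, message_id);
""")

QUERY = """
    SELECT message_id FROM listing_new
    WHERE board_id = ?
    ORDER BY created DESC, message_id DESC
    LIMIT ?
"""

def newest(board_id, limit=10):
    """Newest-first message ids for one board."""
    return [row[0] for row in conn.execute(QUERY, (board_id, limit))]

def query_plan(board_id, limit=10):
    """Planner's description of the query, to check which index it uses."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (board_id, limit))
    return " | ".join(row[-1] for row in rows)
```

Printing `query_plan(1)` should mention a covering index on `idx_listing_new` and no separate sort step. If you add another sort (say, by rank), the same pattern would suggest another index with the same leading search constants.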

20 years ago MySQL didn’t have reverse (descending) index columns, so we had to make (1-timestamp) columns and index those! We also used an int for time, as it was easier to index back then and you don’t need sub-second precision or anything. Smaller indexes are better.
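For anyone curious what that workaround looks like, here is a minimal sketch. It stores a negated integer timestamp (`-created`) as a stand-in for the reversed column described above, so a plain ascending index already yields newest-first order. The names `listing_new`, `neg_created`, and `idx_new_rev` are invented for the example, and the exact expression we used back then may have differed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE listing_new (
        board_id    INTEGER NOT NULL,
        neg_created INTEGER NOT NULL,  -- stores -created (Unix seconds, whole ints)
        message_id  INTEGER NOT NULL
    );
    -- Ordinary ascending index; the negation does the reversing.
    CREATE INDEX idx_new_rev
        ON listing_new (board_id, neg_created, message_id);
""")

def add(board_id, created, message_id):
    """Insert a listing row, negating the timestamp on the way in."""
    conn.execute(
        "INSERT INTO listing_new VALUES (?, ?, ?)",
        (board_id, -created, message_id),
    )

def newest(board_id, limit=10):
    """Newest-first (message_id, created) pairs using an ascending scan."""
    rows = conn.execute(
        "SELECT message_id, -neg_created FROM listing_new "
        "WHERE board_id = ? ORDER BY neg_created ASC LIMIT ?",
        (board_id, limit),
    )
    return rows.fetchall()
```

On a database with descending index support you would skip the extra column and declare the sort column `DESC` in the index instead.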