I can pretty much guarantee there are elements of this you're not considering which are addressed there (though there are also elements which Farmer and Glass don't hit either). But it's an excellent foundation.
Second: If you're going to have a quality classification system, you need to determine what you are ranking for. As the Cheshire Cat said, if you don't know where you're going, it doesn't much matter how you get there. Rating for popularity, sales revenue maximization, quality or truth, optimal experience, ideological purity, etc., are all different.
Beyond that I've compiled some thoughts of my own from 20+ years of using (and occasionally building) reputation systems myself:
"Content rating, moderation, and ranking systems: some non-brief thoughts"
⚫ Long version: Moderation, Quality Assessment, & Reporting are Hard
⚫ Simple vote counts or sums are largely meaningless.
⚫ Indicating levels of agreement / disagreement can be useful.
⚫ Likert scale moderation can be useful.
⚫ There's a single-metric rating that combines many of these fairly well -- Evan Miller's lower-bound Wilson score.
⚫ Rating for "popularity" vs. "truth" is very, very different.
⚫ Reporting independent statistics for popularity (n), rating (mean), and variance or controversiality (standard deviation) is more informative than a single statistic.
⚫ Indirect quality measures also matter. I should add: a LOT.
⚫ There almost certainly isn't a single "best" ranking. Fuzzing scores with randomness can help.
⚫ Not all rating actions are equally valuable. Not everyone's ratings carry the same weight.
⚫ There are things which don't work well.
⚫ Showing scores and score components can be counterproductive and leads to various perverse incentives.
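Two of the bullets above (the lower-bound Wilson score, and fuzzing scores with randomness) can be sketched in a few lines. This is a minimal illustration, not any site's actual ranking code; the function names and the noise parameter `sigma` are my own choices:

```python
import math
import random

def wilson_lower_bound(pos, n, z=1.96):
    """Lower bound of the Wilson score confidence interval for a
    Bernoulli proportion, given `pos` positive ratings out of `n` total.
    z=1.96 corresponds to ~95% confidence. Favors well-sampled items
    over thinly rated ones with the same raw ratio."""
    if n == 0:
        return 0.0
    phat = pos / n
    z2 = z * z
    return (phat + z2 / (2 * n)
            - z * math.sqrt((phat * (1 - phat) + z2 / (4 * n)) / n)) \
           / (1 + z2 / n)

def fuzzed_score(pos, n, sigma=0.02, rng=random):
    """Wilson lower bound plus small Gaussian noise, so that near-tied
    items trade places across page loads instead of freezing into a
    single 'best' ordering."""
    return wilson_lower_bound(pos, n) + rng.gauss(0, sigma)
```

Note how 100 positives out of 110 outranks a perfect 3-for-3: the interval around the small sample is too wide to trust.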
I'm also increasingly leaning toward a multi-part system, one which rates:
1. Overall favorability.
2. Any flaggable aspects. Ultimately, "ToS" is probably the best bucket, comprising spam, harassment, illegal activity, NSFW/NSFL content (or improperly labeled same), etc.
3. A truth or validity rating. Likely rolled up in #2, but worth mentioning separately.
4. Long-term author reputation.
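As a data-structure sketch, the four parts might live side by side on each item rather than being collapsed into one number. The field names here are hypothetical, just to make the separation concrete:

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class ContentRating:
    # 1. Overall favorability, e.g. a Wilson lower bound in [0, 1].
    favorability: float = 0.0
    # 2. Flaggable / ToS aspects: "spam", "harassment", "illegal",
    #    "nsfw", "nsfl", "mislabeled", etc.
    tos_flags: Set[str] = field(default_factory=set)
    # 3. Truth / validity. Often folded into ToS handling, but kept
    #    distinct here so it can be reported on its own.
    validity: Optional[float] = None
    # 4. Long-term author reputation, carried across items.
    author_reputation: float = 0.0

    def is_flagged(self) -> bool:
        """Any ToS concern suppresses the item regardless of favorability."""
        return bool(self.tos_flags)
```

Keeping the axes separate means moderation decisions (flags) never get averaged away by popularity, and vice versa.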
There's also the general problem associated with Gresham's Law, which I'm increasingly convinced is a general and quite serious challenge to market-based and popularity-based systems. Assessment of complex products, especially information products, is difficult, which is to say, expensive.
I'm increasingly in favour of presenting newer / unrated content to subsets of the total audience, and increasing its reach as positive approval rolls in. This seems like a behavior HN's "New" page could benefit from. Decrease the exposure for any one rater, but spread ratings over more submissions, for longer.
And there are other problems. Limiting individuals to a single vote (or negating the negative effects of vote gaming) is key. Watching the watchmen. Regression toward mean intelligence / content. The "evaporative cooling" effect (http://blog.bumblebeelabs.com/social-software-sundays-2-the-...).