At first glance it feels like the most effective way to game this system is to grind user credit through aggregate low-polarization support on fairly neutral, low-impact posts, then strategically 'spend' it on higher-profile polarizing posts. Is that a fair 'red teaming' observation?
Yes, I think this actually could work. Community Notes has a basic reputation system: users need to "Earn In" by rating notes as "Helpful" that are ultimately classified by the algorithm as helpful. Once enough attackers earn in, they can totally break the algorithm.
Breaking it is not as simple as upvoting a lot of, say, right-wing or left-wing posts, though. The algorithm will simply classify all the attackers as having a very positive or negative polarization factor, and decide that their votes can be explained by this factor.
What would work is upvoting *unhelpful* posts. I have actually simulated this attack using synthetic data and, sure enough, it totally breaks the algorithm. I write about it in this article: https://jonathanwarden.com/improving-bridge-based-ranking/
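To make that concrete, here is a minimal toy sketch of the kind of matrix factorization behind bridge-based ranking (my own illustration, not the actual Community Notes code; the population sizes, rating patterns, and learning parameters are all made up). Each rating is modeled as mu + b_u + b_n + f_u·f_n, and a note counts as helpful when its intercept b_n stays high after fitting, i.e. when agreement with it can't be explained by the polarity factor. Attackers who rate a genuinely unhelpful, non-political note as helpful can't be explained away by polarity, so they drag its intercept upward:

    import numpy as np

    rng = np.random.default_rng(0)

    def fit(ratings, n_users, n_notes, dim=1, lam=0.05, lr=0.05, epochs=400):
        """Fit global mean, user/note intercepts, and latent factors by SGD."""
        mu, b_u, b_n = 0.0, np.zeros(n_users), np.zeros(n_notes)
        f_u = rng.normal(0, 0.1, (n_users, dim))
        f_n = rng.normal(0, 0.1, (n_notes, dim))
        for _ in range(epochs):
            for u, n, r in ratings:
                e = r - (mu + b_u[u] + b_n[n] + f_u[u] @ f_n[n])
                mu     += lr * e
                b_u[u] += lr * (e - lam * b_u[u])
                b_n[n] += lr * (e - lam * b_n[n])
                # update both factor vectors from their pre-update values
                f_u[u], f_n[n] = (f_u[u] + lr * (e * f_n[n] - lam * f_u[u]),
                                  f_n[n] + lr * (e * f_u[u] - lam * f_n[n]))
        return b_n

    def simulate(n_attackers, n_honest=60, n_polarized=10):
        target = n_polarized  # index of the genuinely unhelpful, non-political note
        n_users = n_honest + n_attackers
        ratings = []
        # Honest users: rate polarized notes along party lines, the target note as 0.
        for u in range(n_honest):
            side = 1 if u % 2 == 0 else -1
            for n in range(n_polarized):
                note_side = 1 if n % 2 == 0 else -1
                ratings.append((u, n, 1.0 if side == note_side else 0.0))
            ratings.append((u, target, 0.0))
        # Attackers: look neutral on polarized notes, rate the target note as helpful.
        for u in range(n_honest, n_users):
            for n in range(n_polarized):
                ratings.append((u, n, float(rng.integers(0, 2))))
            ratings.append((u, target, 1.0))
        return fit(ratings, n_users, n_polarized + 1)[target]

    # The unhelpful note's intercept climbs as attackers are added.
    for k in (0, 30, 60):
        print(f"attackers={k:2d}  unhelpful-note intercept: {simulate(k):+.2f}")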
Oh hey, I came across your Social Protocols groups while doing my regular rounds for Polis-related projects a few months ago, when I found Propolis! Was trying to figure out why your name was familiar-ish :)
There's also a Polis User Group discord: https://link.g0v.network/pug-discord It's pretty low-key lately, but high density of potentially-aligned ppl. I am hoping to restart the weekly open calls for prospective Polis facilitators and self-hosters, in case you're interested to log in.
Thanks for your posts by the way! I am jealous of your output -- I tend to have a few calls/meetings about Polis per week, but am not so great at producing clean artifacts like this :)
The reasoning was: coming up with (and answering) yes-no questions is more effort and a higher entry barrier to participation than just posting anything and getting up/downvotes, like on a social network. Requiring this formalization of all content on a platform raises the barrier to entry, e.g. people need to formulate what they want to post as a yes-no question. At the same time, it disallows content that does not fit the yes-no question model.
Our big insight was: we can drastically simplify the user interaction and allow arbitrary content, but keep the collective intelligence aspect. That's achieved by introducing a concept similar to community notes, but in a recursive way: every reply to a post can become a note, and replies can have more replies, which in turn can act as notes for the reply. Each reply is A/B-tested to see whether, when shown below a post, it changes the voting behavior on the post. If a reply changes the voting behavior, it must have added some information that voters were not aware of before, like a good argument.
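One straightforward way such an A/B test could be scored (just a sketch of the idea, not necessarily how we will implement it, and the vote counts below are made up) is a two-proportion z-test: randomly show or hide the reply under the post, record the votes cast on the post in each arm, and check whether the upvote probability differs:

    from math import sqrt, erf

    def vote_shift(up_a, n_a, up_b, n_b):
        """Two-sided z-test: does showing the reply change P(upvote on the post)?"""
        p_a, p_b = up_a / n_a, up_b / n_b
        p = (up_a + up_b) / (n_a + n_b)                # pooled upvote rate
        se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))   # pooled standard error
        z = (p_b - p_a) / se
        p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal approximation
        return p_b - p_a, p_value

    # Arm A: post shown without the reply; arm B: post shown with the reply below it.
    shift, p = vote_shift(up_a=120, n_a=200, up_b=80, n_b=200)
    print(f"change in upvote rate when the reply is shown: {shift:+.2f} (p={p:.4f})")

A significant shift in either direction suggests the reply carried information the voters didn't have before.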
3. Love the idea of using algorithms for moderation/mediation. It seems to be wildly successful on reddit at its aim (deprioritizing controversial content), and very successful on HN.
As an engineer at heart, one thing that would help me get more enthusiastic about this is showing 30 or so example controversial or disputed claims (e.g. who won the election, whatever) and how this algorithm would score various "provided contexts" about them. Of course as a biased individual I can't expect 100% agreement, but it would be nice to see if any wild or surprising results come out of this technique.
Author here. Thank you. And good idea, it would definitely improve the article to pull out actual examples. I could show examples of right-wing+helpful, left-wing+helpful, right-wing+not-helpful, neutral+helpful, etc. I have looked at many examples myself and they are really quite interesting and not too surprising.
It’s an interesting idea, but I worry about the determination of axes. What if one of the axes is science versus time cube? Feeding versus murdering children? Are these things we want to factor out?
Yes, this is really an important point. In some online forums, the factor that explains variation among users' votes may be precisely the thing we don't want to factor out. It may, for example, be expertise!
Yes just one. Adding more latent factors seems to make little difference. A single factor seems to explain most of the polarization among users.
I have run it with multiple dimensions and plotted the results. They are fascinating, even if they don't necessarily improve the algorithm. See the 3D plot in my article on Improving Bridge-Based Ranking, in the section "3D And Higher-Dimensional Matrix Factorization":
https://jonathanwarden.com/improving-bridge-based-ranking/#3...
That is really interesting! I would have guessed that multiple dimensions would be required, e.g. religious/atheist, conservative/liberal, parent/no-kids, who knows... after all there are a variety of community notes topics.
One possible explanation for why these dimensions don't improve the algorithm a lot is that differences in these dimensions don't cause differences in whether users rate a note as helpful or unhelpful -- at least not beyond what can be explained by the primary latent factor. These other dimensions may contribute to other aspects of user behavior -- such as which tweets they like and which users they follow. But once we know a user's left-right polarity factor, these other factors don't make a huge difference in whether or not a user rates a note as helpful (given they rated the note at all). They do make a difference, but since most of the difference is already explained by the polarity factor, they don't add much to the algorithm.
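Not something from the article, just a hypothetical sanity check of that intuition: on a dense toy user-by-note rating matrix you can center the ratings and look at the singular value spectrum. A dominant first singular value is what "a single factor explains most of the disagreement" looks like numerically (the real ratings matrix is sparse, which is why the algorithm uses a regularized factorization rather than a plain SVD):

    import numpy as np

    def variance_explained(R):
        """Fraction of (mean-centered) rating variance captured by each latent factor."""
        centered = R - R.mean()                 # remove the global mean
        s = np.linalg.svd(centered, compute_uv=False)
        return s**2 / np.sum(s**2)

    # Toy matrix: 6 users x 4 notes, ratings split mostly along one axis.
    R = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 1, 1, 1]], dtype=float)
    print(variance_explained(R))   # the first component dominates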
It's great to see this exploration. Those interested might also want to check out https://vitalik.eth.limo/general/2023/08/16/communitynotes.h... and https://knightcolumbia.org/content/the-algorithmic-managemen... (and how aspects of this might be applied to AI governance https://reimagine.aviv.me/p/governance-of-ai-with-ai-through ).