A distributed spam attack across the public Matrix network (matrix.org)
131 points by Sami_Lehtinen on July 1, 2021 | 77 comments



Yeah, that was always going to happen, of course. When you make an open network, it becomes a lot harder to stop this kind of thing from happening; it's inherent in its very nature. When you're more open, you're also more open to abuse. I'm surprised it took this long (which also says a lot of good things about the maturity of the users!)

I hope they don't have to tighten things down to the point of interfering with functionality because Matrix is excellent as it is.

On the other hand, on IRC, which I also use a lot, it's just a fact of life. We just apply some minor workarounds such as +r channels, roll our eyes, and move on with life. I'd much rather have a free open network with some spam than a curated one by big tech that's free of spam.

PS: Another thing I do foresee happening if Matrix really takes off is big tech datamining it. It's very hard to make a network open and yet hide as much metadata as you can, e.g. how many users are on which servers, who they communicate with across the network, etc. If some big tech entities control some of the servers, it becomes really hard to keep that locked away. Of course Matrix's encryption will help, but not for all metadata. Again, I think it's something we'll have to mitigate rather than prevent outright.


This is what Cwtch is after, right? Doing away with metadata too. They posted here the other day.



Oh I missed that post, I will look for it, thanks! Good to know it's on the radar!



Oh, I thought Cwtch was an extension to the Matrix protocol to build barriers to metadata mining. One of the things I like about Matrix is its open nature and the many bridges. I would be less interested in yet another separate system.

Especially because Matrix is pretty good at what it aims to do already.


> I'd much rather have a free open network with some spam than a curated one by big tech that's free of spam.

I don't think anyone can actually keep their system abuse-free. Things like Messenger certainly have a lot of spam.


It didn't take this long; I've abused open registration before.


Honestly, if you're going to clean-slate design an open-membership p2p network, the very first problem you should be tackling is what you're going to do to deal with asshats. Open-membership p2p systems sound nice in theory -- and even tend to work in small-scale settings! -- but once people start to actually use them, the asshats will show up and trash your system for kicks. Plan accordingly.


How to deal with asshats is basically the second half of https://matrix.org/blog/2020/10/19/combating-abuse-in-matrix.... We were actually due to deploy the first cut this week, but ironically the spam stuff got in the way.


Glad to see this! Best of luck with dealing with this problem -- it's not an easy one, even when the system is centralized.


The author of Construct tried to warn you about this years ago and you ignored him.


I told you so's are rarely helpful.


> the very first problem you should be tackling is what you're going to do to deal with asshats

I agree. One solution is "have a real human identity you can't delegate with real stakes if you're banned." What we need is a blue checkmark for everyday people.


This doesn't work. It's what Facebook, Google, and others have been doing for almost a decade now. Has it stopped spam or harassment? No.

Most of the time, harassers are protected by their legal system, not the victims. But if someone dares insult the police, then they'll find you in a second and put you on trial.

We need better tools to manage trust and consent in decentralized settings without revealing the social graph. I've heard the expression "Fog of Trust" a few times to refer to that.


FB, Google, and the like don't really do it hard enough, though. Hacker News types were very against it, but imagine the US issued digital social security cards that could be used to actually verify a person was who they said they were. It would definitely at least cut back on spam.


That is definitely not true. As a privacy-conscious Internet user who hangs out in privacy-conscious communities, I can assure you that having a fully pseudonymous identity on the Internet is harder now than it was 10 years ago because of such policies, and it's definitely IMPOSSIBLE to open a Google/Facebook/Twitter account without at least a phone number.

Such policies have definitely not cut back on spam (at least not according to my definition of spam), but they have prevented countless legit users like me from taking part in such networks. In retrospect, it's a good thing that these networks don't want people like me; we're doing much better elsewhere.

I propose a challenge: if you can view age-restricted videos on YouTube and create a Facebook group without giving your ID, credit card, address, phone number, or IP address... I'll send you a cupcake in the mail? :P


Kinda surprised no one's used a blockchain for this. The registration flow could be as follows:

1. User registers a non-transferable username (maybe an NFT?)

2. User signs into the client with the username NFT

3. The server admin maintains a list of shadowbanned username NFTs -- if yours gets added, then the server won't relay your messages.

Non-transferability means no one will squat usernames (since there's no money to be made by reselling them), and since it costs money to register but costs nothing to shadowban, the asshats only lose money by being asshats (and their spam doesn't even get read).
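
To make the proposal concrete, here's a minimal Python sketch of that flow, assuming a toy key-value registry standing in for the chain; all of the class and method names are invented for illustration, and real chain/signature mechanics are stubbed out:

```python
# Toy sketch of the proposed flow: usernames bind permanently to one key
# (no transfer or rotation path), registration burns a fee, and the server
# consults a cheap shadowban set before relaying anything.

class UsernameRegistry:
    """Stands in for the on-chain registry."""

    def __init__(self, burn_fee: int):
        self.burn_fee = burn_fee
        self.owners: dict[str, str] = {}  # username -> public key, never rotated

    def register(self, username: str, pubkey: str, tokens_burned: int) -> None:
        if username in self.owners:
            raise ValueError("username taken")
        if tokens_burned < self.burn_fee:
            raise ValueError("insufficient burn")
        self.owners[username] = pubkey  # deliberately no way to change this later


class RelayServer:
    """Server side: a shadowban costs one set-insert, nothing more."""

    def __init__(self, registry: UsernameRegistry):
        self.registry = registry
        self.shadowbanned: set[str] = set()

    def relay(self, username: str, message: str, signature_ok: bool) -> bool:
        if not signature_ok or username not in self.registry.owners:
            return False
        if username in self.shadowbanned:
            return False  # silently dropped; the sender can't tell
        print(f"<{username}> {message}")
        return True
```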


How do you make a username non-transferable?

It sounds like all you've actually invented is "identifying people by their bank accounts", only with more steps.

I suppose if the NFT could be registered using an anonymous cryptocurrency, it might end up being a more privacy-preserving system than getting people to pay for an account using traditional methods.

It also might be cheaper than paying to join multiple services, if you only have to pay once and can use your NFT username across them all. On the other hand, a single malicious admin could try to extort you by threatening to ban you from all those other services.

A better approach would be a blockchain-based anonymous identity system, which is apparently what BrightID is:

https://www.brightid.org/


> How do you make a username non-transferable?

Trivially. You make it so that there's no recognized way to change the private key for a username.

> It sounds like all you've actually invented is "identifying people by their bank accounts", only with more steps.

I've created a layer of indirection between usernames and bank accounts. The system doesn't need to know or care about how you managed to burn tokens for the username.

> A better approach would be a blockchain-based anonymous identity system, which is apparently what BrightID is:

Does BrightID guarantee that asshats have an asymptotically worse time registering usernames than admins have shadowbanning them? If usernames are easy to come by, then so are sockpuppets and one-off spam and troll accounts.


> You make it so that there's no recognized way to change the private key for a username.

Making it impossible to rotate keys doesn't sound like it follows cryptographic best practices, but in any case, there's nothing stopping someone from selling their private key to someone else.

If you just want to avoid squatting/speculating, you could make the user IDs be random unique values but associate them with a non-unique human-readable name.

> If usernames are easy to come by, then so are sockpuppets and one-off spam and troll accounts.

I haven't used BrightID, but I believe it works by having users meet in person and mutually verify each other as being unique humans. It should be impossible for someone to pretend to be two people in the same place at the same time, so that does seem viable.


> nothing stopping someone from selling their private key to someone else.

The fact that the original owner(s) can still use the name would prevent resale. For example, the admins could simply shadowban a username if they verify that multiple users have the same key (e.g. if my private key was stolen, I'd report it to the admin).

> If you just want to avoid squatting/speculating, you could make the user IDs be random unique values but associate them with a non-unique human-readable name.

The literal identifier isn't important to account resale value. User accounts include all of the state as well as the literal identifier, including reputation, longevity, and associated app content. This is all valuable to asshats -- they want high-reputation accounts to broaden their spam audience. But in order to make it costly for asshats to gain high-reputation accounts (more costly for them than for admins to shadowban them), we can't give them any shortcuts -- the system should compel them to spend time and energy to earn their reputation like everyone else. So, account resale shouldn't be supported by the system.

> I believe it works by having users meet in person and mutually verify each other as being unique humans. It should be impossible for someone to pretend to be two people in the same place at the same time, so that does seem viable.

This does not sound like it prevents a small number of asshats from just creating a bunch of fake sockpuppet accounts. If creating accounts is as cheap as (or cheaper than) shadowbanning them, then the asshats will eventually overwhelm the admins.


> nothing stopping someone from selling their private key to someone else.

The buyer would know that the seller might still have the key, since it cannot be rotated.

> user IDs be random unique values but associate them with a non-unique human-readable name.

I think Scuttlebutt does something like that.

> It should be impossible for someone to pretend to be two people in the same place at the same time

A group of people cooperating can pretend to be 99999 people?


> The buyer would know that the seller might still have the key, since it cannot be rotated

If the buyer is a spammer, they won't care that the seller can still send non-spammy messages with the account. If the seller is a squatter/speculator, they have nothing to gain from interfering with their customer's account.

> A group of people cooperating can pretend to be 99999 people?

It would be easy to determine from the (anonymous) social graph that those 99999 people are only connected to each other and a small group of other (real) people. An algorithm looking at this graph could then select 100 people out of the 99999 group and require them to meet with 2 other distantly-connected people at a specified public place. If fewer than 102 people show up, then those 100 lose trust points. That's how I guess it would work, anyway.
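
For what it's worth, the first step of that guess could be sketched in a few lines; the adjacency format and the 1% threshold below are invented assumptions, not parameters from any real system:

```python
# Hypothetical check: a large cluster of accounts whose edges almost never
# leave the cluster is a sockpuppet-farm candidate for an in-person challenge.

def external_edge_ratio(adjacency: dict[str, set[str]], cluster: set[str]) -> float:
    """Fraction of the cluster's edge endpoints that leave the cluster."""
    internal = external = 0
    for node in cluster:
        for neighbor in adjacency.get(node, set()):
            if neighbor in cluster:
                internal += 1  # counted once per endpoint; fine for a ratio
            else:
                external += 1
    total = internal + external
    return external / total if total else 0.0

def looks_like_sockpuppet_farm(adjacency: dict[str, set[str]],
                               cluster: set[str],
                               threshold: float = 0.01) -> bool:
    # Big cluster with under 1% external connections: flag for verification.
    return len(cluster) > 1000 and external_edge_ratio(adjacency, cluster) < threshold
```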


> If the buyer is a spammer

What if the seller is a spammer or scammer, and first sells the account, remembers the private key, and a bit later starts spamming or scamming?

> It would be easy to determine from the (anonymous) social graph that those 99999 people are only connected to each other and a small group of other (real) people

If the "group of people" is small, yes. I didn't say that the group was small though.

If it is larger, and they arrange the connections in realistic-looking ways (for detection algorithms), then they can get away with it. Think of an island where most people are connected only with others on the island -- and maybe 10% of them connected to people on the mainland. Something like that can happen in real life, I suppose, and the "group of people" (possibly many, paid by a company or a state) could construct such graphs and pretend to be more than they are.

> An algorithm looking at this graph could then select 100 people out of the 99999 group and require them to meet with 2 other distantly-connected people at a specified public place

That's an interesting way to try to handle that. However, first the algorithm would need to realize that a part of the graph is suspicious. (And people would need to be really motivated to actually go somewhere in real life :-) -- what if they're busy with friends and family?)


No need for a blockchain at all; it's just a paid email address, in a nutshell.


Hard to make that work in an open-membership p2p setting.


Ah yeah, makes sense.


Most dweb startups you hear about are marketing their "solutions" as precisely that. You hear stuff like "but if there's a financial participation for registration you can't have spam so that's why you gotta use our tokens", which is really ridiculous considering that the most nefarious actors usually have a lot of resources at hand.


Fundamentally, a struggle between asshats and honest users/admins is a struggle of attrition. The honest participants only win the struggle if they can make it too costly for asshats to continue participating.

To achieve this, it would be sufficient to make the cost of registering a user account asymptotically more expensive than the cost of shadowbanning an account. If the software required that each additional account a user registers cost more in USD terms than the previous account, while keeping the cost of shadowbanning a username constant, then the honest participants would win in a struggle against asshats who must keep registering accounts to circumvent the shadowban.

A strawman approach would be to require all user accounts to be tied to real-world identities. Then, the software can track how many accounts a real-world person has registered, and bill them accordingly. This obviously has other unrelated problems, but it does make it possible to penalize asshats more and more harshly for each infraction.

I was suggesting that a blockchain could be a useful building block here, since it already exists, is widely deployed, and costs money to write state. A more practical approach than the strawman could be to implement a username registration system on a blockchain, such that each registrant must burn some of the blockchain's tokens to register a username (thereby imposing a cost to doing so). Crucially, the number of tokens burnt per name would increase as more and more usernames get registered (or as more and more time passes), and in doing so make it more and more costly for asshats to circumvent a shadowban. This would also make it so asshats would lose a war of attrition against admins, and could be implemented today.
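
As a sketch of what such a burn schedule could look like (the base fee and doubling interval are arbitrary illustrations, not tuned parameters):

```python
# Escalating registration cost: each block of registrations doubles the
# burn price, while a shadowban stays a constant-cost operation for admins.

def registration_burn(names_registered_so_far: int, base_fee: int = 10) -> int:
    """Tokens to burn for the next username; doubles every 10,000 names."""
    return base_fee * 2 ** (names_registered_so_far // 10_000)

print(registration_burn(0))       # 10 tokens for an early name
print(registration_burn(50_000))  # 320 tokens once 50k names are taken
```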


> Crucially, the number of tokens burnt per name would increase as more and more usernames get registered (or as more and more time passes)

This is just a tax on the young.


More like a tax on the latecomers.


No one has used a blockchain for this because:

1. it's slow and requires an insane amount of storage, power, and traffic

2. nobody actually wants a non-transferable username


> 1. it's slow and requires an insane amount of storage, power, and traffic

Compared to the infrastructure required to uniquely identify every human in order to stop them from registering lots of troll accounts, and implementing a fair and accountable law enforcement apparatus to make sure the rules are followed? The blockchain might be cheaper.

> 2. nobody actually wants a non-transferable username

Which is the worse outcome -- you can't transfer your username (but you can register a new one for a fee), or trolls and squatters can trash the system?


I prefer the Matrix way - an open network of different servers. Don't like spammers? OK, close registration on your server. Have public rooms? OK, block servers with open registration.

No problems with spammers or trolls at all. And that's literally my experience with the 20+ homeservers that I've configured.

Again, nobody wants a unique permanent ID per human.

PS: regarding "a new one for a fee" - I hate services that work this way (hello, Blizzard), IMO.


The modern Internet is a war zone, like a failed state. Any protocol or system must be armored for battle. Anything that is at all “open” will be destroyed by spam and other forms of exploitation the instant it becomes popular enough to have any value whatsoever as a target.


I don't know about the failed-state part; she's open and truly free, at the mercy of its users. Sure is a war zone, though.

Just humans learning how to behave. Maybe one day it'll all be peace and productive quiet; until then, and on our journey there, a war zone.


A better term would be a 'dark forest.' The Internet is a dark forest.


A prime example of attacks making the target stronger.


Sure, but what if it were an attack which exposes a lot of private information? Just saying that not every kind of attack has a good outcome.


> what if it were an attack which exposes a lot of private information?

Then it wouldn't be "a prime example of attacks making the target stronger", it would be a, err, sub-prime example?


That's not the only way to interpret the original sentence. It could well have meant that attacks generally make the target stronger.


This is the very immaturity I'm concerned about when people suggest moving our IRC groups to Matrix.


Because the same behaviour never exists on IRC?

All things considered, I can't say that the Matrix.org team responded any slower than I've seen IRCops respond. The main difference is that you don't tend to see news about this happening on IRC as much anymore, because IRC has had spam problems for decades and there are hardly as many people out and about making it news.

Plus, for what it's worth, most of us who run our own homeservers never encountered this issue at all, because public registration isn't open in the first place. It seems awfully dismissive to wave off Matrix as "immature" as a protocol/platform over purely this one incident.


As a person who maintains a moderately sized IRC network (700 users when I last checked), one of the things that put me off Matrix a couple of years ago was the lack of powerful moderation tools.

IRC daemons are absolutely archaic, for sure. InspIRCd’s modules are awkward to use, single characters have special meaning… there are g/k/u-lines, and don’t get them mixed up!

But they’re very powerful and absurdly flexible. We’ve been able to mitigate absolutely huge spam attacks with quite complex logic, and you have strong moderation tools at every level: from channels, with their quite rich permissions (voiced, half-ops, ops, admins, founders), to the network level (the aforementioned lines).

Maybe Matrix has caught up, but it’s hard for me to run a public network without such rich moderation tools.

Even if those tools are clunky and awkward.


Moderation tools are slowly improving, but they definitely still don't fit all use cases, and it's very unfortunate they don't come natively with Synapse. I'm the lead admin of a Matrix 'Space', and our main room now has over 6,000 users (80%+ are inactive or barely active). We use mjolnir (https://github.com/matrix-org/mjolnir) to ban and redact users/servers/messages across all of our rooms, which has been a godsend, since I used to have to redact all the racist/gore content myself, message by message. It's still reactive instead of proactive, but I'm hoping these tools will mature in time.


FWIW, https://matrix.org/docs/guides/moderation/ enumerates the current moderation approaches in Matrix, which are relatively comprehensive. There's still stuff we can do, though (e.g. the ability to set rooms to text-only).


I meant immature literally: this is a growing pain that will make Matrix a stronger network/protocol. IRC has been around for decades and Matrix hasn't; that's all.

But this will be one of many _novel_ attacks on what is a more complicated network.

When clients and servers need to be updated to fix these issues, it becomes a drag.


Hasn't spam traditionally been a huge issue on IRC? Perhaps it's a bit less now as IRC usage has fallen and/or better systems to stop it have been put in place, but back in the day when I still used IRC I remember random bots/people spamming ASCII dicks or just offensive text like repeating "nigger" over and over again, and similar "hilarious" stuff.

Actually, the only time a server under my management was hacked (CEO installed some random WordPress plugin with RCE, sigh) was to run an IRC spambot to send out exactly this sort of abuse. Hurray for firewalls also blocking outgoing traffic by the way: as far as we could tell this was all just dropped and never impacted anyone.

Either way, this doesn't sound like a "Matrix problem".


I was on IRC in 93/94 (mainly IRCNet), a couple of years after it all started, and especially in those days it was a total shitfight. Technology around the platform moved at a rapid pace and there was this constant race going on between network admins and malware writers. At that time IRC clients supported a lot of scripting and much of that was used for abusive stuff. Like auto ban/kick, forcing netsplits in order to become admin by abusing protocol issues etc. The problem was bigger than spam alone. It was so bad it put me off IRC for almost a decade before I got back into it.

Since then, IRC has actually evolved a more mature userbase, but its inherent lack of protection against this, and the ease of writing a bot, make a few idiots capable of spamming the entire IRC world.

Nevertheless like I said in another post I think this is just a tradeoff we have to learn to live with. Being fully open is great, but comes with drawbacks and implied trust. I still prefer it as it is.


This is what makes the recent Freenode takeover so sad - it was a more mature version of IRC that was hijacked by a gang of people with "old IRC" ideals.


I haven't used IRC in a long time, and never used Matrix, so I could be way off; please correct me if I am. I think the main difference between Matrix and IRC in regards to spam is the federated nature of Matrix. In IRC, each organization[1] only has to worry about its own users. So they can add any kind of verification, bans, filters, etc. to manage spam. With Matrix, you have to rely on every other organization to police its own users. Kind of like email, which has a far worse spam problem than IRC.

[1] I said organization, but what is the right word here? Group? Domain? Do IRC and Matrix have their own terms for these things? I wanted to say server at first, but surely there is more than one server.


I think the terminology you're looking for is "homeserver." Homeservers are tied to a single hostname / domain, but may be a collective of servers running the different services that comprise the protocol itself.

That said, you _do_ have to worry about spam from other homeservers, but that only matters if you have users from that other homeserver in your room. You can choose to federate / not-federate on a room-by-room basis, or just not federate at all ever.
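
For instance, the Matrix client-server API lets a room be marked unfederated at creation time via "m.federate" in the creation content; a rough sketch, with a placeholder homeserver URL and access token:

```python
# Create a room that no other homeserver can ever participate in.
# The homeserver URL and access token below are placeholders.
import requests

resp = requests.post(
    "https://example-homeserver.org/_matrix/client/r0/createRoom",
    headers={"Authorization": "Bearer YOUR_ACCESS_TOKEN"},
    json={
        "name": "Local-only room",
        "preset": "private_chat",
        # m.federate=false permanently disables federation for this room
        "creation_content": {"m.federate": False},
    },
)
print(resp.json())  # e.g. {"room_id": "!abc:example-homeserver.org"}
```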

Needless to say, you're right that it is a bit different from IRC in that regard (IRC can just k-line hosts / IP ranges from the whole service if you're spamming). However, in practice there are far fewer instances of random spam such as in the article than you would think.

For myself, I never noticed any of these spammers at all (anecdotal, so mileage may vary). I do have my own homeserver through Element Matrix Services, but I am part of a good number of rooms that exist purely on the Matrix.org server. Certainly I have to "police my users," but that's pretty easy since I run a small homeserver for ~5 users, and we just federate with the greater network when needed.

I guess I agree with you in general that there are different failure modes, but I don't think it's as simple as dismissing Matrix because it "has the same spam problem as email" compared to IRC. In practice at least, it doesn't seem to be as much of a problem despite how much larger than IRC Matrix has become.


Thank you for the insight. To be clear I'm not dismissing Matrix, just trying to explain/understand it. I would say that federation is the main reason that email has been so successful for so long, even if it is the same thing that makes spam so hard to control.


Ah yes, the part in my comment about dismissing Matrix was in reference to the thread parent. Apologies if you thought that was directed at you.


There's a small difference in that IRC is generally a message-passing network, whereas Matrix is more like a log synchronizer.

So spam becomes a part of the log, which means that it'll consume far more resources than just network resources.


Not sure why the sibling comment was downvoted: these are problems IRC and others haven't solved, yet they aren't called "immature".


Is the sibling comment dead because of the explicit mention of the n word? The points raised in that comment seem valid.


What's pathetic here is that a bot spam wave can cause performance problems in the network.


Most of the network was fine; it only affected the main Matrix server and anyone who federated with the spammed rooms. If you weren't in a room being attacked, or you left it, you were fine.


[flagged]


HTTP is also just moving text around and can be quite complicated...


The lack of admin UIs makes stuff like this even more annoying: having to write/scrounge a bunch of scripts to manage users on years-old software.


There is synapse-admin: https://github.com/Awesome-Technologies/synapse-admin/

The matrix-docker-ansible-deploy playbook will install it for you very easily. All it takes is enabling one config option.
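
Assuming the playbook's documented variable name (worth double-checking against the playbook version you're running), enabling it looks roughly like this in your vars file:

```yaml
# matrix-docker-ansible-deploy vars.yml (variable name per its docs;
# verify against the version of the playbook you're using)
matrix_synapse_admin_enabled: true
```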


I wonder if Matrix will get its own "spamhaus" equivalent, and whether it will be a single service or multiple competing ones.


There are public and community-curated banlists that any homeserver admin can subscribe to. Probably not exactly a "spamhaus" equivalent, but it's pretty close.
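
Under the hood these are Matrix moderation policy lists (MSC2313): ban rules shared as room state events, which tools like mjolnir then apply. Roughly, a server ban rule carries content like this (the entity glob is an invented example):

```python
# Shape of an MSC2313 policy-list rule, shown as a Python dict.
# Event type and field names follow the MSC; the glob is made up.
server_ban_rule = {
    "type": "m.policy.rule.server",
    "content": {
        "entity": "*.spammy-homeserver.example",  # glob of servers to ban
        "recommendation": "m.ban",
        "reason": "spam wave",
    },
}
```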


Seems like a high-effort specialized-skill general attack on a privacy and anonymity platform for its own sake. Cui bono? Ah, right...


It says to email abuse@matrix.org to get "unblocked"; what control do they have to do this?


The control over matrix.org, the homeserver they administer and on which they blocked other servers.


That's for removing a server-level block, if one was imposed at the height of the attack.


A federated server, to be clear, that can federate with Matrix.org users as well as the tons of other servers that users are joining/hosting.


The short answer is that, in practice, a huge proportion of Matrix activity is centralized, and they control it.


Can you share more details please?


What details would you like? Server population stats? Is it controversial or obscure information that a very high proportion of all Matrix users are on the Matrix.org instance? (seriously, is it?)

I thought that fell under the umbrella of common knowledge, so far as Matrix goes. If you've got data indicating otherwise, I'd find that interesting to see. I'm actually finding hard stats very difficult to locate (the few I can find seem incomplete, but corroborate this very strongly, for what they're worth), but again, I thought this was basically assumed to be true by Matrix users. I mean, if you search "matrix protocol" or "matrix chat" or "element chat" and follow the Happy Path of clicks to try it out, you'll have a Matrix.org account. The way you veer off that path? You have to know the address of another Matrix server. You won't be presented with other options, just the ability to type in the address of a different server. That this would result in Matrix.org having the lion's share of users seems a predictable outcome.


We estimate that under 30% of Matrix users are on matrix.org, based on Synapse phone-home stats.

The "unblocking" requests to abuse@matrix.org are to remove the servers with spambot accounts from the blocklists we publish as matrix.org; nothing to do with centralisation.


The first figure is lower than I'd have guessed, thanks for the correction.

As for the latter, that's immaterial—the more important it is to get server-to-server federation un-blocked with matrix.org, the more that can be taken as a sign of centralization. Moreover, if a large percentage of servers follow the Matrix.org blocklist, that's precisely the kind of centralized control the original poster was asking about.

(mind, none of this is intended as judgement, just clarifying why matrix.org would have substantial de facto "control" of the ecosystem & network, to use the original poster's word)


> the more important it is to get server-to-server federation un-blocked with matrix.org, the more that can be taken as a sign of centralization

This was my point. We never blocked server-to-server federation with matrix.org. We published a blocklist of the abusive servers, and blocked them from the rooms which we manage on the server. There was never any centralised control applied, and there is no mechanism to do so (unless room/server admins opt in to using the blocklists we publish, but they are very welcome to use their own).


Spam on Matrix tends to be racist gore images, which is 1000% worse than IRC text spam.



