Matrix.org hacked (archive.org)
324 points by nektro 6 months ago | 258 comments

I can see a lot of people trashing on Matrix.org or the "hacker" themselves (the hacker opened a series of issues, detailing how he managed to get in - https://github.com/matrix-org/matrix.org/issues/created_by/m...). However everyone seems to be missing the point - matrix seems like a pretty cool and open project. And someone taking over their infrastructure in such an open way is also great for the community. Even though a little dubious on the legal side of things, I believe it's great it was approached with transparency and a dose of humor.

Some might argue that this is harmful to matrix as a product and as a brand. But as long as there was no actual harm done and they react appropriately by taking infrastructure security seriously, it could play out well in the end for them. This whole ordeal could actually end up increasing trust in the project, if they take swift steps to ensure that something like this does not happen again.

On the first issue opened by the hacker:

> Complete compromise could have been avoided if developers were prohibited from using ForwardAgent yes or not using -A in their SSH commands. The flaws with agent forwarding are well documented.

I use agent forwarding daily and had no idea it contained well known security holes. If that's the case, why is the feature available by default?

SSH agent forwarding makes your ssh-agent available to (possibly some subset of) hosts you SSH into. This is its purpose. Unfortunately, it also makes your ssh-agent available to (possibly some subset of) hosts you SSH into.

I never quite understand why there’s not a confirm version. ForwardWithConfirmation or something. I’m active when I need forwarding - would be happy to simply be prompted before it’s allowed.

OpenSSH does have confirmation: use the '-c' switch to ssh-add.


Or "AddKeysToAgent confirm" in ~/.ssh/config

Waaaaaaat?! That could definitely be better known.

TIL :|

Hang in there.

This could be a sane default.

Hm, anything similar for gpg agent (both for gpg, and as a stand-in for ssh-agent)?

Ed: looks like I need to edit my sshcontrol-file


If you use Yubikey with touch-to-use enabled that'll be basically what you're asking - each authentication will require touching the token.

I just enabled that after seeing this incident.

For people in the same boat, it can be done trivially using the YubiKey Manager CLI: https://developers.yubico.com/yubikey-manager/

Some ssh agent implementations do this, notably the one built into Android ConnectBot can be configured to request confirmation each time it is asked to authenticate. Unfortunately ssh-agent (from OpenSSH) does not as far as I know. It's happy to authenticate as many times as requested without any notification.

It can, and it's determined per key when added to the agent.

Look for -c here: https://man.openbsd.org/ssh-add

Indeed it is - I even checked the man page before posting the comment and completely missed that option.

Is there a secure alternative that achieves the same outcome?

Here are a few ideas that might help.

Use separate keyboard-interactive 2FA (I recommend google-authenticator) for production ssh access.

Use a key system which requires confirmation or a PIN to authenticate (such as a Yubikey). Use a persisting ssh connection with Ansible (ControlPersist) to avoid unnecessary multiple authentications.

Allow connections only from whitelisted IPs, or use port knocking to open temporary holes in your firewall, or require connections to production infrastructure to go through a VPN.

Access production infrastructure from hardware dedicated for that purpose, never do anything else on it.

I wish there was a way in ssh to tag connections and only allow agent forwarding to keys with the same tag. That would prevent agent forwarding production keys from a dev host.

I'm not sure. A secure, backwards-compatible (with older servers) alternative, which only exposes keys you explicitly choose to expose, should be doable and might help.

Another option would be for an SSH client to present a full-screen "$HOST is trying to use your SSH PRIVATE keys. Press enter, then type "~A" to allow." prompt.


Hadn’t seen that before. The article here explains it briefly: https://www.madboa.com/blog/2017/11/02/ssh-proxyjump/

This article is .. weird. It mentions SOCKS5, DynamicForwarding and a "decent version of nc", while you don't need anything at all to forward a connection -- SOCKS is not involved in any way, and the initial 1995 release of nc would work just fine.

Here is a much better explanation (from [0]):

> ProxyJump was added in OpenSSH 7.3 but is nothing more than a shorthand for using ProxyCommand, as in: "ProxyCommand ssh proxy-host -W %h:%p"

so the same thing that top poster was talking about.

[0] https://superuser.com/questions/1253960/replace-proxyjump-in...

Do you keep your keys on the proxy host, then? Otherwise, "ForwardAgent yes" and you're back to the same situation.

ProxyJump uses the keys from the original host, not the proxy host.

I know. That's why I asked. Chained agent forwarding will serve your keys just the same, so ProxyJump is not "a secure alternative that achieves the same outcome".

Are you disagreeing with the "secure alternative" or the "same outcome"? I thought the difference between ProxyJump and agent forwarding is the following:

Agent forwarding forwards the agent socket to the proxy server. Thus any ssh connection originating from the proxy server can reuse the agent, and with that has the same access to the agent as the originating host.

ProxyJump routes the ssh connection through the proxy host. The crypto takes place between originating host and target host, not between proxy host and target host. ssh connections originating from the proxy host can not access keys from the originating host.

But maybe my understanding of ProxyJump is incorrect?

I know exactly how agent forwarding and ProxyJump work, but I'm having a hard time understanding what you mean.

ProxyJump proxies your ssh connection, so connecting from A to B via proxy X the connections go A->X and X->B.

You can use AgentForwarding with ProxyJump, in which case agent connections go B->X->A.

I cannot see how ProxyJump would somehow be an alternative to AgentForwarding. You can use both independently.

> ProxyJump proxies your ssh connection, so connecting from A to B via proxy X the connections go A->X and X->B.

No, it rather works like this:

A -> B via X establishes A->X and then, through that connection tunnels a new ssh-connection from A->B.

A->X, then X->B would require forwarding the Agent from A to X, so that the connection from X->B can authenticate using that agent. Proxying the connection does not require X to ever authenticate to B, the authentication happens straight from A->B (1). Thus, no agent (forwarding) needed. You can also chain ProxyJumps: A->X->Y->B tunnels A->B in A->Y which is then tunneled through A->X. In that regard, ProxyJump and ProxyCommand can replace AgentForwarding in most use cases. There are some uses where AgentForwarding is the only solution, though.
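To make the difference concrete, here is a minimal Python sketch that only builds the command lines for both approaches (it never runs ssh). The helper names and the hosts `jump.example.org` and `target.internal` are made up for illustration; `-J`, `-W` and `-A` are regular OpenSSH client options.

```python
def proxyjump_command(jump_host, target_host):
    """A->B tunneled through X: authentication to B happens on A."""
    # -J is the command-line form of ProxyJump (OpenSSH 7.3 and later).
    return ["ssh", "-J", jump_host, target_host]


def proxycommand_command(jump_host, target_host):
    """Same tunnel for older clients, using stdio forwarding (-W)."""
    # %h and %p are expanded by ssh to the target host and port.
    return ["ssh", "-o", f"ProxyCommand=ssh -W %h:%p {jump_host}", target_host]


def agent_forwarding_commands(jump_host, target_host):
    """A->X, then X->B: two separate logins, with the agent exposed on X."""
    first_hop = ["ssh", "-A", jump_host]   # typed on A, forwards the agent to X
    second_hop = ["ssh", target_host]      # typed on X, signs via A's agent
    return [first_hop, second_hop]
```

Only the last variant passes `-A`, so only there can root on X reach your agent socket. The first two never hand X anything it could use to authenticate to B.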

(1) Added benefit: X never sees the actual traffic in unencrypted form and all port forwards A<->B work

Hehe, I think I figured out the source of the confusion.

I was thinking that the threat is that a compromised B gives access to your keys via agent forwarding. Presumably if you make keys available on B, you need them there. There's nothing ProxyCommand does to help there.

But you're talking about using ProxyCommand as an alternative for connecting A->X and then X->B, so keys are not available on X. That's of course an improvement.

That project looks dead.

I think an issue here is we've been told for a long time "always restrict access to the environment through a bastion host" without much implementation detail discussed after that. Agent forwarding tends to show up as the most efficient way to implement this.

Basically you need to make sure the host you are SSH'ing into with an agent is secure. Otherwise the root user on that host can access your agent socket and connect to any other machines your agent can.

So if you SSH -A to a compromised Jenkins server, and you've got all your production keys loaded in your agent, the hacker can now authenticate to all those production machines as well.

So don't ever SSH -A into a machine unless you KNOW it's secure. The way I think about it is: unless I trust the machine enough to leave my private keys on it, I'm not going to SSH -A into it.

> I use agent forwarding daily

That's why. It's useful, but you have to be mindful of the security risks involved in using it.

Or the hacker could have responsibly disclosed the issue to Matrix then reported their findings like a professional. Besides, we're still defacing web sites? I thought that went out of style years ago. Did the hacker make sure to post that on MySpace too?

Responsible disclosure is about not enabling third parties to leverage the disclosure to gain access. In this case the hacker did not disclose the security holes before they were closed for third parties (i.e. the hacker could only still access the hosts because he had access to them in the past; new access was (hopefully) no longer possible).

Which of course doesn't mean that the hacker should have just sent an email to the matrix team.

> The matrix.org homeserver has been rebuilt and is running securely;

We should have more bounties. Let users donate and put wallets on servers. An attacker would be able to take these funds. It's a reasonable measure of an infrastructure's security.

To avoid perverse incentives, you should also build in some reward for the developers/operators. As in: If the server gets hacked, the money goes to the whitehat. If the server does not get hacked for $TIMEFRAME, the money goes to the people responsible for its security.

Also, there is a requirement for the hacker to actually publish the results of how they did it. Otherwise, you run the risk of the hacker just walking away with the funds or giving a bogus reason (after they've already spent the wallet).

Therefore, the wallets should be stored GPG encrypted in some published location. After the hacker has successfully penetrated and retrieved the file, they need to publish a "how I did it" document along with the hash of the GPG encrypted wallet.

Once devs have confirmed the vulnerabilities exist, they respond with the passphrase to decrypt the wallet.
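The claim check in this scheme could look roughly like the Python sketch below. It assumes SHA-256 as the hash (the proposal above doesn't name one) and that the devs recorded the digest of the encrypted wallet before planting it; the function names are hypothetical.

```python
import hashlib
import hmac


def wallet_fingerprint(encrypted_wallet: bytes) -> str:
    """Hex SHA-256 digest of the GPG-encrypted wallet file."""
    return hashlib.sha256(encrypted_wallet).hexdigest()


def claim_is_valid(expected_fingerprint: str, claimed_fingerprint: str, writeup: str) -> bool:
    """Accept a claim only if it has a write-up and the digest matches."""
    if not writeup.strip():
        # No "how I did it" document, no passphrase.
        return False
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(expected_fingerprint, claimed_fingerprint)
```

A matching digest only proves the claimant read the file. Whether the write-up describes a real vulnerability still has to be confirmed by hand before the passphrase is released.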

Unless I'm missing the joke, this is a bug bounty with extra steps.

My idea was to not require any explanations, so that blackhat could grab that wallet too. It's just about being able to say "this server is $1k secure". I think it's fantastic that we have a technology to do that.

You still need some trust that the private keys to a given wallet are on the server, but apart from that, when you know there's $10,000 on the server for anybody who can access it, it says something about how secure this machine is.

Plus you get instant notification when the server is compromised. Not every hacker is kind enough to let you know.

How would a blackhat grab the wallet if it's GPG encrypted and needs the passphrase from the dev?

I like this idea!

Seems perverse to me as well. Might be a better idea to just fund Matrix enough to be able to have at least someone full time on it. With $3 752 per month on Patreon right now I cannot imagine it's a lot after infrastructure costs and taxes. Certainly not enough to let Arathorn or someone go out of his way to get expensive security training.

Seems like asking for trouble. Basically you're putting a thousand dollars cash in your house, then telling the world I have a thousand dollars cash in my house, if you find a way to break in and take it, it's yours, I won't make a fuss because you're doing me a favor by exposing a vulnerability.

Just please don't take the other valuables and ... oh yeah, please don't mess with any of my family members and maybe please let's try to keep it at no more than one hundred people trying at the same time b/c otherwise things might get out of hand.

You only get the 1000 Euros if you didn't do any of the harmful things you mentioned. It is not "cash" in the house.

There are quite a few so-called guides (opsec playbooks for crime) that I found specifically on Wall Street Market (a darknet marketplace like the now-defunct Silk Road), available for purchase.

Some of them go beyond just instruction booklets but promise access to their chat systems via invitation (upon purchase of the pdf) and offer some kind of limited coaching. It is essentially the recruiting mechanism to bring in lower ranking soldiers starting out as mules, handlers, or basically move up from re-selling goods.

A couple of these guides point out how much Telegram sucks etc, and that they now have moved to p2p based systems. One praised Matrix heavily for its good security features.

The tech-savvy-ness of many vendors has picked up considerably since I first started watching. There is a strong push to re-think and refactor both tools and their processes (yes yes - this happens constantly otherwise they get caught, but never as fast or as aggressively as in these past months).

It's likely that this is just a (s)kiddy enjoying the attention. Though quite a lot of players have more than just an "academic" desire to ensure these (their) systems can withstand an attack by LE. When I browsed the matrix issues on github I couldn't help but immediately recall the strange emphasis on "we have switched to matrix". It's far fetched but I'd say somebody may have a strong interest in seeing these issues resolved (->or has gotten genuinely fed up and wanted to do something, as opposed to this being just a skid that only did it for the attention)

For a good analysis of some of these tutorials and the philosophy behind them, see: "Discovering credit card fraud methods in online tutorials"


There are some weird people in those issue threads..

What did they say? The comments have been deleted.

checking in as an internet weird person here, any time you have a platform that synthesizes anonymity with collaboration/social interaction, weirdos like us are gonna pop out like clockwork because we find a safe haven for our, uh, weird stuff. a place to not be judged or whatever. i think the sjw type terminology for it is a safe space. and of course due to the human element being so easily corruptible, many people do also tend to use such things for illicit purposes.

hey anybody else remember the days of T-Philez?

Github issue got closed or removed it looks like. There is a new issue where people are complaining about the first getting closed:


They were getting a ton of spam messages so they have been locked so that only collaborators can talk. They will be restored when the spam stops https://github.com/matrix-org/matrix.org/issues/367#issuecom....

I would like for matrix protocol and implementation to be better prepared for such cases.

While I didn't lose access to the encrypted messages, since I used the 'Encrypted Messages Recovery' function of Riot.im, I guess a lot of people have. Maybe allow storing more information on the client side?

I do not really like the fact that this feature can only backup keys server-side, so I did not enable it.

I do however have a keys backup dating back some time, that will hopefully restore some of my encrypted messages. But basically, I understand that every encrypted message was at risk of being lost, so it's not that big of a deal.

The backed up keys are encrypted against a client-generated Curve25519 public key, with new session keys being added incrementally (so you don't need to provide the key after you set it up)[1]. Personally I don't see it as much more of a risk than trusting them to host the ciphertext of your messages.

People have different threat models. When chatting with my family, it's more important that we have a permanent history of our messages rather than the worry of them getting leaked. But if you're a whistleblower you have a different set of requirements.

[1]: https://github.com/uhoreg/matrix-doc/blob/e2e_backup/proposa...

You have always been able to export your keys manually to a file.

I agreed with your comment, until I followed the github link and found that all issues had been removed.

Matrix operational security is a joke and the developers' understanding of security is a joke. This is 2019, not 1992.

Infrastructure with ssh access without hole punching for currently active authorized connections only? Decrypted signing keys accessible over the network? CI servers and developers having root access?

Though the "we had to revoke all the keys so you lost access to your encrypted messages unless you backed them up" takes the cake.

> "we had to revoke all the keys so you lost access to your encrypted messages unless you backed them up" takes the cake

This is just how it works. It's been well documented and mobile clients got updates that back up the keys automatically. It's also effectively the same as WhatsApp and some other IMs (they just don't even save your encrypted messages). Either way - backup, or lose your history.

When it comes to criticism about backup I have no problems with things getting wiped from the server. I assume a good p2p design has a "little server" and as much client as possible in it.

Enforcing in the clients to properly back up by default, or otherwise properly educating the user of what happens if they don't back-up would be as important as getting the code right. There is little difference to the user whether they lost data because they didn't understand they really had to do backups, or they got their keys compromised and messages deleted by a malicious 3rd party.

I do agree with all of GP's other points though.

I stand by the assertion that it indicates the Matrix people are clueless.

If this is a design constraint, then the security model needs to accommodate that the user keys are the pot of gold, which means that there needs to be a service provided by a dedicated server which is inaccessible in the course of normal operation via any means other than a well defined braindead simple protocol <keyid>:<command>:<message> providing the message manipulations/key store functions from only other authorized production hosts that need to be able to access this functionality.

The server running the service should have a security policy that would prevent one from running any software that is not supposed to be already present on a server ( use SELinux enforcement policy ) to minimize the attack surface; have its own SSH keys not generally accessible during the normal operation, be accessible only from specific IP addresses, etc etc etc. If it is on AWS, it should probably be in a separate account.

I think you misunderstand why the keys were deleted. The keys get deleted on the client when you log out. This is sensible, because if you log out on a device, you probably don't want to keep the keys around in your browser storage. When the user's session is destroyed on the server, existing clients get a 403 error and are told that their session is logged out. When that happens, they go through the normal logout routine, which involves deleting the keys on disk.

Deleting the keys isn't something the matrix.org folks explicitly had to do because of the compromise; it's simply how the riot.im client reacts when you terminate its session.

If user sessions are that important, then there's no way Matrix should be killing them and instead that behavior has to become a design and operations constraint.

Imagine if this was facebook. Or whatsapp. Or signal and this was the result. They would be crucified ( justifiably ). But for some reason we are giving Matrix a pass.

There is currently a bug open in Riot to allow users to save their keys if there was a forced logout by the server (right now, if you try to log out and don't have key backups set up Riot will warn you and ask you to set up key backups).

But your comparison with other messaging apps isn't really fair (other than "they are messaging apps"). The reason why they don't have these issues is because they don't provide features that Matrix does -- and those features make it harder for Matrix to implement something as simply as others might. For example, Signal stores all your messages locally and doesn't provide a way for new devices to get your history -- Matrix doesn't store messages locally long-term and all your devices have access to your history. In addition, there is no "log out" with Signal unless you unlink your device.

The reason why Matrix doesn't have e2e by default yet is because they want to ensure issues like this don't happen to every user.

Maybe I'm not clear -- destroying any user's data without user's explicit authorization is unacceptable for any non-joke of a system.

If users' keys are linked to the session key then the system has to be designed in a way that the centralized session key store is protected like a pot of gold. That's a design constraint and dictates operational constraints.

> Matrix doesn't store messages locally long-term and all your devices have access to your history. In addition, there is no "log out" with Signal unless you unlink your device.

If one designs this kind of a system, one accepts the security constraints this system has. That's a basic competence or in this case a lack of it.

If Riot kept around your session keys even if you were logged out I guarantee that a similar complaint would be made about it being insecure since it leaks keys.

I would also like to point out that e2e is still not enabled by default because of issues like this. If you enable it you should know to enable key backups.

Riot has supported automatic key backups for the past few months, and if you'd used that you wouldn't have had a problem (yes it should've existed earlier but there are a lot of things for the underfunded Matrix team to deal with). And the reason it's not default is because making such a system opt-out would also make people start screaming about how Matrix is insecure because "it stores your keys on the server".

I think in many respects, the people working on Matrix are going to get criticised like this no matter what they do. I note you haven't actually suggested a specific proposal for how to fix this -- you're just going on about design constraints and how Matrix is therefore a joke system. To me that seems to be more snark than useful advice.

> Riot has supported automatic key backups for the past few months, and if you'd used that you wouldn't have had a problem (yes it should've existed earlier but there are a lot of things for the underfunded Matrix team to deal with). And the reason it's not default is because making such a system opt-out would also make people start screaming about how Matrix is insecure because "it stores your keys on the server".

Encrypt the bloody backup keys with a key derived from a passphrase selected by a user.
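A minimal sketch of just the "key derived from a passphrase" step, using PBKDF2 from Python's standard library. This is not how Riot or Matrix does it; the function name and the iteration count are assumptions, and the actual encryption of the backup is left out because the standard library has no authenticated cipher.

```python
import hashlib
import os


def derive_backup_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Stretch a user-chosen passphrase into a 32-byte symmetric key."""
    # The salt is not secret; it is stored next to the encrypted backup so
    # the same key can be re-derived later from the passphrase alone.
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


def new_salt() -> bytes:
    """Fresh random salt, generated once per backup."""
    return os.urandom(16)
```

The server would only ever see the salt and the ciphertext, so losing the passphrase means losing the backup.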

> I think in many respects, the people working on Matrix are going to get criticised like this no matter what they do. I note you haven't actually suggested a specific proposal for how to fix this -- you're just going on about design constraints and how Matrix is therefore a joke system. To me that seems to be more snark than useful advice.

The snark would be to say "Use Matrix. Who cares about the system not being built to deal with the design constraints"

No one should defend Matrix after this. It was not a mess up. It was an Equifax level fuckup that was totally preventable.

> Encrypt the bloody backup keys with a key derived from a passphrase selected by a user.

Actually the system they have is better than that. You generate a random Curve25519 private key and the public part is stored. This allows your client to upload backups of session keys without needing to constantly ask the user for their recovery password.

You can then set a password which will be used to encrypt the private key and upload it to the homeserver (but you can just save the private key yourself).

So, not only do they have a system like you proposed, it's better than your proposal.

> It was an Equifax level fuckup that was totally preventable.

I agree with you that their opsec was awful on several levels, but you're not arguing about that -- you're arguing that their protocol doesn't fit their design constraints (by which you mean that they clear keys on forced logout without prompting to enable backups if you don't have them enabled yet -- as I mentioned there is an open bug about that but that's basically a UI bug).

All of that said, it's ridiculous that they don't have all their internal services on an internal network which you need a VPN to access.

Yes - on their operational security.

Not the implementation of the encryption code.

Which we'll continue to use from people like www.modular.im, and whoever else springs up. As well as self-hosted servers.

Don't trust them? Host it yourself, and it's easier every day.

That's what drew me to the platform, and what will keep those serious about security, and decentralization/federation.

One of the swift steps should be to address https://github.com/matrix-org/matrix-doc/issues/1194 and https://github.com/matrix-org/matrix-doc/pull/1915 and https://github.com/matrix-org/synapse/issues/4540 properly, so other servers cannot be impacted in any way.

Project lead for Matrix.org here - you can see our initial statement on this at http://matrix.org/blog/2019/04/11/security-incident/.

It will be updated shortly to reflect the DNS defacement linked here (which was because we failed to rotate a leaked cloudflare API token; we aimed to rotate the master API token but rotated a personal one instead). To our knowledge the rebuilt production infrastructure itself is secure.

We've revoked the compromised GPG keys, and are obviously going to do everything we can to improve our production security to avoid a recurrence in future.

We can only apologise to everyone caught in the crossfire of this incident.

https://matrix.org/blog/2019/04/11/security-incident/ has just been updated with details on the earlier defacement.

Any more information on how the Jenkins server was compromised? From a cursory read of the CVEs it looks like an attacker must be able to push code in order to exploit? Was the server completely public, or was the attacker able to submit code via pull request or similar for running on ci/Jenkins?

do you have plans to perform an external security audit?

didn't that just happen? ;)


Have you implemented hole punching?

What is hole punching and how would it have helped against the attack?

I only know the term for UDP firewall traversal.

The lame version of port knocking that solves 99.9% of issues.

1. Default policy for access to all of the development environment is deny all.

2. A developer triggers a temporary addition of the developer's current address to the allow list with an idle timer, punching a hole for the developer's edge IP to access the infrastructure.

3. When the idle timer expires or when the developer says "i'm done", the allow rule is removed.

Obviously, full-blown port knocking with keys and policies would be better for a large organization with hundreds of developers and hundreds of hosts. But this is a case where 99.9% of the issues can be solved with a very simple system: to reach the vulnerable entry point, the attacker would have to come from an IP address used by a developer at that specific time.
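The three steps can be sketched as a small in-memory allow list in Python. The class and method names are made up for illustration, and it only models the bookkeeping; a real setup would translate `punch` and `close` into firewall rules.

```python
import time


class TemporaryAllowList:
    """Default-deny allow list whose entries expire after an idle timeout."""

    def __init__(self, idle_timeout_seconds, clock=time.monotonic):
        self._idle_timeout = idle_timeout_seconds
        self._clock = clock          # injectable so the timer can be tested
        self._last_seen = {}         # ip -> time of last activity

    def punch(self, ip):
        """Step 2: the developer opens a hole for their current address."""
        self._last_seen[ip] = self._clock()

    def close(self, ip):
        """Step 3: the developer says "I'm done"."""
        self._last_seen.pop(ip, None)

    def is_allowed(self, ip):
        """Step 1: deny unless there is a live, non-idle entry for this IP."""
        last = self._last_seen.get(ip)
        if last is None:
            return False
        now = self._clock()
        if now - last > self._idle_timeout:
            # Idle timer expired: remove the allow rule.
            del self._last_seen[ip]
            return False
        self._last_seen[ip] = now    # activity resets the idle timer
        return True
```

For example, `acl = TemporaryAllowList(900)` followed by `acl.punch("203.0.113.7")` would admit that address until it has been idle for 15 minutes or `acl.close("203.0.113.7")` is called.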

IP whitelisting and port knocking are not serious security methods. They're the very-poor-man's version of a VPN and access control policies, and they're not secure.

You are talking about organizations that have GPG private keys used for signing lying around and those that have Jenkins exposed to the outside world.

Dynamic IP white listing and port knocking are perfectly adequate for 99.9% of the organizations.

The hacker seems nice:

“Anyways, that's all for now. I hope this series of issues has given you some good ideas for how to prevent this level of compromise in the future. Security doesn't work retroactively, but I believe in you and I think you'll come back from this even stronger than before.

Or at least, I hope so -- My own information is in this user table.”


"Or at least, I hope so -- My own information is in this user table... jk, I use EFNet."

I enjoyed the shout out for EFnet.

I wonder if he searched for his own credentials on the production server, and if so, is there a log of it?

Just grabbing it to elsewhere and searching there would be trivial, so I'd expect not.

Hopefully, but it could be just posturing.

That gives a 404 now. I'm not enormously happy with GitHub's willingness to completely hide bug reports like this...

I agree. IIRC, the ability to delete Issues is new. There used to be a "Beta" label beside the delete button and prior to that I don't think it was possible (or at least not as easily?).

Looks like it wasn't cached by Google either.

archive.org has it!

For a bit of context: Matrix.org infrastructure has been hacked a second time in 24h, after restoring everything they went down again, story developing here: https://twitter.com/matrixdotorg/status/1116304867683905537

The hacker is now doing a post-mortem in the GitHub issues of the project: https://github.com/matrix-org/matrix.org/issues

This is gold...

> I noticed in your blog post that you were talking about doing a postmortem and steps you need to take. As someone who is intimately familiar with your entire infrastructure, I thought I could help you out.

> There I was, just going about my business, looking for ways I could get higher levels of access and explore your network more, when I stumbled across GPG keys that were used for signing your debian packages. It gave me many nefarious ideas. I would recommend that you don't keep any signing keys on production hosts, and instead do all of your signing in a secure environment.

Another gem:

RRREEEEEEEE> I noticed you missed a doctype in your html page. In order for web browsers to know what type of html to render you should include a doctype. Thanks!

matrixnotorg> @RRREEEEEEEE Thank you, I will consider that for the next release

Edit: it got deleted

But see also: https://github.com/matrixnotorg/matrixnotorg.github.io/pull/...

Wait, did Github delete matrixnotorg's profile or did matrixnotorg?

If Github deleted that profile, I don't really see that as being very hacker-friendly.

Although 'hacker' is often used as a positive term on HN, breaking into a company's production server is clearly illegal activity and should not be condoned. If Github deleted the account, they are simply acting in accordance with their published TOS & policy.

If the attacker placed sensitive information on Github, that would indeed warrant a deletion of the account. However from what I saw from the archives, the attacker merely published details about Matrix.org's infrastructure and its vulnerabilities. Is that something that's against Github's ToS?

Why would they do this? It's pure negligence. I don't even sign anything important and still worry about my keys.

I have been asked twice or more why I insisted on not using a Continuous Integration environment for publishing some software releases that are installed by third-parties.

My team was automating the infrastructure to build internal software and naturally they wanted to be able to simplify things.

The idea that was proposed to me was the following: once I push a new version tag to GitHub, the deployment CI server is going to build and release it as an unstable version.

An important detail here: I use the same key to sign packages regardless of whether they are released as unstable or stable. That would mean that if someone, somehow, managed to push a tag that was pushed upstream to GitHub, hypothetically they would be able to eventually gain access to consumers' machines (basically, developers) when the consumers update after getting a notification telling them a new version is available. No way I'd allow this to happen, but I would not be surprised if most people just took this as an acceptable risk.

Depending on your threat model I think that signing packages directly from your CI is acceptable, assuming that your CI runs in a reasonably isolated environment (e.g. on your company's LAN) and the people who are able to trigger a release are correctly vetted.

If I understand the parent comment correctly they were somehow shipping the release signing key on their production environment which is a whole other level of bad.

> That would mean that if someone, somehow, managed to push a tag that was pushed upstream to GitHub

You have to define what the signature means.

IMHO it is fine for it to mean "this software was built on our build server from a well-defined state of the source code, which is only changeable by our employees and contractors, and for which we have the full change log". So I deploy the code signing key to build servers, which is the only place where it is used.

I'm interested in what alternative meaning you would give to a signature. I have considered the possibility of tying it to the QA processes, but then a build can only be signed after checking it manually, which is problematic when many signatures are needed at multiple packaging layers (exe/dll, msi, setup.exe).

The problem is when a malicious package is produced, either because a flaw was introduced in the code, or because a dev machine was compromised, or _when the CI machine was compromised_; the malicious package will be signed as if it were legit.

One middle point between automated and manual signing is, as usual, key rotation: have the signing keys expire after a short time (say 2 weeks) and manually push new ones every week, so that the window of attack is as small as possible.

What does a key rotation solve? Either your build server is compromised or it's not.

You add another stage.

1. A release candidate X is tested in a CI

2. If tests pass, the CI sends a notification to the build server. The notification is "prep for release package <hashid>"

3. Build server pulls code from the repo, matches it against "ready, CI passed" notification and builds the package/packages.

A compromise of the entire CI/dev chain would be contained, because the builders act as a separate pipeline entry point that runs in parallel with the CI and pulls its input. To compromise keys located on a build server, one would need to either gain access to it via whatever method of remote access the server has (which should be nearly none) or figure out how to compromise the code running on the builders using input from a repo that passed CI.
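
The gate in step 3 can be sketched in a few lines of shell. This is only an illustration under stated assumptions, not anyone's actual setup: the approvals file, its one-hash-per-line format, and the function names `ci_approved` and `build_if_approved` are all made up for the example.

```shell
# Hypothetical sketch: the build server refuses to build any commit
# that the CI has not explicitly marked as passed.
# Assumption: the CI appends one full commit hash per line to an approvals file.

# ci_approved HASH APPROVALS_FILE
# Succeeds only if HASH appears as an exact, whole line in APPROVALS_FILE.
ci_approved() {
    grep -qxF -- "$1" "$2"
}

# build_if_approved HASH APPROVALS_FILE
# Only prints what it would do; a real builder would check out HASH,
# build it, and sign the result with a key that never leaves the builder.
build_if_approved() {
    if ci_approved "$1" "$2"; then
        echo "building $1"
    else
        echo "refusing $1: no CI approval on record" >&2
        return 1
    fi
}
```

The whole-line, fixed-string match (`-x -F`) is deliberate: a truncated or partial hash never counts as approved. Authenticating the notification itself is a separate problem that this sketch does not address.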

Unless you audit the entire codebase prior to a manual build, from a machine you know hasn't been compromised, with a key you know hasn't leaked, how is a manual build different from CI/CD security-wise?

Auditing the entire codebase is hard, and it is not among my reasons for preferring not to use CI/CD in my specific case.

This is what I gain by not using the CI/CD the rest of my team uses:

* isolation: I build applications to be delivered to personal computers of software engineers (mostly), they build applications to run on our own internal servers.

* my SSH client requires I have my SSH key with me, while I believe I can achieve something similar with a web-based CI/CD, the client-side certificate isn't something as "production ready" out of the box as 24 years old SSH is.

* if someone manages to push malicious code to my code base, I am going to notice during manual check: yes, I manually check the diff commits to see if anything weird came up (mostly thinking about bugs). In practice, I basically check if the commit hash is the same as the one I just pushed (usually containing the release notes). If it is, I build. Otherwise, I check what is going on (most likely, I forgot to checkout the tag).

You can say that this doesn't rule out my machine from being compromised, and I must agree... However, besides being very unlikely that I am a target of such a complex attack, I try to do my best to have a secure development environment.

If I were a high profile target, I would just use a spare safe machine for deployments (I believe Linus Torvalds uses something like this to vet the security of the Linux kernel, but I couldn't find the reference).

> Escalation could have been avoided if developers only had the access they absolutely required and did not have root access to all of the servers. I would like to take a moment to thank whichever developer forwarded their agent to Flywheel.

I'd feel so small if I were this developer right now :-|

A couple of his issues appear to have to do with the use of SSH. An Ops-guy whom I worked with had set up a bastion host with IP whitelisting that automatically shut down after 1 hour. He didn't like it as his credo was "if you're using SSH when using a cloud provider you're probably doing something wrong"; meaning to say you should automate and be able to recreate any infra at all times with logs accessible without the need for SSH. I never forgot that.

no, but I wouldn't be surprised if my colleague got the idea from him. It was around the same time (end of 2014 I think).


I can only think of the despair I would be in if I was in Aragorn's shoes

Arathorn (Aragorn is son of Arathorn).

Does being a parent prepare you to handle emergency situations like this better?

I have just quickly checked the comments and post-mortem, and I am starting to wonder: it seems that the attack itself would not really have been possible if Matrix were not open source (as this would have restricted access to the sensitive data)? Is that right?

This is not right at all, the hack was due to an outdated Jenkins instance and could have happened regardless of what other software was running on the infrastructure.

Absolutely not. There's zero reason for development infrastructure (which includes Jenkins) to have any connection to production outside a well defined "transfer this tarball" to a deploy staging server path during a known deploy window which cannot be controlled via development credentials.

They closed the threads and deleted all the comments, but luckily the issues page was archived beforehand:


Luckily yeah. So much wisdom in those comments. /s

I am highly skeptical when people talk about "rebuilding [the whole] infrastructure" in a few hours. Even more so when restoring all data from breached systems and before a thorough incident analysis. Show me the org which can just pull that off.

This is doable with proper IaC implementation, and if your org does not have RPO/RTO on lock they're doing it wrong.

Events like the one Matrix just experienced do not lead to a panicked frenzy when this is in place.

It is certainly doable, but I doubt that most people have IaC which is complete, reproducible and tested enough. And the data migration from the breached host still means some risk.

It's quite common in enterprise to have a RTO of a few hours, and RPO of a few minutes, even for infrastructure with terabytes of data. Of course, many moneys are paid for being able to do that.

I think it boils down to the fact that infrastructure for projects (no matter the size) is usually a second class citizen at best.

Either no one is eager to care for it, or the people who are actually focused on developing the software run it because they need to, or worst case - no contributor is trusted enough to handle infrastructure work, with access being given even more sparsely than commit rights to the whole software. Which is fine by itself, but there are so many (big) projects where infra is kind of terrible because 3 out of 100 people involved are doing all the work. Or don't.

It's just a problem with the industry, which is like "let's just get something out there and achieve product-market fit first... we'll worry about infra and security later". That "later" is just pushed into a backlog and forgotten.

As a classically trained sysadmin I've seen this trend going for the last decade.

Some developers seem to be pushing that ops shouldn't exist any longer or should be outsourced to google (who don't hire ops) or amazon (who do).

Managers see this trend and think that hiring only developers is a good way to save costs and do things the "new way".

Traditional ops roles are indeed not as required but security/process/reliability focused people should not be the same people who write new features. They're in contradiction of each other often.

If you're a developer who thinks ops shouldn't exist any longer consider this:

I can write software and design websites as a sysadmin, does that mean I don't need you now, or that I know everything you do?

I argue that it doesn't. A focus on automation is one thing but defenestrating the notion of operations/SRE is going to net you a bad time.

I'm probably really out of the loop, but what is matrix.org? Looks like an open source slack clone? Why do they have >5 million user accounts? Is that everybody who uses that chat tool?

I recommend this video for you: https://www.youtube.com/watch?v=C2eE7rCUKlE

It describes not just Matrix in the French state (like in the title), it also covers the Matrix 1.0 release and what they want to do with the project (e.g. they eventually want to shut down matrix.org once the ecosystem is mature).

Matrix is a federated chat protocol. Matrix.org is just one server instance.

This is a good mostly-layman's overview of Matrix: https://www.ruma.io/docs/matrix/

Matrix is what happened when somebody looked at XMPP and yelled "NIH".

Matrix is what happened when somebody looked at XMPP and yelled "wow, this aged poorly and has some major usability issues".

And instead of fixing the issues they just went to do a completely new and incompatible thing. That's the very definition of NIH

Especially since, after some period of stagnation, XMPP is doing pretty well these days at stepping into the modern world.

Ehh... not really. I still can't find a good combination of server, desktop client and iOS client that support things like OMEMO, history sharing between clients, and voice/video chat. And the one iOS client (ChatSecure) looks really dodgy and regularly fails while setting up push notifications.

I think there are something like 3 iOS clients that support OMEMO now:

* https://omemo.top/

* https://monal.im/

* https://zom.im/

And instead of adding those features to existing clients, let's create a brand new protocol, server, desktop client and mobile client. Because why not?

Maybe because, among many other reasons, it takes so much f time for something to change in the XMPP world: you have to wait for the XSF to validate any change, then for all the server devs to implement it, then all the client devs to implement it, then all the sysadmins to update their (very often very old) XMPP servers, and then for the users to update their clients (which, with Android fragmentation for example, is a PITA)?

That's not really true; there are significant architectural differences that made sense as a new protocol. (The biggest being that it is a "replicated conversation database for realtime conversations" instead of a point-to-point message sending/routing thing.) Arathorn explained it well in a comment last month: https://news.ycombinator.com/item?id=19419832

and the animation at the bottom of the matrix.org homepage was quite helpful for me

More like they looked at IRC/Slack, no? I suppose you could compare it with XMPP group chat, which has recently gotten end-to-end encryption.

4chan is circulating this picture. It shows the defaced website frontpage.


I have a hard time with the idea that they run the webserver and the matrix server on the same computer. (Regarding users.txt)

It seems they do urgently need to hire capable infrastructure people.

They didn't host production and the website on the same server. The attacker had access to the whole network. After that was detected and cleared, the attacker was still able to change DNS records. The domain was redirected to an attacker-controlled site (https://github.com/matrixnotorg/matrixnotorg.github.io) where some logs of the production servers were posted.

Correction: Not on the same server. They just managed to repoint DNS. See 2019-04-12 update on the incident article[0].

[0]: https://matrix.org/blog/2019/04/11/security-incident/index.h...

I can't access that image on my corporate network, any chance of an imgur mirror?

It's just a screenshot of the same info shown on the archive.org page linked by the title.


If you can't get to archive.org, just respond and I'll imgur it.

I believe this is meant to show that it is a targeted attack on the project lead:


Unfortunately I don't have any background context for possible reasons why "actual transparency" on the top line is the issue chosen by the attacker, but it makes it seem ideologically driven.

I don't think that this is a targeted attack.

Seems more like a way of showing "I got access to 5493973 passwords and to show that, instead of picking some random users, I'll pick the one responsible for the shoddy security".

Or it might be a clear, concise way of showing that he has access to the entire file without disclosing the information of random users, which also happens to be a particularly short command.

As for motivation I don't know, but I would like to state that I have followed the Matrix project for some time and have found it, and the CEO, to be transparent.

The interesting thing for me here is that none of the other homeservers were affected. Despite the weak security on the largest servers, the ecosystem stays alive.

Antifragility at its finest.

... or is it? https://github.com/matrix-org/matrix-doc/issues/1194 and https://github.com/matrix-org/matrix-doc/pull/1915 and https://github.com/matrix-org/synapse/issues/4540 would tell a different story: potentially deleting data on a remote server, just by virtue of being matrix.org (or anyone with access).

I might be misreading it, but it seems that the issues you are pointing out relate to 3PID, which is still somewhat centralized. Sure, work on this still needs to be done, yet the system is evolving to be more independent.

If I paid monthly for a server and then spent time configuring it, now I would be able to talk to people who pay monthly for their servers and spent time configuring them.

Alternatively, if you paid monthly to any hosting provider, you could be talking with customers of any other hosting providers.

And if there were any other hosting providers that came doing such shoddy things in their production systems, they would be wiped out of the ecosystem, but the ecosystem would still be alive.

Just like email, or phone lines... antifragile.

TL;DR: Looks like there was a server with an unpatched Jenkins instance running, which allowed RCE. [0]

Someone (presumably a developer) was connected to that compromised server via SSH, and had forwarded their SSH agent to it. [1]

Apparently that person had root access to the production servers, allowing the attacker to login via the forwarded agent. Yikes.

[0]: https://matrix.org/blog/2019/04/11/security-incident/

[1]: https://github.com/matrix-org/matrix.org/issues/358

Thanks for that summary, the twitter thread that I read on it was not quite as enlightening as this small summary!

It seems he used github for hosting his content on matrix.org https://github.com/matrixnotorg/matrixnotorg.github.io

Looks like all issues created by the "hacker" have been removed?


Seems like the user itself has been deleted, which might cause Github to remove all content created by that user.

we (Matrix.org) haven't deleted the issues; we were deliberately leaving them up for reference.

As someone running a Matrix homeserver I take this incident as an example of the benefits of decentralization. Unlike in more centralized services, the security lapses of Matrix.org have had no effect on my homeserver.

It's "useless use of cat". He/she should have gone:

`grep arathorn users.txt | head -1`

Instead of:

`cat users.txt | grep arathorn | head -n1`

Hackers these days.

It's a "useless use of head". He/she should have gone:

`grep -m1 arathorn users.txt`

Instead of:

`grep arathorn users.txt | head -1`

Commenters these days.
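
For anyone who wants to check that the three spellings above really are interchangeable, here is a small sketch. The file contents are made-up sample data, not anything from the real `users.txt`, and the variable names are only for illustration.

```shell
# Made-up sample data standing in for the real file.
users_file=$(mktemp)
printf '%s\n' 'neo:hash0' 'arathorn:hash1' 'arathorn2:hash2' > "$users_file"

# 1. Original form: cat feeds grep, head keeps the first match.
with_cat=$(cat "$users_file" | grep arathorn | head -n 1)

# 2. Without cat: grep opens the file itself.
without_cat=$(grep arathorn "$users_file" | head -n 1)

# 3. Without head: -m1 makes grep stop after the first matching line.
#    (-m is supported by GNU and BSD grep but is not required by POSIX.)
only_grep=$(grep -m1 arathorn "$users_file")

# All three variables now hold the same single line.
printf '%s\n' "$with_cat" "$without_cat" "$only_grep"
rm -f "$users_file"
```

The practical difference is only in how many processes are started and, for `-m1`, that grep can stop reading early instead of relying on `head` closing the pipe.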

It's been a while since I've seen UUoC awards in a random internet discussion.


I still don't get that logic. What if it turns out I want to stick a preprocessing step before grep? With a "useless" use of cat, that's easy. Without it, I need to do some rearranging. Not convinced.

It's the YAGNI principle.

I like the fact matrixnotorg decided to alert Matrix to Elasticsearch's existence.

But Matrix probably should first figure out how to fix the whole 'all server management ports are open to the internet' problem detailed here: https://github.com/matrix-org/matrix.org/issues/360

The last thing we need is another Elasticsearch instance listening on a public IP accessible to the world.

The (presumed) attacker opened a bunch of issues in Matrix' GitHub issue tracker, explaining the security issues leading to this compromise: https://github.com/matrix-org/matrix.org/issues/created_by/m...

TL;DR: A collection of inadvertences and suboptimal practices, some (like having GPG signing keys on production systems) more worrying than others. Something that could probably have happened to most orgs without dedicated security resources.

I lost all my messages with my girlfriend (of course).

Can anyone clarify: if I use their "server key backup" and set a passphrase, I am now two passwords away from giving the next hacker read access to all my messages, is that right?

They had the root account activated in here? Am I reading this right? He got a passlist of 5 million users?

But even if they didn't, "sudo -u root /bin/bash" or similar gives it to you unless sudo is extremely locked down (which, from audits I have done, is "rarely if ever").

There are hundreds of ways to get root prompt even with the root account nominally deactivated.


I never use SSH agent forwarding but instead use -D socks5 proxying to directly ssh from my host to the ultimate target hosts.

From reading the headline couldn't help but think to myself was it Neo?

I've been slightly annoyed with matrix ever since they boasted at FOSDEM about the fact that they backdoored their encryption so that the French government could virus-scan sent files. :/

we didn't backdoor the encryption. instead, we specced how clients could securely pass attachment keys to an AV server, if they need to. but in practice none of them (other than the french app) do.

the whole point was to spell out that we haven't backdoored the encryption, and instead been transparent about how content filtering could be done in the most responsible manner, if it's really needed.

in order to regain trust in matrix.org, what other options exist than an external security audit?

I'm far less concerned about this because all my messages were end to end encrypted.

Audit the source code, host your own homeserver, and use e2e encryption. The power of open source and decentralisation! :D


Reason why you would do that (and I often do) is that you have further piping options so it becomes standard work flow and muscle memory.

What I usually do is cat the file to inspect it, hit Control+C, then up arrow for previous command, then further pipe and head/tail/grep the file.

Starting a grep command is fine if you know that's all you're going to be doing.

You can use Bash (and I assume it works in zsh too) history expansion to save you there. eg

    grep something !$
`!$` will be replaced with the last parameter of the previous command. Crude example

    cat /etc/hosts
    grep localhost !$ # will be executed as grep localhost /etc/hosts
I find this extremely useful for when I'm cat'ing a file to get a sense of its output then wanting to do something more meaningful with it in the next command.

That all said, I have absent-mindedly done my fair share of stuff like this too:

    !! | less
    # eg will run as `cat /etc/hosts | less`
    # if `cat /etc/hosts` was the previous command run
so I certainly wouldn't look down on people who have "abused" cat due to muscle memory.

You can also hit Alt-. to insert the last argument of the previous command.

Unfortunately not on Macs :(

That's one of the features I miss the most when using terminals on a Mac.

Of course you can. It’s «ESC .» or «^[.»

Or you can check “Profiles → Keyboard → Use Option as Meta” for Terminal.app [or just press ⌥⌘O]. And then use option as meta.

> Of course you can. It’s «ESC .» or «^[.»

Thanks, I wasn't aware of that. However it's a different hotkey and only pulls the last parameter. The shell I use, you could hit <alt>+<number> and you would get a parameter of that number completed - not just the last parameter. It was very handy for rebuilding long command lines with different arguments.

> Or you can check “Profiles → Keyboard → Use Option as Meta” for Terminal.app [or just press ⌥⌘O]. And then use option as meta.

I use iTerm2 - which does seem to have similar options but I'm yet to get it working. I know that's down to user error but it's still a real pity that it isn't just the default behaviour (in either iTerm or Terminal).

ctrl-meta-y (readline’s yank-nth-arg) pulls any argument.

Works fine here (Terminal.app + bash 5.0.3 from homebrew).

Also, isn't that special variable $_ and not $! ?

> Works fine here (Terminal.app + bash 5.0.3 from homebrew).

Doesn't work for me. Maybe I've broken something on my build? Or maybe you've redefined your keys to emulate the [alt] key?

> Also, isn't that special variable $_ and not $! ?

Sorry I meant `!$` not `$!` (updated my post accordingly).

Yes, $_ does the same thing too.

Go to settings and enable "use option as meta key"

I just use M-. (meta+.⃣) for that (aka M-_ aka readline’s yank-last-arg)

that is correct, but this case is obviously different :-)

I do cat x | grep y, because that way you separate out the primary data being passed around and the secondary instructions for how to process it. Preferring functional programming, this is my bread and butter. Its superior readability and simplicity is something that gets engraved on the inside of your mind after you do a pipe a few hundred times per day every day. This is not about being terse, terseness is almost never a factor.

  '(prepend "\n")
  '(prepend "piping")
  '(= "piping")
To think of writing (read-to-string "log.txt") in the above pipe is wrong. Might as well load on more functions on the input line:

  (prepend "piping" (prepend "\n" (read-to-string "log.txt")))
  '(= "piping")
The simple principle that emerges is that the first line is for input, not for applying functions. You might guess that someone who does this even in a pipe as simple as cat x | grep y has this muscle memory too, but someone who criticizes it, certainly doesn't. Maybe that's not a bad thing, but you do feel a large divide between each other.

I'm fond of this bashism: `<myfile command | command | etc`

This is portable syntax, not a bashism.
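
To back that up: the POSIX shell grammar lets a redirection appear before, after, or between the words of a simple command, so the leading form needs nothing Bash-specific. A minimal sketch follows; the function names and the sample data are invented for the example.

```shell
# Three spellings of "count the lines in FILE that contain PATTERN".
# In every case grep reads the data from stdin, so the output is just a number.

# count_with_cat PATTERN FILE: an extra cat process feeds grep through a pipe.
count_with_cat() { cat "$2" | grep -c -- "$1"; }

# count_trailing_redir PATTERN FILE: redirection after the command words.
count_trailing_redir() { grep -c -- "$1" <"$2"; }

# count_leading_redir PATTERN FILE: redirection before the command name,
# which keeps the left-to-right reading order of the cat version.
count_leading_redir() { <"$2" grep -c -- "$1"; }

# Demonstration on made-up sample data.
demo_file=$(mktemp)
printf '%s\n' 'alpha' 'beta' 'alphabet' > "$demo_file"
count_with_cat alpha "$demo_file"
count_trailing_redir alpha "$demo_file"
count_leading_redir alpha "$demo_file"
rm -f "$demo_file"
```

With either redirection form the shell opens the file and hands it to grep directly, so grep could seek on it; with `cat` it only ever sees a pipe.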

As someone who doesn't care, my eyes bleed every time someone tries to make others feel small for no reason using improper grammar:

"As na linux/unix sysadmin": typo ('na') aside it is 'As a linux/unix sysadmin'

"...my eyes are bleeding everytime I see" I think you meant 'my eyes bleed', otherwise it means that, coincidentally, your eyes were already bleeding every time you happen to see someone use cat in that way.

"everytime" is wrong, the correct form is "every time"

My point isn't to insult you, it is to show you that everyone has blind spots and we shouldn't give each other such a hard time.

Grammar and spelling blind spots don't compromise the security of thousands of people. I think if you're going to make such glaring mistakes, you should be able to take a bit of guff for it.

> Grammar and spelling blind spots don't compromise the security of thousands of people

Neither does useless use of cat. Or am I missing something that I couldn't read because of a deleted comment?

No you're not, my mistake. Due to the parent comment being flagged, I incorrectly read the response as a reply to a different comment.

Haha, that escalated quickly ;)

There are several benefits to it. "cat filename" is a natural starting point for building pipelines interactively one step at a time. It puts the first input on the left side of the command that consumes it. (The alternative of "cmd1 [<]filename | cmd2 ..." does not quite read left-to-right, and some shells do not support "<filename cmd1 | cmd2 ..."; for example, fish doesn't.) When you are done with preliminary testing and want to run the pipeline on a larger amount of data "cat" is quicker to replace with "pv" (and vice versa). You do incur a performance cost with both "cat" and "pv". The command consuming the input also can't seek on it with "cat", but it doesn't matter if you are processing the input one record at a time. If you like the ergonomic benefits, use "cat" until you are optimizing for performance.

but but but... what about the single responsibility principle??? Grep can't both load the file and pattern match against its contents! Sacrilege.

I know you're being facetious, but every utility is supposed to be able to open and read a FILE handle, even if it's just stdin.

Well technically you're not opening stdin. It's handed to you by your parent.

As a sysadmin with 20+ years of experience, I always type `cat | grep`, completely involuntarily, probably because "cat = read" is just burned into me; by the time I think about it the command is already written. Also maybe related to catting more than grepping :D

I do it because it's easier than remembering grep's argument order.

More importantly, it is easier to remember how every tool expects the file argument. cat | foo always works.

Try this then, to hook up the file to grep's stdin directly.

    <filename grep foo


Yeah, but there are several reasons

1 - you want to add more filtering/processing before the grep

2 - grep's command line options are confusing (+ globbing + whatever), easier to just use it to grep stdin

3 - It works. Sure 'grep pattern file' works, but here that is inconsequential. I'm not in an 80s machine to worry if I'm opening one more process or pipe than needed, especially in simple cases like this

I have no issue with people who want to prefix grep (nor any other command for that matter) with cat. However I do completely disagree with your 2nd point. It's literally just:

    grep  [flags]  search_pattern  [filename]
In GNU grep (eg Linux) it's even easier because your flags can appear wherever in the command you want (even at the end). Though I'd recommend sticking to the habit of having them after the executable name for readability and portability with non-GNU coreutils.

It's really not that hard. There's plenty worse CLI tools to use which we're now stuck with because of historic reasons.

It's easy, once you get it

Take a look at the man page (linux)

    grep [OPTIONS] PATTERN [FILE...]

    grep [OPTIONS] -e PATTERN ... [FILE...]

    grep [OPTIONS] -f FILE ... [FILE...]
Now -f does not specify the file to grep, it specifies a file where to read patterns from (and they have the same "variable" name there, confusing)

Not to mention globbing and other shell escapes (which is not grep's fault, of course, but you might end up hitting in some situations)

That's not a typical usage of grep and nor is it even the first instruction in the man page.

Let's actually take a look at the man page shall we:

    grep [OPTIONS] PATTERN [FILE...]
    grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]
That first line is more or less exactly what I posted, is your typical usage of grep, and is very easy to learn.

Sure, you can list esoteric examples of grep usage but that's beside the point if it's not how people would typically use grep (in my ~25 years of command line usage, I can't even remember one occasion when I've needed `-f` - not saying it hasn't happened but it certainly isn't something I've needed regularly)
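
For readers who have never needed them, here is a short sketch of what the three synopsis lines from the man page do in practice. The log lines, pattern file, and variable names are all made up for the illustration.

```shell
# Made-up sample data: a small log and a file listing patterns.
log=$(mktemp)
patterns=$(mktemp)
printf '%s\n' 'error: disk full' 'info: started' 'warn: slow disk' 'info: stopped' > "$log"
printf '%s\n' 'error' 'warn' > "$patterns"

# Typical form: one pattern, then the file to search.
plain=$(grep error "$log")

# -e can be repeated to give several patterns on the command line.
with_e=$(grep -e error -e warn "$log")

# -f reads the patterns (one per line) from a file;
# the file to search still comes last, exactly as in the other forms.
with_f=$(grep -f "$patterns" "$log")

printf '%s\n' "$plain" '--' "$with_e" '--' "$with_f"
rm -f "$log" "$patterns"
```

In all three forms the file being searched is the trailing argument; `-f` only changes where the patterns come from.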

Stop cat abuse.

Doesn't surprise me that much, Matrix doesn't seem to be too concerned with security, more with security theatre (considering you can still not easily disable read receipts in your client, a major privacy leak IMO, among other issues).

Disabling read receipts is a client feature. Yes, the currently most mature client doesn't have that feature, but nothing in Matrix precludes it.

Much better than Telegram, which has in the ToS that you may not make a client that does not send read receipts.

There are plenty of other issues with matrix and the reference clients on top of something as simple as mandatory leaking of your presence in a chatroom. I've run a matrix homeserver for almost 3 weeks and it was an utter pain to maintain, despite not a single version upgrade; I was plagued with issues that no chat platform would have if the protocol was remotely sane.

edit: That is on top of the numerous security issues this hack uncovered. Apparently the matrix.org devs kept a users.txt file with a dump of users + passwords on the server. Signing keys for debian packages were stored unencrypted on the production server. People used unsafe SSH settings (SSH Agent Forwarding), ran outdated servers with known root-priv RCEs for months, and gave root privileges to all users on a server. Why should I ever trust a matrix developer with their protocol or reference implementations ever again if they can't be trusted with the simple task of updating a service when a critical CVE comes out?

> I've run a matrix homeserver for almost 3 weeks and it was an utter pain to maintain, despite not a single version upgrade

As a counterpoint, I've been running a Synapse (a Matrix homeserver) for about 1.5 years now and it's been smooth sailing throughout, including the frequent upgrades. Maybe it's different at a larger scale (my userbase is 5-10 users), but if, as you say, you did it for three weeks, I guess you didn't have magnitudes more users than I have.

I've had 12 users I brought over from my mastodon instance. I've received multiple complaints about matrix from each one, half of them stopped using it after a week over the privacy and usability concerns.

While you bring up valid concerns about the Matrix team's security hygiene, the point of an open standard is that anyone can (try to) spot flaws in it, and anyone can (try to) create their own implementation.

I myself am waiting for a healthy ecosystem of servers and clients to spring up before starting to rely on Matrix for anything non-ephemeral - even if it takes years. Perhaps I'll even try my hand at writing a client, if I ever run out of things to do. In the meantime, I will dick around with a throwaway matrix.org account to play with it, and to watch progress happen.

> I myself am waiting for a healthy ecosystem of servers and clients to spring up before starting to rely on Matrix

Good luck with that. Right now there's only the centralized matrix.org server, or actually there isn't because it's down. If you want open standards and multiple servers (or your own), use XMPP, period.

It's not so much a technical question as it is the attitude of "hey we're implementing our own chat protocol cause XML sucks". Totally not getting the point why users and developers would want to use standard protocols - to save their efforts becoming obsolete, taken over by a single entity, or both. It doesn't help either that scarce development resources are needlessly fragmented between XMPP and matrix.

That said, if the matrix protocol can actually manage to attract users and multiple implementations some years down the road (about 30-40 years after IRC), more power to them.

> It doesn't help either that scarce development resources are needlessly fragmented between XMPP and matrix.

In my experience, there's virtually no overlap between the two groups, and therefore no fragmentation. And for good reason: XMPP is a nightmare to implement, so there's a significant group of developers that just won't touch it, but that might be interested in working on Matrix.

And yes, part of the blame for that lies in the usage of XML. While XML can be useful to represent complex data or documents, it's unsuitable as an over-the-wire format because it doesn't have a directly mappable representation in most languages, due to the combination of attributes and child nodes.

This problem doesn't exist for JSON, because pretty much every language directly supports arrays, objects/maps and primitives. This makes a JSON-based protocol much more pleasant to work with, as there is less data-wrangling complexity involved.
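A minimal sketch of the mapping difference described above, using only the Python standard library. The message shape, field names, and values are made up for illustration and are not taken from either the XMPP or the Matrix spec.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical chat message, invented for this example.
json_text = '{"type": "chat", "from": "alice", "body": "hi"}'
xml_text = '<message type="chat" from="alice"><body>hi</body></message>'

# JSON: one call yields native dicts, lists, strings and numbers.
from_json = json.loads(json_text)

# XML: the parser yields a tree of elements. Attributes and child
# nodes live in separate places, so folding them into one dict is a
# decision the developer has to make (and collisions are possible).
root = ET.fromstring(xml_text)
from_xml = dict(root.attrib)
for child in root:
    from_xml[child.tag] = child.text

print(from_json)
print(from_xml)
```

Both end up as the same dict here, but only because the example picks one particular way to flatten attributes and children.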

> This problem doesn't exist for JSON, because pretty much every language directly supports arrays, objects/maps and primitives

No, JSON will not map directly to a language with an advanced type system (with tuples, variants, etc.). Even in Elm it's recommended to write a decoder to convert incoming JSON into an internal structure. So in fact the mapping is very poor. And I see no difference in this regard: both XML and JSON are crap.
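For readers who have not seen the decoder pattern, here is a rough Python sketch of the idea (Elm's own decoders look different). The `Text` and `Image` variants, the `kind` tag, and the `size` pair are all hypothetical names chosen for the example.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Text:
    body: str


@dataclass(frozen=True)
class Image:
    url: str
    width: int
    height: int


# A variant type: a message is exactly one of these shapes.
Message = Union[Text, Image]


def decode_message(raw: str) -> Message:
    """Convert untyped JSON into one of the internal variants."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    kind = data.get("kind")
    if kind == "text":
        return Text(body=str(data["body"]))
    if kind == "image":
        # JSON has no tuples, so the pair arrives as a two-item array.
        width, height = data["size"]
        return Image(url=str(data["url"]), width=int(width), height=int(height))
    raise ValueError(f"unknown message kind: {kind!r}")
```

The point is only that the tuple and the variant tag have to be reconstructed by hand; `json.loads` alone stops at dicts and lists.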

A chat log is rich text with emojis, photos/videos and other binary data, memes, rich inline citations, block quotes, attachments, and endless new formatting practices and gimmicks that are as yet unknown as digital communication evolves. XML/SGML is made for this kind of application.

End users don't throw arrays and maps at each other on chat, so JSON's affinity to co-inductive data structures of programming languages (actually just JavaScript) doesn't help all that much when you have to invent ad-hoc markup over arrays, maps, and primitive types, or ad-hoc hex string encodings of binary data. TBH flocking to JSON because JavaScript can represent it as an object literal is a pretty junior attitude and reflects poorly on the Matrix effort. It's equivalent to using BASIC or notepad.exe because that's what's installed on a computer out of the box.
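To make the binary-data point concrete, a small Python sketch, assuming a made-up `attachment` field: the standard `json` module refuses raw bytes, so sender and receiver have to agree on some string encoding (hex here, as mentioned above; base64 would be another choice).

```python
import json

# First four bytes of the PNG file signature, standing in for an attachment.
payload = bytes([0x89, 0x50, 0x4E, 0x47])

# Raw bytes are not a JSON type, so direct serialization fails.
try:
    json.dumps({"attachment": payload})
    direct_ok = True
except TypeError:
    direct_ok = False

# Ad-hoc convention: hex-encode on the way out, decode on the way in.
encoded = json.dumps({"attachment": payload.hex()})
decoded = bytes.fromhex(json.loads(encoded)["attachment"])

print(direct_ok)   # False
print(encoded)     # {"attachment": "89504e47"}
```

Nothing in the JSON itself says the string is hex, which is the ad-hoc part.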

> In my experience, there's virtually no overlap between the two groups, and therefore no fragmentation

Right, so there are two groups of developers working on different IM protocols. If this is not fragmentation (of developers) then what is it?

You know, I never understood why people consider JSON better than XML. Yes, any particular use of XML can be overengineered (namespaces, I'm looking at you), but as long as you control the format or scheme or however you want to call it, it's exactly the same thing as JSON, but encoded differently. In the end, it's all just keys and values or lists of values, arranged in a tree-like hierarchy.

And frankly, I would rather be looking at a well-designed XML format than at a well-designed JSON format, with its braces and brackets and commas.

I don't like XML namespaces either (and neither are the original authors of the namespace spec very proud of them [1]). They greet you with verbose and rather ugly pseudo-URLs (another bad and confusing concept IMO) and xmlns:xyz boilerplate on page one when you're interested in quickly gleaning XML data.

But arguably, chat log data is actually an appropriate use case for namespaces, given that you would want a text format that can evolve over time in a heterogeneous client and server ecosystem, yet provide a baseline functionality supported by all clients. It's also very helpful if you want to keep chats for archival rather than treating chat as an ephemeral medium. OTOH people have said the excessive use of namespaces and other XML modularization features, and too many XEP/RFC specs, are turning them away from developing XMPP software.

There are valid use cases for JSON though, such as ad-hoc data protocols where you own both the server and (JavaScript) client and maintain those in the same repo, and when dealing with simple app data that doesn't benefit from using markup constructs.

[1]: https://www.tbray.org/ongoing/When/201x/2010/01/17/Extensibl...

> Right now there's only the centralized matrix.org server,

I wasn't affected one bit by the outage. Why? Because I run my own homeserver.

> Right now there's only the centralized matrix.org server

That's just blatantly false. Approximately 50% of Matrix users are on other homeservers.

> It's not so much a technical question as it is the attitude of "hey we're implementing our own chat protocol cause XML sucks".

This is just a strawman. Every single talk by Arathorn explains, in great detail, why Matrix is not just "XMPP but JSON". Maybe you disagree with their reasons, but then you should argue against their reasons, not some other reasons that you came up with.

> That said, if the matrix protocol can actually manage to attract users and multiple implementations some years down the road

Given the recent hack it looks like Matrix has about 10 million users federating with each other (if 50% of them are on Matrix.org and Matrix.org has 5 million users) -- and this doesn't count bridged users which aren't using Matrix but are benefiting from the ecosystem.

And there are also several implementations. Riot is the most popular and polished one, but there's a whole bunch of others[1].

[1]: https://matrix.org/docs/projects/clients-matrix

Where are you getting that 50% estimate from?

Arathorn mentions this in a bunch of his Matrix talks (and has mentioned it on HN too I think).

For the record: about 2.5M of these 5.6M users on matrix.org are native to Matrix, rather than bridged. The 50% guess comes from the fact that we see 8M in the phonehome stats right now (including the 5.6M on Matrix.org), so that gives 2.5M on and 2.4M off. In practice the number off would be much larger, given that lots of bigger deployments we know about don’t phone home.
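Spelled out, the arithmetic behind the 50% guess looks roughly like this. The figures are the ones quoted above; treating every phonehome user outside matrix.org as native (not bridged) is an assumption made for the sketch.

```python
from fractions import Fraction

# Figures from the comment above, in units of 100,000 users.
phonehome_total = 80     # 8M users seen in the phonehome stats
matrix_org_total = 56    # 5.6M on matrix.org, bridged users included
matrix_org_native = 25   # 2.5M of those are native to Matrix

# Users reported by homeservers other than matrix.org.
off_matrix_org = phonehome_total - matrix_org_total

# Assumption: all users off matrix.org count as native.
native_total = matrix_org_native + off_matrix_org
off_share = Fraction(off_matrix_org, native_total)

print(off_matrix_org / 10, "million off matrix.org")
print(f"{float(off_share):.1%} of native users are off matrix.org")
```

That comes out just under half, which is where the "about 50/50" figure comes from.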

According to the stats they've last reported, the split's about 50/50 between people on matrix.org and people on alternatives, and they've said several times they want to eventually disable or turn off matrix.org.

Also, Matrix is definitely about more than just not-XML - the entire protocol is set up as eventually consistent sync of rooms between servers, which they said would have made a mutant XMPP if they had tried to shoehorn it in

> they want to eventually disable or turn off matrix.org.

What happens to everyone on matrix.org?

Disabling registrations of new accounts, not deleting old user accounts.

However, something that they are working on (which is a fairly complicated project) is making accounts migratable between homeservers. Then, users would be able to seamlessly migrate their accounts off Matrix.org.

That sounds great!

I can't wait years. I need to pick up a definitive platform right now to push as an alternative to proprietary ones. It would suck to migrate all my friends to something just to ask them to move again to something else a couple years later.

I would agree with that. We need something that works now, not when someone finally manages to rein in the Matrix protocol. As it stands I cannot send a friend an invite to Matrix and expect them to like it one bit (which, it turns out, is what reality looks like).

"I need...", "We need..."

The world doesn't really care about what you need. It simply doesn't work like that. If you have a need, do something about it and help out.

I'm aware the world doesn't care, but I simply don't have the time among other projects and my actual job to write a chat application plus federated protocol from the ground up. The "just write it yourself" attitude is frankly insulting to the end user who can't code at all.

"utter pain to maintain"

How so? I've hosted a Synapse server for a year and I have never had a problem with it, even after major upgrades.

Maybe you should disclose that you have a personal dispute with the founder of Matrix.

Why should I disclose untrue things? I have no issue, personal or otherwise with the founder.

You edited your post? Before it said, you operated "the" single federated independent Matrix Server.

And there was only one guy doing this as far as I recall.

I've not edited my post in the way you describe, or in any way comparable. All edits I perform either do not change what I wrote (spelling) or I very clearly append it to the post with an "edit:" prefix.

I've not written that I operate "the" matrix server.

Maybe you should just leave this mentality of "if said thing disapproves of someone's business, the person saying the thing must be opposed to someone in person" behind.

Because this mentality of yours means you have a personal dispute with me. Better disclose that before saying such things.

"means you have a personal dispute with me"

I never talked to you, unless you are the same guy who "operates the single federated independent matrix server" and who does have a personal dispute with matrix.

It seems you confused your sock puppet accounts?


No no, what I mean is, the world doesn't run on personal vendettas, but rather on ideas. So if you want to be tribal, be tribal about the ideas.
