Notes on privacy and data collection of Matrix.org (gist.github.com)
144 points by maxidorius 34 days ago | 74 comments

XMPP self hoster here.

I've revisited moving to self-hosted Matrix roughly every 4 months for 2 years now, and failed every single time.

The reasons vary; initially synapse refused to work, then I got stuck trying to set up a multi-domain service.

That said, this document confirms what I had feared in the background: what Matrix offers as self-hosted is too simple to be true, so it's no surprise I never got it completely running.

XMPP has its own issues, but when I self-host it, it's there and nowhere else. No identity servers, no push servers, no Jitsi servers in the background.

It seems I'm going to stick with XMPP for a lot longer.

In my experience, Synapse is no harder than ejabberd (I don't have experience with other XMPP servers).

Matrix tries to do a lot more than XMPP. In my experience, people find XMPP too limiting, so they don't use it.

> Matrix tries to do a lot more than XMPP

It's not that simple. Many people don't know about OMEMO, Jingle, etc. when it comes to XMPP, or about XMPP bridges like Biboumi.

Matrix is doing the same thing, but on different (and far more complicated) infrastructure ideas.

Prosody is definitely much simpler to configure, even with multiple domains, especially when per-domain settings are needed.

The problem with OMEMO, Jingle, etc. is that you have to make sure the XMPP components you use support them. It would be fine if all popular implementations supported those features, but that's not the case.

So you can't just say, let's use XMPP. You have to be very specific and make sure people use the right versions.

You can say that Prosody is easy. I don't find the following list easy: https://prosody.im/doc/modules

And you probably need 'Component "conference.example.org" "muc"' for any kind of 'room' support.

Then the next question: does Prosody have the equivalent of Matrix's federated rooms? Here is a list of XMPP extensions in the documentation: https://www.prosody.im/doc/xeplist

I guess the answer is, there are no federated rooms in prosody.

Another question is whether it is possible to send someone an XMPP message when that person is offline. I have no idea how to search for that.

No federated rooms. Offline messages are supported.

I guess newer Prosody supports this and much more out of the box but the generic configuration instructions I used are here: https://serverfault.com/questions/835635/what-prosody-module...

> The problem with OMEMO, Jingle, etc. is that you have to make sure the XMPP components you use support them. It would be fine if all popular implementations supported those features, but that's not the case.

Isn't this also the case with Matrix that no implementations except the official ones support E2E encryption?

> Isn't this also the case with Matrix that no implementations except the official ones support E2E encryption?

This isn't true. There are independent working E2E implementations in weechat-matrix and pantalaimon (python-nio), nheko (mtxclient), and even a read-only one in purple-matrix. Meanwhile lots of independent apps build on the official SDKs (e.g. Seaglass on macOS, bots like Matrix-Recorder, the various Riot forks, etc)

Glad to hear the update, thanks!

Just curious, which XEP is offline messages?

In Matrix, clients are supposed to implement the full Client-Server API. If a client leaves out e2e then it cannot claim to implement the matrix protocol.

Given that version 1.0 of the full Matrix protocol was published only a few days ago, it makes sense that anything other than the official implementations is behind. E2E in the official implementation is not that old either.

Even the official implementations need some work to be useful. For example, cross-device signing is not there yet.

> Just curious, which XEP is offline messages?

Simply retaining messages until the client comes online has been supported for ages; I don't think there is a XEP for that, but I could be wrong. A more elaborate scheme with persistence, multi-device access, paging, etc. is here: https://xmpp.org/extensions/xep-0313.html

> In Matrix, clients are supposed to implement the full Client-Server API. If a client leaves out e2e then it cannot claim to implement the matrix protocol.

Yes, XMPP has XEP suites that serve the same purpose: https://xmpp.org/extensions/xep-0387.html

Just a note - without XEP-0198 you will be losing offline messages if your connection is unreliable. I think most desktop jabber clients still do not support it.
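A minimal Python sketch of the bookkeeping XEP-0198 (Stream Management) adds, to show why it matters here: the sender keeps every outbound stanza until the peer acknowledges it with an 'h' value (the total number of stanzas it has handled), and whatever is still unacknowledged can be resent after the stream is resumed. Without this, a server that writes a message into a half-dead TCP connection simply considers it delivered. The class and method names are hypothetical; XML parsing, the <r/> request and the resume handshake are left out.

```python
from __future__ import annotations

from collections import deque

MOD = 2 ** 32  # XEP-0198 sequence numbers wrap around at 2^32


class UnackedStanzaQueue:
    """Sender-side XEP-0198 bookkeeping (hypothetical sketch, not a full implementation)."""

    def __init__(self) -> None:
        self.sent = 0   # stanzas sent since stream management was enabled
        self.acked = 0  # last 'h' value received in an <a h='...'/> from the peer
        self.unacked: deque[str] = deque()

    def send(self, stanza: str) -> None:
        # Keep a copy of every outbound stanza until the peer confirms it.
        self.sent = (self.sent + 1) % MOD
        self.unacked.append(stanza)

    def on_ack(self, h: int) -> None:
        # 'h' counts all stanzas the peer has handled so far, so the
        # difference from the previous value is how many can be dropped.
        newly_acked = (h - self.acked) % MOD
        for _ in range(min(newly_acked, len(self.unacked))):
            self.unacked.popleft()
        self.acked = h

    def pending(self) -> list[str]:
        # Stanzas to resend after resuming the stream (a server would
        # typically hand these to offline storage if resumption fails).
        return list(self.unacked)


q = UnackedStanzaQueue()
for body in ("hi", "are you there?", "ok, bye"):
    q.send(f"<message><body>{body}</body></message>")
q.on_ack(1)  # the peer confirms only the first stanza
print(q.pending())  # the last two would be retransmitted
```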

> Isn't this also the case with Matrix that no implementations except the official ones support E2E encryption?

A general purpose project to provide all clients with E2E encryption is being developed (and is usable right now): https://github.com/matrix-org/pantalaimon

Right now it runs as a daemon on a user's local machine.

It seems that some people that host XMPP servers should select a set of required extensions with specific versions and give the whole thing a name and version and use that in lieu of “XMPP”. Likewise, clients would need to provide and claim compatibility with the extended XMPP. Has this not been done before?
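A minimal Python sketch of that idea: a named, versioned profile is just a set of required XEPs, and an implementation can claim the profile only if it supports every one of them. The profile name and the XEP selection below are illustrative assumptions, not the actual contents of the XEP-0387 compliance suites; a real check would gather supported features through XEP-0030 (Service Discovery).

```python
from __future__ import annotations

# Hypothetical profile: a name/version plus the XEPs it requires.
PROFILE_NAME = "example-im-profile-2019"
REQUIRED_XEPS = {
    "XEP-0045": "Multi-User Chat",
    "XEP-0163": "Personal Eventing Protocol",
    "XEP-0198": "Stream Management",
    "XEP-0280": "Message Carbons",
    "XEP-0313": "Message Archive Management",
    "XEP-0384": "OMEMO Encryption",
}


def missing_features(supported: set[str]) -> dict[str, str]:
    """Return the required XEPs that an implementation does not support."""
    return {xep: name for xep, name in REQUIRED_XEPS.items() if xep not in supported}


def is_compliant(supported: set[str]) -> bool:
    """A client or server may use the profile name only if nothing is missing."""
    return not missing_features(supported)


client = {"XEP-0045", "XEP-0163", "XEP-0198", "XEP-0280", "XEP-0313"}
print(PROFILE_NAME, is_compliant(client), missing_features(client))
```

The point of the name/version is that users only have to compare one label ("does your client do example-im-profile-2019?") instead of a list of XEP numbers.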

> So you can't just say, let's use XMPP. You have to be very specific and make sure people use the right versions.

I too ran an XMPP server for years, used plaintext and OTR, it was nice.

This was in the days before everyone had tablets, phones and laptops. I used ChatSecure (formerly Gibberbot) on my phone and Pidgin on my PC.

OMEMO hadn't been invented yet and nobody else had the Double Ratchet, so I just had to live with the fact that there was no multi-device support for E2E.

Google and Facebook both offered XMPP bridges too. Google and Facebook have discontinued such services. Voice/Video never worked with them and file transfer with Pidgin never worked with Google Talk.

Now in today's world, how am I seriously going to convince my friends to use XMPP when they will say, can we use camera/video, oh do we have group E2E too?

Am I seriously going to say "let's use this XMPP client for chatting in text because it supports OMEMO, and let's use this other client because now we want to have a video call"? What am I going to do when they're on Android? Conversations.im is nice, but there's no voice/video with that.

The problem simply is there's no reference client that does everything. Many of the clients are ancient fugly GTK clients, and if they do Jingle it's only on Linux (Pidgin, Gajim, Telepathy based etc)

* https://en.wikipedia.org/wiki/Jingle_(protocol)

* https://omemo.top/

* https://en.wikipedia.org/wiki/Comparison_of_instant_messagin...

If we all use different clients for different things how am I supposed to say "Mom you click here to do that".

Then you do have some promising clients like Coy.im that look nice. And they've said NOPE NO OMEMO HERE. https://github.com/coyim/coyim/issues/233#issuecomment-21200...

Oh you can have video here, but no OMEMO https://github.com/jitsi/jitsi/issues/199#issuecomment-17017...

That is exactly why I had to dump my Ejabberd server after years of self hosting. The XMPP client app habitat is in disarray.

If Conversations offered voice/video, that would have been a different story.

You are not wrong at all.

That said, self hosting matrix seems to be similarly hard to execute at this point in time - simply too many opaque and moving components on the server side.

The riot client is also incredibly slow for my taste.

Installing Synapse with pip is actually very easy; however, as the OP showed, using your own identity server is also a necessity. For that you want https://github.com/kamax-matrix/mxisd

But this is what I've been pointing at: to self-host XMPP with multiple domains and per-domain settings, I need Prosody and nothing else. No identity server, no video servers, etc. This is the basis of my problems.

on what platform is riot slow? if android, give riotx a try - it’s roughly 6x faster.

in terms of too many opaque moving components serverside; the baseline is just a homeserver. pip install matrix-synapse and off you go. configure your client not to use an identity or integration server if you are worried about them.

I was pretty disappointed when I learned one of my contacts on Matrix was receiving my messages on his Gmail. That's a feature I would rather the client not even have, so people don't accidentally enable it. I hope it won't take too long for a more privacy-oriented client than Riot to appear. The best thing right now is MiniVector, which is a stripped-down Riot fork with fewer permissions required: https://github.com/LiMium/mini-vector-android

Interesting concern. I'm thinking I'd like the various clients to alert me, as the message sender, that my messages are being auto-forwarded to an insecure place like Gmail. Of course, I think it should be possible to turn off such notifications on both ends.

That's a good idea. The important thing I think is to create a culture of privacy-preservation around the whole Matrix ecosystem so that features like that make it to the various clients.

The issue with that app is that there is no voice chat :(

I don't use voice chat, so to me that's a good thing, but if you need that feature the full Riot client is probably the better option.

You were pretty disappointed when you learnt that recipients of your messages can do whatever they want with them? And your solution is to remove the feature altogether?

How about removing the ability to copy text? Or the ability to take screenshots?

I was disappointed the program wasn't designed to nudge users towards more privacy-preserving behavior.

I think the point of the parent is that removing forward to email would only give you a false sense of privacy, as there are several other ways to record received messages.

    The following stack will be used as reference, with users
    connecting via web, desktop and smartphone clients:

        Client: Riot-web v1.2.1,
                Riot Desktop v1.2.1,
                Riot Android v0.9.1

        Server: Synapse v1.0.0

Version numbers are probably sufficient in a general scientific setting. They are usually a precise reference to a specific piece of software: anyone attempting to replicate the investigation should be able to find their own copy of the software and have reasonable confidence that their copy is identical.

Unfortunately, it might not be a good idea to trust that a version number consistently maps to a specific URL, or that a server will give the same file to everyone each time they ask for a URL. We know that sending different versions to different people is common ("A/B testing"). If you're investigating the security of something, or worse, you suspect you might have sentient opponents actively trying to deceive you, then version numbers are no longer sufficient: you should also include cryptographic checksums! The only way you can know that the file you received is the same is if you have, e.g., SHA-2 hashes as proof. Even better, if it's important, include the RIPEMD-160, SHA-1, CRC32, and any other available hash/checksum, because why not add redundancy and give people options.
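To make the redundancy suggestion concrete, here is a minimal Python sketch that computes several checksums in a single read of a file, using only hashlib and zlib from the standard library. The function name is hypothetical. RIPEMD-160 is left out because hashlib.new("ripemd160") only works when the underlying OpenSSL build still provides it. Note that CRC32 only detects accidental corruption and SHA-1 is no longer collision-resistant, so SHA-256 is the value to actually rely on against a deliberate adversary.

```python
from __future__ import annotations

import hashlib
import zlib
from pathlib import Path


def digests(path: Path, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Compute several checksums of a file in one pass over its bytes."""
    sha256 = hashlib.sha256()
    sha1 = hashlib.sha1()
    crc = 0
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            sha256.update(chunk)
            sha1.update(chunk)
            crc = zlib.crc32(chunk, crc)  # running CRC carried across chunks
    return {
        "sha256": sha256.hexdigest(),
        "sha1": sha1.hexdigest(),
        "crc32": f"{crc & 0xFFFFFFFF:08x}",
    }
```

Usage would be along the lines of digests(Path("synapse-1.0.0.tar.gz")), publishing all three values next to the version number.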

Totally fair point, thank you for bringing it up. Given the numerous build types (source, pip, debian packages, etc), what would you suggest to do in this case? Give the git commit hash maybe?

A traditional approach is to attach a checksum file with all of the relevant packages, in the usual ${hash}sum output format (hashes truncated for HN-page readability):

    $ sha256sum *
    e406bcc...51c199a  riot-android-0.9.1.tar.gz
    8020cc6...d6126c1  riot-v1.2.1.tar.gz
    443b612...51e0cef  synapse-1.0.0.tar.gz
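
For anyone who would rather not depend on coreutils, here is a minimal Python sketch that writes the same "<hash>  <name>" layout as sha256sum and checks it roughly the way "sha256sum -c" does. The function names are hypothetical, and the parser assumes well-formed SHA-256 lines: 64 hex characters, a space, then a space or "*" (binary mode), then the file name.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Stream the file through SHA-256 so large tarballs are not loaded at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_lines(paths: list[Path]) -> str:
    """Same layout as sha256sum output: '<64 hex chars>  <file name>' per line."""
    return "".join(f"{sha256_file(p)}  {p.name}\n" for p in sorted(paths))


def verify(checksums: str, directory: Path) -> dict[str, bool]:
    """Recompute each listed file's hash and compare it to the recorded one."""
    results: dict[str, bool] = {}
    for line in checksums.splitlines():
        if not line.strip():
            continue
        expected, name = line[:64].lower(), line[66:]
        path = directory / name
        # A missing file counts as a failed check.
        results[name] = path.is_file() and sha256_file(path) == expected
    return results
```

Typical use would be checksum_lines(list(Path("release").glob("*.tar.gz"))) when publishing, and verify(published_text, Path("downloads")) when replicating.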

> Given the numerous build types (source, pip, debian packages, etc)

In the interest of making the investigation reproducible, it might be a good idea to include hashes for the specific packages being investigated.

> Give the git commit hash maybe?

That would probably work? This gets into the problem of reproducible builds, where builds from different environments might not be identical. This means documenting that you used "a build of version 1.2.1 git commit 7446799e4b0e3e65122f5642b5f3a8c59aae15bf" means something slightly different than saying you used "riot-v1.2.1.tar.gz with SHA256 8020cc617367a4318be090b1562a26571f1a3417b0d4a52b2d4f19e03d6126c1". That said, obviously having literally any hash to work from is much better than using version numbers alone.

Github links that include the commit hash might be useful, but it seems like you cannot link to both a tag and a hash? I wonder if github supports links that are a combination of https://github.com/vector-im/riot-web/releases/tag/v1.2.1 and https://github.com/vector-im/riot-web/commit/7446799e4b0e3e6... ?

After talking with the other contributors of the doc, we decided on not going into further details.

We acknowledge the need for reproducible investigation, but the document did not explain in a scientific manner how we reached such outcomes. We had to draw a line to keep the document on point with our message. Adding hashes wouldn't really make a significant difference.

We'll make sure to keep this in mind if we do write a follow-up with details on reproducible checks, though. Thank you for your insight!

This is a fantastic read. Thank you for investigating this and writing about it in such a clear way.

+1 for the thanks. I considered self-hosted Matrix for a professional community, but your research basically killed that path for me.

Much to improve.

Pretty much everything pointed out in the OP only applies if your setup is configured to use a 3rd party identity, integration server and notary (trusted_key) server. So if you are selfhosting and you want to avoid using 3rd party servers, don’t use them!

Agreed that we should do better at presenting a max-privacy config preset and explaining how identity/integ/notary servers work to users (without making the UX unusable), but to throw away the whole project over this is throwing out the baby with the bathwater, imo.

I believe you are missing the point of the document. It is not so much where the leaks go, but that they happen when there is no consent or knowledge of the transfer.

Example: for Scalar, the issue says that Riot talks "too much" to it. The research is not about how many times Riot talks to it. It is that Riot talks to it before the user explicitly requested the service, and in a way that the user does not expect.

As we wrote in the paper: "Privacy protection is a mindset". It is not about fixing individual issues and then have new ones pop up because the underlying problem is not fixed. It is about having a process in place so it cannot happen again.

I'm curious what you are going to use instead. I know Matrix is not perfect (and I would wait for some of the planned features to land), but the most commonly selected options have a lot more problems.

Thank you for your feedback! It is great to know if it was understandable or not, especially given its length.

Thanks for the review. I feel like I gave them too much trust :(

Matrix devs, instead of battling the reviewer here, please make a proper blog post and explain what is really going on here. Tell us the truth about your data handling and the data retention.

The reviewer did his own share of work. If there are mistaken parts in his reporting, please correctly explain them in a civilized way in a possible blog post.


apologies if it seems like we’re battling the reviewer; it’s just that there is a bunch of stuff which is simply incorrect, which is frustrating. did you see our pdf response, out of interest?


Yeah, the issue with all these replies is that the real replies that address the issues can get lost. I really recommend that you put that on your official blog so we can all benefit from reading your response.

yup, our plan is to fix the valid issues highlighted here and blog a response.

Or what about joining the room given on the research paper and actually participating in the discussion? We would love to have some proper interaction where we can exchange network output and screen recordings that simply show first hand what we are talking about...

Replying with a blog post to a living research document, one intended to be used by another project/protocol, feels silly. Whatever you write will certainly not be fully up to date anymore, given the corrections we made thanks to the Matrix community that joined the room and the Grid community that kept discussing the doc.

Where is Matrix.org tho? This could also be a good occasion for the three new guardians to join the community.

Uhhh the default configuration harvests your contacts database, wtf?

No, Riot/Mobile explicitly warns and prompts you to opt in if you try to discover contacts by email/phone number. It looks like this on Android:

"Riot needs permission to access your address book contacts to find other Matrix users based on their email and phone numbers. Please allow access on the next pop-up to discover address book users reachable from Riot."

That said, this analysis does have a few valid points in it, specifically:

* We should probably provide a click-thru when users interact with 3rd party identity lookup servers or integration managers

* We should hash contacts when doing bulk lookups

* Riot/Web has a bug where it talks to the integration manager too frequently (https://github.com/vector-im/riot-web/issues/5846)

* Notary servers should eventually be removed entirely (as per MSC1228).

However, most of the rest of it is alarmist and disproportionate FUD, plus the author has sadly forgotten to disclose that he's working on a hostile fork of Matrix. A point by point response is at https://matrix.org/~matthew/Response_to_-_Notes_on_privacy_a... fwiw (apologies for the PDF, but Google Docs doesn't seem to expose a read-only view of commented docs.)

> Please allow access on the next pop-up to discover address book users reachable from Riot.

Please don't say please to make people perform questionable privacy violations. How about:

"If you want Riot to determine which of your contacts also use Matrix and to easily enable you to talk to them via Riot, you can allow Riot to access your contact list.

Note: This will upload all your contacts' details as stored on your phone, including addresses, birthdays, notes, and more if available, to matrix.org. Here is the privacy policy."

Or something like that, whatever it really does.

Disclaimer: I like matrix :)))

> "Riot needs permission to access your address book contacts to find other Matrix users based on their email and phone numbers. Please allow access on the next pop-up to discover address book users reachable from Riot."

I know very well that concise wording in UIs is incredibly hard, but this wording is misleading. It makes it sound like Riot will not work unless I allow access to the address book. I'd suggest something like this:

> Riot can check your address book to find other Matrix users based on their email and phone numbers. If you agree to sharing your address book for this purpose, please allow access on the next pop-up.

Yeah, that's way better - thanks. have pushed the change at https://github.com/vector-im/riot-android/commit/786bf017852...

10 minutes from a hacker news comment to a github commit. Respect

That is just a text change, very superficial

Much much better. Thanks!

I definitely agree that your wording change is a big improvement.

Why can't Riot "check my address book to find other Matrix users" without sharing it to their server? Could the client make a one-way hash for each contact and send those to the server to compare against hashes of other contacts?

So in your wording, it's not clear that B (agree to sharing my whole address book) follows from A (Riot wants to check my address book for a purpose). Actually it's not clear if "agree to sharing your address book" means sharing it with the client app, or sharing it with a remote server that someone else controls.
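The hashing idea raised here (and in the "hash contacts when doing bulk lookups" item earlier in the thread) could look roughly like the minimal Python sketch below: the directory stores only hash-to-Matrix-ID entries, the client hashes its contacts locally, and only hashes cross the network. Everything here is a hypothetical illustration, not the Matrix identity service API: the normalisation rule, the "address pepper" input format and the class and function names are assumptions. It also shows the point made elsewhere in the thread that binding an address makes it a lookup key, not an entry in a public list. One honest caveat: phone numbers and guessable emails can be brute-forced by hashing candidates, so this reduces exposure rather than eliminating it.

```python
from __future__ import annotations

import hashlib


def normalise(address: str) -> str:
    # Hypothetical rule: trim and lower-case so that equivalent spellings
    # of the same email address produce the same hash.
    return address.strip().lower()


def lookup_hash(address: str, pepper: str) -> str:
    # The pepper is assumed to be published by the server, so precomputed
    # tables from other servers are useless against it.
    return hashlib.sha256(f"{normalise(address)} {pepper}".encode()).hexdigest()


class HashedDirectory:
    """Hypothetical identity-server side: stores hash -> Matrix ID only."""

    def __init__(self, pepper: str) -> None:
        self.pepper = pepper
        self._by_hash: dict[str, str] = {}

    def bind(self, address: str, mxid: str) -> None:
        # Binding makes the address usable as a lookup key; it is never listed.
        self._by_hash[lookup_hash(address, self.pepper)] = mxid

    def lookup(self, hashes: list[str]) -> dict[str, str]:
        return {h: self._by_hash[h] for h in hashes if h in self._by_hash}


def discover(contacts: list[str], directory: HashedDirectory) -> dict[str, str]:
    """Client side: send only hashes, then map matches back to local contacts."""
    by_hash = {lookup_hash(c, directory.pepper): c for c in contacts}
    matches = directory.lookup(list(by_hash))
    return {by_hash[h]: mxid for h, mxid in matches.items()}
```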

I would appreciate if we could stick to the points brought up in the notes instead of trying to discredit the work of several people (writers, reviewers, sanity checkers) who equally contributed to it.

The document is clear: it puts the default behaviour and its explanation next to what users understand from it and expect, which is also what section 2.1.1 of the Matrix.org privacy policy is based on. We have asked several technical and non-technical people alike, from our family members to our friends to people in our communities. And the feedback is unanimous: they did not understand or expect what we described.

In terms of actually handling the issues, the Scalar issue is one we brought up with Ben months ago in private, as per your disclosure policy, and yet nothing was done. This is just one example from a long list of issues brought up over the years.

The point of the document is not to find justification for what is happening, but to inform users that it is happening. An attacker got access to your systems which contained logs from which such data can be gathered. It is important that users who self-host and do not expect such data to get out realize that it does so they can take appropriate action.

The document might feel alarmist, certainly. It does not feel alarmist because we wrote it. It feels alarmist because the behaviour described is happening and nothing is done about it. It is not discussed anywhere. Attempts to do so are shut down. But it does not change anything: leaks are happening right now on thousands of servers and for millions of users (up to 9M, as per Matrix.org figure) and every person who we showed this to before publishing had the same reaction: "I never expected such data to go out like this. I am worried".

As for Grid, we made a specific effort out of respect for the Matrix.org people not to mention it or steer towards it. Yes we have forked Matrix. No it is not hostile, despite your continuous claims to label it as such.

We think it is time to stop talking about all the good reasons why, in the 5 years it took to get Matrix out of beta, there was just no time to deal with such leaks. We think it is time to start talking about how we can make sure they stop happening, and which decisions led to them going unnoticed for so long.

You wrote the software. Start respecting your users privacy.

And while some of your points are valid, a large part of your comments is actual FUD, because every other chat application out there behaves the same way (and for a large portion of the user base, Matrix is probably just yet another chat app...)?!?

Especially funny is your ongoing notion of metadata as private information which should be hidden: how do you intend to do that? Short of wrapping your application in Tor (which seriously impacts performance, making your average family member happily pass on it), I can't think of any method that doesn't involve BS-bingo (how about a blockchain...).

I agree that the vector.im identity service seems really unnecessary, and it reminds me of Mozilla's approach to sync (yeaah, the ones with your cited manifesto, cough); still, I was well aware that this means regularly contacting this server and probably also checking my contacts DB against it (as well as it having metadata about my browser, like every other website + its 23 ad networks, uuh). Also, for anyone interested in actually hosting a server, it's spelled out plainly that this is a convenience measure and that you can still host your own server. Btw: did you ever try to integrate federation into sydent (you might show the world your archived issue/PR...)?

The part about the integration server is indeed worrying (but why put it at the end?!?), because without it I don't really see the value proposition of Matrix compared to plain old XMPP (and I wonder how you intend to monetize kamax...). And I wasn't really aware of it...

The other parts

- I didn't give an email address, which wasn't a problem for me, and I seriously can't imagine any way to resolve this without the aforementioned BS-bingo or yet another piece of personal information (a private/public key, which is beyond most people and creates its own set of problems, e.g. people with unencrypted keys on their machines...).

- so the only way for matrix to read messages is by adding a bot? can the scalar.vector.im server initiate that too? otherwise your claim that vector.im can read all your messages is just BS

- you never mention that encryption by default would be cool. How will kamax.io handle this?

The document does not try to compare other chat applications with Riot. It does not try to say what is good or bad. We simply take how Riot is presented and understood by non-technical people, and see if its behaviour matches what they understood it does in terms of privacy. There was a mismatch, and people asked to know what was shared without their consent or understanding: we wrote it down.

We did the next best thing after improving sydent: we wrote our own implementation of an Identity server: mxisd. We linked it several times in the doc. You should give it a look. That's one example of how you can be better at privacy.

If the content of the document does not surprise you, and you were fully aware of all that was going on, it is also a win! Sadly, this is not our experience with the many users we came in contact with. They did not know, but wanted to know in detail.

Indeed, we do not mention that end-to-end encryption would be cool, because it would not change what is happening here. In Matrix, the encryption would only cover the content of the event, but not its metadata (sender, source, timestamp, etc.). The document is clear that the vast majority of the leaks are around metadata (who sent what, who did what, when, from where) and not data itself (the message itself).

This document only scratches the surface of privacy in Matrix, by being specific to Matrix.org and its choice of recommended software. It gets worse as we start investigating the protocol itself. It is your choice to see this as FUD. It does not make it less true, and while you might not care, some do. We published the document for those who care and do not have the means, time or capacity to do such a research themselves.

Oh yeah, I get that you are not comparing with anything else, because then your criticism wouldn't be on matrix but on how about 99% of the internet today works _by design_. With any protocol where you expect reliable, low-latency two-way communication, each party will know about the other; you won't work around that anytime soon. What you can work around is the number/mass of the party acting as your always-on messaging relay, possibly trading convenience for privacy. I agree that vector/riot is not looking too good on this, but I don't see how they're not being clear about it or working towards an eventual resolution of these problems. One can argue whether this should be the first priority or an afterthought, but I've grudgingly accepted the latter as standard today (just look at all those leaky SaaS apps on HN...).

For the perception/expectations of the average Joe regarding privacy/obscurity on the internet, I recommend reading the recurring threads on any platform whenever there is a new "scandal" centered on WhatsApp (Europe): half of the commenters will just tell you that they are going to use Telegram (yeah, the one where you don't know exactly who's behind it, and who think that encrypted group chat is too much of a hassle).

Regarding your comments that the protocol is broken, I'm really curious how you intend to tackle this. Why the hell are you using the very same protocol, which is driven by a body you claim is intransparent and non-cooperative? If all your allegations are true, you would have been better off rolling your own; your software won't stay compatible for long if you take your own writing seriously...

P.S.: Care to elaborate who "we" is? Your projects have a surprisingly low number of contributors (which hopefully changes now), so I can't really figure out why you are not just saying "I". I also don't know what's so bad about taking a stand in a civilized public discussion (if "we" decided to be anonymous)?

> Oh yeah I get that you are not comparing with anything else, because then your criticism wouldn't be on matrix but on how about 99% of the internet today work _by design_.

You went to quite some length to work in this irrelevant slight.

I don't know enough about the design/implementation and the overall context to add anything of technical value to conversation and I won't even try. I would, however, like to point out that both your 'tone' and 'demeanour' come across as incredibly hostile and unnecessarily personal.

Also, dismissing a possibly valid criticism or review of something because it doesn't present immediate solutions to the problems highlighted doesn't mean the criticism itself has no worth, and to discount it out of hand for this reason is folly.

The P.S. is unnecessarily personal, I won't edit it but hereby apologize for the harsh tone.

Other than that I suppose this is personal, since this seems to be the personal pet project of the author (and he constantly assumes a "we" as if multiple people were signing his rant).

And if you read my comments carefully, I never criticize the lack of solutions. My criticism rests on the fact that he starts with very clickbaity "facts" which are then elaborated assuming that the average Joe is running his own homeserver without knowing its trade-offs. He then happily mixes up problems of any non-Tor messenger (of which matrix.org Matrix is one...) with specific problems of running a Matrix homeserver with the recommended settings and using the vector.im phonebook (and apparently a bunch of these settings are bugs...).

As it stands, this is convoluted FUD. If he wanted to make a constructive contribution, he could have stated the problems upfront (the current working Matrix implementation relies on proprietary/centralized services) and then discussed every problem step by step (and advertised the very solutions he is trying to sell). The problems of matrix/vector/riot are as real as Mozilla pivoting to the Firefox Service Company and integrating a host of proprietary tools, but the problem deserved better than this rant...

iMessage does not upload your contacts. Don’t claim that uploading/traversing someone’s contact dB is necessary. You only ever need to do a directory lookup when someone messages another person.

BitMessage absolutely obfuscates the metadata FYI.

from wikipedia

> Bitmessage was conceived by software developer Jonathan Warren, who based its design on the decentralized digital currency, bitcoin.

So, looks like "blockchain"? Anyone else?

From the FAQ: https://bitmessage.org/wiki/FAQ

> On average it should take 8 minutes from the time you click the send button to the time you receive a response.

Whew. Privacy really costs.

>Riot/Mobile explicitly warns and prompts you to opt in if you try to discover contacts by email/phone number. It looks like this on Android

This is way better than the behavior of proprietary programs, but I'd much prefer if the program actively discouraged uploading private information. I can deny the permission myself, but there's not much I can do about other people's behavior, and with the way Riot currently works they are going to end up uploading my personal information.

A client that isn't even able to access contacts is one of my top wishes for Matrix right now. As soon as that appears I'm going to start recommending it to everyone over miniVector.

I am confused by the e-mail issue. The OP says that Riot says:

> If you don't specify an email address, you won't be able to reset your password. Are you sure?

In your response on pg. 4, you say:

> Commented [11]: Yup, this is the point of the service - to map email addresses and phone numbers to matrix IDs.

Is it possible to specify an e-mail address to be able to reset passwords without making this e-mail address public? Clearly, this should be the default setting if someone enters an e-mail address after the above prompt by Riot.

So email addresses are used for two purposes in Matrix: "administrative contact" for an account (for password reset), as per https://matrix.org/docs/spec/client_server/r0.5.0#adding-acc..., and for discovering users' mxids by email address (as per https://matrix.org/docs/spec/identity_service/r0.2.0#post-ma...).

At registration, if you specify an email address, Riot sets it both for password reset and for mxid discovery. You (and the OP) are right that this should be clearer - it boils down to the fact that we need to add UI to remind the user that they're using an identity server (with given terms of use) and to confirm this is what they want.

We could also split it into separate actions (one to set it for password reset, and one to use it for discovery), and indeed before Riot this is how it used to be (there was a checkbox in Matrix Console at registration to let the user choose whether to bind their email). This got lost in Riot because of concerns that it made the registration UX too noisy and complicated (especially with custom HS & IS URLs flying around the place), so it currently binds their email by default. I've just filed https://github.com/vector-im/riot-web/issues/10054 to track addressing this.
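A minimal sketch of keeping the two uses of an email address separate, as described above. `Account`, `IdentityServer` and `register` are hypothetical names, not Riot's or Synapse's actual code; the key design choice is that binding for discovery is an explicit opt-in that defaults to off, while the password-reset contact is stored either way.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Account:
    mxid: str
    # Administrative contact: used only by the homeserver for password reset.
    reset_email: Optional[str] = None


@dataclass
class IdentityServer:
    # Discovery bindings: 3PID -> Matrix ID, queryable by other users.
    bindings: Dict[str, str] = field(default_factory=dict)

    def bind(self, address: str, mxid: str) -> None:
        self.bindings[address] = mxid


def register(mxid: str, email: Optional[str], identity_server: IdentityServer,
             bind_for_discovery: bool = False) -> Account:
    """Hypothetical registration flow with the two uses of email kept apart.

    The email is always kept for password reset, but it is only bound on
    the identity server if the user explicitly opts in (default: off).
    """
    account = Account(mxid=mxid, reset_email=email)
    if email is not None and bind_for_discovery:
        identity_server.bind(email, mxid)
    return account


ids = IdentityServer()
carol = register("@carol:example.org", "carol@example.org", ids)
dave = register("@dave:example.org", "dave@example.org", ids, bind_for_discovery=True)
print(carol.reset_email)  # carol@example.org
print(ids.bindings)       # {'dave@example.org': '@dave:example.org'}
```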

You said "making this e-mail address public" in your question - it's worth noting that binding a 3PID does not publish it in a public list; instead, it means it can be used as a key to look up your MXID for users who already know your email address.
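A rough illustration of "lookup key, not public list", assuming a hypothetical `LookupOnlyDirectory` class rather than any real identity server implementation: the only read operation is a point query, so a caller has to already know an address to learn the Matrix ID behind it.

```python
from typing import Dict, Optional


class LookupOnlyDirectory:
    """Hypothetical 3PID -> MXID mapping that answers point queries only.

    There is deliberately no method that returns all bindings.
    """

    def __init__(self) -> None:
        self._bindings: Dict[str, str] = {}

    def bind(self, address: str, mxid: str) -> None:
        self._bindings[address] = mxid

    def lookup(self, address: str) -> Optional[str]:
        # Returns the bound MXID, or None if the address is not bound.
        return self._bindings.get(address)


directory = LookupOnlyDirectory()
directory.bind("dave@example.org", "@dave:example.org")
print(directory.lookup("dave@example.org"))  # @dave:example.org
print(directory.lookup("erin@example.org"))  # None
```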

In terms of the other valid points the analysis raises, I've also filed a bug to track hashing contact details when doing lookups (https://github.com/matrix-org/matrix-doc/issues/2130, although I could have sworn we had one already). The other two issues (Riot/Web talking to Scalar too much, and the desire to remove notary servers entirely) already have bugs: https://github.com/vector-im/riot-web/issues/5846 and https://github.com/matrix-org/matrix-doc/issues/1228 respectively.


> You said "making this e-mail address public" in your question - it's worth noting that binding a 3PID does not publish it in a public list; instead, it means it can be used as a key to look up your MXID for users who already know your email address.

The domain part of e-mail addresses is public anyway due to certificate transparency, meaning that an interested party would only have to enumerate the local part to find all e-mail addresses from a specific domain used by Matrix users. In this respect, the lookup answers the question "Does this address exist?" and as such makes it public.
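To illustrate why a point-query service still leaks existence, here is a sketch of enumerating local parts against an unauthenticated lookup, assuming the domain is already known (e.g. from certificate transparency, as the comment suggests). `enumerate_addresses` and the `bindings` dict are hypothetical; a real attacker would more likely try a dictionary of common names than brute-force every string.

```python
from itertools import product
from string import ascii_lowercase
from typing import Callable, Dict, List, Optional


def enumerate_addresses(domain: str, lookup: Callable[[str], Optional[str]],
                        max_len: int = 2) -> List[str]:
    """Try every lowercase local part up to max_len letters against a lookup.

    Each hit confirms that the address exists and is bound to a Matrix ID.
    """
    found = []
    for length in range(1, max_len + 1):
        for letters in product(ascii_lowercase, repeat=length):
            address = "".join(letters) + "@" + domain
            if lookup(address) is not None:
                found.append(address)
    return found


# Hypothetical bindings held by an identity server.
bindings: Dict[str, str] = {"jo@example.org": "@jo:example.org"}
print(enumerate_addresses("example.org", bindings.get))  # ['jo@example.org']
```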

To clarify: the paper does not claim a list with email addresses is made public or anything of the sort. Only that they can be queried without restriction or authentication.

Once again, it's not about brute-force listing. It's about knowing a 3PID from another source, like a dump of emails/phone numbers on the dark web, which can then be used to query for a mapped Matrix ID. Or simply an email given for another purpose to the same server.

It is all fun and games until you start correlating data sets with other public lists, as claudius correctly points out.

> "Riot needs permission to access your address book contacts to find other Matrix users based on their email and phone numbers. Please allow access on the next pop-up to discover address book users reachable from Riot."

access != upload

This is the same wording Facebook/LinkedIn/etc. used to get our contact lists.


> a hostile fork

What? Matrix is Free Software. There is no such thing as "hostile fork".

It's possible to have a fork where both sides of the fork remain compatible. A hostile fork will lose this property by one or both sides of the fork causing divergence on purpose (generally the incumbent side) to make it difficult for the other side (generally the parasitic side). This is certainly a real phenomenon, regardless of whether the software is free or proprietary.

> What? Matrix is Free Software. There is no such thing as "hostile fork".

Of course there is.

Does hashing really provide much extra privacy when looking up phone numbers or email addresses? Especially for phone numbers, the entropy is tiny; it is trivial to precompute the hashes of all phone numbers in most countries.

it doesn't provide much extra privacy, given that rainbow tables are trivial to compute, as you say, which is why we haven't prioritised this historically. Moxie wrote well about this at https://signal.org/blog/contact-discovery/.
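A quick illustration of how small that search space is, assuming a naive unsalted SHA-256 over the phone number string (a hypothetical scheme, not necessarily what Matrix would do). The example only covers 10,000 numbers sharing a made-up prefix, but the same approach extends to a whole national numbering range, which is the point made in the Signal post linked here.

```python
import hashlib
from typing import Dict


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_table(prefix: str, digits: int) -> Dict[str, str]:
    """Precompute hash -> phone number for every number with the given prefix."""
    table = {}
    for n in range(10 ** digits):
        number = f"{prefix}{n:0{digits}d}"
        table[sha256_hex(number)] = number
    return table


# Hypothetical prefix with a 4-digit suffix: only 10,000 candidates.
table = build_table("+1555000", 4)

leaked_hash = sha256_hex("+15550001234")  # what a "hashed" lookup would send
print(table[leaked_hash])                 # +15550001234
```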

however, it does provide some defence-in-depth against Identity Server inspecting the email & phone number details in plaintext, so we'll go sort it out as per https://github.com/matrix-org/matrix-doc/issues/2130
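A sketch of what defence-in-depth hashing could look like, assuming a hypothetical `hash_3pid` helper and a server-published pepper; this is an illustration, not necessarily the scheme the linked issue ends up specifying. It keeps plaintext addresses out of requests and logs, but, as noted above, the server knows the pepper and can still rebuild a table for low-entropy inputs like phone numbers.

```python
import base64
import hashlib


def hash_3pid(address: str, medium: str, pepper: str) -> str:
    """Hypothetical scheme: SHA-256 over "address medium pepper",
    encoded as unpadded URL-safe base64."""
    digest = hashlib.sha256(f"{address} {medium} {pepper}".encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


pepper = "example-pepper"  # hypothetical value published by the server
query = hash_3pid("alice@example.org", "email", pepper)
print(query)  # opaque string sent instead of the plaintext address

# The server knows the pepper, so low-entropy inputs remain reversible by it.
candidates = {hash_3pid(f"1555000{n:04d}", "msisdn", pepper): f"1555000{n:04d}"
              for n in range(10_000)}
print(candidates[hash_3pid("15550001234", "msisdn", pepper)])  # 15550001234
```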

We have responded to your claims that we are "being alarmist" and spreading "disproportionate FUD", and clarified that we did not forget to disclose that we are working on a fork, which is not hostile.


Forgive me if I'm mistaken, but:

* You could fully encrypt push notifications?

Push notifications these days don't contain any contents; they just tell the app to wake up and sync and display a local notification - so there's not even really anything to encrypt.
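A minimal sketch of the content-free push pattern described here, with hypothetical names (`build_push_payload`, `on_push`, `LOCAL_EVENTS`) rather than the real Matrix push gateway API. In this sketch the push provider only sees room and event identifiers, which is still metadata, while the message body comes from the client's own sync.

```python
import json
from typing import Dict


def build_push_payload(room_id: str, event_id: str) -> str:
    """Hypothetical "wake-up" push: identifiers only, no message body or sender."""
    return json.dumps({"room_id": room_id, "event_id": event_id})


# Hypothetical local store that the client fills by syncing with its homeserver.
LOCAL_EVENTS: Dict[str, str] = {"$evt1": "See you at 6?"}


def on_push(payload: str) -> str:
    data = json.loads(payload)
    # A real client would sync here; this sketch just reads the local store.
    body = LOCAL_EVENTS.get(data["event_id"], "New message")
    return f"Local notification: {body}"


payload = build_push_payload("!room:example.org", "$evt1")
print(payload)           # {"room_id": "!room:example.org", "event_id": "$evt1"}
print(on_push(payload))  # Local notification: See you at 6?
```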

The document has been updated based on feedback from across the community, including newly identified leaks and possible data correlations.

We encourage anyone who read the initial version to check its revision history for new content or to re-read the document.


Nice malware site
