
Notes on privacy and data collection of Matrix.org - maxidorius
https://gist.github.com/maxidorius/5736fd09c9194b7a6dc03b6b8d7220d0
======
pmlnr
XMPP self hoster here.

I've been revisiting moving to self hosted Matrix around every 4 months now
for 2 years, and every single time I failed.

The reasons vary; initially synapse refused to work, then I got stuck trying
to set up a multi-domain service.

That said, this document confirms what I feared in the background: what Matrix
offers as self-hosted is too simple to be true, and thus it's no surprise I
never got it completely running.

XMPP has its own issues, but when I self host it, it's there, nowhere else.
No identity servers, no push servers, no jitsi servers in the background.

It seems like I'm going to be with XMPP for a much longer time.

~~~
phicoh
My experience is that synapse is not harder than ejabberd (I don't have
experience with other XMPP servers)

Matrix tries to do a lot more than XMPP. In my experience, people find XMPP
too limiting, so they don't use it.

~~~
pmlnr
> Matrix tries to do a lot more than XMPP

It's not that simple. Many people don't know about OMEMO, Jingle, etc. when it
comes to XMPP, or about XMPP bridges like biboumi.

Matrix is doing the same thing, but on different, and way more
complicated, infrastructure ideas.

Prosody is definitely much simpler to configure, even with multiple domains,
especially when per-domain settings are needed.

~~~
phicoh
The problem with omemo, jingle, etc. is that you have to make sure that the XMPP
components you use support them. It would be fine if all popular
implementations supported those features, but that's not the case.

So you can't just say, let's use XMPP. You have to be very specific and make
sure people use the right versions.

You can say that Prosody is easy. I don't find the following list easy:
[https://prosody.im/doc/modules](https://prosody.im/doc/modules)

And you probably need 'Component "conference.example.org" "muc"' for any kind
of 'room' support.

The next question: does Prosody have the equivalent of federated rooms in
Matrix? Here is a list of XMPP extensions in the documentation:
[https://www.prosody.im/doc/xeplist](https://www.prosody.im/doc/xeplist)

I guess the answer is, there are no federated rooms in prosody.

Another question is whether it is possible to send someone an XMPP message
when that person is offline. I have no idea how to search for that.

~~~
Boulth
No federated rooms. Offline messages are supported.

I guess newer Prosody supports this and much more out of the box but the
generic configuration instructions I used are here:
[https://serverfault.com/questions/835635/what-prosody-module...](https://serverfault.com/questions/835635/what-prosody-modules-do-i-need-to-support-conversations)

> The problem with omemo, jingle, etc. is that you have to make sure that the
> XMPP components you use support them. It would be fine if all popular
> implementations supported those features, but that's not the case.

Isn't this also the case with Matrix that no implementations except the
official ones support E2E encryption?

~~~
Arathorn
> Isn't this also the case with Matrix that no implementations except the
> official ones support E2E encryption?

This isn't true. There are independent working E2E implementations in weechat-
matrix and pantalaimon (python-nio), nheko (mtxclient), and even a read-only
one in purple-matrix. Meanwhile lots of independent apps build on the official
SDKs (e.g. Seaglass on macOS, bots like Matrix-Recorder, the various Riot
forks, etc)

~~~
Boulth
Glad to hear the update, thanks!

------
meruru
I was pretty disappointed when I learned one of my contacts on Matrix was
receiving my messages on his Gmail. That's a feature I would rather the client
not even have, so people don't accidentally enable it. I hope it won't take
too long for a more privacy-oriented client than Riot to appear. The best
thing right now is MiniVector, which is a stripped down Riot fork with fewer
permissions required:
[https://github.com/LiMium/mini-vector-android](https://github.com/LiMium/mini-vector-android)

~~~
stevenicr
Interesting concern. I'm thinking I'd like the various clients to alert me,
as the message sender, that my messages are being auto-forwarded to a
non-secure place like Gmail. Of course, I think the option to turn off
notifications should be possible on both ends.

~~~
meruru
That's a good idea. The important thing I think is to create a culture of
privacy-preservation around the whole Matrix ecosystem so that features like
that make it to the various clients.

------
pdkl95

        The following stack will be used as reference, with users
        connecting via web, desktop and smartphone clients:
    
            Client: Riot-web v1.2.1,
                    Riot Desktop v1.2.1,
                    Riot Android v0.9.1
    
            Server: Synapse v1.0.0
    

Version numbers are probably sufficient in a general scientific setting.
They are usually a precise reference to a specific piece of software; anyone
attempting to replicate the investigation should be able to find their own
copy of the software and have reasonable confidence their copy is identical.

Unfortunately, it might not be a good idea to trust that a version number
consistently maps to a specific URL, or that a server will give the same file
to everyone each time they ask for a URL. We know that sending different
versions to different people is common ("A/B testing"). If you're
investigating the security of something, or worse, you suspect you might have
sentient opponents actively trying to deceive you, then version numbers are no
longer sufficient: you should also include _cryptographic checksums_! The only
way you can know that the file you received is the same is if you have e.g.
SHA-2 hashes as proof. Even better, if it's important, include the RIPEMD-160,
SHA-1, CRC32, and any other available hash/checksum, because why not add
redundancy and give people options.
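
As a minimal sketch of the idea above (not taken from the document under discussion), the following Python computes several digests of one file in a single pass. The function name `file_digests` and the chunk size are illustrative assumptions; only standard-library `hashlib` and `zlib` calls are used. RIPEMD-160 is left out because its availability in `hashlib` depends on the local OpenSSL build.

```python
import hashlib
import zlib


def file_digests(path, chunk_size=65536):
    """Return SHA-256, SHA-1 and CRC32 of a file, read once in chunks.

    Hypothetical helper for illustration; the name is not from any project.
    """
    sha256 = hashlib.sha256()
    sha1 = hashlib.sha1()
    crc = 0
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large tarballs need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha256.update(chunk)
            sha1.update(chunk)
            crc = zlib.crc32(chunk, crc)
    return {
        "sha256": sha256.hexdigest(),
        "sha1": sha1.hexdigest(),
        "crc32": format(crc & 0xFFFFFFFF, "08x"),
    }
```

Publishing the resulting dictionary next to the version numbers would let a reader confirm they are looking at the same bytes. CRC32 only guards against accidental corruption, so it complements the cryptographic hashes rather than replacing them.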

~~~
maxidorius
Totally fair point, thank you for bringing it up. Given the numerous build
types (source, pip, debian packages, etc), what would you suggest to do in
this case? Give the git commit hash maybe?

~~~
pdkl95
A traditional approach is to attach a checksum file with all of the relevant
packages, in the usual ${hash}sum output format (hashes truncated for HN-page
readability):

    
    
        $ sha256sum *
        e406bcc...51c199a  riot-android-0.9.1.tar.gz
        8020cc6...d6126c1  riot-v1.2.1.tar.gz
        443b612...51e0cef  synapse-1.0.0.tar.gz
    

> Given the numerous build types (source, pip, debian packages, etc)

In the interest of making a reproducible investigation, it might be a good
idea to include hashes for the specific packages being investigated.

> Give the git commit hash maybe?

That would probably work? This gets into the problem of _reproducible builds_,
where builds from different environments might not be identical. This means
documenting that you used "a build of version 1.2.1 git commit
7446799e4b0e3e65122f5642b5f3a8c59aae15bf" means something slightly different
than saying you used "riot-v1.2.1.tar.gz with SHA256
8020cc617367a4318be090b1562a26571f1a3417b0d4a52b2d4f19e03d6126c1". That said,
obviously having literally any hash to work from is _much_ better than using
version numbers alone.
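
For the tarball case, here is a hedged sketch of checking downloaded files against a manifest in the usual `sha256sum` output format. It assumes the common `<hex digest>  <filename>` line layout; the function names `parse_manifest` and `verify` are hypothetical and exist only for this example.

```python
import hashlib
import os


def parse_manifest(text):
    """Parse `sha256sum`-style lines into a {filename: hex digest} dict."""
    entries = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, name = parts
        # sha256sum marks binary mode with a leading '*' on the file name.
        entries[name.strip().lstrip("*")] = digest.lower()
    return entries


def verify(manifest_text, directory="."):
    """Return {filename: True/False} for each manifest entry.

    A missing file counts as a failed check.
    """
    results = {}
    for name, expected in parse_manifest(manifest_text).items():
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            results[name] = False
            continue
        with open(path, "rb") as fh:
            actual = hashlib.sha256(fh.read()).hexdigest()
        results[name] = actual == expected
    return results
```

This only proves that a given archive matches the published digest. It says nothing about whether a fresh build from the same git commit would produce identical bytes, which is the reproducible-builds problem.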

Github links that include the commit hash might be useful, but it seems like
you cannot link to both a tag and a hash? I wonder if github supports links
that are a combination of
[https://github.com/vector-im/riot-web/releases/tag/v1.2.1](https://github.com/vector-im/riot-web/releases/tag/v1.2.1)
and
[https://github.com/vector-im/riot-web/commit/7446799e4b0e3e6...](https://github.com/vector-im/riot-web/commit/7446799e4b0e3e65122f5642b5f3a8c59aae15bf) ?

~~~
maxidorius
After talking with the other contributors of the doc, we decided on not going
into further details.

We acknowledge the need for reproducible investigation, but the document did
not explain in a scientific manner how we reached such outcomes. We had to
draw a line to keep the document on point with our message. Adding hashes
wouldn't really make a significant difference.

We'll make sure to keep this in mind if we do write a follow-up with details
on reproducible checks though. Thank you for your insight!

------
masterfooo
Thanks for the review. I feel like I gave them too much trust :(

Matrix devs, instead of battling the reviewer here, please make a proper blog
post and explain what is really going on here. Tell us the truth about your
data handling and the data retention.

The reviewer did his own share of work. If there are mistaken parts in his
reporting, please correctly explain them in a civilized way in a possible blog
post.

thanks

~~~
Arathorn
apologies if it seems like we’re battling the reviewer; it’s just that there
is a bunch of stuff which is simply incorrect, which is frustrating. did you
see our pdf response, out of interest?

~~~
masterfooo
Hi,

Yeah, the issue with all these replies is that the real replies that address
the issues can be lost. I really recommend that you put that on your official
blog so we all can benefit from reading your response.

~~~
Arathorn
yup, our plan is to fix the valid issues highlighted here and blog a response.

~~~
maxidorius
Or what about joining the room given on the research paper and actually
participating in the discussion? We would love to have some proper interaction
where we can exchange network output and screen recordings that simply show
first hand what we are talking about...

Replying in a blog post to a living research document intended to be used by
another project/protocol feels silly. Whatever you write will certainly not be
up to date anymore with the corrections we made thanks to the Matrix community
that joined the room, and the Grid community that kept on discussing the doc.

Where is Matrix.org tho? This could also be a good occasion for the three new
guardians to join the community.

------
nfoz
This is a fantastic read. Thank you for investigating this and writing about
it in such a clear way.

~~~
thenaturalist
+1 for the thanks. I considered self hosted Matrix for a professional
community, but your research basically killed that path for me.

Much to improve.

~~~
Arathorn
Pretty much everything pointed out in the OP _only applies_ if your setup is
configured to use a 3rd party identity, integration server and notary
(trusted_key) server. So if you are selfhosting and you want to avoid using
3rd party servers, don’t use them!

Agreed that we should do better at presenting a max-privacy config preset and
explaining how identity/integ/notary servers work to users (without making the
UX unusable), but to throw away the whole project over this is throwing out
the baby with the bathwater, imo.

~~~
maxidorius
I believe you are missing the point of the document. It is not so much where
the leaks go, but that they happen when there is no consent or knowledge of
the transfer.

Example: for Scalar, the issue says that Riot talks "too much" to it. The
research is not about how many times Riot talks to it. It is that Riot talks
to it before the user explicitly requested the service, and in a way that the
user does not expect.

As we wrote in the paper: "Privacy protection is a mindset". It is not about
fixing individual issues and then having new ones pop up because the underlying
problem is not fixed. It is about having a process in place so it cannot
happen again.

------
olliej
Uhhh the default configuration harvests your contacts database, wtf?

~~~
Arathorn
No, Riot/Mobile explicitly warns and prompts you to opt in if you try to
discover contacts by email/phone number. It looks like this on Android:

"Riot needs permission to access your address book contacts to find other
Matrix users based on their email and phone numbers. Please allow access on
the next pop-up to discover address book users reachable from Riot."

That said, this analysis does have a few valid points in it, specifically:

* We should probably provide a click-thru when users interact with 3rd party identity lookup servers or integration managers

* We should hash contacts when doing bulk lookups

* Riot/Web has a bug where it talks to the integration manager too frequently ([https://github.com/vector-im/riot-web/issues/5846](https://github.com/vector-im/riot-web/issues/5846))

* Notary servers should eventually be removed entirely (as per MSC1228).
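
The second point, hashing contacts before a bulk lookup, could look roughly like the sketch below. This is not Riot's or any identity server's actual API: the function name `hash_contacts`, the `pepper` parameter, the normalisation step and the choice of SHA-256 are all assumptions made for illustration. Email addresses and phone numbers come from a small, guessable space, so hashing alone only raises the cost of recovering them.

```python
import hashlib


def hash_contacts(contacts, pepper):
    """Map each (medium, address) pair to a peppered SHA-256 hex digest.

    `contacts` is an iterable of tuples such as ("email", "Alice@Example.org").
    `pepper` is a hypothetical server-supplied string mixed into every hash.
    """
    hashed = {}
    for medium, address in contacts:
        # Normalise so the same contact always yields the same digest.
        normalised = address.strip().lower()
        material = " ".join((normalised, medium, pepper)).encode("utf-8")
        hashed[(medium, address)] = hashlib.sha256(material).hexdigest()
    return hashed
```

The client would then send only the digests, and the server would compare them against digests of the identifiers it already knows, so the raw address book never leaves the device in this scheme.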

However, most of the rest of it is alarmist and disproportionate FUD, plus the
author has sadly forgotten to disclose that he's working on a hostile fork of
Matrix. A point by point response is at
[https://matrix.org/~matthew/Response_to_-_Notes_on_privacy_a...](https://matrix.org/~matthew/Response_to_-_Notes_on_privacy_and_data_collection_of_Matrix.pdf)
fwiw (apologies for the
PDF, but Google Docs doesn't seem to expose a read-only view of commented
docs.)

~~~
majewsky
> "Riot needs permission to access your address book contacts to find other
> Matrix users based on their email and phone numbers. Please allow access on
> the next pop-up to discover address book users reachable from Riot."

I know very well that concise wording in UIs is incredibly hard, but this
wording is misleading. It makes it sound like Riot will not work unless I
allow access to the address book. I'd suggest something like this:

> Riot can check your address book to find other Matrix users based on their
> email and phone numbers. If you agree to sharing your address book for this
> purpose, please allow access on the next pop-up.

~~~
Arathorn
Yeah, that's way better - thanks. have pushed the change at
[https://github.com/vector-im/riot-android/commit/786bf017852...](https://github.com/vector-im/riot-android/commit/786bf017852fad184bf3b7b2a0e51eb20b6a87b6)

~~~
dharma1
10 minutes from a hacker news comment to a github commit. Respect

~~~
masterfooo
That is just a text change, very superficial

------
maxidorius
The document has been updated from feedback received all over the community,
including new identified leaks and possible data correlations.

We encourage anyone who already read the initial version to check out the
revisions of it for new content or re-visit the document.

