
> issue

To be clear, it got solved, and it happened several months ago.

> the Python/Twisted impl is very heavyweight (but getting better)

Last time I turned it on, it was constantly near-saturating the rpi2's CPU (70%+ CPU time) while doing effectively nothing. This is why I'm thinking I'll wait for the golang implementation before I try again.

> Right now the protocol is relatively good

Couple questions:

- If I delete my server database, start anew, and join a room I used to be in, will I end up in a bad state, or are things more robust these days? I remember reading about it somewhere.

- Does the server _still_ resolve a bunch of names and open hundreds of connections per server event interval (or whatever it was called)?




> If I delete my server database, start anew, and join a room I used to be in, will I end up in a bad state, or are things more robust these days? I remember reading about it somewhere.

It should end up (eventually) in a consistent state, but it can take a while to sync up again.

> Does the server _still_ resolve a bunch of names and open hundreds of connections per server event interval (or whatever it was called)?

It's still full mesh, so whenever you send a message in a room, your server has to make an HTTPS hit to every other server which is participating in that room. In a massive room like Matrix HQ, this could mean 800 hits or so. It shouldn't do DNS every time, and it shouldn't open a new connection every time, but I haven't checked the connection pooling recently; hopefully it hasn't regressed.
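For illustration, the fan-out is roughly shaped like this (just a Python sketch, not Synapse's actual code; the auth headers and exact transaction body are omitted, and the pooled requests session stands in for whatever connection pooling Synapse really does):

    # Rough sketch of the full-mesh fan-out described above.
    import requests

    session = requests.Session()  # pooled connections, so DNS/TCP aren't redone per hit

    def send_event_to_room(event, txn_id, remote_servers):
        # One outbound HTTPS request per remote server in the room.
        # In a huge room like Matrix HQ this loop is ~800 iterations.
        for server in remote_servers:
            session.put(
                f"https://{server}/_matrix/federation/v1/send/{txn_id}",
                json={"pdus": [event]},  # simplified transaction body
                timeout=10,
            )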


> whenever you send a message in a room

Just to be clear, I saw this behavior without sending anything to any room. Just by being in the room.

If the server interval setting was 5 seconds, it'd literally open hundreds of connections every 5 seconds.


There is no such server interval setting, and there never has been? I can only assume that this was the retry schedule doing exponential backoff, trying to contact servers that are down. Currently we don't have the concept of shared retry hints (as deciding whose hints to trust would be hard), so every server has to work out for itself which servers in the mesh are available. After about 10 minutes it calms down.
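To make that concrete, it's the sort of per-destination backoff schedule sketched below (a toy example; the intervals and cap are made up, not Synapse's actual numbers). Each server keeps its own table, since retry hints aren't shared:

    import time

    retry_interval = {}   # destination -> seconds to wait before retrying
    next_attempt_at = {}  # destination -> earliest time we may try again

    def should_try(destination):
        return time.time() >= next_attempt_at.get(destination, 0)

    def record_failure(destination):
        # Double the wait on each consecutive failure, up to a cap.
        interval = min(retry_interval.get(destination, 1) * 2, 600)
        retry_interval[destination] = interval
        next_attempt_at[destination] = time.time() + interval

    def record_success(destination):
        retry_interval.pop(destination, None)
        next_attempt_at.pop(destination, None)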


homeserver.yaml:

    # The federation window size in milliseconds
    #federation_rc_window_size: 1000
    federation_rc_window_size: 60000

That's how high it needs to be set, apparently.


That doesn't control how aggressively the server connects out to other servers, though; it limits how rapidly the server processes inbound requests. Upping the window from 1s to 60s means that it will only process 10 requests from a given server in a 60s window (rather than a 1s window) before deliberately falling behind. Interesting that changing it helped your problem; not sure how to interpret that.
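Behaviour-wise it's a per-origin window limiter, something like the sketch below (illustrative only, not Synapse's code; the sleep limit of 10 comes from the comment above, and the variable names are made up):

    import time
    from collections import defaultdict, deque

    WINDOW_SIZE_MS = 60000   # federation_rc_window_size from the config above
    SLEEP_LIMIT = 10         # requests per window before we start delaying

    recent_requests = defaultdict(deque)  # origin server -> request timestamps (ms)

    def should_delay(origin):
        now_ms = time.time() * 1000
        window = recent_requests[origin]
        # Drop timestamps that have fallen out of the current window.
        while window and window[0] < now_ms - WINDOW_SIZE_MS:
            window.popleft()
        window.append(now_ms)
        # Once more than SLEEP_LIMIT requests land inside one window,
        # processing is deliberately delayed ("falling behind").
        return len(window) > SLEEP_LIMIT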



