We still use Redis for reading the activity streams and as an LRU cache for all sorts of data, but it is now populated, like all of our specialised slave-read systems (Elasticsearch, etc.), by replicating from the MySQL log.
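To make that flow concrete, here is a minimal sketch of a cache that is only ever written by replaying change events from a log. It is an illustration under assumptions, not our actual pipeline: the event shape (`op`, `table`, `id`, `row`), the `LRUCache` class and `apply_event` are all made up, and an in-memory `OrderedDict` stands in for Redis.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny in-memory LRU cache; a stand-in for Redis in this sketch."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)          # mark as most recently used
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)
        return self.items[key]

    def delete(self, key):
        self.items.pop(key, None)


def apply_event(cache, event):
    """Apply one (hypothetical) row-change event from the log to the cache."""
    key = (event["table"], event["id"])
    if event["op"] == "delete":
        cache.delete(key)
    else:  # insert or update: the log carries the full new row
        cache.put(key, event["row"])


# Hypothetical change events, in log order.
log = [
    {"op": "insert", "table": "events", "id": 1, "row": {"actor": "alice"}},
    {"op": "insert", "table": "events", "id": 2, "row": {"actor": "bob"}},
    {"op": "update", "table": "events", "id": 1, "row": {"actor": "alice2"}},
    {"op": "insert", "table": "events", "id": 3, "row": {"actor": "carol"}},
    {"op": "delete", "table": "events", "id": 1},
]

cache = LRUCache(capacity=2)
for event in log:
    apply_event(cache, event)
```

The point of the design is that the application never writes to the cache directly. MySQL stays the source of truth, and the cache can be rebuilt by replaying the log.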
Hope that helps!
For example, say you have 10 people in your organization with various permissions on repos. Some people (the CTO, let's say) can see every repo, while others might only be able to see some repos. Or you might have consultants, or open source projects that non-employees contribute to. Then you construct a graph where each node is a contributor connected to other contributors by the permissions they have on repos (or are the repos the nodes and the contributor permissions the connections?). Finally, you run a graph partitioning algorithm where the number of partitions is the number of unique timelines you have to write for an organization. Thinking about an organization with closer to 500 contributors, I can see how this could reduce the number of timelines by 30%.
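A rough sketch of how I picture it, using a simpler stand-in for graph partitioning: group contributors by the exact set of repos they can see, so everyone in a group can share one timeline. The organization data and the `group_timelines` helper are made up for illustration, and I don't know whether this matches the real implementation.

```python
from collections import defaultdict


def group_timelines(visible_repos):
    """Group contributors whose sets of visible repos are identical.

    Each group could share a single timeline, so the number of groups
    is the number of timelines that would have to be written.
    """
    groups = defaultdict(list)
    for contributor, repos in visible_repos.items():
        groups[frozenset(repos)].append(contributor)
    return dict(groups)


# Hypothetical organization: contributor -> repos they can see.
visible_repos = {
    "cto": {"api", "web", "infra"},
    "carol": {"api", "web", "infra"},
    "alice": {"api", "web"},
    "bob": {"api", "web"},
    "consultant": {"web"},
}

groups = group_timelines(visible_repos)
print(len(visible_repos), "contributors ->", len(groups), "timelines")
```

Here five contributors collapse into three timelines. How much you save in a real organization depends entirely on how many people share identical permissions.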
Even if Redis was a better fit for some of their use cases, not having an additional persistent database system to manage just makes things much easier.
Interesting. Is there a comparison of overall performance between the intermediate design (w/ Redis) and what they ended up with?
Out of curiosity: how many users do you have, and what measurements are you using to determine scalability?
Disclaimer: I work at GitHub.