The article mentions Linux as the underlying OS. I wonder what approach Ubicloud takes (if any) to have actual diversity in the software stack for the purpose of reliability and security. My assumption being that different OSes, while increasing the attack surface, also make it more likely that the whole fleet is not susceptible to the same software problem or vulnerability at roughly the same time. Just something I started pondering after seeing Hetzner, which is quite popular in BSD land.
There aren't many people/projects/companies that take this approach; so if they're not telling you about doing it, you can safely assume they aren't.
IMHO, it's a nice idea, but it at least doubles your system integration work, and the benefits are mostly hypothetical, unless you're willing and able to dynamically shift your infrastructure between OSes if one of them usually performs better but is susceptible to some DoS that's inbound today.
Because with schema-based sharding in Citus, the schemas are the sharding unit and the system can move them to new nodes being added to the cluster.
You can start with all your microservices sharing a single node, add nodes and have the storage layer distributed horizontally in a way transparent to the services themselves.
There no longer is an enterprise version; what runs on Azure is the exact same Citus that you can run yourself. We even invested in Patroni to make it easier for the community to self-host Citus in HA setups.
While we obviously want people using Citus on Azure, having Citus as a viable open-source choice is our path to achieving that. I wasn't part of the company when the acquisition happened so can't speak to that, but I can imagine how it could have made the sales story unclear at the time of the transition.
Personally, I would also like to add that the team is full of long-term open-source contributors. We contribute both to PostgreSQL and to projects around it (like pgbouncer). I understand and respect your reservations, but wanted to share my perspective.
Schemas are groupings of tables and other entities that can be defined within a database. You can think of them like namespaces in programming languages. You can have the same table definition, within the same database, defined multiple times (each in a different schema) and each holding different data.
By large and small we are referring to the amount of data each schema currently holds. Schemas can grow over time, and some may become very big while others remain small (storage-wise).
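To make the namespace idea concrete, here's a self-contained sketch. It uses SQLite's attached databases as a stand-in for Postgres schemas (in Postgres you'd use `CREATE SCHEMA tenant_a` and then `CREATE TABLE tenant_a.messages ...`); the tenant names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each attached database acts as a separate namespace ("schema").
conn.execute("ATTACH ':memory:' AS tenant_a")
conn.execute("ATTACH ':memory:' AS tenant_b")

# Identical table definition in each schema...
for schema in ("tenant_a", "tenant_b"):
    conn.execute(f"CREATE TABLE {schema}.messages (id INTEGER, body TEXT)")

# ...but each copy holds its own data.
conn.execute("INSERT INTO tenant_a.messages VALUES (1, 'hello from a')")
conn.execute("INSERT INTO tenant_b.messages VALUES (1, 'hello from b')")

print(conn.execute("SELECT body FROM tenant_a.messages").fetchone()[0])
print(conn.execute("SELECT body FROM tenant_b.messages").fetchone()[0])
```

Because each tenant's data lives entirely inside its own schema, a schema is a natural unit to pick up and move to another node, which is what schema-based sharding exploits.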
> You can have the same table definition, within the same database, defined multiple times (each in a different schema) and each holding different data.
So in this respect, each table within a schema indeed already acts like a "shard" of the overall table.
Is this enforced? Like, if I create a table "messages" in schema A and a table "messages" in schema B, must they have the same columns/column types, or is that just convention?
Hi Nathanba, I am a technical program manager for the Citus extension at Microsoft.
> Citus also exists for Postgres but their docs basically tell you that it's only recommended for analytics
Could you point me at the docs that made you think that? We find Citus very good at multi-tenant SaaS apps (OLTP), IoT (HTAP) workloads and analytics (OLAP).
> And it sounds like all it's doing is a basic master-slave postgres setup with quite a few manual things you have to do to even benefit (manually altering tables to make them sharded/partitioned)
This is true for now. We are looking into ways to make onboarding easier. That said, the time spent on defining a good sharding model for your data often leads to very good perf characteristics. Regarding the architecture, I personally find Citus closer in spirit/design to what Vitess is doing. Additionally, every node in the cluster is able to take both writes and reads, so I don't see the parallel to a basic primary/secondary setup.
I'm looking at this page: https://www.citusdata.com/use-cases
"Multi-Tenant SaaS" to me very clearly sounds like there are features specific towards isolating the database into different customer domains ("tenants"). Which to me implies that this is sort of a scaling trick that isn't going to provide performance for a regular application where I can't partition my data like that.
> But with all that, as Joe Armstrong once jeered, no amount of type checking would catch the following bogus code:
It is worth noting that the stated problem, of accessing a file after it is closed, was addressed in Joe Armstrong's PhD thesis [1], in which he suggested a solution for exactly this problem by introducing a new testing & development methodology he named "protocol checkers" (9.1 Protocols, page 195). Relevant quote from the thesis:
> Given a protocol which is specified in a manner similar to the above it is possible to write a simple “protocol checking” program which can be placed between any pair of processes.
The protocol is a state machine, and a checker driven by it would detect the attempt to write to a file after it was closed.
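A minimal sketch of such a checker in Python (the states, message names, and class are my own illustration, not Armstrong's notation): the checker sits between the two processes and rejects any message not allowed in the current state.

```python
class ProtocolError(Exception):
    pass

class FileProtocolChecker:
    """Checks each message between two processes against a state
    machine describing the allowed file-access protocol."""
    # state -> set of messages allowed in that state
    ALLOWED = {
        "closed": {"open"},
        "opened": {"read", "write", "close"},
    }
    # messages that cause a state transition
    TRANSITIONS = {"open": "opened", "close": "closed"}

    def __init__(self):
        self.state = "closed"

    def send(self, msg):
        if msg not in self.ALLOWED[self.state]:
            raise ProtocolError(f"{msg!r} not allowed in state {self.state!r}")
        self.state = self.TRANSITIONS.get(msg, self.state)

checker = FileProtocolChecker()
checker.send("open")
checker.send("write")
checker.send("close")
try:
    checker.send("write")   # write after close: protocol violation
except ProtocolError as e:
    print("caught:", e)
```

The type system never sees this error because both calls are well-typed; the violation only exists in the ordering of messages, which is exactly what the runtime state machine catches.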
Developers are defined as people with commit access to the project, so contributing patches by itself is not enough to be added to this specific file.
- Meetings are remote-first. If at least one person is remote, then the whole team video calls in. There is nothing worse than being the person on the virtual end trying to make out voices from a room full of people passing a mic around.
- Schedule pair programming sessions, especially with less senior staff. Without the office environment they have fewer opportunities to grow.
- All meetings should have a note listing all decisions made.
- Promote async communication. Have people describe a problem instead of saying hi and waiting for a reply.
you should check out https://www.nohello.com/
Some folks at my workplace have been using this to explain their point of view to people who just send "Hi"/"Hello".
These "slack farts" really irk me. The first time someone does it, I usually respond with the nohello.com link and a brief note letting them know that the niceties aren't necessary - you can just ask your question.
I stopped using multiple browsers for exactly that reason. It actually increases your attack surface. While compartmentalization across browsers helps somewhat against tracking, you now have to keep all of them updated. An RCE bug in any one of them compromises everything you have on your machine, so having multiple browsers just means the attacker can pick from a wider array of targets.
I meant the attack surface of browsing sessions. Yes, I agree, having multiple browsers increases the attack surface of the device, which is why I try to minimize the number of apps installed. But browsers are an exception.