The article mentions Linux as the underlying OS. I wonder what approach Ubicloud takes (if any) to have actual diversity in the software stack for the purpose of reliability and security. My assumption being that different OSes, while increasing the attack surface, also make it more likely that the whole fleet is not susceptible to the same software problem or vulnerability at roughly the same time. Just something I started pondering after seeing Hetzner, which is quite popular in BSD land.
There aren't many people/projects/companies that take this approach; so if they're not telling you about doing it, you can safely assume they aren't.
IMHO, it's a nice idea, but it at least doubles your system integration work, and the benefits are mostly hypothetical, unless you're willing and able to dynamically shift your infrastructure between OSes if one of them usually performs better but is susceptible to some DoS that's inbound today.
Because with schema-based sharding in Citus, the schemas are the sharding unit and the system can move them to new nodes being added to the cluster.
You can start with all your microservices sharing a single node, add nodes and have the storage layer distributed horizontally in a way transparent to the services themselves.
There no longer is an enterprise version; what runs on Azure is the exact same Citus that you can run yourself. We even invested in Patroni to make it easier for the community to self-host Citus in HA setups.
While we obviously want people using Citus on Azure, having Citus as a viable open-source choice is our path to achieving that. I wasn't part of the company when the acquisition happened so can't speak to that, but I can imagine how it could have made the sales story unclear at the time of the transition.
Personally, I would also like to add that the team is full of long-term open-source contributors. We contribute both to PostgreSQL and to projects around it (like pgbouncer). I understand and respect your reservations, but wanted to share my perspective.
Schemas are groupings of tables and other entities that can be defined within a database. You can think of them like namespaces in programming languages. You can have the same table definition, within the same database, defined multiple times (each in a different schema) and each holding different data.
By large and small we are referring to the amount of data each schema currently holds. Schemas can grow over time, and some may become very big while others remain small (storage-wise).
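To make the namespace idea concrete, here's a self-contained sketch. It uses SQLite's attached databases as a stand-in for Postgres schemas (in Postgres you'd use `CREATE SCHEMA tenant_a` and then `CREATE TABLE tenant_a.messages ...`); the tenant names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each attached database acts as a separate namespace ("schema").
conn.execute("ATTACH ':memory:' AS tenant_a")
conn.execute("ATTACH ':memory:' AS tenant_b")

# Identical table definition in each schema...
for schema in ("tenant_a", "tenant_b"):
    conn.execute(f"CREATE TABLE {schema}.messages (id INTEGER, body TEXT)")

# ...but each copy holds its own data.
conn.execute("INSERT INTO tenant_a.messages VALUES (1, 'hello from a')")
conn.execute("INSERT INTO tenant_b.messages VALUES (1, 'hello from b')")

print(conn.execute("SELECT body FROM tenant_a.messages").fetchone()[0])
print(conn.execute("SELECT body FROM tenant_b.messages").fetchone()[0])
```

Because each tenant's data lives entirely inside its own schema, a schema is a natural unit to pick up and move to another node, which is what schema-based sharding exploits.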
> You can have the same table definition, within the same database, defined multiple times (each in a different schema) and each holding different data.
So in this respect, each table within a schema indeed already acts like a "shard" of the overall table.
Is this enforced? Like, if I create a table "messages" in schema A and a table "messages" in schema B, must they have the same columns/column types, or is that just convention?
Hi Nathanba, I am a technical program manager for the Citus extension at Microsoft.
> Citus also exists for Postgres but their docs basically tell you that it's only recommended for analytics
Could you point me at the docs that made you think that? We find Citus very good at multi-tenant SaaS apps (OLTP), IoT (HTAP) workloads and analytics (OLAP).
> And it sounds like all it's doing is a basic master-slave postgres setup with quite a few manual things you have to do to even benefit (manually altering tables to make them sharded/partitioned)
This is true for now. We are looking into ways to make onboarding easier. That said, the time spent on defining a good sharding model for your data often leads to very good perf characteristics. Regarding the architecture, I personally find Citus closer in spirit/design to what Vitess is doing. Additionally, every node in the cluster is able to take both writes and reads, so I don't see the parallel to a basic primary/secondary setup.
I'm looking at this page: https://www.citusdata.com/use-cases
"Multi-Tenant SaaS" to me very clearly sounds like there are features specific towards isolating the database into different customer domains ("tenants"). Which to me implies that this is sort of a scaling trick that isn't going to provide performance for a regular application where I can't partition my data like that.
> But with all that, as Joe Armstrong once jeered, no amount of type checking would catch the following bogus code:
It is worth noting that the stated problem, of accessing a file after it is closed, was addressed in Joe Armstrong's PhD thesis [1], in which he suggested a solution for exactly this problem by introducing a new testing & development methodology he named "protocol checkers" (9.1 Protocols, page 195). Relevant quote from the thesis:
> Given a protocol which is specified in a manner similar to the above it is possible to write a simple “protocol checking” program which can be placed between any pair of processes.
The protocol is a state machine, and a checker driven by it would detect the attempt to write to a file after it was closed.
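A minimal sketch of such a checker in Python (the states, message names, and class are my own illustration, not Armstrong's notation): the checker sits between the two processes and rejects any message not allowed in the current state.

```python
class ProtocolError(Exception):
    pass

class FileProtocolChecker:
    """Checks each message between two processes against a state
    machine describing the allowed file-access protocol."""
    # state -> set of messages allowed in that state
    ALLOWED = {
        "closed": {"open"},
        "opened": {"read", "write", "close"},
    }
    # messages that cause a state transition
    TRANSITIONS = {"open": "opened", "close": "closed"}

    def __init__(self):
        self.state = "closed"

    def send(self, msg):
        if msg not in self.ALLOWED[self.state]:
            raise ProtocolError(f"{msg!r} not allowed in state {self.state!r}")
        self.state = self.TRANSITIONS.get(msg, self.state)

checker = FileProtocolChecker()
checker.send("open")
checker.send("write")
checker.send("close")
try:
    checker.send("write")   # write after close: protocol violation
except ProtocolError as e:
    print("caught:", e)
```

The type system never sees this error because both calls are well-typed; the violation only exists in the ordering of messages, which is exactly what the runtime state machine catches.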
Developers are defined as people with commit access to the project, so contributing patches by itself is not enough to be added to this specific file.
- Meetings are remote-first. If at least one person is remote, then the whole team video calls in. There is nothing worse than being the person on the virtual end trying to make out voices from a room full of people passing a mic around.
- Schedule pair programming sessions, especially with less senior staff. Without the office environment they have fewer opportunities to grow.
- All meetings should have a note listing all decisions made.
- Promote async communication. Have people describe a problem instead of saying hi and waiting for a reply.
you should check out https://www.nohello.com/
Some folks at my workplace have been using this to explain their point of view to people who just send "Hi"/"Hello".
These "slack farts" really irk me. The first time someone does it, I usually respond with the nohello.com link and a brief note letting them know that the niceties aren't necessary - you can just ask your question.
I stopped using multiple browsers for exactly that reason. It actually increases your attack surface. While compartmentalization across browsers helps somewhat against tracking, you now have to keep all of them updated. An RCE bug in any one of them compromises everything you have on your machine, so having multiple browsers just means the attacker can pick from a wider array of targets.
I meant the attack surface of browsing sessions. Yes, I agree, having multiple browsers increases the attack surface of the device, which is why I try to minimize the number of apps installed. But browsers are an exception.