
Announcing Citus MX: Scaling out Postgres to over 500k writes per second - craigkerstiens
https://www.citusdata.com/blog/2016/09/22/announcing-citus-mx/
======
ozgune
(Disclaimer: Ozgun from Citus)

I'm not a video person, but when I watched this one, I was quite impressed. I
feel this may be one of those rare occasions where the blog post title can't
quite capture everything in the video.

[https://youtu.be/JAPl4eNFxk4](https://youtu.be/JAPl4eNFxk4)

I think the really impressive part is what happens for you behind the covers.
MX sets up replication, auto-failover, backups, pgbouncer, a monitoring
dashboard, etc. for you at the click of a button.

~~~
merb
This product seems interesting, especially for a small company with high
demands (scale out). However, it sucks that you don't list any prices for your
cloud product on your page. I know that a lot of these prices will be
negotiated with your customers, but still, if I just want to use it in the
cloud via my AWS account it would be cool to have PPM shown somewhere,
something like 1€ + AWS node for a cluster of X. Having a price upfront for
cloud products is actually why my company is using the cloud. It's just simple
to get started without any sales person. Just click, deploy, automate.

~~~
craigkerstiens
We have a few links to our pricing calculator in various places, but we'll
work to do a better job of surfacing them. You can find the pricing at:
[https://console.citusdata.com/pricing](https://console.citusdata.com/pricing)

~~~
merb
One thing is missing: whether it excludes or includes machine prices (I just
wondered because I doubt it includes them, since I've never seen per-memory
pricing), but it's great.

Another thing that's missing is what happens if X goes down, but I guess
that's covered in your documentation. And there is no pricing for additional
redundancy. Just from the video and the price toggler it seems that the data
is stored twice (primary/secondary on HA), or am I wrong?

Edit: I forgot to say the price is amazing. Especially if everything works like
it does in your video. I've basically never seen a solution that scales PG that
easily, even outscales most other solutions, and can reshard/rebalance with a
query/click. Just amazing, really.

Edit2: Pricing is: Extremely cheap.

~~~
craigkerstiens
Thanks for the feedback. This is fully inclusive of machine prices, backups,
everything.

For redundancy there are two things. First, for data durability, we archive
all updates to S3 every 16 MB or 60 seconds, whichever comes first, using
WAL-E. Second, if you want higher availability you can enable that; in that
case, we run a streaming replica for every node in the cluster, and those
replicas receive updates as they happen. In the event of a failure (which we
continuously monitor for) we automatically fail you over to the standby.
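For context, this archiving behavior corresponds to standard PostgreSQL
continuous-archiving settings driven by WAL-E; a minimal sketch of what such a
configuration might look like (paths and the envdir layout are assumptions,
not Citus Cloud's actual setup):

```
# postgresql.conf (illustrative WAL-E archiving setup)
wal_level = replica        # enough WAL detail for archiving and streaming replicas
archive_mode = on
archive_timeout = 60       # force a WAL segment switch at least every 60 seconds
# WAL segments are 16 MB by default, which yields the
# "every 16 MB or 60 seconds, whichever comes first" behavior.
archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'
```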

~~~
koolba
> In the event of some failure (which we continuously monitor for) we
> automatically fail you over to the standby.

Is there a transparent load balancer in front of the servers to handle that or
would the master endpoint host change in the case of a failover? I noticed in
the video that the database URL connected to is a raw EC2 hostname. Can't tell
if it's a load balancer or the raw server though.

In either case I'm assuming any connections in a pool would have to be re-
opened (new TCP sockets), but updating the host for the DB is a bit more
involved (and would require an app reboot) vs. just allowing the connection
pool to auto-recover.

~~~
fdr
The network address of every node is kept the same forever, even as nodes are
replaced, via AWS Elastic IPs.

The only burden on the client application is to time out and reconnect.
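Since the endpoint addresses never change, client-side recovery can be as
simple as a connect-with-retry loop; here's a minimal Python sketch (the
helper name, backoff policy, and error handling are illustrative, not part of
any Citus client library):

```python
import time

def connect_with_retry(connect, attempts=5, base_delay=1.0):
    """Call `connect()` until it succeeds, backing off between tries.

    `connect` is any zero-argument callable that returns a live connection
    or raises on failure (e.g. a wrapper around psycopg2.connect).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except OSError as exc:  # connection refused, timed out, etc.
            last_error = exc
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise last_error
```

In practice `connect` would wrap something like `psycopg2.connect(...)` with a
connect timeout set, so a failover window shows up as a few failed attempts
followed by a successful reconnect to the same address.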

------
kod
I've been working on a PoC using Citus for the past month or so. Craig and the
devs there have been super responsive in making sure our technical needs get
met.

The core open source product already seems like the best sharded relational
database I'm aware of, and the attention they're putting into their cloud
product is really impressive (especially for the price).

------
marknadal
16K writes/second per node is very impressive (disclaimer: I work on competing
products), from my experience working on distributed database performance.
I've been watching various systems' performance specs, and this is quite good.

My only complaint is that most vendors report the SUM of writes/second across
all nodes, which I think is a bit unfair as title linkbait. Reporting
single-node writes/second is much more honest. Why? Because if they had spun
up 64 nodes rather than 32, or any arbitrary number, they could get whatever
number they want. That said, still very impressive.

~~~
wmfiv
Isn't linear scalability a hugely desirable feature? Especially for a SQL
RDBMS like Postgres?

~~~
marknadal
Very desirable, but linear scalability is often a constraint of the business
logic, not a fundamental property of the database.

For instance, if you are Twitter, where everything is an append-only insert,
then achieving linear scalability ought to be easier than if you are Google
Docs, where every update affects many other dependent records. Most benchmarks
(and this isn't a bad thing) assume the former.

------
rdtsc
> Citus has a built-in replication mechanism in which the leader node sends
> writes to all replicas. However, this requires that the leader is involved
> in every write to ensure linearizability. Fortunately, streaming replication
> provides us with an alternative that we use in Citus MX: Let PostgreSQL
> handle the replication, removing the need for a single leader.

So, how does Postgres handle it? That is, how does pushing that concern
further down the stack work? Does Postgres already have a component which
provides robust linearizability in a distributed context? Is it partition
tolerant as well?

~~~
fdr
Nothing that fancy: Citus has historically had its own replication method
where it writes N copies of the data, explicitly controlled by the coordinator
node. This imposes locking overhead and would have made a multi-machine
writing implementation much more difficult and slower, had one implemented the
necessary synchronization.

Citus MX does away with this and gets its redundancy through a series of
common-variety Postgres active/passive failover techniques. I plan to write a
blog post in the near-ish future on how this works in detail. With the
redundancy assumed, Citus merely has to bin a query to the correct single
machine servicing that sliver of data. After that, the concurrency
characteristics are exactly like regular Postgres.

What MX contributes is the metadata synchronization, allowing every node to
participate in originating a query or modification.
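The "bin a query to the correct single machine" step can be pictured as hash
partitioning on the distribution column. A toy sketch follows; the hash
function, shard count, and node names are all assumptions for illustration
(Citus actually uses Postgres's own hash functions and its shard metadata
tables, which MX synchronizes to every node):

```python
import zlib

SHARD_COUNT = 32

# Each shard owns a contiguous slice of the 32-bit hash space, and the
# metadata (synchronized to every MX node) maps shards to worker nodes.
# A made-up placement: shards spread round-robin over four workers.
shard_to_node = {shard: f"worker-{shard % 4}" for shard in range(SHARD_COUNT)}

def route(distribution_key: str) -> str:
    """Return the worker node responsible for this key's shard.

    crc32 stands in for Postgres's internal hash function here.
    """
    h = zlib.crc32(distribution_key.encode()) & 0xFFFFFFFF
    shard = h * SHARD_COUNT // (1 << 32)  # bin the hash into a shard interval
    return shard_to_node[shard]
```

Because the routing function is deterministic and the metadata lives on every
node, any node can originate a query and forward it straight to the right
worker, with no single coordinator in the write path.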

~~~
rdtsc
Ah ok. Thank you for explaining.

------
crudbug
Congratulations.

Q: Have you considered decoupling write workloads from read workloads?
Separate logical clusters for write and read roles?

The CQRS [0] model is really intuitive for web scale backends.

[0]
[http://martinfowler.com/bliki/CQRS.html](http://martinfowler.com/bliki/CQRS.html)

------
ap22213
A "32 node Citus Cloud cluster" sounds pretty expensive. I couldn't find
pricing info.

~~~
craigkerstiens
You can find pricing at
[https://console.citusdata.com/pricing](https://console.citusdata.com/pricing).

