Citus 6.2: Concurrent index creation and complex queries for multi-tenant dbs (citusdata.com)
87 points by sgrove on June 7, 2017 | 16 comments

This is a pretty impressive set of new features and fixes (i.e. things that worked in Postgres, but not with CitusDB).

In particular, the added ability to run `CREATE INDEX CONCURRENTLY` [1] for distributed tables is an important addition. I've run a pretty big Postgres database before, and this feature is absolutely _critical_ for bringing new indexes online for large tables without affecting users or other operations (without it, `CREATE INDEX` needs a lock that blocks other modifications in the table).

Its reverse, `DROP INDEX CONCURRENTLY`, is a relatively new addition to Postgres and also pretty key. While running on pre-9.2 versions (it was added in 9.2), we had to stop dropping indexes from large tables because the operation would block long enough that we'd start timing out user requests while it was running.
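
For anyone who hasn't used them, here is a rough sketch of what the two statements look like in plain Postgres. The table and index names (`events`, `tenant_id`, `idx_events_tenant_id`) are made up for illustration, and I'm assuming the same syntax carries over to distributed tables as the release notes suggest.

```sql
-- Hypothetical table and index names, for illustration only.

-- Build the index without taking a lock that blocks writes to the table.
-- This cannot be run inside a transaction block, and a failed build
-- can leave behind an INVALID index that has to be dropped manually.
CREATE INDEX CONCURRENTLY idx_events_tenant_id ON events (tenant_id);

-- Drop the index without blocking concurrent reads and writes
-- on the table (available since PostgreSQL 9.2).
DROP INDEX CONCURRENTLY idx_events_tenant_id;
```

The trade-off is that the concurrent variants take longer than their blocking counterparts, since they have to wait out in-flight transactions.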

[1] https://www.postgresql.org/docs/current/static/sql-createind...

I wonder how they'll compete with Cockroach down the road. Looks like both projects are doing great work around ACID in multi-tenant environments.

I'm really inclined to say that Citus is better suited for the vast majority of companies, since You Are Not Google [1]. For example, forcing everyone to synchronously replicate using Raft all the time seems like overkill, and incurs a huge latency cost.

Furthermore, I think Citus has a head start on being a mature project, since it builds upon the back of the already-mature Postgres for things like query planning.

[1] https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb

This is key: SERIAL on distribution column: we now use serial types exactly as in Postgres


Just a quick comment on this:

> Good software never stops evolving

I think I would translate this as: "our software is so large in scope, it must never stop evolving if it is to remain good enough."

Small software, with a single, focused goal, can evolve to a steady state which is still good.

Often we need to build things whose scope is too large for that, and must constantly churn. That's ok. But ideally you are finding parts of your work which can be sent in the opposite direction: towards God; or the oneness of all things.

If none of your work ever seems to find a path towards simplicity and oneness, it's possible you are pathologically mixing concerns, which, if true, is probably slowing you down.

That's kinda vague philosophizing about an actual project. It seems pretentious and not constructive.

Do you have a concrete criticism of Citus? Do you believe a database system can become your definition of "God", or are large database systems simply not necessary and could be replaced by a few tiny God-like pieces? If so, why do large database systems exist at all?

I have no criticism of Citus at all. Seems like a great project.

I don't think systems of large scope can necessarily be replaced by small scope ones. Just that the large project is probably suffering if smaller, more stable parts aren't sloughing off of it as part of your development process.

Citus people are putting SQL back into NoSQL! :) Awesome.

SQL has actually been back in NoSQL for a few years now.

Spark SQL, Presto, Vora for example allow you to write ANSI SQL and query NoSQL stores such as Cassandra, HBase, MongoDB etc.

And if you are after something SQL-like, then you have Phoenix, Hive, CQL etc.

Do you guys have pricing anywhere on the site?

Not them, but here is the link for their cloud offering https://www.citusdata.com/pricing

For other enterprise solutions, you would need to contact them first.

Hmm, 200K/month for 20 HA nodes with 20TB and 4TB RAM. A bit pricey, and 20TB is kinda low for max disk. A single PG instance on a hefty box can do 20TB.

It comes with more than just the database. The package includes, but is not limited to:

  1. All enterprise features.
  2. All nodes being managed by Citus themselves. All the upgrades or rebalancing will be done by them.

Another option, which should be cheaper, is paying for the Enterprise license and managing it on your own hardware with your own setup.

Finally, you could also spin up the community version, which is free but lacks important enterprise features such as rebalancing the shards when you add more nodes to the setup.

Thank you for all the info. Our use case: we are running a setup with a dedicated PG RDS instance for each client. Some of them are pushing the RDS limit of 6TB, and we are potentially looking to move to a multi-tenant setup, as managing 100+ instances is not fun. The read load is fairly light; it's rare that more than a few people are running queries against an instance. Would it be possible to consolidate using Citus? Are there any advantages as far as ease of management, etc.?

Craig from Citus here. If disk is your biggest bottleneck, we do have some flexibility there and can go larger than 1 TB per node if needed. Or it may be that Citus isn't the best tool for you.

In general what we find is most Citus users are either memory constrained or constrained on processing power.

Offtopic, but the mention of elegant lightsabers reminds me of a scene from one of my favourite TV series, StarGate SG-1:


Money quote at 02:25:

    [O'Neill demonstrating a staff weapon]
    This is a weapon of terror. It's made to... intimidate the enemy.
    [O'Neill shows a P-90]
    ... is a weapon of _war_. It's made to _kill_ your enemy.

I.e. effective beats elegant :).
