Hacker News new | comments | show | ask | jobs | submit login
RethinkDB versus PostgreSQL: my personal experience (sagemath.com)
822 points by williamstein 225 days ago | hide | past | web | 327 comments | favorite

I appreciate the detailed analysis. A few comments:

> This post is probably going to make some people involved with RethinkDB very angry at me.

Actually, our community has always felt the opposite. Performance and scalability issues are considered bugs worth solving. That may have been the reaction of one or two community members, but that doesn't represent our values at all.

> A RethinkDB employee told me he thought I was their biggest user in terms of how hard I was pushing RethinkDB.

This may have been true (at the time) in terms of how SMC was using changefeeds, but RethinkDB is used in far more aggressive contexts. Here's a talk from Fidelity about how they used RethinkDB (for 25M customers across 25 nodes): https://www.youtube.com/watch?v=rm2zerSz6aE

SMC did seem to uncover a number of surprising bugs along the way: I would describe it as one of the more forward-thinking use cases that pushed the envelope of some of RethinkDB's newest features. This definitely came with lots of performance issues to solve along the way. I appreciate William’s tenacity and patience in helping us track down and fix these along the way.

> In particular, he pointed out this 2015 blog post, in which RethinkDB is consistently 5x-10x slower than MongoDB.

It’s worth pointing out that this particular blog post raised serious questions in its methodology, and recent versions of RethinkDB included very significant performance improvements: https://github.com/rethinkdb/rethinkdb/issues/4282

> Even then, the proxy nodes would often run at relatively high cpu usage. I never understood why.

I'd have to double-check with those who are far more familiar with RethinkDB's proxy mode, but it's because the nodes are parsing and processing queries as well, which can be CPU-intensive. They don't store any data, but if you use ReQL queries in a complex fashion (especially paired with changefeeds) it's going to require more CPU usage. We generally recommend that you run nodes with a lot of cores to take advantage of the parallelized architecture that RethinkDB has. This can get expensive if you aren't running dedicated hardware.

> The total disk space usage was an order of magnitude less (800GB versus 80GB).

RethinkDB doesn't yet have compression (https://github.com/rethinkdb/rethinkdb/issues/1396). Between this fact and running 1/3 the number of replicas, the reduced disk usage is not surprising.

> I imagine databases are similar. Using 10x more disk space means 10x more reading and writing to disk, and disk is (way more than) 10x slower than RAM…

This isn't necessarily true, especially with SSDs. RethinkDB's storage engine neatly divides its storage into extents that can be logically accessed in an efficient fashion. This is particularly valuable when running on SSDs, which are fundamentally parallelized devices. RethinkDB also caches data in memory as much as possible to avoid going to disk, but using more disk space doesn't immediately translate to lower performance.

One other interesting detail: since RethinkDB doesn’t have schemas, it stores the field names of each document individually. This is one of the trade-offs of not having a schema: even with compression, RethinkDB would use more space than Postgres for this reason. (This also impacts performance, since schemaless data is more complicated to parse and process.)

> Not listening to users is perhaps not the best approach to building quality software. [referring to microbenchmarks]

I think William may have misinterpreted the quote he describes from Slava’s post-mortem. Slava was referring to benchmarks that don’t affect the core performance of the database or production quality of the system, but may look better when you run micro-benchmarks: https://rethinkdb.com/blog/the-benchmark-youre-reading-is-pr...

We have always had an open development process on GitHub to collaboratively decide what features to build, and what their implementation should look like. I’m not certain what design choices William is suggesting we rejected. One has to only look at the proposal for dates and times in RethinkDB to see how this process and open conversation unfolds with our users: https://github.com/rethinkdb/rethinkdb/issues/977

> Really, what I love is the problems that RethinkDB solved, and where I believed RethinkDB could be 2-3 years from now if brilliant engineers like Daniel Mewes continued to work fulltime on the project.

RethinkDB development is proceeding after joining The Linux Foundation, despite the company shutdown. We believe that with a few years of work, RethinkDB will continue to mature as a database to reach Postgres’ level of stability and performance. We’re exploring options for funding dedicated developers long-term as an open-source project.

My thoughts: whatever technology you end up picking is going to have tradeoffs depending on your use case (and the maturity of the technology) and it's going to come with baggage. That's true of Postgres, MongoDB, RethinkDB, any programming language you choose, any tools you pick. If you're willing to carry that baggage it can be worth it: especially if it gives you developer velocity or if the problem you're solving is particularly well-suited to the tool.

Pick the technology that will have the least baggage for your problem. I often recommend Postgres to people, despite being one of the RethinkDB founders. Pragmatism wins over idealism, every time.

>It’s worth pointing out that this particular blog post raised serious questions in its methodology, and recent versions of RethinkDB included very significant performance improvements: https://github.com/rethinkdb/rethinkdb/issues/4282

I wouldn't even seriously consider that point - the article didn't even mention what version of MongoDB was being used. Safe mode writes could have been off, and he may have been just testing the latency between the client and database nodes. It's a pretty poor benchmark.

I think there are several useful points here:

1) It's rare to have enough insight into the internals of a particular datastore to accurately predict how it will perform on a particular workload. Whenever possible, early testing on production-scale workloads is essential for planning and proofs of concept.

2) Database capabilities are a moving target. E.g., the performance improvements to pgsql's LISTEN/NOTIFY are essential to its ability to handle this particular workload. In previous jobs, I've had coworkers cite failed experiences with 15-20 year-old databases as reasons for not considering them for new projects. Database tech has come a long way in that time.

3) Carefully-tuned RDBMSs are more capable than many tend to admit.

> Carefully-tuned RDBMSs are more capable than many tend to admit.

I'd generalize that further: RDBMSs are more capable than many tend to admit.

When it comes to a persistent data store, you've got to go out of your way to justify using something that isn't statically typed with firm ACID guarantees. I'm not saying those use cases don't exist, I'm saying most people don't have them.

"Saving" 15 minutes of dev time because you don't know what your schema is going to look like is going to cost you orders of magnitude more time down the road asking yourself that same question.

> "Saving" 15 minutes of dev time because you don't know what your schema is going to look like is going to cost you orders of magnitude more time down the road asking yourself that same question.

Yeah. In my experience, "schemaless" doesn't mean you have no schema. You still have a schema, it's just implicit and you don't have any tools available to actually operate on that schema.

This is pretty similar to the argument for static typing. Better to let the type system (help) prove correctness ahead of time, rather than wait until runtime to debug.

It's generally easier to fix an incorrect program than corrupt data.

You're not wrong, but the magnitude doesn't compare.

Yep. Everyone winds up with some war story about cleansing some multi-petabyte data store – but the better data engineers I know try very hard to avoid having two of them.

Yeah. I'm a huge fan of dynamically (yet strongly) typed languages like Ruby. They absolutely have their place. But weakly-typed data persistence or data interchange are absolutely terrible.

It's not just about type safety, though. A good RDBMS enforce things like referential integrity (FOREIGN KEY constraints) and allow you to express further constraints on your data (e.g. order.amount must be a positive number, a certain combination of columns must be unique across the table, etc.)

And most (all?) RDBMS that have been around for a while have been tuned and optimized to support this efficiently. I remember reading that at some point (1980s-1990s-ish) DBMS vendors were buying compiler developers like crazy to help them with query optimization and such.

Seriously, eventually the data have to be structure some how so you can make use of it.

Hence you end up writing map reduce down the road.

Mind as well do it upfront with Relational DB.

In almost all cases I've seen the schema is simply in the application layer as opposed to the database layer.

The argument for schemas within databases was when you had dozens of users and applications all connected to a database. In the last few years we've seen micro services, Kafka queues and REST APIs replace the database as the primary integration points.

Your schema might sit in code, but you've got to reinvent the wheel on all sorts of validations and guarantees your database would provide. Whether you have queues and APIs is irrelevant, your persistent store should be a strong one.

What sort of tooling does rethinkdb have for verifying the schema of existing data and inserts?

Soon, write hooks. Not the greatest solution though.

Worse is when the schema is written into the code. It turns tje codebase into a horrible mess.

I'd probably change this a bit by saying that it's not a problem to have the schema in the code (it's got to be somewhere, preferably version controlled), it's a problem when your schema reflects (or even worse, is) your business logic.

edit: s/you're/your/

Yes, exactly.

Schemaless design enables the organizations to collect data now and decide what to do later. Though analysts who work on these datasets should be able to write code to parse the records when needed.

> Though analysts who work on these datasets should be able to write code to parse the records when needed.

Right, so the data conforms to a known schema that they can program against. Schemaless design doesn't somehow make this possible, it actually makes it difficult. If you already know what data you're capturing, then you have a schema.

This seems to be a growing issue in the industry as we go more and more data driven.

I don't think most devs seem to realise that just throwing data into a data store, without constraints, integrity checks of any of that other boring stuff, actually makes the data useless.

Try doing analytics against data that can be anything. You can't. Mixing numbers with strings, duplicating fields or having them under varying naming strategies, all just mean you have lots of data that you cannot use.

One of the pet peeves I see data scientists having is lots of data, with no organisation, as they can't use it.

If you want to collect data now that's fine. Collect it in it's original format and label what that format is and where it came from. Then make a proper data store with a proper schema and work on importing the data from those sourcs when you have this foundation sorted out.

Schemaless imo is just kicking the can down the road.

You always know what kind of data it is anyway. You don't just dump random data, at least not the majority of the time. When you grab data you're grabbing a specific data group.

It's important to keep in mind that the "collect now and analyze later" is exactly that. You don't have to worry about constraints and integrity, you just capture what you think is of value, which is likely to be all the things unless you're working at really large scales. For all intents and purposes, storing everything as text files is just as good for this task.

> You don't have to worry about constraints and integrity

Sure, you don't have to specify or enforce constraints with a schema either. A schema just means the data has a well-defined structure, it doesn't mean you have to enforce constraints of any kind, though that often helps a lot. Data with no structure at all is very likely useless.

It's not always straight forward. A lot of protocols use delta encoding. For example "insert BUY GBPUSD, price=1.1702, volume=100000". But yes, it's worth coming up with a schema on write if at all possible.

But then you can't query schemaless data without sophisticated parsing or absurd wildcard/regex matching of some sort, which obviously is highly error-prone, thus making schemaless data at best only probabilistically useful.

For delta-encoded models, it still seems straightforward to define a general schema via (operation, objectId, value, transactionId) tuples. 'value' can be a string if you want maximum flexibility, but at least this way you can reliably query the operations, entities involved and which changes happened together as a group.

>But then [it sucks]

Absolutely. I agree with you, I'm just saying it's not trivial to turn a delta encoded format into a nice schema.

I mean, I would advise against storing deltas as tuples and considering that the schema. If it's at all possible, try to rectangularize your format. If you're using a column store a lot of the redundant data from the denormalisation can be compressed away.

If the rectangularisation is lossy then you may need to archive the raw data. This is the hurdle where a lot of people stop and choose schema-on-read since they can't afford the storage requirements for an archive of raw data and a nice analytic format for querying.

Offsetting database layout decisions is not having better flexibility, it is having less clue about database layout, unfortunately.

Collecting data in some form and re/de-normalizing it in some useful fashion isn't exactly huge problem if you're not NetFlix or Google (and isn't huge problem even then).

It just requires certain amount of education.

The problem with standard'ish SQL+RDBMS and their schemas is that types are not enforced well in SQL and most libraries and language integrations, thus we end up with the quagmire of static schemas that can not be reliably type checked when you use standard tooling.

Also lacking in many DB systems is integrated support for tables withh heterogenous schemas that is supported by page/row-level cersioning and/or on line schema alteration. Having to rewrite a huge table only because you want to narrow a couple field for future data, it gets old quickly.

> The problem with standard'ish SQL+RDBMS and their schemas is that types are not enforced well in SQL and most libraries and language integrations, thus we end up with the quagmire of static schemas that can not be reliably type checked when you use standard tooling.

This used to be an issue with MySQL (and not really any other major RDBMS implementation), but with STRICT_MODE and the default settings on other implementations, SQL is fantastic at representing and enforcing static types. Furthermore, ORMs and tools like Apache Spark have really upped the integration between types in the database and types in languages. Practically speaking, RDBMS is really the only way you can have sane static typing in most applications (esp. ones that are built on dynamic languages).

> Also lacking in many DB systems is integrated support for tables withh heterogenous schemas that is supported by page/row-level cersioning and/or on line schema alteration. Having to rewrite a huge table only because you want to narrow a couple field for future data, it gets old quickly.

This is just not true. All of the major commercial RDBMS systems support both online schema alteration AND flexible/semi-structured types (JSON, XML). Again, MySQL and Postgres are behind the curve on this (although they both now support JSON). Commercial systems like SQL Server, Oracle, and MemSQL all have both.

I'm not sure if I read your comment properly. But you said SQL Server has both? Both JSON and XML?

Yes SQL Server has json, but it's not real json support. It's an ntext column with some json functions... support for XML is great but the sql Server team refuses to support json properly because "we invested in xml and no one uses it"

PostgreSQL on the other hand has a real json type for a Long time with query and index support. As well as xml. In this regard, json support. PostgreSQL is actually ahead of all rdbms...

PostgreSQL may be behind in some other areas but not here. PostgreSQL has some of the best support for online schema changes (transactional DDL for almost everything, lots of work put into reducing lock level of schema changes) and was the first to implement support for JSON.

When doing the rewrite, I was extremely surprised by how good PostgreSQL's JSON support is; there are a large number of functions for querying nested structures, and the indexing is very powerful. I mapped my data from RethinkDB documents to mostly relational data, but kept a couple of columns as JSONB for now, which saved time and massively simplified the rewrite. I remember trying to fully model similar data back in 2007, and it got overly complicated for no real benefit...

(I'm the author of the blog post.)

Please try to alter a schema of a live multibillion rows table before pretending that it works. Unless they changed how is implemented recently then such an operation can take hours. I love postgres for many reasons so i would like to be proven wrong.

Some schema changes can take prohibitively long on large tables, so you do have to be careful. But there are ways of achieving the same results that are safe for production database. Braintree did a decent writeup on it. https://www.braintreepayments.com/blog/safe-operations-for-h...

I have never worked with a database of that size but I have done major refactoring of tables with about half a billion rows. And while it can take hours for the schema changes to finish it is hours without any downtime (or just a couple of seconds of downtime during when the exclusive locks need to be acquired). It is tedious to have to wait that long (especially if one has 10 times the rows than what we did), but PostgreSQL can do almost any change online just fine.

What kind of schema changes do you think of that would require that much downtime? Even in complex cases that can be avoided with triggers or even rules.

Depends on the schema change. If you have a multibillion row table there are plenty of other things you need to pay close attention to as well, as you're probably all-too aware.

Hours is okay, if it works.

Now, if you could just tell this to the people who depend on the software and tell me that hours of down time is not a possibility.

Hours of work <> Hours of downtime

The bigger the database the more complex it will be to implement schema change while keeping track of transactions that happen while your are transitioning. However I see no reason why it shouldn't be feasible given proper amount of preparation. And if short downtime is a goal I don't see why a proper amount of time developing a solution wouldn't make it either.

And on top of that PostgreSQL community is implementing new way to solve problem with each release. I'm not in a huge live-data use case so I might be wrong, but upcoming logical replication look promising for such use cases https://www.depesz.com/2017/02/07/waiting-for-postgresql-10-....

> online schema alteration AND flexible/semi-structured types (JSON, XML).

> Again, MySQL and Postgres are behind the curve on this

You may want to remove PostgreSQL from that list, especially in 2017.

PostgreSQL has supported XML for years (long before it supported JSON). And online schema changes are not only possible, but even transaction safe, which is especially nice for migrations.

> types are not enforced well in SQL

I'm not sure what you mean. Sane RDBMS do enforce types strictly.

> language integrations, thus we end up with the quagmire of static schemas that can not be reliably type checked when you use standard tooling

Languages that don't have static types to begin with maybe. > tables withh heterogenous schemas that is supported by page/row-level cersioning

Can you give an example of what you mean?

> on line schema alteration.

> Having to rewrite a huge table only because you want to narrow a couple field for future data, it gets old quickly.

Postgres doesn't need to lock the whole table for many types of alterations. It won't rewrite the table for null default fields.

I've never had such issues with postgres personally. I know mysql is awful for schema migrations.

This problem was definitely there for certain SQL servers when inserting incorrect values causes the server the silently cast the data to the field definition type. MySQL had something similar 10 years ago. I think in 2017 most of the SQL servers handle types properly. Adding support for more advanced types like UUID or custom types is a great thing and I see the use of it in production every day.

I dont really see the case for heterogenous schemas, I usually create a view of two very similar tables by creating a union with null fields. This works well enough.

After a long time with RDBMSs I used nosql at one point for my product in anger and I found the work to keep the data consistent just too stressful.

I'm happy to be back on Postgres again (and have been for a couple of years). It protects me from my own human inadequacies.

RDBMs may not scale as well as let's say something like Cassandra, but as long as you don't need the scaling properties of Cassandra they make your life so much easier.

I have a feeling that premature scaling is a version of premature optimization. You probably don't have big data. And you probably don't need a massively scalable, distributed system. So, don't try to do it. Just don't.

"Those who refuse to study history are condemned to repeat it"

People like Jim Gray, Mike Stonebraker, et al. were/are not dummies.

"Those who ignore Codd are condemned to learn Codd, forcefully".- Codd's Law

Adapted to DBs.

The simple fact that I know the data I read from the datastore is consistent with my rules, is enough for me to stay on RDBMS. I've encountered enough bugs and failures to not trust a schema enforced in my data layer code.

Posts like this are annoying and completely stupid.

NoSQL databases encapsulates a wide variety of data stores including Key Value (Riak), BigTable (Cassandra), Document (MongoDB), Time Series (InfluxDB), In Memory Grid (Ignite) and dozens and dozens of others that blur the already grey lines. Many of which are strongly consistent (negating your consistency point). So your experience with one does not at all translate to other data stores.

The equivalent of what you're saying is "PHP sucks therefore all programming languages suck".

I think the point of the post is valid: it can be hard to work with a schemaless datastore. I think Cassandra is the only one you listed that supports type enforcements.

NoSQL != Schemaless.

That mantra hasn't been true for a number of years now. Especially since the big trend has been exposing SQL layers in front of NoSQL databases. For example Phoenix in front of HBase.

And SQL as well as all know requires data types to be usable.

Just to explain - I was going to write mongo originally but then decided to go more general. I evidentially went too general with my terminology. I've used mongo since the early days and I've also worked on projects that have used everything from neo4j and redis to oracle as the primary data store.

There are a bunch of things that made me nervous about mongo as the primary db. Data consistency was the main one and, like the author of the post - migrating away from mongo was hard almost entirely because of data inconsistency that had developed over the course of a couple of years.

Querying was also harder on an ad-hoc basis. With a relational db there are generally just a couple of recommended ways of modelling your data. But once you do that, it's easy to query after the fact. Or to add new stuff. With mongo I felt like I was constantly making a trade off that I would have to accept at querying time.

MongoDB has been able to do this (optionally) since v3.2.

Isn't it more like saying all loosely typed languages suck? Your analogy is what's really completely stupid.

I'd agree entirely. Unless you have an exceptional reason, in which case you probably know what you're doing already, build a version of your system which uses either a relational database (with a sensible schema, not a single-table EAV thing) or a filesystem as a backing store first.

(When it comes to a primary data store, its first job is integrity, so denormalization is premature optimization.)

Similarly; don't distribute any computation you don't have to distribute. Just getting a bigger machine is remarkably likely to be cheaper, more operable and more reliable.

> ... or a filesystem as a backing store first.

File systems are far harder to use correctly compared to pretty much any database; suggestion: only use them for storing large blobs, and use collision-free names (eg. random UUIDv4). Many applications don't need that though.

Sometimes you just have large blobs, though! (I'm thinking, for instance, of machine-learned models; big matrices are exactly the kind of thing we have HDF5 and friends for.)

> "Saving" 15 minutes of dev time because you don't know what your schema is going to look like is going to cost you orders of magnitude more time.

While I agree with the spirit of this, sometimes you'll never know what your schema is, because it's not under your control to begin with. That's why there are triple-stores and "document" databases and key-value stores now, and why there were deductive databases 30 years ago.

The world is wide, and there's room for more than one tool in the chest.

Postgres is of course an excellent document store with JSONB.

Its really powerful being able to create relational views backed onto data as json and vice versa.

Exactly. The arrival of JSONB in Postgres killed most of my interest in other nosql stores. It's the best of both worlds.

This assumes that JSON is what you want to store and query. In my experience, about 1 year ago, I couldn't get basic things out from the Puppet database.

Even those use cases can sometimes be met by an RDBMS. Postgres can outperform Mongo as a key value, or json doc store. But Mongo is what many people think of for something like that.

And for those who may not know, but appreciate irony, Mongo now uses ... wait for it ... a somewhat stripped down, key-value-ish, embedded ... RDBMS as its underlying data store. Yes friends, "it's a dessert topping AND a floor wax"

should also point out that Mongo is more than a jsonb store, and postgres is, at least currently, less than a distributed, "scale-out" database. Mongo supports problems that postgres cannot currently address. period. (and I'm a huge postgres fanboi)

Could you share some of those problems? (genuinely interested)

Horizontal scalability. Outgrowing a single master database (writes being bottlenecked, not reads) is really painful. You can engineer around it, but spinning up another master in a multimaster DB is a really quick and easy fix (once you have tooling built out it's a matter of minutes). Sharding your Postgres DB is not a quick and easy fix. It will likely take about a month to engineer around and perform the data migration, if not longer.

I've heard good things about Citus for horizontally scaling Postgres.

I've pretty much only heard about it in the context of OLAP. If your data model requires transactions, you're gonna have a hard time when you outgrow a database.

PostgreSQL outperforms MongoDB how ?

Because as someone who has tried to update JSON documents (1M/s) in both PostgreSQL and MongoDB I would strongly disagree. MongoDB with WiredTiger was at least 10x faster per node and in terms of pure updates one of the fastest databases I've ever seen. And before the trolls yes I was fsyncing to disk.

Postgres doesn't fancy many small and frequent updates (like rapid counter increments).

It is workable, though, via combination of batching, ensuyring indexes on hot fields have appropriate fillfactor, and making sure Postgres can use heap-only tuples.

Here's my old SO question on this and a well-comprehensive answer that's still actual and could be helpful for some cases: http://stackoverflow.com/a/1663434/116546

This is pretty amazing. Now, would you mind showing me a situation when the raw insert performance is the single dimension when we are making a technical decision?

Hint: you probably can't, the reason why you store data is that you can read it (query it) later. MongoDB could be 10000x faster with inserts, still not a considerable solution for us because it fails with many of the other aspect of providing a great data access layer.

Raw insert performance is the main driver for tick databases persisting high frequency market data for HFT systems. The TimesTen DB, later bought by Oracle, was built for this scenario. Arthur Whitney's KDB too.

Not to mention the sub second latecy for querying 1B rows.

Not the single dimension but an important one. Any real time application processing events data (IoT, heartbeats, prices logs) would have a similar requirement. That's one reason why people uses queues, to try to workaround this. And as for the reads, it will drilldown to your query needs. I have done systems where the read requirement was a retrieval from a key and the writes request came in thousands per second.

Absolutely, I have run data ingestion pipelines before for event data processing, perfectly suitable for the use cases you mentioned too. I just dont see mongodb as a good fit here.

I think it depends on the workload. PostgreSQL does not not support updating documents in place but is really fast at writing and reading. Never done any benchmarking myself, this is just based on my general knowledge of what data structures and algorithms both databases implement.

Not worth much. MongoDB is a completely different beast after version 3.0 with WiredTiger being the default.

But I am sure that there are use cases for which PostgreSQL is going to be much faster in particular single field lookups on unindexed fields across all records.

My point is that it's never black & white to say Database A is faster than Database B. It's always use case dependent.

well there is also https://www.techempower.com/benchmarks/#section=data-r13&hw=...

Point being when someone says something is faster without providing verifiable numbers... it should not really be taken seriously

> I'm not saying those use cases don't exist, I'm saying most people don't have them.

Though I agree with you there is the problem that RDBMSs can't handle hierarchical data very well. This comes up in enterprise master data quite often, eg. in employee relationships or organizational structures. I'm no SQL wiz but I have yet to see a readable and well-performing SQL which can select the managerial line of an employee.

A PostgreSQL (recursive) WITH query [1] can deal with that:

  CREATE TABLE employee (
         id int primary key,
         name text,
         manager int references employee(id)

  INSERT INTO employee(id, name, manager)
  VALUES (1, 'jane', null),
         (2, 'john', 1),
         (3, 'jake', 1),
         (4, 'jeff', null),
         (5, 'jessica', 3);

  WITH RECURSIVE t(manager, managed) AS (
       -- Direct managers
       SELECT manager, id
       FROM employee
       WHERE manager IS NOT NULL
       UNION ALL
       -- Indirect managers
       SELECT employee.manager, t.managed
       FROM t, employee
       WHERE t.manager = employee.id
         AND employee.manager IS NOT NULL)
          (SELECT name FROM employee WHERE id = t.managed),
          array_agg(employee.name) AS chain_of_command
  FROM t, employee
  WHERE t.manager = employee.id
  GROUP BY managed;
Results of the query:

    name   | chain_of_command
   jessica | {jake,jane}
   jake    | {jane}
   john    | {jane}
  (3 rows)
You can argue about the readability (syntax highlighting would help), but to my eyes it isn't that bad (if you know how WITH queries work). It's also more concise than the corresponding query would be in many a procedural language.

The efficiency should be pretty acceptable too (instead of having direct pointers O(1) to follow the manager, you follow them indirectly through an index O(log N)).

[1] https://www.postgresql.org/docs/9.6/static/queries-with.html

> I have yet to see a readable and well-performing SQL which can select the managerial line of an employee.

This is not a simple problem - I'm not even sure how to map it mathematically to set theory. SQL is based on set theory, and does so very well. What you describe is a graph problem, which simply falls outside of what SQL was designed for. Either a RDBMS needs to be extented to handle data in graph form and operate on them acording to graph theory, or you should use a DB suited for this problem.

Point is, ideally you want both. Especially master data is closely entangled with data which you have in tables (and rightly so). In the example above consider an order header table which will have a reference to an employee (the one working on the order or the one who created the order). The schema for orders is rightly relational but I'd need a join to the employee which therefore is stored in a relational table as well.

I don't see how that scenario is anything but relational, even for queries, or are you talking in extension of the previous post about the managerial line? If so I agree, but instead of "poluting" the RDBMS with graph related constructs, I think the option here would be save the data in the RDBMS, and either build a graph on demand, or maintain an in memory graph, for queries. "Everything but the kitchen sink" are rarely a good idea for tools, it has a tendency to make the cost of switching some module unacceptable.

Recursive CTE can be used for this kind of queries. Or you can use nested sets / intervals depending on your workload and the tradeoffs you're ready to make.

Also, Postgres Devs reignited the benefit of "schemaless" dbs, which supports JSON as a first class deeply queriable type.

I think it's important to distinguish two cases. If mainly one app is the storage client, then you usually have the schema in the app, no problem with schemaless (most early phase startups).

If you have many different server apps accessing the database, then everyone has a different view on the schema which leads to high coordination overhead.

I will say this can go the other way in the hands of someone who doesn't understand what they are doing.

I recently started work at a startup where the codebase is running PHP/MYSQL which in itself is ok, but the database was horribly designed.

To deal with the tracking of "events" they constantly receive from devices that vary a lot from device to device it uses a linker table for every field that can be stored aside from ID. Considering the data never changes once inserted, it's really seems like overkill.

Any meaningful query that goes beyond getting a single datapoint involves dozens of joins which just kills the database. If someone just made him use mongo for the events storage we would probably be in a better position to fix the issues we are dealing with.

Postgres hstore fills the key value gap quite well too.

But there are document databases that have schemas, like couchDB.

So then it boils down to the cap theorem, and ease of ad-hoc querying vs ease of replication/scaling out. Which is more interesting.

Or do people try and push "not being able to validate your data" as some kind of feature?

I've been using couchdb to sync local and remote copies of a single database for a medium sized product and it is really convenient for that use case.

> 3) Carefully-tuned RDBMSs are more capable than many tend to admit.

So important. I have, in the past, worked with many people who simply thought NoSQL was new and RDMS was old therefore ALWAYS use new. Trying to get someone to pick an RDMS solution in said environment was impossible. The times I've had to take a document store and make it store relational data are more than I'd like to admit.

I have a pet theory that any given system has a maximum cleverness budget. Make sure to spend it on the things which actually matter in your problem domain.

Interesting data products often have to spend a lot of that cleverness budget in the algorithmic layer – machine learning and statistics. So when I work on that kind of thing, I've learned to have a real appreciation for boring data storage. I'll even happily use MySQL; the ways in which MySQL can suck are very well-understood and that's a hugely underappreciated virtue.

This 'cleverness budget' is a fantastic term. Thank you!

I've never worked with postgres before my current project and I'm so impressed with how easy it is to get stuff done. I used Cassandra for a prototype (bad choice for my workload, not Cassandra's fault) and documentation, ease of tuning, tooling maintenance all seem to be so much easier. Of course, Cassandra has a host of problems to solve that postgres doesn't -- being meant to run on clusters and all. But you know what, I don't care. I also was heavily leaning new stuff before. But what can I say, if an RDBMS is what I need -- I probably should use one.

And yet when I see data engineering positions, particularly around here, it's all about Hadoop/HDFS, Spark, Cassandra, Mongo. I'm not saying there aren't use cases for those, just that it seems like they've become the default basket of options rather than starting with an RDBMS and figuring out why you shouldn't use that.

I feel like a dinosaur working with Teradata all these years, but it hasn't failed me. Sure we spend some extra time modeling, and then some extra when we need changes, but other than the cost, it's really been solid.

> but other than the cost

Why do you think people are moving to Hadoop ?

That cost that you glossed over is massive. Teradata is very, very expensive. It's great. It works. But it's expensive. Especially if you want to store large amounts of data e.g. networks or web traffic.

Fair enough. I've been in large companies for too long, it seems. We'll pinch pennies over things like coffee and continuing education, but we're still paying those Teradata and SAS bills.

RDBMS are more capable in multiple aspects; may it be speed, may it be functionality.

I often see developers doing all kinds of things in the application layer; enforcing constraints, manipulating data, even sorting and filtering. It's like some folks are feared of writing SQL or don't trust their database.

Maybe they got spoiled by ORM mappers that anything beyond standard CRUD may feel cumbersome to write.

> > I often see developers doing all kinds of things in the application layer; enforcing constraints, manipulating data, even sorting and filtering. It's like some folks are feared of writing SQL or don't trust their database.

> Maybe they got spoiled by ORM mappers that anything beyond standard CRUD may feel cumbersome to write.

A decent ORM[0] will use the constraints expressed in the application to reflect those as constraints in the DB schema as well.

Why the duplication, you ask? For the same reason that many of the same constraints are imposed in the front-end code: Responsiveness.

Validation that happens in the front-end avoids a request/response to the server just to give the user feedback that their password isn't long enough.

Relationships and constraints expressed in the application can avoid querying the database just to get the same kind of feedback.

This duplication may seem useless, pointless, repetitive, and redundant, but not all accesses to the web server will be via a JS-enabled client, for example anyone accessing the application via an API.

[0] eg. SQLAlchemy: http://docs.sqlalchemy.org/en/latest/core/constraints.html

I think you're right about number 3. I have a friend who likes to do as much as possible with pgsql and I'm often surprised at how well it performs for the things he uses it for.

We store settings based on a hierarchy in the database using JSONB. Basically:

global -> country -> account -> user

PostgreSQL doesn't support deep merging of json docs, so I just wrote an aggregate function using plv8 to merge them. (the result from the query always results in ~10 records which is then grouped and merged so perf is always fast)


This doesn't include the javascript used for merging cos I just wanted to show the stub.

This means we can just pass the args to postgresql and return the settings for any given user.

We used to attempt to run 6 queries and merge in C# code when we did this in SQL Server.

I've deliberately avoided deep arrays for settings, so my jsonb settings merge is simply:

    SELECT COALESCE(companies.options, '{}'::jsonb) || COALESCE(groups.options, '{}'::jsonb) || COALESCE(users.options, '{}'::jsonb)
      FROM users
      JOIN companies ON users.company_id = companies.id
      LEFT JOIN groups ON users.group_id = groups.id
      WHERE users.id = ?
Where I need a hierarchy for key names I'm just prefixing.

I sit right on the knife edge of tossing all the jsonb options and replacing with an old-school normalized EAV model.

I also had to write JSONB deep merging as part of this RethinkDB --> PostgreSQL rewrite. Here's the code I wrote https://github.com/sagemathinc/smc/blob/ce594ff0574ce781bf78... That code constructs a single complicated query; I wonder whether I should write it as a stored function in the database instead. (I'm the author of the blog post.)

And yet i hear day after day, don't put business logic in the db, it does not scale ... it's like people are stuck in 2005

At one dev position I had, the DBA wrote stored procedures for everything I could possibly want to do with the database, and that was the only way I interacted with it period. I wrote a small wrapper around these stored procedures and the records they returned, and used it exclusively.

It was a much more pleasant experience than having no stored procedures and a full blown ORM, which IMO inevitably leads to sillyness like doing inner joins at the UI level.

> and a full blown ORM, which IMO inevitably leads to sillyness like doing inner joins at the UI level.

What ORM was this that made you do inner joins in the UI layer?

This sounds a lot like what Stephen Feuerstein and Tom Kyte advocate for Oracle environments.

I'm not sure if it's fair to read your parent as putting business logic in the db: it may just use Postgres whenever a datastore is required and Postgres is not a bad fit.

Deciding where business/application logic should reside is an important decision, and treating PostgreSQL as an application server in addition to a database does require a different way of approaching the tool. Of course, this is no different than any of the other application architecture choices that need to be made.

I am not sure i understand your comment also :) My point was that people (read web devs) treat databases (postgres) as only data stores, put in data, get data out, and they are so much more. Sure not anything goes into the db, but if it's a bit of logic that relates only to the data (who can access it, who can change it and in what way), most likely it belongs in the database. PS: but let's not derail this thread with debates on where business logic should go :)

I think I agree with you that there are some things that belong in the database. For example, anything I can do in the database to ensure data integrity (without unduly affecting performance), I'll put there (data types, constraints, triggers, functions which encapsulate transactions). Similarly with access privileges.

My initial comment was in response to these statements:

I have a friend who likes to do as much as possible with pgsql and I'm often surprised at how well it performs for the things he uses it for.

I understood you to read this as putting business logic in the database (indicated by the And yet) in your comment:

And yet i hear day after day, don't put business logic in the db

I read the initial comment by 'fapjacks as meaning using Postgres whenever appropriate, as opposed to doing everything one could with Postgres in Postgres.

Not a big deal at all. I saw a potential misunderstanding and attempted to clarify.

PS: Of course, business logic should go where I put it, right? ;)

You are right in both cases, yes i read that comment like you described and yes that is probably not what fapjacks meant.

As for the beginning of your comment, i am 100% with you. The problem is that the cases you describe (integrity,access,reports on data) are most of the time what the entire application does (we are all just doing CRUD and all that ...) and instead seeing that for what it is (let's call it DataLogic) people are all to often calling it Business logic, and the moment they put that label on it their brain automatically (after years of conditioning) does not even begin to consider the database as the appropriate place for that type of logic.

My problem for putting business logic in the db is not so much about performance but readability, custom sql scripts are very hard to maintain...

If you do not like pgsql syntax i get that, you can always use python or something but why would you say custom sql scripts are hard to maintain?

Isn't it harder to create well designed abstractions for business logic in SQL compared to a full-fledged backend application server?

I know SQL has stored procedures and the like, but is it correct that SQL is limited in how it can express domain logic, compared to languages that backend application servers are generally written in?

The point i was making was more in the direction of logic in the database, not specifically in pure SQL, it can be PgSql, Python ... does not matter. For the tasks that database side code should be used for (data consistency/access rights/reports on data) i would say they can express the domain logic for those tasks quite nicely, even better then in other languages. For example i can say "grant select on invoices to accountant" or i can add a constraint on a table like "CHECK (price > discounted_price)" or "CHECK (price > 0)". I do that in one place and i no longer have to worry about checking this (simple example of) domain logic anywhere else.

Most of the time this is what people call domain logic (consistency/access/reports) although i would think a better name for it would be "data logic", heard that somewhere and in a lot of the cases this is almost everything the application does

> I know SQL has stored procedures and the like, but is it correct that SQL is limited in how it can express domain logic, compared to languages that backend application servers are generally written in?

Procedural extensions aren't, really. I mean you can literally just use Python (pg/python) or Javascript (plv8) to write postgres functions.

What are some resource one could use to learn 3.)? I know the very basics of postgres since I am a web dev, but I like to learn more about fine tunning rdbms for better performance.

The PostgreSQL manual [1] is an incredible resource. The first 100 or so pages give you a lay of the land. If you read those you will be lightyears ahead of the average web dev. After that you can dig into other chapters/topics that interest you.

[1] https://www.postgresql.org/docs/manuals/

I always appreciated that Postgresql documents their differences from standard SQL. When I want to know the standard way I often revisit their documents, in addition to those of the database I'm using.

Even though I've never sat down and read it, I can say with confidence that the Postgres manual is one of the most impressive manuals I've ever seen. They could have published the manual as 2 books, and I would have paid money to read them.

I highly recommend the Arch Wiki as well. It is the best of its kind, there is.

That too is 95% generic - https://wiki.archlinux.org/

They published it as 3 books. Or, a third party did.

The more you know! It looks like they published the v9.0 reference manual in late 2010.

In case anyone is curious, you can order them here:

* Volume 1A - http://www.network-theory.co.uk/postgresql9/vol1a/

* Volume 1B - http://www.network-theory.co.uk/postgresql9/vol1b/

* Volume 2 - http://www.network-theory.co.uk/postgresql9/vol2/

* Volume 3 - http://www.network-theory.co.uk/postgresql9/vol3/

Replying to save, this is very useful on first pass.

You can also click the "favorite" link to save a comment. Once "favorited", they're accessible via your profile page.

If you're looking to learn a mental model of the internals, a coworker has pointed me at http://www.interdb.jp/pg/ and it's pretty good.

That's the spirit. Don't think about tuning the config, stock AWS RDS instances will give you all the things you need 99% of the time

IMO the main advantage for RethinkDB is its HA story. Last time I had to manage a PostgreSQL cluster (2012-2013) its HA story was pretty bad. It was limited to a master-slave(s) setup with manual failover and manual cluster rebuilding all dependent on incomplete 3rd party tools. Has PostgreSQL improved on this? A quick googling leads me to believe it hasn't and I'd only even consider it again if it were managed by a 3rd party (eg. aws rds).

Excellent point! My counter-argument to that would be: How many startups/SMEs embracing Rethink really need full four or five nines availability?

I'd expect the costs of running that HA stuff is an order of magnitude higher than the costs of even a few days of downtime p.a.. And migrating from Postgres to something with HA once needed is probably easier than migrating to Postgres if costs are killing you.

Yes PG 9+ have greatly improved replication.

Yes AWS does it for you.

This was with 9.x using the replication system. It was certainly better than the old wal forwarrding systems, but it still was a giant PITA.

It surely has improved (replication and failover are much easier now, than they used to be), but still not anywhere close to "just give other node's address and we're good". And that BDR thingy for master-master stuff is still an unofficial fork (although, as I get it, it's from one of main Postgres contributors, so it's really close)

It has improved quite a bit since then (in 9.2 or 9.3 when you last used it), but it is still not all the way there. Especially rebuilding the cluster is much easier now.

If you want a managed solution RedShift supports PGSQL syntax and connectors now I think. That's a fully managed turn-key DB, but of course it locks you into AWS.

Not meaning to be an ass, but this is really bad advice.

Redshift is not supposed to be used like Postgres. Redshift is a data warehousing solution, with completely different tradeoffs, and accommodating completely different work loads than your average Postgres database. For example, you can't create indices on Redshift tables or relationships between them, the consistency story is completely different from Postgres and Redshift is optimized for bulk loads, not millions of discrete inserts/updates a second.

Amazon RDS does provide hosted Postgres, and that is what you want to use if you want managed Postgres. Or you can use Herokus's hosted Postgres (it does get expensive with size, though.)

Postgres and Redshift should not be considered equivalent. It's true that Redshift originated as a fork of Postgres 8, but Postgres has come a _long_ way since then and is quite different.

On top of that, Redshift is columnar database, which is an entirely different animal than vanilla Postgres (or any other DBMS).

Ref: http://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-...


Redshift isn't highly available (if a node goes down you have to wait for it to restore a new one, which takes hours) and only has a small subset of PostgreSQL features. It is really only intended for OLAP (analytics) processing.

Do I understand correctly that the author went from a distributed database to a single-master scenario? That's a valid tradeoff, but I'd clearly describe it as such.

My experiences with RethinkDB have been rather positive, but my load is nowhere near that of what the article describes. I agree that ReQL could be improved, I found that there are too many limitations in chaining once you start using it for more complex things.

But the two most important advantages remain for me:

* changefeeds (they work for me),

* a distributed database that I can run on multiple nodes.

I do agree that PostgreSQL is fantastic and that SQL is a fine tool. In my case the above points were the only reasons why I did not use PostgreSQL.

EDIT: after thinking about this for a while, I wonder if the RethinkDB changefeed scenario is doable with the tools in PostgreSQL: get initial user data, then get all subsequent changes to that data, with no race conditions. Many workloads seem to concentrate on twitter-like functionality, where the is no clear concept of a change stream beginning and races do not matter.

You say that you did not have such loads, so why did you need multiple nodes then? Why the complexity when databases like Postgres can do 1.5M queries per second on a single box? StackOverflow is running just fine on basically one big mssql box (the other is just a standby replica, if i remember correctly)

Two reasons:

* I expect a larger load in the future,

* I want multiple nodes not just for speed, but mostly for data replication.

* you are solving problems you don't have * you can have replicas with postgres also

even if you do end up getting the load you are hoping for (though i doubt you will have 1.5 million qps), reads are easily scalable with replicas, and writes are also possible with sharding or tools from CitusData

This is precisely the very painful lesson I learned - don't solve problems you don't have...

(Blog author)

He/she was also not having problems with RethinkDB, so why "solve" that by using Postgres?

Disclaimer: I'm also happy running RethinkDB in a cluster of 3. It has been rock solid stable for over 2 years.

I don't have of the top of my head other examples but i know PG is used by yandex.ru which is like the google of russia, and they migrated from oracle and they are very happy, so if it works for them i think it will work for you :)

Don't they use clickhouse instead?

they probably use a lot of tools for a lot of reasons but they do use postgres



It's much easier to write in a new database in the future, than it is to deal with greater complexity initially.

I find that in the first phases, I often don't know what kind of load is going to be where, so I simply use a relational database since they give excelent data security, and okayish performance for most things under low load.

I think the inverse is actually the case here. If you need changefeeds, use Rethink up front rather than write your own implementation. When you hit a problem with its performance, then think real hard about your domain and see if it's possible to replicate the safety guarantees it gives you on another DB.

Unless you have a workload that won't fit on a single machine, a distributed database isn't really an advantage over a single master with one or more appropriate replicas that are available for failover.

I strongly suspect that the author has ignored race conditions in his changefeeds implementation.

Atomic changefeeds was the thing that made me start using RethinkDB. I'd been excited about it before then but only started using it for production workloads once that feature shipped.

They'd be possible to implement in Postgres with some kind of monotonic counter that increases on every update of every row in the table. That would be pretty expensive though.

Atomic changefeeds were a big addition, but resumable changefeeds are still a big gap. Something like what Kafka has. You can emulate this somewhat with atomic changefeeds and updating documents to mark them as "processed".

> EDIT: after thinking about this for a while, I wonder if the RethinkDB changefeed scenario is doable with the tools in PostgreSQL: get initial user data, then get all subsequent changes to that data, with no race conditions.

It's easier if you do things in the other order, and can make assumptions about the structure of your data. In my implementation, I have to first start listening for changes, then do the initial query (relevant code: https://github.com/sagemathinc/smc/blob/ce594ff0574ce781bf78...).

As you say, race conditions aren't an issue for some applications. For me, (slightly simplifying) the main table that involves changefeeds is a table of (timestamp, patch) pairs, and for it, race conditions aren't an issue -- you get the data in whatever order you want and merge it on the client. I'm definitely making no claims to have implemented changefeeds for general PostgreSQL queries or general data.

I am concerned that there is an edge case with one of the tables where there is a race condition that causes trouble, in which case I'll have to explicitly change the schema in order to account for this. Somebody below writes "I strongly suspect that the author has ignored race conditions in his changefeeds implementation."; that's not exactly true, since I'm worried about them and try to structure my data to account for them.

>Everything is an order of magnitude more efficient using PostgreSQL than it was with RethinkDB.

A large part of the sales pitch of "NoSQL" was that traditional RDBMSs couldn't handle "webscale" loads, whatever that meant.

Yet somehow, we continue to see PostgreSQL beating Mongo, Rethink, and other trendy "NoSQL" upstarts at performance, one of the primary advantages they're supposed to have over it.

Let's be frank. The only reason "NoSQL" exists at all is 20-something hipster programmers being too lazy to learn SQL (let alone relational theory), and ageism--not just against older programmers, but against older technology itself, no matter how fast, powerful, stable, and well-tested it may be.

After all, PostgreSQL is "old," having its roots in the Berkeley Ingress project three decades ago. Clearly, something hacked together by a cadre of OSX-using, JSON-slinging hipster programmers MUST be better, right? Nevermind that "NoSQL" itself is technically even older, with "NoSQL" systems like IBM's IMS dating back the 1960s: https://en.wikipedia.org/wiki/IBM_Information_Management_Sys...

"NoSQL" has it's use case just like traditional DBs do. The problem is that most "NoSQL" provider try to sell you their DB for everything when a good old SQL DB might be better.

For example, I worked on a IoT project for an R&D company and Riak was definitely a better choice than a SQL database for our usecase : Easily distributed, handle large very large amount of small write and small amount of large read. It was also a necessity because we could generate several gigabyte per minutes and being able to add Riak node to add more storage space and bandwidth dynamically was very useful.

But yeah, SQL DBs are like a good toolbox : They can be used in most situation and will work fine. But "NoSQL" DBs are usually more of a dedicated tool for a specific use : It's better for this specific case but might not do well for other tasks.

Thanks... I came here to say this, and I actually said something similar yesterday and my comment got downvoted (which I couldn't care less), but to me the conclusion is, check your requirements and use the proper tool.

You might have a point about Postgres being faster than whatever NoSQL database, but there's really no need for all the name calling. Especially the "lazy" is very uncalled for in my opinion - there's just too much to learn, and too little time. If you're a 20-something (or whatever age, for that matter) and have to maintain the entire system of a single company, you're simply not going to be able to be a master of every part of the stack, or even know where you're making the wrong choices.

>Especially the "lazy" is very uncalled for in my opinion - there's just too much to learn, and too little time.

How is that a valid justification for reinventing the wheel, and poorly at that? In the end they're having to come around and learn SQL anyway, judging by all the Mongo hate and "why we switched to Postgres" posts I see here, so they gained nothing by avoiding it.

>or even know where you're making the wrong choices.

They could try actually listening to older developers for once and trusting established solutions like PostgreSQL instead of chasing after everything new and shinny that catches their eye.

> How is that a valid justification for reinventing the wheel, and poorly at that?

I'm not talking about the people reinventing the wheel, I'm talking about the people using the reinvented wheels and not realising that it's a reinvented wheel.

> try actually listening to older developers

This is harder than you make it sound. On the internet, it's hard to know who to listen to, and in real life, it can be hard to find older developers working on something relevant to you, and even then their skill levels differ. And then they should even want to bother with holding your hand while you're setting up your first database, making sure you don't botch up the performance by not doing things that come natural for someone who's been working with PostgreSQL for years and years.

If there's too much to learn and little time, you learn the stuff that computer science got right. Some abstractions are so good that you should prioritize learning them above any technology of the day:

- Relational algebra

- The network stack model (OSI or IP; it is irrelevant)

- Compiler theory

- Category theory

You'll never regret the time spent on these.

so if there isn't even enough time to learn practical but sub-optimal solutions, a dev should instead absorb books about IT, mathematics and programming theory?

great advice...unless any part of your equation includes receiving a paycheck.

Don't make it sound harder than it is. Each of these is 30-40 hours of work for undergraduate students. You don't have to quit your job and join a monastery to study compilers. It's interesting, challenging and fun for those of us into computing; and it pays back the time spent learning.

So now I have a list of things I should learn according to a random person on the internet, that no one in real-life ever gave me before. How am I to know that this list is the one? Why not every similar list that has ever been given to me?

And how am I going to develop the front-end of our new website after learning everything you named? There'll still be a lot left to learn before I can even hope to have produced something.


If people don't have time to learn these things, they should open up their culture and hire and advance the people who do.

Dilettantism is a kind of careerist land grab -- and the effect is to displace skilled people and deliver worse experiences to consumers.

> The only reason "NoSQL" exists at all is 20-something hipster programmers being too lazy to learn SQL

For me as a 20-something swe who claims to be somehow proficient in SQL and NoSQL, I find it way more difficult to get the NoSQL datamodel right. With traditional SQL, you can basically do whatever you want with a model that looks however it wants. But if I have a, eg Cassandra model and a new requirement pops up, I cannot just do some joining and satisfy that.

While there is some truth in this, I think you are missing the context.

NoSQL first became popular in 2008/09, when web traffic was exploding. In those days 500GB disks were the largest available and were very expensive in server form. CPUs were less powerful, and RAM was much smaller.

All this meant that many sites really were running into the limits a single server database. The most common solution back then was master slave MySQL replication, which had a whole set of problems of its own. Don't forget Postgres replication was pretty rudimentary, and back then MySQL really had a performance edge if you could compromise things like transactional integrity(!) - which many did.

Things like Hadoop solved the reliability issues with distributed MySQL, MongoDB (and CouchDB) tackled the horrible developer ergonomics and Redis tackled performance.

In that context they all made a lot of sense.

Now of course buying two big servers with a heap of RAM and storage and putting Postgres on them with replication is pretty easy.

RAID has been common in servers since much earlier than 2008. In the 2000 dot-com boom servers had RAID & often multiple CPUs.

(for the less well f unded, Linux software RAID has also been pretty good since ca. 2000)


But RAID had its own set of problems. Because RAID relies on multiple physical disks (often SAS for performance) there were serious limits on the amount you could get in. This was prior to SSDs being widespread of course.

Here's a review of a typical 1U server from 2008:

The X4150 can handle up to eight 2.5-inch SAS drives (mine has four 10K drives, 72GB each), 64GB of RAM (mine has 16GB), four Gigabit Ethernet interfaces, and three PCIe slots. This puts the X4150 ahead of the mainstream server pack, as the main contenders in this space generally offer a maximum of 32GB of RAM, and between four and six local disks.[1]

Assuming RAID, this had 144GB of storage, and even back then that was problematic. RAM was very small too, so keeping your DB in memory (or even a working set, or even the indexes) was difficult with a single server.

And of course, yes you could go to 2U or bigger. But back then server space was expensive.

Everything is a trade off of course. But my point is that in the context of 2008 hardware looking to do things differently made a lot of sense.

[1] http://www.pcworld.idg.com.au/review/sun_microsystems/sun_fi...

>Now of course buying two big servers with a heap of RAM and storage and putting Postgres on them with replication is pretty easy.

Is it really ? I really have no idea since I haven't played with PG in a couple of years but setting up clustering used to be a PITA with it so if it got better that sounds great. One of the impressive things I saw from RethinkDB is that you can set up sharing and replication with a few steps.

Yes, it is much better than it used to be. See https://wiki.postgresql.org/wiki/Replication,_Clustering,_an...

We did a hot-standby setup without much trouble at all.

(And also - thanks for making my point about why NoSQL happened for me)

Oh I don't disagree at all btw., I've been doing MS SQL server and frontend professional recently so I haven't been in the loop about OSS DB, glad to hear things improved.

I think the NoSQL popularity and appeal was largely due to the fit with the startup/rapid-iteration mentality. The lack of rigid schema promised faster product iteration with less friction compared to traditional RDBMS with strict schemas and typing.

Isn't the opposite true, though? SQL offers you a lot of flexibility, whereas NoSQL follows a rigid data model.

Oh indeed, most RDBMSs offer great flexibility. They just tend to require a lot of configuration work. My impression of the NoSQL wave was "we don't know and we don't care; just throw it all in there and let search sort it out."

In fact, that mentality originated with Google (and it's entirely a bad way of doing things). Managing, categorizing, classifying, etc. in non-trivial cases may be an effort of relative futility, constantly chasing a moving target.

A simple key/value or document store just lets you start storing before you may actually know what you're storing or what the patterns and structures are (or even what you might want to do with the data later). With RDBMS, your designs can change quite a lot based on these requirements or intended uses.

> Yet somehow, we continue to see PostgreSQL beating Mongo, Rethink, and other trendy "NoSQL" upstarts at performance, one of the primary advantages they're supposed to have over it.

Yeah, pg will beat the pants off of pretty much any NOSQL DB on a single node, but what happens when you need master-master replication of sharded data in each datacenter as well as between datacenters?

Sure, denormalize and enforce data integrity at the application layer, shard the data, use queues for data updates to ensure every data-center gets all the data (eventually), and so on.

Now you're essentially treating the RDBMS as an eventually consistent NOSQL data store, horizontal scaling within a DC is still going to be annoying once you can't scale the shards vertically anymore, interruptions of network connectivity (or just increased latency) between DCs create huge headaches, and the (right) NOSQL DBs will beat the pants off of it for an equivalent hardware or hosting budget.

That said, NOSQL datastores solve problems that your startup will only encounter once you achieve product-market fit and enter a hypergrowth phase. Your MVP does not need an eventually-consistent distributed NOSQL backend, dammit.

Heck, your MVP may not even need PostGres or MySQL either. Just use SQLite, back up your server, and redeploy on pg only when (if) you start getting some traction.

It's fair to point out that young people will lack experience and understanding that older people have but there is such a thing as ageism in the opposite direction -- against younger people -- and your post comes across as such.

I am closer to 30 than 20 but I was closer to 20 than 30 when I was first introduced to NoSQL.

I enjoyed using PostgreSQL prior to NoSQL and I continue to enjoy PostgreSQL.

What got me exited about NoSQL when I first heard about it had nothing to do with SQL being "old". If I thought that "oldness" was a disadvantage -- which I don't, because it's not -- I wouldn't be using an operating system that has it's roots in the 70's and my workflow would not have been command-line-centric.

What got me exited about NoSQL was:

- The prospect of serialisation and deserialisation of data without having to rely on an Object-Relational Mapper. The relational model is great for the right kind of data, but the reason we have ORMs is that we are trying to cram data into a model that doesn't quite fit it. This has worked to varying degrees of success or failure. When NoSQL was introduced, PostgreSQL did not yet have native support for JSON, and having fields hold stringified JSON meant that you wouldn't be able to query that data using SQL, so having stringified JSON in a relational DB is not optimal. PostgreSQL has since gotten native support for JSON but here I am talking about when NoSQL was introduced.

- Map-Reduce. If it's a good fit for Google, I should investigate it to understand how to use it and to see if it had a use for me. Even if it would turn out to not bring any advantages, I would have learned it and I'd know to re-investigate it in the future if I needed something like it in order to scale a project that got a lot of traffic.

- Document-oriented instead of tables makes it possible to model data without having a strict schema.

Certainly someone with more experience might have been able to tell why one or more of these were actually not such a good reason, but MY POINT IS that I had actual reasons for being excited about NoSQL and none of my reasons had anything to do with hipsterism, ageism or anti-oldness to do. This is why I think your post is ageist, because rather than consider that we might have well-thought-out reasons to want to learn about NoSQL, you automatically jumped to the simple, overly broad, and frankly outright offensive, explanation that you did.

A lot of that is true, but to be sure there are use cases for NoSQL--they are simply too often misunderstood or abused, being used when a traditional RDBMS would have worked fine. Elastic/horizontal scaling for certain types of data that are amenable to it (i.e. event logs, feeds, append only type stuff): its going to be hard to keep up with something like Cassandra there, where you can pretty much add servers on the fly to pick up load. In memory stuff for sure, for a distributed volatile in-memory key store. Simple flexibility you get going schema-less: if you have an embedded DB/on-premise application, you can't just get into your customer's environment and perform a migration; having a flexible schema can make patch updates easier.

> Mongo

MongoDB doesn't scale. Short story, only one node in a replication group can accept read/write.

You're not going anywhere if you compare SQL databases to the worst of the NoSQL database that ever existed.

Go try actual databases NoSQL that doesn't suck: cassandra, riak, elasticsearch, dynamodb, redshift, bigquery.

One important exception about NoSQL's raison d'être. If all you need is a key-value store without any fancy requirements then NoSQL^W pre-relational databases are perfectly good choices.

(Well, *dbm and memcachedb et al sorta pre-date that NoSQL-is-webscale database boom you're writing about.)

Not every web app is webscale. From all the evidence I saw SMC is tiny in webscale terms - 80GB of data or 600 open connections are not what NoSQL databases were meant to solve.

Using this anicdote to say that postgres can handle webscale is ludicrous.

My biggest problem with SQL for systems with web interfaces is that it forces you onto OLTP, when your workflow is just "select data, show to the user" -> "user saves changes, save them in the db". Googling for "asp.net deadlock" gives 160k results.

You should have a look at PostgREST, still in development but look like a promising missing link for you use case.

I probably have not worked enough with ASP.NET but I do not get that reference at all. In my experience deadlocks in databases are rare and do not happen on simple databases.

Most asp.net workshops rely on SQL Server, which up to version from 2012 did not support MVCC and relied on locking on writes and reads. The problem is, it had problems with web-kind load. If something was writing into a table X, and there would be a parallel query for table X join... (especially including index seek and not just lookups), you'd get into territory of heavy locking. Add ORM to that, which wouldn't let you influence the order of the joined tables and you've just got yourself a deadlock. This happened on a few occasions in a few different projects I've worked on. The real problem is: there is lot of technicalities you need to know and understand, to use SQL without problems and in powerful way. Some NoSQL solutions (including MongoDb) try to abstract away those technicalities, and that was the point of my comment.

lol "lets be frank".... have you ever worked on a large scale project youself? how did running that on a singly homed postgres work out for you?

your comment seriously sounds like "I haven't ever seen anything with non-trivial scale yet, so I'm sure everybody working on it must be lazy/stupid/whatever."

and everbody is upvoting them. gotta love hacker news cargo cult.

Thanks for taking the time to write this up, William.

This is a great read, even if only as a helpful "this is how I did something hard, and how it turned out" kind of hacker story.

And I could see this being quite relevant to some ideas I have for a multi-user semi-real-time cooperative-editing web app.

How far has postgres come w.r.t. setting up a cluster with automatic fail-over and recovery? I didn't see the author address this aspect of Rethink that has a lot going for it.

Clustering and automatic fail-over and recovery is not always a great idea. It sounds like you want it implemented so you "don't have to worry about it", but fail-over and recovery systems are a lot more complicated and troublesome than an alert, some read-only degraded operation, and a manual promotion.

Aphyr's "call me maybe" series shows just how hard it is to get automatic failover and recovery working just right. If you're not ready to become a full-time expert on the topic, and pay for more servers for the same load, then simple replication and manual promotion can result in less downtime and less dataloss. I've seen multiple small startups have cassandra clusters fail, because they did not maintain them properly. I've seen them mysteriously lose data in elasticsearch clusters, for the same reason.

That said, GitLab has recently shown that even simple replication can be screwed up. I'd say, the lesson is, the most effective thing is to keep it simple (not "easy"), and avoid disaster caused by mundane mistakes.

My opinion is evolving on this matter. I've seen a more than a couple of developer introduced bugs in our application that cause data loss (improper error handling, swallowing errors instead of retrying, etc.) We use RDS with automatic failover and redislabs cluster with automatic failover. I'm 99% positive we have lost data before, and it makes me very uncomfortable to think about. But since we lose data from buggy code, I really can't really justify a huge amount of engineering effort to avoid the chance of loss in DB failover.

Yeah, if you can wait another year or two, something like CockroachDB [1] might fit the bill. That is a database that is distributed from the start. So you (theoretically) won't have to worry as much about individual datacenter failures, and scaling.

[1] http://www.cockroachlabs.com

I usually wait 5 years with a database/datastore after it goes GA before consider it for anything. Looks promising though.

Yes, I've liked the design from the very start, and they've got a nice group of people working on it. It's supposed to go into a 1.0 release this year.

I'll probably try it out on a service where I'm writing simultaneously to two different databases, one local and one distributed. And then log any issues where I see an inconsistency. I won't be relying on it for a while for anything critical.

What about Couchbase?

We had serious operational issues with Couchbase and some funny developers where always ready to answer our tickets with "works on my laptop". One we had 1.3 billion documents in the cluster and it became unstable and we had to delete all of the data to bring the service back online. Luckily we used S3 as the source of truth based on the operational characteristics of CB. I would not recommend it for anybody who is not ready for massive dataloss and continuous operational issues. Some of these might be been fixed in the meanwhile though.

Couchbase appears to require failover as it has authoritative servers (three, it looks like?). CockroachDB should scale linearly with cluster size, meaning failover isn't a thing--it just struggles until you give it more CPU/disk/mem/what have you.

Have a look at http://repmgr.org

Thanks. Indeed, I'm just using one PostgreSQL master, with no HA, but plan to set something up later, and have put repmgr on my todo list. (I'm the post author.)

A single Postgres machine holds all data? So you scale the database by switching to a stronger machine? Or do you plan to shard the data somehow?

Do you have any queries iterating over the entire table?

> A single Postgres machine holds all data?

Yes. It's currently using 80GB of disk, so regarding disk space I can scale for years. I have tiering system where older data gets stored in Google cloud storage, and grabbed on demand. So regarding space, there isn't an issue scaling up. It's also trivial to increase disk space on live VMs on GCE.

> So you scale the database by switching to a stronger machine?

That's my plan. We're at about 50% overall load as I write this, and here's the usage on the database server: "load average: 0.09, 0.16, 0.17". We're collecting historical Prometheus and other monitoring about load and queries, etc. By the time we need 32 cpus (the GCE limit), either the GCE limit will be raised, the company will be dead (no way in hell), or the company will be wildly successful. So in the only case in which I need to scale out, it will be an exciting new problem that I can hire a team to help with, and we'll benefit from understanding exactly what the problem is we need to solve.

I have wasted an enormous amount of time until now due to premature worry about scalability, fueled by my naive assumptions about how quickly SageMathCloud would grow.

It doesn't make sense to me to architect these kinds of applications at the database level.

What's wrong with using something like redis pubsub? I don't get the obsession of evented databases, or implementing this kind of thing at the database level. I suppose its attractive to "listen" to a table for changes but the pattern can be implemented elsewhere and with better tools.

Databases should be used for persistence, organization and schema of data, have flexible querying, and not much else.

You can flip this argument and make a similar point:

"What's wrong with using something like Postgres LISTEN? I don't get the obsession of redis pubsub, or implementing this kind of thing outside the database. I suppose its attractive to "subscribe" to a collection for changes but the pattern can be implemented inside a database which has better tools."

In more constructive terms, databases are familiar to a lot of folks and have reliable guarantees (persistence, ACID) that are generally useful properties. If you're able to achieve the performance you need within the database, then you get a lot of operational benefits from keeping your workload running inside of it. If your workload for some reason is slow inside of a database, then it certainly makes sense to consider specialized alternatives (like Redis). In my experience, however, you can tune a database to perform better than these tools in pretty much every real-world case (i.e. moderate concurrency with realistic load).

Postgres LISTEN/NOTIFY is actually very similar to Redis pubsub. You can manually add triggers to send notification on changes, sure, but the real "in the database" stuff is in RethinkDB.

On the one hand, I'd say, because of decoupling.

If one feature is in Redis and one in Postgress, I can replace each of them more easily.

On the other hand, how often did someone replace Postgress in their stack?

The great thing with LISTEN/NOTIFY is they work inside transaction boundry. This way you can get correct cache invalidation inside inside multiple table changes really easy. But if you cant use transactions that doesnt seem helpful.

That is really interesting - thanks.

I am explicitly not taking any side here, but that is certainly a matter of debate. Especially relational database management systems have all this functionality like views and triggers and stored procedures so that you can move a substantial part of your business logic into the database management system and only have very thin clients in order to minimize duplication of business logic across different clients. Treating a database management system as a dumb data store is only one possible philosophy, admittedly the one that seems to be the dominant one currently, at least as far as I can tell.

In the moment you are out the DB, you are out of ACID.

This is a reason that I don't wanna to do event sourcing outside PG.

I think that is is also a good idea to have a changefeed caching layer on top of PGNOTIFY, and you can definitely use redis for that.

You can have both. Postgres has foreign data wrappers, with Redis as a first class citizen. So, you can push your data to Postgres, and have it trigger to publish the updates. Databases have a pretty big toolset to make your life easier when you need to manipulate your data.

Code-level change data capture doesn't work well for e.g. many-row updates, one off queries... a ton of different stuff. That's not to say it doesn't have uses, but wanting global-DB-update monitoring is not one of those things where it fits cleanly

Aren't they both just the observer pattern? Seems like it falls into the same category as every other "should I implement this in the DB or app land" code.

Also, moving your callbacks to the db means you can call into the db from multiple code bases without fear of replicating the callbacks inconsistently.

That said, I'm not aware of many people who rely on that functionality.

Should be titled: Finding out how awesome Postgres is (the hard way)

What I didn't see mentioned here is clustering. One of the things that sold us on RethinkDB was how easy it was to cluster compared to PostgreSQL. The latter has poor documentation and it's very hard to know you've done things right... and if you don't the results can be catastrophic failures or mysterious replication problems with cryptic error messages.

Edit: also I was led to believe by PG documentation that LISTEN/NOTIFY is impossible across a cluster, which means that code depending on LISTEN/NOTIFY is impossible to cluster. If that's the case you're stuck with master/slave and manual or (scary) automatic failover now.

We wanted a system that is masterless (or all-master) in the sense that any node can fail at any time and the system doesn't care. RethinkDB delivers that, at least within the bounds of sane failure scenarios, and it delivers it without requiring a full time DBA to set up and maintain. That's worth a certain amount of CPU, disk, and RAM in exchange for stability and personnel costs, especially when a bare metal 32GB RAM SSD Xeon on OVH is <$200/month fully loaded with monitoring and SLA. So far we've been unable to throw a real world work load at those things that makes them do anything but yawn, and OVH has three data centers in France with private fiber between them allowing for a multi-DC Raft failover cluster. It's pretty sweet.

The only thing that would make me reconsider is if the use patterns of our data were really aggressively relational. In that case PGSQL would be a clear winner in terms of the performance of advanced relational operations and the expressivity of SQL for those operations. ReQL gives you some relational features on top of a document DB but it has limitations and is really designed for simpler relational use cases like basic joins.

"We wanted a system that is masterless (or all-master) in the sense that any node can fail at any time and the system doesn't care."

Neither PostgreSQL, nor MySQL can do much here and this is actually why we have the nosql movement. These problems are fundamentally unsolvable with the trade offs those RDBMSs made.

ahem CockroachDB would like a word…

And that's why we have the "NewSQL" movement, though a bit slow. The benefit of NoSQL in some cases (Cassandra and HBase for me) is that it restricts developers from using non-scalable data structuring and querying. RDBMS's have a bad reputation primarily because how they were used via unrestricted queries that came back to haunt the product owners.

And even then, some local DB approaches are fundamentally unsolvable in a distributed way (CAP, exactly-once-sends, etc) without trading something, even with cockroach.

They came out of the same movement, you can be pretty sure they didn't take the same trade offs.

MySQL actually just officially released "Group Replication" a few months ago to address this.


> MySQL Group Replication is a MySQL Server plugin that provides distributed state machine replication with strong coordination between servers. Servers coordinate themselves automatically, when they are part of the same replication group.

>>> Weird. OK, I tried it with some other parameters, and it suddenly took 15 seconds at 100% CPU, with PostgreSQL doing some linear scan through data. Using EXPLAIN I found that with full production data the query planner was doing something idiotic in some cases. I learned how to impact the query planner, and then this query went back to taking only a few milliseconds for any input. With this one change to influence the query planner (to actually always use an index I had properly made), things became dramatically faster. Basically the load on the database server went from 100% to well under >>> 5%.

I am actually interested in this part. Figuring out issues with EXPLAIN is one of my favorite things.

I'm replying due to this same part/paragraph. I've been dealing with some manifestation of EXPLAIN since Oracle 6. In the last few years I've become handy with PostgreSQL's EXPLAIN as well, and this reminds me of my biggest hang-up regarding PostgreSQL; their hostility toward optimizer hints.

Like Mr. Stein I too have found myself in bad places with PostgreSQL's optimizer. This is commonplace with relational systems; every such system I've ever dealt with, including all versions of Oracle since the mid 90's, Informix, MS-SQL, DB/2 (on AS/400, Windows and Linux,) and PostgreSQL eventually get handed a query and a schema that produces the wrong plan and has intolerably bad performance. No exception. None of these attempts to create flawless optimizers that anticipate every use case has ever succeeded, PostgreSQL included.

With other systems there are hints that, as a last resort, you can apply to get efficient results. Not so much with PostgreSQL. Not implementing the sort of hints that solve these problems (as opposed to the often ineffectual enable_* planner configuration, unacceptable global configuration and other workarounds needed with PostgreSQL) is policy:

"We are not interested in implementing hints in the exact ways they are commonly implemented on other databases. Proposals based on 'because they've got them' will not be welcomed."

How about proposals based on "because your hint-free optimizer gets it wrong and I require a working solution without too many backflips and somersaults or database design lectures." No? Then sorry; I can't risk getting painted into a corner by your narrow minded and naive policy. PostgreSQL goes no further than non-critical, ancillary systems when I have say in it. And I do.

You're absolutely correct. Optimizer hints are a source of contention in Postgres. I'm sure you're already aware, but just for completeness, here's some more information on optimizer hints and Postgres from Simon Riggs back in 2011. I don't expect this to sway opinion one way or the other. It's useful to read more on Postgres' policy.


From the introduction:

- Introducing hints is a common source of later problems, because fixing a query place once in a special case isn’t a very robust approach. As your data set grows, and possibly changes distribution as well, the idea you hinted toward when it was small can become an increasingly bad idea.

- Adding a useful hint interface would complicate the optimizer code, which is difficult enough to maintain as it is. Part of the reason PostgreSQL works as well as it does running queries is because feel-good code (“we can check off hinting on our vendor comparison feature list!”) that doesn’t actually pay for itself, in terms of making the database better enough to justify its continued maintenance, is rejected by policy. If it doesn’t work, it won’t get added. And when evaluated objectively, hints are on average a problem rather than a solution.

- The sort of problems that hints work can be optimizer bugs. The PostgreSQL community responds to true bugs in the optimizer faster than anyone else in the industry. Ask around and you don’t have to meet many PostgreSQL users before finding one who has reported a bug and watched it get fixed by the next day.

Now, the main completely valid response to finding out hints are missing, normally from DBAs who are used to them, is “well how do I handle an optimizer bug when I do run into it?” Like all tech work nowadays, there’s usually huge pressure to get the quickest possible fix when a bad query problem pops up....

Having never used postgres, this seems extremely scary. I've only ever needed index hints a couple times in other databases, but when you need them there isn't usually an alternative. Messing with global knobs is a good way to cause more problems than you are solving.

I've even entertained the idea that every query should be hinted. For OLTP workloads, you practically always know exactly how you want the DB to execute your query anyways. And often times you find out very late that the query planner made the wrong choice and now your query is taking orders of magnitude longer than it should (worse, sometimes this changes at runtime). I've never actually gone through with this religiously though...

>> Having never used postgres, this seems extremely scary. I've only ever needed index hints a couple times in other databases, but when you need them there isn't usually an alternative. Messing with global knobs is a good way to cause more problems than you are solving.

You've got the plot exactly. The last such battle I was involved with ended in creating a materialized view to substitute for several tables in a larger join; without the view there was no way[1] to get an acceptable plan. Creating this view was effectively just a form of programming our own planner. And yes, the need to update the view to get the desired result is an ongoing problem; one that's scheduled to get solved with a migration to another DB.

Like you I've never been all that quick to employ hints. I tend to use them while experimenting during development or troubleshooting and avoid them in production code. But there have been production uses, and you know what? The world did not end. No one laughed at or fired me. No regulatory agency fined me. It did not get posted on Daily WTF. No subsequent maintenance programmer has ever shown up at my home in the dead of night. It just solved the problem, quickly and effectively.

Sure would be nice if people purporting to offer a fit-for-purpose relational systems understood the value of a little pragmatism.

[1] given the finite amount of time we could sacrifice to deal with it

Just to be clear, those settings are global per query, not global for the entire server. Still that makes them almost useless for large queries, but I would not exactly call them scary.

I added code to my clients to do "SET enable_nestloop TO off" anytime they connect to the database. This sets a global flag for that session, which disables nestloop query planning. It could indeed impact other unrelated queries, which is deeply disturbing, but I don't know any way to disable nestloop query planning only for a specific query (aside from constantly setting and unsetting that flag?). Incidentally, here is an example of the query that causes all the trouble: "SELECT * FROM file_use WHERE project_id = any(select project_id from projects where users ? '25e2cae4-05c7-4c28-ae22-1e6d3d2e8bb3') ORDER BY last_edited DESC limit 100;" Search for nestloop in https://github.com/sagemathinc/smc/blob/master/src/smc-hub/p... to see my relevant client code.

> I don't know any way to disable nestloop query planning only for a specific query

You can do that by using SET LOCAL. Here's what your query would become:

  SET LOCAL enable_nestloop = off;
SET LOCAL applies the setting, but only within the current transaction.

If you post a link to the output from EXPLAIN, I could probably advise you the right way to handle this query.

Thanks!! Also, here's EXPLAIN, examples with timings, and the relevant part of the database schema: https://gist.github.com/williamstein/fb31e07d4057232bd3a3e78... (My email: wstein@sagemath.com)

It's a bit difficult to tell what exactly is going on. For future reference EXPLAIN ANALYZE provides a lot more information into the execution of the query. It tells you how how much time was spent in each part of the plan as well as how many rows were produced by each part.

From what I can tell, this query is getting the 100 last edited files of projects a user is part of. The way it is currently executing is by iterating through the most recently edited files, sees if the user belongs to the project of the file, and repeats until it finds 100 files of projects the user belongs to. Since the query returns no results, I'm you are running the query for a user that is not a part of any project, or of only empty projects. This means the query is looking up the projects of every single file only to find that none of them belong to a project the user was a part. You can check this by running EXPLAIN ANALYZE.

I'm not sure, since you didn't post the EXPLAIN of the query with enable_nestloop = off, but here's what I think is happening. You are getting a merge join between the projects table and the file_use table with the file_use_project_id_idx. If this is correct, this means Postgres first scans through all of the projects and finds the ones the user belongs to. Then it looks up all of the files that are part of one of those projects. Then it sorts those files by the time they were last edited and takes the top 100. I'm not sure if that is what's exactly happening, but I'm sure something similar to it is. You can check how accurate my guess is by running EXPLAIN/EXPLAIN ANALYZE.

The first thing I would try is creating a GIN index on the users field which can be done with the following:

  CREATE INDEX ON projects USING GIN (users jsonb_ops). 
What I would expect to see is a nested loop join between projects and files_used. The query should use the GIN index to find all projects the user belongs to. Then use the file_use_project_id_idx to get the files for each of the projects. Then sort the files by the last time they were edited and take the top 100.

So are there any actual success stories that the NoSQL movement can point at because it's bizarre to me how NoSQL can still be all the rage when time after time all I read is post mortems detailing painful experience after painful experience.

I'm at a stage where I haven't built enough of my current project to make moving back to an RDBMS painful yet, so all this stuff scares me.

I've read a number of things on HN where people are building massive DBs with Cassandra with good success.

I built https://github.com/llambda/agilegps on RethinkDB. The atomic changefeeds made it easy to have real-time updates of tracked vehicle's locations. Though it might be possible to do that with listen/notify in PostgreSQL too.

I've built out several large scale features on DynamoDB and I can say that with a proper design it is really nice to only worry about budget when scaling. Just crank the provisioned read/write capacity up as needed.

This gives me hope that maybe we could see something similar with Datomic. Perhaps it is possible to implement the same append-only (this is the best way I understand it), immutable audit trail on top of PostgreSQL and still walk away with SQL (datalog is neat-o but has a learning curve).

but I dream the dream...

Technically PostgreSQL already has this in terms of the WAL log. The problem is that "rolling back" to a given point of time requires recreating the database from the WAL log so it's not useful in the same way.

There are other "remember everything" schemes you can play with clever triggers, but it always comes back to how you end up using the stored data and how easy it is to bring it back to a queryable state.

PostgreSQL also has it in the form of its MVCC table structure. If you never delete 'old' rows, you could query old states of the databases by ignoring rows with newer transaction IDs.

However, nothing about pgSQL is designed for this approach, so i imagine the performance would be terrible.

nothing about pgSQL is designed for this approach

Interestingly enough, that was a part of the original design from the Berkeley days.


Our proposed approach is to treat the log as normal data managed by the DBMS which will simplify the recovery code and simultaneously provide support for access to the historical data.


3.3. Time Varying Data

POSTQUEL allows users to save and query historical data and versions [KATZ85, WOOD83]. By default, data in a relation is never deleted or updated. Conventional retrievals always access the current tuples in the relation. Historical data can be accessed by indicating the desired time when defining a tuple variable.


Finally, POSTGRES provides support for versions. A version can be created from a relation or a snapshot. Updates to a version do not modify the underlying relation and updates to the underlying relation will be visible through the version unless the value has been modified in the version.

One of the purposes of the much-maligned VACUUM command was to push the historical data to archival (optical) media.

The archival store holds historical records, and the vacuum demon can ensure that ALL archival records are valid.


> POSTQUEL allows users to save and query historical data and versions [KATZ85, WOOD83]. By default, data in a relation is never deleted or updated. Conventional retrievals always access the current tuples in the relation. Historical data can be accessed by indicating the desired time when defining a tuple variable.

Or, to put it another way [1]:

"Since one can't change the past, this implies that the database accumulates facts, rather than updates places, and that while the past may be forgotten, it is immutable."


[1] https://www.infoq.com/articles/Datomic-Information-Model

One of Postgres' proudest features actually used to be this -- "time travel", which used MVCC + row time stamps to allow the client to query historical data. This was eventually removed, but the functionality can be emulated using a contrib extension called "timetravel" [1].

[1] https://www.postgresql.org/docs/current/static/contrib-spi.h...

Maybe you're looking for event sourcing? It's great in theory but for all the reading I've done I haven't been able to find any case studies on it working well in practice

I think Eventsourcing is mostly used by large enterprises that don't really blog much about those things.

Let me just say: it's a very interesting approach, but it's also very complicated and has a large overhead in development time and infrastructure complexity.

For most problems, it's A LOT easier to do classic CRUD + some distributed task queue.

You can combine those two. It's pretty easy to just emit an additional event for each write into some event store (Apache Kafka, Postgres, whatever), so you can get a 'best of both worlds' state.

An append only audit trail would also have to carry some type information, since tables can change shape. It makes the relationship to ordinary relational querying interesting, to say the least.

Postgres's logical replication slots do provide way to implement your own change streaming.

I've always wondered how ReQL didn't look intuitive from this comparison and stayed away but I guess that wasn't a wrong assumption.

"Definitely, the act of writing queries in SQL was much faster for me than writing ReQL, despite me having used ReQL seriusly for over a year. There’s something really natural and powerful about SQL."


Good grief. Well this is a genuinely fascinating post containing a couple of absolutely terrifying insights:

"A RethinkDB employee told me he thought I was their biggest user in terms of how hard I was pushing RethinkDB."

Erm. If I were working for/a founder of a relatively small company using a product or service, especially one that's so critical to my own business, that is not the sort of thing I'd want to hear from the provider.

"Everything was a battle; even trying to do backups was really painful, and eventually we gave up on making proper full consistent backups (instead, backing up only the really important tables via complete JSON dumps)."

Holy crap.

Well, the story has a happy ending, and I think the point about the fundamental expressiveness of SQL is something that a lot of people miss in the mad dash to adopt "simpler" NoSQL solutions. I personally find SQL verbose and a bit ugly, but I still sort of love it because it's hugely powerful and expressive. I was perhaps 6 or 7 years into my career before I became comfortable with it, but I wish I'd thrown myself into learning it properly sooner because it is so incredibly useful.

(Also posted to the comments section of the blog)

Great writeup! One of the issues I've run into with LISTEN/NOTIFY is the fact that it's not transaction safe. ie if you call NOTIFY and then encounter an error causing a rollback, you can't undo the NOTIFY.

I ended up building a system on top of PgQ (https://wiki.postgresql.org/wiki/SkyTools#PgQ) called Mikkoo (https://github.com/gmr/mikkoo#mikkoo) that uses RabbitMQ to talk with the distributed apps that needed to know about the transaction log. Might be helpful if you end up running into transactional issues with your use of LISTEN/NOTIFY.

NOTIFY is transaction-safe: if the NOTIFY-ing transaction rolls back, the NOTIFY will not be delivered. See the discussion here ("NOTIFY interacts with SQL transactions in some important ways..."):


Yup, my bad, thanks. I confused the two problems I worked through in working through my use case. The transaction safety one was not the NOTIFY version of the project. That had different issues, which I need to go back and look at my notes for. My memory is a bit hazy, but IIRC we were seeing notifications dropped under high velocities and needed delivery guarantees.

I seem to remember that when it was first implemented there was a fairly small buffer for queued messages. That might have contributed to your problem. The docs say it's 8GB by default now, which is probably sufficient.

The original NOTIFY also didn't have a payload (that was added in PG9) which made it much harder to know what you were notified about.

That one simple addition made pre-9.0 NOTIFY and post-9.0 NOTIFY completely different beasts.

Messages are limited to 8KB, so if the buffer is 8GB, that's a lot of messages that can be queued up.

The whole point of LISTEN/NOTIFY is that it's transaction safe! Not sure where you got that idea.

Not only is NOTIFY transactional, so is LISTEN.

The only serious limitation I can think of: There was some talk, long ago, about propagating notifications via the WAL, so that LISTEN could work on a read-only slave. I don't know if this was implemented.

Why was the pgsql-listen-exchange abandoned? have you found some problems with that approach?

Mainly because I've had way too much on my plate and have not had time to work on any of my open source projects if they're not directly related to something I'm doing at work.

That's good to hear, the fact that the solution itself does not have a problem, i got it compiling for 3.6.x and working. I will try to look into stability (not crashing everything on disconnect) so any advice on how to go about that would be appreciated (email in profile).


Applications are open for YC Winter 2018

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact