Ask HN: PostgreSQL or MySQL?
196 points by ReD_CoDE on Sept 22, 2019 | 175 comments
I have four questions related to SQL

1. PostgreSQL or MySQL? And why?

2. Is it possible to build a hybrid database schema? For example, SQLite+JSON?

3. Is it possible to convert XML(XMI) schema to SQLite schema automatically?

4. Is it possible to build a custom file format based on SQLite or hybrid one based on SQLite+JSON?

The following isn’t the top reason I recommend Postgres, but is the reason I think least likely to be echoed in a dozen other comments:

Postgres has some of the best documentation of any software product I’ve ever used. If someone wants to learn about SQL or databases, I always have to restrain myself from recommending that they just read the Postgres manual front to back. It’s comprehensive, it’s well written, it’s easy to navigate, and almost every explanation comes with several clear examples. It’s hard to overstate how valuable a property that is for a product as central to your architecture as your primary transactional database.

Too many Open Source developers discount the value of documentation. I all too often hear, "It's Open Source so the community should step up and write the documentation." To which I counter that the best person/people to at least start the documentation are the ones who build the product as they're the most knowledgeable about it. The community will gladly contribute. I personally believe that great documentation was a big reason for PHP's success in the early days.

> I personally believe that great documentation was a big reason for PHP's success in the early days.

I agree. The culture of user-submitted comments with helpful examples and clarifications on each documentation page was a big part of that, too.

Same for all of the GNU software distribution, Sun Microsystems’ products, and SGI’s. Even MSDN has had pretty good documentation.

I miss the days when I could understand how something worked just by reading the official documentation first.

To this day, in my open source projects, I do not accept contributions that lack either documentation updates or test cases.

> I miss the days when I could understand how something worked just by reading the official documentation first.

This is probably why I don’t read much documentation when I’m working with a library or framework. I never thought about it until I read the parent comment.

Absolutely. Good, detailed, up to date documentation is one of my top priorities when selecting development tools, frameworks and software.

This. I especially love their "don't do this" page [0]

[0] https://wiki.postgresql.org/wiki/Don't_Do_This

My pet peeve is great software with incomplete documentation.

All that hard work goes into the code; just give us a quick example of how to actually use it! It's especially frustrating when there is a superficial example, but nothing showing how to use the optional yet essential, non-obvious arguments and features.

Oh! I have a much bigger pet peeve: projects that have a fine, moderately usable examples section, but then decide that's good enough and never document the API, because one should just "look at the examples".

This. By chance I started reading the manual for Postgres yesterday and it actually filled some holes in my knowledge. It's amazingly structured and thorough, with extremely clear examples. The PG manual is literally a 0-to-hero resource for everything SQL related.

Alas, I found the docs hard to navigate; things that should be hyperlinked are not, and I think adding more examples would help.

Anyway, IMHO MySQL has a similar level of documentation.

Postgres may have good docs, but its searchability on the internet is still miles behind MySQL's, and that actually counts for more in the daily search for answers.

Also, I sometimes don't know whether to search for psql, pgsql, postgres, or postgresql, and I tend to type the shorter ones. With mysql, it's obvious.

Isn’t it like 3500 pages?

And every one of them enlightening!

I once did exactly as parent suggested and read the thing as a PDF on my phone, more or less sequentially, in downtime that I would otherwise have wasted. My knowledge of SQL and databases improved immensely.

To be fair I think I skipped some of the appendix matter relating to niche use cases.

On the first point, default to using Postgres unless you have a particular reason not to. In general, it offers more and better-executed features. There are some scenarios where MySQL offers a benefit and this advice doesn’t apply, but it’s unlikely you will need to worry about that if you’re in a “what database do I use” position.

Your other questions are a bit too vague to be answerable. It’s possible to build anything you want, including custom file formats or schema converters, if you want to invest the time. You might have more useful responses if you explain the use case instead of the solution you already have in mind.

As someone who has transitioned from MySQL->pg, this makes total sense to me. Partitioning is still a bit better in MySQL IMO (many types / options) - but pg is nearly there.

Did you mean replication rather than partitioning perhaps?

I've found MySQL replication quite easy to use, whereas PostgreSQL has been rather clunkier in my experience. However, I've had the exact opposite experience with respect to partitioning.

MySQL partitioning shines with one specific use-case: time-based data retention. e.g., by using RANGE partitioning over a temporal value, you can very efficiently drop large amounts of old data when no longer needed, and can automate this fairly easily.

For anything else, MySQL partitioning rule-of-thumb is you'll have a bad time, mainly for performance reasons.
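To make the retention pattern concrete, here is a rough sketch in MySQL syntax; the table and partition names are invented for illustration:

```sql
-- Hypothetical events table, RANGE-partitioned over a temporal value.
-- Note: the partitioning column must be part of every unique key.
CREATE TABLE events (
  id         BIGINT NOT NULL AUTO_INCREMENT,
  created_at DATETIME NOT NULL,
  payload    TEXT,
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (TO_DAYS(created_at)) (
  PARTITION p2019_08 VALUES LESS THAN (TO_DAYS('2019-09-01')),
  PARTITION p2019_09 VALUES LESS THAN (TO_DAYS('2019-10-01')),
  PARTITION pmax     VALUES LESS THAN MAXVALUE
);

-- Retention: dropping a partition is a cheap metadata operation,
-- far faster than DELETE FROM events WHERE created_at < ...
ALTER TABLE events DROP PARTITION p2019_08;
```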

Postgres's support for table partitioning got a massive boost in the last major release (11.x): https://www.postgresql.org/docs/current/ddl-partitioning.htm...

We use PostgreSQL date RANGE partitions in our datamart DB for exactly this – functionally instant deletion of old data, without leaving dead tuples in the main table.
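A sketch of that pattern in PostgreSQL 11's declarative partitioning syntax, with invented names:

```sql
-- Hypothetical parent table partitioned by date RANGE.
CREATE TABLE events (
  id         BIGSERIAL,
  created_at DATE NOT NULL,
  payload    JSONB
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2019_09 PARTITION OF events
  FOR VALUES FROM ('2019-09-01') TO ('2019-10-01');

-- Functionally instant deletion of old data, no dead tuples left behind:
DROP TABLE events_2019_09;
-- or, to keep the data around outside the parent table:
-- ALTER TABLE events DETACH PARTITION events_2019_09;
```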

No. I was referring to table partitioning.

I don't have much experience with replication in either MySQL or PG (but understand this is another area where mysql may edge pg out still).

I'd say, use MySQL 8 unless you have a particular reason not to. You'll find more resources online when you search around, and the tooling is better; those things are as important as features to many people, and I doubt MySQL 8 lacks much for most use cases.

I will respond only to Q1, taking the perspective of analytics and data science.

Answer: PostgreSQL > MySQL

The Postgres implementation of SQL includes a few clauses useful for analytics that MySQL does not support.

It used to be that MySQL had no window functions, and that made it wholly inferior to Postgres when it came to analytics. However, it seems MySQL began supporting window functions two years ago, so that is no longer a reason to choose one over the other.

There are at least two features supported in Postgres but not available in MySQL that I use often enough to care about:



Having these saves dev time. It is possible to implement either using MySQL, but your code will be more verbose, creating more opportunities for error.

If you care about analyzing the data for data-scientific purposes, you would be better off using Postgres. It isn't just the couple of extra clauses. It's also having more (useful) indexing choices and little choices like being able to use two pipes (||) to concatenate strings instead of CONCAT().

Having consulted for hundreds of analysts writing SQL to explore their data and create reports, I would agree with this assessment. Additionally, I tend to avoid recommending MySQL because of a strange behavior when using Group By. In Postgres, every column must be either aggregated or grouped by when using a grouping expression. In MySQL, this requirement is not present, and the ambiguity of the query is resolved by returning basically random results which are quite often misunderstood to represent the desired report. In other words, you can screw up in MySQL easily and not notice, whereas in Postgres, you’ll get an error message.
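A minimal illustration of that Group By difference, using made-up table and column names:

```sql
-- Illustrative schema only. With ONLY_FULL_GROUP_BY disabled (the old
-- MySQL default), the query below runs and returns an arbitrary `city`
-- per department; PostgreSQL rejects it outright.
CREATE TABLE employees (dept TEXT, city TEXT, salary INT);

SELECT dept, city, AVG(salary)
FROM employees
GROUP BY dept;
-- PostgreSQL: ERROR: column "employees.city" must appear in the
-- GROUP BY clause or be used in an aggregate function
```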


That was the behavior in older versions. Since 5.7, ONLY_FULL_GROUP_BY is enabled by default.

Not to mention TimescaleDB and PostGIS. If one ever needs analytics or GIS, these are great additions; one doesn't have to support multiple DBs for that. Also Citus, though MySQL (well, MariaDB and Percona) also has Galera cluster.

Also, in MySQL, ALTER TABLE and DROP TABLE implicitly commit your transaction. One could argue it doesn't even have "real" transactions because of this.

Personally, PostgreSQL. Both have their advantages and disadvantages, but my standard arguments in favor of Postgres are that it:

1. Enforces data types natively. I've shot myself in the foot before with MySQL 'polluting' data.

2. JSONB support makes it easy to use a 'hybrid' schema if you want NoSQL-like behavior or a junk drawer to shove JSON into.
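A minimal sketch of such a hybrid schema (table and column names are invented):

```sql
-- Relational columns for what you query; a JSONB junk drawer for the rest.
CREATE TABLE users (
  id    BIGSERIAL PRIMARY KEY,
  email TEXT NOT NULL UNIQUE,        -- type and constraint enforced natively
  prefs JSONB NOT NULL DEFAULT '{}'  -- schemaless overflow
);

INSERT INTO users (email, prefs)
VALUES ('a@example.com', '{"theme": "dark", "beta": true}');

-- ->> extracts a field as text; @> tests JSON containment.
SELECT prefs->>'theme' FROM users WHERE prefs @> '{"beta": true}';
```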

Regarding your conversion questions, it's common to have an app that communicates with several databases, but I don't know of any architecture that would allow ACID transactions in two engines simultaneously.

As far as converting data, it all depends on your development preferences. I work mostly in the Ruby ecosystem, and I can tell you that I've used ActiveRecord to read from a Postgres database and export the results into a SQLite file. I'm sure it would be possible to use an XML parser like Nokogiri to convert XML into a SQLite file as well. If you prefer another major programming language, the tools likely exist to accomplish the same thing.

> 1. Enforces data types natively.

Quite a while ago MySQL changed the default to be strict on types and to stop silently truncating values. Recently it also (finally) got check constraints.

> 2. JSONB support makes it easy to use a 'hybrid' schema

MySQL has a JSON data type with validation, binary storage, efficient updates and efficient replication.

My experiences are with a pretty old mysql version (5.5, iirc), but that version allowed inserts without supplying a value to a column that is NOT NULL and has no default value.

If that's the strict mode, I shudder to think what the lax mode must be... :(

An upgrade to MariaDB 10.3 caught those cases, so at least there's some progress in the mysql/mariadb ecosystem.

> MySQL has a JSON data type with validation, binary storage, efficient updates and efficient replication.

Postgres has, in addition, indexes over expressions, which allows you to build indexes for specific queries into JSONB.

(I'd argue that if you need indexes, it's typically better to extract that data into the regular table and not keep it in the json, but it can still save your butt if you need to react quickly to a requirement change and no time for a data migration).
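A sketch of the expression-index approach, assuming a hypothetical `docs` table with a JSONB column:

```sql
-- Assumed table: CREATE TABLE docs (id INT, data JSONB);
-- An expression index targets one specific query shape:
CREATE INDEX docs_customer_idx ON docs ((data->>'customer_id'));

-- This lookup can now use the index:
SELECT * FROM docs WHERE data->>'customer_id' = '42';

-- Alternatively, a GIN index covers containment queries (@>) over
-- the whole document:
CREATE INDEX docs_data_idx ON docs USING GIN (data);
```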

> My experiences are with a pretty old mysql version (5.5, iirc), but that version allowed inserts without supplying a value to a column that is NOT NULL and has no default value.

With recent MySQL:

create table ttt (col int not null);
insert into ttt () values();
ERROR: 1364: Field 'col' doesn't have a default value

> Postgres has, in addition, indexes over expressions, which allows you to build indexes for specific queries into JSONB.

MySQL has that, too. I consider this quite useful, especially since MySQL also supports indexing an array of values from a JSON document. (Given `{"colors": ["red", "green", "blue"]}`, a search with `JSON_CONTAINS(doc, '"red"', '$.colors')` benefits from the index.)
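A sketch of that multi-valued index feature (MySQL 8.0.17+); the table and column names are invented:

```sql
-- Multi-valued index over a JSON array, built via CAST ... ARRAY.
CREATE TABLE products (
  id   BIGINT PRIMARY KEY,
  data JSON,
  INDEX colors_idx ((CAST(data->'$.colors' AS CHAR(16) ARRAY)))
);

-- Either membership test can use the multi-valued index:
SELECT * FROM products WHERE 'red' MEMBER OF (data->'$.colors');
SELECT * FROM products WHERE JSON_CONTAINS(data->'$.colors', '"red"');
```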

Given the progress both MariaDB and PostgreSQL have made, I think you will not go wrong with either. Both support JSON now; which one you choose depends on your use case.

Mariadb - If you want easy horizontal scalability with master-master replication go with mariadb.

PostgreSQL - If you want reliable and powerful SQL-92 compliance and are willing to work with master-slave, go for PostgreSQL. You can do master-master with Postgres-XL [1]. But given how easy it is to work with a Galera cluster, I would not recommend PostgreSQL for master-master.

Anyways for your other questions:

2. SQLite already supports a JSON type [2], so it's not SQLite+JSON, it's just SQLite.

3. Yes you can write a simple python script to do it.

4. Why use a custom file format when SQLite has its own format with JSON support?

[1] https://www.postgres-xl.org/

[2] https://www.sqlite.org/json1.html
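A quick sketch of "SQLite+JSON is just SQLite", assuming a build with the JSON1 extension (included in most modern SQLite builds); names are invented:

```sql
-- JSON is stored as ordinary TEXT; json() validates and normalizes it.
CREATE TABLE config (id INTEGER PRIMARY KEY, body TEXT);

INSERT INTO config (body)
VALUES (json('{"units": "metric", "floors": 12}'));

-- Query into the document with a JSON path:
SELECT json_extract(body, '$.floors') FROM config;

-- Recent SQLite also allows indexing such expressions:
CREATE INDEX config_floors ON config (json_extract(body, '$.floors'));
```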

In my experience, master-master replication with Galera is more troublesome than it's worth. It's better to use the simple master-standby solution with automated fail-over.

1. Do you have experience running PostgreSQL or MySQL in production? Use what you know. Do you have friends or colleagues with that experience? Use what they know. Is the stack you end up using predominantly used with one or the other, so you have lots more people in the community who will have hit issues you might have? Use that one. Failing any of those? Default to PostgreSQL.

2. I think you mean SQL+JSON, and the answer is PostgreSQL's JSONB columns, or dumping blobs into rows in MySQL (just keep your row sizes below 4kb).

3. Possible? Yes. A good idea? Almost never. Trees and relations are quite mathematically different, and you will need to specify quite a bit of the mapping to make it come out in a way you would actually want to work with.

4. Not really...generally you would have a SQLite database. If you need stuff that doesn't go in well as a blob, you would put it in a file and include a reference in the database.

Mysql has json columns. You can select and create indexes on json fields.

Can you expand on the 4kb part of row sizes? I have been experimenting with storing json in MySQL tables and I'm curious about this limit

MySQL expects rows to be fairly small, which leads to decisions about how the system handles loading data. You can put larger blobs in, but it may have adverse performance implications.

When considering between PostgreSQL and MySQL bear in mind that client-side drivers are licensed differently. I believe the MySQL connector is still GPL licensed; it used to be LGPL and then the license was changed with little fanfare. Commentators called this "weaponising the GPL". The goal is to get you to buy the enterprise version of MySQL.

I know MySQL way better but at a brief look I can see Postgres drivers are licensed more liberally.

Actually the goal is to get you to unwittingly use the GPL connectors, then backcharge you after an audit.

To be more precise: MySQL Connectors are licensed under GPL with Universal FOSS Exception https://oss.oracle.com/licenses/universal-foss-exception/

> 2. Is it possible to build a hybrid database schema? For example, SQLite+JSON?

Postgres and MySQL both have JSON support that can do fancy things which includes indexes, but it's extra work and I don't think it actually gains you anything. I'd recommend using regular old columns for anything you ever plan to want a query on. But if you have other stuff, it's perfectly reasonable to dump a bunch of random garbage into a JSON column. People use this for things like user settings, and are generally happy with it.

> 4. Is it possible to build a custom file format based on SQLite or hybrid one based on SQLite+JSON?

SQLite is already basically a file format. You can store JSON blobs in strings somewhere inside it if that's convenient.

Postgres jsonb column type has worked great without much effort. The old JSON type wasn't great and required special index, but that's decently in the past now. Not sure of MySQL in this aspect.

If you ask this at HN most likely it will be PostgreSQL.

If you ask about scripting languages, it's got to be Python or JavaScript here.

I use HN daily and have noticed that, as a group, HN readers do in general have their preferences on things like this.

By all means, MySQL's market share is much larger than PostgreSQL, and MySQL 8+ is very impressive. I compared both and eventually picked MySQL for my projects, even though I really _hate_ Oracle.

You could use Percona MySQL, which isn't by Oracle and is much better anyway.

Wouldn't MySQL's market share be larger due to the historical usage in the LAMP stack?

1. Depends. If you have a lot of update queries, MySQL wins [https://eng.uber.com/mysql-migration/]. MySQL also has better replication and sharding support at the moment.

2. PostgreSQL and MySQL both sport JSON... JSON functions can be a little tricky. Still, they work.

3. Depends on the structure of your XML. It might even be possible to use regex to achieve this objective. XML to JSON is easier, though; that can be saved in JSON fields in a DB.

4. You can build a custom file based on anything. The only catch is that the SQLite client will only read a valid SQLite file, so you have to know what you're doing.

[Edit] Formatting

That's incredibly well written, constructive feedback by the Uber team! I wonder if the Postgres team has taken this on board and attempted to resolve some of the design issues, e.g.: by moving the cache in-process.

The Uber team betrays their lack of knowledge about PostgreSQL in the article, so take it with a huge grain of salt. I recommend reading Markus Winand's article.


But that does not mean there are not also real issues with table bloat in PostgreSQL, and they are being worked on by creating a new storage engine with a different MVCC implementation.

#1: depends on your scenario. Postgres is more like Oracle--there are types of query it can do natively that would have to be externally assisted with MySQL. MySQL is simpler, has all kinds of warts, but scales decently.

Here's why uber moved to MySQL: https://eng.uber.com/mysql-migration/

#2-4: My use of SQLite has always been very basic, so no idea. Per SQLite's author: "SQLite does not compete with client/server databases. SQLite competes with fopen()."

That's a very interesting article. Does anybody know if Postgres 10 and 11 addressed any of those issues?

https://news.ycombinator.com/item?id=14222721 is a continued discussion of the response.

Re: Uber, it's not like they didn't cause some of their own issues by keeping very long running transactions. Also, they don't use MySQL directly; my understanding is that they wrote their own database and use MySQL as a kv store.

Based on what I have read, "kv store" is a major over-simplification.

A number of companies have built special-case storage services/APIs on top of MySQL. This is not the same thing as writing your own database. In any case, it shows the strength and stability of MySQL for high-volume OLTP use-cases.

Also I don't think "very long running transactions" were the singular core of Uber's problem. InnoDB MVCC doesn't handle those well either; a long-running tx blocks the purge thread and causes a pile-up of old row versions. While the impact of that is less severe in InnoDB than in Postgres, it's still very bad and will lead to performance degradation and undo-log size bloat.

> InnoDB MVCC doesn't handle those well either; a long-running tx blocks the purge thread and causes a pile-up of old row versions.

Remember that what they moved to isn't directly MySQL either, but only uses it underneath.

It's still MySQL/InnoDB under the hood. I don't follow your point -- why does it matter whether an application is talking directly to MySQL, vs using an intermediate service with a non-SQL API? If the intermediate service supports long-running transactions, you'll hit the same underlying issue regardless...

If your point is that they also happened to move to a new access pattern that eliminated long-running transactions, and conflated that access pattern change with their migration off of pg, that's a fair criticism if true. Hopefully someone from Uber can clarify this aspect.

Also from what I recall (been a while since I read Uber's posts about this), Uber's specific MVCC issues related to pg's use of physical replication and locking impact on MVCC; and also separately its storage of old row versions "inline" vs InnoDB's use of a separate undo log. And on a different point, I remember some aspects of InnoDB's clustered index design being advantageous to Uber's workload as well. In total, this is why I suspect simple "long-running transactions" alone were not the primary/singular reason for the switch (at least if "long" means "minutes or hours" in order to negatively affect InnoDB).

Yes, sorry, I didn't mean to imply that long running transactions were the only cause, but I believe I remember them being a factor.

I just need to go back and read the conversation on the pg-general list, it was very good in that it did discuss that some of the issues mentioned by Uber were real. https://www.postgresql.org/message-id/579795DF.10502%40comma... and perhaps some other blogs from around the time this was being discussed.

For me: PostgreSQL's schemas, EXPLAIN, column modification, and UUIDs. But most of the time I stick to SQLite; it satisfies most projects with a low footprint and works on installation-restricted servers.

Every time I've used sqlite for any kind of throwaway web app, I've regretted it due to the almost complete lack of support for concurrent operations. Sqlite is for file formats, not anything that might have concurrent writes.
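For what it's worth, SQLite's WAL mode softens this somewhat: it allows concurrent readers alongside a single writer, which can be enough for a small app. A minimal sketch of the usual pragmas:

```sql
-- Write-ahead logging: readers no longer block the (single) writer.
PRAGMA journal_mode = WAL;

-- Wait up to 5s for the write lock instead of failing immediately
-- with SQLITE_BUSY.
PRAGMA busy_timeout = 5000;
```

There is still only ever one writer at a time, so this does not change the underlying point.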

> Sqlite is for file formats, not anything that might have concurrent writes

That's very well put and excellent advice. I still bust out in giggles that we're in a world where "SQL for file formats" is a thing.

TIL. I really thought SQLite had better support for concurrent behavior.

Do you know if the problem you cite is specific to separate SQLite processes? I.e., can a single process (aka one web app) handle concurrent behavior properly with SQLite? Or will it fail even in that scenario?

Why write code to deal with that complexity when you can just use pg or MySQL?

Because depending on the use case you don't want to run a 2nd server. For example, I work on some FOSS that lets users self host. The intention is for minimally experienced users to self host, as well as host on small laptops/etc. For obvious reasons, spinning up a full database has issues - requiring them to install Docker or etc is not feasible.

With that said, knowing how DB concurrency behaves in inherently concurrent applications (anything HTTP facing, for example) seems vital. If it needs to be not concurrent, it's paramount to know that.

To be clear, knowing how and when the concurrency breaks down does not mean advocating for it. Merely that if concurrency wasn't supported, you may need to write your applications differently to properly lock the non-threadsafe behavior.

To echo a few other commenters here, a major point of consideration is your team's experience and broader technical network. Generally speaking, either Postgres or MySQL is likely fine for most applications. But you'll have a better time with whichever one you can most smoothly build on, operate, and hire for.

Separate from that, and with the disclaimer that my career has been heavily MySQL-oriented, although I try to provide a balanced perspective:

MySQL can be a good choice for OLTP-heavy workloads, such as a high volume of relatively straightforward queries. It also supports a very wide variety of replication options. In some aspects, MySQL is simpler to operate than pg. (This can be a point of contention, as MySQL is more idiosyncratic and this leads to some land-mines that are well-known to the community, but unclear from the manual alone.)

At this point there are also many hundreds (thousands?) of engineers and DBAs worldwide who have extensive experience working on massive-scale MySQL deployments. This may or may not be relevant to your project, depending what you're building and its eventual best-case size.

MySQL is admittedly less "batteries included". Although MySQL/InnoDB does support things like windowing functions, fulltext, geospatial, etc they're not necessarily a best-in-class implementation yet. In a MySQL shop, if you need OLAP, you're better off doing ETL into a separate column store or Hadoop cluster; if you need fulltext search, you're better off using a separate search index system like Solr; etc. Depending on your point-of-view, this approach is either "more UNIXy" or a major operational headache, or both :)

Meanwhile Postgres may be a better choice if you want to handle a bunch of things using a single piece of software: mixed OLTP + OLAP workloads, fulltext search, geospatial. It also has some interesting indexing features MySQL lacks, such as partial indexes, and ability to index an entire nested JSON structure in a single index.

Postgres gets major points for having a true open source development model, cleaner core code, better adherence to SQL standards, etc. Conceptually, it is very well-aligned with the textbook description of a relational database. Its adherents love it, sometimes fanatically so, which can be a mixed bag to outsiders. I sometimes feel parallels to the Mac user community in the late 90s/early 00s: it's probably a better choice for many use-cases, but personally I raise an eyebrow to suggestions that it's the universal best choice for all possible situations.

I'm not very tech savvy and know very little about databases. But I have read countless posts/comments over the years, and everywhere everyone recommends Postgres.

Again, I have no clue about other database systems, but the Postgres documentation is just awesome. I believe you can learn practically everything from its documentation.

I find a comment like this to be underrated; it’s like a meta-analysis.

1. Both are great, they're both absolutely fantastic products!

PostgreSQL does have slightly more popularity nowadays, especially since they came out with JSONB support first and have more advanced features built-in.

I have used PostgreSQL+JSONB and it works great. Doing indexes on JSON fields adds a lot of value.

What do you use the JSONB column for?

We've loved the flexibility of JSONB while our schema wasn't nailed down, but eventually migrated many fields to be columns for ease of reading and writing nested objects.

There was an Uber engineering post years ago about why they switched to mysql. Not sure how relevant it is today


Use what you're most comfortable with, not what others are.

Postgres and MySQL (MariaDB/Percona) are both big players with a huge community.

Maybe consider using MariaDB or Percona instead of MySQL to avoid Oracle vendor lock-in.

EXPLAIN is almost useless on mysql, but very helpful in postgres.

EXPLAIN is a very useful command once you have some degree of success (users).

EXPLAIN with the JSON or TREE formats in MySQL got a lot better. Also, MySQL's optimizer traces give a lot more information: https://dev.mysql.com/doc/internals/en/optimizer-tracing.htm...

MySQL's tabular explain is largely understandable once you also understand how simple its optimizer and execution mechanism is. All you mostly need to look at is the join order and indexes used, and whether there's a filesort in there, and then it's clear enough.

The linear list of steps doesn't work well for a smarter engine.

Check out EXPLAIN ANALYZE too, as it's freshly baked into MySQL: https://dev.mysql.com/doc/refman/8.0/en/explain.html#explain... It's especially useful if you switch from PostgreSQL to MySQL.
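The modern MySQL 8.0 EXPLAIN forms, sketched against a hypothetical `orders` table:

```sql
-- 8.0.16+: hierarchical plan, similar in spirit to Postgres's output.
EXPLAIN FORMAT=TREE SELECT * FROM orders WHERE customer_id = 42;

-- 8.0.18+: executes the query and shows actual timings and row counts.
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- Machine-readable plan with cost estimates.
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE customer_id = 42;
```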

1. Mostly Postgres, as it's superior in most areas, but MySQL seems to still have advantages in replication (and, if you are hosting in AWS, the Aurora Serverless MySQL offering supports some things, like their own Data API, that the equivalent Postgres offering doesn't yet, which may tip the balance).

> Is it possible to build a hybrid database schema? For example, SQLite+JSON?

Yes, but many DBs will hold JSON natively and, conversely, lots of DB data fits nicely in JSON. What's the motivating use case?

> 4. Is it possible to build a custom file format based on SQLite or hybrid one based on SQLite+JSON?

Not sure what you are asking here; SQLite has its own file format. I suppose, as with many combined formats these days, you could bundle that together with a folder of JSON documents and call it a custom format. What's the motivating use case?

Aurora PG Serverless is out as of July, but I’ve read some horror stories about it so far.

Don't know about horror stories, but Aurora PG Serverless doesn't (yet?) have the Data API support (which enables a couple of other Aurora serverless features) that Aurora MySQL Serverless has, and IIRC has some other feature limitations that aren't directly tied to Data API.

Data API now supports Aurora PG Serverless, https://aws.amazon.com/about-aws/whats-new/2019/09/amazon-au...

Interesting. Do you have any links or further info? Evaluating setting up a standard PG RDS instance or going Aurora PG Serverless.

Thank you all, I've tried to read all the comments. I think I have to clarify the questions I asked:

1. Is related to our upcoming startup, we chose MySQL + Redis, but some said use PostgreSQL

(However, my friend explained that MySQL before v8 was weak and after v8 has tried to come back to its salad days; this is why he chose MySQL.)

2-3-4: In the industry I'm active in, the Digital Built Environment industry, there are a lot of invaluable schemas, especially XML(XMI). I feel these schemas convert structured data to semi-structured data, which is neither good nor efficient.

For this reason, I'm thinking of SQL (Server) and SQLite (File) to preserve structured data

There's a vitally important schema in the built environment industry called Industry Foundation Classes (IFC) [1] which is based on EXPRESS/STEP and also has XML(XMI) schema too, but I want to convert the schema to SQL and SQLite

A friend did an invaluable job and, after years, introduced a correct method to convert it to SQL [2][3], but I don't have any idea how to develop the schema based on SQLite [4], which would accelerate IFC development.

[1] https://www.buildingsmart.org/about/what-is-openbim/ifc-intr...

[2] https://forums.buildingsmart.org/t/ifc-for-relational-databa...

[3] https://forums.buildingsmart.org/t/store-modify-and-retrieve...

[4] https://forums.buildingsmart.org/t/ifc-sqlite-project/1359

For point 3 (Is it possible to convert XML(XMI) schema to SQLite schema automatically?), are you asking about a tool that takes an XML schema and gives you back an SQL schema?

I haven't heard of anything, but I would imagine it exists. I have, on the other hand, done a fair amount of XML to SQL conversion and have found that generating a good schema involves understanding the data so you know what to denormalize. You don't really want to have thousands of tables, each representing an XML tag. That would make your database very hard to use.

Trial and error, and a good knowledge of what kind of questions you are going to be asking your database to answer...
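To make the denormalization point concrete, here is a hypothetical sketch (the XML fragment and table design are invented, not from any real IFC/XMI schema):

```sql
-- Given: <building id="B1"><storey name="L1"><wall id="W1" area="12.5"/></storey></building>
-- A naive tag-per-table mapping yields thousands of tiny tables.
-- A denormalized mapping keeps only the entities you actually query,
-- flattening the enclosing tags into columns:
CREATE TABLE walls (
  id          TEXT PRIMARY KEY,  -- wall @id
  building_id TEXT NOT NULL,     -- from enclosing <building>
  storey      TEXT NOT NULL,     -- from enclosing <storey>
  area        REAL               -- wall @area
);

INSERT INTO walls VALUES ('W1', 'B1', 'L1', 12.5);
```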

Answering Q1: MySQL over Postgres. Postgres has two critical things going against it:

- The query optimizer is a mess.

- Its implementation of secondary indexes is not cache friendly, so it can be slow.

I don't know what you're trying to say. PostgreSQL makes no distinction between primary and secondary indexes. MySQL, which only knows clustered indexes (see "Unreasonable Defaults" [1]), has to make such a distinction, so that when you use a secondary index it also has to go through the primary index in addition to the secondary index.

[1] https://use-the-index-luke.com/blog/2014-01/unreasonable-def...

> implementation of secondary indexes is not cache friendly

Can you get into more details about this? What about the implementation is not cache friendly?

I would also recommend Postgres for the features alone. Like the full text search.
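A minimal sketch of that full text search feature, with invented names:

```sql
-- Hypothetical table for illustration.
CREATE TABLE articles (id SERIAL PRIMARY KEY, body TEXT);

-- @@ matches a tsvector document against a tsquery.
SELECT id FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'database & index');

-- Index the same expression so the search doesn't scan every row:
CREATE INDEX articles_fts ON articles
  USING GIN (to_tsvector('english', body));
```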

One reason not mentioned often is the vast amount of plugins which can do lots of useful things.

Like PostGIS or TimescaleDB, which both work really well.

Instead of requiring a plugin, MySQL has good GIS support natively built-in.

As previously mentioned, I agree that there is much less difference between them than there used to be. Use what you know or have access to those who know.

Personally I prefer PostgreSQL, and the main benefit of using MySQL is how easy it is to set up replication and operate in master-slave, master-master, or group replication configurations.

For high write volumes, TokuDB and now RocksDB do work well, the former for sequential writes and the latter is LSM Tree based.

Postgres is a long-term support stack at this point, like the JVM. Too many people rely on it. You need a very good reason to pick anything else.

1. PostgreSQL. Fast, reliable, (relatively) small (for some reason mysql got a lot bigger). Had a few historic problems with replication but I think that's been resolved. I dealt a lot with large scale large data sets and I wouldn't recommend mysql for that at all, except if they were very simple and relations were either not used or used minimally. There are better database systems than either, but I can't think of a less reliable one than MySQL. If you use PHP though, mysql is easier.

(I'm going to assume the rest of the questions are SQL related and not SQLite, which is a particular binary format)

2. why? SQL + json would make some sense, sometimes. I haven't needed it - I just do SQL. Others have better answers.

3. There are tools to do some XML to and from SQL. YMMV. I always rolled my own.

4. It's always possible. Is it worth it? I used CSV with embedded JSON for a long time with archival storage because it was reversibly parseable and could scale time-based databases somewhat reasonably. There are smarter ways to do this too ...
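On point 2: a minimal SQLite+JSON hybrid is easy to sketch in Python with only the standard library. Column names here are invented; the idea is relational columns for what you filter on, plus a TEXT column of JSON for the loosely-structured rest:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, kind TEXT, payload TEXT)")

# Relational column ("kind") for filtering; JSON blob for everything else.
payload = {"user": "alice", "tags": ["sql", "json"]}
conn.execute("INSERT INTO event (kind, payload) VALUES (?, ?)",
             ("login", json.dumps(payload)))

kind, raw = conn.execute("SELECT kind, payload FROM event").fetchone()
data = json.loads(raw)
print(kind, data["user"])  # login alice
```

If your SQLite build includes the JSON1 extension, you can also filter on json_extract(payload, '$.user') directly in SQL instead of parsing in application code.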

> I dealt a lot with large scale large data sets and I wouldn't recommend mysql for that at all

Was your use-case transaction processing (OLTP) or analytics (OLAP)? There are a ton of examples of massive-scale MySQL deployments for OLTP... Facebook, YouTube, Pinterest, Slack, Uber, Wikipedia, GitHub, Yelp, Etsy, Shopify, Booking.com, Wordpress.com, Tumblr, Box, Dropbox, Alibaba, Square, Venmo... a large chunk of the internet, in other words :)

It seems most people on HN (and to their credit, they admit it) have limited recent experience with MySQL, so they don't know that the two are 99% functionally equivalent (with edge cases being better for one or the other).

Analytics. MySQL was horrible for that, eventually bogging down to the point it was breaking the system. Storing was OK. Analyzing emphatically was not.

And no amount of testimonials will counter that MySQL has a lot of fundamentally broken assumptions when it comes to data structures and relations. I mean, it works if you don't mind handling that at higher levels, and sometimes that's a good call. And sometimes it's not.

You don't mention SQLite in question 1, but then include it in questions 2-4.

Did you maybe mean "SQL" and not "SQLite" here?

For very simple usage, you won't see a lot of difference, but I would go with Postgres. It has a lot of features that MySQL doesn't, and isn't any harder to set up and use.

Point 2: as long as you also write software to deal with it, I guess so. Postgres has a JSON type. This question makes me want to ask what you are trying to accomplish.

Point 3: Yes, probably, and for all I know there's something out there to do it for you. I'm assuming basic usage; there is undoubtedly some weird corner of xmi that makes it a Turing-complete language or something equally ill-advised. So writing a schema converter for very basic stuff is probably quite easy, but really filling in all the junk is possibly less so.

4 appears to be a repeat of 2.

I'm going to write a couple of things about the first point.

Several years ago, a knowledgeable guy told me that the most compelling reason for choosing between PostgreSQL and MySQL was the expected I/O: "for read-intensive workloads (e.g. blogs), choose MySQL; for mixed workloads (e.g. forums), choose PostgreSQL".

But I honestly don't know if that may still be valid as of today.

Nowadays, I think that for basic things, it doesn't really matter; but for peculiar things, Postgres may have some advantages (both technically and not). Also keep in mind that, for some popular scopes, SQLite is likely everything you really need.

> Several years ago, a knowledgeable guy told me that the most compelling reason for choosing between PostgreSQL and MySQL was the expected I/O: "for read-intensive workloads (e.g. blogs), choose MySQL; for mixed workloads (e.g. forums), choose PostgreSQL".

This seems a bit backwards. MySQL's main strength is OLTP workloads, including mixed read-write workloads / high write volumes. A majority of the giant social networks are using MySQL (or previously used MySQL before moving to in-house custom databases) and have insane write volumes.

For OLAP workloads, or mixed OLTP/OLAP workloads, Postgres tends to be a better choice. OLAP is inherently read-heavy, which is why I would disagree with the advice above.

The biggest social network uses MySQL because they already dumped so much engineering into it that changing is basically impractical. But it's mostly used as a key-value store backing a custom graph database.

That is simply not correct. Please understand that I worked on MySQL at Facebook, so I know what I'm talking about here :)

Facebook developed an entirely new MySQL storage engine (MyRocks, which is a RocksDB-backed engine for MySQL) and then migrated their largest sharded tiers to it. This is basically just as much work as developing a new database from scratch, i.e. more work than something like migrating to Postgres. This completely debunks the "changing is basically impractical" claim.

And while Facebook's primary db tier (UDB) does have a restricted API / access pattern, calling it a "key-value store" is a gross oversimplification at best, or completely inaccurate at worst. Range scans are absolutely core to the UDB access pattern, for starters.

Many other social networks are also built on MySQL (linkedin, pinterest, tumblr; and several in China) or previously used MySQL before moving to a custom in-house db (twitter). I think reddit and instagram are the only two using pg? And I recall parts of instagram were being moved to mysql, although I'm way out-of-date on whatever happened there.

> This is basically just as much work as developing a new database from scratch, i.e. more work than something like migrating to Postgres. This completely debunks the "changing is basically impractical" claim.

I disagree for three reasons:

1. The long tail of code using MySQL at the company, like at any large software company, is prohibitive. You would have to maintain MySQL and PostgreSQL in parallel for years. A new storage engine, on the other hand, is controlled by one team.

2. Migrating from InnoDB to MyRocks consists of successively adding MyRocks replicas, letting them catch up, and removing InnoDB replicas. That is a dramatically easier proposition than migrating tiers to PostgreSQL.

3. The fact that RocksDB was a hard technical project is kind of irrelevant. The new storage engine provided major wins and could be done within a team, while migrating to PostgreSQL would provide at most small improvements and demand changes to huge amounts of code and massive data migration projects. That makes the former project deeply practical and the latter impractical. If the usual stack back in the day had been the LAPP stack instead of the LAMP stack, we would be having this discussion the other way.

> calling it a "key-value store" is a gross oversimplification at best

That's fair. The right thing to have said would be that the query patterns that are used are extremely simple selects over a single table, which is a place that MySQL has traditionally shone. MySQL's query planner still does strange things on complex queries from time to time. I had a case about six months ago where one shard decided it was going to reorder indexes in a query and load everything in the database's core tables before filtering it down instead of using the proper index order like the other nine hundred something shards. Easily fixed once we realized it (we forced the index order in the query), but the fact that we had to... I have heard that this has all gotten much better in MySQL 8.0.

You're severely underestimating the amount of effort that went into MyRocks. The development and deployment was a 3+ year effort spanning quite a few different teams.

Automating the physical rollout (as you correctly described) is the easy part. That doesn't account for all the many difficult spots that occurred prior to it: the massive complexity of mysql storage engine development in general; huge numbers of various performance edge-cases; converting replication-related system tables to MyRocks in order to achieve crash-safe replication; developing online hot-copy for MyRocks from scratch; schema change support; adding tons of new status vars and system vars; fast bulk import to MyRocks which is necessary for the replica migration to even be possible; updating hundreds of thousands of lines of automation code written under the assumption of InnoDB being the only storage engine in use and using various InnoDBisms...

The MyRocks migration wasn't a project I personally worked on, but I'm very aware of what was involved. It appears you joined FB PE in 2017 and therefore missed much of this historical context? I'm not really sure why you would have such strong opinions about it.

You say that FB is using MySQL because "changing is basically impractical", but also say MyRocks "provided major wins", which seems to be a contradiction. In any case, I'm not aware of any pg feature that provides compression ratios anywhere near that of MyRocks, and pg is only recently even adopting an arch that supports pluggable storage engines at all. In combination it's really hard to make a case that FB is using MySQL just due to historical investment and inability to change.

Honestly I would also not be surprised if FB moves some core tiers off of MySQL to a pure-RocksDB solution at some point in the future. The number of intermediate data services and proxies make this sort of thing absolutely possible. For the same reason, in theory a move to another db like pg would be completely possible without needing to run both in parallel for years (again, just talking in theory; moving to pg just doesn't make practical sense).

> The right thing to have said would be that the query patterns that are used are extremely simple selects over a single table

For UDB, sure. What about all the other MySQL tiers? The non-UDB MySQL footprint at FB, despite being a minority of FB's MySQL fleet, is still larger than the vast majority of other companies' relational database fleets worldwide. The range of use-cases in the DBaaS (XDB) tier alone spans a massive combination of different database features and access patterns.

> You say that FB is using MySQL because "changing is basically impractical", but also say MyRocks "provided major wins", which seems to be a contradiction.

I think I must not be expressing myself clearly. 3+ year projects involving a large number of teams to get back to where you started are impractical. That's what migrating to PostgreSQL would be. Perhaps I should have written "switching from MySQL to PostgreSQL would be impractical"?

Apologies if I'm misunderstanding. To take a step back and paraphrase this subthread, as I understand it:

* dezzeus said MySQL was better for read-intensive workloads, Postgres better for mixed read/write

* I replied saying there are a number of huge social networks with insane write rates, which is contrary proof against that claim. (Having personally spent most of the past decade working on massive-scale MySQL at several social networks / UGC sites, this topic is near and dear to my heart...)

* You replied saying, iiuc, that FB is only using MySQL for historical reasons and difficulty of switching. (IMO, your initial comment was tangential to the original topic of comparative read vs write perf anyway. Regardless of why FB is using MySQL, factually they are an example of extremely high write rate, previously via InnoDB for many years. That said, I wasn't the person who downvoted your comment.)

* I replied saying that's inaccurate, as FB demonstrably does have the resources and talent to switch to another DB if there was a compelling reason, and furthermore MySQL+MyRocks provides a combination of feature set + compression levels that other databases (including Postgres) simply cannot match at this time. At FB's scale, this translates to absolutely massive cost savings, meaning that MySQL+MyRocks is a better choice for FB for technical and business reasons rather than just historical reasons or difficulty of switching.

I may have misunderstood, but it definitely felt like your original comments were throwing shade at MySQL, and/or publicly stating historically inaccurate reasons for why FB is currently using MySQL.

I think I have done a poor job of communicating, and I'm sorry for it.

Question on this area. What have you found to be the best way to migrate an existing app to Postgres?

We plan to migrate to Postgres, but with every LOC written we're entrenching ourselves into MySQL. We all want Postgres, but at the same time I just can't see the transition happening with each passing day.


edit: I should be clear too. I think our goal would be to not require a full rewrite spanning weeks. So an active translation layer, or replication.. or something tricky.. but I just don't foresee us being able to spend a month putting everything on hold so we can migrate.

My company recently looked at MySQL vs PostgreSQL. While we chose Postgres there is one thing that stood out in my research, I haven't found any comments on this:

We were looking at scalable, HA, possibly multi-DC workloads and such, and yes, relational DBs are not necessarily the right choice, but MySQL looked a bit better in this regard, especially for master-master replication. There are several high-profile teams that have huge production deployments running on MySQL (IIRC Uber and Github) and there are a lot of things to be learned from them.

Did you evaluate Citus as well?

You can compare it from here https://sql-workbench.eu/dbms_comparison.html

Is there a Galera-equivalent system available for Postgres? I.e., multi-master replication.

I know of BDR, but that doesn't seem to be publicly available for recent versions of Pg?

Not sure about master-master, but I've used repmgr[1] to manage a PostgreSQL master-slave setup and it works pretty well, though it is by no means plug-and-play. There are a variety of steps and manual processes to get it up and running.

[1] https://repmgr.org/

We find it handy to have multi-master, which leans towards being more active-active: we have keepalived doing health checks on two systems, and point our app server(s) at a vIP.

The app server runs Galera's arbd so we don't need three DBs.

It's not Postgres, but it supports a somewhat limited Postgres dialect: CockroachDB. It has range-based master-slave replication with routing, so you can use it instead of master-master replicated solutions. Be aware of the higher latency of distributed databases.

My first go-to database is always SQLite. Only when SQLite is unable to handle the requirements, for example, too many writes, do I look at PostgreSQL.

Regarding 1): choosing a software stack (or any part thereof) without a clear use case is a recipe for failure. What's the use case?

To be fair, often you don't have your requirements completely nailed beforehand, or they change midway through the project.

That's why I choose Postgres: aside from extreme scale or latency, you can throw pretty much anything at it. Also, given expert advice, its performance degrades gracefully i.e. lots of easy ways to optimize and workaround issues.

Example: pg's type system is insanely flexible, incl native support for high speed JSON, latlng, geometry, and more - and types are "batteries included", incl matching, inspection/extraction (e.g. JSON navigation), conversion, concurrency control, ACID recovery, replication, etc. For example, it's one line of code to create an index on the results of a JSON navigation expression whose result is then converted to another datatype, e.g. timestamptz. Performance is then 100x+ on queries that use that expression.
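To make the expression-index idea concrete, here's a rough, runnable analog using SQLite's JSON1 functions via Python (Postgres syntax differs, using jsonb operators and casts; the table and key names here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("""INSERT INTO doc (body) VALUES ('{"ts": "2019-09-22"}')""")

# Index on the result of a JSON navigation expression; queries that use
# the same expression can then hit the index instead of re-parsing the
# JSON in every row.
conn.execute("CREATE INDEX doc_ts ON doc (json_extract(body, '$.ts'))")

row = conn.execute(
    "SELECT id FROM doc WHERE json_extract(body, '$.ts') = '2019-09-22'"
).fetchone()
print(row)  # (1,)
```

This requires an SQLite build with the JSON1 extension, which Python's bundled SQLite usually has.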

Pg datetime handling is also world class incl timezones and conversion, query and manipulation functions. Want the records from last Tuesday across timezones, given data in seconds-since-the-epoch? The SQL is shorter than in English.
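In Postgres that would use to_timestamp plus its timezone machinery; as a rough, runnable analog of just the epoch-seconds part (table and values invented), SQLite via Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hit (ts INTEGER)")  # seconds since the epoch
conn.executemany("INSERT INTO hit VALUES (?)",
                 [(1568851200,), (1569283200,)])  # 2019-09-19, 2019-09-24 UTC

# Convert epoch seconds to a date string and filter on it in SQL.
rows = conn.execute(
    "SELECT date(ts, 'unixepoch') FROM hit"
    " WHERE date(ts, 'unixepoch') >= '2019-09-20'"
).fetchall()
print(rows)  # [('2019-09-24',)]
```

Postgres goes much further than this (AT TIME ZONE, date_trunc, intervals), which is the point being made above.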

If the requirements are still up in the air, then I would aim to use the simplest, most evolvable solution. This to me means no database.

If the requirements are still up in the air, but you know that you're going to be storing some data, definitely go with a database, even if just a single SQLite file. Using plain files means figuring out the file format, which turns into a huge mess when requirements get tacked on later. The other option would be to store a JSON file or a bunch of JSON files in a directory, at which point you're building a shitty version of either SQL or MongoDB instead of focusing on your business logic.

The grandparent comment is right about one thing: you need to figure out what your use case is, in order to make a good judgment call. “Some data” can mean a lot of things. Transactional data? Sure, get a sql database in. Images, video, or plain json documents? SQL databases are a poor fit there.

One should also consider if they even need to manage data persistence. I’ve had success using Google Sheets as the backing data storage for a prototype. Below a certain scale, it is a lot easier to visualise and manipulate data in a spreadsheet than through a sql client.

This. Answer “pg!” without any clue of what the objective is, and you'll end up with surprises like “oh, you know, this is for a revolutionary concept where people can check in their source code at different stages of development, so that they have a history of their changes and can restore an older version if needed” or “hey, we've followed your expert recommendations but now we're facing some availability issue after a link to our shiny new search engine made the front page of HN and Slashdot”. Migrating from pg to git or SOLR is not really enjoyable, and you can't reuse much of what you've done (unlike when migrating one SQL database to another).

Although there is already pretty unanimous consensus here, one side note is that administering MySQL is always a headache in my experience, whereas Postgres seems to just work.

Every time I start a new project and I see it's using MySQL, I sigh a little, knowing that I'm going to end up spending hours fighting some dumb configuration issue.

What does your app/framework support as a default store (what do the devs use and test against)?

What features do you expect from the db? What versions of each database can you run?

Have you got an environment where one option is better supported than the other? (For example AWS aurora cluster and MySQL?)

Do you have an ops team with existing experience with either db?

MySQL: Wait a sec, what do I use for this table, InnoDB or MyISAM? What are the tradeoffs again? Oh, right, do I want ACID transactions or fulltext search? Man, this tradeoff sucks.

PostgreSQL: Wait a sec, which schema (namespace) should I put this table in? Man, it's nice to have namespaces within a DB.

MyISAM is considered deprecated at this point. It isn't crash-safe and basically should not be used at all, let alone for a new project.

InnoDB has supported fulltext indexes since 5.6 (GA release back in feb 2013). However, to be fair, fulltext indexes in InnoDB are not widely used and have a mixed reputation at best. Large MySQL-based companies tend to lean on external search services e.g. Solr.

I'm not familiar enough with pg's fulltext to compare. I would expect the functionality is superior to InnoDB's, but I am genuinely curious whether large pg-based companies rely on it solely vs using an external search index service.

InnoDB has had fulltext search for about 10 years now. Also, in other regards, InnoDB has had all MyISAM functionality for a while and is faster in by far most realistic scenarios.

No one has been using MyISAM for the last 10 years.

Funnily enough, I have not been using MySQL for the last 10 years either, because I switched to Postgres and never went back.

Fair, but why comment on MySQL's supposed deficiencies if your knowledge of it is so out-of-date?

But vanilla PG didn't get good master/slave replication until "recently", so were you running without replicas for the last 10 years?


Your link is from 9 years ago.

The release from 9 years ago was not even close to what MySQL was offering at the time.

This happens when someone asks a versus question on a Sunday.

Unfortunately this is not true; I've found it used in the most unexpected places, even in new developments as recent as 2-3 years old.

That's mostly because it's the default in some versions and the developers don't know/care that MyISAM is dangerous (as it's not transactional and it will transparently ignore any kind of foreign key statement).

We specifically had to switch to MyISAM for some things when upgrading from 5.5 to 5.7, in 2017. We were hitting InnoDB’s 4k row limit in many cases (without clobs). Fortunately the tables in question are filled once, updated a couple of times right after, and from then on only read, so the lack of ACID is OK just for that. (All the real transactional tables are InnoDB.)

InnoDB got fulltext search support nearly 7 years ago if I remember correctly...

MySQL, or better yet AuroraDB if you want scalability. MySQL still has a sizable speed advantage, is easier to learn and maintain, and has a more vibrant world of tooling. MySQL is also easier to find experienced developers for. This has been my experience anyway.

Are you confusing SQLite with SQL?

Isn't MySQL owned by Oracle?

To each his own, but that would just about answer the question for me.

MariaDB isn't.

In case it's not been said already, Oracle owns MySQL; use MariaDB instead.

I'd say Postgres because of tools like https://hasura.io that let you automatically layer graphql on top of it and jsonb :)

I really like the direct DB-to-filesystem mapping of the MyISAM engine in MySQL. A database is simply a directory. A table is simply 3 files in that directory:

    - One for the structure
    - One for the data 
    - One for the indexes
You can simply copy a MyISAM database by copying the directory (while the MySQL server is stopped).

MariaDB has forked the MyISAM engine and created the Aria engine. It is a crash safe version of MyISAM and is now their default engine for all system tables.

So in contrast to MySQL, you can completely disable InnoDB in MariaDB. MySQL does not support that anymore, as its system tables are InnoDB.

Especially for non-concurrent workloads, MyISAM and Aria are super fast.

For these reasons, I really like MariaDB.

You also get "mariabackup", which is a fork of Percona XtraBackup, for consistent physical backups of InnoDB and Aria/MyISAM.

1. Whatever you feel most comfortable using and administering because the alternative is dangerous for production.

2. Probably.

3. Probably, but you'll probably have to write the code to do it.

4. Why bother?

Postgres hands down.

MySQL or MariaDB?

Materialized views, common table expressions, and outer joins are all missing in MySQL, and essential. PostgreSQL's stored procedure language blows MySQL's out of the water, and it's possible to use different language backends.

SQLite has json extensions. You need to select them at compile time though, so you can't just assume they'll be there by default.

MySQL has supported left/right outer joins for as long as I can remember, and I've been using it continuously since 2003.

MySQL doesn't support full outer join, but it's rarely needed, and you can often get a similar result using a UNION. (Clunky, I'll admit... but I'd say the same thing about Postgres recommending against use of NOT IN, which is a common construct and more readable than an anti-join.)
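The UNION workaround mentioned above is easy to demonstrate. Here's a sketch via Python's sqlite3 (older SQLite also lacks FULL OUTER JOIN, so the same trick applies there; the tables are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (k INTEGER, va TEXT);
CREATE TABLE b (k INTEGER, vb TEXT);
INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
INSERT INTO b VALUES (2, 'b2'), (3, 'b3');
""")

# Emulate FULL OUTER JOIN with a UNION of two LEFT JOINs:
# the second arm keeps only the rows of b with no match in a.
rows = conn.execute("""
    SELECT a.k, va, vb FROM a LEFT JOIN b ON a.k = b.k
    UNION ALL
    SELECT b.k, va, vb FROM b LEFT JOIN a ON a.k = b.k WHERE a.k IS NULL
    ORDER BY 1
""").fetchall()
print(rows)  # [(1, 'a1', None), (2, 'a2', 'b2'), (3, None, 'b3')]
```

Clunky, as admitted, but it produces the same rows a native FULL OUTER JOIN would.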

CTEs were added in MySQL 8.0 (~1.5 years ago).

Agreed on the other points.

> but I'd say the same thing about Postgres recommending against use of NOT IN, which is a common construct and more readable than an anti-join.

The recommendation against that is due to the semantics defined in the standard. If there is any NULL in the list matched against, the entire "a NOT IN (...)" expression also has to evaluate to NULL. Which is extremely confusing to most people (and has some consequences for how efficiently things can be executed).
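A quick way to see that NULL behavior, sketched here via Python's sqlite3 (any conforming SQL database behaves the same way):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 2 is "obviously" not in (1, NULL), but 2 <> NULL is unknown, so the
# whole NOT IN predicate evaluates to NULL and the row is filtered out.
rows = conn.execute("SELECT 'matched' WHERE 2 NOT IN (1, NULL)").fetchall()
print(rows)  # []

# Without the NULL, the predicate is plain TRUE.
rows2 = conn.execute("SELECT 'matched' WHERE 2 NOT IN (1, 3)").fetchall()
print(rows2)  # [('matched',)]
```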

I have quite a bit of experience with #2.

We use a SQLite hybrid with JSON blobs (stored as regular TEXT columns) as the principal transactional store for our clients' business system state. The simplicity of this approach (which is definitely not for everyone) is what made it feasible for our small development team to manage the project as far as we have. If we were still using a big-boy SQL solution like PostgreSQL or SQL Server and managing all of the concerns specific to these dependencies (multiplied by the number of environments we are responsible for deploying these to), we probably would have been forced to close up shop by now.

Being able to deploy a single binary which brings every dependency (including the database) it needs along for the ride proved to be a lifesaver for us. The process before this involved spending an entire day getting SQL Server configured relative to the service accounts, and whatever ridiculous quirks may exist in each clients' specific AD. Today, it is literally a 30 second file extract, sc create, and it's running. We go from RDP into a blank box to a working production system within 5 minutes. This is also how we've been able to completely sidestep the docker question, since no one could justify the added complexity once we got the equation down to: a single binary folder + a bare-ass Windows Server 2016/19 box = good to go. This is also enabled by .NET Core's Self-Contained Deployments, so SQLite isn't a free lunch all on its own.

Again, the above is not for everyone. We are responsible for maintaining production services running across many client sites, and needed a solution that could scale not only in terms of our clients' requirements, but also in terms of our abilities to manage all of these environments all at once. Managing a hosted SQL service requires direct access to each of these environments which has tremendous overhead for us due to the email back-and-forth remote access game that has to be played each time.

If you are in an environment where it's just a single instance of the production/test/qa/development stack, I would absolutely push for a hosted SQL option like PostgreSQL/MySQL/SQL Server. In these cases you have a single (or 4) points of touch, and it is very reasonable and consistent and all within your control. If you find yourself in a situation similar to mine, I would still advocate for the hosted SQL by default unless you are a very brave soul and understand the risks of going a bit off the beaten path. Disclaimer aside, there really are some interesting treasures out here. If you can afford to waste a month, try doing your entire persistence layer in SQLite with JSON serialized business models where appropriate. Assuming you budgeted for complete failure, worst case you walk away with a new perspective. Better case, you end up in the position I find myself in now.

Just don't forget to set PRAGMA journal_mode=WAL; if you decide to go on the adventure. It makes orders of magnitude difference in concurrent write throughput (SQL Server was never this fast for us).
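For reference, a minimal sketch of enabling it from Python against a file-backed database (WAL doesn't apply to in-memory databases, and the path here is a throwaway temp dir):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)

# WAL lets readers proceed concurrently with a writer, which is where
# the big win in concurrent write throughput comes from. The setting
# persists in the database file itself.
(mode,) = conn.execute("PRAGMA journal_mode=WAL").fetchone()
print(mode)  # wal
```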

MySQL is simply garbage that shouldn't be used under any circumstance.

1. No transactional DDL

2. Implicit, confusing coercions

3. Terrible Unicode handling

4. The list goes on

1. Thanks to the new data dictionary in 8.0 there is light at the end of that tunnel ...

2. Yeah, MySQL tried to be "simple" and it took a while to be confident enough to change its default to strict mode (the option for opting in has existed for ages ...). But meanwhile the defaults are proper.

3. Especially the 3-byte Unicode type was a historic mistake. It reduced required storage space but didn't foresee the need for emojis and other 4-byte sequences ... the utf8mb4 charset fixes that, though.

4. ...


1. For small databases (up to a few GB) my personal preference in the past 5 years was MS SQL (Express Edition, free), then PostgreSQL and last is MySQL. That means the answer to your specific question, PostgreSQL. Why? That is the order of features, ease of use and performance in my experience, other people may have seen different.

For larger databases I have no experience with either PostgreSQL or MySQL, I was spoiled to use MS SQL for up to Terabyte size and that's the largest I've ever worked with.

MS SQL Server definitely has some important features that are missing from PG. Top on that list IMO is real stored procedures that can return multiple heterogeneous result-sets with simple select statements. Another big one is client side tools. Query optimization and schema flexibility (being able to alter things instead of dropping and re-creating) might still not be as good in PG as well.

You won’t get a ton of agreement from this crowd though, despite the majority of them probably never having used it.

Only this is not a way to select a DBMS. PostgreSQL adds important features much faster than competitors, it is free and much leaner and more flexible.

Free and leaner, yes. More flexible? That’s debatable. As I pointed out, it’s certainly not more flexible when it comes to returning multiple result-sets from a single procedure.

I think that PG actually did just add a feature called stored procedures though. So maybe soon for returning multiple heterogeneous selects! It’s one feature that I’ve been waiting a long time for…

I chose SQL Server in many cases for its very high quality, support, superior ease of use and flexibility. Free and leaner weren’t very high on my list comparatively. You think those are the best characteristics to go by?

MS SQL Express Edition is also free and arguably more advanced than the free competition, I don't know a single way PostgreSQL is better. That does not make PostgreSQL bad at all, especially when you go beyond the limit of free MS SQL (10 GB per database) then PostgreSQL is my first choice.

You don't really benefit from MySQL speed if you are not Facebook, so I would always start with PostgreSQL.

We use PostgreSQL but also Firebird, which maybe is the great underdog in opensource databases.

While not quite on topic, let me second that Firebird is a fine and under-appreciated RDBMS. Though it can obviously be used to run on your own server, it really shines when it's deployed in the field. It's used for desktop software I help maintain, running at several hundreds of small businesses. The install is super lightweight, requires very little configuration and practically zero maintenance. Firebird strikes a nice balance when it comes to features. What's great is that it can be used with or without a server process. That way you can start using it like SQLite and scale up to a more PostgreSQL-like setup, or go the other way, with no effort. Many years later, we're still very satisfied with our choice at the time.

I used InterBase (Firebird predecessor) 15 years ago, and recall it had some big limits around versioning/lots of updates. To reclaim disk space we’d have to periodically backup/restore the db. It got so bad/frequent that we moved to Postgres and haven’t looked back since. I guess if one is considering SQLite that it’s not too relevant, but is this still an issue with modern Firebird?

Many of IB's limitations and bugs have been removed over the years. I think it was around FB2.5 that it really became a better product than IB ever was. A cool thing, though, is that it's very backward compatible. If you have applications that expect IB6, they will happily connect to any version of FB, even using the old client lib. The pace of development has ramped up since FB3, which was focused on rearchitecting the core. That's mostly a good thing. Still, I fear a little that it will also affect reliability.

Not sure exactly what kind of situation got you in trouble, but I haven't had any issues (ab)using the database myself. Disk space is still not reclaimed, but the freed space does get reused when the amount of data grows again. Effectively, the database file stays at the largest size it ever needed to be, but grows no larger. Most of the time, that shouldn't be an issue. If you expect huge spikes, there are other ways around it.
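For reference, the backup/restore cycle mentioned above is conventionally done with Firebird's gbak utility. A rough sketch, assuming a SYSDBA login and illustrative file names; the restored copy is written compacted, which is what reclaims the space:

```
# Back up the live database to a portable backup file (-b = backup)
gbak -b -user SYSDBA -password masterkey employee.fdb employee.fbk

# Restore into a fresh, compacted database file (-c = create)
gbak -c -user SYSDBA -password masterkey employee.fbk employee_new.fdb
```

You would then swap the new file in for the old one during a maintenance window.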

This entirely depends on the domain and data model, and unless you need SQL, MongoDB should be a consideration too.

No. MongoDB has a track record of losing data and not actually providing durability. It is considered a joke in the professional community.

interesting: https://www.mongodb.com/jepsen

There are a lot of MongoDB users who wouldn't be considered a "joke": https://www.mongodb.com/who-uses-mongodb

Rather than listening to unqualified nonsense, they run tests and gather facts to guide technology decisions. And yes, I work for MongoDB (just saying before anyone starts whining about disclosure). You're welcome.

I posted my problems with it as a response to a sibling comment. Since this was four or five years ago now, maybe all of this has been fixed, but that's not something I'm going to depend on in production any time soon.

Also, the MongoDB users you list must be taken with a grain of salt. For example, you list Facebook on that page. Nope. The description is "...adapted the storage engine API, extending MongoDB into new workloads and new capabilities." That is, Facebook acquired Parse, which used Mongo. Facebook ported MongoDB to run on RocksDB. Facebook shut down Parse. End of MongoDB at Facebook. You list Google, and the mouseover is basically that you can run MongoDB on GCE. I'm not saying that Mongo is unusual in this behavior. It's how most people listing "who uses X" build their lists.

Do you know if it is still happening in the newer versions (v3.6+)? If so, could you point to the references? I would love to learn more about the data loss issue.

MongoDB's new storage engine was written by Keith Bostic and Dr. Michael Cahill.

They are database veterans. WiredTiger is really nicely written.

It's been four years since I looked at it, but WiredTiger didn't replicate atomically under high write rates.

We had a master and replica running on the same machine and we were doing packet capture and inspection writing to the database. Our capture pipeline wrote to the master, and our system queried the replica so we didn't destroy the write rate. We were doing a zero allocation capture pipeline where the pages that the kernel passed back to us were already embedded in the skeleton of a BSON document that we then filled in and finally shoved straight onto the wire to the local Mongo master.

We started seeing records in the replica that were missing fields that were present on the master. These weren't eventual consistency things. Records are supposed to be replicated atomically, and these persisted in the system. We ended up having to run full database scans to find and clean them up.

Mongo claimed they fixed it twice. In neither case was it fixed. We migrated to a different system at that point. Maybe it's all wonderful now, but since MongoDB was a pain to use for other reasons, I have never felt the slightest inclination to go back and try it again.

Well, technically WiredTiger is not in charge of replication.

That's fair. We didn't have any particular problems with WiredTiger itself.
