Hacker News new | past | comments | ask | show | jobs | submit login

Write amplification is a result of PostgreSQL's decision to not used clustered indexes, there's not much that can be done to avoid it without a massive redesign of the storage engine - though there are patches out there to reduce the penalty in some cases. In all reality though, Uber wanted a key-value store and not an RDBMS, MySQL was a better choice for this since InnoDB isn't much more than a fast K/V store (hence why MySQL uses clustered indexes).



> Write amplification is a result of PostgreSQL's decision to not used clustered indexes, there's not much that can be done to avoid it without a massive redesign of the storage engine

I don't think that's entirely accurate - the issue is more that indexes contain pointers to the heap position (simplified) of a tuple, rather than being indirect and pointing to the primary key, which then is resolved by another index (be that clustered / primary or not).

Updates already don't have to update indexes iff none of the indexed columns change (HOT - Heap-Only-Tuples). The proposed change (WARM - write amplification reduction method), allows to avoid updating indexes on non-changing columns, even if other indexes change.

https://www.postgresql.org/message-id/CABOikdMNy6yowA+wTGK9R...

> In all reality though, Uber wanted a key-value store and not an RDBMS

Agreed on that.


Not arguing with your assessment of Uber's requirements; but in general, why do you view InnoDB as not much more than a K/V store? And why do you equate clustered indexes with K/V stores?

InnoDB is a complex piece of software, supporting transactions, row-level locking, MVCC, schemas, secondary indexes, crash recovery, hot copy/backup, complex caching and buffering, many tunables, and extensive metrics visibility. Just because it's more appropriate for Uber's rather unusual EAV-like use-case, this doesn't mean InnoDB is a glorified K/V store.

Re: clustered indexes, it's a storage engine architecture choice with well-known trade-offs, both positive and negative. SQL Server also uses clustered indexes and is widely respected among database experts.

Regarding the topic overall, there are use-cases where Postgres is the best choice, and there are use-cases where it isn't. That doesn't inherently mean that other databases are uniformly worse. People like to trash MySQL, sometimes for completely valid reasons, but other times for FUD. But fwiw, several of the major features in Postgres 10 have already been supported in MySQL/InnoDB for a long time, in some cases for over a decade. Of course, that goes both ways; there are awesome major features that Postgres has had for a decade that MySQL still lacks.


Clustered indexes aren't an all-or-nothing choice. SQL Server allows heap tables without any clustered indexes, tables with a clustered index on the primary key, and tables clustered on a secondary index with the primary key as a non-clustered index. It is really nice to have all three of those options available.


> massive redesign of the storage engine

Have the Postgres thought about adding support for more than one storage engine? Then they could implement new ideas in a fork, an one could run them side-by-side and migrate over to it.

https://www.postgresql.org/message-id/4CB597FF.1010403@cheap...

For example MySQL had been mocked for its old ISAM storage engine. Then MySQL added InnoDB as another storage engine, the SQL interface is the same.


Pluggable storage engines for databases don't work that well in practice. Either you end up with the MySQL situation where the storage engine is so dumb that you can't push any smart optimizations into it (making having pluggable engines moot in the first place), or you have to write such a large interface that it's not worth providing.


That depends on what you mean by pluggable storage and how it's implemented ...

For example PostgreSQL supported custom index access methods, which you might see as a custom storage format (although only for secondary storage). You had to modify the source code and rebuild PostgreSQL, but there was a fairly clear separation / internal API that allowed that. Since PostgreSQL 9.6 you can do that without the custom build (i.e. you can create a new index in an extension and use CREATE ACCESS METHOD to plug it into the server).

We don't have such clear internal separation for the primary storage, but it's hard to ignore the possible benefits of alternative storage types. Another reason is that we're constantly scavenging for free bits in various places (e.g. flags in tuple headers needed by new features etc), and allowing multiple formats would help with this by supporting "old" and "new" table format. So it'll likely follow what happened to indexes - build a clear internal API, allow multiple storage formats internally, eventually make it usable from extensions.

(These are just my personal opinions, of course.)


Yes. But it's not a settled matter. The storage engines in mysql didn't work out that well, each of them duplicating a lot of work, having significantly different behavior, ...




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: