Hacker News new | comments | show | ask | jobs | submit login

Postgres does not store keys in primary key order, actually, but in write order. This is an artifact of the MVCC model. There is a CLUSTER command that would do what you describe above, but it is (effectively) useless since you can't really run it without taking the database offline while the table gets rewritten.



You can do online clustering with pg_repack (https://github.com/reorg/pg_repack).

Given a table T, it will:

1. Create a new empty table T'.

2. Add a trigger on T that will clone any operations to T'.

3. Copy all rows from T to T' in clustering order.

4. Atomically switch the tables so that T' becomes T.

Unlike CLUSTER, the only lock required is in step 4, when it needs an exclusive lock on the table; it cannot do the switch until all current transactions are done with T (you can specify a timeout). The only other drawback is that that it does not support TRUNCATE, CREATE INDEX, or some ALTER TABLE operations being run while it's busy in step 3.

Also, unlike PostgreSQL's own CLUSTER command, it does not need an index to order by.

We run pg_repack regularly at night to keep our databases clustered. It works really well, though I really wish Postgres could get native online clustering. SQL Server has had automatic clustering for many years, as has DB2, and Oracle has index-ordered tables.


> though I really wish Postgres could get native online clustering.

Tell the community - pg_reorg could be cleaned up, improved, and merged into core... and should be!


I agree, although by native clustering I meant that the Postgres engine itself should support it, much like incremental vacuum was eventually built in.

pg_repack is a fork of pg_reorg (which has not been maintained since 2011), by the way, which has become the de facto replacement. No idea if the code is in need of cleanup or not, though.


The indexes, though, both benefit (locality) and suffer from (lock contention) such random identifiers.

That having been said, many data sets work with UUID just fine and the fact that such sets are easier to catenate and compare is very convenient.

Unless you actually need some sequential semantics to your records or are redlining a system where the locality of the index really helps, I prefer the non-coordinated assignment of IDs overall, as pvh points out in his slides.

Also, as other posters mention, uuid has sequential forms that can obviate some of this and reduce it to a 128-bit number...but the common form of a good-quality random one is really quite good enough until the indexes really start to get quite big, and this path is most useful if one remains disinterested in partitioning, which may be ill advised. Having to renumber (aka 'renaming', a form of one of the "two hard problems" in computer science...) everything is a big barrier to partitioning, so if one can afford to avoid renumbering, I think avoiding it is a good idea.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: