Hacker Newsnew | comments | show | ask | jobs | submitlogin
Postgres: The Bits You Haven't Found (postgres-bits.herokuapp.com)
216 points by craigkerstiens 495 days ago | comments


parfe 495 days ago | link

The slides recommend UUIDs as a primary key ("just use UUIDs. seriously"). I took a look at the manual and there isn't too much of a Why (as well as 4 different versions of UUID to choose from). Anyone have a write up that explores UUIDs vs integer primary keys? I was only able to find people asking permission ("I have X with a UUID, can I use it as a pk?) rather than people talking about Why you would prefer UUID as the default.

Sidenote: Literally any button press you could reasonably expect to move to another slide works. PgDown, Right, Down, Space. Stop shitting up a decent submission to whine about it.

-----

pvh 495 days ago | link

The biggest reason is that your IDs become universally unique - across shards, across database recoveries, rollbacks and session problems, you name it. These are all the kinds of things that can happen over the lifespan of a dataset.

In Postgres, UUIDs are stored in 128 bits, so it's hardly any overhead in almost all use cases, so if you can save yourself a day or two, or three, or a week of work just by switching your mental default from an integer to a UUID, that's a pretty big win.

-----

gingerlime 494 days ago | link

I see some very good points were already made, but I think UUIDs are also good for at least three other reasons:

1. Offline / sync - you can have out-of-sync or offline clients (e.g mobile apps) inserting new data without worrying about pk mismatch or collisions.

2. security - it's obviously a bit of security through obscurity, but not being able to guess the key of an object can be very useful some time. It shouldn't be relied on as a first line of defence, but it gives you a naturally impossible-to-guess reference. If you want to avoid someone crawling through your image-base or any object that is generally accessible, but you want to avoid having it accessible in-bulk, UUID is your friend.

3. Silly mistakes - If you pass an object id, you really want to know it can only be for one object. Maybe the front-end is sending an id of one object, but the backend is working on another (shit happens). When you use sequential ids, you increase the chance of weird stuff going on. With UUIDs you simply can't use the wrong id, even by mistake.

-----

FooBarWidget 495 days ago | link

I've found UUID and other kinds of random primary keys to be very bad for performance once the data set no longer fits into memory. When that happens, the working set of the primary key index becomes that of the entire index, so you end up with a lot of random disk I/O. Unless you have SSDs, your database performance then goes down the drain.

Of course you can also use a UUID that's less random, e.g. something with an incrementing first part and a small random last part.

-----

jacques_chester 494 days ago | link

If you want UUIDs to double as a nonce (I use them this way), then you will need to use v4 UUIDs and yes, they are basically impossible to index because they should be normally distributed across a very large range of possibilities.

If you want incrementing UUIDs, use v1 UUIDs, which are based MAC address + a timestamp. These change in a predictable way and are more indexable.

-----

jacques_chester 494 days ago | link

s/normally distributed/uniformly distributed/

This is what happens when I try to sound cleverer than I actually am. :(

-----

sunjain 495 days ago | link

Using UUID(or anything other than sequentially increasing numbers) for primary keys is a good idea because when you have high number of concurrent inserts happening, if you have a sequentially increasing primary key(implemented via a b-tree index), the inserts into index will result in hotspot(as all the inserts will be at the end, in the same place). Whereas if it is random number (like UUID), the entry into indexes will happen at different places thus avoiding hot spots.

-----

sandGorgon 495 days ago | link

isnt a hotspot a good thing? since it is localized data, ergo less cache misses. is the complexity of an insert >>> cache miss penalty.

-----

Someone 495 days ago | link

Yes, but with many treads modifying the same structures, you end up doing a lot of locking.

So, your system would not be memory bandwidtht constrained, not because it has many threads using roughly the same memory blocks, but because it has one thread at a time working on those memory blocks.

-----

helper 495 days ago | link

You will find that UUIDs are often used as primary keys in distributed systems. By their nature you can generate them independently without worrying about collision and without any coordination overhead (e.g. querying a central authority).

-----

abalone 495 days ago | link

Instagram (a big postgres user) blogged about it. The main reasons they didn't use UUIDs: storage requirements and lack of a natural sort.

http://instagram-engineering.tumblr.com/post/10853187575/sha...

-----

saurik 495 days ago | link

Sort was a design requirement, but not a reason not to use UUIDs, which are trivially sorted by date (in fact, Cassandra not only does this, but in many contexts actually renders UUIDs as timestamps). So, while that rules out various other implementations, the issue with UDID was just the size.

What they ended up with was pretty much UUID but for a more limited deployment: the real issue is just that UUID is extreme overkill, designed to allow every individual user to generate unique identifiers (something that sounds cool, but in practice is worthless as you can't trust the clients).

-----

stock_toaster 495 days ago | link

> Sort was a design requirement, but not a reason not to use UUIDs, which are trivially sorted by date

Assuming UUID v1 right?

-----

saurik 494 days ago | link

Correct. However, that is both the generally fair assumption when dealing with use cases that actually care about UUIDs, and is the fair thing to compare Snowflake against, as Snowflake is pretty much a 64-bit limited-entropy implementation of the general concept of a v1 UUID.

-----

pvh 493 days ago | link

The Instagram guys said they didn't actually bench using a UUID + created_at column when they spoke at SFPUG, they just used their approach because it was recommended to them and seemed to work. That said, it is a pretty awesome robust mechanism, it's just arguably complex and not proven to be necessary even at their scale.

-----

jokull 494 days ago | link

I was under the impression they went with a home baked id generator was to be able to encode the logical shard id into the id. I think UUID 1 encodes info about the hardware (mac address), but their data changes physical hardware as they split their data up to new shards.

-----

saurik 493 days ago | link

Where the data ends up is not important to its identifier: the reason to have a "logical shard ID" is to provide for sequence and timestamp uniqueness.

The way to think about the problem is that at the granularity of your timestamp you lose the ability to uniquely generate identifiers across multiple nodes (on the single node, this is handled with the sequence number), so you need some kind of identifier for the instance of the generator itself: with UUIDs, they spend a relatively immense number of bits storing the computer's MAC, but if you know you know you only have a smaller-scale deployment and are able/willing to centrally plan identifiers for the deployed nodes, you can get buy with something akin to this "shard ID".

(BTW, note that a v1 UUID need not be generated with a true MAC: the specification both notes that you can just by node addresses from them for a fairly small cost, or to use fake addresses that are marked as such by setting the multicast bit. The result is that if you want to use UUID v1 off-the-shelf with "shard IDs" you are more than welcome to do so, not just in the "analogously-equivalent" sense but in the "to the letter of the spec" sense. You can find more information in Section 4.5 of the UUID specification.)

-----

abalone 487 days ago | link

Actually they also noted that it helps with easier mapping. By including the logical shard ID in the ID, they don't need to keep a giant index of IDs-to-shards to figure out which machine an ID lives on. Just a tiny mapping of logical to physical shards, which every app server instance can cache in memory.

One less moving part at the expense of wedding yourself to a prepartitioned logical shard scheme, I guess. (I wonder how painful it would be to rebucket data into a different logical shard should the need arise...)

-----

eksith 494 days ago | link

Actually don't use UUIDs.

We use something very similar to what Instagram does with Postgres : http://instagram-engineering.tumblr.com/post/10853187575/sha...

This allows far greater flexibility with regard to sharding and scale. Bigint with a function generated nextval will sustain up to 9223372036854775807. If you need more than that, your next option is Numeric, but I don't think we'll exhaust bigint any time soon.

-----

jacques_chester 494 days ago | link

You still need to correctly merge the records for aggregate reporting.

If you use UUIDs, this is easy. If you used SERIAL, you now need a scheme to intelligently merge records. Not hard when everything is in a single table. A bit tougher when you have a wholly or partly normalised schema, because now you need to come up with correct renumbering schemes for every single table which has another table holding a foreign key into it.

I can tell you from personal experience that this is not fun.

v1 UUIDs are "guaranteed" to be unique, modulo people fiddling with your MAC addresses. v4 UUIDs are random keys and the odds of collision are so infinitesimal that it's worthless to fuss about it. v4s also have the nice property that they can double as nonces, but the sucky problem that they are not that great for indexes.

Then there's stuff like replaying logs reliably on multiple machines, blah blah blah. Every instance of 186360 is indistinguishable from every other instance of 186360. So you will need additional logic to keep them straight.

But take this uuid:

69c8e822-83b0-11e2-a6a5-001f5bfffe14

The odds that this has been created anywhere else, at any point, is as good as zero. You're more likely to have an integer key collision due to cosmic radiation flipping some bits.

So no special merging / field-source tracking logic is required.

-----

eksith 493 days ago | link

I can't be too sure of the specifics, but our db admins decided against UUIDs after a few days debating and lots of coffee.

We aren't using serial, but instead using functions to generate our own nextvals with the server IDs, epoch etc..., kinda the same way as Instagram (though, obviously our databases are no where near the same size). It came down to how much logic we were comfortable implementing in the schema from the start to avoid continuing complexity in the applications themselves.

-----

jokull 494 days ago | link

I’m not sure I understand. The way Instagram do it, they ensure uniqueness across shards so reporting should not have to merge anything.

-----

malkia 494 days ago | link

I guess the point is, that you don't need to "ensure uniqueness across..."

-----

ExpiredLink 495 days ago | link

UUIDs are used as surrogate keys: http://en.wikipedia.org/wiki/Surrogate_key

-----

parfe 495 days ago | link

In a way that an integer is not?

-----

ExpiredLink 494 days ago | link

I may have misunderstood the question. A UUID is a (very long) integer.

-----

chris_wot 494 days ago | link

Integer values are also used as surrogate values. Use of a natural key, IMO, is bad practice.

-----

e12e 494 days ago | link

Why do you think natural key(s) (w/index(es) as needed) are bad practice? Data model refactoring?

-----

chris_wot 494 days ago | link

A natural key as a primary key is terrible practice because I've never even once seen a natural key that wasn't tied to business logic. Logic that potentially can change.

Don't do it, you only make it worse for yourself if you pick a natural key for a PK!

-----

bjourne 494 days ago | link

> caveat: WITH expressions are optimization boundaries

This is a pretty big caveat and one of the rare areas in which postgres does worse than other database systems. In SQL Server a non-recursive common table expression is treated by the optimizer similar to a macro - You can break up a complicated query with unions and group bys into easier to read cte:s and be confident that the optimizer will piece them all together into a query whose execution plan is equal to the original one.

But with postgres you can't because of the optimization boundaries. Its optimizer will attempt to make each cte as individually efficient as possible which can lead to a much worse execution plan overall. You can use views instead which, in contrast to cte:s, postgres optimizer can see through.

-----

gleb 494 days ago | link

It's more subtle than "worse." Apparently CTE optimization fence is part of SQL standard, and it can be useful to hand-optimize queries.

But you are right, when using CTEs for readability it tends to hurt more than help.

Interesting thread about being able to control this: http://www.postgresql.org/message-id/201209191305.44674.db@k...

-----

bjourne 494 days ago | link

Yeah, I've also heard that claim being made on the mailing list but it is probably wrong: http://dba.stackexchange.com/questions/27425/is-the-optimisa.... For writable cte:s it matters, for read-only ones the number of execution times doesn't.

-----

pvh 493 days ago | link

Sorta. It can be handy for guiding the optimizer away from a bad optimization choice but that's really a bad habit to get into.

-----

bsg75 494 days ago | link

Are there more details about this? As a regular CTE use is both Postgres and MSSQL, but was unaware of this difference, I would like to know where I may be shooting performance to hell.

-----

lobster_johnson 494 days ago | link

From the docs (http://www.postgresql.org/docs/9.2/static/queries-with.html):

> A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary sub-query. The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)

In other words, something like:

    with x as (select * from users) select * from x limit 1
will be executed identically to

    select * from users limit 1
However, a query like this:

    with x as (
      select * from users
      join images on images.id = users.image_id
    )
    select * from x limit 1
will incur a huge overhead compared to:

    select * from users
    join images on images.id = users.image_id
    limit 1
In my test, the "WITH" query had a cost of 97840 and touched 8881 buffers, compared to a cost of 132 and 41 buffers for the second query.

In other words, the planner pretty much considers the "WITH" subquery separately from the outer query, and plans them separately.

-----

stox 495 days ago | link

One note, Postgres is well over 18 years old. Postgres using SQL is 18 years old. Postgres had its own query language which was replaced with SQL in Postgres95, which in turn became PostgreSQL.

-----

rgbrenner 494 days ago | link

PostgreSQL evolved from the Ingres project at the University of California, Berkeley. In 1982, the project leader, Michael Stonebraker, left Berkeley to make a proprietary version of Ingres. He returned to Berkeley in 1985 and started a post-Ingres project.

Starting in 1986, the team published a number of papers describing the basis of the system, and by 1988 had a prototype version. The team released version 1 to a small number of users in June 1989

http://en.wikipedia.org/wiki/PostgreSQL#History

-----

pvh 495 days ago | link

Navigate with arrow keys - these are slides from a talk I gave.

-----

untog 495 days ago | link

I can't find the arrow keys on my phone :/

-----

njharman 495 days ago | link

They are renamed "swipe" on mobile devices.

-----

untog 495 days ago | link

Definitely doesn't work on Android Chrome, at least.

-----

melpomene 494 days ago | link

Workng on 4.0.2 with Chrome for me. Landscape mode and then swipe.

-----

yareally 495 days ago | link

Some keyboards on mobile have arrow keys. I know Swype does and so does swiftkey.

edit: couldn't get the keyboard to pop up on Android, but putting it into landscape mode worked swiping slides (on the stock browser with Android 4.2 [not Chrome]).

-----

yangyang 494 days ago | link

So does Hacker's Keyboard, along with ctrl, alt and a load of other useful stuff: https://play.google.com/store/apps/details?id=org.pocketwork...

-----

weaksauce 494 days ago | link

Tap works well on iOS.

-----

thepumpkin1979 495 days ago | link

In case someone is interested, I've created a ruby gem that uses hstore as backing store of multi-language text fields for Rails models. Contributions are welcome -> https://github.com/firebaseco/multilang-hstore

-----

ozataman 495 days ago | link

How were the slides generated? Looks pretty slick.

-----

phillmv 494 days ago | link

Note from someone who organizes conferences:

Do NOT use any slide deck framework that doesn't have a solid export function - jpgs, pdfs, whatever.

The videographer who will be editing your video will not be able to deal with your esoteric format - and screengrabbing every slide is horrible busy work.

-----

craigkerstiens 494 days ago | link

showoff can generate PDF's just fine

-----

pvh 494 days ago | link

awesome, would you generate a PDF of my slides then?

-----

pvh 495 days ago | link

Using Showoff, like this: https://github.com/pvh/postgres-bits

-----

mcintyre1994 495 days ago | link

How do I move through the slides? Zooming out suggests there are more slides, but there's nothing to advance through them.

Chrome console:

Failed to load resource: the server responded with a status of 500 (Internal Server Error) http://postgres-bits.herokuapp.com/js/sh_lang/sh_bash.min.js

Uncaught HTTP error: status 500 sh_main.min.js:4

-----

znowi 494 days ago | link

Yes, the bash highlight is missing and the showoff handler kicks in instead resulting in 500. Thanks to the dev mode you get complete stack trace with code preview :)

-----

craigkerstiens 495 days ago | link

You should be able to progress through the slides by using the arrow keys.

-----

mcintyre1994 495 days ago | link

I admit that was a stupid thing to miss, it works fine, sorry. The Javascript error persists though, not sure if that's affecting anything, but there's nothing noticeable. Works fine for me on Android (default browser JB) too.

-----

lysium 495 days ago | link

Interesting read! I assume, this is all not portable? Or is there similar functionality for other DB, say MySQL (not that I am aware of) or Oracle?

-----

pvh 494 days ago | link

Some of it is in the standard, some of it is PG specific. MySQL is pretty terrible and doesn't support any of it that I'm aware of. I expect Oracle has their own versions of much of this stuff, but I've made it a hobby in my life to not learn Oracle.

-----

ibejoeb 494 days ago | link

Oracle has subquery factoring (the "with" clause) and recursion; native arrays, hash maps, collections, nested tables, and other complex data types; remote database links; queues (similar to listen/notify); regex querying; attribute-level timezone support; extremely powerful window functions; text search; tons of built-in crypto (if you really insist); and most of the related things on this list.

The biggest killers (vs. postgres): no simple helpers (generate_series, for example, is achievable with simple, efficient recursion, but you have to write the whole statement); a truly antiquated client program (sqlplus); built-in json, but you do have associative arrays/hash maps and nested tables, so you can achieve exactly the same thing, and it's indexable*; and range types (meh imo).

-----

fusiongyro 494 days ago | link

It's worth noting that using some of these features bump you up into a different pricing tier. And last time I checked the usual way to achieve a recursive query was with the non-standard "CONNECT BY" syntax. I have found rlwrap does a lot to mitigate the pain of sqlplus, but I still use it (and Oracle) as little as I can possibly get away with.

-----

mercurial 494 days ago | link

Even with rlwrap, sqlplus is awful. However, you can try SQL developer, which is fairly decent GUI frontend:

http://www.oracle.com/technetwork/developer-tools/sql-develo...

-----

mickeyp 494 days ago | link

Or if you're an Emacs user you can use sqlplus.el, a very nice wrapper around SQL*Plus.

-----

fusiongyro 493 days ago | link

I actually wind up using both. But SQL Developer is pretty heavy and I prefer using psql to a GUI tool with Postgres, so I wind up in sqlplus a lot (when dealing with Oracle, which is seldom).

-----

jacques_chester 494 days ago | link

I think I've had Oracle blow up on multiple AS statements, so that's one place where Postgres would come out well in front.

Basically WITH-AS lets you declared temporary named subqueries. That is, a pretty stock standard programming technique -- give a name to a fiddly expression to simplify subsequent code. Very useful.

Oracle has lots of annoyances though. No SERIAL or IDENTITY type. Having to write the same boilerplate sequence/insert trigger/update trigger over and over and over is very tedious and a recipe for subtle errors when you forget one of them or mess it up.

No boolean type in SQL. I don't even know what to say about that.

-----

jfb 494 days ago | link

"No boolean type in SQL. I don't even know what to say about that."

Tom ties himself into unbelievable knots trying to defend this. It's the purest form of technical Stockholm Syndrome I've ever seen from a putatively intelligent person.

-----

e12e 494 days ago | link

I'm not very familiar with it, but the big Free SQL Server that everyone seems to keep forgetting, Firebird, appears to have at least some of these:

  http://www.firebirdsql.org/en/sql-conformance/
Whenever I think of Firebird, i get a little bit of a bad conscience because I've never really played with it :-)

-----

knightni 494 days ago | link

WITH and window functions are portable (and both awesome - they will change the way you write your SQL for the better). I'm not sure if WITH RECURSIVE is portable - my recollection is that oracle and db2 at least don't need/use the RECURSIVE qualifier for recursive ctes: they just work it out for you.

Time intervals are also in the SQL standard as I recall (although not in the format listed here - postgres supports standard syntax too though). Not sure how fully supported it is cross-database, though.

-----

pdog 495 days ago | link

The slides recommend using UUIDs (and not "numbers") as the primary key:

    CREATE TABLE t (
      uuid uuid PRIMARY KEY 
                DEFAULT uuid_generate_v4(), 
      name text);
I understand that rows are stored physically in primary key order. Any idea if the "uuid-ossp" module, used to generate UUID primary keys, ensures that new rows are written sequentially on the disk after existing rows (which helps for both read- and write-locality)?

-----

pvh 495 days ago | link

Postgres does not store keys in primary key order, actually, but in write order. This is an artifact of the MVCC model. There is a CLUSTER command that would do what you describe above, but it is (effectively) useless since you can't really run it without taking the database offline while the table gets rewritten.

-----

lobster_johnson 494 days ago | link

You can do online clustering with pg_repack (https://github.com/reorg/pg_repack).

Given a table T, it will:

1. Create a new empty table T'.

2. Add a trigger on T that will clone any operations to T'.

3. Copy all rows from T to T' in clustering order.

4. Atomically switch the tables so that T' becomes T.

Unlike CLUSTER, the only lock required is in step 4, when it needs an exclusive lock on the table; it cannot do the switch until all current transactions are done with T (you can specify a timeout). The only other drawback is that that it does not support TRUNCATE, CREATE INDEX, or some ALTER TABLE operations being run while it's busy in step 3.

Also, unlike PostgreSQL's own CLUSTER command, it does not need an index to order by.

We run pg_repack regularly at night to keep our databases clustered. It works really well, though I really wish Postgres could get native online clustering. SQL Server has had automatic clustering for many years, as has DB2, and Oracle has index-ordered tables.

-----

pvh 494 days ago | link

> though I really wish Postgres could get native online clustering.

Tell the community - pg_reorg could be cleaned up, improved, and merged into core... and should be!

-----

lobster_johnson 494 days ago | link

I agree, although by native clustering I meant that the Postgres engine itself should support it, much like incremental vacuum was eventually built in.

pg_repack is a fork of pg_reorg (which has not been maintained since 2011), by the way, which has become the de facto replacement. No idea if the code is in need of cleanup or not, though.

-----

fdr 495 days ago | link

The indexes, though, both benefit (locality) and suffer from (lock contention) such random identifiers.

That having been said, many data sets work with UUID just fine and the fact that such sets are easier to catenate and compare is very convenient.

Unless you actually need some sequential semantics to your records or are redlining a system where the locality of the index really helps, I prefer the non-coordinated assignment of IDs overall, as pvh points out in his slides.

Also, as other posters mention, uuid has sequential forms that can obviate some of this and reduce it to a 128-bit number...but the common form of a good-quality random one is really quite good enough until the indexes really start to get quite big, and this path is most useful if one remains disinterested in partitioning, which may be ill advised. Having to renumber (aka 'renaming', a form of one of the "two hard problems" in computer science...) everything is a big barrier to partitioning, so if one can afford to avoid renumbering, I think avoiding it is a good idea.

-----

alexanderh 494 days ago | link

Can you re-order columns yet?

http://wiki.postgresql.org/wiki/Alter_column_position

18 years, and still, nobodys implemented this yet?

Its not a total deal breaker, but jeebus christopher columbus christ. It certainly would be nice.

-----

ww520 494 days ago | link

Can you just specify the order in your query? Why do you care how the physical order is stored? Or do you even care about the physical order? Would an "apparent" logical order be sufficient?

-----

alexanderh 494 days ago | link

If you're using postgresql strictly as a programming data store, its fine. Your app will certainly abstract away the ordering of the columns

Its just nice though, if you often find yourself working on raw tables using one of the many Admin tools. Sometimes you just need it to work like Excel to be productive. MySQL makes all of that trivial. PostgreSQL obviously keeps track of the order you created the columns, so being able to edit this ordering seems trivial in my mind.

It just seems odd that there certainly is some demand there for it, yet nobodys picked up the slack in 18 years. Obviously there are workarounds. Why dont they include the work arounds as scripts with the distrobution, etc?

Maybe I will try to implement something and submit a patch. No experience in database dev though :(

Like I said though, its not a deal breaker. I'm just used to Open Source projects over-implementing features. Not under-implementing them. Especially after 18 years.

-----

ww520 494 days ago | link

So you only care about the logical ordering of the columns. It seems trivial then to add a column mapping layer in the metadata to map the user-defined apparent order to the physical order so that select * would return the order you want.

In relational algebra the ordering of the rows and the ordering of the columns are undefined. It's up to the database to arrange it in the best way possible. The ordering of rows and columns is specified when queried.

-----

e12e 494 days ago | link

  > Its just nice though, if you often find yourself working on raw tables using one of the many Admin tools. Sometimes you just need it to work like Excel to be productive.
Clearly this is a bug in the admin tool? I seem to recall MS SQL allows you to "drag columns around" -- obviously doing absolutely nothing to the database -- it's just a view. After all rows are just relational tuples -- they have no ordering.

-----

alexanderh 494 days ago | link

True, a lot of the tooling for PostgreSQL is in its infancy when compared to something like MySQL. This problem definitely could be abstracted away by the tool. But because PostgreSQL doesnt support it natively, a side effect has been that many of the popular tools, comparable to ones in MySQL, dont have this feature either.

Anyone have any good suggestions for web frontend to PostgreSQL that supports reordering of columns as easily as MySQL? phpPgAdmin doesnt seem to compare to phpMyAdmin.

Admittedly I probably should be using a more robust OS native PostgreSQL admin tool, and not a web fronted. But alas, all of this is solved with MySQL quite nicely. I would like to see PostgreSQL catch up.

-----

jaytaylor 494 days ago | link

Sure it'd be nice.. but really, what's the difference? The sooner you stop worrying about that the sooner you can start building awesome shit with it!

-----

JoachimSchipper 494 days ago | link

Not the most elegant solution, but a view plus some triggers will let you create a fully functional "table" with any column order you please.

-----

jacques_chester 494 days ago | link

Oracle and SQL Server can't do this either.

Most databases are, by default, row stores. Moving columns around on disk kinda sucks in such situations, and there's no need to when you can specify results in any order by naming fields.

You name fields in your queries, right? Right?

-----

alexanderh 494 days ago | link

I know, I know. Its a minor gripe. But think about how trivial something like this would be to implement. And with the power of an Open Source project as big as Postegre, just seems odd nobody's had a weekend to knock it out. Here we are discussing all these cool advanced features, and it doesnt even have something as simple as this :P

I'll certainly try, if I thought they'd ever accept a patch from a total newb.

PostgreSQL obviously keeps track of the column order, as far as the order you created them in goes. So being able to edit this order doesnt seem like that tall of an order. Would be a pretty convenient feature that many other DB's do infact have.

Its just a nicety that PostgreSQL should have, if its hoping to win over people who are familiar with MySQL, which does sorta seem like its goal these days.

-----

pilif 494 days ago | link

Trivial? The rows are ordered in creation order because that's how they are stored on disk. So to be able to move columns around, you either have to rewrite the whole table (needing a long-lasting exclusive lock which most of the schematic altering operations in postgres do not require) or you introduce another abstraction to keep track of the order which you then have to use for every query to get the correct position within the row you just read.

This always means more complexity and a performance overhead and frankly provides next to zero value because columns are named in queries anyways (or the order becomes meaningless when joining)

-----

jacques_chester 494 days ago | link

I'm going to guess that MySQL can do this largely as a side-effect of swappable engines.

There is, however, a lot to be said for close coupling between query planning, on-disk format, indexing and so forth.

edit: as for patching, the PostgreSQL source is some of the best I've ever read. Carefully organised, fastidiously documented, immaculately consistent to coding standards. It's a delight.

-----

bsg75 494 days ago | link

Where is this an issue, where you dont just specify column order in a query? If from something like Excel, why not use a passthrough query.

Does any major RDBMS support this?

-----

michaelmior 495 days ago | link

Fantastic! I'm mostly a MySQL user, but the more I read about Postgres, the more I like.

-----

jaytaylor 494 days ago | link

I used to be really, really into MySQL, but since getting into Postgres I've come around to preferring it. They're both good.. but anecdotally Postgres is more developer friendly and generally easier to deal with in production. Postgres also comes with features that blow MySQL away in terms of flexibility and letting you do truly cool stuff. One example: Advanced query optimization with partial or reverse indexes.

-----

crypto5 494 days ago | link

One of the things I am missing in postgresql is compressed tables. It's actually big game changer in some cases.

-----

throwawayG9 495 days ago | link

I moved away from MySQL to PostgreSQL recently, you can do it too =). Just be sure to read about its differences and how to tune it for performance.

-----

michaelmior 494 days ago | link

That's my biggest concern about making the switch. I've gotten used to the performance characteristics of MySQL and some of it's quirks. Some small side projects on Heroku have been pushing me to learn more Postgres and perhaps I'll use it for something more important soon.

-----

thomseddon 495 days ago | link

Shame you can't view it on mobile (iPhone), saved for later.

-----

jnazario 495 days ago | link

really really great stuff, thank you for posting. i had no idea about a lot of these. and you note that the source is still pretty clean after all these years.

-----

Ensorceled 494 days ago | link

It looks like dozens, maybe hundreds, of people all had problem figuring out how to see the next slide.

Do the math, that's, cumulatively, probably an hour or two of life wasted perhaps more.

I expect webpages to have a mouse based interface, not a keyboard based interface and I'm not alone. It's not churlish of us to complain about our wasted time.

Interesting! And, as a postgresql lover, thanks for posting.

-----

pvh 494 days ago | link

Glad you enjoyed the slides. I put a comment about navigation on the first slide. Hopefully that will help.

-----

Ensorceled 494 days ago | link

Me too!

I absolutely learned a ton from it ... I hope other Postgres fans didn't just walk away.

-----

np422 495 days ago | link

This was a really good presentation, I've worked with postgres and enterprisedb on and off for many years but I still managed to learn a few new things.

-----

lysium 495 days ago | link

Hm, how is that site supposed to work? Looks like there should be slide, but I can only see the title page.

-----

More



Guidelines | FAQ | Lists | RSS | Bookmarklet | DMCA | News News | Bugs and Feature Requests | Y Combinator | Apply | Library | Contact

Search: