
Postgres: The Bits You Haven't Found - craigkerstiens
http://postgres-bits.herokuapp.com/#1
======
parfe
The slides recommend UUIDs as a primary key ("just use UUIDs. seriously"). I
took a look at the manual and there isn't too much of a Why (as well as 4
different versions of UUID to choose from). Anyone have a write up that
explores UUIDs vs integer primary keys? I was only able to find people asking
permission ("I have X with a UUID, can I use it as a pk?) rather than people
talking about Why you would prefer UUID as the default.

Sidenote: Literally any button press you could reasonably expect to move to
another slide works. PgDown, Right, Down, Space. Stop shitting up a decent
submission to whine about it.

~~~
ExpiredLink
UUIDs are used as surrogate keys: <http://en.wikipedia.org/wiki/Surrogate_key>

~~~
chris_wot
Integer values are also used as surrogate values. Use of a natural key, IMO,
is bad practice.

~~~
e12e
Why do you think natural key(s) (w/index(es) as needed) are bad practice? Data
model refactoring?

~~~
chris_wot
A natural key as a primary key is terrible practice because I've never even
once seen a natural key that wasn't tied to business logic. Logic that
potentially can change.

Don't do it, you only make it worse for yourself if you pick a natural key for
a PK!

------
stox
One note, Postgres is well over 18 years old. Postgres using SQL is 18 years
old. Postgres had its own query language which was replaced with SQL in
Postgres95, which in turn became PostgreSQL.

~~~
rgbrenner
_PostgreSQL evolved from the Ingres project at the University of California,
Berkeley. In 1982, the project leader, Michael Stonebraker, left Berkeley to
make a proprietary version of Ingres. He returned to Berkeley in 1985 and
started a post-Ingres project.

Starting in 1986, the team published a number of papers describing the basis
of the system, and by 1988 had a prototype version. The team released version
1 to a small number of users in June 1989_

<http://en.wikipedia.org/wiki/PostgreSQL#History>

------
bjourne
> caveat: WITH expressions are optimization boundaries

This is a pretty big caveat and one of the rare areas in which postgres does
worse than other database systems. In SQL Server a non-recursive common table
expression is treated by the optimizer similar to a macro - You can break up a
complicated query with unions and group bys into easier to read cte:s and be
confident that the optimizer will piece them all together into a query whose
execution plan is equal to the original one.

But with postgres you can't because of the optimization boundaries. Its
optimizer will attempt to make each cte as individually efficient as possible
which can lead to a much worse execution plan overall. You can use views
instead which, in contrast to cte:s, postgres optimizer can see through.

~~~
gleb
It's more subtle than "worse." Apparently CTE optimization fence is part of
SQL standard, and it can be useful to hand-optimize queries.

But you are right, when using CTEs for readability it tends to hurt more than
help.

Interesting thread about being able to control this:
[http://www.postgresql.org/message-
id/201209191305.44674.db@k...](http://www.postgresql.org/message-
id/201209191305.44674.db@kavod.com)

~~~
bjourne
Yeah, I've also heard that claim being made on the mailing list but it is
probably wrong: [http://dba.stackexchange.com/questions/27425/is-the-
optimisa...](http://dba.stackexchange.com/questions/27425/is-the-optimisation-
fence-behaviour-of-a-cte-with-query-specified-in-the-sql2). For writable cte:s
it matters, for read-only ones the number of execution times doesn't.

~~~
pvh
Sorta. It can be handy for guiding the optimizer away from a bad optimization
choice but that's really a bad habit to get into.

------
pvh
Navigate with arrow keys - these are slides from a talk I gave.

~~~
untog
I can't find the arrow keys on my phone :/

~~~
njharman
They are renamed "swipe" on mobile devices.

~~~
untog
Definitely doesn't work on Android Chrome, at least.

~~~
melpomene
Workng on 4.0.2 with Chrome for me. Landscape mode and then swipe.

------
lysium
Interesting read! I assume, this is all not portable? Or is there similar
functionality for other DB, say MySQL (not that I am aware of) or Oracle?

~~~
pvh
Some of it is in the standard, some of it is PG specific. MySQL is pretty
terrible and doesn't support any of it that I'm aware of. I expect Oracle has
their own versions of much of this stuff, but I've made it a hobby in my life
to not learn Oracle.

~~~
ibejoeb
Oracle has subquery factoring (the "with" clause) and recursion; native
arrays, hash maps, collections, nested tables, and other complex data types;
remote database links; queues (similar to listen/notify); regex querying;
attribute-level timezone support; extremely powerful window functions; text
search; tons of built-in crypto (if you really insist); and most of the
related things on this list.

The biggest killers (vs. postgres): no simple helpers (generate_series, for
example, is achievable with simple, efficient recursion, but you have to write
the whole statement); a truly antiquated client program (sql _plus); built-in
json, but you do have associative arrays/hash maps and nested tables, so you
can achieve exactly the same thing, and it's_ indexable*; and range types (meh
imo).

~~~
jacques_chester
I think I've had Oracle blow up on multiple AS statements, so that's one place
where Postgres would come out well in front.

Basically WITH-AS lets you declared temporary named subqueries. That is, a
pretty stock standard programming technique -- give a name to a fiddly
expression to simplify subsequent code. Very useful.

Oracle has lots of annoyances though. No SERIAL or IDENTITY type. Having to
write the same boilerplate sequence/insert trigger/update trigger over and
over and over is very tedious and a recipe for subtle errors when you forget
one of them or mess it up.

No boolean type in SQL. I don't even know what to say about that.

~~~
jfb
"No boolean type in SQL. I don't even know what to say about that."

Tom ties himself into unbelievable knots trying to defend this. It's the
purest form of technical Stockholm Syndrome I've ever seen from a putatively
intelligent person.

------
mcintyre1994
How do I move through the slides? Zooming out suggests there are more slides,
but there's nothing to advance through them.

Chrome console:

Failed to load resource: the server responded with a status of 500 (Internal
Server Error) <http://postgres-bits.herokuapp.com/js/sh_lang/sh_bash.min.js>

Uncaught HTTP error: status 500 sh_main.min.js:4

~~~
craigkerstiens
You should be able to progress through the slides by using the arrow keys.

~~~
mcintyre1994
I admit that was a stupid thing to miss, it works fine, sorry. The Javascript
error persists though, not sure if that's affecting anything, but there's
nothing noticeable. Works fine for me on Android (default browser JB) too.

------
thepumpkin1979
In case someone is interested, I've created a ruby gem that uses hstore as
backing store of multi-language text fields for Rails models. Contributions
are welcome -> <https://github.com/firebaseco/multilang-hstore>

------
pdog
The slides recommend using UUIDs (and not "numbers") as the primary key:

    
    
        CREATE TABLE t (
          uuid uuid PRIMARY KEY 
                    DEFAULT uuid_generate_v4(), 
          name text);
    

I understand that rows are stored physically in primary key order. Any idea if
the "uuid-ossp" module, used to generate UUID primary keys, ensures that new
rows are written sequentially on the disk after existing rows (which helps for
both read- and write-locality)?

~~~
pvh
Postgres does not store keys in primary key order, actually, but in write
order. This is an artifact of the MVCC model. There is a CLUSTER command that
would do what you describe above, but it is (effectively) useless since you
can't really run it without taking the database offline while the table gets
rewritten.

~~~
atombender
You can do online clustering with pg_repack
(<https://github.com/reorg/pg_repack>).

Given a table T, it will:

1\. Create a new empty table T'.

2\. Add a trigger on T that will clone any operations to T'.

3\. Copy all rows from T to T' in clustering order.

4\. Atomically switch the tables so that T' becomes T.

Unlike CLUSTER, the only lock required is in step 4, when it needs an
exclusive lock on the table; it cannot do the switch until all current
transactions are done with T (you can specify a timeout). The only other
drawback is that that it does not support TRUNCATE, CREATE INDEX, or some
ALTER TABLE operations being run while it's busy in step 3.

Also, unlike PostgreSQL's own CLUSTER command, it does not need an index to
order by.

We run pg_repack regularly at night to keep our databases clustered. It works
really well, though I really wish Postgres could get native online clustering.
SQL Server has had automatic clustering for many years, as has DB2, and Oracle
has index-ordered tables.

~~~
pvh
> though I really wish Postgres could get native online clustering.

Tell the community - pg_reorg could be cleaned up, improved, and merged into
core... and should be!

~~~
lobster_johnson
I agree, although by native clustering I meant that the Postgres engine itself
should support it, much like incremental vacuum was eventually built in.

pg_repack is a fork of pg_reorg (which has not been maintained since 2011), by
the way, which has become the de facto replacement. No idea if the code is in
need of cleanup or not, though.

------
ozataman
How were the slides generated? Looks pretty slick.

~~~
phillmv
Note from someone who organizes conferences:

Do NOT use any slide deck framework that doesn't have a solid export function
- jpgs, pdfs, whatever.

The videographer who will be editing your video will not be able to deal with
your esoteric format - and screengrabbing every slide is horrible busy work.

~~~
craigkerstiens
showoff can generate PDF's just fine

~~~
pvh
awesome, would you generate a PDF of my slides then?

------
michaelmior
Fantastic! I'm mostly a MySQL user, but the more I read about Postgres, the
more I like.

~~~
jaytaylor
I used to be really, really into MySQL, but since getting into Postgres I've
come around to preferring it. They're both good.. but anecdotally Postgres is
more developer friendly and generally easier to deal with in production.
Postgres also comes with features that blow MySQL away in terms of flexibility
and letting you do truly cool stuff. One example: Advanced query optimization
with partial or reverse indexes.

~~~
crypto5
One of the things I am missing in postgresql is compressed tables. It's
actually big game changer in some cases.

------
jnazario
really really great stuff, thank you for posting. i had no idea about a lot of
these. and you note that the source is still pretty clean after all these
years.

------
alexanderh
Can you re-order columns yet?

<http://wiki.postgresql.org/wiki/Alter_column_position>

18 years, and still, nobodys implemented this yet?

Its not a total deal breaker, but jeebus christopher columbus christ. It
certainly would be nice.

~~~
ww520
Can you just specify the order in your query? Why do you care how the physical
order is stored? Or do you even care about the physical order? Would an
"apparent" logical order be sufficient?

~~~
alexanderh
If you're using postgresql strictly as a programming data store, its fine.
Your app will certainly abstract away the ordering of the columns

Its just nice though, if you often find yourself working on raw tables using
one of the many Admin tools. Sometimes you just need it to work like Excel to
be productive. MySQL makes all of that trivial. PostgreSQL obviously keeps
track of the order you created the columns, so being able to edit this
ordering seems trivial in my mind.

It just seems odd that there certainly is _some_ demand there for it, yet
nobodys picked up the slack in 18 years. Obviously there are workarounds. Why
dont they include the work arounds as scripts with the distrobution, etc?

Maybe I will try to implement something and submit a patch. No experience in
database dev though :(

Like I said though, its not a deal breaker. I'm just used to Open Source
projects over-implementing features. Not under-implementing them. Especially
after 18 years.

~~~
e12e

      > Its just nice though, if you often find yourself working on raw tables using one of the many Admin tools. Sometimes you just need it to work like Excel to be productive.
    

Clearly this is a bug in the admin tool? I seem to recall MS SQL allows you to
"drag columns around" -- obviously doing absolutely nothing to the database --
it's just a view. After all rows are just relational tuples -- they have no
ordering.

~~~
alexanderh
True, a lot of the tooling for PostgreSQL is in its infancy when compared to
something like MySQL. This problem definitely could be abstracted away by the
tool. But because PostgreSQL doesnt support it natively, a side effect has
been that many of the popular tools, comparable to ones in MySQL, dont have
this feature either.

Anyone have any good suggestions for web frontend to PostgreSQL that supports
reordering of columns as easily as MySQL? phpPgAdmin doesnt seem to compare to
phpMyAdmin.

Admittedly I probably should be using a more robust OS native PostgreSQL admin
tool, and not a web fronted. But alas, all of this is solved with MySQL quite
nicely. I would like to see PostgreSQL catch up.

------
thomseddon
Shame you can't view it on mobile (iPhone), saved for later.

------
np422
This was a really good presentation, I've worked with postgres and
enterprisedb on and off for many years but I still managed to learn a few new
things.

------
lysium
Hm, how is that site supposed to work? Looks like there should be slide, but I
can only see the title page.

------
Ensorceled
It looks like dozens, maybe hundreds, of people all had problem figuring out
how to see the next slide.

Do the math, that's, cumulatively, probably an hour or two of life wasted
perhaps more.

I expect webpages to have a mouse based interface, not a keyboard based
interface and I'm not alone. It's not churlish of us to complain about our
wasted time.

Interesting! And, as a postgresql lover, thanks for posting.

~~~
pvh
Glad you enjoyed the slides. I put a comment about navigation on the first
slide. Hopefully that will help.

~~~
Ensorceled
Me too!

I absolutely learned a ton from it ... I hope other Postgres fans didn't just
walk away.

------
TommyDANGerous
awesome read, learned a lot.

------
codgercoder
yet another example of "responsive interface" meaning "designed for a phone or
a tablet"

~~~
MBCook
I'm on my iPhone and its totally unviewable. It looks like I can see maybe 20%
of a slide, with parts if that covered up by black bars. I can't resize or re-
zoom.

Total disaster.

~~~
res0nat0r
Calm down and view it via your browser on your PC.

------
moron4hire
how about we just stop using keys that have no semantic meaning to their data,
period?

~~~
LaGrange
Not sure if you're trolling, but just in case you're not: because any key that
has semantic meaning to the data sooner or later is proven to be neither
unique, always present nor constant.

~~~
jfb
If a selected key isn't unique, it's a failure of the data model.

~~~
jacques_chester
It's a failure of clairvoyance, you mean.

Reality is more complex than any model.

That's _why_ we build models. They are, by their very nature, _simpler_ than
reality.

~~~
jfb
Yeah, true. A model can be valid for some subset of data for some fixed period
of time. The real failure is when the model isn't valid even for the defined
period. Skill at articulating the limitations and implicit assumptions of any
model is far too rare.

~~~
jacques_chester
That's one thing I found really useful in reading Kimball's books -- the
concept of fixed, slowly changing and quickly changing dimensions.

Everything is in flux, and you can choose to model that or not model it.

At the first level, you just model the current state of the system.

Then you begin to model the events that change the current state.

Then you begin to model changes in the shape of the problem domain.

Models all the way down ...

