
Postgres Demystified - craigkerstiens
https://speakerdeck.com/craigkerstiens/postgres-demystified
======
radimm
You can watch the session here <http://www.youtube.com/watch?v=3wwRmmd0P3k>

~~~
craigkerstiens
There's also a version available here -
<http://www.infoq.com/presentations/Postgres-Introduction>

~~~
johnwards
and a 28 minute version in HD here: <http://vimeo.com/56682925>

(Hey Craig! ;))

~~~
cheapsteak
Thank you. This one has the best audio and video quality for sure.

------
viraptor
From the datatype wish list, these look like a bad idea actually:

\- url - If all schemes are allowed, this is almost equivalent to "does it
include :// after the first word". I'm not sure why it should be included.

\- phone - Around the world this is equivalent to: does it contain digits. And
maybe also + # and star. And maybe also letters for internal voip phones.
And....

\- zip - Every country has its own. There is no common format. It can have
letters, numbers, symbols, ... anything.

\- email - This actually could work if it allowed all possible formats.

~~~
jacques_chester
PostgreSQL handles UUIDs, IP addresses and booleans quite intelligently. Where
appropriate they include specialised functions that you can use in queries,
and the formats for input are generous so long as your input is unambiguous.

For example, the boolean type in PostgreSQL will accept TRUE, 'true', 't',
'1', 'on', 'y' and 'yes' (case insensitive) for truthful values and the
equivalents for false values. It's hard to state how much better this is than
Oracle, to take my work environment as an example.

My point is that I imagine that when the PostgreSQL team get around to these
types, they will address them with the same care and forethought as they have
for everything else. I trust them because their feature plans are not based on
maximising shininess.

~~~
wereHamster
> For example, the boolean type in PostgreSQL will accept TRUE, 'true', 't',
> '1', 'on', 'y' and 'yes' (case insensitive) for truthful values and the
> equivalents for false values.

But does it support 'affirmative', 'correct', 'aye'
(<http://thedailywtf.com/Articles/Are-You-Sure.aspx>) or 'YOUBET', 'YEP',
'NOWay', 'YEAH', 'NOPE', 'Maybe' ([http://thedailywtf.com/Articles/The-Object-
Test,-a-New-PI,-a...](http://thedailywtf.com/Articles/The-Object-Test,-a-New-
PI,-and-More-SHEEIT.aspx)) ?

Note the mapping of 'Maybe' to a boolean through '__TIME__[7]&1'!

------
ysangkok
I don't like how timetz was highlighted. From the PostgreSQL docs[1]: "Time
zones in the real world have little meaning unless associated with a date as
well as a time, since the offset can vary through the year with daylight-
saving time boundaries."

Also, I quote: "It's Postgres or Postgres-Que-Ell, not Postgre-Es-Que-Ell".
And then a slide shows "PostgresQL" while <http://postgresql.org> shows
"PostgreSQL". That pronounciation makes no sense to me if the slide is
correct.

[1]: [http://www.postgresql.org/docs/9.2/static/datatype-
datetime....](http://www.postgresql.org/docs/9.2/static/datatype-
datetime.html#DATATYPE-TIMEZONES)

~~~
ibotty
that's a typo.

------
rogerbinns
I'm waiting for the day when databases figure out their own damn indexes. They
are sitting there with billions of CPU cycles per second, trillions of bytes
of RAM and trillions of bytes of storage. Yet someone has to go in and babysit
the thing to tell it which indexes to create.

(Yes I know the perfect dbadmin can make the perfect indexes. But I expect the
database to figure it out itself and rarely be wrong.)

~~~
ysangkok
Sounds like you need strong AI. :) But really, isn't that like wishing the
compiler would find out if you need a priority queue or a stack? Or rewriting
your n² sort automatically to n log n?

~~~
rogerbinns
No, it is more like the compiler working out that when you say x=0 whether it
should be done by loading a literal, using a zero register or xorring the
register with itself.

Note that you don't actually need strong AI. Look at what MongoDB does
[http://docs.mongodb.org/manual/core/read-
operations/#query-o...](http://docs.mongodb.org/manual/core/read-
operations/#query-optimization)

Essentially it is dumb in that it tries multiple query plans concurrently,
figuring out which was most efficient and using that in the future, monitoring
its expected performance and trying all candidates again when it deviates.

With modern systems having such a surplus or CPU, RAM and storage a dumb
system could try multiple indexing strategies, and work out which worked best.

------
baubu
The biggest problem with PostgreSQL is the lack of a REPLACE statement and in
general first-class "upsert" support.

This is the most fundamental mutation operation on dictionary, in all
programming languages, denoted by:

map[key] = value

It's truly mystifying how something so basic could be missing from something
that is supposed to be a better system for storing data.

It also doesn't even fully implement the SQL standards, missing for example
the MERGE statement in SQL _2003_ standard, which is also utterly baffling.

Note that all databases have catastrophic issues like these, it's just
ridiculous how they can be in this terrible state.

~~~
jacques_chester
I like MERGE too, in the proper place. I'm not sure that the absence of MERGE
constitutes "this terrible state".

It would be like a commercial RDBMS lacking boolean and serial types,
necessitating reams and reams of repetitive, error-prone boilerplate
constraints and insert/update triggers.

~~~
mercurial
Yeah, but boy, they're sure going to bill you for it.

------
fjordan
One feature I've missed while using Postgres is the built-in REPLACE INTO that
MySQL provides. I know it's possible to create a function to accomplish this
in Postgres for specific cases, but is there some general variant which could
be applied to multiple tables?

~~~
3amOpsGuy
A delete followed by an insert in the one transaction - break into batches for
a huge table?

That's really simplistic but I don't know the use case.

~~~
jeltz
That is not enough to solve the general case due to locking considerations.

If we have one transaction running delete and then insert concurrently with
one transaction doing a normal update then the update will find zero rows to
update if it is ran after the delete+insert. The inserted row will be
invisible to the update while the old row will be locked. This means the
update will wait until the other transaction is committed and then find
neither the old nor the new row.

A correct REPLACE implementation would make sure the UPDATE would UPDATE the
replaced row.

The above is assuming READ COMMITTED isolation level.

~~~
adrianmsmith
Interesting, PostgreSQL works just like you said.

However MySQL doesn't. I've just tried it, and MySQL blocks the UPDATE until
the first transaction (DELETE, INSERT) is finished (like PostgreSQL), but
after the commit, the UPDATE statement updates the newly inserted row.

It seems MySQL firstly searches the table for the row and then attempts to
acquire a lock on it (thus waits for the DELETE, INSERT to finish, as it's
locked the row being DELETEd), and, after the lock has been acquired, searches
the table _again_ for rows matching the query in order to perform it.

~~~
jeltz
Interesting, I did not know this.

Did you run MySQL in READ COMMITTED or REPEATABLE READ? And would it matter?

------
neya
It's amazing how the newcomers to tech seem to ignore 'proven' databases like
PostgreSQL so easily that have actually stood the test of time and actually
have the _real_ capacity to scale enormously on demand (which when done
right). What is more disappointing is how the newcomers seem to trade in for
beta-quality NoSQL databases that promise them of better reliability and
scalability out of the box, which is simply not true[1]. For instance,
databases like MongoDB promise the developer to not worry about scalability
and focus just on their code, while it does some things under the hood in a
terrible way [2].

PostgreSQL is one of the few databases that supports SQL and can actually be
less painful to scale than MySQL. Also, the recent versions have enormous
performance boost that can silence the SQL critics, IMO.

If you are interested in testing out PostgreSQL for your project, you might
want to develop on the WAPP stack and give it a shot:

<http://bitnami.com/stack/wapp>

Disclaimer:

I was a mongoDB developer too and I loved it so much. I loved the fact that I
could skip running db migrations on my rails app and I can focus on just
coding, instead of worrying about scaling. But, it was a terrible trade off.
First of all, you can't use NoSQL for everything. It was a very painful lesson
that I learned the wrong way. 99% of the time, you want to use SQL, because
most use cases can be executed perfectly with SQL db's. Next, some db's have a
terrible architecture and design decisions under the hood, mainly to improve
their benchmarks (seriously!). For example, turning off write-safe by
default... :cough: :cough: Last but not the least, SQL for the wrong use case
complicates things and ends up adding redundancy into the database (like
storing User details inside multiple collections, un-avoidable, especially
when you are against embedding documents). The worst trade-off using a NoSQL
database is the analytics part - "Hey db, show me the list of users who are
from the United States and who have subscribed to Plan X and who are the
highest paying customers and who love chocolate pie" SQL - "Here you go."
NoSQL - "Sorry sir, not possible. Possible, but possibly a nightmare."

If you want to build a 'scalable' app for your next big thing and don't want
the scaling hassles of MySQL, which actually scales extremely well already[3],
you can use PostgreSQL.

Cheers

[1]Scaling is _hard_. Don't let anyone else tell you otherwise!

[2] [http://blog.engineering.kiip.me/post/20988881092/a-year-
with...](http://blog.engineering.kiip.me/post/20988881092/a-year-with-mongodb)

[3] [http://www.quora.com/Quora-Infrastructure/Why-does-Quora-
use...](http://www.quora.com/Quora-Infrastructure/Why-does-Quora-use-MySQL-as-
the-data-store-instead-of-NoSQLs-such-as-Cassandra-MongoDB-or-CouchDB)

~~~
jacques_chester
You must always pay the development complexity piper.

SQL databases start off difficult but pay dividends on the back end. NoSQL
databases start of easy but begin to extract their costs later.

Similarly, a well-normalised database is tuned to write-and-store performance,
but the cost of joins can crush you at query time.

What's usually missing from discussions like these is highlighting the
difference between OLTP and OLAP.

You favour one or the other. TANSTAAFL.

~~~
yen223
TANSTAAFL = "There ain't no such thing as a free lunch", for those who are
wondering.

~~~
neya
Thanks! I actually was really wondering what it meant :D

------
dude_abides
Loved this line: "Why Postgres?" "It is the emacs of databases"

~~~
Ziomislaw
psh~ it's the vim of databases ;)

~~~
jacques_chester
No, really. It's an emacs. Features upon features.

SQLite might be the vim of databases.

Actually, BDB hash files.

~~~
Smudge
And MS SQL Server Compact Edition is the Wordpad of databases.

(I'll just see myself out...)

------
T-R
Does Heroku Postgres support PLV8 now? In the questions section of the
presentation, he doesn't mention it as unsupported, but as recently as Sept
2012, it wasn't [1] - has that changed?

[1] [http://stackoverflow.com/questions/12593080/heroku-hosted-
po...](http://stackoverflow.com/questions/12593080/heroku-hosted-postgresql-
and-plv8)

~~~
craigkerstiens
If you have a newly provisioned Heroku Postgres database try: create extension
plv8;

------
morganwilde
If this was meant to demystify something, then I completely missed it. Since
I'm new to PostgreSQL, I was hoping to get a birds eye view from someone that
knows his stuff, but what I got was keyword clouds. Could that be anything
else but mystifying?

~~~
mercurial
That's more a presentation of some not so well-known features and extensions
found in Postgres than something for newcomers. In any case, watch the video
linked in a comment above instead of reading the slides.

~~~
house9-2
just watched the video - good stuff, he also mentions these two resources at
the end \- <http://www.postgresguide.com/>

\- <http://www.youtube.com/watch?v=3yhfW1BDOSQ> (Christophe Pettus "PostgreSQL
when it's not your job.")

------
asax
I have spent quite a few years working with MS SQL Server, but have only
dabbled with Postgres. Can somebody recommend a good Postgres book or tutorial
for people like me?

~~~
zorlem
One of the best resources for PostgreSQL is the online documentation - I've
hardly seen a F/OSS project with a better written documentation. The only
drawback is that it is quite verbose and it's easy to get lost. You might want
to start with the official tutorial [1] and might also check the dated "MS-SQL
to PostgreSQL Migration" (from 2001) [2] which still provides some nice
transition tips.

Make sure to check the "Postgres Guide" [3] which is a recent addition and is
excellent to get you started quickly.

The quick intro by Packt Publishing [4] looks useful and concise as well, but
is not as good as the Postgres Guide.

Since you're familiar with the RDBMS concepts I would suggest you take a look
at the nice "PostgreSQL 9 Admin Cookbook"[5] which has a solid collection of
recipes that you can use to learn the specifics of the DB and a bunch of nice
features.

[1] <http://www.postgresql.org/docs/9.2/static/tutorial.html>

[2]
[http://wiki.postgresql.org/wiki/Microsoft_SQL_Server_to_Post...](http://wiki.postgresql.org/wiki/Microsoft_SQL_Server_to_PostgreSQL_Migration_by_Ian_Harding)

[3] <http://www.postgresguide.com/>

[4] <http://www.packtpub.com/article/introduction-to-postgresql-9>

[5] <http://www.amazon.com/dp/1849510288/>

_edit: fixed the link to postgres guide._

~~~
asax
Thanks for that! Much appreciated!

