
Modern SQL in PostgreSQL - lelf
http://www.slideshare.net/MarkusWinand/modern-sql
======
rosser
One of my favorite idioms lately for doing bulk updates without incurring lock
contention is to chain CTEs, like so:

    
    
      with candidate_rows as (
         select id
           from table
          where conditions
          limit 1000
            for update nowait
      ), update_rows as (
         update table
            set column = value
           from candidate_rows
          where candidate_rows.id = table.id
      returning table.id
      )
      select count(1) from update_rows;
    

...and loop on issuing that query until the "count(1)" returns zero for some
number of consecutive iterations (three works pretty well).

Want to add a column to your "orders" table and populate it without blocking
concurrent writes for as long as it will take to rewrite a multi-million row
table? Want to re-hash user passwords using something stronger than MD5, but
not prevent users from ... you know, _logging in_ for the duration?
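As a concrete (hypothetical) instance of the pattern for the password-rehash
case — column names are made up, and crypt()/gen_salt() assume the pgcrypto
extension is installed:

    
    
      -- Re-hash MD5 passwords with bcrypt, 1000 rows per transaction.
      -- "for update nowait" errors out (rather than blocks) if another
      -- session holds a lock on a candidate row, which is why the driver
      -- loops.  Column names are illustrative.
      with candidate_rows as (
         select id
           from users
          where pw_bcrypt is null
          limit 1000
            for update nowait
      ), update_rows as (
         update users
            set pw_bcrypt = crypt(pw_md5, gen_salt('bf'))
           from candidate_rows
          where candidate_rows.id = users.id
      returning users.id
      )
      select count(1) from update_rows;
    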

CTEs are all that and the bag of chips.

~~~
lastofus
Do you have any resources for learning cool advanced SQL like this? I've used
CTEs in the past, just from reading about them in the documentation, but I
feel like I'm only scratching the surface of what's possible out there.

~~~
mahmud
The author of the slides, Markus Winand, wrote the book "SQL Performance
Explained".

~~~
xtrumanx
The book appears to be the contents of use-the-index-luke.com. My friend
impulse-bought it when I told him I was planning on buying it after confirming
that it had more content than the website, but after skimming the table of
contents it doesn't appear to have anything more than the website.

Still, it's good to support Markus' effort, and if you prefer having a PDF
rather than going to the website, just buy it. It's like $15.

~~~
fatalmind
Quoting from [http://Use-The-Index-Luke.com/](http://Use-The-Index-Luke.com/)
(the main page):

Use The Index, Luke is the free web-edition of SQL Performance Explained. If
you like this site, consider getting the book. Also have a look at the shop
for other cool stuff that supports this site.

[http://use-the-index-luke.com/shop](http://use-the-index-luke.com/shop)

------
nroose
I was interested in NoSQL. Went to a Mongo pres. Both examples would have been
easier and faster in SQL. Even the SQL I used 20 years ago. I asked for an
example that would show a performance advantage. I got a tired, vague statement
about the supposed performance advantage. Seems like snake oil to me.

~~~
cgh
Unfortunately you were downvoted, but it wasn't so long ago on HN that every
second story was NoSQL this, NoSQL that. There were even "SQL is
dead"/"relational DBs are dead" posts, just ridiculous. So it's nice to see
stories like this.

I've been writing a lot of recursive queries for Postgresql lately using CTEs.
Quite cool though a little mindbending at times.
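For anyone who hasn't tried one, the basic shape of a recursive CTE looks like
this — an anchor row plus a step that joins back onto the CTE itself (table and
column names below are illustrative, not from the article):

    
    
      -- Walk a hypothetical adjacency-list tree top-down, tracking depth.
      with recursive subordinates as (
         select id, manager_id, name, 1 as depth
           from employees
          where manager_id is null            -- anchor: the root
          union all
         select e.id, e.manager_id, e.name, s.depth + 1
           from employees e
           join subordinates s
             on e.manager_id = s.id           -- recursive step
      )
      select name, depth from subordinates order by depth;
    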

~~~
crdoconnor
HN is very fad-oriented.

~~~
andybak
Except some fads turn out to be long-term trends, so it's worth keeping one
eye open.

~~~
crdoconnor
Absolutely, but it pays to be very skeptical and to pay closer attention to
the negative "I have used this technology and it sucks because [links to the
bug tracker]" articles than the ones that gush about how great they are.

~~~
davidgerard
There's probably a trendy Hacker News technology lifestyle cycle chart to be
drawn, step 3 or 4 of which is "developer is bitten by deficiency in the
technology, writes blog post saying it sucks and not to use it, gets 200
points and front page".

------
dtech
MySQL (and thus MariaDB, I presume) and SQLite seem to be pretty poor at
supporting these "new" features.

SQLite seems logical because it needs to be kept lean for embedding purposes,
but do people know why MySQL is lagging behind so much?

~~~
barrkel
My experience with PostgreSQL and MySQL is that PostgreSQL supports lots of
features, but performance needs a very careful eye - often a neat feature
isn't as useful as it seems because optimization isn't good enough. Whereas
MySQL tends to be very good at what it can do, and makes it easy to do silly
things, like turn off referential integrity in a slow query to boost
performance.

It almost feels like a worse-is-better story. As a programmer, PostgreSQL is
much better to work with; more tools, better EXPLAIN, more features, more
types, more of almost everything. But to use in the heat of battle, it's less
clear-cut. PostgreSQL's replication story is complicated. MySQL master-master
replication is fairly easy to set up, and if you use a master as a hot
failover, it all mostly just works; when the primary site comes back up, it
resyncs with the failover. PostgreSQL has a lot of different replication
stories - without a strong central narrative, it's hard to gain confidence.

~~~
noehp
This doesn't match my experience at all. MySQL performance is highly variable,
and it's incredibly easy to write simple queries that have truly abysmal
performance.

Postgres, on the other hand, has always tended toward making things work well
(a step that MySQL often skips), and then work quickly thereafter.

A lot of MySQL acolytes say Postgres is slow because, unlike MySQL, it doesn't
ship with unsafe defaults. MySQL doesn't just allow you to do dumb things, it
starts off with many of those settings as the defaults.

To me, the real problem is that people like barrkel exist. He doesn't know what
he's on about, but he likes MySQL. Most of what he wrote is flatly false, but
he said it confidently. And he's employed someplace that probably uses MySQL.

MySQL got adoption for two reasons:

1) it used to be easier to install; and

2) it has unsafe defaults that mean if an idiot runs a benchmark, it wins.

That's it. That's how it won market. After that, it was network effects, and
nothing else. MySQL is a turd. It requires substantial expertise to use MySQL
because it is such an awful and dangerous tool. It slows you down as you get
better. But most of the people who use it don't know any better, or (like
barrkel) they spew nonsense that is the opposite of reality. So it wins.

Network effects suck.

~~~
barrkel
_He doesn't know what he's on about, but he likes MySQL._

You're putting words in my mouth that I didn't say. I don't like MySQL. I
prefer PostgreSQL. And I have had a few rough times optimizing some queries in
PostgreSQL, whereas I've had fewer such bad times with MySQL, despite using it
more often. It's anecdata. Take it for what it's worth.

Time sinks in MySQL have come more from its crappy defaults, from its bizarre
error handling (or lack thereof) in bulk imports, and most recently, a
regression caused by a null pointer in the warning routine.

If I were working on my own project, I'd probably go with PostgreSQL and
figure out the replication story. But I'm not. I do use PostgreSQL on my
personal projects.

(If there was one feature I'd add to PostgreSQL, it would be some means of
temporarily and selectively disabling referential integrity. Not deferring it,
not removing and readding foreign keys, just disabling. The app I work on does
regular 10k-1M+ row bulk inserts, usually into a new table every time (10s of
thousands of tables), but sometimes appending to an already 100M+ row table.
It would be nice to have referential integrity outside of the bulk inserts,
but not pay the cost on bulk insert.)

~~~
joevandyk
You can disable all triggers in a session.

    
    
       set session_replication_role='replica';
    

If memory serves me correctly. Foreign keys are maintained by triggers.
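There's also a per-table variant that's more targeted than flipping the whole
session (note that disabling the internal foreign-key triggers this way
requires superuser, and the skipped rows are never re-validated afterwards;
the table and file path below are illustrative):

    
    
      -- Disable this table's triggers -- including the internal FK
      -- constraint triggers -- around a bulk load only.
      begin;
      alter table bulk_target disable trigger all;
      copy bulk_target from '/tmp/batch.csv' with (format csv);
      alter table bulk_target enable trigger all;
      commit;
    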

------
mjrpes
One thing that would be nice is if SQL provided first-class support for
sub-records.

So instead of "SELECT name, (SELECT GROUP_CONCAT(CONCAT_WS(',', post_id, post)
SEPARATOR ';') FROM posts p WHERE p.user_id = u.user_id) AS 'posts' FROM users
u WHERE u.user_id = 1",

you could do "SELECT name, (SELECT post_id, post FROM posts p WHERE p.user_id
= u.user_id) AS 'posts' FROM users u WHERE u.user_id = 1".

and the query result would be { name : 'Todd', posts : [ { post_id : 1, post :
'My Comment' } ] }.

Obviously this is a simple example and could have been rewritten as a query on
the posts table, inner joined on the user table, and duplicating the user's
name in the result. But it becomes much nicer to have as queries get more
complex.

A query that supports sub-records would give you the flexibility to structure
data like a JSON object and simplify the server end of REST APIs.

~~~
exogen
With Postgres:

    
    
        SELECT u.name, json_agg(p) AS posts
            FROM users u, posts p
            WHERE p.user_id = u.user_id
            GROUP BY u.user_id;

~~~
mintplant
Thank you so much for this! I spent a good chunk of today trying to work out
how to do this, and had given up, concluding that it wasn't possible.

~~~
mjrpes
I was playing around with postgres and was able to get a query that puts
everything in JSON:

    
    
      select json_agg(sub)
        from (select u.username,
                     (select array_agg(p)
                        from posts p
                       where u.id = p.user_id) posts
                from users u) sub;
    

~~~
danneu
Using json_build_object (Postgres 9.4) to map each username to an array of
posts:

    
    
        SELECT 
          json_build_object(
            u.username,
            (SELECT json_agg(p) FROM posts p WHERE u.id = p.user_id)
          )
        FROM users u
    

Output:

    
    
        [
          {"chuck": null},
          {"blair": [
            {"id": 1, "markup": "hello"},
            {"id": 4, "markup": "world"}
          ]},
          {"serena": [{"id": 5, "markup": "testing"}]}
        ]
    

At least I think you were trying to do that.

~~~
mjrpes
EDIT: Nevermind my last post. Your query makes sense now and would indeed work
well.

------
headgasket
Postgresql 9.4 with jsonb sends mongo to the dustbin, IMHO. If you have to
write it in js close to the data or if plpgsql is too steep of a learning
curve, you can play with the experimental plv8.

But you should really pick up plpgsql: it's "Python"-powerful, backed by an
awesome db (its API is consistent with Python 2.x, sadly, but the docs are
very good). There is a great Sublime Text 2 package that makes writing and
debugging functions in one file just awesome. Write an uncalled dummy function
with a lot of the API in it at the top of your file, and you'll get
autocomplete on that part of the API.

Specifically, do not miss getting acquainted with json and hstore, especially
using json as a variable-size argument passing and returning mechanism; it's
just hilariously effective.
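A sketch of that json-as-argument-bag trick (the function, table, and key
names below are made up for illustration; assumes the 9.3+ json operators):

    
    
      -- A plpgsql function taking a single json "options bag" instead
      -- of a long, fixed argument list.  Callers pass only the keys
      -- they care about.
      create function search_posts(opts json) returns setof posts as $$
      begin
        return query
          select *
            from posts p
           where (opts->>'author' is null or p.author = opts->>'author')
             and (opts->>'since' is null
                  or p.created_at >= (opts->>'since')::timestamptz);
      end;
      $$ language plpgsql stable;
    
      -- select * from search_posts('{"author": "blair"}');
    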

cheers, and keep making this place(not only HN, our blue dot) better, F

------
gamesbrainiac
Anyone know where I can get the full video lecture for this? I'd really like
to see it.

~~~
fatalmind
I gave this talk again on Saturday in Moscow, and it was videotaped there. The
organizers told me they'll publish the English original as well as the Russian
voice-over version. I'm keen on the second one ;)

I'll keep you posted.

------
leoh
Interesting. I don't think django implements a lot of this in their ORM.

~~~
qooleot
I work on several Django apps, and also write advanced SQL in Postgres (CTEs,
plv8 functions, materialized views, etc.). One thing the ORM allows, though,
is writing a custom SQL query, and it's pretty easy to 'cast' the SQL results
into a Django model.

With that said, it's still sometimes a struggle to justify Django if almost
all the queries are highly custom and the app doesn't require the Django
admin. Certainly there are advantages, but the minimalism of Flask, and
focusing on the REST interface for integration and building microservices, is
more appealing in some cases.

~~~
IndianAstronaut
If one wants to use these Postgres features, Flask without an ORM is the way
to go. Django is too constraining, especially with its primary key limiting.

~~~
collyw
If you use Django without the ORM, why would it be any more constraining than
Flask?

I usually try to keep my queries within the ORM, as it means things like
sorting and pagination become very easy, but its lacking in certain areas.

~~~
Iftheshoefits
It's not necessarily more constraining, but from my perspective there are two
reasons to choose Django over other python frameworks: the ORM and the out-of-
the-box Admin (which depends on the ORM). If you take those away, there is no
good reason, in my opinion, to choose Django over, say, Tornado.

I wouldn't choose Flask for anything; it's too slow, even by the relatively
poor standards set by Tornado and Django.

~~~
collyw
The admin has stopped me from moving to another framework before.

My additional reasons for choosing Django over a minimalist framework:

There is a large selection of third-party add-ons for Django. When I had a
look around, I couldn't find anything like reversion for any other framework
(maybe it exists, but I didn't find it).

There is a standard set of components, so you get a default choice. These will
work together and likely have some consistency in the way they are used. If
you choose your own components for Flask (or whatever framework), they may
well work together, or you may run into problems.

------
juliangregorian
I often find myself wishing that SQL had a more uniform syntax. So much of it
resembles COBOL to me, with all the capital letters and "english-like"
arbitrary word order. At the same time, relational algebra exists and is very
elegant, in an APL-like way. Would be very interested if there was a project
to expose such an api to SQL databases.

~~~
collyw
SQL is case insensitive. It is a bit quirky though.

~~~
domoarevil
Indeed, something people sometimes forget, leading to collation problems in
the (odd) instance when case sensitivity is set.

------
xvirk
Is it even possible?

------
liprais
As I understand it, Teradata has already supported these so-called "modern"
features for several years.

------
ecopoesis
And yet, despite all these cool features, Postgres still doesn't support
upsert or merge, which even MySQL manages to get right.

~~~
rosser
What's really funny about this comment is that, while all of the features
discussed in TFA are fully standards-compliant SQL, MySQL implements none of
them, _and_ its "upsert" isn't.

EDIT: And before one slags PostgreSQL too hard for not _currently_ supporting
upsert, one might peruse the relevant pg wiki page [1] to better understand
where development stands and why we don't yet have it. (Hint: it's actually
kinda complicated, assuming you want it done, you know, "right".)

[1]
[https://wiki.postgresql.org/wiki/UPSERT](https://wiki.postgresql.org/wiki/UPSERT)
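For context, the workaround most people reach for in the meantime is a
writeable-CTE "upsert", which looks simple but is not safe under concurrent
writers without retry logic — which is precisely the complication the wiki
page describes (table and column names below are illustrative):

    
    
      -- Try the UPDATE first; INSERT only if it matched nothing.
      -- NOT race-free without a retry loop.
      with upsert as (
         update counters
            set hits = hits + 1
          where key = 'home'
      returning key
      )
      insert into counters (key, hits)
      select 'home', 1
       where not exists (select 1 from upsert);
    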

~~~
stream_fusion
Everyone who needs 'upsert' is forced to go and read those discussions (and
probably a few more articles as well), and then implement their own version of
the same thing many times over.

The fact that it's complicated is precisely the reason this ought to be solved
for the general case.

Otherwise, Postgresql is still an awesome product.

~~~
jeltz
Nah, you do not need to read those discussions unless you want to help out
with the current patch, which, if enough people pitch in, might land in
PostgreSQL 9.5.

