
PostgreSQL’s New LATERAL Join Type - drob
http://blog.heapanalytics.com/postgresqls-powerful-new-join-type-lateral/
======
sehrope
LATERAL is awesome. It makes a lot of queries that required sub-select joins
much simpler to write and later read.

It's also great for set returning functions. Even cooler, you don't need to
explicitly specify the LATERAL keyword. The query planner will add it for you
automatically:

    
    
        -- NOTE: WITH clause is just to fake a table with data:
        WITH foo AS (
          SELECT 'a' AS name
               , 2 AS quantity
          UNION ALL
          SELECT 'b' AS name
               , 4 AS quantity)
        SELECT t.*
             , x
        FROM foo t
          -- No need to say "LATERAL" here as it's added automatically
          , generate_series(1,quantity) x;
        
         name | quantity | x
        ------+----------+---
         a    |        2 | 1
         a    |        2 | 2
         b    |        4 | 1
         b    |        4 | 2
         b    |        4 | 3
         b    |        4 | 4
        (6 rows)
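
For comparison, here's the same query with the keyword spelled out explicitly
(same WITH clause as above, omitted here):

    
    
        SELECT t.*
             , x
        FROM foo t
          CROSS JOIN LATERAL generate_series(1, t.quantity) x;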

~~~
bchallenor
Thanks for the example. It seems to be the same as MS SQL Server's CROSS APPLY
/ OUTER APPLY?

Anyway, good that Postgres has it too, now. There are several Postgres
features I'd love in SQL Server, like range types...

~~~
tracker1
Usually when I've had to use CROSS/OUTER APPLY, it's been to work around
overly normalized and somewhat bad data.

Agreed on range types.. proper enums in T-SQL would be nice too. I'm really
liking where PL/v8 is going, and would like to see something similar in MS-SQL
Server as well.. the .NET extensions are just too much of a pain to do much
with. It'd be nice to have a syntax that makes working with custom data types,
or even just JSON and XML, easier.

If PostgreSQL adds built-in replication to the open-source version that isn't
a hacky add-on, with failover similar to, for example, MongoDB's replica sets,
I'm so pushing pg for most new projects.

Maria/MySQL seem to be getting interesting as well. Honestly, I like MS-SQL
until the cost of running it gets a little wonky (Azure pricing going from a
single instance to anything that can have replication for example). Some of
Amazon's offerings are really getting compelling here.

------
zaius
I'm a huge believer in postgres' json store, but anything above the basic
queries can get a bit messy. Lateral joins really clean up aggregate data
queries on json columns. E.g.

    
    
            SELECT id, COUNT(keys)
            FROM users,
              LATERAL json_object_keys(login_history) keys
            GROUP BY id;

------
MrBuddyCasino
As someone who only works now and then with Postgres, this is what made it
click for me:

"Loosely, it means that a LATERAL join is like a SQL foreach loop, in which
PostgreSQL will iterate over each row in a result set and evaluate a subquery
using that row as a parameter."
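
A minimal sketch of that reading (the users and events tables here are
invented for illustration): for each row of the outer table, the
parenthesized subquery runs with that row's columns in scope.

    
    
        SELECT u.id
             , recent.event_time
        FROM users u
          CROSS JOIN LATERAL (
            SELECT e.event_time
            FROM events e
            WHERE e.user_id = u.id    -- the "loop variable": the current users row
            ORDER BY e.event_time DESC
            LIMIT 3                   -- a per-row LIMIT, which a plain join can't express
          ) recent;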

~~~
ams6110
Sounds like what Oracle calls a "correlated subquery". Is it the same thing?

~~~
pjungwir
No, Postgres has supported those for a long time. For example:

    
    
        SELECT *
        FROM   employees e
        WHERE  EXISTS (SELECT 1
                       FROM   employee_projects ep
                       WHERE  ep.employee_id = e.id
                       AND    ep.project_id = 5)
        ;
    

This new feature is a lot like correlated subqueries except you can put that
nested SELECT into the FROM clause and still access the employees table.
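
For instance (a sketch reusing the same tables), the correlated lookup can
move into the FROM clause and hand extra columns back to each employee row:

    
    
        SELECT e.*
             , p.project_count
        FROM employees e
          LEFT JOIN LATERAL (
            SELECT count(*) AS project_count
            FROM employee_projects ep
            WHERE ep.employee_id = e.id
          ) p ON true;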

------
ajw0100
The `sum(1)` and `order by...limit` approach really isn't the best way to
build the funnel. And if you take another approach then this could have easily
been built with normal left joins.

Also, you should probably show some explain plans before making this claim:
"Without lateral joins, we would need to resort to PL/pgSQL to do this
analysis. Or, if our data set were small, we could get away with complex,
inefficient queries."

Here's a comparison of the explain plan from your query without the `sum(1)`
and `order by...limit` business and a query using only left joins (no use of
lateral):
[https://gist.github.com/ajw0100/0d86715d6ce97389f306](https://gist.github.com/ajw0100/0d86715d6ce97389f306).
Note, I ran this against an empty copy of your exact table (no data, no
statistics). However, the explain plans are the same.

My understanding is that lateral was really meant for set returning functions
like generate_series as others have already mentioned.

Edit: I should mention I know you were just trying to demonstrate how lateral
works and that it is always good to see people writing about new Postgres
features!

~~~
ajw0100
Looking at this a little further, the outer nested loop could cause issues in
the left join vs. left join lateral version, depending on how many use_demo
events there are in the week following the user's first view_homepage event. I
added another query that uses CTEs, which allow for intermediate aggregation
and should make the size of the nested loops similar between the two versions.
However, I wouldn't be surprised if the CTEs take more memory than the lateral
joins because CTEs are basically temp tables that only last for the duration
of the query. Lateral may indeed be the best option but ideally I would
populate this table with real data, gather statistics, and then run explain
analyze on each query.

------
buckbova
> Without lateral joins, we would need to resort to PL/pgSQL to do this
> analysis. Or, if our data set were small, we could get away with complex,
> inefficient queries.

I don't buy the performance benefit over derived tables with properly indexed
fields for this example. However, I'd definitely use this more so with
functions.

~~~
tracker1
In MS-SQL I've done similar things with CROSS/OUTER APPLY, which seems to be
about the same thing... It's usually to work around wonky data and has some
overhead.

In my own case it was so I could create a view that would give me a
denormalized view of most of the data for a given record... most of the fields
were in a common PROPERTIES table with some funky data in it... Sometimes
normalization goes too far.

------
jdnier
Here's another explanation from the Postgres docs with additional examples:
[http://www.postgresql.org/docs/9.3/static/queries-table-expressions.html#QUERIES-LATERAL](http://www.postgresql.org/docs/9.3/static/queries-table-expressions.html#QUERIES-LATERAL)

------
platz
seems similar to CROSS APPLY and OUTER APPLY from mssql
[http://technet.microsoft.com/en-us/library/ms175156%28v=sql.105%29.aspx](http://technet.microsoft.com/en-us/library/ms175156%28v=sql.105%29.aspx)

------
SixSigma
Fantastic. This is _exactly_ what I wanted earlier in the week. I ended up
just writing some python to do the looping for me as it was only a few
thousand rows and I was optimizing for developer time (i.e. mine).

------
lsaferite
How does this relate to WITH RECURSIVE (CTEs)?

[http://www.postgresql.org/docs/8.4/static/queries-with.html](http://www.postgresql.org/docs/8.4/static/queries-with.html)

~~~
adamtj
LATERAL allows a subquery or function to refer to the records that it's being
joined to when computing results.

WITH RECURSIVE allows a query to refer to its own results while computing
those results. That may sound mind-bending, but it's really just a poorly
named way to do a "while" loop in SQL.

A WITH RECURSIVE has the form (base-case-query UNION ALL iterative-step-
query). The query for the iterative step can refer to itself via the WITH
alias. When it does so, it's actually operating on only those records produced
by the previous step of the iteration. The iterative-step-query will execute
possibly multiple times, stopping only when it doesn't produce any more
records.
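
For reference, the docs example being translated below is roughly:

    
    
        WITH RECURSIVE t(n) AS (
            VALUES (1)                          -- base case
          UNION ALL
            SELECT n+1 FROM t WHERE n < 100     -- iterative step
        )
        SELECT sum(n) FROM t;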

Here's the WITH RECURSIVE example from the Postgres docs, translated into
Python:

    
    
      all_records   = []
      previous_step = [1] # base case, i.e. "VALUES (1)."
      while previous_step:
          all_records.extend(previous_step)
          # iterative step, i.e. "SELECT n+1 FROM t WHERE n < 100"
          current_step = [n+1 for n in previous_step if n < 100]
          previous_step = current_step
    

The confusing part is that the alias "t" in the SQL example means different
things in different places. Outside the WITH RECURSIVE, "t" is equivalent to
"all_records" in the example. Within the WITH RECURSIVE definition, "t" is the
same as "previous_step".

------
mlrtime
CROSS JOIN LATERAL also works with JSON arrays. Select your JSON array, then
CROSS JOIN LATERAL with jsonb_to_recordset() to expand the array into
individual columns.
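
A sketch of that pattern (the users table and its devices jsonb column are
invented for illustration):

    
    
        SELECT u.id
             , d.model
             , d.last_seen
        FROM users u
          CROSS JOIN LATERAL jsonb_to_recordset(u.devices)
            AS d(model text, last_seen timestamptz);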

------
kidmenot
From the many specific comments on here, it sounds like most people don't use
an ORM. Is that the case?

Not trolling or trying to start a flame, I'm genuinely curious as to how
people here get stuff done.

~~~
jordanlev
Even folks who use an ORM still need to go outside the ORM for some operations
-- especially reporting (like the example in the article). I find ORMs very
helpful for basic CRUD operations, especially when one conceptual entity of
data maps to one or two database tables directly. But there are just some
things that can be expressed more efficiently in SQL than in object-oriented
code (and vice versa).

~~~
kidmenot
Thanks. That's for sure true, using an ORM to avoid the boring CRUD stuff is
definitely a good thing. However, this shows once again that, for non-trivial
systems, you still have to resort to plain SQL, which kind of locks you in to
your database of choice.

------
digital-rubber
Nice read.

Some things I would have liked to have seen, though, are the hardware specs,
database dimensions (number of entries in tables, etc.), and query times.

Just my penny. Hope you can spend it still somewhere :)

------
ars
So it seems this does the same thing as putting the subquery in the list of
columns to return, except more efficient.

i.e.

    
    
       SELECT a, (SELECT b FROM ...) b FROM ....
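
Though one difference is that the scalar subquery form can return at most one
column and one row per outer row, while LATERAL can return several of each
(table names invented):

    
    
       SELECT a.*
            , x.*
       FROM a
         LEFT JOIN LATERAL (
           SELECT b.b1, b.b2
           FROM b
           WHERE b.a_id = a.id
         ) x ON true;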

~~~
tracker1
Kind of.. it's more expensive than a direct join, but you can do more with it.
AFAIK it's syntactic sugar around a correlated sub-query... It may be slightly
more efficient now too. It should probably be used sparingly and over limited
result sets.

~~~
jeffdavis
I think it goes beyond syntax sugar, but perhaps there's a transformation
between them that I'm not aware of.

------
smegel
Is this the same as a correlated sub-query (something that exist in Teradata)?

~~~
jeffdavis
No:

[https://news.ycombinator.com/item?id=8690389](https://news.ycombinator.com/item?id=8690389)

But they are similar... kind of like correlated subqueries in the FROM
clause, with a few new twists in the semantics.

