
Following a Select Statement Through Postgres Internals (2014) - rspivak
http://patshaughnessy.net/2014/10/13/following-a-select-statement-through-postgres-internals
======
codehusker
This article was also discussed before:
[https://news.ycombinator.com/item?id=8449329](https://news.ycombinator.com/item?id=8449329)

Pat Shaughnessy has 3 other similar articles about Postgres internals. He does
a great job of combining high level explanations, visual aids, and gritty
details.

Discovering the Computer Science Behind Postgres Indexes
[http://patshaughnessy.net/2014/11/11/discovering-the-
compute...](http://patshaughnessy.net/2014/11/11/discovering-the-computer-
science-behind-postgres-indexes)

A Look at How Postgres Executes a Tiny Join
[http://patshaughnessy.net/2015/11/24/a-look-at-how-
postgres-...](http://patshaughnessy.net/2015/11/24/a-look-at-how-postgres-
executes-a-tiny-join)

Is Your Postgres Query Starved for Memory?
[http://patshaughnessy.net/2016/1/22/is-your-postgres-
query-s...](http://patshaughnessy.net/2016/1/22/is-your-postgres-query-
starved-for-memory)

~~~
pat_shaughnessy
Thanks everyone for the kind comments, and for taking the time to read this
stuff :)

~~~
monksy
I'm not going to read all of that right now. Buy you've earned yourself a
subscriber.

------
ckuehne
"What’s going on here? Why is Postgres wasting its time, continuing to search
even though it’s already found the data we’re looking for?"

Because Postgres has not already found what we are looking for. The query is

    
    
      select *
      from users
      where name = 'Capitain Nemo'
      order by id asc
      limit 1;
    

Only after Postgres has found all users with the name 'Capitain Nemo' it can
sort them by their 'id' attribute and limit the result set to the first one in
the sorted relation.

Otherwise very nice post though.

~~~
ghayes
Yes, the OP here [0] says that the query was one that was generated from
ruby's ActiveRecord. Clearly, removing the sort clause would speed up this
query. I believe the issue (from the ruby perspective) is using ActiveRecord's
`User.where(name: 'Captain Nemo').first` instead of `User.where(name: 'Captain
Nemo').take` [1]

[0]
[https://news.ycombinator.com/item?id=8449329](https://news.ycombinator.com/item?id=8449329)
[1]
[http://api.rubyonrails.org/classes/ActiveRecord/FinderMethod...](http://api.rubyonrails.org/classes/ActiveRecord/FinderMethods.html#method-
i-take)

------
haberman
One thing I've gotten pretty good at is taking some question I have about some
software and answering it by looking through the source. I've wondered whether
people would find it interesting if I showed the process of how I do this.

I've wondered whether YouTube might be a good medium for this. Blog articles
feel kind of "produced" \-- I thought it might be cool to capture the
exploratory nature of how this works.

~~~
Denzel
I'm the same way: take a question and answer with the source. But as my time
gets more and more constrained, I wish more large open-source projects had
design docs akin to CockroachDB. I read through the CDB design doc in one
sitting, then read the source and it was such a treat to have that high-level
context while wading through the details.

I can't say the same thing for ElasticSearch (or most large open-source
projects). It's very difficult to get the high-level context without a well-
written design doc explaining why things are the way they are. Some parts of
ElasticSearch make sense, and other parts just seem hackneyed. Of course, I
could pull together little bits and pieces of high-level insight from multiple
disparate blog posts and forums and mailing lists here and there.

But what I would like is a design doc! If you were to produce something along
the lines of that for open-source projects that need it, I wouldn't hesitate
to pay a modest fee.

~~~
egor83

      > If you were to produce something along the lines of that 
      > for open-source projects that need it, I wouldn't hesitate 
      > to pay a modest fee.
    

Try BountySource [1] maybe?

There you can post a proposal for some improvement to an OSS project (a design
doc, in your case) and pledge a bounty for it.

If other people have the same problem, they might chip in further, making it
more visible/profitable to solve.

\----

[1] [https://www.bountysource.com/](https://www.bountysource.com/)

~~~
Denzel
Thanks for the link! First time I've seen that marketplace. I'll look into it.

------
filereaper
I've been looking for books or articles on Database Architecture and internals
for a while now.

I found this paper "Architecture of a Database System" [1] to be an excellent
starting point, it's written by database veterans James Hamilton and Micheal
Stonebraker and references other much deeper works.

I've always liked Pat Shaughnessy's work, Ruby under a Microscope was a
fantastic introduction to MRI 2.0. I really hope to see more works like these.

[1]
[http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf](http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf)

~~~
anonetal
That paper is excellent and serves as a great intro for a graduate database
course. It would be remiss not to list the main author though (Joe
Hellerstein, Berkeley).

------
anarazel
> Next Postgres analyzes and rewrites our query, optimizing and simplifying it
> using a series of complex algorithms.

Analyze & rewrite doesn't really use any complex algorithms.

Analyzing basically means that we resolve
tables/objects/operators/databases/... in the parse-tree into what they mean
in the current database with the current settings. The process of parsing
itself (via a bison parser) doesn't do any object lookups and such. So errors
about non-existant tables, operators and such will mostly happen during the
analyze phase.

The rewrite phase, which often will do nothing, will resolve both explicitly
(CREATE VIEW) and implicitly (row-level security) referenced views by their
definition. It also processes rules in an equivalent manner, but you should
never use those...

The actual optimization you're referring to will happen as part of planning.

EDIT: typos

~~~
anonetal
Rewrite phase is also supposed to "flatten subqueries/views" (cf.
"Extensible/rule based query rewrite optimization in Starburst"; SIGMOD 92)
and can be fairly complex. PostgreSQL doesn't do it to the same extent as DB2,
but there is certainly some complex logic involved there.

~~~
anarazel
That paper's definition of rewriting isn't the same as postgres' (they even
say that: "The work presented here should not be confused with the query
rewrite facility of POSTGRES [SJGP90]."). Flattening subqueries and views
happens during planning in postgres.

Source: I'm a PostgreSQL Developer.

------
0x37
Interesting stuff, I've recently got interested in databases and did some toy
programming related to them, trying to understand how they work.

I just wish the author had gone closer to the nitty gritty details of _how_
the data is actually fetched from the disk. How does Postgres store the data?
What do the data files look like, how is the parsing done?

In any case I appreciate the effort. I guess I might have to dive into the
source code myself.

~~~
anarazel
There's some docs about this in the docs. c.f.
[http://www.postgresql.org/docs/devel/interactive/storage.htm...](http://www.postgresql.org/docs/devel/interactive/storage.html)

------
jnordwick
Depending on the indexes and other magic, more efficient might be:

select * from users where name = 'Captain Nemo' and id = (select min(id) from
users where name='Captain Nemo')

trying to forget bad memories of forcing the Oracle query planner into
submission with more hints than actual sql

