
The internals of PostgreSQL - adamnemecek
http://www.interdb.jp/pg/index.html
======
elvinyung
A really cool paper to read is _The Design of POSTGRES_ [1] by Stonebraker et
al. It's dated by its mentions of things like POSTQUEL, but it's still really
interesting to read about the early design of seminal features like the
extensible ADT system, from a time when it was still innovative.

1:
[http://db.cs.berkeley.edu/papers/ERL-M85-95.pdf](http://db.cs.berkeley.edu/papers/ERL-M85-95.pdf)

------
combatentropy
I would like to read through this when I'm not so tired. Thank you!

From what I can tell, PostgreSQL is sensibly structured. At least, it seems
better than MySQL. The hackiest thing I have heard so far about MySQL is that
if you run the exact same SQL text more than once, it will fetch the result
from cache. If you run a differently worded query, it will skip the cache,
even if the query were to bring back the exact same data.

This is different than what I remember about PostgreSQL (correct me if I'm
wrong). I remember reading in some book that PostgreSQL just let its rows
become memory pages, and if a query resulted in data already in memory, then
it got it from memory, otherwise it got it from disk. No need to make sure you
didn't insert a space or something into your query the second time.

~~~
znep
While that is true and the mysql query cache definitely became a scaling issue
as machines got bigger with lock contention etc., postgres doesn't even have a
similar feature period.

Suppose you have a query that is doing a group by on a low cardinality field
(eg. state) on a hundred million row table. Large amount of data in, small
amount out. postgres has to actually look at all the rows to re-run the query.
The mysql query cache only has to pull the tiny number of results out of the
cache for that specific query.

This is a dangerous feature because there is a super slippery slope as traffic
increases... any change to any of the tables invalidates the cache when you
need caching most, and hits all queries against those tables at the same time.
But if used well in an environment with many copies of the data (replicated or
otherwise) where you could manage when updates happened, it can be a
lifesaver.

InnoDB also has its own buffer pool that caches rows and is in general terms
similar to the postgres buffer cache.

It is fair to say that mysql has a lot more features like this that are
"sometimes life-saving hacks that can bite you hard if you don't understand
them" and postgres is much more restrained in ensuring the features are a
little more general and thought through.

~~~
brianwawok
MySQL needs the cache more often though with a limit of one index per query.
Postgres seems like it more often can combine a few indexes and turn what
scans 100k rows in MySQL into bitmasking two indexes and pulling 10 rows.

Seems sensible, each has features to support the base design.

~~~
morgo
The limit of one index per query was lifted in 2005, with mysql 5.0.

~~~
brianwawok
It's now one per table. Poor word choice by me, sorry.

~~~
morgo
[https://dev.mysql.com/doc/refman/5.5/en/index-merge-
optimiza...](https://dev.mysql.com/doc/refman/5.5/en/index-merge-
optimization.html)

------
crudbug
I have been reading about Postgres architecture. The modular design enables -
microkernel-like API where different languages can be integrated seamlessly.

The only thing, I am missing are incremental materialized views.

~~~
makoz
What resources have you been using to familiarize yourself with Postgres?

~~~
qaq
I would highly recommend talks by Bruce Momjian
[https://momjian.us/](https://momjian.us/) he has separate talks on pretty
much everything related to PG + All The Dirt On Vacuum talk by Jim Nasby
[https://www.youtube.com/watch?v=L8nErzxPJjQ](https://www.youtube.com/watch?v=L8nErzxPJjQ)

------
alexwebb2
This is now returning a 403 Forbidden.

~~~
etatoby
Obligatory Archive.org link:

[http://web.archive.org/web/20170126103158/http://www.interdb...](http://web.archive.org/web/20170126103158/http://www.interdb.jp/pg/index.html)

Also a reminder to everybody to give them a few $CURRENCY if you can.
Archive.org offers a valuable service and are very underfunded.

~~~
mrkgnao
The (intentional?) pun in $CURRENCY made me laugh.

------
sqldba
It looks GREAT but...

I always see these things posted while they're still being written and then I
never go back to see them when they're completed. And nobody posts when it's
completed.

I'd prefer the posting not be done until it's finished, or, that there be an
"email me when it's finished" button.

~~~
snoman
If you can't be bothered to set a reminder, subscribe to the RSS feed, create
an ifttt trigger or anything like that... I don't know what to tell you.
There's lots of solutions to this predicament. This one's on you.

~~~
hordeallergy
As far as not having to read things twice, draft and final, I'd have to lean
towards @sqldba's position.

~~~
vog
There is also an interesting middleground named Evolving Publication where the
author publishes section by section, and the RSS feed informs you whenever a
new section/chapter is added to the article:

[https://martinfowler.com/bliki/EvolvingPublication.html](https://martinfowler.com/bliki/EvolvingPublication.html)

So the RSS feed is not per article, but per article part.

~~~
karmacoda
That is interesting, ta, although more work for the author.

