Hacker News new | past | comments | ask | show | jobs | submit login
Internals of PostgreSQL (interdb.jp)
464 points by billwashere 89 days ago | hide | past | web | favorite | 25 comments



Following the link to the blog of the author I found a blog post titled “postgres cluster management system in GoCardless”

http://www.interdb.jp/blog/pgsql/pg_pacemaker_01/

Which in turn lead me to

https://github.com/gocardless/our-postgresql-setup

And in turn to a fifteen minute talk titled “Zero-downtime Postgres upgrades”

https://www.youtube.com/watch?v=SAkNBiZzEX8

Which was interesting and informative IMO.

After watching that one I was lead further via the YouTube suggested videos to a 45 minute talk titled “Tuning PostgreSQL for High Write Loads”

https://www.youtube.com/watch?v=xrMbzHdPLKM

Which I liked a lot as well.

I think anyone who came here because they are interested in the subject of the OP link might be interested in watching these videos.

Though disclaimer of course: These videos told me things I didn’t know about scaling PostgreSQL, because there is a lot I don’t know about that. But if you already know a lot about that then of course these videos might not be so interesting to you.


Interesting stuff. If anyone wants to make a PR to add this stuff to https://github.com/dhamaniasad/awesome-postgres I'll be happy to accept.


This is the thing I love about PostgreSQL, you can find out everything about the database. Oracle is ridiculous - they have something of the range of over a thousand hidden parameters, query tuning their CBO is somewhat of a black art and can change between point releases, and leads to articles like the following:

http://www.dba-oracle.com/art_so_undoc_parms_p2.htm

PostgreSQL, however, has none of these limitations.


Your comment reminded me of oraguy comment about oracle db

[1] https://news.ycombinator.com/item?id=18442637


Oracle also has serious quality issues. The last patch set I applied had the wrong binaries for Oracle Text. With what they charge, you wouldn't expect that. I've never had such a thing happen with my Postgresql instances. I just don't see how to justify starting out with Oracle today.


Don’t take anything seriously on dba-oracle.com - Burleson is well known in the Oracle community as an expert in self-promotion but his technical skill is mediocre at best.

Jonathan Lewis, Guy Harrison and Craig “the Hammer” Shallahammer are infinitely more experienced and insightful.


Right. There are many Oracle internals experts who are presenting at conferences, they are also quite approachable. I'd especially recommend Tanel Poder, several of his troubleshooting sessions are available on youtube.


IIRC Mysql too can change between point releases in surprising ways. (Though not sure if that practice began before or after Oracle's purchase of it.)


That's sometimes unavoidable. If there's a clear bug somewhere, fixing it will sometimes also affect queries/plans that weren't visibly affected by the bug. That's especially the case around costing issues - some queries might have regressed noticeably due to a bug since the last major version, but fixing it might affect other queries negatively (in a minor version). Postgres/we try to avoid that, but sometimes that's the most sensible way forward (and I assume the same is true for mysql etc).


Reminded me that "Postgres is easy to install, administer, maintain, and use... with just a little bit of orientation. This is that orientation."

Christophe Pettus: PostgreSQL Proficiency for Python People - PyCon 2014. https://www.youtube.com/watch?v=0uCxLCmzaG4


Does anyone know if something like this exists for other popular DBs?


This only covers one feature of SQLite, but I recommend it nonetheless: https://www.sqlite.org/queryplanner.html


The Inside SQL Server series by Microsoft Press is pretty impressive.


This series is now called Microsoft SQL Server Internals [1]. You can get the latest version - Microsoft SQL Server 2012 Internals (Developer Reference)- from Amazon [2].

[1] https://www.sqlserverinternals.com/publications/

[2] https://www.amazon.com/gp/product/0735658560/ref=as_li_tl?ie...


As a person who have started in SQL server, I really like the succinct clarity of Postgresql functions and its lightweight installation on the server. Thanks for sharing, this could really help me in the future.


Aside: do you still need to use a cache service when using Postgres? Our Django web app forgoes any caching because “Postgres is fast enough” and “has its own cache”.


Assuming you mean application level cache you can't really generalize it. In general you don't want to cache anything if you can because cache invalidation is a really hard problem.


I read a comment like this and just desperately want to look at the query log to see what’s being run. Followed by explaining a few offenders to see what the planner is working with. You can get really deep into optimising your dB, but often there are a lot of low hanging fruit to work with that you can make “good enough” in no time at all.


That really depends on what you're caching. Sometimes e.g. you might want to have local caches on application servers, not because the database would be too slow or couldn't take the load, but because the roundtrip time to the database make page loads too slow if you hit the DB every time.

Or you have some complex ranking/relationship/aggregation calculations - it might computationally be infeasible for the database, as hard as people worked on its efficiency, to calculate such a query on each request...


Depends on what you're using the cache for. Postgres is definitely fast enough for reads of a K/V pair at a scale your master node can support for most use cases, but there are cases that you can't suit with postgres alone at-scale, e.g:

1. Needing a consistent (meaning reading off replicas is not fresh enough) data source for reads and writes that happens every request to very high-request site. Think sessions, api rate limiting.

2. Needing a fast store for denormalized data.


Generally true

But if some query takes 5 minutes to run and you can’t speed it up, you need to do something.

I cache a very few select queries that don’t need to be 100% accurate as scan a ton of rows.


Postgres is fast enough, but Django isn't.


Thank you for posting. Really great resource.


Awesome source, does anyone know how to convert it into a mobi? I'd like to send it to my kindle


You can get an epub with pandoc. A very crude script to do it:

  wget http://www.interdb.jp/pg/pgsql{01..11}.html http://www.interdb.jp/pg/img/fig-{1..11}-{01..34}.png http://www.interdb.jp/pg/img/fig-4-fdw-{1..7}.png http://www.interdb.jp/pg/img/udc1.jpg
  mkdir img
  mv *.jpg *.png img/
  pandoc -s pgsql{01..11}.html -o internals_pgsql.epub
From there, you should be able to convert it to mobi with Calibre or a similar tool.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: