Hacker News new | comments | show | ask | jobs | submit login
Dear Postgres (craigkerstiens.com)
300 points by Twisell 9 months ago | hide | past | web | favorite | 59 comments



Nice to see a "Dear <open source project>" that was a full blown love letter instead of a critique!


I was dreading the big "BUT + <arbitrary rant>", and indeed, it never came.

I share many of the views of the author. It's a great piece of software.


Yeah, I could quibble about how the XML support in Postgres is not really equal to an XML database--it's still limited to xpath/xslt 1.0, and no xquery--but the fact that is has any at all is still handy and useful. I mean, I guess it would be too much to ask for it to do _everything_.


And if you need that, you can easily write a function to do it in one of dozens of languages – from pl/SQL to pl/V8, or even pl/C – or even directly write an extension (and I think there are already a few for that purpose)


Yea, same here on that “BUT.” Which just made me love it more.


We need more of these ;-) I started reading afraid its a thin-veiled complaint and was positively surprised as well :)


I know and the piece is spot on, not to mention the fact that with PostGIS its the best database for spatial anything, much better then expensive proprietary solutions. Thanks postgres!


I'd argue that Spatialite also has its place, but I use both that and PostGIS very happily.


I think it is more a matter of time because that is what mature relationships of the sort that take the good with the bad and acknowledge gradual change and value stability require. It's what separates love songs to Lisp from enthusiasm for React.


> In the early years you had the standard format of index b-tree that most database engines leveraged. Then quietly but confidently you started adding more. Then came K-nearest neighbor, generalized inverted indexes (GIN), and generalized search-tree (GiST), only to be followed by space partitioned GiST and block range indexes (BRIN). Now the only question is which do I use?

I also wonder which index is appropriate for which case. I googled out this [1] and obviously all the chapters from 62 to 65 at [2]. I'm putting them on my reading list and posting them here just in case somebody else is asking the same question.

[1] https://devcenter.heroku.com/articles/postgresql-indexes

[2] https://www.postgresql.org/docs/current/static/internals.htm...


B-tree for most ‘normal’ data. You mostly only need the other indexes for cases where your data isn’t just simple values, like if your column contains stemmed search terms or arrays or JSON.

If you’re using ORM, then your ORM documentation should be telling you which index is the most appropriate if it isn’t a b-tree.

There are also some corner cases where you can see a speed up if you have tons of data that’s being written in an append-only manner, but for the most part b-trees are the way to go for chars, ints, bools, etc.


I haven't found an explanation of when to use BRIN indexes that works with my brain. anyone have a good link?


I wrote http://pythonsweetness.tumblr.com/post/119568339102/block-ra... because I had the same problem, maybe see if it works for you


Awesome, thank you


From Wikipedia, it sounds like the time to use it is when:

- You have enough a huge amount of data (hundreds of millions or more rows)

- You want to index a column whose values are never changed, and are monotonically increasing (e.g. a created_at datetime field).


At the top:

> I’ve always felt an affinity for you in my 9 years of working with you.

At the bottom:

> Postgres, I just want to say thank you for the past ten years together.

It's only logical to assume that it took him a year to write the letter. :)


> It's only logical to assume that it took him a year to write the letter. :)

In German we say "gut Ding braucht Weile", i.e. "A good thing takes time".


Maybe he rounded. He could've started the letter after 9 years and 182 days and finished it a day later.


Maybe on the first year he didn't do professional work with it, and only used in on personal projects?


And specifically, thank you Postgres contributors and maintainers for your time and energy. You've done your community and peers an invaluable service.


Postgres falls into the category of things which should be considered defaults. Yes, sometimes an alternative choice is indeed the correct one, but generally if a project isn't using Postgres it's wrong.

It really is excellent.


Not critiquing but more curious, why does Postgres seem to get so much love while MariaDB is mostly ignored on HN?


I'm not in a position to answer your question, but I thought you might be amused to know that there was once a day where a question like yours would have been "why does MySQL get all the attention while PostgreSQL is largely ignored on Slashdot." I probably asked that question on more than one occasion. PostgreSQL was already more capable and mature than MySQL at the time.


MariaDB is a fork of MySQL... which already had its heyday when Postgres was more immature.

I've been using Postgres since 7.x (early 2000s) and everyone loved MySQL back then. In fact, the late 90s up to maybe 5-6 years ago, MySQL got more attention than postgres.. that's a long time for any software package.

There are a lot of reasons for MySQL sort of losing its position -- from the Oracle takeover, to the fork, to the various compromises that were made early on (that Postgres didn't make)... and together + Postgres continuing to make steady progress during that time, has put Postgres in the lead.


I don't believe there was ever a time when Postgres was more immature than MySQL. There was a time when hype-driven development was immature enough to prefer MySQL, with its silent data loss & data corruption issues, over Postgres. This was, IIRC, mostly due to performance. It turns out, though, that once one configures MySQL not to lose & corrupt data, it's slower than Postgres.

Honestly, Postgres has always been in the lead — it's just that as folks have gained experience, they've discovered that they want a stable, reliable, fast database.


Around 2001 Postgres was more reliable than MySQL and more advanced in many ways, but it did have some odd omissions which made it fair to describe it as immature.

For example if you wanted to drop a column you had to create a new table and rename it over the old one.


when hype-driven development was immature enough to prefer MySQL

I don't think that's an accurate description. In the 90s computers were _a lot_ slower.. and having the option of putting up with a few warts in exchange for increased performance was a significant advantage over postgres.

That early advantage gave them a much larger ecosystem - apps that required MySQL, developers that were familiar with it, etc.

It took years for that early advantage to be overcome by postgres.. both by improving Postgres performance, and as computers became faster people were less willing to make the compromises that MySQL provided.

I think it's worth acknowledging MySQL's early advantages, and not pretending 15 years of popularity was somehow because people were "immature".


I think MySQL's advantage was that it was installed on some many hosting companies servers (along with PHP) that it became ubiquitous for many.

Most DB admins that I knew could see no clear advantage of MySQL when we looked at the details apart from for failover and multi-host uptime has an easier deployment.

Disclaimer I have been using PostgreSQL for perhaps 20 years now.


That's the ecosystem that developed because of MySQL's early performance advantage... that wasn't until later on.

The performance issue is actually a core part of MySQL's history.

We have to go way back to the early 90s.

Back then Postgres didn't use SQL. There was a project called mSQL that created an SQL layer on top of Postgres... but it didn't perform very well on the slow computers of the early 90s. So mSQL created it's own lightweight DBMS to replace Postgres.

mSQL was popular in the mid-90s, and it's performance started the creation of that early ecosystem that you're referring to.

Later MySQL took mSQL and added more features, had a more liberal/cheaper license, and kept the mSQL API.. and that ecosystem that developed around mSQL continued with MySQL.

So the performance advantage that gave MySQL it's lead, began pre-Postgres95 (that added SQL to postgres), that created the ecosystem that kept MySQL popular.

When MySQL passed mSQL in popularity in 1997, Postgre95 was a little over a year old, and was just recently renamed to PostgreSQL (while mSQL was about 3 years old at the time).

If Postgres was more performant, mSQL would have been based on it, and MySQL probably would have never existed. Instead, by the time Postgres addressed the SQL issue, they were 2 years late, and by the time they addressed the performance issue, they were a few more years late (by the early 00s, IIRC the performance was comparable).. and by then MySQL had cemented its lead in the market.


Keep in mind that the "performance advantage" MySQL supposedly "had" over PostgreSQL mostly came down to a bad default setting in PG. Back in early PG versions (7.x from memory), the default shared buffers setting value was extremely low, in order to allow PG to "just work" even on very low memory systems. (8MB?)

The effect of not changing that from the default was PG then pegging the cpu at 100% (moving stuff into and out of the tiny buffer) with terrible performance.

When we raised that default to something reasonable for (at the time) modern systems... our performance magically increased substantially from the perspective of many non-DBAs.

Note - The exact details are in the PG email archives if anyone cares enough to look. :)


NO. Nothing I wrote above has anything to do with what you've written here. (It's so irrelevant, I have to ask: did you read my post at all?)

PG 7.x was 2000-2003. A full 5 years after MySQL was released.. and 7 years after mSQL.

The performance of mSQL versus pre-Postgres95 has nothing to do with a setting in PG 7.x released years later.

The performance advantage of mSQL was directly related to the stripped down DBMS that was developed to replace Postgres (AGAIN IN 1993) and the incomplete SQL implementation (AGAIN IN 1993)... both tradeoffs that Postgres did not make.

The issue you're referring to is a separate performance issue that did not occur until years later.. after MySQL already won the marketshare race.

Between the start of Postgres in 1986 and PG 7.x, there's about 14 years of history you're forgetting.


I think that's not quite right. There's a lot of things besides default settings that lead to not so great performance. Before spread checkpoints, HOT, short varlenas, readonly transactions not needing a txid, the free space map (remember max_fsm_pages), autovacuum, custom plans for prepared statements, asynchronous commits... it was a lot harder to use PG for higher throughput systems.


Those likely contributed, but the shared_buffers setting was the outright major culprit at the time.

Go look back through the -hackers mailing list, probably ~2001-2002. Or maybe just look at the timestamps for when the default changed, in order to get the right time range. :)


I used postgres before and after all of these changes, and I used mysql at that time too...

And before you accuse me of anti postgres bias: I work on postgres all day :)


Some people will hate my summaries here, but:

(1) The strength of PostgreSQL is that it offers a extremely Oracle-like experience, for free.

(2) The strength of MySQL (and its forks like MariaDB, Percona, Vitess, etc) are much better operational support for clustering, replication, sharding, etc.

Conversely, PostgreSQL is harder to take to scale, and MySQL is much more weird and quirky in basic functionality.

Industry hype aside, the reality is that the overwhelming number of applications AREN'T very high-scale. They're dealing with gigabytes of data, not terabytes or petabytes. And the professionals who deal with petabytes for a living tend to spend less time chatting on web forums, compared to students and hobbyists and bored line-of-business developers.

So it's not too surprising that there's more discussion and more love for the database that's best suited for the scale at which most people operate.


2 used to be true, but is less and less with every release. Won’t say it is false yet, but PG 10 is getting there. What is still left to do?


CockroachDB-style "set and forget" clusters.

If I can just put a container image into my kubernetes cluster, and set it to auto-scale, and postgres automatically scales depending on load, and it all just magically works, then it’d be perfect.

Knowing how well engineered everything else in PGSQL is, I’m sure this is going to happen sooner or later.


Mysql also doesn't have range-based-sharding (in any external tool that I know).

You have to look at CockroachDB(postgresql driver) vs TiDB(mysql driver) unless you want to wait 10 years for that to happen in pg/mysql land.


MariaDB has the unfortunate burden of being a drop-in replacement for MySQL which means all the warts, oddities and downright horrifying stuff are still there.


Well, all the horrifying stuff, minus Oracle.


Both are good, but I think the current trend towards postgres is because

- postgresql arguably has a wider featureset than mysql/mariadb [1]

- recent versions of postgresql have solved many of the issues that used to prohibit adoption (scaling issues, logical replication, etc)

- since the Oracle aquisition of mysql and the subsequent mariadb fork, the waters are a little muddied for potential mysql/mariadb adopters with regards to which fork to choose and if that is the mysql fork, whether it will remain free.

- postgres is default on heroku

[1] https://en.wikipedia.org/wiki/Comparison_of_relational_datab...


PostgreSQL gets so much love due to its exceptional quality.

Perhaps MariaDB is not of exceptional quality.


I prefer Mysql/MariaDB's `show tables` vs Postgres' `\dt`. And other commands. More like sql.


Postgres has similar SQL commands.

Commands that start with \ are shortcuts in the psql client that are translated into SQL commands.

You can always use SELECT * FROM pg_tables; and operate on those tables as if it was a normal SQL relation (because it is).

Additionally, if you use psql -E, it shows you the SQL it runs for such shortcuts. For \d (Show relations) it runs:

    SELECT n.nspname as "Schema",
      c.relname as "Name",
      CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END as "Type",
      pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
    FROM pg_catalog.pg_class c
         LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind IN ('r','v','m','S','f','')
          AND n.nspname <> 'pg_catalog'
          AND n.nspname <> 'information_schema'
          AND n.nspname !~ '^pg_toast'
      AND pg_catalog.pg_table_is_visible(c.oid)
    ORDER BY 1,2;


Good thing there is a shortcut :)

Can you do `alter table` on pg_tables?


pg_tables is a view, which only loads data from a few other internal tables, but you can modify many of them, yes. (And many extensions do that).

Specifically, pg_tables is defined as

    CREATE VIEW pg_tables AS SELECT n.nspname AS schemaname,
        c.relname AS tablename,
        pg_get_userbyid(c.relowner) AS tableowner,
        t.spcname AS tablespace,
        c.relhasindex AS hasindexes,
        c.relhasrules AS hasrules,
        c.relhastriggers AS hastriggers,
        c.relrowsecurity AS rowsecurity
       FROM ((pg_class c
         LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
         LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
      WHERE (c.relkind = ANY (ARRAY['r'::"char", 'p'::"char"]));


Cool. I like the Posgres is written in Postgres. Of course, the normal commands for changing the schema are a better approach.


yes and they are under transactions that your can roll-back, does MySQL do that?


Damned with faint praise.


Here's a couple reasons: - call functions(stored procs) as tables and join to them - regular expressions - order by in your views & functions - order by INSIDE your aggregates - create your own aggregates - make your own window functions - left join on your recursive CTE - put temp tables in your function(stored proc) that you can also call as a table (see above) - lateral joins - filter syntax to do pivots - WITH ORDINALITY - generate_series - dollar quoting - write-able CTE's - arrays - ranges - intervals - jsonb (which can be indexed)

1. transformative/analytic functions 2. huge variety persisted formats 3. ability to store logic in the DB tier and re-use it


Because MariaDB is based on MySQL which is fundamentally broken: check out how basic SQL doesn‘t work on Youtube. And then there is silent data corruption and six different storage engines because Monty & co. just couldn‘t get the code to deliver decent performance for the generic case. Then there’s the nonexistent OS based authentication and very amateurish database backup capabilities. With all of these „features“, MariaDB is „a recipe for success“.

Meanwhile PostgreSQL has always been about reliability and stability and has been getting better and better with every release.


I don't understand this kind of argument. And I'm about as big a postgres enthusiast as you can get. Both postgres and mysql/mariadb have strengths and weaknesses. In some areas they overlap, in some not. In some areas they're of philosophical nature, in others purely technical.

I'm fine with arguing that $pro{je,du}ct is better for most cases. But that doesn't mean the other alternatives have to be denigrated as being pure shit. That's not going to convince anybody that's not already convinced, and stirs um enmity.

FWIW, I think both postgres and mysql/mariadb benefit from the competition.

Do I think that postgres is overall the better project? Yes. Do I think mysql is shit? No.


It's really simple: MySQL / MariaDB do not function correctly for the feature set they supposedly support, while PostgreSQL works correctly for that which it supports and people recognised this eventually, which is thankfully why PostgreSQL is the leading open source relational database management system.

There is not much philosophy here: MySQL should have never been used since it's always been busted, and thankfully now it's lost so much market share that it's become all but irrelevant. If you want to build bullet proof, robust systems and don't want or can't afford Oracle, one of PostgreSQL variants is a critical piece for building such a system.

I like sleeping through my nights instead of wasting my life away troubleshooting MySQL. How about you?


> I like sleeping through my nights instead of wasting my life away troubleshooting MySQL. How about you?

I waste them working on postgres. Doesn't mean I've to denigrate other systems.


At least you’re investing your time in a product which works correctly and isn’t throughly broken by design. Sometimes it’s necessary to point out MySQL’s flaws lest someone get the “bright” idea to put it in production, then I get the job and then get stuck with that disaster of a database (that actually happened to me several times). Raising awareness is the best prevention.


Stonebreaker's Ingres Papers revenge. The Berkeley IT crowd was remarkably generative of good and enduring outcomes. Their Free/Open Cathedral efforts still anchor our frothier bazaars. Great read.

There are vast real world application domains globally where Postgres one stop shopping for both infrastructure, dev targeting, training and MRO is the clear win with negative returns on BFC better faster cheaper.

Good enough and comfortably familiar haptics is really important. The worldly focus brings us worldly wins. We apply the applied math tools to human concerns to find jobs solving problems and not get jobs to solve our problems. Balancing those two sets of problem is the act.


+1 it's a great piece of software that delivers tremendous value to users and continues to add amazing features every year. Cheers and thank you to all the maintainers and contributors.


any good resources for going deeper on the analytical capabilities of postgres? ya know, beyond basic window functions. How can i leverage stored procs etc?


Look into PL/pgSQL [1]. PostgreSQL also offers a few other languages such as Python and Perl [2]. Please note that the individual extensions have to be installed.

[1]: https://www.postgresql.org/docs/current/static/plpgsql.html

[2]: https://www.postgresql.org/docs/current/static/server-progra...




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: