
PostgreSQL is the world's best database
https://www.2ndquadrant.com/en/blog/postgresql-is-the-worlds-best-database/
======
unnouinceput
For most of the projects where the DB really mattered, throughout my 10+
years as a freelancer, it came down to the one thing the client really cared
about: performance. Nothing else mattered, not license price, not bells and
whistles, not hype. My clients wanted to have data in front of their eyes the
same second they clicked the button. And when you have a table with 100
million rows in it, and your application can't load data from that table and
show it to its users within that second, you have already lost the battle.

Out of my attempts, my clients' attempts, and my friends' attempts, only one
DB was up to this task. Not Maria, not Access, not Oracle, not NoSQL, not
MSSQL, not MySQL - all of them failed. The only one was PostgreSQL.

And before you start bashing me, please try this. Make a small application
that shows a map, put 100 million points of interest on that map, stored in
the table we're talking about, and as you scroll the map, take the middle of
the view as the center of a circle and select, within a small radius, only
the points of interest inside that radius - no more than a thousand points of
interest, let's say. When you can do that within a second, you've got yourself
a good database. For me, PGSQL was the only one capable of doing this reliably.

~~~
jka
While I do love PostgreSQL (and PostGIS which is excellent at transforming,
indexing and querying geographic content) - I feel like MSSQL should get a bit
of a shout-out in relation to your comment on performance.

An experienced database developer with help from "SET STATISTICS IO ON"[1] and
query plans[2] can achieve incredible MSSQL query optimization results.

PostgreSQL has good query plan output via the EXPLAIN[3] statement - but I
haven't (yet?) seen PostgreSQL produce per-query IO statistics.

[1] - [https://docs.microsoft.com/en-us/sql/t-sql/statements/set-st...](https://docs.microsoft.com/en-us/sql/t-sql/statements/set-statistics-io-transact-sql)

[2] - [https://docs.microsoft.com/en-us/sql/relational-databases/pe...](https://docs.microsoft.com/en-us/sql/relational-databases/performance/display-an-actual-execution-plan?view=sql-server-ver15)

[3] - [https://www.postgresql.org/docs/current/sql-explain.html](https://www.postgresql.org/docs/current/sql-explain.html)

~~~
ants_a
The configuration parameter `track_io_timing = 'on'` will measure I/O time,
and running `EXPLAIN (ANALYZE, BUFFERS)` will output per-plan-node buffer
statistics and I/O time spent. On most modern systems, I/O timing has no
measurable overhead and should be permanently enabled. Collecting buffer
statistics is also relatively cheap and could be enabled for all queries. For
example, the following configuration will produce a log entry with an explain
plan and per-node I/O stats for every query above a threshold:

    
    
        shared_preload_libraries = 'auto_explain'
        auto_explain.log_min_duration = '5s'
        auto_explain.log_analyze = true
        auto_explain.log_buffers = true
        auto_explain.log_timing = false

~~~
mulander
You don't need to log every auto_explain. Just enable track_io_timing and
pg_stat_statements and you get per-query IO performance metrics much more
cheaply.

Table F.21. pg_stat_statements Columns
[https://www.postgresql.org/docs/11/pgstatstatements.html](https://www.postgresql.org/docs/11/pgstatstatements.html)

    blk_read_time (double precision) - Total time the statement spent
    reading blocks, in milliseconds (if track_io_timing is enabled,
    otherwise zero)

    blk_write_time (double precision) - Total time the statement spent
    writing blocks, in milliseconds (if track_io_timing is enabled,
    otherwise zero)
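With both enabled, the per-statement I/O numbers can be pulled straight from
the pg_stat_statements view, e.g.:

```sql
-- Top statements by time spent reading blocks from disk
-- (blk_read_time stays zero unless track_io_timing is on).
SELECT query, calls, blk_read_time, blk_write_time
FROM pg_stat_statements
ORDER BY blk_read_time DESC
LIMIT 10;
```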

~~~
jka
Thanks for this - 'pg_stat_statements' definitely looks like a similar
feature, in particular because it reflects the count of disk-related
operations (not just I/O time).

It took some brief configuration but I've been able to try it out locally and
will refer to it when doing PostgreSQL query performance tuning in future.

------
irrational
We were on Oracle for 15 years, but the cost was just too high. We decided to
move to Postgres. We thought it wouldn’t be as good as Oracle (you get what
you pay for, right?), but it ended up being better. Performance is better.
Documentation is better (by far). Adherence to SQL standard is better. Many
SQL queries are simpler. Null/empty string is sane. Etc.

Now I don’t have experience with MySQL, SQLServer, etc., but between Oracle
and Postgres, Postgres is definitely the best database.

~~~
briffle
My last employer was a large Oracle place, although everything was standard
edition. I am so much happier now.

Everything that used to be a 'that is an enterprise edition feature' is now
baked in.

concurrent index creation is amazing.
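For reference, it's a single keyword (table and index names here are made up):

```sql
-- Builds the index without blocking concurrent writes to the table;
-- in Oracle, online index builds are an Enterprise Edition feature.
CREATE INDEX CONCURRENTLY orders_customer_idx ON orders (customer_id);
```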

The simplicity and speed of pgBarman vs RMAN is mind boggling. RMAN in Oracle
Standard Edition can only do full backups.

Full read-only access to a standby? That is huge! (enterprise only for the big
O)

Realtime replication (not log shipping), also enterprise only for the big O.

I'm a sysadmin, so that is what I notice right away, but our developers love
it too, for many other reasons.

------
kardianos
I use PostgreSQL and MS SQL Server. I love Postgres. There are some things
that SQL Server does differently that I would love to see supported in
Postgres:

* Protocol: (a) No wire-level named parameter support; everything must be by index. (b) Binary vs. text handling is not great, and the binary protocol details are mostly "see source". (c) No support for inline cancellation: to cancel a query, the client can't signal on the current TCP connection; it must open or use a different connection to kill the existing query. This is sometimes okay, but if you are behind a load balancer or pooler it can end up being a huge pain.

* Multiple Result Sets from a single query without hacks. While you can union together results with similar columns or use cursors, these are hacks and work only in specific cases. If I have two selects, return two result sets. Please.

* Protocol-level choice of language. Yes, you can program in any language, but submitting an arbitrary query in another language is a pain and requires creating an anonymous function, which is a further pain if your ad-hoc query has parameters. I would love a protocol-level flag that lets you specify "SQL", "PL/pgSQL", "cool-lang-1" and have the server execute the submitted text as that language.

I do love that Postgres recently added procs! Yay!
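The anonymous-function pain mentioned above looks roughly like this (table
and variable names are made up); a DO block takes no parameters, so ad-hoc
values have to be spliced into the text by hand:

```sql
DO $$
DECLARE
    threshold int := 100;  -- would be a real bind parameter in a nicer protocol
BEGIN
    DELETE FROM events WHERE retry_count > threshold;
END
$$ LANGUAGE plpgsql;
```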

~~~
egeozcan
I'd love to see the MERGE statement from MSSQL in other databases. Wonderful
tool for integrations.

~~~
manigandham
Postgres has UPSERT (INSERT INTO ... ON CONFLICT DO ... )

I wish MSSQL had this.
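For anyone unfamiliar, a minimal sketch of the Postgres form (hypothetical
table):

```sql
-- Insert a row, or add to the existing quantity when the key already exists.
INSERT INTO inventory (sku, qty)
VALUES ('A-100', 5)
ON CONFLICT (sku) DO UPDATE
    SET qty = inventory.qty + EXCLUDED.qty;
```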

~~~
electrotype
But this doesn't cover the DELETEs of a true MERGE statement.

~~~
manigandham
No, it's not a replacement. It's just a very useful feature that I wish all
DBs had.

------
INTPenis
My first job in IT, I was 18, it was the early 2000s. I was a nerd but in
professional IT I was essentially a blank slate.

I lucked out and ended up with a real hacker for a boss. As early as 2001-2002
he had saved entire businesses by migrating them from MySQL to Postgres.

He was a BSD guy and a postgres guy. He made me into a fanboy of both those
technologies.

So out of sheer luck I've preferred Postgres for over 15 years, and it has
never disappointed me.

Unlike BSD, which often disappointed me once I learned how much easier and
more mature Linux was to use.

~~~
jonnypotty
In my experience, when people are able to "fix" performance issues by moving
from SQL database A to SQL database B, it's because they understand database B
and how to optimise for it, not because one database is better or worse. My
company was "saved" by moving from Postgres to MySQL. The difference was the
guy and his specific knowledge, not the database technology.

~~~
mamcx
> As early as 2001-2002... from mysql to postgres.

MySQL was pretty bad in its first versions. I'm in the enterprise/ERP
development world, and when MySQL started to show up on the radar, I couldn't
believe what a joke it was.

MySQL got good around version 5? Now I can say all the major DBs are good
enough no matter which you choose (I still think PG is the best around), but
MySQL was always the riskier proposition, the PHP of the DB engines...

------
drdec
I think that PostgreSQL itself is great. However, the developer client tools
(pgAdmin) leave something to be desired. The old pgAdmin3 was fine, if lacking
bells and whistles. The new pgAdmin4 however, was not very good last time I
tried it [1]. It was implemented as a client/server web application and had
frequent issues.

Can anyone recommend a good client for PostgreSQL?

[1] - I see that there have been some new releases in 2020 so I ought to check
on them. The version I tried earlier was 4.11.

~~~
manigandham
DBeaver: [https://dbeaver.io/](https://dbeaver.io/)

Native cross-platform and works across dozens of databases with lots of
features.

Another option is Jetbrains DataGrip:
[https://www.jetbrains.com/datagrip/](https://www.jetbrains.com/datagrip/)

~~~
Shorel
IMO Java is not native anywhere.

I prefer to use HeidiSQL running on Wine.

~~~
manigandham
It's more native than an Electron app.

------
Jonnax
This is advertising of course. But if I had to select an SQL DB postgres is my
only choice.

Perhaps I don't know enough about databases and their differences.

Anyone have some pros and cons of others?

Like why would I pick MySQL, Microsoft, Oracle, Maria etc over Postgres?

Apart from support that you gotta pay for.

~~~
endymi0n
Having worked with dozens of different databases over lots of years, I can
certainly say that Postgres is my go-to _default_ database over all the other
relational ones.

In the same way that Golang claims it's the 90% language (
[https://talks.golang.org/2014/gocon-
tokyo.slide#1](https://talks.golang.org/2014/gocon-tokyo.slide#1) ), I'd say
Postgres is the perfect 90% database.

However, we're still using at least four other databases in production, and
the reason is that because it does a bit of everything, there are lots of
niches it doesn't fill well. It's all about tradeoffs.

If you've got a Full Text Search problem, PG is "good enough", that is until
you need to index more challenging languages like Korean or need more exotic
scoring functions. For that, Elastic is better.

You can do small scale analytics on PG, but at larger sizes, the transactional
setup gets in the way and you're better off with BigQuery/Redshift/Snowflake.

You can scale vertically pretty big these days, but if you need linear
scalability while still guaranteeing single-digit-ms access, you're better off
with ScyllaDB.

However, there isn't a single project I would hesitate to start on Postgres
these days; that's how much I love it.

~~~
guiriduro
I wouldn't say PostgreSQL is the analogue of Golang: Go's initial popular
growth came from being 'good enough' while being arguably oversimplified,
where simplicity masks problems, under the pretence that YAGNI always applies
- which seems much closer to MySQL's history.

PG was driven by engineering correctness, by considering what DBAs 'Are Going
to Need'. Sometimes that strictness worked against popular growth, but in the
end it has worked out well, as many programmers figured out they also needed
it. I'd say the programming language analogue would more likely be Rust; let's
see in 10 years where the 90% case lies.

~~~
KptMarchewa
Rust is too complicated for 90% cases. 90% of cases are simpler with GC.

~~~
ansible
Rust is more complicated than some languages with GC.

However, if you are writing some code with multi-threading (who isn't these
days), then you may eventually want the help Rust gives you there as far as
correctness goes.

~~~
jfkebwjsbx
Multithreading is only needed in very few fields. Multiprocessing is way
easier and works fine for most. And the majority of programmers still only
need to write serial code.

~~~
setr
And if not serial.. embarrassingly parallel

------
stickfigure
Postgres is my go-to RDBMS, but I do have one serious complaint: Connections
are too expensive.

This is a side effect of the old-school process-per-connection architecture
that Postgres uses. MySQL (ick) easily handles thousands of connections on
small servers; with Postgres you will need a _LOT_ of RAM to sustain the
same, RAM that would be better used as cache.

I've found (at least, for my current app) that the number of connections
defines my scaling limit. I can't add more appserver instances or their
combined connection pools will overflow the limit. And that limit is
disappointingly small. It's not so painful that I want to reach for another
RDBMS, but I'm bumping into this problem way too early in my scale curve.

~~~
dpc_pw
It might be possible to write a small proxy in Rust that would run next to a
PG instance, accept connections using `async`, and then forward queries
through a limited-size connection pool.
~~~
stickfigure
These are active connections running transactions - unfortunately there's no
substitute for just having another connection.

~~~
dpc_pw
If there are a lot of connections actually running transactions, then I'd
expect the DB to become overwhelmed by the number of transactions, and
per-connection overhead is not going to be the problem.

How I understood the problem is thousands of connections, of which most only
run a query from time to time.

Judging by the other comments, it seems solutions like that are already
available.
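PgBouncer is the usual example: a minimal transaction-pooling sketch (paths
and sizes below are illustrative, not recommendations):

```
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; share server connections between transactions
max_client_conn = 2000    ; many cheap client connections...
default_pool_size = 20    ; ...multiplexed onto a few real backends
```

Note this doesn't help the long-lived-transaction case above, since a server
connection stays pinned for the duration of each transaction.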

~~~
stickfigure
Most of my transactions are relatively long-running (minutes) and mostly idle,
so the db can handle the transaction load.

There honestly doesn't seem to be a good solution to this problem. I have
reorganized my app a bit to try to keep the transactions shorter (mostly
checkpointing) but it's using architecture to solve a fundamentally technical
problem. I wouldn't have this problem with MySQL.

Switching from processes to threads (or some other abstraction) per connection
isn't likely to show up in Postgres anytime soon, so I guess I'm willing to
live with this... but I'm not going to pretend that Postgres is without some
serious downsides.

------
lllr_finger
Postgres use always reminds me of this presentation:
[http://boringtechnology.club/](http://boringtechnology.club/)

I self-admittedly love esoteric databases and storage engines to a fault. I'll
try to shoehorn things like RocksDB into whatever personal project I'm working
on.

At work however, the motto I spread to the teams I work on is "use Postgres
until it hurts". And for many, many teams, Postgres will never hurt. I'm very
happy for its continued existence because it's been a solid workhorse on
various projects I've worked on over the years.

~~~
duckmysick
> I self-admittedly love esoteric databases and storage engines to a fault.

Any interesting notes and observations after using those esoteric tools?

~~~
lllr_finger
By putting thought into your design patterns and choosing a storage engine
that accents those choices, it's sometimes possible to eschew caching and
distributed storage altogether - saving you from a whole host of complexity.

It can also be challenging to accurately assess a database's performance and
correctness. I've been using C/Rust bindings when possible for the former, and
Aphyr's Jepsen test results for the latter. Unfortunately there's no silver
bullet on these topics - you pretty much have to test all your use cases.

------
aeyes
I love Postgres as much as the next guy but there are three things that really
annoy me:

Postgres is still laid out for 9-to-5 workloads, accumulating garbage during
the day and running vacuum at night. Autovacuum just doesn't cut it in a 24/7
operation, and it is by far what has caused the most problems in production.

No query hints, and no plan to ever implement them. Making the planner better
only takes you so far when everything depends on table statistics, which can
easily be skewed. pg_hint_plan and CTE tricks are not extensive enough.

Partial indexes are not well supported in the query planner, and thanks to the
lack of query hints I can't even force the planner to use them.

~~~
patrec
Do you have a good Postgres DBA? Autovacuum can indeed ruin your day if you
have a 24/7 DB that is delete- or update-heavy (the default settings are not
very aggressive, and once it falls behind badly it will never catch up and
you'll need to do some emergency manual vacuuming). But I can assure you it's
possible to run 24/7 DBs fine without running into autovacuum problems by
making sure things are configured right. I've experienced both regular
problems due to bad config and their complete disappearance.
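As a sketch, "configured right" usually means making autovacuum far more
aggressive than the defaults; the exact values below are illustrative, not
recommendations:

```
autovacuum_vacuum_scale_factor = 0.02  # default 0.2: vacuum at ~2% dead tuples, not 20%
autovacuum_vacuum_cost_limit = 2000    # let workers do much more work per cycle
autovacuum_naptime = 15s               # check for work more often (default 1min)
autovacuum_max_workers = 6             # default 3
```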

------
ggm
PostgreSQL and ZFS is a marriage made in heaven: the block size aligns with
disk behaviour, and snapshots make great low-latency DB dumps trivial. Never
regretted making the combo.
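For anyone curious, the usual sketch (pool/dataset names are made up, and the
settings are common advice rather than gospel):

```
# Match the ZFS record size to PostgreSQL's 8 kB page size
zfs create tank/pgdata
zfs set recordsize=8k tank/pgdata
zfs set compression=lz4 tank/pgdata
zfs set atime=off tank/pgdata

# A snapshot is a crash-consistent, near-instant "dump"
zfs snapshot tank/pgdata@nightly
```

Postgres can recover from such a snapshot as it would from a crash, provided
the data directory and WAL are snapshotted atomically together.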

~~~
pritambarhate
Can you please give us more info about your workload? Is it a very
high-throughput DB, especially for inserts and updates? I searched for ZFS
and PostgreSQL performance a couple of years ago, and the popular opinion
seemed to be that databases on copy-on-write file systems don't give the best
performance.

~~~
riku_iki
I am not a deep expert, but my impression is that Postgres MVCC is kind of a
copy-on-write approach already (no in-place updates; each update writes a new
row version), so it should be a perfect marriage.

------
forinti
I manage Oracle and Postgresql instances and I agree with this, except for one
small thing: Postgresql doesn't have packages.

Otherwise, it is much simpler to manage than Oracle (which really is a system
turned inside out). The documentation is really good and straight to the point
(Oracle can't help but turn everything into an ad - nvl uses the best null
handling technology on the market and such).

~~~
mulmen
I spent many years on Oracle. I do not miss smug Tom Kyte blog posts about the
pointlessness of BOOLEANs. Nor do I miss endless hours Googling around
Burleson Consulting pages because we couldn’t afford support.

I do kinda miss how easy it was to do partitioning but I’m on Redshift now so
it doesn’t matter.

~~~
fullstop
Oh, man, that shirtless guy. Thanks for putting that back into my mind!

------
thorin
This talks a lot about Postgres (which is great) but not about other
databases, so the title isn't very accurate. If you ignore cost, it certainly
doesn't explain why you'd work with Postgres rather than Oracle or SQL Server.

I've used Oracle for 20+ years, SQLServer for > 10 years, and Postgres from
time to time and would say they are all good tools for a general use case.

~~~
DeathArrow
Cost isn't an advantage for the company I work for, because we licensed SQL
Server very cheaply in big volume. If devs can work more easily and faster
with SQL Server, then we end up saving money, because time is money.

------
oftenwrong
As mentioned, PostgreSQL's documentation is some of the best in the software
world. With most software project documentation, one ends up scouring
StackOverflow, mailing list archives, and GitHub issues trying to figure out
how to do something. With Postgres, I can just go straight to the official
documentation, and read it. It has yet to fail me.

------
irjustin
While I disagree that it's the world's best, it is my default for all projects
and general RMDBS.

I used to be on MySQL, but it felt like it was becoming hack on top of hack
with each subsequent version. While PG lacked a number of features in the
early 2010s, it has clearly caught up, with great care taken over its
codebase.

There are scenarios that PG does not win and should not be used in, but if
we're talking about covering the most applications well, PG is it.

~~~
sgt
Out of curiosity, what RDBMS do you feel is the best?

~~~
miclill
Why should there be one best DB? It's a game of tradeoffs. Always was, always
will be.

~~~
sgt
Sure... but if you HAD to choose one? Just kidding. I think the word "best"
could easily be substituted with "your go-to DB"... or simply the best for
<category of what you want to do>.

------
oerpli
I used PostgreSQL as a student and during research, and had to switch to
Microsoft SQL Server because my company has used it forever.

For features I used on both, the average quality of the SQL Server
implementation is between middling and pathetic (mostly the latter). I cannot
recall a single instance where I thought "that's nice" or anything positive
about how SQL Server does something. At best, it has been "this is not awful".

Things that sucked especially hard:

\- Datetime API: In PostgreSQL it is a joy to use, and most features one would
like to have to analyze data are directly there and work as they should. The
analysis for my master's thesis was done completely in PostgreSQL [1]
(including outputting CSV files that could then be used by the LaTeX document)
- without ever feeling constrained by the language or its features. Meanwhile,
the way to store a TimeSpan from C# via EF is to convert it to a string and
back, or convert it to ticks and store it as a long.

\- The documentation: The pgsql documentation is the best of any project I
have ever seen; the one for SQL Server is horrible and doesn't seem to have
any redeeming qualities.

\- Management: SSMS is a usability nightmare and still the recommended choice
for doing anything. I assume there are better ways, but I haven't found them,
and when I have, they usually didn't work.

\- The query language: In psql you can do things like

    
    
      select a < b as ab, b < c as bc, count(*) from t group by 1,2; \crosstabview
    

To get even half of that in SQL Server one would do

    
    
      select case when a < b then 1 else 0 end as ab, case when b < c then 1 else 0 end as bc, count(*) as count from t group by case when a < b then 1 else 0 end, case when b < c then 1 else 0 end
    

I think crosstab queries are unsupported (there are PIVOTs but I haven't yet
looked into them a lot).

Maybe my use cases are a bit weird and I haven't encountered the better
features of SQL Server yet. Would appreciate if someone could point out what I
am doing wrong.

[1]:
[https://github.com/oerpli/MONitERO/blob/master/sql/queries/r...](https://github.com/oerpli/MONitERO/blob/master/sql/queries/ringsize_flow.sql)

~~~
philliphaydon
SQL Server has a ternary. So your example would be written like:

    
    
      select iif(a < b, 1, 0) as ab, iif(b < c, 1, 0) bc...
    

Unsure what crosstabview is as I've never used it.

I hate sql server tho. Basic things like indexable arrays in postgresql make
it amazing.

~~~
oerpli
This makes it somewhat easier on the eyes but is still 80% too verbose.

Crosstabview converts a table from:

    
    
      |a x 0
      |a y 1
      |a z 2
      |b x 1
      |b z 5
    

to

    
    
      |  x y z
      |a 0 1 2
      |b 1   5

------
jwr
PostgreSQL is great if you:

a) are OK with using SQL (this is not obvious)

b) do not need a distributed database

I've spent a lot of time on looking at database solutions recently, reading
through Jepsen reports and thinking about software architectures. The problem
with PostgreSQL is that it is essentially a single-point-of-failure solution
(yes, I do know about various replication scenarios and solutions, in fact I
learned much more about them than I ever wanted to know). This is perfectly
fine for many applications, but not nearly enough to call it "the world's best
database".

~~~
BozeWolf
So what do you suggest then?

Do you really care that much about the language? Shouldn't you care more about
your data? If you are not OK with SQL there are abstractions available.

~~~
jwr
> Do you really care that much about the language? Shouldn't you care more
> about your data? If you are not OK with SQL there are abstractions
> available.

I do care about SQL being a text-based protocol with in-band commands,
bringing lots of functionality that I never need (I do not use the fancy query
features for exploratory analysis, I have applications with very specific data
access patterns).

For my needs, I do not need a "query language" at all, just index lookups. And
I would much rather not have to worry about escaping my data, because I have
to squeeze everything into a simple text pipe.

------
kenfoo
PostgreSQL is also my go-to database for anything relational. My only issue
with it, though, is the need to run VACUUM FULL (which locks tables for a
significant amount of time) to return disk space to the operating system.

------
JacKTrocinskI
Is PostgreSQL doing anything in the area of autonomous databases? Oracle seems
to be doing a lot in this area right now and it seems to be promising.

------
ncmncm
That performance carries the day today is a _huge_ improvement since 1999.
Back then, there was also only one question that mattered, but the question
was, "Is it Oracle?", just as the only question about network equipment was
"Is it Cisco?".

It is hard to explain, at this remove, what horrible perversions were
performed in order to be able to say "yes" to such questions, or to avoid the
need to answer them.

One example was a router that, at opposite ends of a long-haul network,
managed data flow that was utterly insensitive to the magnitudes of latency or
drop rate. They sold _hardly any_. It turned out the algorithm could be run in
user space over UDP, and that became a roaring business, for a different
company, because you could deploy without getting network IT involved. Mostly
IT didn't even want bribes. They just didn't want anything around that was
unfamiliar.

Database products had to pretend to be an add-on to Oracle, and invisibly back
up to, and restore from, an Oracle instance, because Oracle DBAs only ever
wanted to back up or restore an Oracle instance. The Oracle part typically did
absolutely nothing else, but Oracle collected their $500k (yes, really)
regardless.

------
NicoJuicy
For anyone using C#: please have a look at the excellent Marten library,
which shows off the awesome functionality of Postgres as a document (e.g.
NoSQL) + event store.

Postgres is not only for SQL; I see multiple people referencing only the SQL
functionality.

For those who work with events/projections, Postgres supports JS migrations
on events, so you can update the events in your store to the latest version.

------
jhoechtl
Bi-temporal table support is badly wanted.

[https://mariadb.com/kb/en/temporal-data-
tables/](https://mariadb.com/kb/en/temporal-data-tables/)

~~~
jacques_chester
Yes, though the SQL2011 temporal support was a dud compared to the state of
the theoretical art. A strong whiff of vendor-vs-vendor nonsense.

Folks in the PostgreSQL community are aware of temporal tables:

[https://www.pgcon.org/2019/schedule/events/1336.en.html](https://www.pgcon.org/2019/schedule/events/1336.en.html)

[https://www.2qpgconf.com/wp-content/uploads/2016/05/Chad-
Sla...](https://www.2qpgconf.com/wp-content/uploads/2016/05/Chad-Slaughter-_-
Henrietta-Dombrovskaya-Bitemporality-in-PostgreSQL_-What%E2%80%99s-New.pdf)
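In the meantime, a common hand-rolled sketch in Postgres uses range types
plus an exclusion constraint (the table here is hypothetical; this needs the
btree_gist extension):

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Each row carries its validity period; the constraint forbids
-- overlapping periods for the same product_id.
CREATE TABLE price_history (
    product_id int,
    price      numeric,
    valid      tstzrange,
    EXCLUDE USING gist (product_id WITH =, valid WITH &&)
);
```

That covers one time dimension; true bi-temporality (valid time plus
transaction time) still takes considerably more machinery.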

------
nojvek
I get the Postgres love, but as someone who's been doing MySQL for a long
time, I found Postgres really hard to use. Even setting it up on a Mac was a
pain in the ass. How do I create a new DB? Why is pgcli not as friendly as
mysqlcli? After 2 weeks of head bashing and lots of Stack Overflow reading I
gave up and went back to good ol' MySQL.

For ease of use for a newcomer, MySQL seems to have nicer tooling and
out-of-the-box usability. I've worked with MySQL on sites with millions of
daily active users and it hasn't been a problem - snappy 10ms queries for
most things. I guess one really needs to learn ANALYZE and do the necessary
index tuning.

I'm just saying: use the tool you're most comfortable with, that has a good
ecosystem and gets the job done. For some it's PGSQL, for some it's MySQL,
for others it could be one of the paid ones. MySQL 8 is pretty solid nowadays.

There is no universal “best”, just as there is no “best” car. All about trade
offs.

~~~
Traubenfuchs
> setting it up on Mac was a pain in the ass

What was your problem? All you need to call is:

    docker run --rm --name pg-docker -e POSTGRES_PASSWORD=docker -d -p 5432:5432 -v $HOME/docker/volumes/postgres:/var/lib/postgresql/data postgres:latest

...and you are good to go. In-depth guide: [https://hackernoon.com/dont-install-postgres-docker-pull-pos...](https://hackernoon.com/dont-install-postgres-docker-pull-postgres-bee20e200198)

------
luord
It might be a bit... forcefully put, but I gotta say that PostgreSQL is one
of my favorite pieces of software ever created. And I don't even like C all
that much.

------
throwawaysea
> So, you want NoSQL, Riak, REACT, Redis, Mongo, etc.? PostgreSQL does all
> that. Admittedly not with all the bells and whistles of all of the original
> products. For example, PostgreSQL doesn’t create new shards for you for any
> of those. That’s still a manual process.

Isn't this a fairly big issue? I would think the convenience of automated
scaling is a primary reason to use these other tools.

------
_kyran
While we're all here talking about databases I've got a question for the
DBA/well versed software engineers.

What database or database design strategies should be used for an analytics
dashboard that can be queried in real time, similar to Google Analytics?

The first real scaling issue I've come across (as a junior dev working on a
side project) is a table with 38 million rows and rapidly growing.

Short of just adding indexes and throwing more RAM at the problem, is there
another way to design such a table so it can be queried effectively for
counts and group-bys over a date range?

I imagine things can't really be pre-computed too easily due to the different
filters?

~~~
resolaibohp
I currently run analytic queries on a page-view dataset of around 600 million
rows on Postgres. You can start by looking into partitioning by date. Also
take a look at the IO of your queries; IO was the biggest bottleneck for me.
Make sure you have hardware that is not limiting it.
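A sketch of declarative range partitioning (PostgreSQL 10+), with
illustrative table and column names:

```sql
CREATE TABLE pageviews (
    viewed_at timestamptz NOT NULL,
    url       text,
    user_id   bigint
) PARTITION BY RANGE (viewed_at);

CREATE TABLE pageviews_2020_05 PARTITION OF pageviews
    FOR VALUES FROM ('2020-05-01') TO ('2020-06-01');

-- Queries constrained on viewed_at only scan the matching partitions.
SELECT date_trunc('day', viewed_at) AS day, count(*)
FROM pageviews
WHERE viewed_at >= '2020-05-01' AND viewed_at < '2020-05-08'
GROUP BY 1;
```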

------
lazyant
PostgreSQL is absolutely fantastic, and for most projects, unless there's an
existing legacy database or a purely Microsoft environment, it should be the
default RDBMS.

My only complaints are:

1) No built-in or easy failover mechanism (and it will never have one because
of the project's stated philosophy). I'll settle for a standard one, and yes,
I know there are different requirements and things to optimize for, and there
are 3rd-party solutions (just not one that integrates easily).

2) Horizontal scalability. And yes, I know of some solutions, and that other
DBs are more apt for this, and that I may be asking for the wrong thing.

------
einpoklum
PostgreSQL is a transaction-oriented row store. It is slow for analytics, and
I mean by something like two orders of magnitude, if not more, relative to
state-of-the-art systems like HyperDB or Actian Vector, on typical analytic
workloads. You can extend it any which way you like; that won't change. (OK,
if you can extend it with essentially a different DBMS, then that could be
fast.)

Security, concurrency control - those are important for transactional work
facing numerous users. Not for analytic processing speed.

------
bertil
I’m assuming anyone who has to make that decision already knows this but,
while PostgreSQL is great to host a production database, it isn’t a great
choice for an _analytic_ database at scale, or to train or store your machine
learning features. It works, but it’s not great at scale.

You can get away with having a scheduled pg_dump early on, some reports on
that, while you figure out an ETL/Messaging process — but picking something
that handles concurrent large-scale queries will matter fast.

~~~
yeellow
Could you name some better alternatives for an analytical DB?

~~~
bertil
There are a lot of commercial solutions geared towards reporting, from
old-school Oracle and TeraData to even SAS (would not recommend that one).
Some options that came out recently when “Big Data” was a sentence people
said: Hive, RedShift, S3, Cassandra. New players that abstract a lot of the
problems away and add machine-learning capacities on top: Google BigQuery,
Snowflake. There are non-tabular structures like the graph-based ones, e.g.
Neo4j: I’ve yet to find a business where that’s a viable option for analytics
in its first five years, but I like that those are bringing something new.

Most of those are great and work: they have connectors to whatever language or
tool you want to use. The most promising tool you’ll need to handle most of
the transformation is either AirFlow or, preferably, dbt. Those are
independent from the database, so don’t worry too much about the features that
vendors tout. One key thing: monitor all the queries going to that database,
and find the expensive ones, those with suspicious patterns, etc.
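If the database happens to be Postgres, for instance, the pg_stat_statements
extension does exactly this kind of monitoring. A sketch; it assumes the
extension is listed in shared_preload_libraries, and the timing column is
total_exec_time on PG 13+ (total_time on earlier versions):

    
    
      CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
      
      -- The ten most expensive queries by cumulative execution time
      SELECT query, calls, total_exec_time
      FROM pg_stat_statements
      ORDER BY total_exec_time DESC
      LIMIT 10;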

Spin-up, response time can matter, but they are rarely a problem for most
“slow” analytic use cases; for instance, Google BigQuery takes 10 seconds no
matter what you query and it’s fine. On the other hand, _concurrency_ has been
an issue for me more than anything: all the analysts and managers trying to
update their dashboards on Monday at 10 am.

You rapidly get to a point where prices are high and negotiated, so you want
to think about your likely usage in the next years before you step into that
meeting. Key decision: by default, prefer the tech that is closest to the rest
of your stack because ingress is the easiest factor to predict.

Looker will be mentioned: it’s on top of all that, downstream from
AirFlow/dbt.

------
sandGorgon
i wish there was a management layer that runs and operates high availability
postgres databases on AWS/GCP/Azure...but is significantly cheaper.

AWS RDS makes it impossible to export snapshots to s3 (because the native
snapshot storage is super expensive).

EC2 m5.xlarge w/ 5000 GB storage costs $700.

RDS db.m5.xlarge w/ 5000 GB costs $1000.

Why is there no SaaS that lets me bring my own EC2 instance and just runs it
and gives me a dashboardy experience, ESPECIALLY for snapshots and restores?

~~~
briffle
GCP does have HA for postgresql:
[https://cloud.google.com/sql/docs/postgres/high-
availability](https://cloud.google.com/sql/docs/postgres/high-availability)

~~~
sandGorgon
GCP managed databases are more expensive than AWS.

They claim this is because (unlike AWS), they don't provision iops as
"credits".

There is an opportunity to play in this price arbitrage space where you can
take the cloud instances and manage a database on top of it.

------
axegon_
I'd like to add one more thing about PostgreSQL: the source code is arguably
the cleanest code you'll ever see. It's second to none.

------
mcny
Let’s say I have a table for reservations: 1. Item ID 2. Reservation duration

What I would like is if I have an item ID of 1 and a reservation duration of
today 10AM to 11AM and I try to add another record of item ID 1 and duration
10:39AM to 11:39AM the insert should fail.

Apparently it is straightforward in DB2 but not in PostgreSQL?

~~~
woutgaze
Something like this will do the job in PostgreSQL:

    
    
      -- btree_gist provides the GiST operator class needed for the
      -- plain scalar equality (item_id WITH =)
      CREATE EXTENSION IF NOT EXISTS btree_gist;
      
      CREATE TABLE reservations (
        item_id uuid,
        duration tsrange,
        EXCLUDE USING GIST (item_id WITH =, duration WITH &&)
      );
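With that constraint in place, the overlapping insert from the question above
is rejected. A quick sketch (the uuid is made up for illustration):

    
    
      INSERT INTO reservations VALUES
        ('00000000-0000-0000-0000-000000000001',
         '[2020-06-08 10:00, 2020-06-08 11:00)');
      
      -- Same item, and 10:39-11:39 overlaps the row above, so this
      -- fails with "conflicting key value violates exclusion constraint"
      INSERT INTO reservations VALUES
        ('00000000-0000-0000-0000-000000000001',
         '[2020-06-08 10:39, 2020-06-08 11:39)');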

~~~
nicoburns
Can this be made to work if the `id` and `duration` are in different tables
(with a joining key)?

------
kingdomcome50
PostgreSQL _is_ a great database, but pgAdmin (or any client I have tried) is
just _painful_ compared to SSMS.

And while SSMS certainly has some pain points, I have yet to find a DB client
tool that even comes close in terms of usability. For this reason alone _I_
prefer MS SQL.

~~~
wdb
Never got SSMS working on my work laptop. Showed the splash screen and that
was it. Weird

~~~
VVertigo
I hit that too: it was a known bug in SSMS and Microsoft released an update
that fixed it.

~~~
wdb
Good to know, thanks!

------
Beefin
Their JSON data type is atrocious, does anybody actually use it or is it a
checkbox item?

~~~
jacques_chester
You can join on arbitrary json fields, with indices. It is an extremely useful
swiss army knife.

I've used jsonb for performing ETL in-database from REST APIs.
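For example (a sketch, with made-up table names), a jsonb column can take a
GIN index and join against relational data on an extracted field:

    
    
      CREATE TABLE api_payloads (doc jsonb);
      CREATE INDEX ON api_payloads USING GIN (doc);
      
      -- Join a relational table against a field pulled out of the JSON
      SELECT o.id, p.doc->>'status'
      FROM orders o
      JOIN api_payloads p ON p.doc->>'order_id' = o.id::text;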

I know Concourse from 6.0 uses it for storing resource versions. They had an
inputs selection algorithm that became stupidly faster because they can
perform joins over JSON objects provided by 3rd-party extensions. Previously
it was a nested loop in memory.

[https://github.com/concourse/concourse/releases/tag/v6.0.0#3...](https://github.com/concourse/concourse/releases/tag/v6.0.0#3602)

~~~
Beefin
Isn't it just a string? Can you do nested queries? Secondary indexes? Really
anything beyond just querying the key-value pairs?

~~~
jacques_chester
It's a real type. You can do nested queries, follow paths into the structure
etc. I've performed quite sophisticated joins over jsonb records.
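For instance, given a hypothetical table payloads with a jsonb column doc,
both path extraction and containment queries work directly, and the
containment test can use a GIN index:

    
    
      -- Follow a path into the nested structure (result as text)
      SELECT doc#>>'{customer,address,city}' FROM payloads;
      
      -- Containment: rows whose document includes this sub-object
      SELECT * FROM payloads WHERE doc @> '{"status": "shipped"}';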

------
thosakwe
I use Postgres for everything. The one thing I wish they would do, though, is
improve the error messages when a query fails. At the very least, if I know
which position in the text is the beginning of an error, I can fix it.

~~~
anarazel
> The one thing I wish they would do, though, is improve the error messages
> when a query fails.

Yea, we definitely need to do better there. But it's harder than one might
think initially :(. My suspicion is that we (the PG developers) will have to
write our own parser generator at some point... We don't really want to go to
a manually written recursive descent parser, in large part because having the
parser generator ensure that the grammar doesn't have ambiguities is very
helpful.

> At the very least, if I know which position in the text is the beginning of
> an error, I can fix it.

For many syntax errors and the like we actually do output positions. E.g.

    
    
      postgres[1429782][1]=# SELECT * FROM FROM blarg;
      ERROR:  42601: syntax error at or near "FROM"
      LINE 1: SELECT * FROM FROM blarg;
                            ^
    

(with a mono font the ^ should line up with the second FROM)

If you have a more concrete example of unhelpful errors that particularly bug
you, it'd be helpful. In some cases it could be an easy change.

------
consultSKI
Fair enough. Without Sybase XI or whatever version they would now be up to, I
can see where this could be true. But when MySQL added Views in v5.0.3 as I
recall, it met my requirements for an RDBMS.

------
tannhaeuser
Idk, I'm very pro-PostgreSQL, but MySQL/InnoDB used to have that one feature
for easy replication via logical DB updates (sending UPDATEs to replicas),
whereas PostgreSQL support for this was considered lacking, using binary block
updates instead, if I'm remembering correctly. Also, MySQL with replication
was considered easier to set up and manage by your average web shop. That may
be a thing of the past, though.

Edit: also, best for what? SQLite is, in a way, the perfect DB if you don't
have concurrent writes, and I've worked with vector databases (think bitmap
indices on steroids) that totally outclass e.g. Oracle and PostgreSQL for
analytics-heavy workloads.

~~~
andruby
That used to be the case, but it's not anymore. PostgreSQL has had logical
replication built in since PostgreSQL 10 (released 2017-10-05). It's really
simple to run these days and the docs are great [0].

[0] [https://www.postgresql.org/docs/12/logical-
replication.html](https://www.postgresql.org/docs/12/logical-replication.html)
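The basic setup is two statements (a sketch with hypothetical names; it
assumes wal_level = logical on the publisher and a reachable connection
string):

    
    
      -- On the publisher
      CREATE PUBLICATION my_pub FOR TABLE users, orders;
      
      -- On the subscriber
      CREATE SUBSCRIPTION my_sub
        CONNECTION 'host=primary dbname=app user=repl'
        PUBLICATION my_pub;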

------
Beefin
Any love for NoSQL here? Many have all the same capabilities as Postgres/major
SQL tech but have a format that removes the need for an ORM.

------
polskibus
It would be if it had real clustered indexes like MS SQL, i.e. tables stored
as B-trees that are maintained with each update.

~~~
ants_a
Good news is that the table access method API introduced in version 12 makes
it possible to build b-tree based tables as extensions.

------
jbergens
For scaling I assume some cloud solution or very distributed solution is
better than PostgreSQL. For example Azure SQL Database Hyperscale or
CockroachDb.

[https://redmondmag.com/articles/2019/07/03/microsoft-
buildin...](https://redmondmag.com/articles/2019/07/03/microsoft-building-
biggest-cloud-database.aspx)

------
throw0101a
Is there a Galera-equivalent for Postgres?

There's BDR, but it's closed source for recent versions of Pg, AFAICT.

------
csours
How do you manage your PostgreSQL deployments?

------
rv-de
Of those several worlds? Is that a subtle joke? I don't get it ...

------
zerubeus
I'm a simple man, I see PGSQL in a HN post, I upvote, I don't even read the
article

------
redis_mlc
Ironically, the blog post shoots itself in the foot by starting with Postgres
security.

Postgres has very poor security compared to MySQL, and in fact, I tell
companies implementing compliance policies to shift to MySQL.

[https://www.cvedetails.com/metasploit-
modules/vendor-336/Pos...](https://www.cvedetails.com/metasploit-
modules/vendor-336/Postgresql.html)

The reasons are:

\- Postgres' grant model is overly complex. I haven't seen anybody maintain
the grants correctly in production for non-admin read-only users. By contrast,
MySQL's grants are simple to use and simple to understand.

\- Postgres' COPY FROM and COPY TO have been used to compromise the database
by copying ssh keys to the server, amongst other things.

\- Postgres' version of upsert allowed any command to be run without checking
the permissions. So the vaunted "software engineering" behind Postgres is not
that solid.

\- Currently Postgres is subject to around a dozen metasploit vulnerabilities
that any script-kiddy can execute.

The simple fact is, if you use Postgres, you almost certainly have a security
compliance problem.

I could make the same arguments about replication, or online schema changes,
multi-master writes, or any enterprise database feature.

Some constructive advice to the Postgres developers is to take a week and add
grant commands to limit COPY FROM and COPY TO, and look at the metasploit
options and see what can be done ASAP.

If you're itching to write a hasty response, I'd appreciate it if you actually
checked your facts first.

If you're thinking, "How is it possible that everybody else is wrong about
Postgres being the best?", just remember the decade of Mongo fanboism on HN. I
cringed during that era, too.

Source: MySQL and Postgres DBA.

~~~
ilogik
does running Postgres on AWS RDS mitigate some of these problems?

~~~
DeathArrow
I don't know about AWS but our Postgres instances are accessible only through
internal network connections. I think we are safe.

------
jonnypotty
Measurably? I'm confused.

------
jakearmitage
I get all the comparison, but Redis? I still use Redis as a cache on top of my
PostgreSQL queries. PostgreSQL simply can't be as fast as a K/V like Redis.

~~~
smt88
> _I still use Redis as a cache on top of my PostgreSQL queries_

Why? Postgres has a "Redis cache" (an in-memory buffer cache, shared_buffers)
built in already[1]. Your application layer doesn't have to worry about query
caching at all.

1\. [https://www.postgresql.org/docs/current/runtime-config-
resou...](https://www.postgresql.org/docs/current/runtime-config-
resource.html#GUC-SHARED-BUFFERS)

~~~
jakearmitage
[https://www.peterbe.com/plog/redis-vs-postgres-blob-of-
json](https://www.peterbe.com/plog/redis-vs-postgres-blob-of-json)

~~~
ants_a
While I have no doubt that Redis is faster at slinging blobs around, that blog
post is not a great comparison of the technologies. Neither the Django ORM nor
the psycopg2 driver is a performance-oriented tool.

------
cosmiccatnap
It's missing the part at the end that says "for general use". Plenty of NoSQL
and even MySQL/MariaDB features beat psql in the right problem space.

------
qilo
Technically, PostgreSQL is a DBMS, not a database. I.e., “PostgreSQL is the
worlds’ best DBMS”, but not “best database”. The problem is, if you call the
DBMS a database, what do you call the _database_ (the collection of data,
tables, etc.)?

~~~
dcwca
Is this really a problem?

~~~
qilo
When I knew nothing about databases and was trying to read about them on the
web, it was very confusing trying to understand what an author meant by the
term _database_: is it a DBMS (software), or a database (data)? So, yeah, it’s
a problem, at least for beginners.

------
mekster
When I learned you cannot change the order of columns in PostgreSQL, it
sounded like one of the smarter but slightly weird kids in a classroom.

On the other hand, MySQL, who's not as bright as the smarter kid, is easy to
talk to, feels friendlier, and is generally the more popular kid.

I do appreciate the strictness of PostgreSQL, but if I see small weird stuff,
I tend to pick the one that is easier to get along with (meaning, more
resources found on the net).

Also worth mentioning that MySQL is getting 'brighter' since version 8.

~~~
thequux
I'll admit to not being an expert in databases, but I can't figure out why the
order of columns in a database would be relevant, unless you're doing `select
*` and indexing the resulting columns by number rather than name, which I've
always found to be fragile to the point of being useless. What am I missing?

~~~
barryp
It doesn't matter for coding or functionality, but it is nice when you're
manually eyeballing the results of 'SELECT * ...' statements or browsing
tables with something like Navicat, or looking at schema dumps - to have your
fields ordered in some way that makes sense to you, rather than strictly in
the order they were created.

Sure, you can list the fields in the order you want in a SELECT statement, but
that's tedious - it's handy to have something reasonable in the table
definition.

There's a wiki page on the Postgres site:

[https://wiki.postgresql.org/wiki/Alter_column_position](https://wiki.postgresql.org/wiki/Alter_column_position)

talking about workarounds and a plan from 2006 on how they might implement
that feature.
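The workaround boils down to rebuilding the table with the columns in the
desired order, roughly like this (a sketch with hypothetical names; indexes,
constraints, and grants have to be recreated too, and the table is locked
while it runs):

    
    
      BEGIN;
      CREATE TABLE items_new AS
        SELECT id, name, created_at, notes   -- the order you want
        FROM items;
      DROP TABLE items;
      ALTER TABLE items_new RENAME TO items;
      COMMIT;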

------
me551ah
Having migrated a postgres database to dynamodb, I would say that I'm never
going to use postgres again.

The problem with all SQL databases is that they are too easy to query and use.
You add all kinds of select queries, joins, and foreign keys, and when traffic
hits, you scramble to make it scale. NoSQL is hard to design, but you can at
least be sure that once traffic hits, you don't have to redesign the schema to
make it scale.

~~~
hnarn
> You add all kinds of select queries, joins, and foreign keys, and when
> traffic hits, you scramble to make it scale. NoSQL is hard to design, but
> you can at least be sure that once traffic hits, you don't have to redesign
> the schema to make it scale.

Surely this depends on how you set up your SQL database to begin with? I'm not
familiar with NoSQL, so can you explain why "schemas" aren't necessary and
scaling happens automatically?

~~~
me551ah
You could design a SQL database to be denormalized from the start, but then
you are losing many of the advantages of a SQL database.

I never said that schemas aren't necessary, just that using a strict NoSQL
database forces you to think about scaling constraints early. This avoids (at
least partially) a schema redesign later.

~~~
g4nt1
Ah yes! Optimize early, that's what my teachers always told me to do. /s

~~~
lmm
Code is easy to change. Data schemata are much less so, particularly in
traditional SQL databases.

------
throwaway_pdp09
"In the age old argument about the imperitive vs. the declarative5 programming
models, it occurs to me that declarative programming is just imperative
programming in a thin disguise. Each declarative token in a database query
language ultimately maps to one or several algorithms which apply the
declaration in imperitive terms"

Oh, profound, profound!

It occurs to me the author's a n00b.

Anyway, I do MSSQL. Until recently CTEs in PG were an optimisation fence. That
would have slaughtered performance for my work. Not that I'm knocking PG,
'best' depends upon the problem.

~~~
mulmen
MSSQL people seem really proud of their CTEs. What is it that makes CTEs in
MSSQL so great?

I wonder if that really disqualifies Postgres from the same task. Are MSSQL
CTEs just the hammer for your proverbial nail? Can you use a different
approach in Postgres to solve the problem by leveraging its strengths?

My day job is a lot of Redshift. We use CTEs and temp tables depending on what
we need. It’s based on Postgres but not really comparable when talking about
performance optimization.

~~~
throwaway_pdp09
Here's the last line of my post: "Not that I'm knocking PG, 'best' depends
upon the problem." I did not want to get into a war with people emotionally
invested in their favourite programming tool. The article said 'best'. By that
measure PG was worse. The point is it doesn't matter; it's about what problem
you are trying to solve and at what price. In many cases PG is best for that,
do you understand? I knew people would see it as an attack on PG. It's so
childish and predictable. I solve business problems; I don't cheerlead for a
particular RDBMS.

As for what made CTEs so great in MSSQL, I explained that in my first post.
Look for 'optimisation fence', and understand that a) PG has changed and b) it
was not a personal attack.

> I wonder if that really disqualifies Postgres from the same task.

If you have 100GB of data being materialised as a temp table under the hood
instead of having the predicates being pushed down, what do you think?

> Are MSSQL CTEs just the hammer for your proverbial nail?

Jesus, they're just a tool to get a job done. As for whether there's a way
round it: probably, but it would mean more work for the programmer, and more
stress for the SQL optimiser, which will at some point fail as complexity
climbs.

> We use CTEs and temp tables depending on what we need

same.

~~~
mulmen
It was an honest question, professional to professional. I did not interpret
your post as a personal attack nor did I intend my post as an attack on you.

The last time I used MSSQL in anger I was an intern and Michael Jackson was
alive. It’s been a while.

My question is: “what is it about your work that makes (or made) MSSQL a
better choice than Postgres or something else, specifically because of CTE
differences.”

~~~
throwaway_pdp09
Sorry. When you said "really proud" and "so great", it was perhaps an emotive
choice of words.

In the end the semantics of CTEs should be much the same throughout. There may
be small differences, and perhaps larger ones such as being able to update
through a CTE, something like

    
    
      with x as (...)
      update x set ...
    

I don't know if PG supports this, and it wouldn't break my heart if it didn't.
It would be easy to work around.
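For what it's worth, Postgres comes at this from the other direction: instead
of updating through the CTE alias, the CTE body itself can be a data-modifying
statement with RETURNING (a sketch with hypothetical tables):

    
    
      WITH archived AS (
        DELETE FROM orders
        WHERE created_at < now() - interval '1 year'
        RETURNING *
      )
      INSERT INTO orders_archive
      SELECT * FROM archived;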

Also IIRC PG has 'materialized' and 'recursive' keywords. No biggie.

So the semantics are substantially the same. The difference is how the
optimiser treats it. That means you will get the same results back, but the
time difference may be enormous. This explains it:
[https://paquier.xyz/postgresql-2/postgres-12-with-
materializ...](https://paquier.xyz/postgresql-2/postgres-12-with-materialize/)

In my previous work, the predicate pushdown (which is the posh term for the
optimiser rewriting the query as the above link demonstrates) made CTEs
usable. Without them, performance would have destroyed the company.
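Since version 12 the fence is opt-in per CTE, roughly (a sketch with a
hypothetical big_table):

    
    
      -- PG 12+ default: the CTE may be inlined, so the predicate
      -- is pushed down into the scan of big_table
      WITH x AS (SELECT * FROM big_table)
      SELECT * FROM x WHERE key = 42;
      
      -- Restores the old fence: x is fully computed first
      WITH x AS MATERIALIZED (SELECT * FROM big_table)
      SELECT * FROM x WHERE key = 42;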

Can we get round it? Mostly, yes, but it would have been more work for us
(meaning more human hours) which the DB optimiser should have saved us from.

The only unique thing CTEs bring to the party is recursion. Otherwise they are
no more than a convenience, albeit a great one.

HTH

EDIT: MSSQL has a good optimiser, when it works. I recently wrote a trivial
CTE and it took a minute to run. WTF? I looked at the query plan, and
something was very wrong there; I pulled the CTE part out into a temp table,
and it ran in one second. A 60x speedup. No idea why; was not impressed,
though.

