
PostgreSQL 9.6 Released - sheff
https://www.postgresql.org/about/news/1703/
======
fabian2k
Just from reading the documentation, the full text search features on Postgres
already look pretty powerful. And it is encouraging that they are actively
being worked on. I'm wondering how this compares to a dedicated search engine
like Solr or Elasticsearch.

Are there huge differences in performance, features or search quality? At
which scale does using Postgres for full text search still make sense?

~~~
brightball
Having used all 3, Postgres search is my go-to for most use cases, simply
because I don't have to deal with managing deltas to an outside system and
keeping things in sync. The search features are powerful and fast, and PG's
ability to combine multiple indexes in search results makes it trivially easy
to include a bit of full text search in a query right next to geographic
distance filters or other conditions. You can also combine multiple types of
searches on the fly if you're feeling whimsical.

IMO, the only time to reach for an outside system is when the data isn't being
written to PG first (like log ingestion with Elasticsearch) or when search is
such a central part of your app that it mandates a separate dedicated system.
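
A minimal sketch of what that combination looks like, assuming a hypothetical
`places` table with a text `description` column and a PostGIS `geom` column:

```sql
-- Full text search and a geographic distance filter in one query.
SELECT id, name
FROM places
WHERE to_tsvector('english', description)
          @@ to_tsquery('english', 'coffee & roaster')
  AND ST_DWithin(geom::geography,
                 ST_MakePoint(-122.42, 37.77)::geography,
                 5000);  -- within 5 km
```

With a GIN index on the tsvector expression and a GiST index on `geom`, the
planner can combine both conditions via a bitmap AND.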

~~~
nnutter
Are there any good options to support logic (and/or) and facets/fields with
Postgres? We started using ES basically just for the "free" query language.
(Obviously we would want something that is safe from SQL injection.)

~~~
anewhnaccount
SQL supports logic. Either escape manually, use the parameter binding in your
driver, or use an ORM.
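
For what it's worth, `to_tsquery` has its own boolean operators, and injection
is avoided the usual way: bind the user's query string as a parameter. A
sketch against a hypothetical `articles` table with a precomputed
`search_vector` column:

```sql
-- tsquery supports & (and), | (or), ! (not), and grouping:
SELECT title
FROM articles
WHERE search_vector @@ to_tsquery('english', '(postgres | postgresql) & !mysql');

-- From application code, bind the user-supplied string as a parameter
-- instead of interpolating it into the SQL text:
--   WHERE search_vector @@ plainto_tsquery('english', $1)
```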

~~~
nnutter
So, "No".

~~~
eyelidlessness
I'm confused. Seems the answer is "yes"?

------
snowwolf
Can the Postgres team please put some major focus on completing logical
replication [1]? It's the missing piece to making upgrades across major
versions painless and quick on large databases, so that we can take advantage
of all these nice new features. We're on Heroku's hosted Postgres service, so
we can't install the pglogical extension.

1. [http://blog.2ndquadrant.com/why-logical-replication/](http://blog.2ndquadrant.com/why-logical-replication/)

~~~
mslate
I don't think you would be able to take advantage of logical replication on
Heroku Postgres regardless--they don't allow you to replicate to your own
instances, only other Heroku-hosted instances.

This makes migrating Postgres off Heroku a PITA that requires downtime.

~~~
snowwolf
True, it would be an extra bonus if Heroku started allowing replication to
non-Heroku instances, but as long as they support logical replication to a
Heroku follower instance, you can upgrade to new major versions with near-zero
downtime: set up a logical follower running the latest Postgres version, then
promote it to master once it has caught up. Currently you can't have a
follower on a different major version than the master, meaning an upgrade
requires either a full backup and restore to the new version (resulting in
significant downtime if you have a large database) or the pg_upgrade utility,
which is generally not recommended as it is not guaranteed to work.

------
ignoramous
A tangential question:

Everyone speaks about InnoDB and how performant and reliable it is... multiple
firms (Uber/Pinterest/AWS) even use it as a KV store, bypassing MySQL
entirely. I have never heard much about storage engines in Postgres; why might
that be?

Wikipedia has a (stub) article on InnoDB, but nothing on Postgres' storage
engine... just wondering why that is.

~~~
rwultsch
The PG storage engine is not particularly awesome. It is basically COW (with
exceptions), and compaction (called vacuum) has been quite painful for a long
time. Every release it is supposedly fixed, but people keep complaining. This
is not to say PG sucks: its optimizer knows far more about the data than
InnoDB's, and PG can perform far more types of execution plans.

We (Pinterest; I wrote most of the MySQL automation) make heavy use of MySQL
replication, which is vastly simpler to manage than PG's. All queries still
flow through SQL, and unlike PG, we can force whatever execution plan we need.
We do lots of PK lookups, and InnoDB is really good at that. In InnoDB all the
data is stored in the PK, while in PG the PK is just a pointer.

~~~
dhd415
> In InnoDB all the data is stored in the PK while in PG it is just a pointer.

This is just a consequence of the PK being a clustered index in InnoDB which
has both pros and cons. One of the big cons is that all of the columns of the
PK are implicitly added to every secondary index as the row identifier. That
isn't a big problem if your PK is a single column int, but if it's multiple
columns, that often results in unnecessary bloat in your secondary indexes.
Ideally (as in, dare I say, MS SQL Server), you'd have the option of a
clustered or non-clustered PK for your table so you could choose the optimal
index structure for your workload on a per-table basis.
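
For reference, the SQL Server option described above looks roughly like this
(hypothetical table; `PRIMARY KEY NONCLUSTERED` is the per-table switch):

```sql
-- T-SQL: a non-clustered PK, with the clustered index on a different key.
CREATE TABLE orders (
    order_id    INT       NOT NULL,
    customer_id INT       NOT NULL,
    placed_at   DATETIME2 NOT NULL,
    CONSTRAINT pk_orders PRIMARY KEY NONCLUSTERED (order_id)
);

-- Cluster the table on the column most queries range-scan instead:
CREATE CLUSTERED INDEX cx_orders_placed_at ON orders (placed_at);
```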

~~~
rwultsch
If you don't want a clustered index in InnoDB, you can define the primary key
as an auto-incrementing uint.

~~~
ngrilly
Yes, you can, but it doesn't change the fact that you still have a clustered
index (an index-organized table), which is great for PK lookups but bad for
secondary index lookups (because you need to traverse two B-trees instead of
one). There is a real, and well-known, tradeoff here.

~~~
rwultsch
You missed the point of my post. You are going to have one of the two issues:
either looking through two indexes, or having every secondary index include a
large PK. At least with InnoDB you can make the choice. The strategy I
suggested gets you the desired outcome of not including a large PK in all
secondary indexes.

~~~
ngrilly
> The strategy I suggested gets you the desired outcome of not including a
> large PK in all secondary indexes.

For an application in which most queries need a secondary index lookup, using
heap-organized tables is more efficient, because the database needs to
traverse only one B-tree (for the secondary index), which gives the physical
position of the row in the heap. When using index-organized tables, the
database needs to traverse two B-trees (the secondary index first, then the
primary index). Making the primary key short by using an auto-incrementing
integer helps, but doesn't remove this overhead.

------
malisper
> Index-only scans for partial indexes

This one is huge for my company. Almost every single query of ours could use
an index-only scan, but the planner would never choose to perform one because
of the weirdness around partial indexes. We're expecting a several-x speedup
once we upgrade to 9.6. All they need to improve now is a way to keep the
visibility map up to date without relying on vacuums.
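
The feature in question, sketched against a hypothetical `orders` table: a
partial index whose predicate the query implies, which 9.6 can now answer from
the index alone (given an up-to-date visibility map):

```sql
-- Partial index covering only unshipped orders:
CREATE INDEX idx_orders_pending ON orders (customer_id, order_id)
    WHERE shipped = false;

-- All referenced columns are in the index and the WHERE clause implies
-- the index predicate, so 9.6 can use an index-only scan:
SELECT customer_id, order_id
FROM orders
WHERE shipped = false AND customer_id = 42;
```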

~~~
snuxoll
I don't see ever going away from using vacuum to maintain the visibility map,
but hopefully the changes in 9.6 will make it a non-issue on large tables.

~~~
anarazel
> I don't see ever going away from using vacuum to maintain the visibility map

I don't think that's that unlikely to change. There are two major avenues:
write it during hot pruning (which is done on page accesses), and perform a
"lower impact" vacuum on insert-only tables more regularly.

> but hopefully the changes in 9.6 will make it a non-issue on large tables.

You mean the freeze map? That doesn't really change the picture for regular
vacuums; it changes how bad anti-wraparound vacuums are. The impact of the
table vacuum itself is usually not that bad these days (due to the visibility
map); what's annoying is usually the corresponding index scans. They have to
scan the whole index, which is quite expensive.

------
jbkkd
Congratulations to the PostgreSQL Global Development Group on a much-
anticipated release.

Curious about this:

> parallelism can speed up big data queries by as much as 32 times faster

Why would it be only 32 times faster? The sky's the limit if there aren't
major bottlenecks on the way.

~~~
olavgg
I tested parallel queries on PostgreSQL 9.6 on a few TBs of data, 5 billion
rows, on an older dual Xeon E5620 server. I also striped 4 Intel S3500 800GB
drives with ZFS and enabled LZ4 compression, which got a compression ratio of
4x.

For a sequential full table scan I could process about 2000MB/s of data (only
125MB/s was read from each SSD); I was limited by CPU power.

Anyway, the same query took about 25 minutes on PostgreSQL 9.5, and now it was
down to 2 minutes and 30 seconds. For comparison, SQL Server 2012 spent 7
minutes on the same dataset on the same hardware.
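
For anyone wanting to reproduce this, the relevant 9.6 setting is
`max_parallel_workers_per_gather` (table name here is a placeholder):

```sql
-- Allow up to 4 parallel workers per Gather node for this session:
SET max_parallel_workers_per_gather = 4;

-- EXPLAIN should now show a Gather node with parallel workers
-- for large sequential scans and aggregates:
EXPLAIN SELECT count(*) FROM big_table;
```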

~~~
greggyb
Would you be willing to re-run that with SQL Server 2016? A Dev license is
free, and there's been a lot of relational engine optimization since 2012. I'd
be curious to see what the latest release can do compared to Postgres' latest.

I realize I'm asking a stranger on the internet to do something for free for
me. If you don't have time or inclination to do this, no worries, but it seems
like you've got a nice setup to be able to play with this. I'm sure I'm not
the only one curious to see such a comparison.

~~~
olavgg
I've tried SQL Server 2016, no difference.

~~~
greggyb
Thanks, this and your other response are very useful!

------
qaq
Congrats on a great release. With the availability of E7-8800 v4 based servers
(up to 192 cores in a single box), PG can cover a huge number of use cases
without complicated setups.

------
mgberlin
Does anyone know when this will be available on AWS RDS?

~~~
aidos
On that topic - what's the general feeling about RDS?

I'm running PG on EC2 with a hot standby slave. I need the PostGIS extension
but am not doing anything particularly esoteric. Ideally I'd like to have the
certainty of AWS handling backups for me.

I was researching moving to RDS today and would love to hear thoughts on
whether it's a good general solution or not. What happens about downtime
during upgrades or swapping instance sizes?

~~~
luhn
> What happens about downtime during upgrades or swapping instance sizes?

This is one of my favorite features of RDS: You can set a maintenance window
and have the option to not have changes take effect until that window. So if I
want to upgrade Postgres or change the instance size, I set it up and the
downtime happens when I'm fast asleep and nobody is using the site.

I also think (but am not 100% sure) that if you have Multi-AZ enabled, changes
are done by upgrading the slave, failing over, and then upgrading the
ex-master, so downtime is limited to the failover period.

~~~
aidos
Ah ok. That's useful info about the Multi-AZ setup; I'll have a look into
that.

In my case, we now have customers around the world, so we don't get the "night
time" luxury. Part of the work I'm now completing is to split the system into
an accounts db and a customer data db. I was thinking to dip my toe in the
water by just moving the accounts db to RDS to see how it goes.

------
Roboprog
Is it just selection bias from posted links on HN, or has the PostgreSQL team
been doing many (feature) releases lately?

Sounds good!

~~~
pgaddict
There's still only one major PostgreSQL release per year. There were a few
posts about cool stuff built on top of PostgreSQL, a few posts about progress
of the 9.6 development (e.g. when the parallel query got committed) etc.

~~~
anarazel
> There's still only one major PostgreSQL release per year.

Well, due to the delayed 9.5 release (January 7th), there have been two this
year ;)

~~~
pgaddict
Well, that really depends on where exactly you place start of a year ;-)

Chinese New Year was February 8, 2016. Orthodox New Year was January 14, 2016.
So it's 2:1 for me.

~~~
zejn
Well, to the best of my knowledge I think PostgreSQL currently only supports
Gregorian calendar system ... ;)

------
gtrubetskoy
Anyone know the state of BDR in 9.6?

[http://blog.2ndquadrant.com/bdr-is-coming-to-postgresql-9-6/](http://blog.2ndquadrant.com/bdr-is-coming-to-postgresql-9-6/)

~~~
okket
See discussion from 3 days ago (69 comments):
[https://news.ycombinator.com/item?id=12576116](https://news.ycombinator.com/item?id=12576116)

TL;DR: It is not in mainline, but it does not need a patch anymore. You need
to bring your own conflict resolution logic.

~~~
pgaddict
Or design the application so that there are no conflicts (e.g. modifying
different subsets of users on different nodes).

~~~
jimktrains2
Which is a form of conflict resolution. It requires the application to be
aware of the datastore.

I wonder if, since BDR is just a plugin now, a plugin that used strong
consistency guarantees could be built using the same changes that were
required for BDR.

------
tmaly
I am interested in the full text search, as well as:

> Index-only scans for partial indexes

------
n4nagappan
Does Postgres offer search based on tf-idf?

~~~
ris
Yes, and it's quite flexible in doing so:
[https://www.postgresql.org/docs/9.6/static/textsearch-contro...](https://www.postgresql.org/docs/9.6/static/textsearch-controls.html#TEXTSEARCH-RANKING)
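
Strictly speaking, the built-in ranking functions (`ts_rank`, `ts_rank_cd`)
are frequency- and proximity-based rather than literal tf-idf, but the
normalization flags give similar document-length scaling. A sketch with a
hypothetical `articles` table and precomputed `search_vector` column:

```sql
SELECT title,
       -- normalization flag 32 maps the rank into rank/(rank+1)
       ts_rank(search_vector, query, 32) AS rank
FROM articles,
     to_tsquery('english', 'postgres & performance') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;
```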


