
PostgreSQL 9.4 – What I was hoping for - craigkerstiens
http://www.craigkerstiens.com/2014/02/15/PostgreSQL-9.4-What-I-Wanted/
======
silvestrov
I'd rather have a solid codebase that's easy to maintain and has no quick
hacks. Better to take the time to do it correctly than to ship features that
sometimes don't work.

PostgreSQL didn't become PostgreSQL by doing it the MySQL way.

~~~
redstripe
But the crappy codebases already exist outside of postgres anytime people have
to roll their own upsert implementation.

The fact that it's hard is exactly why it should be in the database.

~~~
cookiecaper
I don't think there's any contention at this point that it's a useful feature
to have in the database. There just seems to be some disagreement on the finer
points of the implementation. It'll get worked out, and Postgres will make it
through another release with or without upsert.

~~~
jeltz
Indeed, I followed the discussion and it was almost entirely about which
implementation would best handle concurrency concerns.

------
bostik
Logical replication. Please let that be true.

I didn't know live data migration from one database version to another had a
name, but I have been waiting for it for years. (Since 7.3, in fact.) And if
I'm interpreting this right, that feature could also allow for a fairly
straightforward way to move data from a running instance to a new one.

Perhaps not yet in the first release, but eventually that too could work.

~~~
twic
> that feature could also allow for a fairly straightforward way to move data
> from a running instance to a new one

That's already relatively straightforward with PostgreSQL's existing
replication. This has worked since 9.1:

[http://www.rassoc.com/gregr/weblog/2013/02/16/zero-to-postgresql-streaming-replication-in-10-mins/](http://www.rassoc.com/gregr/weblog/2013/02/16/zero-to-postgresql-streaming-replication-in-10-mins/)

It involves bouncing the master, but it doesn't involve extended downtime. Or
is that not what you meant?
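
The gist of that setup, for anyone curious, is roughly this (a sketch for 9.x; the hostnames, paths, and replicator role are made up):

    
    
      # on the master, in postgresql.conf (needs the restart mentioned above):
      wal_level = hot_standby
      max_wal_senders = 3
      
      # on the master, in pg_hba.conf, to let the standby connect:
      host  replication  replicator  192.168.0.2/32  md5
      
      # on the standby: clone the master, then write recovery.conf:
      pg_basebackup -h master.example.com -D /var/lib/postgresql/9.3/main -U replicator -P
      
      # recovery.conf on the standby:
      standby_mode = 'on'
      primary_conninfo = 'host=master.example.com port=5432 user=replicator'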

However, the 'logical log streaming replication' that is being implemented
looks _even better_. A small but useful thing is that it works at database
granularity, rather than having to replicate the whole installation. The huge
thing is that because it works with logical rather than physical
representations of changes, it allows replication into a database which is
also being updated locally - slaves can be read/write rather than read-only.
That itself is not immediately exciting, but the logical consequence of this
is that two databases can both accept changes and replicate to each other -
'bi-directional replication', or what in MySQL is called 'multi-master
replication'. The holy grail of database clustering!

~~~
bostik
> _The holy grail of database clustering!_

Thank you for pointing this out, I would have missed it. In fact, thank you
to everyone who has answered with helpful technical details.

I have a couple of use-cases in mind where logical streaming replication could
solve a number of problems. A personal project would benefit from a database-
level setup due to easier data migration; a professional one would benefit
from bi-directionality and logically straightforward clustering. (I also have
some future designs in mind where replicating only subsets of a database would
be VERY useful.)

Thanks to the replies, I now have some extra reading to do. :)

------
chandrew
I used postgres for the first time last week and was wondering, "WHERE IS
UPSERT?" I ended up going with this, as seen on
[http://stackoverflow.com/questions/1109061/insert-on-duplicate-update-postgresql](http://stackoverflow.com/questions/1109061/insert-on-duplicate-update-postgresql)
(although not the chosen answer):

    
    
      UPDATE table SET field='C', field2='Z' WHERE id=3;
      
      INSERT INTO table (id, field, field2)
      SELECT 3, 'C', 'Z'
      WHERE NOT EXISTS (SELECT 1 FROM table WHERE id=3);

Has anyone come up with better or their own solutions?

~~~
np422
I prefer to use a CTE in postgres to do upserts. This is an incomplete example,
but you get the general idea:

    
    
      WITH upsert AS (
        UPDATE some_table 
        SET attrib = 'foo'
        WHERE id = 123
        RETURNING *
      ) INSERT INTO some_table ( id , attrib )
        SELECT 123 as id, 'foo' as attrib
        WHERE 123 NOT IN (SELECT id FROM upsert );

~~~
jeltz
This does not handle concurrent inserts though unless you rerun the code on
unique violation.

~~~
np422
Correct, but if I write the upsert that way, I actually understand what
I'm doing... :)

If this is a standard upsert query in your app, you should turn it into a
procedure, parameterize it, and catch exceptions anyway.

Or you could mess around with transaction isolation.
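
Something like the retry-loop example from the PostgreSQL docs, roughly (a sketch; merge_db, the db table, and its columns are illustrative):

    
    
      CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS $$
      BEGIN
          LOOP
              -- first try to update the key
              UPDATE db SET b = data WHERE a = key;
              IF found THEN
                  RETURN;
              END IF;
              -- not there, so try to insert it; if someone else inserts
              -- the same key concurrently we get a unique-key failure
              -- and loop back to retry the UPDATE
              BEGIN
                  INSERT INTO db (a, b) VALUES (key, data);
                  RETURN;
              EXCEPTION WHEN unique_violation THEN
                  -- do nothing, loop and try the UPDATE again
              END;
          END LOOP;
      END;
      $$ LANGUAGE plpgsql;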

------
justintocci
It doesn't matter what features you add to postgres; it will never be less
appropriate for a project than another database that has some magic feature.

The reason people choose another database is because they favor an
architecture where all the things that should be done in the database get done
in the middleware. Features are irrelevant since the goal is to do everything
in the middleware anyway.

Once a person converts to the idea that it's faster, safer and more reliable
to use the database for what it was designed to do, then they have left the
jungle and found the road. And all roads lead to postgres.

I really don't care one iota about any of these features. They are all obscure
and I avoid obscure features in deference to the guy that has to read my code
next (mostly me).

------
fleetfox
JSON in postgres is just a container type, isn't it? Why is everyone expecting
it to become document storage?

~~~
sehrope
> JSON in postgres is just a container type isn't it?

At the moment JSON is stored in a text field so accessing a JSON field
involves parsing the entire text value. If instead the JSON is stored in a
parsed binary format then you can more efficiently access individual fields.

If you're only accessing a small set of known fields you can work around the
issue by using plv8 to create function indexes on the fields that you'll be
using. This doesn't work for arbitrary expressions though. If you want to
filter a WHERE clause based on a non-indexed field the entire JSON text will
need to be parsed.

Binary JSON storage makes all of this much faster with no change on the user's
side. It's all transparent and just faster!

In theory it could also reduce storage space for tuples. Duplicate field names
need not be repeated. I don't think it's in scope in the Postgres JSON
improvements but it's a possibility.

> Why is everyone expecting it to become document storage?

Document storage in a relational database is really useful and there are
plenty of use cases for it.

The standard example I use for JSON ( _or hstore_ ) usage in Postgres is an
audit table. You'd have all the usual audit fields ( _who/what/when_ ) but
you'd also have a "detail" field with event-specific details. Using hstore or
JSON for this is perfect. Improved support for JSON makes it much easier to
provide event-specific search.
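
For instance, something like this (a sketch; all the names are made up):

    
    
      CREATE TABLE audit_log (
          id          bigserial   PRIMARY KEY,
          user_name   text        NOT NULL,               -- who
          event_type  text        NOT NULL,               -- what
          created_at  timestamptz NOT NULL DEFAULT now(), -- when
          detail      json                                -- event-specific bits
      );
      
      INSERT INTO audit_log (user_name, event_type, detail)
      VALUES ('alice', 'login', '{"ip": "10.0.0.1", "mfa": true}');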

~~~
gdulli
> If you're only accessing a small set of known fields you can work around the
> issue by using plv8 to create function indexes on the fields that you'll be
> using.

You don't need plv8 to create an index on a JSON field, it's supported
natively.
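
For example, in 9.3 an expression index over the native operator works directly (a sketch; the docs table and body column are made up):

    
    
      -- index the extracted value so lookups don't reparse every row
      CREATE INDEX docs_body_name_idx ON docs ((body ->> 'name'));
      
      SELECT * FROM docs WHERE body ->> 'name' = 'hello';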

~~~
sehrope
Yes, my mistake on that. I'm mixing up 9.2 ( _where you had to use plv8_ )
and 9.3 ( _which has the native operator_ ). The rest of the comment still
stands though: a parsed representation makes the operator faster and more
efficient.

------
atesti
Isn't there an in place update of an existing database without dump and
restore now for quite some versions
([http://www.postgresql.org/docs/9.3/static/pgupgrade.html](http://www.postgresql.org/docs/9.3/static/pgupgrade.html))?
A tool which keeps the relation files, but builds new system catalogs around
them? It had the impression that this tool would be very fast, but a bit
risky. The linked article makes it sound like there is a need for local
replication in order to do an update without (much) downtime. Is this just a
(less risky!) alternative to the available in place update?
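
For reference, a link-mode run looks something like this (a sketch; the paths are made up, and --link is what makes it fast but risky, since the old cluster must not be started again once the new one has been):

    
    
      pg_upgrade \
          -b /usr/lib/postgresql/9.2/bin  -B /usr/lib/postgresql/9.3/bin \
          -d /var/lib/postgresql/9.2/main -D /var/lib/postgresql/9.3/main \
          --link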

------
gleenn
Upsert is really exciting because, as the author mentions, its absence is a
huge thing keeping people from choosing Postgres instead of MySQL.

~~~
gaadd33
Really? Why is that a primary show stopper when there are numerous solutions
to it that live on the database side as a stored procedure?

~~~
pornel
Yes, for me this is a showstopper. All MySQL projects I've worked on have
REPLACE INTO and/or INSERT … ON DUPLICATE.

Postgres' equivalents of UPSERT that I've found either only work correctly
with single-row inserts or are inefficient and complicated.

It's really hard to sell PostgreSQL to developers of a MySQL project when a
simple 1-line query explodes into a complicated stored procedure for
apparently no good reason.

~~~
__david__
> Yes, for me this is a showstopper. All MySQL projects I've worked on have
> REPLACE INTO and/or INSERT … ON DUPLICATE.

Curious. What is your use case?

From personal experience, I've used "ON DUPLICATE", but we ended up dumping it
and making our API better/more secure. This was even before we switched to
Postgres.

~~~
pornel
Counters where the set of things you count is open-ended (e.g. words in a
naive bayes classifier):

    
    
       INSERT INTO word_counts(word) VALUES('hello'),('world') 
          ON DUPLICATE KEY UPDATE cnt = cnt+1;
    

Lists where you track 'last seen':

    
    
       REPLACE INTO last_login(user_id, date) VALUES(123, NOW());
    

With Postgres' INSERT … RETURNING this would be awesome:

    
    
       INSERT INTO log(date, user_agent_id) VALUES(now(), 
          INSERT IGNORE INTO user_agents VALUES('Mozilla') RETURNING id)
    

This way I could store any long string only once, and do it atomically in one
query.
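
One way to approximate it today is with a data-modifying CTE (a sketch, assuming a unique index on user_agents.name; it can still hit a unique violation under concurrent inserts):

    
    
       WITH new_ua AS (
           INSERT INTO user_agents (name)
           SELECT 'Mozilla'
           WHERE NOT EXISTS (SELECT 1 FROM user_agents WHERE name = 'Mozilla')
           RETURNING id
       )
       INSERT INTO log (date, user_agent_id)
       SELECT now(), id
       FROM (SELECT id FROM new_ua
             UNION ALL
             SELECT id FROM user_agents WHERE name = 'Mozilla') AS ua
       LIMIT 1;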

------
dlau1
Two of those features would be HUGE for postgres:

Logical Replication

Upsert

