

What's Coming in PostgreSQL 9.5 - thomcrowe
https://www.compose.io/articles/coming-in-postgresql-9-5/

======
bmh100
Let me just take a moment to point out how important row-level security is as
a concept. That feature allows one to create, essentially, a secure analytics
data model by tying security business logic to the values of a table. For
every query, just join to that security table, and the database stays both
flexible and secure.

Example:

Tie a salesperson to each order placed, and use the order table as the primary
fact table. For all reporting and visualizations that can be tied to orders,
all one has to do is incorporate an association to that table. Which
customers, products, time periods, etc. a salesperson can see is then handled
automatically by that association.
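
A minimal sketch of what the basic per-salesperson filter could look like with
9.5's row-level security (the orders table and salesperson column are just the
names from the example above, and the policy assumes salespeople connect as
their own database roles):

    
    
      alter table orders enable row level security;
    
      -- every query against orders is filtered by this predicate automatically,
      -- so each salesperson only ever sees their own rows
      create policy salesperson_orders on orders
          using (salesperson = current_user);
    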

I am not saying that PostgreSQL implements everything I am describing, but
this example can be expanded by creating an intermediate table with
many-to-many associations. E.g., this table might have one row for each
salesperson's access and one row for each salesperson under a supervisor. Once
again, that gives you a centralized location for controlling access throughout
the entire analytical data model. It is a very useful tool that I have relied
upon extensively.

~~~
emidln
Meh, you can already do this with a query AST. For every request, grab the
user's data permissions represented in the same query AST format and AND them
together. Compile your query AST to whatever search technology you use (I've
seen it done with ES, Postgres, MySQL, Mongo, Solr, and RethinkDB in the last
couple of years). If you're being super fancy you can even use your query AST
to match against a document stream in real time by compiling it to some actual
programming language and evaluating it over the stream.

~~~
pgeorgi
The difference is that postgres can enforce this for arbitrary queries.

This doesn't matter in the typical webapp, where all accesses to the DB happen
through the same database user id, but when actually using the DB's own user
system, it allows for fine-grained access control over a common data set.

The closest you can get without explicit RLS support is to create a view for
each user. RLS effectively generates those per-user views on demand under a
common name.

~~~
darrhiggs
I use schemas[0] for this. Could someone explain what advantage RLS has over
either views or schemas?

[0] [http://www.postgresql.org/docs/current/static/ddl-
schemas.ht...](http://www.postgresql.org/docs/current/static/ddl-schemas.html)

~~~
bmh100
I am very familiar with RLS, but not schemas. Could you provide an example of
how you would use schemas? If it's not too much trouble, could you also apply
it to the salesperson example from my other comments?

------
rdtsc
BRIN (Block Range) indexes look really interesting.

Instead of storing a whole B-tree (and spending time updating it), they just
store a summary of block ranges (pages).

This, for example, would be great for a time-series database that is
write-heavy but not read as often.
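
For reference, a minimal sketch of what defining one looks like (the events
table and its logged_at column are hypothetical names):

    
    
      -- a BRIN index stores only small per-block-range summaries, so it stays
      -- tiny and cheap to maintain on append-mostly tables
      create index events_logged_at_brin
          on events using brin (logged_at);
    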

I found these benchmarks here explaining the differences:

[http://www.depesz.com/2014/11/22/waiting-for-9-5-brin-
block-...](http://www.depesz.com/2014/11/22/waiting-for-9-5-brin-block-range-
indexes/)

\---

Creating 650MB tables:

    
    
       * btree: 626.859 ms.
       * brin: 208.754 ms
    

(3x speedup)

Updating 30% of values:

    
    
       * btree: 8398.461 ms.
       * brin: 1398.711 ms.
    

(6x speedup)

Extra bonus:

    
    
       * size of btree index: 28MB
       * size of brin: 64kb
    

Search (for a range):

    
    
      $ select count(*) from table where id between 600000::int8 and 650000::int8;
        * btree between: 9.574 ms
        * brin between: 21.090 ms
    
    

\---

~~~
saosebastiao
Yep, this is a big deal, for me at least. It basically makes it almost
cost-free to add indexes to write-heavy tables.

I've always felt like bitmap indexes were a killer feature and never
understood why they weren't used more in databases. If you have a
low-cardinality column (anything suitable for an enum), your indexes become
incredibly fast and cheap.

~~~
pjungwir
There was an effort to build bitmap indexes for Postgres a few years ago.
There are details in the mailing list archives. It looks like it was almost
completed! It's high on my list of things to tackle if I ever get time to
start contributing, but maybe someone else will get to it first.

~~~
facetube
Is that different from bitmap index scans, which IIRC landed in 8.1 or so and
are used to combine the results of multiple index scans?

~~~
pjungwir
Yes, totally different thing. :-) A great book on query plan stuff is here:

[http://www.amazon.com/PostgreSQL-High-Performance-Gregory-
Sm...](http://www.amazon.com/PostgreSQL-High-Performance-Gregory-
Smith/dp/184951030X)

------
mixmastamyk
Love me some postgres. If I could be so bold as to ask for a feature or two,
it would be great if the initial setup were easier to script.

Perhaps this is outdated already, but we have to resort to here-documents and
other shenanigans to get the initial DB and users created. Is there a better
way to do this?

Next would be to further improve the clustering, to remove the last reason
people continue to use MySQL.

~~~
Roboprog
Here is my script to create a test instance in about 5 seconds (albeit without
any tables defined yet, just space for them, with a running server on that
space, and an account to create and use the tables in the DB):

(It puts the data directory in a folder under the current directory.)

    
    
      #!/bin/sh -x
      # create and run an empty database
      # note: fails miserably if there are spaces in directory names
    
      PGDATA=`pwd`/pgdata
      export PGDATA
    
      # make it if not there
      mkdir -p $PGDATA
    
      # clean it out if anything there
      /bin/rm -rf $PGDATA/*
    
      # set up the database directory layout
      initdb
    
      # start the server process
      nohup postgres 2>&1 > postgres.log &
      sleep 2
    
      # create an empty DB/schema
      createdb edrs_test_db
    
      # create a user and password ("guest") to use for connections
      psql -d edrs_test_db -c "create user guest password 'guest'"
    
      ps auxww | grep '[p]ostgres'
      echo run tail -f postgres.log to monitor database
    
      # vi: nu ai ts=4 sw=4
      # __* EOF __*
    

"edrs" is the name of an app - sub in something more applicable. I'm running
this on OXS, but should work on Linux as well.

~~~
AlterEgo20
Why `nohup postgres 2>&1 > postgres.log &` instead of `pg_ctl start`? You
could also use `pg_ctl -w start` to wait for proper server startup instead of
the `sleep 2`.

~~~
Roboprog
Why? Ignorance. I'm used to running postmaster from years past (when not
starting from an /etc script). Postmaster is now just called postgres, and it
works for a disposable test instance setup.

I'll look into that, though, as it sounds like the right thing vs the sleep
hack.

------
bkeroack
Nobody has mentioned jsonb partial updates yet. This is huge and goes further
in superseding MongoDB use cases. Previously you would have to add your own
locking mechanisms (or use SELECT ... FOR UPDATE) so read/modify/write could
be performed atomically. Now it will be built in.
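
For example, a minimal sketch with the new jsonb_set function (the docs table
and its body column are hypothetical):

    
    
      -- change one nested key in place, in a single atomic statement, instead
      -- of reading the whole document, modifying it client-side, and writing
      -- it back
      update docs
         set body = jsonb_set(body, '{address,city}', '"Portland"')
       where id = 42;
    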

~~~
jrochkind1
You could already do that with Postgres hstore, I think, but hstore is limited
to a flat list of key/values, not nested data structures like JSON.
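
Something like this, I believe (the table and column names are made up):

    
    
      -- || merges the given key/value pairs into the existing hstore in one
      -- atomic statement
      update users
         set prefs = prefs || hstore('theme', 'dark')
       where id = 7;
    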

I've been wishing for a while though that Rails ActiveRecord would support the
atomic partial update operations inside hstore that postgres already does.

([http://www.postgresql.org/docs/9.0/static/hstore.html](http://www.postgresql.org/docs/9.0/static/hstore.html))

------
jtwebman
Do you guys think row-level security will eventually replace the crazy logic
we normally have to add to our systems to allow for this? I can think of many
places this might help a bunch if combined with the SET SESSION AUTHORIZATION
command.
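
Roughly this kind of flow, assuming per-user database roles and an RLS policy
like the one described upthread (the alice role is just an example):

    
    
      -- run the request as the end user's role so RLS policies apply to every
      -- query in the session
      set session authorization alice;
      select * from orders;   -- only the rows alice's policy allows
      reset session authorization;
    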

~~~
brlewis
I thought you could do that now by making a view and setting security on the
view. Oracle allows for that sort of thing.

~~~
jtwebman
True, I have seen Microsoft SQL systems like that as well.

------
lephyrius
No partial (incremental) updates of materialized views, it seems.

~~~
pilif
Nope, but since 9.4 you can at least refresh them without taking an exclusive
lock on them.

------
Roboprog
Cool stuff for multi-tenant DBs! Aside from the obvious row-level security, a
tenant ID makes a nice BRIN key for some tables, I suspect.

------
therealunreal
It sounds like BRIN could replace partitioning in some cases. Am I right?
Assuming you have a huge log table, partitioned by week, would this be a
better fit?

~~~
saosebastiao
For a log table or anything immutable and write-heavy, definitely.

That being said, it might be difficult to know whether you'll get any benefit
out of it unless you have control over, or knowledge of, how rows are laid out
in the table space. For example, deleting rows based on fairly random criteria
may make Postgres insert into those spaces on subsequent writes (after a
vacuum), which could "pollute" the block ranges with out-of-order data and
make them less selective.

------
ExpiredLink
Upsert, oh well. CRUD becomes CRUDUM: Create, Read, Update, Delete, Upsert,
Merge.

Instead of 4 orthogonal concepts we now have 6 overlapping ones. Because the
majority voted for it. That's progress!

~~~
andrewl-hn
Well, philosophically, CRUD is a lie. You only ever need two operations: READ
and UPSERT.

Create with an upsert, and delete by upserting a "deleted = true" flag.
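
With 9.5's syntax, roughly this kind of sketch (the items table is
hypothetical):

    
    
      -- "create or update" in one statement
      insert into items (id, name, deleted)
      values (1, 'widget', false)
      on conflict (id) do update
          set name = excluded.name, deleted = false;
    
      -- "delete" by upserting the flag instead of removing the row
      insert into items (id, name, deleted)
      values (1, 'widget', true)
      on conflict (id) do update
          set deleted = true;
    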

~~~
bmh100
That "deleted" flag is extremely useful in data warehousing and OLAP
applications. I wish every table had a "deleted" column and an "updated"
column.

~~~
Todd
They do in my schemas :)

One extra tip, which I have found useful, is to make the deleted column a
timestamp (just like created and updated), but nullable. That way your boolean
check just becomes an IS NULL check, and you get the additional 'when'
information without using an extra column.
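
Roughly this pattern (the table and column names are just examples):

    
    
      create table orders (
          id         bigserial primary key,
          created_at timestamptz not null default now(),
          updated_at timestamptz not null default now(),
          deleted_at timestamptz               -- null means "not deleted"
      );
    
      -- the old boolean check becomes an IS NULL check
      select * from orders where deleted_at is null;
    
      -- and "deleting" captures the when for free
      update orders set deleted_at = now() where id = 42;
    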

~~~
pjungwir
That is the normal pattern in Rails apps using the `acts_as_paranoid` or
`permanent_records` gems (`deleted_at` to match `created_at` and
`updated_at`). But I often also have `deleted_by_id` to capture Who, and I
wonder if I shouldn't just have a separate `deletions` table with the who/when
and other context, and then `deletion_id` on the record.

And then I wonder if I should track updates too. There are auditing solutions
to record all that, but the ones I know are (rightly) not really designed for
building application logic on top of.

The idea of a relational schema having some kind of temporal dimension letting
you get at changes is something that's been on my mind a lot lately.

~~~
mason55
> _There are auditing solutions to record all that, but the ones I know are
> (rightly) not really designed for building application logic on top of._

Yes, a big problem with table-level audits is that you lose all kinds of
information about the other entities in the system. Sure, now you have an
audit log of when a row was changed, but you don't really know anything about
the state of all the other pieces of the database at that time, so you can't
really usefully reconstruct what the entity looked like at the time it was
modified.

In theory you could parse through the whole audit log to reconstruct the state
of the DB but in practice it gets very complicated.

~~~
bmh100
A better solution is to perform row-level snapshots with a compressed storage
format, such as a column store. I maintain a database which takes monthly
snapshots of data and supports an application that allows period vs. period
comparisons of aggregates or even individual rows. In my case, I use full
snapshots, but a more space-efficient approach (at the cost of computation)
would be to store only changed records, then dynamically determine which data
to show based on the desired periods and the sorted row changes.

~~~
pjungwir
I think what mason had in mind, which I agree is a major pain point, is when a
central table "owns" records in other tables, e.g. a `book` might have several
rows in `pages`. I want to say "give me edition 3" and get not just the book
at that point but all its pages too. Tracking changes to the book is not so
hard, but reconstructing it with all its child records is a pain.

~~~
mason55
Yes exactly. Piecing together the state of all the foreign tables across the
system at a specific point in time is difficult/painful.

This is one place where document stores really shine as you generally keep
everything in a single place. When you update a document you don't have to
worry about the values of all the foreign keys, you just save the current
version which contains all your values.

