
BRIN Indexes in Postgres 9.5 - craigkerstiens
http://dev.sortable.com/brin-indexes-in-postgres-9.5/
======
malisper
To the authors:

You should try _set track_io_timing = on_ before running the query. That will
show you the actual amount of time spent on i/o. Additionally, if you are
using RAID, and haven't tuned it yet, you should try increasing the
_effective_io_concurrency_ parameter.
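
For anyone wanting to try this, a minimal sketch of the session (note that track_io_timing can only be changed per-session by a superuser; the table and column names below are placeholders, not from the article):

      SET track_io_timing = on;
      EXPLAIN (ANALYZE, BUFFERS)
          SELECT avg(value) FROM some_table
          WHERE created_at > now() - interval '1 day';
      -- the plan output now includes "I/O Timings: read=... write=..." lines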

~~~
dfsegoat
This is fantastic. Thanks.

RE: effective_io_concurrency --- any idea how this should be tuned for SSD or
PCIe flash? The manual only gives an example of how to tune for RAID (by
number of drives).

~~~
doh
My understanding is that effective_io_concurrency represents the number of
data drives that participate in the I/O system, excluding the parity drives.

From my experience, for each SSD drive you can set it between 5 and 8 (if they
are in RAID 0).

BTW in 9.6 the effective_io_concurrency [0] can be different for each table
space (especially useful if you have multiple drives of a varying quality
[ssd, spinning] in one server).

[0] [https://www.postgresql.org/docs/9.6/static/runtime-config-resource.html](https://www.postgresql.org/docs/9.6/static/runtime-config-resource.html)
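
For reference, the 9.6 per-tablespace override looks like this (the tablespace names here are invented for illustration):

      ALTER TABLESPACE fast_ssd SET (effective_io_concurrency = 200);
      ALTER TABLESPACE slow_spinning SET (effective_io_concurrency = 2);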

~~~
pgaddict
It's somewhat related to the number of data drives, but as you point out it's
a bit more complicated in practice - it's not just the number of drives. For
example, SSDs have parallelism built in (each SSD is effectively a stack of
smaller drives), and for rotational devices it's complicated by TCQ/NCQ,
optimizations that need a queue of commands per device to actually work.

On modern SSDs, values of 5-8 per device are definitely on the low side. See
for example this thread from 2014, on the then newly released S3500 SSD from
Intel:

[https://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH%3DHWh1WbLRsioe%3DmzRJTHwtr%3D2azsTdQ%40mail.gmail.com](https://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH%3DHWh1WbLRsioe%3DmzRJTHwtr%3D2azsTdQ%40mail.gmail.com)

That shows the performance increases up to effective_io_concurrency=256 (of
course, in reality there may be multiple queries running on the system,
sharing the I/O paths).

~~~
doh
Very interesting.

I did some testing a while ago and didn't see much gain past 30 on 4 SSD
disks. Should do another round of testing. Thank you for sharing.

~~~
pgaddict
It's quite difficult (or impossible) to get a single "right" value, as it
depends both on the workload and hardware (say, whether you have SSDs attached
directly or through a RAID controller).

------
tofflos
Some months ago I played with BRIN to determine whether we could use Postgres
as a time series store. I started writing a letter to the Postgres mailinglist
but never got around to sending it. I'll post it here in case anyone is
interested:

1\. Background

I've been playing a bit with large sets of time series data on PostgreSQL 9.6
beta 1. I have two tables containing about 5,000,000 "meters" and 120,000,000
hourly "readings". The 120,000,000 "readings" represent one day's worth of
data and the active set of data would be 90 days.

    CREATE TABLE meter (
        id integer NOT NULL,
        version integer,
        name character varying(255) NOT NULL,
        primary key(id)
    );

    CREATE TABLE reading (
        instant timestamp without time zone NOT NULL,
        value integer NOT NULL,
        meter_id integer NOT NULL,
        primary key(meter_id, instant)
    );

I'm aware that there are many other ways to model time series data but I have
a preference for this one because it's simple, it maps seamlessly with our
development tools (Hibernate ORM), and the data is easy to process with
analytical queries.

(In the end we settled on a different model because this solution was not
performant.)

2\. Findings

The table "reading" is 5068 MB and the primary key constraint creates a
multicolumn BTREE index which is an additional 3610 MB. Unfortunately no other
index type is available for use as primary key. It would be nice to be able to
use a slower but more space efficient (224 kB) BRIN index instead. Another
nice option would be an index organized table.
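
For anyone curious, the BRIN alternative mentioned here is a one-liner; a sketch (the index names are my own, and `reading_pkey` is just the default name Postgres gives the primary-key index):

      CREATE INDEX reading_instant_brin ON reading USING brin (instant);

      -- compare on-disk sizes of the two indexes
      SELECT pg_size_pretty(pg_relation_size('reading_pkey'))         AS btree_size,
             pg_size_pretty(pg_relation_size('reading_instant_brin')) AS brin_size;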

~~~
CuriousSkeptic
Due to the large per-row overhead in PostgreSQL tables, you should probably
investigate other ways to store the table data first if you are trying to
conserve space.

In one application I achieved a lot of savings by transposing the time series
and meters into columns (arrays also have some overhead, so go for columns)
and manually partitioning along one axis to avoid storing one key. Perhaps not
applicable in this case, since I had a more or less fixed set of meters and
times, but someone might find it useful.

Ex

    
    
      CREATE TABLE reading_20160917 (
        meter_pos int PRIMARY KEY,
        "meter1 @ 00:00" smallint not null,
        "meter1 @ 01:00" smallint not null,
        ...
        "meter2 @ 00:00" boolean not null,
        "meter2 @ 01:00" boolean not null,
        ...
        "meter3 @ 00:00" pg_catalog.char not null,
        "meter3 @ 01:00" pg_catalog.char not null,
        ...
      );

Some meta-programming is required to make that manageable, though PostgreSQL
is quite flexible in that regard, so with the proper set of views and
functions you can present a more naïve view of the data for querying.

If that approach won't do I would look into foreign data wrappers, something
like CitusDB perhaps, or just go for an actual time series store.

~~~
tofflos
In the end I did what you are suggesting but implemented it using the
PostgreSQL ARRAY datatype. It was easy to support with Hibernate and the data
was usable for analytical queries. Disk usage dropped to roughly 1 GB.
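
A sketch of what that might look like (my guess at the schema, not the poster's actual one): one row per meter per day, with the 24 hourly values packed into an array:

      CREATE TABLE reading_day (
          meter_id integer NOT NULL,
          day date NOT NULL,
          vals integer[] NOT NULL,  -- 24 hourly readings
          PRIMARY KEY (meter_id, day)
      );

      -- analytical queries can unpack the array with unnest()
      SELECT meter_id, avg(v)
      FROM reading_day, unnest(vals) AS v
      WHERE day = '2016-09-17'
      GROUP BY meter_id;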

------
fabian2k
Having clustered tables seems to be a prerequisite for using this kind of
index. How does this work in practice?

The CLUSTER command is a one-time reordering, and I assume a seriously
expensive operation. But it doesn't affect any later insertions. So would one
just run a CLUSTER periodically, or are there better ways to do this?

~~~
cldellow
Right, it is a prerequisite, and there are some complexities in practice.

You're right -- CLUSTER is a seriously expensive operation and locks the
entire table, to boot. So we didn't really do that. Instead, we used pg_repack
which is a great tool to do online clustering without the locks.
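
For the record, the invocation is roughly this (table and column names invented for illustration; verify the flags against your pg_repack version's `--help`):

      pg_repack --table=reading --order-by=instant mydb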

Additionally, we use pg_partman (another great tool!) to partition our data by
month. Once a given month is finished, we can ensure the partition table for
that month is ordered optimally and then never worry about it again.

This particular table is mostly appended to, so it's not so bad. If you were
updating or deleting records, you'd have to run pg_repack more aggressively
(or take similar actions) to get good performance.

------
lucb1e
Very interesting article. We had a database class last semester which hardly
mentioned anything new, making me believe that I generally knew most things
there were to know about databases. I'm sure there will be people here saying
this doesn't even go that deep, but this reminded me how much more is going
on inside a database.

It also makes me think database technology is less dead than it seems. Hot
topics these days are things like Redis and MongoDB, and even NoSQL is already
old news and on its decline (when looking at typical HN front page topics
anyway). Yet there is still so much tech put into good old relational databases.

------
Spidler
But you cannot cluster on a BRIN index, and isn't clustering an operation that
you need to redo regularly?

I thought so, since even while the cluster flag is set on an index, future
writes aren't guaranteed to come out clustered - which is why you can issue
the CLUSTER command again to update the clustering.

So I wonder how you are supposed to maintain the setup.

~~~
cldellow
As others have mentioned, if your workload is mostly appends and the data is
appended in the correct order, you don't need to redo the clustering. In our
case, the clustering was a one-off to get the data in the right shape.

(Although if you do need to redo clustering, look into pg_repack.)

------
sneak
Are they stored in PAGEs?

~~~
jakub_h
This took me a while to get. But at least I'm not apparently a completely
hopeless case...

------
ddebernardy
You can probably pre-compute part or all of those aggregates using triggers,
and then run those queries in ~200ms or less with a plain B-tree.

~~~
cldellow
Yeah, that's what we did in the part we said "Pretend this is where we
stopped" :) We pre-computed a few major dimensions and choose which table to
query based on which columns are being requested. BRIN indexes were still a
win vs B-tree, though.

We keep being impressed by Postgres--we keep being able to flog it harder and
harder instead of moving to something heavier duty.

------
mozumder
Is BRIN Index the same as ATM Machine?

Article didn't really explain the acronym.

~~~
dimfeld
Block Range Index. Here's the official docs:
[https://www.postgresql.org/docs/9.5/static/brin-intro.html](https://www.postgresql.org/docs/9.5/static/brin-intro.html)

------
mrits
"B-tree indexes have entries for each row in your table, duplicating the data
in the indexed columns" \-- not really

~~~
Jweb_Guru
They actually do work like that in Postgres, and in general for secondary
indices. Like, I'm positive there are some btree variants that deal with
duplicate data better, at the cost of slower writes or more locking (just an
example), but we're talking about normal btrees.

~~~
CodeWriter23
In his book "An Introduction to Database Systems", DB2 Architect CJ Date
describes an index design that compresses indexes of similar strings. So if
entries in the index block start with the same first few letters, the first
entry has the complete string, and subsequent entries use the format: <count-
of-matching-characters> <remaining-characters-that-are-different-than-
previous>. The algorithm burns some CPU to conserve I/O.

For example:

0 let 3 ter 6 s 4 uce

For the words: let, letter, letters, lettuce.
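
That scheme (often called front coding) is easy to sketch in code. Here's a toy Python version, not anything Postgres actually uses, that reproduces the example above:

```python
def front_encode(words):
    """Encode a sorted word list as (shared-prefix-length, suffix) pairs."""
    encoded = []
    prev = ""
    for word in words:
        # length of the prefix shared with the previous entry
        n = 0
        while n < min(len(prev), len(word)) and prev[n] == word[n]:
            n += 1
        encoded.append((n, word[n:]))
        prev = word
    return encoded


def front_decode(pairs):
    """Reverse the encoding back into the original word list."""
    words = []
    prev = ""
    for n, suffix in pairs:
        prev = prev[:n] + suffix
        words.append(prev)
    return words


print(front_encode(["let", "letter", "letters", "lettuce"]))
# → [(0, 'let'), (3, 'ter'), (6, 's'), (4, 'uce')]
```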

I can't speak to whether Postgres uses that index format; I haven't checked.
Just saying that an index as described above is a textbook idea that had
already been around for a long time when I learned it some 30 years ago.

I also want to share, Date's book is the best book I've ever read on
information processing theory. It brought several foundational ideas into my
skill set.

~~~
Jweb_Guru
Sure, there are lots of indices that format things in different ways (sounds
like delta encoding). Just not Postgres btrees.

