

PostgreSQL indexing in Rails - ricny046
http://rny.io/rails/postgresql/2013/08/20/postgresql-indexing-in-rails.html

======
mapgrep
This is an incredibly suboptimal approach.

Postgresql (and even Mysql for that matter) lets you explicitly declare
foreign key references when you create/alter a table and then the database
will enforce integrity of those references for you. Which is great because
when someone writes some kind of code outside of Rails to work with your db,
and that code has bugs, and those bugs impact data integrity, there is a hard
stop at the DB layer preventing dangling/invalid references that will blow
something up later.

A side effect of creating these explicit foreign key relationships is that any
needed indices are created on both sides of the key. [UPDATE: Nope, I'm wrong,
see below.]

Here the author has a Rails setup that fails to declare foreign key
constraints when it's making a table with relations. You can see this in the
DB description of the `products` table which references the `categories`
tables via a plain int called `category_id`. As a result of not having an
explicit foreign key constraint this table also has no index on `category_id`.

So given a table with poor data integrity at the SQL/db level and lacking an
index as a symptom of this problem, the author treats the symptom and advises
the creation of an index on `category_id`, leaving the real problem woefully
intact. And that real problem, to be clear, is the fact that the database is
not being used properly but rather treated as a relatively dumb place to dump
columnar data; things that Postgres gives you for free are set aside and
pseudo-duplicated in Rails code.
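To make the distinction concrete, here is a minimal SQL sketch using the article's `products`/`categories` tables (the column names are assumed):

```sql
-- Without a constraint: category_id is just an integer, and nothing
-- stops a row from pointing at a category that doesn't exist.
CREATE TABLE products (
    id          serial PRIMARY KEY,
    name        text,
    category_id integer
);

-- With an explicit foreign key, Postgres rejects dangling references
-- no matter which application (or bug) wrote the row.
ALTER TABLE products
    ADD CONSTRAINT products_category_id_fkey
    FOREIGN KEY (category_id) REFERENCES categories (id);
```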

Now I'm not blasting the author because I don't know if this is a tactical
issue or an issue with Rails itself. Does Rails not allow foreign key
constraints in the migrations?

Whether this is a flaw in Rails or how it's being used, this is a bad solution
to the problem. The RDBMS is your friend, use it. (And if you're not going to
use the integrity constraints of the DB why are you using Postgres instead of
say BerkeleyDB or a loosely configured MySQL or Mongo or whatever?)

~~~
electrum
Rails migrations do not support creating foreign keys in the database, but you
can do it with this library:
[https://github.com/matthuhiggins/foreigner](https://github.com/matthuhiggins/foreigner)
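For reference, a sketch of a migration using foreigner (assuming a `products` table with a `category_id` column; the class name is made up):

```ruby
class AddForeignKeyToProducts < ActiveRecord::Migration
  def change
    # foreigner infers the category_id column and the categories.id target
    add_foreign_key :products, :categories
  end
end
```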

~~~
bratsche
Anyone know why ActiveRecord doesn't just include this functionality already?
Or does anyone have any idea if it's planned to add?

~~~
tominated
I was searching for the same thing myself recently, and it seems that DHH is
against having any sort of logic in the database - he believes that all
validations and constraints should be in the application instead of being in
both (which I believe is the correct way).

~~~
xentronium
The problem is that sometimes you simply can't do it properly in application.
Uniqueness is the most obvious example, although not the only one.

~~~
tominated
I probably phrased that incorrectly. I definitely think that you should have
validation logic at the database level. I've had a couple of major issues in
the past (while still learning web dev) where lack of indexes/constraints has
pretty much ruined my database when I tried to use them in production.

------
sandGorgon
So ... this might sound like a really dumb question but I come from a less-
than-illustrious MySQL background and have built up "rules of thumb" over the
years - especially around master-slave replication and security. Do note that
I'm not specialized in devops, but just doing something good enough at a
startup stage - for a database it means that downtime is acceptable in
exchange for data safety.

I'm really, really unsure about how to use postgres in such a way: most posts
that I read are around these really complex setups with zero downtime or these
Postgresql-quickstart guides (which I'm not looking for) or Heroku howtos.

Does anyone have a non-devops set of Postgres best practices that could be
used to get your transactional MVP up and running on Softlayer ?

~~~
hahainternet
I don't know Softlayer too well, but Postgres gets better every day.

1\. Install Postgres 9.3, the version after the replication fix.

2\. Tune Postgres until it reaches a reasonable speed (lots of tutorials
available, primarily just assigning it more RAM)

3\. Create a second instance, do a backup, restore backup on second instance

4\. Turn on streaming replication, ensure WALs are being received by second
instance

5\. If you're paranoid, use synchronous replication, which won't finish a
transaction until it's been committed on the primary and a streaming
secondary.
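Steps 3 and 4 come down to a handful of settings (a sketch for the 9.2/9.3 era; hostnames and user names are placeholders):

```
# postgresql.conf on the primary
wal_level = hot_standby
max_wal_senders = 3

# recovery.conf on the standby, after restoring the base backup
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replicator'

# postgresql.conf on the standby, to allow read-only queries
hot_standby = on
```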

I'm not our DBA, but I believe this covers the primary setup. You will have to
test whether Postgres correctly moves into a new history chain on migration as
support for this via WALSender is pretty new I believe.

~~~
rosser
I am our DBA, and, though brief, this is a reasonable list.

A caveat, though: be careful with synchronous replication; it actually
_increases_ your chances of having an outage. If your slave goes down with
sync rep, the master will no longer accept writes. (To mitigate that risk,
have multiple slaves. But that can further increase the latency sync rep
introduces, as now the master has to wait for _all_ the slaves to report back
a successful write.)

~~~
hahainternet
Good point, I believe if you list multiple sync rep targets it only requires
one of them, but I would have to go check that properly :)
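For what it's worth, that matches how `synchronous_standby_names` is documented: the first connected standby in the list is the synchronous one, and the rest are fallbacks (a config sketch; the names are placeholders):

```
# postgresql.conf on the primary
synchronous_standby_names = 'standby_a, standby_b'
# standby_a is synchronous while connected; standby_b takes over
# as the synchronous standby if standby_a drops out.
```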

Still, Postgres in a hot standby configuration is relatively simple, still
high performance and an actual real database that doesn't need tens of grand
in licensing. You really can't go much wrong with it.

~~~
sandGorgon
thanks! just to be clear, you do mean hot standby with streaming replication
right? And why did you specifically recommend 9.3 - I can't google for
anything specific to standby fixes.

The authentication defaults for postgres and mysql are vastly different - I am
always tempted to move all authentication to md5 (pretty much the same as
mysql). Am I doing it wrong ?

~~~
rosser
You want either 9.3.2 or, on the 9.2 branch, at least 9.2.5. Before either of
those, there was a bug in the streaming replication code such that, on
failover, the last few outstanding transactions on the master might not be
replayed on the slave, resulting in data loss.

And, no, you aren't doing it wrong. You generally want to use MD5 auth.
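A `pg_hba.conf` sketch of that setup (the address range is a placeholder):

```
# TYPE  DATABASE  USER  ADDRESS       METHOD
local   all       all                 md5
host    all       all   10.0.0.0/24   md5
```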

~~~
hahainternet
The reason I recommend 9.3 specifically is that when a migration occurs,
Postgres switches to a new historical branch. As of 9.2 this change gets
written only to the WAL directory, and you must use something like Ceph /
Gluster in order to have these migration files available to each replicating
server.

In 9.3 at some point this is slated for inclusion in the WALSender. This
allows you to do full streaming replication without the requirement for a
cluster filesystem to hold your binary logs.

~~~
sandGorgon
but this is bleeding edge stuff (never mind that it will work perfectly given
past experience).

What did hacker (not whole development + ops) teams do before 9.3.2? With all
due respect, is this the reason why startups still default to mysql over
postgres?

~~~
hahainternet
Postgres historically has been tricky to configure with 'proper' replication.
This is partially because replication is not a remotely simple problem to fix
as it appears. The reason people still default to mysql is simply because it
occupies more mindshare and is marginally easier to set up. I still don't
think it even allows such things as DDL rollbacks so it's hard to talk about
it next to Postgres when you're talking about safety.

------
losvedir
I tried adding a partial index the other day but Rails didn't seem to
particularly like it. Are the examples here from Rails 4? (I'm on 3.2.16).

The issue I had was that, while the partial index syntax worked and actually
did create it, it wasn't reflected anywhere in the generated schema. So I was
worried that future developers wouldn't realize there was a partial index (we
use schema.rb as the canonical reference of DB state), and if we ever get rid
of migrations by condensing them into one big one we might miss that there
should be a partial index on the column.

And changing the config to generate schema.sql instead of schema.rb didn't
help. It's broken with the latest version of postgresql...

~~~
ricny046
The article covers only Rails 4.

~~~
losvedir
Ah, bummer. Anyone have any luck getting partial indexes to work in Rails 3.2?

~~~
netghost
You can always execute arbitrary SQL in migrations. For instance:

    
    
        def up
          execute('CREATE INDEX ... ON people ...')
        end
    

The one thing to keep in mind though is that if you use ruby schema format,
your tests won't pick up those execute statements. In that case it's best to
either use the sql format or re-run migrations in the test environment to set
up the test db.
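For the partial-index case specifically, a sketch of such a 3.2 migration (the table, column, and index names are made up for illustration; pair it with the sql schema format so the index survives in the dumped schema):

```ruby
class AddPartialIndexToOrders < ActiveRecord::Migration
  def up
    # Raw SQL, since 3.2's schema.rb can't represent a partial index
    execute(<<-SQL)
      CREATE INDEX index_orders_on_user_id_unshipped
      ON orders (user_id)
      WHERE shipped_at IS NULL
    SQL
  end

  def down
    execute('DROP INDEX index_orders_on_user_id_unshipped')
  end
end
```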

------
ufmace
Interesting, I was just puzzling over a postgresql indexing question myself.
The generated migration to create one of my tables automatically generated
indexes on the foreign keys. However, after further considering my model, I
want to make the index on one of these foreign keys unique. Is there any way
for a migration to alter the index, or do I just have to delete the old index
and create a new one? Based on what I've seen so far, it looks like I have to
delete and recreate, but I wouldn't mind confirming that.

~~~
brasetvik
You mean on the from side? Yes, just drop the existing index and create a
unique index. You can do that within the same transaction.

Postgres already requires the pointed-to relation to have a unique index:

    
    
        => create table foo (id int); -- No unique index here.
        CREATE TABLE
        => create index foo_id on foo(id);
        CREATE INDEX
        => create table bar(foo int references foo(id));
        ERROR:  there is no unique constraint matching given keys for referenced table "foo"
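In migration terms, the drop-and-recreate dance is short (a sketch; the table and column are made up):

```ruby
class MakeSubscriptionsUserIdUnique < ActiveRecord::Migration
  def change
    remove_index :subscriptions, :user_id
    add_index :subscriptions, :user_id, unique: true
  end
end
```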

------
pnathan
Question about pg/rails approach...

This seems to tie the database closely to the ORM. What if there are multiple
applications using the schema? Wouldn't it be better to have the database code
& controlling system separate?

I've found it more convenient to have a set of SQL files for specifying tables
& indexes and to have the app not control the database schema directly
(Perhaps I just don't see the clear advantage, or perhaps Rails tooling is so
good it obviates my concerns. :-) ).

~~~
josephlord
Rails makes it easier to do migrations and roll them back if necessary but you
can dump the SQL from Postgres easily and you can choose whether Rails stores
the schema in SQL or in its own format. I use SQL since, at least in 3.2,
Rails' own format didn't support all the features of Postgres (partial indexes
etc.).

When doing a new deploy you shouldn't usually go through the migrations but
just load the schema.
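Concretely, with the standard Rails tasks:

```shell
# New deploy: load the current schema directly
rake db:schema:load      # or db:structure:load when the schema format is :sql

# Existing database: apply only the pending migrations
rake db:migrate
```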

------
stplsd
Ok, obviously I have never done web development, but why would you want to
declare the schema from Rails? And ActiveRecord is some kind of ORM (a
buzzword I've been hearing lately, but never actually cared to fully
understand)?

Now I am reading the book High Performance MySQL: Optimization, Backups, and
Replication (yes, I know this post is about PostgreSQL) and there is so much
stuff to know that this ORM thing seems like a toy by comparison.

So what is wrong with creating the schema in plain old (database-specific)
SQL?

~~~
gerjomarty
Rails creates the database tables for whatever ORM you're using via its
migrations, and keeps track of which migration files have been run.

The usual syntax for creating these migration files is in Ruby, as this makes
them database-independent, arguably more easily readable, and in newer
versions of Rails, reversible.

Of course, you can put plain old SQL in these migrations if you want to as
well.
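A sketch of such a migration (the table and columns are made up; with `change`, Rails can also reverse it automatically):

```ruby
class CreateProducts < ActiveRecord::Migration
  def change
    create_table :products do |t|
      t.string  :name
      t.integer :category_id
      t.timestamps
    end
    add_index :products, :category_id
  end
end
```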

------
phsr
You can also create indexes concurrently, using `algorithm: :concurrently`
when creating the index[1]. You can't create concurrent indexes within a
transaction, so be sure to call `disable_ddl_transaction!` in your migration
definition.

[1]:
[https://github.com/rails/rails/commit/2d33796457b139a58539c890624591c97354d334](https://github.com/rails/rails/commit/2d33796457b139a58539c890624591c97354d334)
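Putting the two pieces together (a sketch; the table and column are assumed):

```ruby
class AddIndexToProductsConcurrently < ActiveRecord::Migration
  # CREATE INDEX CONCURRENTLY cannot run inside a transaction
  disable_ddl_transaction!

  def change
    add_index :products, :category_id, algorithm: :concurrently
  end
end
```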

------
asdasf
Postgresql advice from a mysql user. Foreign keys don't need indexes on the
from side, and they should be pointing to a primary key on the to side so that
is already covered. In mysql you likely want an index on the from side because
you are probably joining on it and mysql only knows how to do nested loop
joins. But postgresql is a real database, so we have hash joins and merge
joins, which don't need indexes on the join conditions, rather on the where
clause (like you would want anyways). [http://use-the-index-luke.com/sql/join/hash-join-partial-objects](http://use-the-index-luke.com/sql/join/hash-join-partial-objects)
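A sketch of the distinction (hypothetical tables): with a hash or merge join the planner doesn't need an index on the join key, but it can use one that matches the filter:

```sql
-- No index on products.category_id is required for this join;
-- an index matching the WHERE clause is what pays off.
CREATE INDEX index_products_on_price ON products (price);

SELECT p.name, c.name
FROM products p
JOIN categories c ON c.id = p.category_id
WHERE p.price > 100;
```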

~~~
philwelch
In the case that you use your database for more than just a persistence
backend to your Rails app (blasphemy, I know, but odds are your Rails app is
generating transactional data that's worthy of analysis you need real SQL for
and not just ActiveRecord), indexing foreign keys is still worthwhile because
sometimes you'll join the same foreign key in two different tables without
ever needing to join in the table they normally map to.

~~~
asdasf
What you describe doesn't need an index on the foreign key, that's the point I
was making.

~~~
philwelch
If you designate it as a foreign key in the schema it'll just use the PK index
even if you don't join that table in at all? Wait, how does that even work?
Does the PK index include the foreign key rows then? That's an awesome
feature.

