
PostgreSQL Schema Design - leandot
https://www.graphile.org/postgraphile/postgresql-schema-design/
======
nemothekid
One thing that trips me up about Postgres is schemas. What is the conceptual
difference between a "schema" and a "database"?

~~~
combatentropy
In PostgreSQL, the hierarchy is: host > cluster > database > schema > object.

By host, I mean a server. A host can have many database clusters, though
usually it has just one. To have more than one, you would have to run more
than one PostgreSQL instance, each listening on a separate port (the default
is 5432). I don't recommend more than one cluster per host; I just mention it
as a possibility.

A cluster can have many databases.

A database can have many schemas.

A schema can have many objects. The most familiar kind of object is a table,
but there are others: views, functions, custom types, sequences. An object
cannot exist directly in the database; it must be part of a schema. The
default schema is called "public". Traditionally in other databases, there
was a schema for each user: if jdoe logs in, his default schema is also
called jdoe. In fact, in other databases that is the only schema a user can
have; you cannot make more schemas and name them whatever you wish.
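To make the hierarchy concrete, here is a sketch of two schemas in one
database (all names are invented for illustration):

    create schema app;
    create schema audit;

    create table app.users (id serial primary key, name text);
    create table audit.log (id serial primary key, event text);

    -- unqualified names are resolved through search_path
    set search_path to app, public;
    select * from users;  -- finds app.users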

The advantage of a schema over a database is that you can make a query that
uses objects in different schemas.

    select *
    from schema1.table1
        join schema2.table2 on table1.col = table2.col

If the tables were in different databases, then you could not combine them as
easily. I think you would have to resort to Foreign Data Wrappers.
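For what it's worth, a cross-database join via postgres_fdw looks roughly
like this (server name, connection options, and table names are invented for
illustration):

    create extension postgres_fdw;

    create server other_db
        foreign data wrapper postgres_fdw
        options (host 'localhost', dbname 'otherdb');

    create user mapping for current_user
        server other_db
        options (user 'app', password 'secret');

    -- make the remote table visible locally, then join it as usual
    import foreign schema public
        limit to (table2)
        from server other_db
        into public;

Noticeably more ceremony than joining schema1.table1 to schema2.table2.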

I have gotten a long way, with many applications over many years, with one
host, one cluster, one database, and many schemas.

~~~
megous
> I don't recommend more than one cluster per host. I just mention it as
> possible.

I actually use this as a cheap data backup mechanism. I have a primary
database cluster on SSD, and two replicas on two separate harddrives. All
running as three postgresql clusters on the same machine.

The hard drives intentionally run on different filesystems, so that a
filesystem bug will not eat all my DB data. (In case someone wants to suggest
RAID ;))

~~~
manigandham
This seems complicated and would still fail if something happens to the host
hardware itself right? What about projects like wal-e/wal-g that continuously
archive to cloud storage?

[https://github.com/wal-e/wal-e](https://github.com/wal-e/wal-e)

[https://github.com/wal-g/wal-g](https://github.com/wal-g/wal-g)

~~~
mbrameld
It would only fail if the host hardware failed in a way that damaged all three
discs, which seems unlikely. It's for data backup, not uptime redundancy.

~~~
bzzzt
A power surge caused by a lightning strike would kill your master as well as
your backup. Besides that, do you also keep all your WAL logs so you can roll
back an accidentally dropped database?

~~~
megous
A power surge here where I am may kill the power supply. It's low probability
that it would kill everything.

Anyway, where I am, I don't fear lightning/storms/power surges as much as I
do the inevitable failures of the storage devices, or kernel bugs.

This is just one part of the layered protection I have. Offline backups are
nice, and I have them too, but by definition they are out of sync all the
time. So unless absolutely necessary, having a real-time synchronized replica
is much preferred.

------
uhoh-itsmaciek
>Warning: Be very careful with logging, while we encrypt our passwords here it
may be possible that in a query or server log the password will be recorded in
plain text!

I was glad to see this noted, but surprised this is the extent of the advice
and that sending passwords to the database in plaintext is still recommended.
Are there more fleshed out best practices around avoiding logging pitfalls of
doing this?

~~~
vbezhenar
Hash the password in your application. It's probably not compatible with that
PostGraphile thing; it's more general advice.

~~~
ComputerGuru
And if you’re using a language that supports it, pass passwords around in a
custom data type/wrapper around string that does not leak its contents when
your platform equivalent of .toString() is called (and override the JSON or
slog conversion interface).

------
awinter-py
Is the PG built-in role system flexible enough to do more sophisticated forms
of RBAC (groups, shares)?

Is there a performance hit?

Have always liked the idea of Graphile and wanted to try PG built-in roles,
but wasn't sure whether the feature set was robust or whether the DB
communicates errors properly.

~~~
tensor
Yes, but you need to model it in the database. You don't even need to use the
built-in role system; you can use a user id that you pass into your database
session.

Basically, create three tables: user, user_role, role_permission. Each user
can have one or more roles, and each role has one or more permissions.
Permissions could be things like "view_admin_panel" or even granular, like
"view_project_with_id_5".

Then you can create a row-level security policy that does the right lookup in
these tables. I've not run this in a production system yet, but I did
successfully build a proof of concept that worked. Performance seemed
reasonable.
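
A minimal sketch of that idea (table, column, and policy names are invented;
the user id is passed in via a session setting):

    create table user_role (user_id int, role text);
    create table role_permission (role text, permission text);

    create table project (id int primary key, name text);
    alter table project enable row level security;

    -- a row is visible when the session's user holds a matching
    -- permission, e.g. 'view_project_with_id_5'
    create policy project_view on project for select
        using (exists (
            select 1
            from user_role ur
            join role_permission rp on rp.role = ur.role
            where ur.user_id = current_setting('app.user_id')::int
              and rp.permission = 'view_project_with_id_' || project.id
        ));

The application would then run something like set app.user_id = '42' at the
start of each session.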

~~~
socceroos
Yep, I made permissions tables per-entity too with simple `CRUD` access a bit
like RWX (it was an API). It's definitely fast enough. I was handling ~5000
concurrent users with queries returning in the ns range. To be fair, I skipped
RBAC checks if the entity was marked as 'public'.

------
Dowwie
Misleading title. This is a GraphQL thing.

~~~
BenjieGillam
“While we will discuss how you can use the schema we create with PostGraphile,
this article should be useful for anyone designing a Postgres schema.”

