
PostgreSQL beginner guide - lukasbar
https://knowledgepill.it/posts/postgresql-basics-guide/
======
2mol
So this is cool, and can serve as a good cheatsheet for beginners.

However, the question where I struggled the most: how the hell do I set up a
secure Postgres instance on some cloud VM or server? (I _really_ love those
2.50 bucks per month instances)

I spent a fair amount of time reading up on it, but it's still really easy to
make a mistake in your `pg_hba.conf` or somehwere else. I remember disabling
password based login, and just authenticating over SSH, and my server still
got compromised after a day or two. 100% my fault - I think it was related to
`COPY FROM/TO` being able to run arbitrary commands because I didn't
understand that `postgres` is regarded as a superuser by the database.

I've been using managed services like Heroku Postgres and RDS ever since.

Point is: it would be of huge value to have a clear (but complete) overview of
how to configure a Postgres instance so that it's secure in the basic sense.
If that isn't possible (or very hard) then please let's collectively tell
beginners to just use managed instances.

~~~
chousuke
I'm not sure where you went wrong; by default, Postgres _is_ secure; the
configuration listens on localhost and even if you change that, does not allow
network access to the superuser. Distributions may ship with less secure
configurations but I'm not aware of any that do.

I'm also pretty sure that the upstream documentation warns against using the
superuser for application access, so if you just create a regular database
user protected by a reasonable password it will be as secure as any database
exposed to a network can be; of course, exposing databases to the internet is
something to be avoided in the first place.

non-mobile EDIT:

The above ignores TLS, which is generally a good idea if you want to make
something accessible over the network.

A guide for beginners might be useful, but if you work with these things, it
may be useful to learn how to approach security in general so that you will be
able to learn how to secure anything, or at least know when you don't know
enough.

In general, installing services securely requires the administrator to
understand how the service is accessed by legitimate users and whether in
doing so is potentially exposed to external access. If you just google for
how-tos you're quite likely to find lots of bad advice that skips security
considerations and takes you from A to B the fastest route.

The effort required is entirely dependent on your requirements; in most cases,
avoiding exposure to the internet, patching your software and using strong
passwords is enough, as it stops nearly all low-effort automated attacks.

For starters with any network-exposed service, you should understand that not
exposing it to the internet in the first place means that the rest of your
security measures will be challenged less; so if you can, limit access to
internal networks and specific hosts with firewalls and ACLs.

If you have to expose a service to the internet, then you need authentication
and authorization; anyone will be able to connect to the service, but the
service should challenge them to identify themselves using secure credentials.

Once the user has access to your service, you'll want to limit what they can
do with it. This requires reading the manual.

Lastly, you'll generally want to keep your software up-to-date; unpatched
software may have bugs that allow attackers to bypass some of the security
measures you have set up.

~~~
istjohn
> A guide for beginners might be useful, but if you work with these things, it
> may be useful to learn how to approach security in general so that you will
> be able to learn how to secure anything, or at least know when you don't
> know enough.

Any suggestions on how to get started with this?

~~~
chousuke
That's a tricky question. Personally, I find that while reading about
something is a good way to gain a vague understanding of how something works,
to actually integrate it, I need to actually get a system set up and then just
dig around and see how things interact.

What could work is something akin to a Wikipedia dive. Pick a system to set up
and make note of as many concepts along the way as you can; For example,
setting up a Postgres database involves networking (what does a "listen
address" actually mean?), different kinds of authentication (eg. pg_hba md5,
trust, and peer authentication options), OS users _and_ database users (easy
to get those two confused), among other things. One could also wonder why
there's a database superuser, and what makes the system such that using it for
application access it is a bad idea?

Then try thinking about potential ways of how all these things could interact
to break a system. This can give you a lot of insight into how to secure
systems against attackers.

For example, setting up Postgres to not authenticate users is not necessarily
detrimental to the overall security of a system if everything is local and
single-user, but makes the system extremely weak if _any_ component on the
host is exploitable from the outside.

So even if you really don't want to use passwords, you should at the very
least use peer authentication where the database allows only specified OS
users to connect such that any attacker must at least be able to access the
system as one of those users.

Lastly, if you put a non-authenticating system on a network, it should not be
surprising that anyone who happens to be able to poke at your network address
can just waltz in without resistance. You'll at the very least want strong
passwords, possibly with brute force detection to detect people trying to
guess credentials. The IPv4 internet is only about 4 billion addresses, and
automatically scanning through them for listening services is routine for
attackers. :)

~~~
treszkai
> Then try thinking about potential ways of how all these things could
> interact to break a system.

This is where your advice goes south, IMHO. The sysadmin can't think of every
possible way his system can get compromised -- they are not paid for that,
black-hat hackers are.

Instead of collecting settings from a hundred places that must be set
correctly for production, and inventing ways the system can get compromised,
the safe settings could be provided in one place: ideally in the default
settings so they are impossible to miss, or in a list, such as Django's
deployment checklist [1].

Not saying that Postgres doesn't provide such a list, although googling for
"checklist site:postgresql.org" only resulted in a mailing list reply [2],
with some points not trivial to follow. Please comment below if you know an
official one.

[1]:
[https://docs.djangoproject.com/en/3.0/howto/deployment/check...](https://docs.djangoproject.com/en/3.0/howto/deployment/checklist/)
[2]: [https://www.postgresql.org/message-
id/D960CB61B694CF459DCFB4...](https://www.postgresql.org/message-
id/D960CB61B694CF459DCFB4B0128514C202FF6542@exadv11.host.magwien.gv.at)

~~~
chousuke
Of course you don't need to think of _every_ possible way to compromise a
system; just enough to reasonably protect against your threat model given the
resources you have. If you have a complicated system that requires "hundreds
of settings", then you just have to put more effort in making sure you don't
miss anything.

I'm pretty sure most security breaches are caused by _really_ basic
configuration mistakes (or process failures). following a checklist can
definitely be effective, but if you don't actually understand why you're
configuring things as you are, you're likely to make mistakes elsewhere.

------
hliyan
The very first sentence: "By default after _instalation_ and _creting_
database cluster PostgreSQL will _listner_ only on localhost."

Bit worried about the quality of the content, based on the quality of the
proofreading.

~~~
jmcqk6
>Bit worried about the quality of the content, based on the quality of the
proofreading.

I don't know why this feeling persists. The ability to proofread or write free
of spellings is a skill completely unrelated to the underlying content.
Complaining about the spelling just seems childish at this point. We live in a
world full of mistakes and problems. Dealing with life despite of them is a
fundamental life skill.

~~~
ramraj07
There's definitely a middle ground here. I have been in academic contexts
where every sentence has to be absolute perfection before we submit for
publication. Thats overkill, for sure, but there is merit to proofreading as a
sign of legitimacy of the argument itself. Some dude literally wrote this text
and didn't even re-read the post before publishing. Clearly, he's either the
greatest expert at technological things that it's second nature to him even
more than LANGUAGE, or he just did a shoddy job on both sides. Unless I see
the name to be Linus Torvalds or something, I'm just going to assume this post
is low quality in general.

------
autotune
Can anyone recommend an “advanced” PG guide? I’m looking to learn all the ins
and outs of Postgres.

~~~
throwaway8941
[http://www.interdb.jp/pg/](http://www.interdb.jp/pg/)

[https://www.postgresql.org/docs/current/internals.html](https://www.postgresql.org/docs/current/internals.html)

------
wilsonrocks
I get that this is for learning purposes, but doesn't this tutorial start by
making your database insecure?

~~~
orev
If you need remote access to the DB, then you need remote access. Most people
in a normal setup would not have the database server ports directly exposed to
the Internet, but if you have the perspective that all servers are in the
cloud, then you would also be expected to know how to keep them secure.

But making it listen on an external IP does not inherently make it insecure,
as long as you also have other firewall and user account controls.

~~~
MaxBarraclough
I don't know much about Postgres specifically, but what I'm seeing in the
tutorial does seem insecure.

> If you need remote access to the DB, then you need remote access

That's not a defence for teaching insecure practices. It's possible to
configure Postgres for secure remote access.

Using md5 for password security seems like a red flag, and sure enough, the
Postgres docs remind us that md5 should no longer be considered secure. [0] I
don't think there's any verification of the server's fingerprint, either.

There's really no excuse here, as there's a quick and easy way to do it: SSH
tunnels. [1]

> Most people in a normal setup would not have the database server ports
> directly exposed to the Internet

Not a safe assumption. Instances on Linode and, iirc, Digital Ocean, are not
behind a cloud firewall, all ports are wide open to the Internet. (I think
that's a bad move on the part of Linode and Digital Ocean, but that's not the
point.)

> if you have the perspective that all servers are in the cloud, then you
> would also be expected to know how to keep them secure

If you already knew how to securely configure Postgres, you wouldn't need the
tutorial.

> making it listen on an external IP does not inherently make it insecure, as
> long as you also have other firewall and user account controls

You still need to configure the crypto properly.

[0] [https://www.postgresql.org/docs/current/auth-
password.html](https://www.postgresql.org/docs/current/auth-password.html)

[1] [https://www.postgresql.org/docs/current/ssh-
tunnels.html](https://www.postgresql.org/docs/current/ssh-tunnels.html)

------
avthar
This is a good beginners' resource.

For those looking for similar Postgres related resources, I've found this
handy Postgres Cheat Sheet [1] to be really useful.

[1] [https://postgrescheatsheet.com](https://postgrescheatsheet.com)

------
devwastaken
Is there a good way to embed postgres in applications? I've preferred sqlite,
but would like to just use postgres everywhere, but requiring users to install
it as a service and maintain that globally just makes it unreliable for
embedding.

~~~
tuatoru
No.

------
smashah
Related side note/question. I was on a project recently where postgres was
used with a NodeJS JavaScript server app. It seems a bit backwards. Like you
have the flexibility of a loosely typed server app but all the annoyance of a
RDBMS. It just made development a nightmare. The 'strictness' if the system
was just backwards.

Surely NoSQL + typescript is better for development (and potentially for
performance).

Thoughts?

~~~
yokaze
I see it the other way around. Code is fluid, data persists. So I would expect
more rigor in the design of the schema then anywhere in the code. And
hopefully with some fore-sight, you won't have to change it every other day.
Migrating the data requires also some reflection.

No amount of work of coding will fix your data, if it is a mountain of
inconsistent garbage.

~~~
smashah
I guess with experience comes the ability to correctly design the schema from
day one. The project was just getting off the ground so schema was evolving
constantly. If only there was such a thing as pgconfig.json which allows me to
configure how strict I want the db to be - a sliding scale between NoSQL to
SQL.

~~~
yokaze
No one gets the schema right from the beginning. Waterfall is out of fashion
for a reason. And there are schema migration frameworks out there for that
purpose.

If you care for the data you store, you'll have to take care of your data.
ACID and the "rigidness" of schemata are tools for that, and to keep the
complexity in check. It is much easier than accruing technical debt by having
unstructured data and having later to figure out to make heads and tails of
what you and your colleagues did at some arbitrary earlier time. (Not that
crappy SQL designs don't have that problem).

If you don't care for your data, as you are in the beginning, why don't you
create the DB then from scratch? You can use various editors to create the
schemata from tools you are more comfortable with.

Don't work against the tool, and find the slider in your head to adjust your
way of working.

~~~
lowwave
I strongly disagree with this. Getting the schema right, reduces code writing
and code maintenance.

