
Postgres data types you should consider using - okket
https://www.citusdata.com/blog/2018/08/29/datatypes-you-should-consider-using/
======
combatentropy
Another cool thing about Postgres is each table and view you make becomes a
type.

Suppose you have a table of fruits:

    
    
      create table fruits (
          name text primary key,
          color text references colors,
          season text references seasons
      );
    

(There are other tables at play, colors and seasons). Anyway, now if you make
a function to find you some fruits in season:

    
    
      create function get_fruits_in_season(season text)
      returns fruits as
      'select * from fruits where season = $1'
      language sql;
    

Do you see that I just made a function whose return type is "fruits"? In this
example, it's a bit overkill, because all you would really probably want is
the name, and so would return "text." But it has been useful for me in real-
life examples that are too complicated right now for me to spell out.

Documentation: [https://www.postgresql.org/docs/current/static/sql-
createfun...](https://www.postgresql.org/docs/current/static/sql-
createfunction.html)

~~~
michelpp
Great tip, minor point, did you mean 'returns setof fruits'? Otherwise your
function will only return one row. Opinions vary but i think its another good
argument for singular named tables as well: 'returns fruit' and 'returns setof
fruit' reads better to my eyes.

~~~
CuriouslyC
Just to emphasize the parent's point about singular vs plural, there are a lot
of reasons to go with singular:

* Singular is much friendlier to find/replace, due to the inconsistency of pluralization.

* A good practice is to name association tables by concatenation of the tables being associated (when the association doesn't correspond to a previously named entity). teacher_class is a much better name than teachers_classes, and definitely avoid teacher_classes.

* As you move up the stack, structs, interfaces and classes have strong singular conventions. If you spend much time at higher levels, the habits you develop can lead to little bugs when working deeper in the stack.

------
Sharlin
A nifty thing we use in our current project is the ltree extension[1]. It
allows representing hierarchical relations and making hierarchical queries in
a regular flat table structure.

[1]
[https://www.postgresql.org/docs/10/static/ltree.html](https://www.postgresql.org/docs/10/static/ltree.html)

~~~
chrstphrhrt
Love LTREE! Used it for domain label hierarchies (multiple roots - an
"ontology") for a data science team to be able to train many different models
with shared/overlapping label sets (an ensemble of models for various
domains). It makes queries to get training data powerful since you can ask for
"all descendants of X" or "ancestors of Y where Z".

------
ainar-g
The reason I don't use SQL enums is that I often can't tell exactly, how many
and what values there will be when designing the database schema. Over a
period of a few months an "enum" can grow from three to a dozen values, and
writing an SQL migration each time is kind of a PITA, to say nothing about
merge conflicts. In comparison, writing a validation on the application side
is easy.

Am I missing something?

~~~
combatentropy
I have long wanted to use an enum but never have, despite using Postgres for
more than a decade.

    
    
       > In comparison, writing a validation on the application side is easy
    

But then you went there. I was going to say, instead of enums I always end up
just make a table of the choices, because invariably I will want to store one
or more columns of information about the choice. Anyway, a foreign key, as
opposed to an enum.

Writing validation on the application side is easy up front, but over the
years I have been moving more and more smarts to the database:

\- it's often less typing in SQL than your procedural language of choice

\- some constraints are easy in a database but 1,000 times harder in a
procedural middle layer (uniqueness, ACID transactions, etc.)

\- it's more efficient, because it saves the transit of data that your
application-layer validation often needs from the database to do its own thing
anyway

~~~
setr
I think you forgot one of the more important points against application-side
validation: any future applications would have to also implement the
validation or your data potentially corrupts. Applications that also include
possible meatbags hand-modifying values for some "special" scenario (CEO
order, emergency situation, etc)

The DB is a gigantic bag of data: if its got anything useful in it, then
_everyone_ will eventually want to put their grubby little hands all over it

~~~
derefr
In my experience, the key tenet for good database architecture is to assume
that _someone else_ is going to write a program, without your input, that's
going to shove data into your DB's tables, without the involvement of your
application layer. If the data that ends up in the database is wrong in such
cases, then _your application_ will be left holding the bag when it sees that
data.

Much better to just tell your DBMS the rules you want enforced, such that when
that ne'er-do-well (which might be, say, future you) attempts to insert crazy
data, it just doesn't work.

------
Yahivin
No mention of citext :\

[https://www.postgresql.org/docs/9.1/static/citext.html](https://www.postgresql.org/docs/9.1/static/citext.html)

------
glancast
For those (like me) who are wary of enums, it's very simple to create a tiny
table with one column of distinct string values. Future-proof and serves the
same function if you add not-null foreign key constraints to your dependent
tables.

~~~
jdfellow
I may be mistaken, but I believe under the hood Postgres enums work in a very
similar way to this, only the strings are interred instead of duplicated
across your tables.

~~~
glancast
Yes, that's true. I'm paying the small duplication cost in exchange for
removing enums' cumbersome limitations. Not saying I _never_ use them, but I
usually end up regretting it when I do :)

------
meddlepal
Personally, I'm on the fence with enum types... I've just seen too many times
when someone thought an enumeration was final and then later it had to be
modified or we needed alternative ways to reference it other than a plain
string conversion.

~~~
fake-name
Uh..... `ALTER TYPE enum ADD VALUE 'new_value';`?

The only potentially problematic thing is the fact that it's not
transactional.

------
michelpp
HyperLogLog, also by citusdata (no affiliation), is a great data type for
statistically approximating counts of distinct values very quickly within an
adjustable range of error:

[https://github.com/citusdata/postgresql-
hll](https://github.com/citusdata/postgresql-hll)

This has been a great secret weapon for me in a few big counting problems.

------
protomyth
Ranges are interesting. I had an app once that dealt with biodiesel. Each
gallon of biodiesel needs a serial number. I cannot remember all the rules,
but you assign the numbers sequentially at the origin. So, a 10,000 gallon
truck would have XXXXX00001 - XXXXX10000. When you offloaded into a bigger
container it would have those numbers plus any ranges already in the tank.
Loading would entail picking an amount from the ranges, reducing the range in
the tank, and assigning it to the truck. We had routines for all this
including consolidating ranges (two trucks dumping into a tank with sequential
ranges would create a big range) or subtracting from ranges. A truck could end
up with a collection of ranges. It was all arbitrary in a way and weird. I
really hope they do it differently now.

I wonder how hard it would be to write that in PostgreSQL these days using the
range types?

