[flagged] PostgreSQL Best Practices (speakdatascience.com)
71 points by lemonwaterlime 1 day ago | 45 comments





Shallow AI-ass article. I'm betting I'd get the exact same thing if I asked an LLM to write me an article about Postgres best practices.

Wonder if there is a way to use that kind of technique as a filter.

General AI on non-objective ("best" is undefined here, and for what use case/priorities?), broadly covered topics like this one is mostly just a regression to the mean, with bias bleeding in from other knowledge graphs (e.g., trying to use correct grammar/tense (linguistics) in place of endianness (compsci)). As we traverse further into the depths of dead internet theory, and more AI slop pollutes the internet (and, in turn/tandem, poorly curated synthetic datasets), there is some inevitable Ouroboros-style reinforcement here too.

So a simple filter in the sense of "omit anything too similar to X" would just omit the mean result within your given deviation. It's effectively asking, "What are some truly insane ways to use PostgreSQL?", which is an interesting thought experiment, though if it actually produces useful results then you've basically just written a unit test (domain test?) for when AI slop evolves into full-on AI jumping the shark.

If you're doing it based on cross-linking (source-citing), you're basically doing PageRank for AI.

If you time gate familiarity to posts only up to the NLP/General AI explosion in 2022 or so, well that might still be useful today, but for how long?

If you were to write a "Smart" filter, you're basically just writing the "PostgreSQL Best Practices" article yourself, but writing it for machines instead of humans. And I don't know what to make of that, but frankly I was led to believe that if nothing else, the robopocalypse would be more interesting than this.


> Here are recommended naming practices:

Recommended by whom, based on what?

Everything in this article seems to be blogspam, with zero links, sources, or rationales for the decisions provided.


Regarding streaming replication, these days you might want to also consider a “disaggregated storage-compute” setup, like AWS Aurora, or Neon (which is open-source I believe).

Re partial indexes - those are great, though you have to be careful to keep them updated as your queries evolve. E.g. if you have a partial index on a set of enum values, you should make sure to check and potentially update it whenever you add a new enum variant.
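
A minimal sketch of that failure mode (the table, enum, and index names here are hypothetical):

    create type order_status as enum ('pending', 'shipped', 'delivered', 'cancelled');

    create table orders (
        id bigint generated always as identity primary key,
        status order_status not null,
        created_at timestamptz not null default now()
    );

    -- Partial index covering only "in flight" rows.
    create index orders_open_idx on orders (created_at)
        where status in ('pending', 'shipped');

    alter type order_status add value 'backordered';
    -- Queries filtering on ('pending', 'shipped', 'backordered') no longer
    -- match the index predicate, so orders_open_idx is silently skipped
    -- until its WHERE clause is updated to include the new variant.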


The access control portion is not clear to me: `current_user_organization_id()`, where is this reading from? We tried using `set_config` (https://www.postgresql.org/docs/current/functions-admin.html) and it was incredibly slow; it was not designed for how we were using it, which involved connection pooling where each query ran `set_config` with a different "current_user_id".

After asking myself the same (I guess), it seems like setting a “session” variable beforehand is the key: https://www.postgresql.org/docs/current/sql-set.html
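
If it helps, here's a sketch of how those pieces usually fit together; the setting name, table, and policy are assumptions for illustration, not from the article:

    -- The helper reads a custom per-session/per-transaction setting:
    create function current_user_organization_id() returns bigint as $$
        select current_setting('app.current_organization_id')::bigint;
    $$ language sql stable;

    -- The application sets the value once per transaction; is_local = true
    -- scopes it to the transaction, which matters behind a connection pooler:
    select set_config('app.current_organization_id', '42', true);

    -- An RLS policy can then filter on it:
    alter table documents enable row level security;
    create policy org_isolation on documents
        using (organization_id = current_user_organization_id());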

Plural table names are silly, I will die on this hill.

hills

I'm amazed that this has reached the almost-top of HN.

It's a very confused article that (IMO) is AI Slop.

1. Naming conventions are a weird one to start with, but okay. For the most part you don't need to worry about this for PKs, FKs, indices, etc.; PG will automatically generate those names for you with a consistent syntax.

2. Performance optimization: yes, indices are great, but don't name them manually; let PG name them for you. Also, where possible, _always_ create the index concurrently (see the sketch at the end of this comment). This does not lock the table, which is important if you have any kind of scale.

3. Security is a bit of a weird jump, as you've gone from an app-tier concern to PG management concerns. But I would say RLS isn't worth it, and the best security is going to be tightly controlling what can read and write. Point your reads to a read-only replica.

4. pg_dump was never meant to be a backup method; see https://www.postgresql.org/message-id/flat/70b48475-7706-426...

5. You don't need to schedule VACUUM or ANALYZE. PG has the AUTO version of both.

6. Jesus, never add a DEFAULT value on a large table in a way that causes a table rewrite; that will cause you downtime. And don't use varchar(n): https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use...

This is an awful article.
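
(Re point 2, a minimal sketch of a concurrent build on a hypothetical table; note that CONCURRENTLY cannot run inside a transaction block:)

    -- Builds without blocking writes; PG picks the name
    -- (here it would be orders_customer_id_idx).
    create index concurrently on orders (customer_id);
    -- If the build fails it leaves an INVALID index behind,
    -- which has to be dropped before retrying.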


> 4. pg_dump was never meant to be a backup method; see https://www.postgresql.org/message-id/flat/70b48475-7706-426...

Those are very recent emails that don't say anything about what it was originally meant for, just that nowadays there are better options.

Though really, the part they quote calls it a backup tool, and that does imply it was originally meant to be a backup method.


> I'm amazed that this has reached the almost-top of HN.

The article is objectively bad; the subject is clearly something a lot of people think about, myself included.


> don't name them manually let PG name them for you

If you care about catching stuff like unique constraint violations in your code then you should know what the indexes are named.

Always name your indexes and constraints. They're part of your code. How would you catch an exception that you don't know the name of?
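
A quick PL/pgSQL sketch of catching by constraint name (the table and constraint names are made up for illustration):

    do $$
    declare
        v_constraint text;
    begin
        insert into users (email) values ('dup@example.com');
    exception
        when unique_violation then
            -- 23505 covers every unique constraint on the table;
            -- the constraint name tells you which one fired.
            get stacked diagnostics v_constraint = constraint_name;
            if v_constraint = 'users_email_key' then
                raise notice 'duplicate email';
            else
                raise;
            end if;
    end $$;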


You don't need to know the name; I prefer to go by the error code PG spits back at you. In this case 23505.

https://www.postgresql.org/docs/current/errcodes-appendix.ht...

That is much more flexible than hard coding the name.


If you have multiple unique constraints in the same table then it will spit back the same error code. I want to know the name of the specific constraint that was violated.

Always name your indexes. Always name your constraints.


I mean, if you have that many violations, that points to other problems, but sure, if you are in that situation I can see how catching by name might be helpful. It's not a panacea.

It's not that hard to just name the things you care about instead of relying on auto-generated stuff from the database. PG lets you do this easily.

> if you have that many violations that points to other problems

No, it doesn't. This is a sophisticated design choice. You might be out of your depth here.


Okey dokey mate, you do you. Have a great one.

This is a great book that explains PG in a step-by-step way. I learned a ton from it

https://theartofpostgresql.com/


I'm aware of it, thanks.

Also useful for migrations, especially if you support multiple DB backends.

I’m struggling with naming my views. Any recommendations? Same naming as tables?

For what it's worth, I like to call out views in the naming.

I'll typically use a "vw_" prefix ahead of a name that otherwise follows my table naming. "v_" is also an option, but I also write a lot of DB functions and stored procedures, and with all the types, tables, and column names that get used, indicating variables and parameters can be helpful... so "v_" indicates a variable name in the function context, which is why views get "vw_".
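
A small sketch of that convention in practice (all names hypothetical):

    create view vw_active_users as
        select id, email from users where deleted_at is null;

    -- "p_" marks parameters, "v_" local variables, "vw_" views:
    create function get_user_email(p_user_id bigint) returns text as $$
    declare
        v_email text;
    begin
        select email into v_email from vw_active_users where id = p_user_id;
        return v_email;
    end;
    $$ language plpgsql stable;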


I do this but at the end: some_table_vw

1980 called, they want their Hungarian Notation back.

Mid-range one-liners don't really make convincing arguments, though, do they? I appreciate this comment may have been made more in service of a certain smugness and self-satisfaction than of any real or concrete complaint... but you do you.

On the flip side... perhaps using techniques from the 80s with a programming language and paradigm also from the 80s... one that hasn't changed all that much... isn't all that crazy.

At the end of the day, the evaluation of a technique like the one I described should be measured by whether it makes the development experience simpler and more comprehensible. Being able to instantly tell whether a given reference in a procedural query comes from a table, a view, a variable, or a parameter is something I find helpful, since we're typically using names from these different origins and contexts side by side, such as in a complex query driving the procedural code.

Of course, if you don't like that, or simply don't find it sufficiently "modern" or "fashionable" (my God, what might others think!?)... again, I invite you to do you.


That's an easy swipe to make. What would you suggest instead?

A prefix/suffix for views is maaybe acceptable. But definitely not within database functions and procedures.

> But definitely not within database functions and procedures.

I generally find Hungarian notation silly, however, there's a compelling argument to have them in functions and procedures — perhaps more than for views. The type of the variable matters just as much as where/how it was declared, such as in/out params.

That's the 'steelman' argument for them, as I understand it.


Ah, naming things. Something about one of the hard problems in computer science...

I think consistency is more important than perfection. If you use something like vw_ as recommended in the sibling comment, that's fine, then try to apply it to all views. (Without being overly strict; sometimes there are good reasons to defy conventions. They're conventions, not laws!)

Just keep in mind that a view is both code and a data contract, just like a table definition is. All the usual best practices around versioning, automated deployments, and smooth upgrade paths apply. As soon as an application or a downstream view, function, etc. relies on that view, changing it risks disruption or breakage. Loud breakage if you're lucky, silent data corruption if you're not!


Why is snake_case recommended? What advantage(s) does it provide?

It's pretty typical to signal word boundaries in some way.

Thinking about the most general norms across database vendors, consider a table name for User Posts: you could write it as UserPosts, but in a lot of databases you're going to end up with userposts, since SQL identifiers are typically case insensitive; you can preserve case by quoting it ("UserPosts"), but now you have extra characters just to get case sensitivity. Finally, you can use snake case: user_posts... true, you still end up with extra characters, but ultimately I think a little less shifting. Snake case also keeps you in line with much historical precedent in naming things for the database.
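
To make the folding behavior concrete (illustrative only):

    create table UserPosts (id int);    -- stored as userposts
    select * from USERPOSTS;            -- works: unquoted names fold to lowercase

    create table "UserPosts2" (id int); -- quoted: case is preserved
    select * from "UserPosts2";         -- must be quoted from now on
    select * from UserPosts2;           -- ERROR: relation "userposts2" does not exist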


Of course, but using one delimiter/convention for more than one thing makes it hard or impossible to disambiguate them. With foo_bar_baz, you can't easily tell what's what; `"FooBar_Baz"` or maybe `foo_bar__baz` is clear.

One reason to stick to a restricted naming scheme for tables is that table names cannot be parametrized. This means you need to implement escaping manually in tools/scripts where table names are not hardcoded. Snake casing creates word boundaries that do not rely on casing and thus do not need to be quoted; therefore your manual escaping/sanitization is greatly simplified (lowercase ASCII characters, numbers, and underscores). I’ve found this reduces the number of footguns by one in databases maintained by a team with varying degrees of familiarity with RDBMSs.
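
If you do have to splice identifiers in dynamically, Postgres ships helpers for the escaping (example values made up):

    select quote_ident('user_posts');  -- user_posts   (safe as-is)
    select quote_ident('UserPosts');   -- "UserPosts"  (quoted to preserve case)
    select format('select count(*) from %I', 'user_posts');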

Postgres folds unquoted identifiers to lowercase and stores table names that way (even though you can reference them case-insensitively), so naming conventions that involve case don't really work.

You just need to quote your identifiers with double quotes, e.g.:

    SELECT * FROM "MyTable"
You can even use reserved SQL keywords in table/column names as long as they are double-quoted.

With quotes you can even use spaces in table and column names.

Some ORMs will use a "camelCaseNameInQuotes", in which case PostgreSQL will preserve the case (as will other databases), and manual querying will require quotes everywhere.

Alternatively, you can use camelCase without quotes in your SQL, and ignore the fact that Postgres will lowercase it.


You will find major performance improvements by using `include` with your indexes. For instance, on something trivial like counting votes on a blog post:

    create table post_vote (
        post_id bigint not null references posts (id),
        user_id bigint not null references users (id),
        amount smallint not null,

        primary key (post_id, user_id) include (amount)
    );

    select sum(amount) from post_vote where post_id = :'post_id';
Upvotes are 1, downvotes are -1. This will be a nearly instantaneous index-only scan summing the amount values.

The downside is that updates to amount are more costly, and more storage is required for the index. In your example, though, it is probably a good optimization, since how often do you change your up/down votes?

I just wanted to point out that you really have to think about these types of optimizations, because they might make one query faster but slow everything else down to the point where you are spending more time overall and using more storage.


> Downside being that updates to amount are more costly

PG always does a delete + create for updates. It will have no additional overhead at all.


You are updating both the data and the index when updating amount, instead of just the data. That is twice the number of IOs.

It's the same number of IOs. PG does it regardless, for both the data and the index.

I believe they're confusing a single index with more columns vs adding additional indexes. Additional indexes would be slower to update.

I suppose additional columns could be slower to update too (more data structure traversal), but I'm guessing not by nearly as much as an additional index.


TIL. Thanks!


