General AI on non-objective, broadly covered topics like this one ("best" is undefined here; best for what use case and priorities?) is mostly just a regression to the mean, with bias bleeding in from other knowledge graphs (e.g., applying grammar/tense conventions from linguistics where endianness from compsci is what matters). As we traverse further into the depths of dead internet theory, and more AI slop pollutes the internet (and, in tandem, poorly curated synthetic datasets), there is some inevitable Ouroboros-style reinforcement here too.
So a simple filter in the sense of "omit anything too similar to X" would just omit the mean result within your given deviation. It's effectively asking, "What are some truly insane ways to use PostgreSQL?", which is an interesting thought experiment, though if it actually produces useful results then you've basically just written a unit test (domain test?) for when AI slop evolves into full-on AI jumping the shark.
If you're doing it based on cross-linking (source-citing), you're basically doing PageRank for AI.
If you time gate familiarity to posts only up to the NLP/General AI explosion in 2022 or so, well that might still be useful today, but for how long?
If you were to write a "smart" filter, you're basically just writing the "PostgreSQL Best Practices" article yourself, but writing it for machines instead of humans. And I don't know what to make of that, but frankly I was led to believe that if nothing else, the robopocalypse would be more interesting than this.
Regarding streaming replication, these days you might want to also consider a “disaggregated storage-compute” setup, like AWS Aurora, or Neon (which is open-source I believe).
Re partial indexes - those are great, though you have to be careful to keep them updated as your queries evolve. E.g. if you have a partial index on a set of enum values, you should make sure to check and potentially update it whenever you add a new enum variant.
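To make that concrete, a rough sketch of the failure mode (the table, enum, and index names here are made up):

-- hypothetical schema
create type order_status as enum ('pending', 'shipped', 'cancelled');

create table orders (
  id bigint generated always as identity primary key,
  status order_status not null
);

-- partial index covering only the "open" statuses today's queries filter on
create index orders_open_idx on orders (id)
  where status in ('pending', 'shipped');

-- later someone runs: alter type order_status add value 'on_hold';
-- queries that treat 'on_hold' as open can no longer use orders_open_idx,
-- so the predicate needs revisiting (drop and recreate the index).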
The access control portion is not clear to me: where is `current_user_organization_id()` reading from? We tried using `set_config` (https://www.postgresql.org/docs/current/functions-admin.html) and it was incredibly slow; it was not designed for how we were using it, which involved connection pooling where each query ran set_config with a different "current_user_id".
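For what it's worth, the usual pattern I've seen (and my guess at what the article intends; the function name comes from the article, but the 'app.current_org_id' GUC key is an assumption) is a thin wrapper over current_setting, with the value set per transaction by the app:

-- guess at the helper; 'app.current_org_id' is an assumed custom GUC name
create function current_user_organization_id() returns bigint
language sql stable as $$
  select nullif(current_setting('app.current_org_id', true), '')::bigint
$$;

-- app sets it per transaction; the third argument true makes it local to
-- the transaction, which plays better with transaction-mode pooling
begin;
select set_config('app.current_org_id', '42', true);
-- ... queries / RLS policies can now call current_user_organization_id() ...
commit;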
I'm amazed that this has reached the almost-top of HN.
It's a very confused article that (IMO) is AI Slop.
1. Naming conventions are a weird one to start with, but okay. For the most part you don't need to worry about this for PKs, FKs, indices, etc.; PG will automatically generate those names for you in a consistent format.
2. Performance optimization: yes, indices are great. But don't name them manually; let PG name them for you. Also, where possible, _always_ create the index concurrently. This does not take a lock that blocks writes. Important if you have any kind of scale.
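E.g. (table and column are placeholders):

-- builds the index without blocking concurrent writes,
-- but note it cannot run inside a transaction block
create index concurrently on orders (customer_id);
-- with no name given, PG generates one like orders_customer_id_idx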
3. Security is a bit of a weird jump, as you've gone from an app-tier concern to PG management concerns. But I would say RLS isn't worth it. The best security is going to come from tightly controlling what can read and write. Point your reads at a read-only replica.
If you have multiple unique constraints in the same table then it will spit back the same error code. I want to know the name of the specific constraint that was violated.
Always name your indexes. Always name your constraints.
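A small sketch of why the names pay off (everything here is made up): both violations below raise the same SQLSTATE 23505, but the error's constraint field tells you which rule was broken, so application code can branch on it.

create table accounts (
  id bigint generated always as identity primary key,
  email text not null constraint accounts_email_key unique,
  username text not null constraint accounts_username_key unique
);
-- a duplicate email reports constraint "accounts_email_key",
-- a duplicate username reports "accounts_username_key"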
I mean if you have that many violations that points to other problems, but sure if you are in that situation I can see how catching by name might be helpful. It's not a panacea.
For what it's worth, I like to call out views in the naming.
I'll typically use a "vw_" prefix before a name that otherwise follows table naming. "v_" is also an option, but I write a lot of DB functions and stored procedures, and with all the types, tables, and column names in play, marking variables and parameters can be helpful... so "v_" indicates a variable name in the function context, which is why views get "vw_".
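Something like this, roughly (a toy example; the "p_" prefix for parameters is just my own habit):

create table user_posts (
  user_id bigint not null,
  deleted_at timestamptz
);

-- view gets the "vw_" prefix, otherwise named like a table
create view vw_active_user_posts as
  select * from user_posts where deleted_at is null;

-- inside functions, "v_" marks variables and "p_" marks parameters
create function get_post_count(p_user_id bigint) returns bigint
language plpgsql stable as $$
declare
  v_count bigint;
begin
  select count(*) into v_count
  from vw_active_user_posts
  where user_id = p_user_id;
  return v_count;
end;
$$;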
Mid-range one-liners don't really make convincing arguments, though, do they? I appreciate this comment may have been made more in service of a certain smugness and self-satisfaction than any real or concrete complaint... but you do you.
On the flip side... perhaps using techniques from the 80s with a programming language and paradigm also from the 80s... and that hasn't changed all that much... isn't all that crazy.
At the end of the day, a technique like the one I described should be measured against whether it makes the development experience simpler and more comprehensible. Being able to tell instantly whether a given reference in a procedural query comes from a table, a view, a variable, or a parameter is something I find helpful, since we're typically using names from all of these different origins side by side, such as in a complex query driving the procedural code.
Of course, if you don't like that, or simply don't find it sufficiently "modern" or "fashionable" (my God, what might others think!?)... again, I invite you to do you.
> But definitely not within database functions and procedures.
I generally find Hungarian notation silly; however, there's a compelling argument for it in functions and procedures, perhaps more so than for views. The type of a variable matters just as much as where and how it was declared, such as with in/out params.
That's the 'steelman' argument for them, as I understand it.
Ah, naming things. Something about one of the hard problems in computer science...
I think consistency is more important than perfection. If you use something like vw_ as recommended in the sibling comment, that's fine, then try to apply it to all views. (Without being overly strict; sometimes there are good reasons to defy conventions. They're conventions, not laws!)
Just keep in mind that a view is both code and a data contract, just like a table definition is. All the usual best practices around versioning, automated deployments, and smooth upgrade paths apply. As soon as an application or a downstream view, function, etc. relies on that view, changing it risks disruption or breakage. Loud breakage if you're lucky, silent data corruption if you're not!
It's pretty typical to signal word boundaries in some way.
Thinking about the most general norms across database vendors, consider a table name for User Posts. You could write that as UserPosts, but in a lot of databases you're going to end up with userposts, since SQL identifiers are typically case insensitive; you can preserve case by quoting it as "UserPosts", but now you have extra characters just to get the case sensitivity. Finally, you can use snake case: user_posts... true, you still end up with extra characters, but ultimately I think a little less shifting. Snake case also keeps you in line with a lot of historical precedent in naming things for the database.
Of course, but using one delimiter/convention for more than one thing makes it hard or impossible to disambiguate them. With foo_bar_baz, you can't easily tell what's what. `"FooBar_Baz"` or maybe `foo_bar__baz` is clear.
One reason to stick to a restricted naming scheme for tables is that table names cannot be parametrized. This means you need to implement escaping manually in tools/scripts where table names are not hardcoded. Snake casing creates word boundaries that do not rely on casing and thus do not need to be quoted; therefore your manual escaping/sanitization is greatly simplified (lowercase ASCII characters, numbers, and underscores). I’ve found this reduces the number of footguns by one in databases maintained by a team with varying degrees of familiarity with RDBMSs.
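If you do end up interpolating identifiers, Postgres has helpers that handle the quoting for you; a quick sketch:

select quote_ident('user_posts');   -- user_posts   (no quoting needed)
select quote_ident('UserPosts');    -- "UserPosts"  (quoted to preserve case)

-- for dynamic SQL, %I applies the same identifier escaping
select format('select count(*) from %I', 'UserPosts');
-- => select count(*) from "UserPosts"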
Postgres folds unquoted identifiers to lowercase when storing and referring to your tables (which is why you can reference them case-insensitively), so naming conventions that rely on case don't really work.
Some ORMs will use a "camelCaseNameInQuotes", in which case PostgreSQL will preserve the case (as will other databases), and manual querying will require quotes everywhere.
Alternatively, you can use camelCase without quotes in your SQL, and ignore the fact that Postgres will lowercase it.
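A quick illustration of the folding being described (toy names):

create table "UserPosts" (id bigint);    -- quoted: case preserved
create table UserComments (id bigint);   -- unquoted: stored as usercomments

select * from userposts;        -- error: relation "userposts" does not exist
select * from "UserPosts";      -- works, but quotes are needed from now on
select * from UserComments;     -- works: unquoted names fold to usercomments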
You will find major performance improvements by using `include` with your indexes. For instance, on something trivial like counting votes on a blog post:
create table post_vote (
  post_id bigint not null references posts (id),
  user_id bigint not null references users (id),
  amount smallint not null,
  primary key (post_id, user_id) include (amount)
);

select sum(amount) from post_vote where post_id = :'post_id';
Upvotes are 1, downvotes are -1. This will be a nearly instantaneous index-only scan summing the amount values.
The downside is that updates to amount are more costly and the index requires more storage. In your example, though, it is probably a good optimization, since how often do you change your up/down votes?
I just wanted to point out that you really have to think about these types of optimizations because they might make one query faster but slow everything else down to the point where you are spending more time overall and using more storage.
I believe they're confusing a single index with more columns vs adding additional indexes. Additional indexes would be slower to update.
I suppose additional columns could be slower to update too (more data structure traversal), but I'm guessing not by nearly as much as an additional index.