If I may hijack the thread with some more general complaints though, I wish the Postgres team would someday prioritize migration. Like make it easier to make all kinds of DB changes on a live DB, make it easier to upgrade between postgres versions with zero (or low) downtime, etc etc.
Warnings when the migration you're about to do is likely to take ages because for some reason it's going to lock the entire table, instant column aliases to make renames easier, instant column aliases with runtime typecasts to make type migrations easier, etc etc etc. All this stuff is currently extremely painful for, afaict, no good reason (other than "nobody coded it", which is of course a great reason in OSS land).
I feel like there's a certain level of Stockholm syndrome, in the sense that to PG experts these things aren't that painful anymore because they know all the pitfalls and gotchas, and it's part of why they're such valued engineers.
With MongoDB, you're guaranteed single-prior-version replication compatibility within a cluster. This means you spin up an instance with the updated version of MongoDB and it catches up to the cluster. Zero downtime, seamless transition. There may be fewer than a handful of cancelled queries that are retryable, but no loss of writes thanks to retryable writes and write concern preferences. E.g. MongoDB 3.6 can be upgraded to MongoDB 4.0 without downtime.
Edit: Possibly misinformed but the last deep dive we did indicated there was not a way to use logical replication for seamless upgrades. Will have to research.
All in all, my impression of MongoDB now is that they're one of those "fake it till you make it" success stories, where they leveraged their popularity into enough momentum to fix most of their major problems.
For any component that's viewed as a database (as opposed to, say, cache), data integrity is one of the most important metrics (if not THE most).
In contrast, PostgreSQL data loss bugs are rare - and are treated extremely seriously. Defaults are sane and won't lose data. It's one of the few databases I'm pretty confident that data will be there even if you yank a server power cord mid writes.
Has MongoDB improved? Yes, leaps and bounds (it seems to still fail Jepsen tests, though). But I can't help but feel that it should have been released as a beta product, instead of claiming it was production ready. It wasn't. Maybe it is now. I'd still evaluate other alternatives before considering it.
That said, one thing that always amuses me is how MongoDB gets mentioned in the same context as PostgreSQL. If PostgreSQL would meet your needs, it's unlikely that MongoDB would. And vice versa (but maybe something else like Cassandra would).
You can add deep and customized indices as desired, you can easily shard with Citus, and if you want to live without transactions you'll see equally good if not better performance - with the option to add ACID whenever you want. The developer experience argument, where the ->> operator was more confusing than brackets, is now moot.
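For anyone who hasn't seen the newer syntax, here's a rough sketch of the difference, using a hypothetical events table (the subscript form needs Postgres 14):

    -- Hypothetical table with a jsonb column
    CREATE TABLE events (id bigserial PRIMARY KEY, payload jsonb);

    -- Pre-14 accessors: -> returns jsonb, ->> returns text
    SELECT payload -> 'user' ->> 'name' FROM events;

    -- Postgres 14 subscript syntax (returns jsonb), and it works in UPDATE too
    SELECT payload['user']['name'] FROM events;
    UPDATE events SET payload['status'] = '"done"' WHERE id = 1;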
As a former MongoDB user, there were good synergies between MongoDB and Meteor back in the day, and I loved that tech, but between Materialize and Supabase, you have vastly more options for realtime systems in the Postgres ecosystem.
And the Jepsen tests are part of the core test suite, so do you have some evidence they are still failing?
It’s so ridiculous and pointless to be rehashing the same issues a decade later.
Actually more a testament to the company that it’s still hugely successful and depended on by some very large applications.
Which decided whether or not to log connection warnings based on Math.random()
b) Based on the JIRA, it was designed to only log 10% of subsequent failures where there is no connection, to prevent log flooding. You would still get the initial failure message.
Pretty reasonable technique and hardly the worst code ever.
That's just a bit more than a year ago. Come on.
MongoDB is like MySQL. I am so so so tired of hearing "that's been fixed, it's great now", doing a paper-thin dive into things, and seeing there are massive problems still.
I used MongoDB with Spring Data, it is impressively seamless.
It's just that there are way too many people who have sold snake oil for a decade-plus now, and I don't trust what they say anymore, and won't for a long long time.
The only reason it actually was put into production is because we had a vendor requirement on it (and why they thought it was sufficient, I'm not sure).
There's a difference between "not suitable for anything because it's so buggy and there's been so many problems over the years" and "not suitable as a replacement for a real RDBMS for important data". For the former, I think my opinion was possibly a little harsh for the current state of it. For the latter, yeah, I'm not going to blindly trust it for billing data and processing yet, that's for sure.
It was sufficient for me to not feel like we were taking on undue risk by using it, and since our use case is not one where we're in major trouble if a problem does come about (restoring from daily backups should be sufficient) and we're not doing anything transactional, that's good enough. As I mentioned earlier, it was a vendor requirement, so we just wanted to make sure it wasn't something that was problematic enough to make us question the vendor's decision making.
The downside being that their reputation is now somewhat charred.
That's not all bad. The same could be said of MySQL. Both DBMSs prioritized ease of use over data integrity in the early going.
At this point there's effectively no difference in the durability of data stored in MySQL or PostgreSQL, so it's hard to argue that one or the other made a better choice. They just got there by different paths.
In fact, PostgreSQL is winning back share in part because of licensing. GPLv2 is limiting for a lot of applications, and there continue to be concerns about Oracle ownership. It's also absorbed a lot of features from other databases like JSON support. That's not special to PostgreSQL though. It's been a trend since the beginning for SQL RDBMS and explains why they have stayed on top of the OLTP market for decades.
I would love to hear your personal experience of MongoDB as opposed to reposting old Jepsen reports. Perhaps there is something that we can address in 5.1 that is still a problem?
(I work in developer relations at MongoDB)
It also seems like there was quite a lot wrong even a year ago, quoting from there:
> Roughly 10% of transactions exhibited anomalies during normal operation, without faults.
It's just not a very reassuring response to say "when someone goes to dig a bit and finds a lot of show-stopping bugs, we address those specific bugs quickly".
To me it sounds like the architecture and care just isn't there for a robust data storage layer?
You either mathematically prove that it is correct, or it is wrong for certain.
This sounds like an oddly strong statement to say, but the guy who wrote the textbook that contained that statement went on to dig up trivial looking examples from other textbooks that were subtly wrong. His more qualified statement is that if a professor writing simplified cases in textbooks can't get it right, then the overworked developer under time pressure writing something very complex has effectively zero chance.
The MongoDB guys just don't understand this. They're convinced that if they plug just one more hole in the wire mesh, then it'll be good enough for their submarine.
PS: The professor I was referring to is Doug Lea, who wrote the "EDU.oswego.cs.dl.util.concurrent" library for Java. This was then used as the basis for the official "java.util.concurrent".
Transactions are new to MongoDB and they are not necessary for most workloads. Structure your data model so you only perform single-document atomic updates ($inc, $push, $pull) rather than making use of multi-document ACID transactions. It's possible; we're doing it for our ERP.
Sharding is something we've intentionally avoided, opting instead for application-layer regional clusters. We specifically wanted to avoid other complexities related to shards that are not a concern for replica sets. Durability and maximum recovery time during emergency maintenance caused us to avoid them.
It can be said MongoDB is hated as much as Postgres is loved.
Personally I have no opinion about mongodb.
I have no idea if MongoDB is good or bad at this point, but the comments of "haha it's mongo" are completely devoid of meaningful content and should be flagged.
PostgreSQL introduced JSONB GIN index support sometime around 2015, making Postgres a better fit for most JSON-based applications than Mongo.
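To make that concrete, a minimal sketch with a hypothetical table: a GIN index on a jsonb column lets containment queries use the index instead of scanning every document.

    CREATE TABLE events (id bigserial PRIMARY KEY, payload jsonb);
    CREATE INDEX events_payload_gin ON events USING GIN (payload);

    -- Can use the GIN index via the @> containment operator
    SELECT * FROM events WHERE payload @> '{"type": "signup"}';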
There are valid use-cases for Mongo, but for the vast majority of things you're better off starting with Postgres. And I say that as an early adopter - I really wanted Mongo to succeed but it just failed all of my expectations. All of them.
BTW: this is a post about Postgres.
Most of the ones for MongoDB are from a decade ago and not at all relevant today.
It has been possible since PG 10.
Interesting. I can't wait to see how PG12 influences future offerings from the cloud providers for more seamless major version upgrades.
I'm not gonna choose MongoDB if I need a relational model… even if it offers zero downtime upgrades out-of-the-box.
Also, I can't agree at all with "nobody coded it"; patches in this area are generally welcome. My first real patch for PostgreSQL was about improving migrations. And there has been talk about several of the ideas you propose, but nobody is working on any of them right now. So I mostly think it is a lack of resources in general.
It's hard to complain about OSS without attacking the creators. I tried to do that right but clearly I failed nevertheless :-) Thanks for your contributions!
Some people see politeness or compassion as weakness, and therefore use terms acknowledging it as pejoratives. The funny thing is, they think this makes objects of derision of their targets, and don't realise that it's really themselves it does that to.
BS, have you never worked with mysql or sqlite?
It’s absolutely impossible that you’re not trolling with such a statement.
It's a joke that you have to make an offline dump and import for upgrades with pgsql.
I’ve never had it fail for me. :/
There's a huge collection of tricks out there that just need to become formal features. The trick I'm working with today is adding a check constraint with NOT VALID and then immediately calling VALIDATE, because otherwise it takes a very aggressive lock that blocks writes. That could easily become ALTER TABLE CONCURRENTLY or something.
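For anyone who hasn't seen the trick, it looks roughly like this (hypothetical table and constraint names):

    -- Adding the constraint as NOT VALID only needs a brief lock;
    -- existing rows are not checked yet.
    ALTER TABLE orders ADD CONSTRAINT orders_total_positive CHECK (total >= 0) NOT VALID;

    -- Validation scans the table but takes a weaker lock,
    -- so concurrent writes keep working.
    ALTER TABLE orders VALIDATE CONSTRAINT orders_total_positive;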
I'm working through this blog post now, but there is still a bit to be defined (including a dependency on Heroku's support team) to get this rolling.
Why is the migration required? Heroku Postgres doesn't support logical replication, and logical replication is required for any ELT vendor (Fivetran, Stitch, Airbyte) to use Change Data Capture to replicate data from Postgres to Snowflake (while replicating deleted rows efficiently).
Note: I've also read this ebook, but this approach requires downtime.
Note 2: I reached out to heroku support and asked if logical replication was on their short term roadmap. They said they've heard this quite a bit, but nothing tangible is on the roadmap.
If anyone has any thoughts on the above migration, I'd be all ears. :)
My best advice is to automate the whole thing. You can automate it with the Heroku and AWS CLIs. Test on your staging site until you can run through the whole process end to end a few times with no interruptions.
If you have any significant amount of data where you're worried about a migration, stay far away from hosted postgres offerings. You'll never get your data out without significant downtime.
Do you have any resources which talk through the read/write/backfill approach?
Here's what I found so far:
Hopefully it's somewhat helpful!
You could possibly hack together some form of higher-layer logical replication via postgres_fdw and database triggers. A comment ITT references this as a known technique.
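A very rough sketch of what that could look like, with hypothetical hosts and tables, and only covering inserts (updates and deletes would need their own triggers):

    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    CREATE SERVER target_db FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'new-db.example.com', dbname 'app');
    CREATE USER MAPPING FOR CURRENT_USER SERVER target_db
      OPTIONS (user 'app', password 'secret');

    -- Remote copy of the table we want to mirror
    CREATE FOREIGN TABLE orders_remote (id bigint, total numeric)
      SERVER target_db OPTIONS (table_name 'orders');

    -- Forward every new local row to the remote table
    CREATE FUNCTION mirror_orders() RETURNS trigger AS $$
    BEGIN
      INSERT INTO orders_remote VALUES (NEW.*);
      RETURN NEW;
    END $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_mirror AFTER INSERT ON orders
      FOR EACH ROW EXECUTE PROCEDURE mirror_orders();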
Heroku's own Kafka seems to have slightly more native support than if you use a 3rd party like Confluent.
We've not yet tried any of this, but it's been bookmarked as a possible solution to explore.
I thought about using Debezium and Kafka to roll my own micro-batch ETL solution, but listening to this podcast made me walk away slowly: https://www.dataengineeringpodcast.com/datacoral-change-data...
Disclaimer: I work on Debezium
The main thing I reconsidered was the level of effort of taking the data from Kafka and landing it into Snowflake, especially around handling Postgres schema changes safely. I also have no experience with Kafka, so I'd be out of my depth pretty quickly for a critical part of the architecture. He also expressed the need for building quality checks into the Kafka-to-Snowflake code, but those details were a bit sparse (if I recall correctly).
Note: all of the above are probably outside the scope of debezium. :)
Note 2: your article on using CDC to build audit logs w/ a "transactions" table blew my mind. Once I listened to your data engineering podcast interview, I knew there was some implementation of "event sourcing lite w/ a CRUD app" possible, so I was excited to see you had already laid it out.
Adding to your list of options that still require _some_ downtime: we used Bucardo in lieu of logical replication. It was a bit of a pain, since Bucardo has some rough edges, but we made it work. Database was ~2 TiB.
I don't know if I’m a PG expert, but I just prefer “migration tool” to be a separate thing, and for the DB server engine to focus on being an excellent DB server engine, with the right hooks to support a robust tooling ecosystem, rather than trying to be the tooling ecosystem.
But, realistically, pg_upgrade's functionality would need to be integrated into the server itself if we're ever going to have zero-downtime upgrades, right?
I don't know how other RDBMSs handle this, if at all
I used and abused it pretty hard at my previous gig and now it's hard to imagine ever living without it.
At my gig before THAT, we had MySQL and schema migrations were so very very painful.
A thing I'm interested in is a 'simple' replication setup to reduce single points of failure. We currently use Galera with My/MariaDB/Percona and it's quite handy for HA-ish needs: we can have two DBs and the garbd running on the web app server.
Pointers to tutorials for Debian/Ubuntu to accomplish something similar would be appreciated. (We run things on-prem in a private cloud.)
Also PG is one of the few that support schema/DDL statements inside a transaction.
The mere presence of it on the dev workflow keeps your developers thinking about these types of issues as well, which goes a long way.
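To make the transactional DDL point concrete, a migration like this either applies completely or not at all (hypothetical table):

    BEGIN;
    ALTER TABLE orders ADD COLUMN shipped_at timestamptz;
    CREATE INDEX orders_shipped_at_idx ON orders (shipped_at);
    COMMIT;  -- or ROLLBACK, and both statements are undone together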
> PostgreSQL 14 extends its performance gains to the vacuuming system, including optimizations for reducing overhead from B-Trees. This release also adds a vacuum "emergency mode" that is designed to prevent transaction ID wraparound
Dealing with transaction ID wraparounds in Postgres was one of the most daunting but fun experiences for me as a young SRE. Each time a transaction modifies rows in a PG database, it increments the transaction ID counter. This counter is stored as a 32-bit integer and is critical to the MVCC transaction semantics - changes made by a transaction with a higher ID should not be visible to a transaction with a lower ID. If the value hits ~2 billion and wraps around, disaster strikes, as past transactions now appear to be in the future. If PG detects it is reaching that point, it complains loudly and eventually stops further writes to the database to prevent data loss.
Postgres avoids getting anywhere close to this situation in almost all deployments by performing routine "auto-vacuums" which mark old row versions as "frozen" so they are no longer using up transaction ID slots. However, there are a couple situations where vacuum will not be able to clean up enough row versions. In our case, this was due to long-running transactions that consumed IDs but never finished. Also it is possible but highly inadvisable to disable auto-vacuums. Here is a postmortem from Sentry who had to deal with this leading to downtime: https://blog.sentry.io/2015/07/23/transaction-id-wraparound-...
It looks like the new vacuum "emergency mode" functionality starts vacuuming more aggressively when getting closer to the wraparound event, and as with every PG feature highly granular settings are exposed to tweak this behaviour (https://www.postgresql.org/about/featurematrix/detail/360/)
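If you want to keep an eye on this yourself, the usual check is the age of each database's oldest unfrozen transaction ID (autovacuum_freeze_max_age defaults to 200 million; the hard stop is around 2 billion):

    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;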
When I was running my first Postgres cluster (the reddit databases), I had no idea what vacuuming was for. All I knew was that every time it ran it slowed everything down. Being dumb, I didn't bother to read the docs, I just disabled the auto vacuum.
Eventually writes stopped and I had to take a downtime to do a vacuum. Learned a few important lessons that day. I also then set it up to do an aggressive vacuum every day at 3am, which was the beginning of low traffic time, so that the auto-vacuuming didn't have as much work to do during the day.
It's a bit more subtle than that: each transaction that modifies, deletes or locks rows will update the txID counter. Row updates don't get their own txID assigned.
> It looks like the new vacuum "emergency mode" functionality starts vacuuming more aggressively when getting closer to the wraparound
When close to wraparound, the autovacuum daemon stops cleaning up the vacuumed tables' indexes, yes. That saves time and IO, at the cost of index and some table bloat, but both are generally preferred over a system-blocking wraparound vacuum.
MS SQL also has snapshot (and read committed snapshot) isolation levels. These are much more like the PostgreSQL isolation levels - in fact Postgres implements fewer distinct isolation levels than the SQL spec names (read uncommitted just behaves as read committed); you get upgraded to the next higher level as permitted by the spec.
In snapshot isolation, instead of taking a lock, SQL Server copies the old row version into TempDB when it would be overwritten; additionally, it adds a 14-byte row version tag to each row written. There's a lot of detail here: https://docs.microsoft.com/en-us/sql/relational-databases/sq...
This is also why MS SQL maintains a clustered index on the main table - the main table only contains the latest globally-visible version, so it can be sorted. Postgres stores all versions (until vacuum removes dead rows), so the main table is a heap, only indexes are sorted.
Since details matter, there's a post that explains it far better than I could:
Oh, and C codebases make such changes far harder than the more rigid newtyping in e.g. Rust, which is why I assume no one has made the necessary patches yet.
It's amazing what improvements we can get through public collaboration.
Also great are the parallel query support for materialized views, connection scalability, query pipelining, and the jsonb accessor syntax.
# Add the PGDG apt repository and signing key, then install PostgreSQL 13
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt install postgresql-13
# Stop the clusters and put the new 13 data directory on the desired disk
sudo service postgresql stop
sudo mkdir -p /secure/pgsql13/data
sudo chown -R postgres:postgres /secure/pgsql13/data
sudo -u postgres /usr/lib/postgresql/13/bin/initdb -D /secure/pgsql13/data
sudo rm -rf /var/lib/postgresql/13/main
sudo ln -s /secure/pgsql13/data /var/lib/postgresql/13/main
sudo chown -hR postgres:postgres /var/lib/postgresql/13/main
# Debian keeps postgresql.conf under /etc; link it into the old data dir so pg_upgrade can find it
sudo ln -s /etc/postgresql/9.5/main/postgresql.conf /var/lib/postgresql/9.5/main/postgresql.conf
sudo chown -hR postgres:postgres /etc/postgresql/9.5/main/postgresql.conf
# Dry run first (--check), then the real in-place upgrade using hard links (--link)
sudo -u postgres time /usr/lib/postgresql/13/bin/pg_upgrade --old-bindir /usr/lib/postgresql/9.5/bin --new-bindir /usr/lib/postgresql/13/bin --old-datadir /var/lib/postgresql/9.5/main --new-datadir /var/lib/postgresql/13/main --link --check
sudo -u postgres time /usr/lib/postgresql/13/bin/pg_upgrade --old-bindir /usr/lib/postgresql/9.5/bin --new-bindir /usr/lib/postgresql/13/bin --old-datadir /var/lib/postgresql/9.5/main --new-datadir /var/lib/postgresql/13/main --link
# Start the new cluster, regenerate planner statistics, then drop the old cluster
sudo service postgresql start 13
sudo -u postgres "/usr/lib/postgresql/13/bin/vacuumdb" --all -j 32 --analyze-only
sudo -u postgres ./delete_old_cluster.sh
Is there a reason to do this rather than putting the key in /etc/apt/trusted.gpg.d/?
So trusted.gpg.d/ is not the recommended method. For more information, see the official Debian wiki which states "The key MUST NOT be placed in /etc/apt/trusted.gpg.d" https://wiki.debian.org/DebianRepository/UseThirdParty
> The key MUST NOT be placed in /etc/apt/trusted.gpg.d or loaded by apt-key add.
And yet the snippet that I quoted has the latter command.
Further: the link has the example "[signed-by=/usr/share/keyrings/deriv-archive-keyring.gpg]". Perhaps it's my BSD upbringing showing through, but shouldn't only/mostly OS-provided stuff generally live in /usr/share? Shouldn't locally-added stuff go into /usr/local/share? Or perhaps creating a /etc/apt/local.gpg.d would be appropriate?
EDIT: Looks like pg_upgrade supports directly upgrading multiple major versions. So maybe just use that if you can afford the downtime.
While schema changes are probably not much of a problem, because that's something you can easily prevent during the migration, sequence usage is: once you fail over to the logically replicated secondary, you will have to reset all sequences to their current values, or all further inserts into tables with sequences will fail.
The other option, of course, is to not use sequences but rather UUIDs for your primary keys, but those come with their own collection of issues.
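The reset itself is simple, just tedious to repeat for every sequence; a sketch with a hypothetical table:

    -- Bump the sequence past the largest id already replicated,
    -- so the next nextval() doesn't collide.
    SELECT setval('orders_id_seq', (SELECT COALESCE(max(id), 1) FROM orders));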
Without downtime / near-zero downtime is more interesting though. Since this is an old version, something like Bucardo maybe? It can keep another pg14 instance in sync with your old instance by copying the data over and keeping it in sync. Then you switch your app over to the new DB and kill the old one.
Newer versions make this even easier with logical replication - just add the new DB as a logical replica of the old one, and kill the old one and switch in under 5 seconds when you're ready.
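In case it's useful, the moving parts are just a publication on the old cluster and a subscription on the new one (hypothetical connection details; the schema has to exist on the new cluster first, and sequences still need resetting as noted elsewhere in the thread):

    -- On the old (publisher) cluster:
    CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

    -- On the new (subscriber) cluster:
    CREATE SUBSCRIPTION upgrade_sub
      CONNECTION 'host=old-db.example.com dbname=app user=replicator password=secret'
      PUBLICATION upgrade_pub;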
IMO, that counts as "near-zero downtime".
I wish future versions of PostgreSQL would have some backwards compatibility for old system/data tables/data structures and be able to do a live migration when running a newer release.
pg_upgrade will not work if the internal data structure for your data changes. It only recreates the system tables. https://www.postgresql.org/docs/14/pgupgrade.html
It doesn't automatically re-create indexes (to take advantage of new features) for example, but that is likely something you don't want to do right away (incurring extra downtime) when you are doing an upgrade anyways. You can easily just REINDEX after the upgrade is complete.
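On PG 12 and later that can even be done without blocking writes, e.g.:

    REINDEX DATABASE CONCURRENTLY mydb;  -- run while connected to mydb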
According to Robert Haas, the zheap project is dead.
Congrats on the 14.0 release.
The pace of open source has me wondering what we'll be seeing 50 years from now.
Also there's Pinecone (https://www.pinecone.io) which can sit alongside Postgres or any other data warehouse and ingest vector embeddings + metadata for vector search/retrieval.
i.e. in a dense vector every value in the vector is stored
whereas sparse vectors exist to save space when you have large vectors where most of the values are usually zero - they reconstruct the full vector by storing only the non-zero values, plus their indices
"a dense vector is the opposite of a sparse vector"
Elasticsearch's search with PostgreSQL's everything else sounds like a perfect match, but I've never seen it used anywhere, which is kind of spooky to me.
Do you know of any non-toy use cases I can look at?
Also probably one of the most knowledgeable people about both Postgres and Rust I've ever met.
He makes his living primarily from supporting enterprises using ZomboDB (I think), not sure how much he can say, but it's worth asking.
Makes me wish I had a big deployment so I could justify sponsoring them.
I paid in cat pictures for now.
So far I think I'm going to go with Milvus, ideally I'd just have a foreign data wrapper for Milvus or FAISS.
• Up to 2x speed up when using many DB connections
• ANALYZE runs significantly faster. This should make PG version upgrades much easier.
• Reduced index bloat. This has been improving in each of the last few major releases.
• JSON subscript syntax, like column['key']
• date_bin function to bucket timestamps into an interval, like every 15 minutes (see the sketch below)
• VACUUM "emergency mode" to better prevent transaction ID wraparound
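The date_bin one is handy for rollups; a quick sketch against a hypothetical events table with a created_at timestamp column:

    -- Count events in 15-minute buckets
    SELECT date_bin('15 minutes', created_at, TIMESTAMP '2021-01-01') AS bucket,
           count(*)
    FROM events
    GROUP BY bucket
    ORDER BY bucket;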
I came from MySQL and so I’m still just excited about the basic stuff like authentication and policies, but I really like how they’ve also integrated storage with the same permissions and auth too.
It's also open source, so if you want to just host it yourself you still can.
And did I mention they’ll do your auth for you?
We're about to wrap up Row Level Security on our real-time APIs too - soon you'll be able to use Policies universally on all APIs.
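For anyone unfamiliar, a policy in plain Postgres looks roughly like this (hypothetical table; how you look up the current user will differ depending on how identity is passed through):

    ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

    -- Each user only sees (and can only modify) their own rows
    CREATE POLICY documents_owner_only ON documents
      USING (owner_id = current_setting('app.current_user_id')::bigint);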
You also get a feeling about a company pretty quick from their docs (I think it’s the best way to judge a company) and I have to say I had the same feeling as using Twilio or Cloudflare for the first time - so you’re in good company.
I feel like I live in a cave because I haven't quite understood what problem Supabase/Firebase is solving for.
We give you a full PG database for every project, and auto-generated APIs using PostgREST. We configure everything in your project so that it's easy to use Postgres Row Level Security.
As OP mentions, we also provide a few additional services that you typically need when building a product - connection pooling (pgbouncer), object storage, authentication + user management, dashboards, reports, etc. You don't need to use all of these - you can just use us as a "DBaaS" too.
I think it's fantastic to make deploying existing software/tools easier, and people are definitely willing to pay for the ease, curious though - what prevents the Postgres team from taking supabase contributions (since it's Apache 2.0) and including it in core Postgres?
Absolutely nothing - we are open source and we would encourage it. However we're not modifying Postgres itself at this stage, we're just providing tooling around it. We have hired developers to work specifically on Postgres but we will always attempt to upstream (if the PG community wants it) rather than maintain a fork
I prefer to use companies where you're using their bread and butter service. That way you know they won't suddenly lose interest in it, which google has a reputation for, especially anything to do with google cloud.
I just find that what Supabase is trying to do matches up exactly with what I wanted - it solves 90% of my developer headaches and all works together nicely.
Plus I just get good vibes from Supabase in general.
DBeaver is adequate, but not really built with Postgres in mind.
Before that, or when I'm in a rush to just get something done as I adjust to Beekeeper, I use Postbird (an Electron app) [https://github.com/Paxa/postbird]
I love Postgres and started Beekeeper because I wanted 'Sequel Pro for PSQL on Linux'. We have a lot of users, so it seems like we're doing something right.
Very happy for new PSQL.
If you have questions, let me know!
And azure data studio now has some pg support: https://docs.microsoft.com/en-us/sql/azure-data-studio/exten...
Personally I've landed on just using pgcli.com (and pg_dump etc).
edit: it's a cross-platform tool, supporting Linux, Mac and Windows (32/64)
It looks good, performs very well, and supports WAY more than just pg. It’s an indispensable part of my dev tooling. Bought it what last year, year before maybe? Very happy customer here, zero problems, all gravy.
I would also easily make accidental GUI changes that I could only revert by reloading original settings.
A less galling example was the introduction of a highlight that marked the line in a query where an error occurred. It was a bright blue color, hard on the eyes, almost completely obscuring the text. It’s a comparatively minor issue, but illustrative of how the tool’s usability was steadily declining.
We usually write our DB migrations in VS Code along with any other code changes, but the PG support in VS Code seems to be lacking. Just some naïve validation of PostgreSQL code inside VS Code would be awesome!
I've used a lot of very very expensive ones that aren't as polished as pgAdmin4.
The whole "native app with a web interface" thing is... probably not anybody's idea of "ideal," but I 100% respect it. It's free and it's multiplatform. Choices and compromises were made to achieve that and I don't know that I would have done differently when faced with the same constraints.
That all being said, TablePlus is 80 - 90% of that and the fact that it also works with multiple databases is huge. I also love the vertical tabs for multiple connections, editor sidebar pane for single records and tabbed queries vs a single query pane.
Both are fantastic, but if I was postgres only I'd probably stick with postico. As-is I just paid for both.
(Don't upvote me, I just googled it)
I gathered release dates a few weeks ago because I was curious. There aren't that many data points, but 150 days might be a good guess
13.1 was released 2020-11-12
13.2 was released 2021-02-11 https://www.postgresql.org/docs/13/release-13-2.html
AWS supported 13 as of 2021-02-24 https://aws.amazon.com/about-aws/whats-new/2021/02/amazon-rd...
From the few days I have explored it, it is absolutely incredible, so congratulations for the work done and good luck on keeping the quality so high!
- Grab a good book or two. The Art of PostgreSQL is one. There are _many_ others. Take a bit of time to find out what suits you. I typically like reading a more theoretical one and a more practical one (with actionable, concrete advice, actual code in it and so on).
- Get a nice tool/IDE/editor plugin that can interactively work with databases, create them, query them etc. Almost all of the editor utilities that you expect from general purpose programming also applies to programming with SQL.
- PostgreSQL is a very powerful, primarily relational database, but there is very little it cannot do. For example it is perfectly feasible to implement a document (JSON) store with it, and convenient to boot. There are also great extensions such as for temporal tables.
- There is a ton of advice, discussion etc. to find, especially around data modelling. Word of caution: some of it is religious, at least slightly. The book I mentioned above is too, to a certain degree, but it is still a very good book. Realize that a relational DB can give you a ton of stuff in a performant, declarative manner. But not everything should live in the DB, nor is everything relational in the strictest sense. Be pragmatic.
Source: I've been diving deeper into this topic for a few years now and still am. The more I learn and am able to apply, the more I understand why the relational paradigm is so dominant and powerful.
I hope you have the same kind of experience and enjoy Postgres!
As far as actual advice...
1. The key concept of an RDBMS is that each table typically specifies relationships to other tables. So in order to query data you will typically be joining multiple tables. Sometimes new DB programmers get a lot of enlightenment from the simple Venn diagrams that illustrate the basic types of JOINs: https://www.google.com/search?q=inner+join+diagram ...IMHO, once you get this (fairly simple) paradigm the rest is easy.
2. Database constraints are your friends. At a minimum you specify the types of the columns, obviously, but you can go much farther - specifying NOT NULL constraints, foreign keys, and more complex check conditions. Do as much of this as feasible at the database level. Otherwise it is something you need to code and enforce at the application level, where it will generally be less performant and more prone to coder error. Additionally, any constraints not enforced at the database level will need to be implemented across multiple applications if more than one app uses the database. (A toy sketch follows below.)
The second one above is an example of something that sounds boring and restrictive but really frees you up to do other things as a developer. About a decade ago, the "trend" was to treat RDBMSs as "dumb" storage and do all that stuff at the application level, in the name of being database-agnostic. Opinions remain divided, but I think that was an objectively bad trend. For one thing, folks noticed they hardly ever needed to suddenly switch databases, but there are other reasons as well.
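To make both points concrete, here's a toy schema (hypothetical names) with the constraints living in the database, plus a basic join across the relationship:

    CREATE TABLE customers (
      id    bigserial PRIMARY KEY,
      email text NOT NULL UNIQUE
    );

    CREATE TABLE orders (
      id          bigserial PRIMARY KEY,
      customer_id bigint NOT NULL REFERENCES customers (id),  -- foreign key
      total       numeric NOT NULL CHECK (total >= 0)         -- check constraint
    );

    -- Inner join: only rows where a matching customer exists
    SELECT c.email, o.total
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;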
Otherwise maybe try it with a toy project.
Honest question: how often do you switch databases?
I've never really found myself wanting or needing to do this.
Only time I could really see myself wanting to do this is if I was writing some kind of commercial software (eg, a database IDE like DataGrip) that needed to simultaneously support various multiple databases.
Yet now we are, because some core customers demand MSSQL support...
Of course the db we're using supports all kind of non-standard goodness that has been exploited all over. Gonna be fun times ahead...
(I don't mean that sarcastically. I truly wish you luck.)
Some companies end up with a mix of different DBs and it can help to consolidate to share expertise or resources.
Though at this point both have grown much closer together in capabilities and performance.