Love the idea of having APIs flow out of a single set of schema definitions. The Rails style of speccing a model, migrations, and controller/serializer or graphql types feels overly verbose and repetitive.
To me the biggest thing these groups could do to speed adoption is flesh out the feature development / test story. For instance, the postgraphile examples have development scripts that constantly clear out the DB and no tests. Compared to Rails, it's hard to imagine how you'd iterate on a product.
Are there other reasons this hasn't seen more widespread adoption? Is there some inherent architectural flaw or just not enough incremental benefit?
Also it doesn't have to be all or nothing. For example, there might be an API endpoint needed where there are multiple mutations needed inside a transaction but we just route between that custom backend and postgrest using nginx.
Other things we use:
- dbmate for migrations: https://github.com/amacneil/dbmate
- postgrest-client, a library I developed for constructing type-safe queries to Postgrest in Elm: https://github.com/alex-tan/postgrest-client
In many situations, some kind of formal migration management is obligatory and my guess is that with PostgREST you will often want to test your migrations on a current database snapshot before deploying it in production...
That's like one of the oldest RDBMS best practices: no direct to table access, with every application (or class of business user with direct access to the DB) having access through a tailored selection of views so that apps/users are largely isolated from DB changes.
> and a separate schema per API version containing views
You'd probably only need a separate schema for semver-major API versions, minor versions would necessarily be supersets which could be accommodate by backwards-compatible schema extension.
> and dealing with the pain of rewriting the views for all the supported API versions when you need to change your data model.
With a well-designed normalized relational model, most changes to the base model are adding attributes or tables, which will have zero impact on view definitions to maintain an existing API. The next most common is factoring an existing attribute out to a different table because what was conceived of as a 1:1 relationship becomes 1:N, which requires adding a join to view defs where that attribute is involved.
It’s like, no matter how low-level you go, at some point someone had a problem with components being tightly coupled and introduced an indirection layer “for next time”.
The same as read operations, with views, which can be, in Postgres, automatically updatable if they are a simple thin layer over base tables, and otherwise can be made updatable by means of appropriate triggers defining the meaning of INSERT/UPDATE/DELETE on them just as the view definition defines the meaning of SELECT.
My understanding is there is no direct-to-table access with PostgREST; you're only ever interacting with views. Keep your view api versions in separate schemas. If you change your tables, yes, you'll need to update your views, but I don't see how that's much different than having to modify ORM code to accommodate schema changes.
Not sure I understand how any other approach to provisioning an data API is going to be able to avoid requiring a data model.
As for postgrest and graphQL, queries and data transformations are proof that your DB schema shouldn't be forced the the stack and onto the clients. I would argue that graphQL strived for the exact opposite of single schema and tends to 'clients should choose their own API' approach.
Still there are a tone of applications for which just the basic types are good enough and the server is just a thin wrapper.
In the end the main requirement for efficiency using it is you skill with not basic SQL/plpgSQL (stored procedure, trigger functions, views and knowledge about the effects of the used isolation level). All well documented in postgres.
Another way I've heard this phrased is "It's much easier to hire JS devs than PL/pgSQL devs."
Worse is better. Cheap JS devs to solve everything with heaps of imperative code is cheaper than elegant solutions.
Even long term this is true. Tech debt with code that people can understand and modify is cheaper than tech debt with an elegant DSL that require learning and mental ability beyond if then else.
I've recently seen an uptick in "Postgres-native" stuff being built, though. So maybe there's hope for the future.
I'm a fan of the concept and would love to have a middle layer as thin as postgrest, but authentication/authorization has been my roadblock. I still use node as an app server because of passport. I had not heard of the postgrest starter kit before today and will check out https://docs.subzero.cloud/athentication-authorization-flow/
If they correct the typo before I get to reading it, it will be https://docs.subzero.cloud/authentication-authorization-flow...
If however you do only want to authenticate the users yourself, the login function can be very simple 
Even row-level access control, while not widespread any way, is often not enough. Complex validation in SQL can be done, of course, but SQL isn't exactly meant for these things, IMHO.
Maybe I just don't like the idea of putting all my application logic into SQL statements.
What I like about Django's ORM, for example, that you can declare a field to be an IP address, and this not only does the validation (which SQL can do easily, I am sure), but this declaration also carries over to HTML forms, admin interface, REST API serialization and HTML Templates.
It’just the tools you linked didnt prioritize the development workflow from the point of view od a backend developer (code in files, git, tests, migrations) but from the frontend developer perspective (ui to create tables)
Very very old videos but you can get the core idea
I just define my model classes. Graphql resolvers look like regular rest endpoints. Everything else is generated from there and I even get compile errors for my apollo client queries too.
I really love the typesafety of it all!
If you are actually changing variable names then your client would fail regardless of what you're using
For the record, ActiveRecord models do not specify, for example, table schemas - they are loaded from the database in the runtime, and migrations are not really necessary - they provide schema versioning with a rollback possibility, something that AFAIK the databases don't usually offer. Rails (and Ruby) is actually so non-verbose and non-repetitive, that a lot of people is criticising it for the very reason, usually complaining about it being too "magical".
But stand-alone migration tools like Sqitch do.
Postgrest was better, but we had trouble getting it to work with JWT, and have a reasonable permissions model.
Postgrest was DAMN fast though, it was really nice to work with.
In the end we had to write our own.
We recommend every production app to setup Allow Lists to prevent arbitrary querying.
Additionally, if you are using a lossy intermediate query representation instead of plain SQL, you will end up losing a lot of he database performance and find yourself doing complex joins on your front end.
But as I said, there are trade offs. I didn’t say it’s always a bad idea.
Thanks a lot for sharing.
This approach had widespread adoption in the 1990's, most notably via PowerBuilder.
> Is there some inherent architectural flaw or just not enough incremental benefit?
Yes, there is an inherent architectural flaw with two-tiered systems in that they force a system to choose to place complexity either in the GUI or in the RDBMS. This often leads to "Jenga Programming" maintenance techniques which cause the code base to take a life of its own.
It turns out that this does not scale.
0 - https://en.wikipedia.org/wiki/PowerBuilder#History
PowerBuilder didn't encourage you to know your DB ins-and-outs and use all of its features. It was DB agnostic and through its DataWindow you got a RAD way of doing CRUD for a desktop app but you still needed to add application logic in PowerScript.
PostgREST on the other hand pushes you to know PostgreSQL.
A more suitable analogy would be to something like Oracle Apex.
My intent was not to equate PostgREST with PowerBuilder, but instead to use the latter as a well-known example of a two-tier system.
> PowerBuilder didn't encourage you to know your DB ins-and-outs and use all of its features.
While PowerBuilder was "DB agnostic", as you identified, non-trivial applications relied heavily on RDBMS logic (stored procs, views, etc.). So while PB may not have "encouraged" RDBMS specific implementations, those which I encountered were never without.
Granted, the client connection protocol vastly differs between the two offerings. Yet the system architecture is shared IMHO.
Even here  i count 4 places to put logic into besides frontend
The first sentence of site reads thusly:
> PostgREST is a standalone web server that turns your PostgreSQL database directly into a RESTful API.
If there is a database on one end and a client on the other, with no intervening system between them, that is the definition of a two-tiered system.
The project home page declares this to be its raison d'être.
> ... you can very well have logic in the proxy layer and then have logic below the db layer ...
Why have a "proxy layer" to "then have logic below the db layer"? Why not just let PostgreSQL manage the data and have...
wait for it...
A system which defines the application logic? I mean, that is what you described in a roundabout way.
> If i apply the same logic (two tier bad) that i can say every other way of building an api is two tier since you have the frontend and then the rest of the code in php/node... the db would not really count as a later since in those scenatios ppl use it as a dumb store
No, what you describe as "an api" (which assumes an external client that interacts with it) and "then the rest of the code" (which is colloquially known as application logic) using PostgreSQL (what you call "a dumb store"), is known in the industry as a "three-tiered system."
Not a "two-tiered" system.
This is because there are three significant actors involved in system operations:
1 - something that requests the service provided (browser, phone, desktop binary, etc.).
2 - something that provides the service (application logic).
3 - something that persists relevant state as determined by #2.
See what happened there?
Three distinct things; a client, something that does stuff, and something that stores stuff.
The Django ORM is used pervasively throughout django and third party libraries. You can compose and reuse querysets in Templates, Form validation, authorization and so on. You can create a CRUD admin interface and a REST API with almost no extra configuration.
It also has a very mature migration system, meaning you can have your migrations in code, and apply them in production or in multiple environments.
One problem I expect with PostgREST (haven't used it yet) is that validation and authorization may suffer. At the very least this will have to be configured either as SQL data or as SQL stored procedures, with SQL not being the strong suit of most frontend people.
They probably have safeguards against SQL injections through their API. Logic validation will be harder, and Postgres only supports rudimentary permissions, so all of this has to be implemented in stored procedures, also.
My main concern with PostgREST is that the database is stateful and persistent, and that you have to put more code in there. Which means every deployment is going to be a migration, no matter what. You better don't forget any of the state changes you made in your development database. Version management will certainly be harder.
Please don't get me wrong. I am sure PostgREST has perfectly valid usecases, and I like Postgres a lot. I am just dropping a few thoughts on why I would prefer a traditional application between database and frontend.
Similarly after a week of vuejs I couldn't stop thinking redux/vuex should be an embedded sqlite ...
I think it's because PostgreSQL got major features for web development — like JSON and RLS — not too long ago and most developers are just catching up.
JSON was added back in 2013 and RLS back in 2016.
Separately, I've got a fairly advanced setup of schema/seed lifecycle flow in a Graphile-powered application that will likely be open-sourced at some point. The simplicity of the architecture makes these things relatively easy to spin up from scratch.
Give it a try.
(Im the author)
- There is no ORM that you can control. It doesn't mean that there is no transformation needed though. It just simply means that you lose control over it. All the work which were once under your control now is handled by the database layer. Of course you can go to that layer and change the logic of the framework yourself (by sending strangely formatted HTTP requests according to the doc), but then you gain nothing by using this framework. Of course this problem stems from the same fact that there are no separation of concerns. So basically the framework is simply ignoring the fact that there is an inherent object relational impedance mismatch. Of course the authors are aware that there is a mismatch but they simply sweep it under the rug by saying that:
"The PostgREST URL grammar limits the kinds of queries clients can perform. It prevents arbitrary, potentially poorly constructed and slow client queries. It’s good for quality of service, but means database administrators must create custom views and stored procedures to provide richer endpoints."
- No testability: pl/pgSQL is a really poor language to write unit/integration tests for, mostly because it's ancient and it's procedural and there are no native frameworks to do so. Of course you can use pl/Python and others, but then again, why would you when you can write the same thing in Python without embedding the code in Postgres, and the Python ORM would take care of having the same performance on the database level for you.
- Lack of tools for migrations, etc. The list goes on.
> There is no ORM that you can control
You don't need an ORM to control your database. If you treat the database as the source of truth(this would be the architecturally defined place) you can do data transformations with plain and short SQL. The great thing about this is that you can leverage PostgreSQL rich functionality and extension ecosystem and you don't have to worry if your ORM supports a pg type or if you need a plugin for that.
> by sending strangely formatted HTTP requests according to the doc
The "strangely formattted request" wouldn't be so strange if you note that PostgreSQL cast operator is "::"(e.g. `select col::json from table`).
> Python ORM would take care of having the same performance on the database level for you.
That's just false. ORMs have to point to the lowest common denominator of databases. They don't use the full potential of pg(things like LATERAL joins, json_agg, etc). PostgREST queries are specifically tailored for pg and they're being constantly improved.
> No testability
I've shared this before, but by using pgTAP you can get native SQL tests.
HNers have shared migration tools in other comments.
One of the most significant architectural defects of this approach is by demanding the persistent store provide functionality beyond solely managing data, it both conflates disparate concerns (business logic + data management) as well as unduly complicates system evolution.
> If you treat the database as the source of truth(this would be the architecturally defined place) you can do data transformations with plain and short SQL.
And if you treat the database as the source of truth for managing data, then the idea of making it an application engine would be anathema.
We argue that all "business logic" is in fact "data logic" and they're the same concern. There are several comments on this post on how to evolve this approach(only use VIEWs/sps in a single schema and expose this to clients).
> And if you treat the database as the source of truth for managing data, then the idea of making it an application engine would be anathema.
Why? Could you elaborate more on that?
> Why? Could you elaborate more on that?
Your previous declaration of:
> We argue that all "business logic" is in fact "data logic" and they're the same concern.
Is both the source of my position and your question.
For at least the last 50 years of which I am aware, the idea of "data rules all" has inextricably lead to migrating system logic closer to where the data resides. In the last 35-ish years, this meant defining application logic in stored procedures executed in an RDBMS (such as what you describe).
This architectural philosophy is commonly know as "client/server", or "two-tier systems," and has been studied extensively (as I am sure you are aware). What makes this relevant to the argument found in the "Motivation" section on the PostgREST site is:
Using PostgREST is an alternative to
manual CRUD programming.
The PostgREST philosophy establishes
a single declarative source of truth:
the data itself.
PostgREST is a standalone web server
that turns your PostgreSQL database
directly into a RESTful API.
Now, consider a different system architecture commonly known as "three-tier" or "multi-tier." The philosophy of this type of system is largely defined by separation of concerns. This is to say, have each tier be responsible for a single "layer" involved in system definition.
In multi-tier systems, the persistent store (RDBMS) is solely responsible for ensuring data integrity. It may, in fact, service disparate components as well as disparate components may use more than one persistent store. An example of the latter is PostgreSQL and Redis deployed in concert. Programs, components, and/or systems which interact with a persistent store are responsible for providing the behaviour expected by their clients whilst the persistent store is expected to enforce rules regarding the relationships it manages.
Since the persistent store is only asked to fulfill a data integrity contract, having it also attempt to be an application engine would clearly conflate concerns, limit what persistent stores could be used, as well as result in assumptive implementations. Hence anathema to those who would treat the persistent store (database) as the source of truth strictly for the one concern which it was designed to address:
There is a great talk that @felixyz shared regarding the shortcomings of the Persistence Layer approach and the advantages of Processing Engine approach(which we at PostgREST embrace).
In summary, having the "business logic" away from the db results in worse performance: more network roundtrips, ineffective cache utilization, wasted CPU cycles in traversing your programming language VM(plus all libs) code paths. Turns out this actually puts more strain on the DB and makes your whole system scale worse.
> Hence anathema to those who would treat the persistent store (database) as the source of truth strictly for the one concern which it was designed to address: Managing data.
I think marrying these concerns and having them in a single place is more of a benediction for OLTP systems development and maintenance.
After all, PostgreSQL has gotten more powerful features over the years and it has evolved to do more than data management(LISTEN/NOTIFY, JIT, etc).
I really recommend watching that talk. Btw, slides here.
No IDE support, no debugging support, poor language choice. It's just simply screaming NO NO NO NO NO NO NO on every level of the framework.
Not to mention that these days it is more important than ever to make the code also executable in cloud environments like GCP Cloud SQL or AWS RDS.
> You don't need an ORM to control your database.
You are actually wrong about that and you are basically the living proof that I was right. You didn't even try to address what I was writing about object relational impedance mismatch and inheritance.
> you can leverage PostgreSQL rich functionality
I think this is a slightly outdated world-view. As I mentioned above these days the most important driving factors are not what a specific DB vendor supports but mostly what the cloud providers do. And many of the cool things are only supported by plugins in Postgres anyway.
> you can do data transformations with plain and short SQL
You guess what? You can do the same in ORMs too! In any language which has a Postgres DB driver! No one is stopping you from that.
Can you elaborate on that? PostgreSQL is an object-relational dbms btw, it supports inheritance. Though I don't know exactly what impedance are you referring to since PostgREST doesn't use an ORM.
That's what VIEWs are for! Well, one use-case of VIEWs, anyways.
There's nothing wrong with the schema as the API since you can use VIEWs to maintain backwards compatibility as you evolve your product.
Put another way: you will have an API, you will need to maintain backwards compatibility. Not exposing a SQL schema as an API does not absolve you or make it easier to be backwards-compatible.
You might argue that you could have server-side JSON schema mapping code to help with schema transitions, and, indeed, that would be true, but whatever you write that code in, it's code, and using SQL or something else is just as well.
Both automatically updatable views (supporting insert/update/delete by default because their structure provides a clean mapping to the backing table) and views made updatable through instead-of triggers exist in Postgres.
CREATE TRIGGER <name>
INSTEAD OF INSERT OR UPDATE OR DELETE
ON <view-name> ..
More realistically, stored procs would do the CUD parts.
You can, and there is no reason you shouldn't.
> More realistically, stored procs would do the CUD parts.
Why are stored procs more realistic?
A procedure call is explicit, a trigger is implicit. You don't call a trigger, it just happens as a side effect of something else. People tend to forget implicit things. Suddenly you notice that something is acting strangely or slowly in your application. You can look at your functions and procedures and try to find the problem. But if your application has triggers all over the place, how do you know what is going on? A trigger can change a dozen rows, which in turn can change other rows, so changing a single row can fire thousands or millions of triggers. Also, triggers are not fired in a particular order, the database is free to change the query plan according to what it thinks is best at the moment, so triggers are not deterministic. Triggers can sometimes work and sometimes not.
Almost everything that can be done with a trigger can be done with a procedure, but explicitly, deterministically and in most cases without side effects.
You call an instead of trigger implementing updatability of a view, or the select query defining a view, just as much as you call property setters or getters in OOP languages.
With the DB triggers, as in many OOP languages (C#, Python), this is an implementation detail obscured from the calling site, which is good for loose coupling, modularity, etc.
Your objections, while IMO still overblown, have relevance to some uses of triggers (they are particularly applicable to AFTER triggers and BEFORE triggers other than those implementing constraints, but least applicable to INSTEAD OF triggers implementing view updatability, which is what we are discussing here.)
Good design is obvious and orthogonal. If you write setters in an OOP language in such a way that they do surprising things, i.e. not just setting a value, then I would call that bad design.
> which is good for loose coupling, modularity, etc.
What do you gain by using triggers in this case? All you get is mental overhead, because whenever you use DML you have to keep in mind that there might be a trigger hiding somewhere that does strange things.
If you call a procedure instead, you make it clear that you want to do more than just a simple update or insert.
> [...] it is possible for the method call to make use of concurrency and parallelism constructs [...] to do a unknown number of things in an unknown order
Why would I want this? I want my code simple, stupid and obvious, and not convoluted, clever and surprising.
Method dispatch (whether the language be OOP or otherwise) is also "implicit".
Think of triggers as controlled code injection.
That's...not at all guaranteed if it is possible for the method call to make use of concurrency and parallelism constructs. It's actually not at all uncommon for method calls to do a unknown number of things in an unknown order (and it can even have an unknown number of them still in progress when the method returns!)
This whole thing about triggers being implied/hidden is not based on anything other than taste.
Please give an example of a stored procedure that does something else than what the code says it will do.
> This whole thing [...] is not based on anything other than taste.
Good taste: Insert, update and delete doing exactly what the statement says. Procedures doing exactly what the code says.
Bad taste: Insert, update and delete surprising you with magic tricks.
> Please give an example of a stored procedure that does something else than what the code says it will do.
That's not what I wrote. I wrote that the caller of an SP need not know anything about the SP's body any more than the caller of a DML need not know anything about triggers. Of course that is true. And of course it's also true that the caller could know all those details in either case.
DELETE FROM customers WHERE customer_id = 1234;
It turns out our applications has triggers and by surprise the DELETE changed into
UPDATE customers SET contract_status = 'terminated' WHERE emp_id = 1234;
If the developer had instead put the logic into a procedure like terminate_contract(customer_id), we wouldn't have had this problem in the first place.
Your schema is an interface here, so read its docs!
EDIT: BTW, it's quite reasonable to have a policy to not delete, but mark deleted/terminated, rows in certain tables. This is especially important for security-relevant tables where the names of entities are liable to get written down in external (to the DB) ACLs. It's quite reasonable to implement such a policy in the DB as triggers -- indeed, there's no better way to do it, since otherwise a user who doesn't read the docs (like you) but does have privilege, might break the database.
So, actually, I really think you've completely misunderstood. What you thought was a problem was likely a solution to a bigger problem that you were not aware of. If you removed that trigger, you likely broke something. Meanwhile you're convinced triggers are a bad idea and spread that around as gospel.
The statement says DELETE and that is what I want, the rdms executes it and tells me "deleted 1" (yes, it does). What am I supposed to think?
> Your schema is an interface here, so read its docs!
What docs? Apparently triggers are obvious and self-documenting, or at least that's what the guy who wrote them thought.
> there's no better way to do it, since otherwise a user who doesn't read the docs (like you) but does have privilege, might break the database.
Why properly revoke the deletion permission and offer terminate_contract(customer_id) as an alternative if you can use a magic trick instead?
Well, you know about triggers!!! So you're supposed to think that the expected thing happened, whatever the schema designer intended.
Whoever wrote that trigger did it for a very good reason, probably the one I gave earlier (namely: to prevent name reuse, and also because gee, you might create a new contract with that customer later and not want to start from scratch).
> > Your schema is an interface here, so read its docs!
> What docs?
Those written by the people who wrote that trigger, designed the tables, your whole schema.
> Why properly revoke the deletion permission and offer terminate_contract(customer_id) as an alternative if you can use a magic trick instead?
(I assume you meant to write "Why _not_ ...".)
Well, for one, because someone who doesn't understand these things might have enough privilege and ignorance to do a real DELETE and break your data.
Moreover, the logical operation here is a DELETE! Just because you mark the row deleted instead does not make it not a logical deletion.
You're choosing to take one aspect of SQL RDBMSes extremely literally while then also choosing to ignore or malign another part. That's not a recipe for success.
Yes, and I also know about other nefarious things like monkey patching in some OO languages. Doing such things is antisocial behavior and something that 99% of developers don't expect.
> Those written by the people who wrote that trigger, designed the tables, your whole schema.
As I said, there was no documentation.
> you might create a new contract with that customer later and not want to start from scratch
Or you might get sued for non-compliance with the GDPR.
> because someone who doesn't understand these things might have enough privilege and ignorance to do a real DELETE and break your data.
Or you revoke the privileges and create a straightforward procedure. No magic, no surprises. And you can still grant delete privileges to a role that can delete customer records for GDPR reasons.
> You're choosing to take one aspect of SQL RDBMSes extremely literally while then also choosing to ignore or malign another part. That's not a recipe for success.
The difference is that SELECT, INSERT, UPDATE, and DELETE are straightforward features that do exactly what you say unless you use another malicious feature recommended by leading database experts not to use:
avoid all triggers, all of them, they are evil. — Tom Kyte
My advice is to avoid using TRIGGERs — Joe Celko
> As I said, there was no documentation.
Well, there's your problem :)
> Or you might get sued for non-compliance with the GDPR.
Non-sequitur. That's just an argument that your lawyers need to be involved in matters of data retention.
> malicious feature
That's utter nonsense. If it was "malicious", it would have been removed by now. All the evidence says that they are useful.
What next? No foreign keys? Those are typically implemented as internally-generated triggers. That must make them Doubly-malicious, or something.
While I wouldn't use it on base tables, a view that provides a “current” picture where deletes are updates (or even inserts of a deletion event) into a base tabl with (potentially along with others) provides a complete historical picture of the status of the data of which the view presents a current snapshot is a useful approach.
It's not evidence that triggers are bad, it's just bad design.
(An INSTEAD OF DELETE trigger on a view that marks a base table entry as logically deleted, causing the corresponding data to be removed from the view is, OTOH, not bad in the same way.)
> If the developer had instead put the logic into a procedure like terminate_contract(customer_id), we wouldn't have had this problem in the first place.
Or if they put it as an INSTEAD OF DELETE trigger on a current contracts view, with the design described above; conversely a badly-named store proc would have the same basic problem as the trigger at issue here. This isn't a trigger problem.
We have quite large amount of data (100TB+ and trillions of rows at this point ) and no problem with views.
I don't know, maybe it lacks comments ? The code is really not easy to follow if you are not using Haskell 100% of your coding time.
While the library may work well in practice, it's a maintainability red flag and, by using this library, you rely on rare Haskell programmers for the future.
There are also Haskell shops that use it. So, if a bug happens, at worst you'd have to open an issue, offer a bounty and watch it get solved.
> Having a bit of experience with OCaml
When I started as a PostgREST contributor I had zero experience on Haskell(not even a college course). I think the bar to contribution is not that high given that you know the domain — PostgreSQL and REST.
Not to say the codebase couldn't be improved. Commenting and refactoring is happening continuously.
What I feel, as a user, is that most features that I use are already implemented and unlikely to bit rot, since PostgreSQL itself doesn’t change a lot.
Another point is - it's not a library and you are not the one maintaining it :) the same way you are not maintaining, but still using things like postgresql,nginx,redis, rabbitmq.
I bet it's a lot easier to learn haskell and patch postgrest then to know C for 10 years and patch postgresql :)
I don't maintain them because they are written in either C or C++, which has many more practitioners, guidelines and tools to trust.
On the other hand, I can easily see people using postgres' extensions have to quickly make a patch to fix a bug or change a behavior, extensions being smaller.
The issue is that, eventually, you would like to patch, review or audit the extension. All of these operations will require you to find third-party Haskell developers, Haskell auditors, Haskell reviewers, who are rare in the job market, and therefore it represents a risk for your project.
If the core developers stop to maintain the extension, no one else might be available to maintain it, and now you have code debt and code that no one can fix
Really? No one? I learned haskell (my first FP lang) and rewrote the core of PostgREST (in my spare time) in about 6m... so stop scaring people :)
What about group/role based security concepts?
It' very similar to what we were doing with stored procs 15 years ago and just moves the problem from business logic back to database layer. Given the choice, I'd prefer to write constraints in !SQL, personally.
> What about group/role based security concepts?
Yes, those have been standard in RDBMSs for decades.
- something to conveniently version control database objects
- something to conveniently debug stored procedures. Maybe directly from vscode or your preferred editor.
If those two things get solved somehow, pg could be a really awesome application server.
> The second one, you can start here https://www.pgadmin.org/docs/pgadmin4/4.13/debugger.html
I think Starter Kit and the pgAdmin debugger lack in convenience. If you write C# or node js code in your preferred editor, you can debug it there. You can debug your express routes, webapi or resteasy controllers in vscode/vs/eclipse/intellij without leaving the file you later commit to git.
Starter Kit and the pgAdmin debugger are fine tools, but they come nowhere close to how you work with a js, C#, java, python or whatever you like codebase.
The development workflow with stored procedures imho is broken, and I think that is one of the main reasons people do not use them much.
an api backed by postgres+postgrest is 80% tables and views declarations ... how do you debug a view ... it makes no sense. You just define it and say "select * from view" (even from your IDE) and see if you get what you expect, that's why one can do a lot (develop complex apis) with less (limited debug tools)
Compare that to a node application: just open vscode, start editing away and press F5 to test it. If you want to debug it, add a breakpoint and step away. When you are done you commit and push to git.
It should be the same with stored procedures.
so you "might" need stored procedures only for your write part and even then you need them when the input data needs to be split and sent to different tables.
The complex "rules" are nothing more then "constraints" on your data which are split between the columns of the table and become so simple that there is almost nothing to debug.
If it were the case that with postgrest one needs to write complicated stored procedures all over the place you'd be 100% correct. The thing is you don't need them in most cases and when you do they are short simple functions that deal with focused things so there is way less chance to get them wrong.
This has been my experience with using this type of stack for apps like project management/invoicing.... etc (basically basecamp+freshbooks)
If you don't use them clearly you don't need to debug them. That does not change what I said earlier: a difficult development workflow hinders their adoption.
it's true that it's not a polished workflow to debug stored procedures because you you have to jump from your editor to pgadmin but i don't think this is such a big deal for two reasons.
- postgrest architecture is such that you rarely need stored procedures (not just me, all the projects), it's the exact same way ppl use databases without them (they just send queries to the db, same here). 90% of the code in this type of project is table/view definitions (with constraints) and appropriate grant/rls statements. So there is very little imperative code to debug.
- The type of stored procedures used is quite simple, isolated and mostly self contained, a single function, maybe calling some other helper functions (you are always 2 levels deep at most) so it's not like in other envs where you have to follow the code jumps between hundreds of functions and classes.
I read your comments but couldn't find the answer. What did I miss?
your code is in files (so git ...) and you edit files adn save them and the new version is loaded to the database and you can make the new api call and see it in action
I highly recommend reading the source code https://github.com/PostgREST/postgrest
As in, how do you change the data model without breaking clients?
I mean, I agree with the strategy you state about views or stored procedures, but those are just in-database ways of achieving the same kinds of things you might prefer to write in a different language (thus ORM or query engine) because it puts the app or business logic all into the same version controlled system, leverages programming language ecosystems and tools that are often way more valuable than raw database programming (even in Postgres), etc.
Basically, if PostgREST needs you to do the old tricks of views & stored procedures to manage an abstraction layer that safely allows the underlying data schema to change, I just don’t see the benefit over doing this in a much better language ecosystem, like Python, and using much better web server tools to generate the APIs.
PostgREST looks much more useful for quick prototypes, internal use cases where schema breakage might be OK occasionally, or just mirroring & monitoring data as-is for ops and diagnostics. From a performance perspective, it might be fast enough for production, but that’s almost never as big a concern as managing the intermediate abstraction layer and associated app tooling.
Does not look like a good idea for production applications that need an intermediate API layer adapting the data to the use case.
I think everyone agrees in principle that in a service-oriented architecture you need well-defined, safe, hardened interfaces between services. In an ORM world, the assumption seems to be that the database itself isn't really a service with a well-defined interface, but rather a private data store that just accepts whatever SQL you throw at it.
But what if you think of the database itself as a service? If that's the case, then your service interface should definitely not be arbitrary SQL. This is where you introduce views and stored procedures, which change your DB from a private implementation detail that you have to hide behind a service boundary to a service that sets its own boundaries.
In this world, your REST services have an HTTP client to make service calls to each other, and they have a Postgres client to make 'service calls' to your database. PostgREST is just a deterministic proxy that adapts one service protocol to another, the same way you would use grpc-gateway if you had gRPC services that you wanted to call from REST clients.
I don't think PostgREST obviates the need to write intermediate API layers, at least if those intermediate API layers are doing anything interesting. It may obviate the need to write API layers that only parameterize SQL statements and serialize JSON responses. But that's a good thing to obviate.
And yeah, you should definitely version control your DB schemas, views, and stored procedures. We aren't barbarians :)
I try to avoid generalizations about "most projects", because different people have different experiences and it's hard to make a good argument about which case is more typical. At best I think you can lay out the toolbox and explain where PostgREST fits in the toolbox. Whether or not you should use it on a particular project depends on the particular project, and I have no idea what mlthoughts2018 is working on or has worked on in the past, so that's an entirely different question :)
I’m just pointing out that PostgREST’s own tutorials _do not_ say this or even appear to agree with it. I know experienced DBAs & engineers would follow these design ideas, but PostgREST makes it sound like your REST API can literally just be the very table it’s querying from. Hence why I qualified which use cases that might be OK for in my first comment.
As long as the "business" logic is directly related to the data in question, the code in the database will always be faster, shorter (by a lot) and easier to understand (because it's shorter by orders of magnitude).
I know your reaction comes form years of tutorials telling you not to do this, but this was way back when mysql didn't even have views, things are a lot different now, databases are not dumb stores.
Give it 5 minutes and try a small project, you'll be surprised by the power
No, it could be functional, declarative, OO, whatever, depending on the language used. Personally, after a lot of years of Haskell & Scala experience in large companies, I think functional & declarative programming are way overhyped, and these types of designs do not actually offer the benefits they are claimed to. Given this, I see zero reason to care if the abstraction layer is some OOP tool. That is absolutely fine.
> “As long as the "business" logic is directly related to the data in question, the code in the database will always be faster, shorter (by a lot) and easier to understand (because it's shorter by orders of magnitude).”
This is comically wrong. I remember the ~100 line long MSSQL stored procedure just to compute (poorly) the median of a column.
Expressing things in the languages supported with the database (even in Postgres, even with language extensions) requires waaaaay more code than using application tools in the application language.
I remember how much easier our lives became at an old finance job when we finally ported a ton of stored procedures to instead use pandas for math operations in Python. The cost of passing the data through processing jobs that required serializing it to servers where the jobs could run, transforming it, then transporting it back, was absolutely worth it because we could remove thousands and thousands of lines of stored procedures that were incredibly bug prone, impossible to debug, and totally lacking the linear algebra and analytics functionality we needed to expand the app.
Moving the logic out of the database was 100% motivated by making it safer, less buggy, simpler & easier to test code, and gaining expressive functionality totally impossible to express in the database.
On the debuging part it's a bit true, the workflow is not as polished (debuging in db) as oposed to other envs.
About the "fast" part, i would not say "comically wrong", if it were, there'd be no reason for sql beyond "select * from"
It’s extremely hard to manage the Python environment itself with Postgres, as Postgres has to be compiled with it. If you’re working on scientific or analytics applications where you are spinning up a new conda environment all the time, adding & changing package dependencies, etc., and you need to keep your external-to-database Python app logic synchronized with the internal-to-database Python, it’s virtually unusable.
Imagine needing to ship new versions of an in-house machine learning library into the database, test it, update that library’s own dependencies inside the database’s python, etc. It becomes a crazy packaging nightmare very fast.
The plpython extension is basically just a cute toy. Occasionally useful or a small script or single transformation that needs Python standard library functions, but anything beyond that and it becomes very unscalable & unmaintainable very fast.
pg coupled with Nix could potentially solve all of these issues. nixpkgs includes a large collection of python libs that you can use to get reproducible environments.
The problem is not generally making an environment, it’s that different projects need different environments but need the same database.
Yes, until the very first time you'd want to reuse some of the code. From then on, any initial advantage is melting into unmaintainable mess of semi-imperative, semi-declarative copy-paste horror.
By this logic one can say "ruby is bad becasue i can't reuse logic i wrote for the backend in my frontend SPA". This is the type of (data logic) that you should not need to reuse in other parts.
Maybe this comment does a better job of explaing the architecture https://news.ycombinator.com/item?id=21436425
JOOQ mostly prevents SQL syntax errors, column name errors, column type errors, supplying the wrong number of arguments, etc. These become compile-time errors.
With PostgREST and other JSON APIs, you only get run-time errors. And you rely on test coverage to check code correctness.
I prefer compile-time errors to runtime-errors. I find that software utilizing comnpile-time checks is easier to maintain.
Thanks to PostgreSQL transactional DDL you would get all of these errors at creation-time and without any change to your database if any migration is wrong. There's no need for a SQL codegen to get this already included safety.
Btw, PostgREST is not only a JSON API. Out of the box, it supports CSV, plain text and binary output and it's extendable for supporting other media types. If you have to output xml by using pg xml functions you can do so with PostgREST.
"PostgREST: a web server that turns a PostgreSQL database into a REST API"
Seems like if you want a declarative API you might as well do something like a local read replica a la Firebase. Seems like the natural progression of these API as single schema technologies.
Is the main reasons for sticking to REST here compatibility or is there something in the RESTful design we want to hold on to?
Personally for talking with a web front-end I would use Websocket's with long-polling as fallback. And use JSON instead of query-string for querying. It does however require yet an abstraction layer, and is more brittle and less secure then REST.
REST is a school-bus. Other methods are like exotic sports-cars.
In any way, we used this approach in our company dealing with billions of rows of data and this allowed us to scale way past our "weight class".
Not to say though that this solves everything, there are cases where it does not work (as someone commented correctly and gave an example where they needed to use linear algebra over the data)
I had situation where was implementing something quite simple - an URL shortener. I didn't use PostgREST, but I decided to use ORM, because it was simple CRUD operation. It had an option to either use generated url or allow user to specify a custom one. And it worked as expected.
But then once completed I decided to add extra functionality, for example extra statistics, like what IPs were accessing it and how much. Adding expiration times etc.
I realized that ORM encouraged me to implement all of my logic in the application even when I actually would put less load on the database and made things simpler if I would let the database do many things for me and use types and functionality provided to me. I am not taking here about using stored procedures, I could do all operations as 1 at most 2 SQL statements. While ORM had to send multiple. In the end I dropped SQLAlchemy (this was python code) and just use psycopg2 directly didn't even bother with wrappers, just used built-in pools. It was also easier for me to make my code use two endpoints for reading and writing, so I can scale my code better.
I realized that ORM did not save me much code at all, it was the same amount of code with or without it, and without ORM I had greater control of what I wanted to do.
I previously believed that ORM was standing in a way when your application gets bigger, but my belief was ORM was good for small projects. Here I realized that it doesn't bring much benefits even for simple projects.
I think REST interface like this is doubling down on what ORM tries to do. Maybe it could be beneficial in places that don't have libraries to communicate with a database and only can make http requests?
I agree with the other point, this is meant to be used when your client can not ahve direct access to the db (browsers, 3rd parties). If you can connect to the db and use SQL, do that.
PyCharm actually has a database support, you can configure it so it connects to your database, then it will fetch database schema. After that it will recognize your SQL statements in the string, offer auto complete and even take these into account when refactoring your code.
IMO this is how impedance mismatch should have been handled from the beginning.
Is there anything like this for Microsoft SQL Server?
tds_fdw works pretty well for this(I've used it in a project related to open data). Basically, you'd have to map mssql tables to pg foreign tables defined on a pg schema. Lastly expose this pg schema through PostgREST.
For example, recently the NeverSSL project was on the front page. It has been submitted 10 times over the past 3 years (the homepage, not updates), and has been on the front page of HN before.
So they added REST on top of ORM, few more layers of data transformation and even leakier abstraction, so poor dev doesn't have to worry about "low level" SQL.
I lost count of how many different libs/frameworks i saw that exposed CRUD through HTTP, all failed miserably, because it is actually very dumb idea.
Postgrest is more like a compiler, it takes one language as input (REST) and outputs another language (SQL) as output.
It has 0 relation to the ORM concept
Postgrest is not an ORM, it just takes rows from your database and sends them to the client.
It encourages you to write SQL. Write a decent SQL schema. Write SQL views on top of that schema to provide the rows to your API. Even write SQL functions to modify the data in reaction to client calls!
If you hate ORMs and love SQL (specially Postgres, because it's amazing) this is the tool you should be using.
My point is - exposing db over HTTP (REST, GraphQL, ...) is bad architectural design and idea.
You say "expose db" and imply that one would directly give access to the core tables and in this way coupling the api to the data model (one should expose only views and stored procedures) and making the api inflexible.
Or are you saying this is insecure, and if so, can you describe how one would "break" this?
I'm with you that exposing a fairly free-form query language over HTTP seems like something that's gonna bite several someones in the ass pretty hard at some point (GraphQL) but what's that got to do with PostgREST?
That's why one does not deploy postgrest alone but use it in a setting like this https://github.com/subzerocloud/postgrest-starter-kit
where you at the proxy level can say things like "return 400 if the client did not provide at least 2 filters for the table" or "allow filtering on column a and b because they have indexes but not on c"
But again, those are things to consider when you have datasets exceding hundreds of K or millions, you don't need to care about this when you have tables with 10s of Ks
Exposing a poorly-indexed table can happen in any ol' Rails API app with the usual lazy 3-line controller methods bouncing about its codebase—that's the same level of risk in any system using a database, more or less. I don't see PostgREST as being more likely to cause that to happen. GraphQL's trickier, especially from a performance perspective, because the scope of a query is so hard to determine or account for up-front, and it's hard to confidently answer the question "what's the largest possible scope of a request to this endpoint?".
Also GraphQL is a protocol and you can't really say it will have poor performance, it all depends on the specific implementation of a particular schema. You can have a GraphQL schema where each query gets translated into a single sql query  jsut like with postgrest. One thing that is true is that the reference implementation for graphql servers (but only because it has to be general) pushes one in a direction where a single graphql request might generate a storm of sql queries but it's on developer to understand and write proper resolvers.
The whole point of HTTP is CRUD (Create Read Update Delete). PostgREST maps the HTTP Methods POST/GET/PATCH/DELETE to SQL's INSERT/SELECT/UPDATE/DELETE.
The generated SQL statements over HTTP requests are there to help reduce SQL boilerplate but you're expected to write custom views, functions, triggers, constraints, etc and understand "low level" pg.