I feel that by using these kinds of tools you could really miss out on a fantastic opportunity to create a schema that models the business domain as accurately and elegantly as you want, regardless of the underlying service topology or the databases behind it.
That's what I find the most attractive about GraphQL. Large companies, and companies that have grown fast, often have all kinds of APIs made at different times by different people to different standards. GraphQL lets you paper over that without breaking existing, revenue-making code.
I completely agree. I've posted about this before in discussions about Hasura, but I feel that all these "expose your DB as a GraphQL endpoint" tools are setting themselves up for a world of hurt down the road.
GraphQL is a fantastic tool for building an API that maps extremely closely to the front end (check the spec, that is exactly what it was intended to do); building it so that it exactly matches your DB schema instead is a huge mistake IMO.
Two points: one, if your DB schema doesn't model your domain correctly, you're doing it wrong to start with. And two, views are typically used to decouple the raw database tables from whatever you want to expose to others, be those other developers or, in this case, external API users.
Modelling databases this way has fallen out of favour for various reasons, but it becomes extremely useful and relevant again with systems like Hasura.
You can't insert/update/delete data in views in all cases, and they can make debugging performance more challenging, but otherwise nothing. I think they are underused.
I rarely see people using views at all, let alone using them as a test aid (composing big queries from smaller more easily tested views), or using them to define interfaces to an internal data model.
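Roughly what I mean (table and view names are made up): small views you can test in isolation, composed into the one you actually expose.

    -- Small building blocks, each easy to test on its own
    CREATE VIEW active_employees AS
      SELECT id, name, department_id
      FROM employees
      WHERE terminated_at IS NULL;

    CREATE VIEW department_headcount AS
      SELECT department_id, count(*) AS headcount
      FROM active_employees
      GROUP BY department_id;

    -- The "interface" view other developers (or the API layer) get to see
    CREATE VIEW api_departments AS
      SELECT d.id, d.name, h.headcount
      FROM departments d
      JOIN department_headcount h ON h.department_id = d.id;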
If you really do that, people tend to accuse you of doing RPC and not building a "true" API, because you're creating a lot of special case functions for the remote caller's business case.
That is exactly what GraphQL is designed to do. Taken straight from the spec:
> Product‐centric: GraphQL is unapologetically driven by the requirements of views and the front‐end engineers that write them. GraphQL starts with their way of thinking and requirements and builds the language and runtime necessary to enable that.
The whole point of GraphQL is that since each client asks for exactly the fields they want, having those "special case functions" is not a problem because other clients don't need to ask for those fields.
I was speaking of higher-level abstractions than that.
For instance, does your API allow all possible arbitrary aggregations of your data, or do you only special-case the 3 or 4 that are pertinent to your business model?
Doing the former leads to basically reimplementing SQL in GraphQL. Doing the latter leads to functions like `resolve_month_over_month_profit_and_loss_segmented_by_sector`.
There can be a lot in-between here. You might end up with something like `Report(type: PROFIT_AND_LOSS, unit: MONTHLY, segment: SECTOR)`
Like every tool, there's a sweet spot where it really shines and others where it doesn't fit. If you need to have access to a bunch of disparate datasources it's really nice to have that layer of abstraction.
> Delegate authorization logic to the business logic layer
If you take security or performance debugging seriously, you should never expose database models through APIs directly in a production app.
To illustrate, say you have an Employee model:
    query {
      employee(userId: "uuid") {
        name
        salary
      }
    }
Say you add some hacks on top of this to only allow users to query their own employee data, believing this provides adequate security.
The next day, someone creates a Manager object, a relation from Employee to Manager, and from Manager to Employee.
Now, without having considered security for a second, you've granted all employees the ability to query each other:
    query {
      employee(userId: "uuid") {
        name
        salary
        manager {
          employees {
            name
            salary
          }
        }
      }
    }
To say that these problems occur in the wild frequently is an understatement. Since these GraphQL frameworks also expose introspection capabilities, discovering these exploits can be automated with crawlers. If you write a bug like this, and you will, people will find it.
Please, please stop encouraging people to directly expose their databases through an API.
Hasura avoids this issue by having its own row-level security model. In fact, they go one step further: for subscriptions, they can run one query for all connected clients by building a temp table with a row per client, using their session variables to join against. (They poll for subscription updates since it's fast and scales well.)
Row-level security is not at all dependent upon DB users/roles. Storing app user data in a table for lookup is sufficient. Storing the user info in a session config value as provided by JWT can be even better.
Basic database table replication will suffice for the former. No replication required for the latter.
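A minimal sketch of the session-variable flavour; the `jwt.claims.user_id` setting name is just an example, and each tool has its own convention for pushing verified JWT claims into the session:

    ALTER TABLE employee ENABLE ROW LEVEL SECURITY;

    -- The app (or GraphQL layer) sets the verified claim per request, e.g.
    --   SELECT set_config('jwt.claims.user_id', '<uuid from the token>', true);
    -- and the policy only exposes the matching row.
    CREATE POLICY employee_self ON employee
      USING (user_id = current_setting('jwt.claims.user_id', true)::uuid);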
I agree that this is indeed a problem, but your proposed cure is unnecessarily onerous. At least with Hasura's equivalent product, allowing queries on related objects is an opt-in process for the admin, after each related object is defined.
I think a better piece of advice is: Please stop allowing people to query relational models automatically, and surface a separate locked-down schema for the GraphQL user.
These GraphQL over DB tools have real value. I've moved from using OpenResty to Hasura in order to surface Postgres APIs, and the time saved has been significant.
Let's be clear: many folks today are directly exposing their databases through REST. The API protocol really doesn't matter. CRUD has no affinity for any one technology or methodology.
GraphQL is no more vulnerable to crawlers than any REST server with OpenAPI on it. And yes, introspection can be disabled.
It's not like tools like Hasura, Prisma, or Postgraphile have no security baked into their products, often via a cryptographically signed JWT.
Having a JWT or whatever authn is irrelevant. The problem is an insufficient authz model.
Exposing databases through APIs is not the problem. Exposing relations without authorization is.
Under these frameworks, you can add an innocuous relationship between two models that entirely compromises security without even touching the API code. Not only that, but the graph of relations and their associated ACLs is complex. Every time you add a relation, you need to map out the graph of your data model and ensure that it's safe. There is far, far more surface area for making a critical error that allows attackers to exfiltrate large volumes of your data.
I've written these bugs in similarly modeled systems (there were GQL-like systems before GQL). I've fixed these bugs. I've caught these bugs in code review.
These bugs are orders of magnitude less likely to happen with a simpler authz model where you don't need to lock down every relation, just the table itself. This is why the GraphQL creators themselves encourage users to put authz at the business layer.
Two of the three examples above literally have no framework authz support. Postgraphile requires setting up row-level security policies, meaning you have no control over which layer of code your authz policies live in; they must be in the database. Even if you are OK with that sacrifice, you still have to find ways to manage this in version control and test it, for which there is scarce tooling.
Hasura seems to do the right thing here, provided you opt into it. It's not clear if it allows you to easily version control or test your ACLs.
> And that's all assuming the GraphQL server has a public IP, which is far from a certainty (just like REST).
Security doesn't stop at your VPN. At several hundred engineers, organizations begin implementing internal controls.
In fact, my example was an employee comp manager :|.
> Postgraphile requires setting up row-level security policies, meaning you have no control over which layer of code your authz policies live in; they must be in the database.
I might go ahead and argue that RLS in the database is exactly where security on the data model should go. Past a certain size, many databases have more than one app talking to them. That means N servers that must adhere to the same authorization rules. Hopefully at least they're written in the same language.
Not all restrictions are data-oriented though, and on that note, I agree that the restrictions should live above the database using some schema-stitching or federation.
Version control is a matter of saving the schema changes through migrations, and optionally DDL audits through event triggers (rough sketch below). I would argue that testing/tooling as you seem to be imagining it is a moot point, though. I find it much easier to define table, column, and row restrictions at the DB level and let those constraints bubble up. If a table is unavailable to a set of roles/groups, better to say so in the database so that no query at any higher level could accidentally allow access when it shouldn't have.

Row-level security policies don't care whether someone is executing a simple `SELECT * FROM example` or a 100-line monster. Whether that row is read in isolation or as part of a gargantuan query reading/writing through a view or three doesn't matter. The reason Postgraphile relies on PostgreSQL itself for authorization is precisely because PostgreSQL does such a good job of it.
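Something like this covers the DDL-audit part in PostgreSQL; the table and function names here are just illustrative:

    -- Hypothetical audit table
    CREATE TABLE ddl_audit (
      happened_at     timestamptz NOT NULL DEFAULT now(),
      command_tag     text,
      object_identity text
    );

    -- Record every DDL command as it completes
    CREATE FUNCTION audit_ddl() RETURNS event_trigger
    LANGUAGE plpgsql AS $$
    DECLARE
      cmd record;
    BEGIN
      FOR cmd IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
        INSERT INTO ddl_audit (command_tag, object_identity)
        VALUES (cmd.command_tag, cmd.object_identity);
      END LOOP;
    END;
    $$;

    CREATE EVENT TRIGGER ddl_audit_trigger
      ON ddl_command_end
      EXECUTE FUNCTION audit_ddl();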
You are correct that simple GUI tools are lacking in this regard, but I'm not sure the use cases I've come across for it fit a GUI tool. For a data platform at a large aerospace company, the method of allowing/restricting access depended on tagging (both AND and OR). Combine that with some temporal query support, and the RLS policies were essentially code, not just declarative. The GUI was the (custom) web manager UI. It worked quite well. And while that data platform had some bugs we had to work out, as all software does, we never had a problem with the data-level security. We even found some bugs at the app and UI layers, precisely because some developers had overlooked some corner cases.
App-level integration tests appeared more than sufficient to flush out any problems (or verify correct behavior) in the data-layer security.
Data-level restrictions and query auditing (below the level of an app or server) become even MORE relevant as employee size grows, not less.
Been playing around with GraphQL in the last day or so and just can't see any reason to use it over REST (or REST + an ORM). Am I missing something? Is it just so people don't have to learn SQL?
Client-side shaping of queried data without directly writing SQL against a back-end database, and without having to pre-arrange queried data structure with the back-end.
Wouldn't SPARQL be a better choice for this? It's an actual Web standard, unlike GraphQL so a better fit to general-purpose clients with no "pre-arranged queried structure".
SPARQL returns a flat result set, just like SQL. And much like SQL, it's not actually an advantage over SQL, just removed from a relational database.
GraphQL is hierarchical. The queries look substantially like the output JSON. SPARQL (and SQL) require training. With a common tool like GraphiQL or Playground, queries are simple enough and abstracted enough (am I querying Redis, PostgreSQL, S3, or some combination?) that clients don't have to know much of anything except what data they want. That's huge.
It puts the onus on server developers to audit their data and security models, but they should be doing that anyway no matter what the stack looks like.
I love SQL. I'll write CTEs and window functions until the cows come home with a smile on my face. But not everyone needs to know SQL. There are many other worthy pursuits.
REST suffers from the N+1 problem and overfetching. Most ORMs do as well. "Oh! You want order info AND product reviews? (At least) 2 more REST calls for you. Oh! You don't need the person's shipping and billing addresses in this case? So sorry, take them anyway, otherwise we would have to make another REST endpoint just for you."
Also affects versioning. In REST you commonly see URLs like "/v1/foo/54321" because if you try to remove or rename ANY PART of the JSON payload coming back, you break a contract. With GraphQL, you make the new name available and mark the old name with a directive like @deprecated. Over time you watch the logs to see if any clients are still using the old name. If so, keep it around and write another blog post asking folks not to use it anymore. If not, just remove it. You're done. No broken contracts. No differently versioned URLs.
GraphQL handles N different clients with a single schema. Mobile app clients, desktop clients, B2B clients, et al get exactly the data they need—no more and no less. REST either shoehorns them all into one or two access models or explodes into an unmaintainable mess of hundreds or thousands of often redundant endpoints.
The biggest difference, and the problem GraphQL was originally built to solve at Facebook, is allowing the CLIENT to determine what data it needs at any given time, rather than REST, where the API may be driven by client requests but ultimately reflects what servers THINK clients will need, and is much harder to change as needs change (and they will change).
I used it once upon a time to essentially allow joins across 3 separate databases that had all been developed/grown independently. Once you have certain kinds of resolvers between types, all sorts of neat stuff is possible. Ultimately, once we had time to update the applications, we merged the databases and dropped the GraphQL app entirely.
EDIT: It was awesome, and I kind of miss it, but it was too much to maintain within our constraints.
Looks like an early-days project. I had this exact project in mind when working with Postgraphile. JVM tooling around databases is amazing. Clojure on top of that foundation with honey-sql is something I wish existed. Good luck! I might look into helping build it someday.
I have been planning a GraphQL adapter for Presto, so we can query GraphQL APIs as though they were SQL tables.
I guess it is inevitable that some SQL database will be exposed via GraphQL, and someone will use something like Presto to make it appear to be an SQL database again...
If all it provided was an API server then it wouldn't really matter. But this is pitched one level down, i.e. you can integrate this into your existing app as a library and expose it. I don't know what the utility of that is, but I'm kind of assuming it might mean you can add in interceptors and business logic to customise the API behavior, which I think is pretty useful.
So the fact that it lets you do this with any JVM language is actually a key property that gives it very wide application.
I understand this stuff is "cool" but this is only one step removed from db->query(input[sql]). I'm not talking about SQL injection either, I'm talking about wholesale data exfiltration of your entire database because you've auto-generated some slick GraphQL API and you don't take the time to think through what should, and should not, be exposed to end users or bother to implement any form of access control.
Security is done via row-level access control, a system that is vastly better than the ad-hoc systems people reinvent outside the database for every application.
You also absolutely take the time to think through what should and should not be exposed to end users. Either you create a namespace for your API with views that expose only the necessary data, or, with something like Postgraphile, you specify which functionality, columns, and tables are hidden. Personally I prefer the view approach, as it decouples your API from the database implementation.
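As a sketch of the view approach (the api/internal schema split, the column names, and the role are all made up):

    CREATE SCHEMA api;

    -- Expose only what the API should see; salary never leaves the internal schema
    CREATE VIEW api.employee AS
      SELECT id, name, department
      FROM internal.employee;

    -- The role the GraphQL layer connects as only gets the api schema
    GRANT USAGE ON SCHEMA api TO graphql_role;
    GRANT SELECT ON api.employee TO graphql_role;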
RLS is one of the shittiest things about Postgraphile and why I decided against it for one project and am doing my best to root it out of a current project.
RLS is difficult to unit test, difficult to configure with feature flags, not portable to secondary stores, and requires knowledge of arcane database features.
When you put some private data on a cache somewhere and query it, now you need to write your access layer twice since the other one is so tightly coupled to the database.
We'll have to agree to disagree. Writing a manual security layer that doesn't understand your data model is one of the worst things you can do. Multiple databases? Now you're also manually doing integrity and probably manually doing joins. That's a ton of extra things to go wrong.
As an example, consider a query that is fairly complex. How exactly are you going to write reusable security tests for it? How do you know that an error in the query won't expose unexpected data, or that a modification to the query that introduces an error won't expose unexpected data? Your security now has to be customized to every query like this.
The alternative is that you only use super simple queries so that you can write reusable security checks. But then I'm going to ask why you are using a relational database in the first place, because almost certainly you're doing manual joins in this scenario and incurring huge performance hits.
If you are using a relational database to its full extent, there is nothing as secure as built in row level security. You wouldn't dream of doing integrity checks outside the database, why do security?
> If you are using a relational database to its full extent, there is nothing as secure as built in row level security. You wouldn't dream of doing integrity checks outside the database, why do security?
This isn't exactly PostgreSQL's fault, but mapping application users to database roles is harder in pretty much any framework I've worked with than implementing basic post-retrieval filters.
I absolutely think bringing security into the database is good, but it's also not straightforward to integrate, even assuming you know how to define the necessary database elements.
If you're trying to map app users to DB users for more than 1,000 users, you're doing it wrong. Roles at that scale tend to be more vague: admin, hr, analysis, etc. Users (and tags) go in a table. Then row-level security authorizes through the user table for individual queries.
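Sketched out (table, column, and setting names are hypothetical):

    -- App users live in a table, not as database roles
    CREATE TABLE app_user (
      id   uuid PRIMARY KEY,
      role text NOT NULL   -- 'admin', 'hr', 'analysis', ...
    );

    ALTER TABLE salary ENABLE ROW LEVEL SECURITY;

    -- Each request sets the current app user, e.g.
    --   SELECT set_config('app.user_id', '<uuid>', true);
    -- and the policy authorizes through the user table.
    CREATE POLICY salary_access ON salary
      USING (
        employee_id = current_setting('app.user_id', true)::uuid
        OR EXISTS (
          SELECT 1 FROM app_user u
          WHERE u.id = current_setting('app.user_id', true)::uuid
            AND u.role IN ('admin', 'hr')
        )
      );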
Row-level security is absolutely not dependent upon DB roles. Table-level security on the other hand is sufficiently coarse-grained that mapping GRANT/REVOKE to applied roles should be feasible.
If you're punting all of this security to the app layer, that's your prerogative, but don't pretend that it's somehow more straightforward or secure. If it seems simpler at the app layer, you may very well be missing something.
I’m talking more along the lines of “hr user logs in, now you need to SET ROLE for the request and make absolutely sure RESET ROLE is called when the connection goes back to the pool”. A lot of frameworks make this harder than it needs to be.
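i.e. every request ends up doing something like this by hand (role name made up), and you have to guarantee the reset on every code path; SET LOCAL inside a transaction is the usual way to make the pool safe:

    -- Manual variant: easy to leak the role back into the pool
    SET ROLE hr;
    -- ... run the request's queries ...
    RESET ROLE;   -- must run even when the request errors out

    -- Transaction-scoped variant: the role reverts automatically at COMMIT/ROLLBACK
    BEGIN;
    SET LOCAL ROLE hr;
    -- ... run the request's queries ...
    COMMIT;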
I think you make some good points here, and RLS is an ideal data security model in many respects.
The parent's point is that tooling around this is weak, which undermines its value. Not being able to review and easily test authz changes will almost certainly lead to issues.
It's yet another attempt to step back to two-tier. (It's the usual thesis-antithesis cycle: first we had that, then we figured out it doesn't work, so we turned towards three-tier; then we saw that was pretty tedious, so we started to look at two-tier again. But this time in disguise.)
Not only that, it only lets you query your "root" type by primary key? So it's less functional than the normal interface to the database. I'm having a really hard time understanding the point of this.