I started out thinking that SQL for API queries was genuinely a terrible idea. Datasette supported it because it's designed as an ad-hoc querying tool (which happens to be able to export results as JSON), but obviously that was an awful idea for production applications.
I keep on waiting for the obvious downsides to show up, and they keep stubbornly refusing to appear. And meanwhile the rest of the software world is going all-in on GraphQL which seems MUCH harder to work with - and less performant.
I've tried this a couple of times in the past. The biggest issue I've run into is database modifications: instead of merely updating the queries in my API server, I then also needed to update my client code, and that gets much harder with multiple clients.
Of course, this can also be the case when changing the API, but deprecated fields can be faked in that portion of the application to allow existing applications to continue to work.
I think it comes down to where you're managing your database interactions. I don't care about most features in an orm, so I maintain my own that's barely anything more than a SQL generator that won't break every time I change the database.
Overall, I think SQL on the client is excellent - even ideal. And as a matter of fact, I try to develop apis that simply generate a user-specific excerpt of our database, where SQL would be perfect for querying the data locally on the client. Last year I tried running sqlite in the browser for this exact reason, but it was far too heavy.
Anyway, back to the issue - the idea of having to immediately adjust all our client software every time I make a database adjustment is a bit anxiety-inducing. Especially when building fast in a drive toward market fit.
That said, with some great logging that tracks the actual queries and where they come from, it probably wouldn't be so bad. Especially if there's a means to maintain backwards compatibility.
What if the tables your API queried were actually views? You could do the same faking of data during migrations of the underlying data (structure) and the client should not be affected.
What if the "views" your API queries didn't have to be tables? You could just keep "tables" as a backend implementation detail, and 'fake' a different model on top of them that's closer to what the client needs.
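To make that concrete - a minimal SQLite sketch of the view idea (the `users` table and its columns are invented for illustration): the underlying table gets migrated, but a view keeps presenting the old shape to clients.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- the underlying table was migrated: "name" was split into two columns
    CREATE TABLE users_v2 (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    INSERT INTO users_v2 VALUES (1, 'Ada', 'Lovelace');

    -- the view clients query keeps presenting the old, pre-migration shape
    CREATE VIEW users AS
        SELECT id, first_name || ' ' || last_name AS name
        FROM users_v2;
""")

# old client queries keep working unchanged against the "users" view
row = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()
print(row[0])  # Ada Lovelace
```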
I think what's becoming clear in this discussion is that there are some developers (myself definitely included) who would like to use SQL as an interface to plain programming structures outside of actual databases.
I understand postgres provides something like this, but it would be interesting as a library rather than a separate application. I'd love to be able to query hashmaps or arrays of data using SQL.
Maybe this already exists somewhere? My web-search-fu is failing me but I recall reading somewhere that facebook had something like this for querying systems, so maybe such a library exists extracted from that usecase?
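In the meantime, one way to approximate "SQL over hashmaps and arrays" is to shovel the in-memory structures into an in-memory SQLite database and query that. A sketch - the `events` data and schema here are made up:

```python
import sqlite3

# hypothetical in-memory data: a list of dicts we want to query with SQL
events = [
    {"user": "alice", "kind": "click", "ms": 120},
    {"user": "bob", "kind": "click", "ms": 340},
    {"user": "alice", "kind": "scroll", "ms": 80},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, kind TEXT, ms INTEGER)")
conn.executemany("INSERT INTO events VALUES (:user, :kind, :ms)", events)

# ordinary SQL now works over what started life as plain Python objects
rows = conn.execute(
    "SELECT user, avg(ms) FROM events WHERE kind = 'click' GROUP BY user ORDER BY user"
).fetchall()
print(rows)  # [('alice', 120.0), ('bob', 340.0)]
```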
> I think what's becoming clear in this discussion is that there are some developers (myself definitely included) who would like to use SQL as an interface to plain programming structures outside of actual databases.
Myself included too. As a side project, I'm currently writing a game (roguelike) where all the game state that would ordinarily be put into an ECS or an object tree is instead stored in an in-memory SQLite instance; the game's state management consists of (currently hand-written) SQL queries.
The reason is that, in trying to implement yet another version of an ECS-like architecture, I realized I was essentially reinventing database queries and database indices, so I might as well use the real thing.
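A toy sketch of that layout, with invented component tables: components become tables, entities become integer ids, and a "system" is just a join.

```python
import sqlite3

# hypothetical ECS-as-SQL layout: one table per component type
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE position (entity INTEGER PRIMARY KEY, x INTEGER, y INTEGER);
    CREATE TABLE health   (entity INTEGER PRIMARY KEY, hp INTEGER);
    INSERT INTO position VALUES (1, 0, 0), (2, 5, 5), (3, 9, 9);
    INSERT INTO health   VALUES (1, 10), (2, 0);
""")

# "system" = query: all living entities that have both components;
# the database index does the entity lookup an ECS would hand-roll
living = db.execute("""
    SELECT p.entity, p.x, p.y
    FROM position p JOIN health h USING (entity)
    WHERE h.hp > 0
""").fetchall()
print(living)  # [(1, 0, 0)]
```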
For Postgres you mean foreign data wrappers? Or storing plain (mmappable?) binary data in bytea and creating a pg extension to serialize/deserialize it? I'm working on both...
So even with tightly controlled access, one ill-intentioned client might be able to produce a statement that consumes a lot of resources. Even with a tight timeout, the database might not be able to handle a sustained stream of computationally hard requests. I think SQL makes the attack surface much broader than a regular REST client does. This wouldn't be a problem if the machines facing the client were stateless and able to check the request before accessing the data, whereas databases are traditionally single points of failure and hard(er) to scale.
Now, I'm comparing it with a REST client because I don't know GraphQL well enough, but it might be possible to mount the same attack on a GraphQL server?
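For what it's worth, embedded engines do give you hooks to cut off runaway statements. A sketch using SQLite's progress handler (the budget and the deliberately unbounded query are arbitrary choices for illustration) - it aborts any statement that exceeds a wall-clock budget, which blunts, though doesn't eliminate, the resource-exhaustion problem:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")

def run_with_budget(sql, seconds=0.05):
    """Abort any statement that runs past a wall-clock budget."""
    deadline = time.monotonic() + seconds
    # called every N virtual-machine ops; a truthy return aborts the query
    conn.set_progress_handler(lambda: time.monotonic() > deadline, 1000)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.set_progress_handler(None, 0)

# a deliberately never-terminating query, interrupted instead of hogging the server
try:
    run_with_budget("""
        WITH RECURSIVE n(i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM n)
        SELECT count(*) FROM n
    """)
    aborted = False
except sqlite3.OperationalError:
    aborted = True
print(aborted)  # True
```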
I still chuckled at the idea, which I think was the primary intention, so it was worth it anyway.
The tagline of Datasette is "A tool for exploring and publishing data", and to me it makes a lot of sense that SQL (with appropriate restrictions) is a suitable API for the kinds of things you want to do with datasets. This is not really the typical web API use case, where you normally have much more specialized and restricted use scenarios.
I feel like this thread is tongue-in-cheek to make people realize that "run any query on the client side" is a terrible idea, and that's exactly what most GraphQL installations offer.
As a penetration tester though I like GraphQL, it's a very handy data exfiltration engine. By default, there's no authorization, no authentication, and it even comes with introspection. One query and I get your entire schema! This trendy movement will be far worse for data security than SQL injection, unsecured S3 buckets, or misconfigured NoSQL instances.
> As a penetration tester though I like GraphQL, it's a very handy data exfiltration engine. By default, there's no authorization, no authentication, and it even comes with introspection. One query and I get your entire schema! This trendy movement will be far worse for data security than SQL injection, unsecured S3 buckets, or misconfigured NoSQL instances.
All of that has nothing to do with GraphQL though, it's just poor security. You'd have the exact same issue with classic REST, SOAP, or whatever protocol one is using. I imagine the issue is that people are using "convenience" frameworks on top of their GraphQL libraries that do the glue between their databases and GraphQL and these frameworks don't have security as a priority.
Development-wise I'd argue that the fewer endpoints, the more secure, but that doesn't mean an app should have the exact same GraphQL endpoint for both its admin and its user-facing data.
The principle of the least privilege will always apply along with a rigorous security audit, code audit and third party lib audit.
It has a lot to do with GraphQL the ecosystem, though, doesn't it?
SQL injection is a particularly bad problem because concatenating strings is the easy path to getting queries working, and parameterization takes (a tiny bit) more effort.
Similarly, a system such that "by default, there's no authorization, no authentication, and it even comes with introspection" is a system that makes it very easy to do the wrong thing. And the default settings of GraphQL generators tend to be even worse than that.
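To illustrate why concatenation is the dangerous easy path - a SQLite sketch with a made-up table, showing the classic tautology injection against the concatenated query and the one-placeholder fix:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 1), ('bob', 0)")

hostile = "' OR '1'='1"

# the "easy path": string concatenation -- injectable
leaked = conn.execute(
    "SELECT name FROM users WHERE name = '" + hostile + "'"
).fetchall()
print(len(leaked))  # 2 -- the tautology matched every row

# the safe path costs exactly one placeholder
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()
print(len(safe))  # 0 -- the hostile string is treated as plain data
```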
> SQL injection is a particularly bad problem because concatenating strings is the easy path to getting queries working, and parameterization takes (a tiny bit) more effort.
Also there was a rocky period in the 90s where support for parameters was shaky or slower, which meant entire communities (PHP, ASP, even Java) established a fair amount of shared practice not doing that. Like PHP’s decision to ignore errors and register variables globally, it took many years to get that fixed in tutorials, open source code, bad API design, etc.
Yeah my first encounter with GraphQL was a client going "let's save development time by using graphql!" so I looked into it and it's more fucking work unless your stack has one of those auto-GraphQL-to-and-from-DB generators (they'd made a poor and somewhat obscure choice of DB for the project, over our advice, so there wasn't such a thing in this case anyway) and it happens to suit your needs very well—and even with the benefit of one of those I'd still lose sleep over security or DoS bugs lurking in that 3rd party code. Or in mine.
The cases where GraphQL make sense are the ones where the front end is making tons of intricate, complicated calls. If it’s a simple REST api you’re interacting with it probably doesn’t need GraphQL and might actually be harmed by it.
GraphQL is only interesting for flexibility to fast prototyping/changes on the client side, to me. And it should stay there. Meaning your client JS should have a data layer that can understand GraphQL and translate that to a REST API call and cache the result.
The only extra work compared to REST is writing your query and mutation input and output types. But since any serious REST API nowadays includes Swagger, it's actually a wash - if not worse for REST.
So in terms of how much work it is, it's all the same: on the server, each REST endpoint becomes a gql resolver.
As for auto-generation: yes, gql maps great to a relational model, so you can save a lot of effort going that route. But you know, there's auto-generation for REST setups too.
- Swagger is not the only API Spec framework. I think rather than serious REST APIs having Swagger, good APIs have (auto-generated) client libs.
- in REST, you have no default way of querying data related to the current path. Of course, you can use frameworks for that (filtering, hydrating references, etc.). In GraphQL, you have entry queries and the rest is graph traversal, and that is basically the default (and of course, it does come with its disadvantages)
- on auto-generated schema: the interesting thing here is not whether you have auto-generated schemas etc. at all, but how well the data relations can be surfaced, explored and accessed on the API level. GraphQL enabled this in a different way compared to REST, and I would dare say it is a more delightful way.
As a client, given a Swagger API Documentation and a GraphQL Schema, I would prefer exploring the GraphQL Schema, just to find out what the API is about.
I agree with this. Working with protobuf and generating the TS/PB files for the client/browser on a Go project felt like this. With some better tooling around that I think it would be a really slick development environment.
While I have also seen many poorly secured GraphQL APIs, this seems like an unfair criticism. To me GraphQL is mostly an alternative to REST, which also has no authorization or authentication by default; that is an orthogonal concern, and you can hardly blame REST for it.
I'm building a service for finding a gym buddy (on hold now) and leveraging exposed endpoints has been great for getting gym locations for various chains. Much better than screen scraping.
Introspection and query building make client APIs easy and quick to develop. In production deployments, you can limit introspection and use persisted queries to ensure known query loads and characteristics. This lets you iterate a lot faster on getting the right set of APIs vs trying to do this type of evolution with REST endpoints.
Part of the point is also to weave a cohesive tree of information from possibly diverse data sources, and it’s incredibly good for that.
Actually GraphQL was the thing that made me reconsider if running queries on the client-side was a bad idea or not, which was one of the reasons I started seriously experimenting with client-side SQL.
I absolutely agree that you gotta be extremely careful about security though. And in my experience on large enough teams someone's going to mess that up at some point, so that's a big risk in any client-side querying mechanism.
At least AWS Amplify/AppSync offers authorization schemes down to fields. I had the impression that was the norm, but good to know. I will pitch that in the future.
"Run any query on the client side" with no authorization, authentication or introspection is exactly what SPARQL public endpoints offer. It's a feature, not a bug.
So swagger is bad because "one request and I get your entire schema"?
> "run any query on the client side" is a terrible idea
that's a big mischaracterization. A typed query language that lets you tailor your queries a la carte doesn't remotely mean the queries "run on the client side".
No, because in Swagger the first step is carefully designing exactly what the inputs and outputs of each and every API call are. With most GraphQL servers you simply point one at a database and it automatically makes your entire schema, including tables and fields that shouldn't be exposed to end users, readily available.
> doesn't remotely mean the queries "run on the client side".
Sorry, that was poorly written. Obviously the queries are not running on the client side. My point was that the client can provide any query they want and the system will process it.
> Most GraphQL servers you simply point it at a database and it automatically makes your entire schema, including tables and fields that shouldn't be exposed to end users, readily available.
You're arguing against auto-generated gql setups, not gql. And it's not even accurate. Anyone that makes that mistake in gql would make it in REST, SOAP or what have you.
Auto-generated REST setups are aplenty too, and lately auto-gen tools offer both gql + rest. PostgREST/PostGraphile. Hasura does both too.
> it even comes with introspection. One query and I get your entire schema! This trendy movement will be far worse for data security than SQL injection
Isn't this also true of SQL injection? Sure, you might need to run multiple queries to retrieve multiple schemas. But SQL injection is just a way to run arbitrary SQL commands, and you can do a lot with those.
You can avoid this by allowing only specific queries in staging and production environments; that way, the introspection query can be disabled too.
Not sure if this violates the GraphQL spec though, because, ideally the server supports introspection, but it is disabled in some cases for safety reasons.
But if you're limiting clients to a whitelist of queries, what's the difference vs a standard API? If you're only allowing certain queries, you might as well have a REST endpoint with the query defined on the server. Get your JavaScript devs to write a Node service (or equivalent) with the query logic inside it; that way the query logic doesn't need to be replicated per client.
This is where GraphQL shines. You give your developers a schema and "they" choose the queries and mutations that make their lives easy. Also, there's very little friction because the GraphQL schema can be completely documented.
And when going to prod, you just allow the queries that are being used and disable the rest.
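A sketch of that allowlist idea - the registry, query names, and query text here are all hypothetical; the point is that clients send an ID plus variables, never raw query text, and anything unregistered is rejected outright:

```python
# hypothetical persisted-query registry, built at deploy time from the
# queries the client apps actually ship with
PERSISTED = {
    "userById": "query ($id: ID!) { user(id: $id) { name } }",
    "recentPosts": "query { posts(last: 10) { title } }",
}

def resolve(query_id, variables):
    query = PERSISTED.get(query_id)
    if query is None:
        raise PermissionError(f"unknown query id: {query_id}")
    # hand the known-good text to the real GraphQL executor here
    return query, variables

text, vars_ = resolve("userById", {"id": "42"})
print(text.startswith("query"))  # True
```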
Yep, you definitely need to implement user authz for _every_ resolver independently and safelist/allowlist only those operations you'll use. You'd probably have to disable introspection in production too.
Out of curiosity, could you name the third party security firm?
I didn't work on the security audit implementation though. I actually forgot what security firm we used, but it was local in our area.
They provided a document detailing all the security exploits they found though.
I don't recall exactly how this is done in GraphQL, but I believe we used the context object and made a request to our database to find the user's role. GraphQL resolvers have 4 arguments; the 4th one specifies the datagraph payload coming in. I think we blacklisted everything and then whitelisted fields depending on what the user requested and their corresponding role.
I responded in a twitter thread[1], unrolled here:
As with all good things in life, and in programming, this is a tradeoff. GraphQL is better when what you are requesting is best expressed as a tree (or a "graph", though only the DAG variety). This is not always the case, but it very often is when building APIs for production use.
Of course, you can express tree structures in table form, but it is not very convenient for clients to consume. In particular if your client is rendering nested component views, what you want is very often something hierarchical.
Another aspect of GraphQL that is better for us production people is that the performance is more predictable, exactly because the language is more restricted. You can't just join in all the things, or select billions of rows by accident. The schema dictates what is allowed.
Of course, again, it is possible to restrict this in SQL - just configure your schemas, limits etc. appropriately - but SQL is anything-goes by default, whereas GraphQL is nothing-allowed by default. Whitelist vs blacklist.
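SQLite, for one, can be flipped into whitelist mode with an authorizer callback: deny every action, then allow only reads of specific tables. A sketch (the tables and the allowlist are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE public_stats (day TEXT, visits INTEGER);
    CREATE TABLE users (id INTEGER, password_hash TEXT);
    INSERT INTO public_stats VALUES ('2020-01-01', 7);
""")

ALLOWED_TABLES = {"public_stats"}

def authorizer(action, arg1, arg2, dbname, source):
    # whitelist: plain SELECTs, and column reads on allowed tables only
    if action == sqlite3.SQLITE_SELECT:
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ and arg1 in ALLOWED_TABLES:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY

conn.set_authorizer(authorizer)

ok = conn.execute("SELECT visits FROM public_stats").fetchall()
print(ok)  # [(7,)]

try:
    conn.execute("SELECT password_hash FROM users")
    denied = False
except sqlite3.DatabaseError:
    denied = True
print(denied)  # True
```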
This said, as a language, SQL is clearly superior. It is the most (only?) successful 4GL (declarative) language. I wish more languages were this well-designed, and that there would be more language innovation in this direction.
The way I see it, GraphQL is a DSL for flexibly requesting hierarchical data from APIs in JSON format, optimized for complex evolving APIs. SQL is a full-fledged generic language for relational data transformation. They have different niches, but SQL has a much bigger one.
> a DSL for flexibly requesting hierarchical data from APIs in JSON format, optimized for complex evolving APIs
Perhaps the same goal could be achieved with more flexibility by issuing CONSTRUCT queries to a SPARQL endpoint (CONSTRUCT requests a custom RDF graph as opposed to a set of variable bindings ala SELECT) and obtaining results in JSON-LD format.
Exactly. I was always interested in all this semantic web stuff, and as I have time now, I'm experimenting with Apache Jena. It's super neat! SPARQL endpoints use HTTP by default - "API for free", so to say. Plus you can even use a binary format for receiving the data. Querying the database is three lines of code in Kotlin. Fuseki (the SPARQL server in Jena) also supports crazy stuff like "recursively traverse this graph by following the specified type of link until you can't go any further". I'm pretty sure this is not possible in GraphQL.
> You can't just join in all the things, or select billions of rows by accident.
How does GraphQL prevent you from selecting billions of things? I'm not too familiar, but if the answer is something like "you can limit to N results", SQL does have a LIMIT clause, so I'm unsure what is being meaningfully distinguished.
A good implementation will calculate the estimated "cost" of a given query before executing it. This is quite tricky to do well, and I doubt many people do it. Once you know the cost of a given query, it's trivial to reject it if it's too expensive.
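A deliberately naive sketch of such a cost guard - real implementations parse the query and weight each field, but even a brace-depth budget (the budget of 5 here is arbitrary) stops the worst offenders:

```python
# naive cost guard: reject queries nested deeper than a budget
def max_depth(query: str) -> int:
    """Count the deepest level of { } nesting in a query string."""
    depth = deepest = 0
    for ch in query:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
    return deepest

def check(query: str, budget: int = 5) -> bool:
    if max_depth(query) > budget:
        raise ValueError("query too deep")
    return True

print(check("{ user { name } }"))  # True

deep = "{ a " * 10 + "}" * 10  # hostile: 10 levels of nesting
try:
    check(deep)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```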
A much better alternative is to only allow specific queries in production. These persisted queries can then be assigned an ID, and making a request is a matter of sending the ID with any variables.
It's very different. The main difference is that you define the capabilities on the back end through your GraphQL schema once, then clients generate the query ids through an automated process in development.
In the old way, someone would have to write backend code to build and maintain each individual endpoint. With GraphQL, you only maintain the schema on the backend.
The author makes a case for just sending SQL queries from the frontend. For public APIs you can probably make that case, but for a normal application it doesn't make much sense: you only have a fixed set of queries to run, so why take on the hassle and huge exploit surface of validating queries from the frontend? Once you realize this, you'll just give every query some identifier and send the identifier plus some parameters instead of the query. Et voilà, you have created RPC.
Because the "frontend" might be a native application that can't be updated on the fly and would break when the query changes out from under it.
Or maybe it's a web app anyway but people have the web app open while you're deploying the new version, and their old instance of the web app starts getting responses back it doesn't understand.
You'd have to do something like change the identifier (or add a version number to it) every time you modify the query, and retain all of the old query versions as well.
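Sketched out, that's just a registry keyed by (identifier, version) - the names and SQL here are invented - where old versions stick around until no deployed client can still send them:

```python
# hypothetical versioned query registry: old clients keep resolving the
# exact query text they were built against
QUERIES = {
    ("recent_orders", 1): "SELECT id, total FROM orders ORDER BY id DESC LIMIT ?",
    # v2 renamed a column; v1 must be retained until no client sends it
    ("recent_orders", 2): "SELECT id, grand_total FROM orders ORDER BY id DESC LIMIT ?",
}

def lookup(name, version):
    try:
        return QUERIES[(name, version)]
    except KeyError:
        raise LookupError(f"unknown query {name} v{version}") from None

print("total" in lookup("recent_orders", 1))        # True
print("grand_total" in lookup("recent_orders", 2))  # True
```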
The argument the author is making is that you expose a carefully designed read-only view, not your whole database, so it doesn't really matter what queries the frontend sends as long as they stay within your resource quota. No validation necessary. But of course as with the identifier approach, you can't remove or change columns in your view if you have any clients querying it that aren't guaranteed to receive live updates to their code. But in a sense that's just like GraphQL, where you have to be careful to keep all of your changes backwards-compatible if you don't want to break older clients.
> Because the "frontend" might be a native application that can't be updated on the fly and would break when the query changes out from under it.
From my experience that would be an extremely naive decision to make: change the underlying behavior of an API endpoint with disregard for its impact on client apps.
Backwards compatibility and API versioning strategies can easily handle this kind of situation.
> From my experience that would be an extremely naive decision to make: change the underlying behavior of an API endpoint with disregard for its impact on client apps.
Which is precisely one of the reasons GraphQL exists, so you can change the data queryable in your API without changing the data existing clients receive. If we're talking about moving from GraphQL to SQL then we shouldn't be throwing away this capability.
What's more, if we just offer a set of pre-baked queries, there's no way to measure how many clients are actually using a given column in the query. If we're the developer of both the frontend and backend then we can try to keep track of it internally, and rev the identifiers with each change so we can determine when a given query is no longer being used by any clients. But this requires a lot of bookkeeping internally (e.g. your mobile team bothers to let you know that they're no longer using a given column) and doesn't work at all if you're vending this API to third parties.
If you vend a view, then you could track which columns of your view are actually being used by queries, and once a given column is no longer being used by anyone you can remove it (e.g. because you added a new column that obsoletes it). Though there's no formal way to deprecate the column in order to move people off of it, so that's a step backwards from GraphQL.
> Which is precisely one of the reasons GraphQL exists, so you can change the data queryable in your API without changing the data existing clients receive. If we're talking about moving from GraphQL to SQL then we shouldn't be throwing away this capability.
If you have many different clients consuming the API then yes, it makes sense.
But if you have only one client (usually a modern frontend or a mobile app) and you can deploy client+backend at the same time, then you don't need that anymore.
You can't deploy a mobile app at the same time as your backend. Even if publishing a mobile app update was instantaneous, it takes time for people to update, and some people never will.
> Backwards compatibility and API versioning strategies can easily handle this kind of situation.
From experience using Wildcard API (a super easy RPC implementation [1]), you usually don't need to do API versioning at all. Instead you always deploy frontend and backend at the same time.
If you have more than one client consuming the API this can get tricky though and then API versioning is necessary.
Furthermore, your "view models", whether they are displayed as a JSON payload or transformed through server-side HTML templating, are not and should not be your "database models". This is, to me, where just using SQL from the front-end doesn't really make sense.
There's a difference if the client developers can register stored procedures. Typically, you can't do that with RPC, and it's a big value-add of GraphQL.
If that happens, then SQL and GraphQL look a lot alike. SQL would be more powerful, which also means it'd probably be somewhat harder to implement - though there are so many existing SQL engines that I can't imagine people implementing one from scratch.
I'm surprised that nobody has mentioned that this is exactly how Facebook's API used to work, before the era of the "Graph API". Your FQL queries ran against what appeared to be some sort of stripped-down MySQL instance.
I used it. It was okay. I tended to prefer the boring REST api because it was simpler.
From an API provider perspective, I much prefer REST or RPC apis just because it's far easier to get the security and performance implications right. "Less powerful" is a feature.
I've spent the last couple of years building a graphql api around my employer's platform.
When it comes to graphql mutations vs. sql writes, the things that can go wrong are ugly and hard to avoid on both.
But for read queries the graphql api is a lot safer. I can think of a few read queries I could make that could knock the db server over, but those are bugs I'm working on fixing. With sql queries it's much easier to crush the server, and I've done it way too often just by accident. I would much rather let Dave in accounting have access to graphql than to raw sql read statements. There's just a lot less damage he can do.
And the fact is that Dave in accounting has been using the graphql api via GraphiQL, getting great results, and taking a load off of the developers. It's much easier for him to learn than sql. The whole account management staff is gradually climbing on board. They can do things in minutes that would previously have been projects that crossed departments and took weeks. Some clients are using it directly too.
We couldn't have seriously considered opening up sql queries like that, even to replica databases.
Some graphql queries don't even hit the database. For example, it might reach out to a vendor endpoint to fetch data from them.
Some of our graphql mutations touch many tables, and do things besides update tables, like refreshing cache. Having used SQL for 20+ years, graphql sure feels like a higher level abstraction, especially when you want JSON hierarchical data from several tables.
There are many utilities which use SQL for querying logs, filesystems, processes ...
GraphQL could have based its language on SQL, but hierarchical queries are clumsy. For example our golang datamapper query
    sql, args := JSQL("SELECT b, c").
        Many("f", `SELECT g, h FROM f WHERE id = $1`, 4).
        Many("x", `SELECT id, y, z FROM x`).
        From("a").
        Where("id = $1", 4).
        ToSQL()
converts to this SQL which returns JSON
    expected := `
        SELECT row_to_json(dat__item.*)
        FROM (
            SELECT
                b,
                c,
                (SELECT array_agg(dat__f.*) FROM (SELECT g, h FROM f WHERE id = $1) AS dat__f) AS "f",
                (SELECT array_agg(dat__x.*) FROM (SELECT id, y, z FROM x) AS dat__x) AS "x"
            FROM a
            WHERE id = $2
        ) AS dat__item
    `
The equivalent GraphQL query is certainly more elegant.
> Some graphql queries don't even hit the database.
Except in the sense that the SQL engine is itself part of a database server, that's easily true of SQL queries in an engine that employs FDWs (or, potentially, even stored procs, depending on what I/O facilities are available to them), which can also reach out to a vendor endpoint and fetch data transparently to the client.
SQL is designed for rows of flat data. GraphQL is designed for nested data. It's two very different use cases. Trying to say one is better than the other completely misses the point.
That is no longer exclusively the case. SQL is used in plenty of places to query nested and semi-structured data. In fact I use it on a daily basis. Eg: avro data in/from Kafka, "json" data from MongoDB.
Do you think any GraphQL api backed by a row based data store is a mistake and should be some kind of SQL api instead? The data could be represented either way so I don't really think it matters.
No. You miss the point. SQL has proven robust over a broad set of types of data. People have tried to come up with new systems for objects (OODB), JSON documents (Mongo), and so on. All were disasters. It turned out SQL was the answer all along.
It turns out a generic tool which works well over many use-cases beats a tool which only works in one. SQL seems to be that tool.
It's possible to do nested data with SQL too.
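Right - and even without JSON functions in the database, folding a flat join result into the tree a UI wants is only a few lines. A sketch with invented `person`/`child` tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child  (parent_id INTEGER, name TEXT);
    INSERT INTO person VALUES (1, 'Pat');
    INSERT INTO child VALUES (1, 'Sam'), (1, 'Alex');
""")

# one flat join, then fold the rows into the nested shape a UI expects
rows = conn.execute("""
    SELECT p.id, p.name, c.name
    FROM person p LEFT JOIN child c ON c.parent_id = p.id
    ORDER BY p.id, c.name
""").fetchall()

people = {}
for pid, pname, cname in rows:
    node = people.setdefault(pid, {"name": pname, "children": []})
    if cname is not None:
        node["children"].append(cname)

print(people[1])  # {'name': 'Pat', 'children': ['Alex', 'Sam']}
```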
I'll believe GraphQL is better when I see a case when it works better. So far, the only cases I've seen were from people who didn't fully grok SQL.
I'll mention GraphQL has a completely separate use-case as a replacement for RESTful APIs, where it's not really acting so much as a query language as an API layer. I think it might have some uses there, but as nedbat points out in the thread, it'd be possible to use SQL as a replacement for RESTful APIs too, and it'd probably do better than GraphQL (before you start screaming about security, read nedbat's posts; he is definitely NOT talking about a thin layer over a database).
> No. You miss the point. SQL has proven robust over a broad set of types of data. People have tried to come up with new systems for objects (OODB), JSON documents (Mongo), and so on. All were disasters. It turned out SQL was the answer all along.
Document DBs have failed in the general case because most data is relational. OODB is still alive and well; Postgres implements some OODB features, and I would argue graph databases are an implementation of an OODB.
I also think you're lumping SQL and RDBMS together into one term. It's entirely possible to have an RDBMS that does not use SQL; likewise it's possible to have a system that is not an RDBMS that does use SQL. There are addons for MongoDB that let you speak SQL to it.
> I'll believe GraphQL is better when I see a case when it works better. So far, the only cases I've seen were from people who didn't fully grok SQL.
Most of the advantages come from introducing an intermediary data broker (the GraphQL server). As a query language, I do find GraphQL drastically simpler than SQL. We can compare queries for a person, their parents, and their children. I can guarantee the GraphQL query is shorter and more legible. That comes at the cost of flexibility, which I wholly acknowledge.
> I'll mention GraphQL has a completely separate use-case as a replacement for RESTful APIs, where it's not really acting so much as a query language as an API layer. I think it might have some uses there, but as nedbat points out in the thread, it'd be possible to use SQL as a replacement for RESTful APIs too, and it'd probably do better than GraphQL (before you start screaming about security, read nedbat's posts; he is definitely NOT talking about a thin layer over a database).
This is the only use case that I'm aware of. There is dgraph, which is a graph database that speaks in GraphQL natively, but they've basically just integrated the API server into the database itself (which I'm not arguing for or against, I just think it's still basically the same stack).
I don't find GraphQL to be easier to learn than SQL. But SQL is a much more helpful and generalizable skill.
That's second-system syndrome. Once you tack on all the stuff needed for something "simple" to be useful, it gets as complex as the original. Only all the "SQL replacements" do less, and do it less well.
If you want simple, RESTful APIs are a way to go.
(And no, your average analyst doesn't know GraphQL either -- and if you want to go down that learning curve, might as well learn how to do SQL properly in these contexts instead)
I think you'll find that SQL is one of the most widely understood languages. Current programmer education deemphasizes it, but in basically any analytics space, SQL is the lingua franca.
"Understood" is a strong word. There are lots of people with a basic understanding of SQL but who can't write complex queries to do things like represent tree structures and cyclic relationships. I don't think your average analyst can do that.
And even if all the analysts in the world could do such a thing, that doesn't matter if they're not building the applications.
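To make "representing tree structures" concrete, this is the kind of query in question: a recursive CTE walking a toy org chart (table and data invented for the example, SQLite in memory):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES
  (1, 'Ada', NULL),   -- root of the tree
  (2, 'Grace', 1),
  (3, 'Alan', 1),
  (4, 'Edsger', 2);
""")

# Walk the whole reporting tree from the root, tracking depth.
rows = conn.execute("""
WITH RECURSIVE tree(id, name, depth) AS (
  SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, t.depth + 1
  FROM employees e JOIN tree t ON e.manager_id = t.id
)
SELECT name, depth FROM tree ORDER BY depth, name;
""").fetchall()

for name, depth in rows:
    print("  " * depth + name)
```

It works, but `WITH RECURSIVE` is exactly the sort of construct that separates "knows some SQL" from "can write complex queries."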
I've written difficult and complex SQL over a 20+ year period, along with taking a deep dive into relational theory via Chris Date. I'd say I fully grok SQL; well, I'm still learning, as are the majority of us who are not Chris Date. It's a vast and complex topic, enough that there are experts who specialize in it for an entire career. GraphQL is great because it's far simpler. I learned how to use it on the front-end in a matter of hours, and learned how to implement it on the backend in a few weeks. It's a different level of abstraction, and shields our front-end devs from many underlying complexities, including but not just the SQL.
I do see your point that for many applications you want the result in a tree, not in a list of rows. I think there should be a relational query language that lets you map from one schema to another schema (or potentially a subset of an existing schema), allowing the result to be another graph (or database, in the parlance of RDBMSs).
UIs are built in terms of trees, so it's easier to map a tree of result data into UI components than a flat SQL result set. The query is also much clearer and more concise using a query language closer to pattern matching / query-by-example rather than a complex SQL join.
I've been playing with some code to generate SQL from a simple JSON query-by-example format, and the resulting SQL query is 7x as long and would be next to impossible to write and compose by hand.
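As a rough sketch of the idea (my own toy compiler, not the commenter's code; the `<child>.<parent>_id` foreign-key convention and table names are invented), a nested query-by-example shape might compile to a join like this:

```python
def compile_qbe(table, shape):
    """Compile a nested query-by-example dict into a single SQL join.

    `shape` maps scalar fields to True and nested relations to sub-dicts.
    Foreign keys are assumed to follow the `<child>.<parent>_id` convention.
    """
    cols, joins = [], []

    def walk(tbl, spec):
        for key, val in spec.items():
            if val is True:
                cols.append(f"{tbl}.{key}")
            else:  # nested relation -> join on the assumed FK convention
                joins.append(f"LEFT JOIN {key} ON {key}.{tbl}_id = {tbl}.id")
                walk(key, val)

    walk(table, shape)
    return f"SELECT {', '.join(cols)}\nFROM {table}\n" + "\n".join(joins)

sql = compile_qbe("invoice", {
    "number": True,
    "total": True,
    "line_item": {"sku": True, "qty": True},
})
print(sql)
```

Even this tiny shape expands into a multi-line join; add filters, ordering, and deeper nesting and the 7x blow-up the commenter describes is easy to believe.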
This is one of the most convincing arguments I've seen for GraphQL as a language for supporting UIs: that UIs map to trees, and trees aren't nearly as clean in SQL as they are in GraphQL.
It looks to me like people who disregard GraphQL or frontend frameworks think of the frontend as a place where data is just displayed, not as a place where complex user interactions happen.
Take the Web's DOM as a prime example, but I think all major graphical user interfaces are primarily organized as a tree of nested components.
> How so more than the RESTful JSON responses traditionally used?
For generating a flat JSON object with scalar values there's a natural mapping from the single row tabular result set to a JSON object. Add an array property from a one-to-many join and you start to notice the impedance mismatch. Do you live with duplicating the scalar properties in every row of the result set or make multiple queries? The more deeply nested the JSON result the worse the problem becomes.
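The duplication is easy to demonstrate with a toy one-to-many schema (invented for the example, SQLite in memory):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, bio TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
INSERT INTO authors VALUES (1, 'Sam', 'writes a lot');
INSERT INTO posts VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 1, 'third');
""")

# One author, three posts: the scalar columns (name, bio) come back
# duplicated in every row of the flat result set.
rows = conn.execute("""
SELECT a.name, a.bio, p.title
FROM authors a JOIN posts p ON p.author_id = a.id
""").fetchall()

for r in rows:
    print(r)
# The author's name and bio repeat in all three rows; the client has
# to regroup this into {"name": ..., "bio": ..., "posts": [...]} itself.
```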
> We shouldn't use SQL at all?
SQL is fine for producing tabular results but its ubiquity means we often end up using it in situations where it's an awkward fit. Relational database are very robust and suitable for many data models so it's often a reasonable tradeoff to make.
Unless this has changed (I haven't looked at GraphQL in a while), the killer SQL feature is a formal filtering syntax (the WHERE clause). GraphQL only deals with projection (the SELECT clause). Of course you can add your own home-grown thing; if you do that, why use anything standardized (e.g. GraphQL) to begin with?
this is a ridiculous position. nothing substantive in the thread aside from "it's semi-possible to hack up a horrible subset of sql into some sort of api query language" -- with absolutely no reason why you'd want to do this ... (aside from maybe "familiarity" as an implicit reason)
if you want to directly expose your database -- you can use an abstraction that publishes a graphql api from your database schema. this is also suboptimal for a lot of reasons but it's very much a viable way to achieve the "same thing" as exposing a client side "sql as query language" interface would ...
Personally I see GraphQL mostly as a replacement for HATEOAS and all that stuff. It has a spec, so there is absolutely no debate as to how to do things, whether one should use PUT/POST or PATCH or whatever, where to put hyperlinks, what format to use, all of that is a waste of time and I hate being told that I'm doing REST wrong. There is no "doing GraphQL wrong", or your GraphQL parser is broken.
It doesn't solve anything else. Sure, one could use SQL instead, why not?
I tried going through GraphQL but found it quite hard to grok. Plus, GraphQL is a different language that needs more plugins to then translate into SQL. It's really hard to beat the expressivity of SQL.
I do hate writing SQL though, you miss a comma and it barfs at you. The error messages aren't very helpful either.
I'm working on a web MySQL admin called BoomAdmin (like phpMyAdmin but built for the 21st century). https://github.com/nojvek/boomadmin (open source)
The core difference is there is a SQL block-builder UI. It's 10X faster than writing SQL by hand since you're mostly just clicking buttons and selecting fields. The front-end sends a JSON AST to /api/select, which then compiles it to raw SQL. It's a really nice experience.
The vision is to build a really nice editing, querying & visualization experience for SQL-ish backends. There are now NoSQL databases like Couchbase that support SQL via https://www.couchbase.com/products/n1ql
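BoomAdmin's actual AST format isn't shown in the thread, but the general shape of such an /api/select compiler might look like this (AST fields, operator names, and tables all invented for illustration):

```python
# Minimal sketch: compile a JSON "select block" AST to parameterized SQL.
OPS = {"eq": "=", "lt": "<", "gt": ">", "like": "LIKE"}

def compile_select(ast):
    # Filters are [column, operator, value] triples; values become
    # bound parameters so client input never lands in the SQL string.
    where = " AND ".join(
        f"{col} {OPS[op]} ?" for col, op, _ in ast.get("filters", [])
    )
    params = [val for _, _, val in ast.get("filters", [])]
    sql = f"SELECT {', '.join(ast['fields'])} FROM {ast['table']}"
    if where:
        sql += f" WHERE {where}"
    if "limit" in ast:
        sql += f" LIMIT {int(ast['limit'])}"
    return sql, params

sql, params = compile_select({
    "table": "orders",
    "fields": ["id", "total"],
    "filters": [["total", "gt", 100], ["status", "eq", "paid"]],
    "limit": 10,
})
print(sql)     # SELECT id, total FROM orders WHERE total > ? AND status = ? LIMIT 10
print(params)  # [100, 'paid']
```

A real version would also have to whitelist table and column names, since those can't be bound as parameters.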
I basically use JSON based RPC for web type client and binary based RPC for native clients in my servers.
My servers mostly expose higher-level functionality (for example calculateSalesDistribution with parameters and a return structure). I see no use in the mental contortions required to map it to anything resembling GraphQL or SQL or WhateverQL. This works for more standard business-like APIs just as well.
What tools have people used to build servers that speak SQL? Where SQL becomes an API protocol like GraphQL, whether it talks to a database or not doesn't matter.
I worked for a company (startup) that built a unique engine. It was SQL on top of anything - literally.
It fell into the domain of "data federation" or "data virtualization" - something that is quite overlooked by the new generation of tools such as GraphQL - but they are starting to realize that federation/virtualization of data is exactly what these tools provide. One need look no further than Apollo GraphQL to find language showing they are starting to understand it better.
Anyway, Calcite is great for building a SQL engine (parser, lexer, planner, optimizer) because it's a framework independent of any backend system and storage engine.
That said, other people have taken the approach of using any open source SQL database, including Postgres, Apache Derby, SQLite - and basically tearing out the SQL engine, wire protocols, JDBC drivers, and removing the storage engine piece and making that pluggable.
As some others in the post have mentioned, SQL/MED is a feature built in to a number of databases, but if you really wanted just the SQL engine without the rest of the DBMS, the approach above is the one I have seen - deconstruct the database into just the SQL engine and just use that.
These days you can also try to use things like Apache Spark as a programming framework to build such an abstraction as well (with dataframes and SQL over those).
We instinctively think this is a bad idea, because we are trained to have a single big database, the source of all truth, that must be guarded and accessed only by a vetted and secure backend application.
What if this wasn't true? What if databases were cheap to create and modify? Could you have a database for every user of your application? How would that change the architecture?
Overall I am trying to answer those questions with https://simplesql.redbeardlab.com
An HTTP API that allows you to manage SQLite databases. These databases are very cheap to create, use, and modify.
The problem is that most people conflate "SQL" the query language with the RDBMS underneath it. So the real problem is how to constrain SQL as a query language so it doesn't allow you to DROP tables.
So, most app will benefit from:
CLIENT(SQL) -> RDBMS(SQL)
but the norm is instead:
RDBMS(SQL)
and that is the problem.
So, in typical fashion for our industry, we build an impedance-mismatched solution!
CLIENT(GRAPHQL) -> CLIENT(SQL) -> RDBMS(SQL)
yep, more logical.
P.S.: And seriously, the SQL folks are also partly to blame. Somehow, along the line, the fact that SQL is a query language made for humans/developers has been ignored, and nobody has made a better version of it. For example, SQL can't be composed except with string mangling.
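For what it's worth, some engines do let you constrain what client SQL may do. SQLite, for instance, has an authorizer hook; a read-only gate (my sketch, not anything from the thread) might look like:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x)")
conn.execute("INSERT INTO t VALUES (1)")

# Allow only read-style actions; everything else (DROP, DELETE, ...) is denied.
READ_ONLY_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ,
                     sqlite3.SQLITE_FUNCTION}

def read_only(action, arg1, arg2, db_name, source):
    return (sqlite3.SQLITE_OK if action in READ_ONLY_ACTIONS
            else sqlite3.SQLITE_DENY)

conn.set_authorizer(read_only)

print(conn.execute("SELECT x FROM t").fetchall())  # reads still work

try:
    conn.execute("DROP TABLE t")
except sqlite3.DatabaseError as e:
    print("blocked:", e)  # denied by the authorizer
```

That's the CLIENT(SQL) -> RDBMS(SQL) arrangement: the query language stays SQL, but the engine enforces what the client is allowed to do.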
Reading the original notes for SQL, I remember reading it was designed as a human readable query lang. I think you have to remember the context of the time.
At the time (the late '60s and early '70s), the alternative to SQL was writing your own imperative implementation of a query through a hierarchical database. Kinda like if you had to implement traversal through an XML document every time you had a new query. In some ways, that's what we're doing nowadays writing resolvers with GraphQL servers.
Composition was bolted on later as an afterthought. The requirements we have for a query language are different now, in a different environment.
endpoints.deletePost = async function(postId){
    // Only admins are allowed to remove a post
    if( !this.user.isAdmin ) return {permissionDenied: true};
    await db.query("DELETE FROM posts WHERE id = :postId;", {postId});
    return {deleteSuccess: true};
};
If someone made tools that were simple, flexible, and easy to use for exposing a SQL wrapper around your underlying business logic layer, then I would happily use those tools, probably alongside stored procedures to avoid arbitrary client queries.
As far as I know they don't exist. Things like foreign data wrappers in postgres are a pain because you have to drag in the entirety of postgres and because they're difficult to extend with your own functionality.
It’d be nice if there were something that let you use SQL (or similar) actually on the frontend with some good security. If such a thing exists please let me know
I used PostgREST with React for a few projects a while back. It was... okay. The problem is that without much of an ecosystem around it, there are no established best practices and you have to kind of guess how to organize things. I ended up with a jumbled mess of SQL functions that never really jelled. Maybe I didn't give it the shot it deserved, but Rails feels so much simpler.
Or having multiple frontends, each with different API requirements. Yes, you can still write multiple SQL queries, but the “carefully written SQL views” are not that useful anymore.
Not only for the API, but mostly the backend. Walking graphs is inherently more expensive than walking tables, because you have to avoid cycles. Graphs are only for lazy people who can't remove the up and side links from their data.
It's called normalization for a reason
When I was looking into it the "GraphQL" name was kinda misleading. It's not particularly graph-oriented, and not even especially good at representing graphs in general. Tree-oriented, a bit, I guess, which is a kind of graph. Maybe that's what they're going for.
No idea if this is true, but I think that it's because GraphQL is kind of the successor to FQL, which was used to query Facebook's "social graph," which was a big marketing thing back in those days.
I'm not against using DB objects as an API in small doses, but in defense of GQL what about:
Client side caching
Subscriptions
Authorization & Authentication - you're stuck with whatever the RDBMS provides you here, rather than being able to choose your A&A solution.
Deeply nested data - duplicate data or you do multiple queries
Business logic in DB - complex backwards compatibility views, CASE statements scattered all over the place, ugly ass row level permissions. All good to say it's avoidable or maybe you don't care, but I prefer not to deal with it at all (dealt with db objects as API for several years, wasn't...parra pa pa paaa lovin' it, so to say)
Unless I'm missing something (which I could be, considering the "author" presented nothing more than a one-line bait-ish challenge), this is attempting to compare two different layers of data interface.
The implicit question is what makes GraphQL and SQL different layers?
They both seem to be solving the same problem: allowing clients to query data with joins and projections so they can get exactly the data they want in one query. I could imagine translating GraphQL to SQL and SQL to GraphQL in a relatively straightforward way. Contrast that with attempting to translate SQL into ASM and then ASM back into SQL. This suggests to me they are at about the same level of abstraction.
Adding any layer for clients to fetch data introduces another layer of complexity and if you have graphql purely to surface something that could be queried directly then it is definitely overhead.
Where I see graphql shine is:
1. reducing large payloads to only the data you need
2. combining responses from many different sources into one query - especially when these might all talk different protocols, one sql, one nosql, one http etc
Is graphql the only solution to this? No
Is graphql designed pretty well to make this simple? Yes
> Adding any layer for clients to fetch data introduces another layer of complexity and if you have graphql purely to surface something that could be queried directly then it is definitely overhead.
It's not just overhead, there are a lot of benefits to it as well. I've worked a few places that just have all their applications talk to the same database and have had enormous issues.
Firstly, there is the issue of failing over to secondary databases. You can build it into GraphQL to just have it fail over to secondaries. With raw SQL, each application has to handle that sanely, and the chances that they all do is slim. You could put something like pgbouncer in front of it, but at that point you're still adding that layer of complexity.
You'll also have those people that crush the database with poorly optimized queries. GraphQL lets me choose what queries I allow, so no one can lock the database for an hour doing something insane. SQL doesn't give you a lot of tools for that; especially not when some people are supposed to be allowed to lock the database for an hour and others are not.
GraphQL also knows whether the query is a read or write operation; they're separate in GraphQL. You can sanely route read queries to replicas and pass write queries to the master. This is harder in SQL, and involves parsing the statements that come through. It's still doable, but again, you'll still need another layer before the SQL server to do it (pgbouncer implements this, I think).
I'm not saying GraphQL is the only answer to this, but I have yet to see any implementation of "all the clients just connect to our DB" that worked well.
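On the statement-parsing point above: the crudest possible read/write classifier just looks at the first keyword. This sketch (mine, purely illustrative) shows why real routers need more than that:

```python
READ_KEYWORDS = {"select", "with", "show", "explain", "values"}

def is_read_query(sql: str) -> bool:
    """Naive read/write classifier: look at the first keyword only.
    Real routers (pgpool-II etc.) do far more than this -- for example,
    a WITH ... INSERT data-modifying CTE would fool this function."""
    first = sql.lstrip().lstrip("(").split(None, 1)[0].lower().rstrip(";")
    return first in READ_KEYWORDS

assert is_read_query("SELECT * FROM users")
assert is_read_query("  WITH t AS (SELECT 1) SELECT * FROM t")
assert not is_read_query("UPDATE users SET name = 'x'")
```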
> Firstly, there is the issue of failing over to secondary databases. You can build it into GraphQL to just have it fail over to secondaries. With raw SQL, each application has to handle that sanely, and the chances that they all do is slim. You could put something like pgbouncer in front of it, but at that point you're still adding that layer of complexity.
Not really, you place pgbouncer in front and problem solved, it's also very likely more performant than your glue app would be.
> You'll also have those people that crush the database with poorly optimized queries. GraphQL lets me choose what queries I allow, so no one can lock the database for an hour doing something insane. SQL doesn't give you a lot of tools for that; especially not when some people are supposed to be allowed to lock the database for an hour and others are not.
we are talking about read queries here (in the tweet he mentioned he is not yet convinced about making changes this way); you can have a dedicated instance just for handling that traffic, and you can also set up a timeout for the query. You also have the same chance of getting bad queries when you use an ORM.
> GraphQL also knows whether the query is a read or write operation; they're separate in GraphQL. You can sanely route read queries to replicas and pass write queries to the master. This is harder in SQL, and involves parsing the statements that come through. It's still doable, but again, you'll still need another layer before the SQL server to do it (pgbouncer implements this, I think).
this is not an issue pgpool-II does that as well, although author was talking about select queries.
> You'll also have those people that crush the database with poorly optimized queries. GraphQL lets me choose what queries I allow, so no one can lock the database for an hour doing something insane. SQL doesn't give you a lot of tools for that; especially not when some people are supposed to be allowed to lock the database for an hour and others are not.
You don't need a lot of tools for that, but SQL gives you complete tools for it. (Including, if necessary, strictly limiting allowed queries by only authorizing certain particular roles to access a defined set of sprocs without any access to base tables or even views, which seems to be the direct equivalent of what you are suggesting with GraphQL. This is an extremely common enterprise approach.) Because RBAC is standard with SQL RDBMSs (other than embedded/single-user stores like SQLite), it's trivial to handle “some roles are restricted to pre-cleared access patterns and others have either different pre-cleared patterns or free-query access.”
> GraphQL also knows whether the query is a read or write operation;
So does SQL.
> You can sanely route read queries to replicas and pass write queries to the master.
Replicated SQL setups do this routinely.
> This is harder in SQL, and involves parsing the statements that come through. It's still doable, but again, you'll still need another layer before the SQL server to do it (pgbouncer implements this, I think).
Well, by definition to route requests to different backends you need a server separate from the backends (or, at a minimum, all but one of them, but you want it separately from all of them in practice anyway.)
I don't think SQL is a better API language than GraphQL.
1. First of all, they are designed with different compromises in mind. GraphQL is a much less flexible language than SQL — no ad-hoc joins (joins are predefined in the schema), no ad-hoc filters, ...
Now because GraphQL is a less flexible language, it allows more freedom on the implementor's side. Implementing a GraphQL endpoint is as simple as specifying a set of resolver functions (parent, args) -> result (of course there could be more sophisticated implementations, but the point is that you can start just from that). Now if you want to implement a "virtual table" which would work with SQL — it's much more work.
I'd be very cautious about exposing a SQL-based API (because the language is so expressive). And if I had the task of making SQL less expressive so it would fit, the result would probably look a lot like GraphQL but with a weird syntax.
2. Secondly, SQL is not really as ergonomic as GraphQL for nested data. Even given the modern extensions, it is designed for querying relations, not for nested object structures we like to operate in our JavaScript programs.
3. And third, SQL's grammar isn't as composable as GraphQL's. Look how GraphQL has fragments in the language to support composability. It'd be very weird to do the same with SQL. Or you'd need to extend it in some way, which may be a good option, but I haven't seen such an extension in the wild.
That said, I still think there's some class of apps (internal admin UIs, dashboards, ...) which will benefit from having SQL-level expressiveness. For that I'd like to use something like HTSQL[1] — it's a language with a GraphQL-like grammar which compiles to SQL. HTSQL is quite dated (it's been around for more than 10 years but sadly isn't popular), but the concept is sound and its original authors are now working on its successor — QueryCombinators[2].
Now back to GraphQL: I actually think that the right way to define GraphQL endpoints which query databases is by mapping database types to GraphQL types using a language like HTSQL. We have that approach implemented here[3]. We found it A. much more flexible than doing automatic DB-schema-to-GraphQL reflection and B. much easier than writing resolvers manually. Of course, on app startup those declaratively defined queries are checked against the database schema, so we won't have failures at runtime.
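Going back to the resolver point in 1. above: a hand-rolled toy (not a real GraphQL library; the field names and argument handling are simplified inventions) shows why the resolver model is so little work to implement:

```python
# Each named field is resolved by a function (parent) -> result;
# anything else is read straight off the parent object. A query is
# just a nested dict of field names, and execution walks it.
def execute(selection, parent=None):
    result = {}
    for field, sub in selection.items():
        if field in resolvers:
            value = resolvers[field](parent)
        else:
            value = parent[field]          # plain scalar field
        if sub:                            # nested selection
            if isinstance(value, list):
                value = [execute(sub, v) for v in value]
            else:
                value = execute(sub, value)
        result[field] = value
    return result

resolvers = {
    "person":   lambda parent: {"name": "Ada"},
    "children": lambda parent: [{"name": "Grace"}, {"name": "Alan"}],
}

query = {"person": {"name": {}, "children": {"name": {}}}}
print(execute(query))
# {'person': {'name': 'Ada', 'children': [{'name': 'Grace'}, {'name': 'Alan'}]}}
```

A comparable "virtual table" that answers arbitrary SQL would need a parser, a planner, and join semantics just to get started.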
I don't have a lot of GraphQL experience but I think there are some core issues with using SQL.
Single result-set responses might not be ideal. Multiple result sets might be fine, but I don't think they're standard SQL. (I could be wrong there, though.)
Transaction semantics can't be guaranteed when you're just a proxy to many services. We'd have to change SQL enough that a web SQL api for any service would be an entirely different beast anyway.
It's both sad and ironic how someone replied with "Ok boomer," when SQL is based on relational algebra that dates to the 1960s-70s, and GraphQL is based on graph theory that dates back to the 18th century.
Many from the modern generation of self-taught programmers are so eager to hop on just any kind of the hype train without even questioning the rationale of things.
On the other hand, there are many young programmers who do respect the fundamentals and do create fantastic projects. I guess the future is bright but messy. Most of the time, it is messy.
I've wondered for some time now why most API endpoints even need to exist. The very concept of "glue code" seems to me like a code smell; if you aren't doing any interesting logic, why do you need an extra layer of code there at all? The only answers I've ever been given are authentication and sanitization, both of which seem solvable.
I'm very confused. There are a lot of problems where you are doing nothing but integrating two services. You run a canned query, but things like field names are different, or one service uses GraphQL and the other uses CSV files on some network share. You are just converting between two things. Why is the extra layer of code there? Because it's very simple and less work. You just write 5 lines of queries and 10 lines doing field mapping. The alternative might be to use something like Apache Camel. It might be the right tool for the job, but now everyone involved has to be an Apache Camel expert, even though your team doesn't care about Apache Camel, and that is a perfectly acceptable position. Instead of making dogmatic decisions like avoiding "wasteful glue code" because of your ideology, you just write the code and be done with the problem. The end result is the same, but you get to write code in the tech stack your company has been using for years instead of trying to chase some stupid enterprise architecture fad.
Why focus on the easy problems? Why not solve the hard ones?
No, it is kind of what I'm saying. More hamfisted than I'd like, maybe, but along the right lines.
My point is that an extra layer of in-house code is an extra layer to build and maintain, even if it doesn't do anything meaningful. Ideally it could be eliminated altogether, but automatically generating it still relieves most of the development overhead.
I have only encountered public SQL APIs in the wild once, and at that time it made me very happy. I was building a scraper for a manufacturer of bicycle parts that shall remain unnamed. Scraping the whole database using a single SQL query was way better than hacking XPaths. I am not sure if their DBA appreciated it as much as I did.
GraphQL lets you describe the data requirements of a piece of UI as fragments, co-locate those fragments with your actual UI code, then nest those components and fragments to build up a query that concisely describes the UI without over/under fetching.
select *
from messages
where posted < '2020-01-01' or content like '%Bob%'
It's a little bit constrained by the characters that can appear unescaped in the query component of a URL, which is why it uses '-' instead of '<' for less-than and '/' for 'or'. But there are enough characters to allow a range of operators, and parentheses are allowed for grouping. Also, it's a superset of the usual way that browsers encode forms, so that '=' and '&' work the way you'd expect.
The main problem with "just" REST IMO is that it's only specific at a general level - so things like how to deal with whatever related data should or shouldn't be included are kind of "left as an exercise for the implementer." If REST were basically the same but included some kind of universal specification for managing fields, resources, nested data, and relationships requested/returned, or even allowed a means of going single-endpoint for request reduction, the many variants and alternatives, whether GraphQL, JSON:API, or others, wouldn't exist to the same degree.
As far as I can tell REST has not quite enough structure to allow interesting things. As an API it doesn't allow you to do much more than talk to the thing.
The next step up is something like OpenAPI which is still REST but at least it encodes what result you could expect and how the objects are related to each other.
Another step and you've got GraphQL which doesn't just enforce the schema but actually generates your entire API from it (just the interface though not the implementation). It goes from 'this endpoint returns data conforming to this definition' to 'these are the objects I can return and these are their properties'.
By allowing more flexible queries it is possible to encode information into these objects that is impossible to fit in a single JSON (consider e.g. a simple user-management API that returns for each user a list of all the groups they're in and for each group a list of all its users).
Certainly SQL has broader capability than GraphQL. However, there is something about being able to issue a query that returns nested data (e.g. query for invoices and all their line items) that is really great about GraphQL, and you cannot do the same with SQL.
What I really hate about GraphQL is it forces you to define input types separately from return types - that doubles schema sizes.
Well you can - you can write CTEs and then put all line items in a JSON column or something; but still.
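For what it's worth, the JSON-column approach is quite workable in practice. In SQLite (using the JSON1 functions, which most builds include; schema invented for the example):

```python
import sqlite3, json

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (id INTEGER PRIMARY KEY, number TEXT);
CREATE TABLE line_items (id INTEGER PRIMARY KEY, invoice_id INTEGER,
                         sku TEXT, qty INTEGER);
INSERT INTO invoices VALUES (1, 'INV-001');
INSERT INTO line_items VALUES (1, 1, 'WIDGET', 2), (2, 1, 'GADGET', 5);
""")

# One row per invoice, with its line items nested as a JSON array.
row = conn.execute("""
SELECT i.number,
       json_group_array(json_object('sku', l.sku, 'qty', l.qty)) AS items
FROM invoices i
JOIN line_items l ON l.invoice_id = i.id
GROUP BY i.id;
""").fetchone()

invoice = {"number": row[0], "items": json.loads(row[1])}
print(invoice)
```

PostgreSQL has the same pattern with `json_agg`/`json_build_object`, though as the sibling comments note, none of this is standard across dialects.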
> Certainly SQL has broader capability than GraphQL. However, there is something about being able to issue a query that returns nested data (e.g. query for invoices and all their line items) that is really great about GraphQL and you cannot do the same with SQL.
SQL isn't "a language". It's a family of incompatible dialects. The ISO standard costs hundreds of dollars, and nobody follows it anyway.
Or if there's a specific proprietary dialect we're supposed to compare GraphQL to, then name it. I've used a lot of them, and they all suck in some way or other. I bet we can come up with lots of awful issues with any of them.
Is this cheating? I feel like I'm attacking someone with both hands tied behind their back:
- All GraphQL implementations agree on how many bits an integer has
- All GraphQL implementations agree that null is not a string
- All GraphQL implementations agree that strings are UTF-8 (and that "UTF-8" means actual UTF-8 and not some BMP-only subset)
Interfaces with gratuitous incompatibilities are never good because of them. That's simply a method of vendor lock-in. "Best Viewed With ___" badges weren't indicators of technical quality, either.
I've been playing around with SQL as an API querying language for a couple of years now as part of my Datasette project: https://datasette.readthedocs.io/en/stable/
I started out thinking that SQL for API queries was genuinely a terrible idea. Datasette supported it because it's designed as an ad-hoc querying tool (which happens to be able to export results as JSON), but obviously that was an awful idea for production applications.
Then I started building a few SQL-in-client-side-JavaScript applications, just as a rapid prototyping tool. And they worked amazingly well. Here's an article I wrote about that in January 2018: https://24ways.org/2018/fast-autocomplete-search-for-your-we... (really simple demo here: https://media.24ways.org/2018/willison/ )
I keep on waiting for the obvious downsides to show up, and they keep stubbornly refusing to appear. And meanwhile the rest of the software world is going all-in on GraphQL which seems MUCH harder to work with - and less performant.