
On GraphQL-to-SQL - danielrearden
https://productionreadygraphql.com/blog/2020-05-21-graphql-to-sql/
======
tango12
I'm surprised that there's no mention of the interplay between authorization and
data fetching in the post.

This is one of those things that turns out to be surprisingly nuanced: it's not
just about building the right query to fetch data optimally, but also about
making sure the data is fetched under the authorization context of the current
end-user.

This becomes especially tedious when you're fetching objects and lists of
objects and the "resolvers" that fetch those objects each need to mix in their
own authorization rules.
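
A toy sketch of what that mixing looks like (the data and resolver names here are made up for illustration; this is no particular framework):

```python
# Toy in-memory "database"; the data and resolver names are made up.
ADS = [
    {"id": 1, "owner_id": 10, "title": "bike"},
    {"id": 2, "owner_id": 20, "title": "sofa"},
]
MESSAGES = [
    {"id": 1, "ad_id": 1, "to_user_id": 10, "body": "still available?"},
    {"id": 2, "ad_id": 2, "to_user_id": 20, "body": "price?"},
]

def resolve_ads(current_user_id):
    # Authorization rule written out here...
    return [ad for ad in ADS if ad["owner_id"] == current_user_id]

def resolve_messages(current_user_id):
    # ...and repeated here, and again in every other resolver.
    return [m for m in MESSAGES if m["to_user_id"] == current_user_id]

print(resolve_ads(10))       # ads owned by user 10
print(resolve_messages(10))  # messages addressed to user 10
```

Every new type means another resolver with the same kind of check threaded through it, which is exactly the tedium.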

Hasura[1] tackles this by providing a model-level authorization layer (similar
to Postgres RLS, but at the Hasura/application layer), which makes GraphQL-to-SQL
feasible for real apps: it's not just a translation layer.
Postgraphile does this using Postgres RLS.

Those are the key benefits that make Hasura useful for data-fetching in
practice. Over the last few years we've been working with users with billions
of records per table, especially in enterprise environments. And it's great to
see how smooth things have been.

Especially as the number of tables and models explodes, having a declarative
system for specifying authorization at the model level, with a guarantee that
an effective SQL query will be generated, is a nice thing to have. It lets our
users work with higher-level concepts: model data, set up authorization,
make queries, look at query plans (which now show the full SQL), add
indexes, test, add to an allow-list, done.
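
As a sketch of that declarative idea (the rule format below is invented for illustration and is not Hasura's actual metadata format), a permission can be plain data that one generic piece of code compiles into a WHERE clause:

```python
import sqlite3

# Hypothetical declarative permission rules, one per table. This is NOT
# Hasura's real metadata format, just an illustration of the shape.
PERMISSIONS = {
    "ads": "owner_id = :uid",
}

def select_all(conn, table, current_user_id):
    # One generic function applies the model-level rule for every query,
    # instead of each resolver re-implementing the check.
    where = PERMISSIONS[table]
    return conn.execute(
        f"SELECT id, title FROM {table} WHERE {where}",
        {"uid": current_user_id},
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ads (id INTEGER, owner_id INTEGER, title TEXT)")
conn.executemany("INSERT INTO ads VALUES (?, ?, ?)",
                 [(1, 10, "bike"), (2, 20, "sofa")])
print(select_all(conn, "ads", 10))  # [(1, 'bike')]
```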

[1] - [https://hasura.io](https://hasura.io)

(I'm from Hasura)

~~~
kasbah
What I couldn't find out while trialing Hasura is whether it's possible to
declare your GraphQL API in code rather than through the Hasura web GUI. I
eventually gave up on it because I figured I couldn't store the configuration
I was making as code in a git repo.

Your comment makes me think I was maybe wrong about that? Could you confirm
and/or link to some docs about this aspect?

~~~
tango12
Everything in Hasura is configured declaratively. The web GUI is essentially
just updating that configuration. There are a few options:

1. You use the GUI and then export/import the metadata

2. You use the GUI and, as you do, it keeps incrementally updating the
metadata configuration file and even creates database migrations for you in a
folder on your machine, kind of like an IDE

3. You skip the GUI altogether and just write the configuration by hand

Docs:
[https://hasura.io/docs/1.0/graphql/manual/migrations/index.html](https://hasura.io/docs/1.0/graphql/manual/migrations/index.html)

Also, feel free to ask around on our discord if you need help:
[https://discord.gg/hasura](https://discord.gg/hasura)

~~~
adav
This approach means one’s own deployments of Hasura can’t be immutable?

~~~
tango12
There's a docker image that can take a configuration file as input.

[https://hasura.io/docs/1.0/graphql/manual/migrations/advanced/auto-apply-migrations.html](https://hasura.io/docs/1.0/graphql/manual/migrations/advanced/auto-apply-migrations.html)

You can bring that configuration in via something like a Kubernetes ConfigMap,
or by baking a new image for each release (as you would with your source code,
say).

~~~
adav
Yep, that’s what I’ve been doing for my experiments with Hasura so far :) It
just feels a little “off-piste”.

------
fb03
Shameless Plug here:

I have developed a Python library that does essentially this GraphQL-to-SQL
pattern, leveraging SQLAlchemy and some metaprogramming to create the 4 CRUD
methods for each table automatically, all with filters, ordering, pagination,
and sub-relationships/queries. And if you use SQLAlchemy's automap, you don't
even need to have the database schema ready.

I needed this when a client with a huge Microsoft SQL Server database (2k+
tables) asked me to quickly export it 'to the web' with the least friction
possible. I looked at Hasura, but it was only available for PostgreSQL, so I
created my own; by building on SQLAlchemy I can connect to any database it
supports. I'm using pyODBC with Microsoft's drivers on Linux and can easily
connect to the database and export it.

I am gonna provide example code soon'ish on the library's repo. But basically,
it goes:

1. Instantiate the library, passing in your db url and your SQLAlchemy
database schema (declarative_base() or automap_base())

2. GraphQL objects, schema, and actual CRUD functions leveraging the passed
connection string get dynamically generated

3. Instantiate a GraphQLView and pass it to your Flask App()

4. That's it!
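
Setting this library's actual API aside (I haven't reproduced it here), the underlying introspect-and-generate idea can be sketched with stdlib sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE ads (id INTEGER PRIMARY KEY, title TEXT)")

def generate_list_resolvers(conn):
    """Introspect the schema and build one 'list' resolver per table.
    A real library would generate all four CRUD operations the same way."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    resolvers = {}
    for table in tables:
        cols = [c[1] for c in conn.execute(f"PRAGMA table_info({table})")]
        sql = f"SELECT {', '.join(cols)} FROM {table}"
        # Bind `sql` per iteration via a default argument.
        resolvers[table] = lambda sql=sql: conn.execute(sql).fetchall()
    return resolvers

resolvers = generate_list_resolvers(conn)
conn.execute("INSERT INTO users (name) VALUES ('ada')")
print(resolvers["users"]())  # [(1, 'ada')]
```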

[https://github.com/flipbit03/sqlalchemy-graphql-schemagen](https://github.com/flipbit03/sqlalchemy-graphql-schemagen)

------
juliennakache
Quick plug: we've made good progress on solving this very problem for the
SQLAlchemy ORM in Python. Our approach is a bit like Join Monster but more
flexible because:

- we introspect your SQLAlchemy models and generate DataLoader-based resolvers
that emit small, independent SQL queries. This way the optimization composes
naturally with other GraphQL types in your schema.

- we are a thin wrapper on top of Graphene, the main GraphQL Python library,
which makes it very easy to extend your types if needed. We also added ways to
rename a column/field and modify its type.

- for SQL query generation we currently lean on SQLAlchemy to do the heavy
lifting. It can emit efficient IN() and JOIN queries for us.

So far, it's been working quite nicely for us. We were able to generate
decently performant schemas with very little boilerplate on top of our
existing SQLAlchemy models. And each time we've run into a case the library
did not optimize well for us, we were able to easily override its behavior by
writing our own SQLAlchemy queries.

There is still room for performance improvements, but I'm confident we'll get
there eventually (though I expect the optimizations to only work for DBs that
support LATERAL JOINs). My main focus at the moment is coming up with an API
that makes it possible to create performant types that aggregate multiple
tables.

Links:

- [https://github.com/graphql-python/graphene-sqlalchemy](https://github.com/graphql-python/graphene-sqlalchemy)

- [https://github.com/graphql-python/graphene-sqlalchemy/pull/253](https://github.com/graphql-python/graphene-sqlalchemy/pull/253)

EDIT: formatting

~~~
chrstphrhrt
I'm working on a project that depends on Graphene and Django (not my choice).
Any ideas off hand about doing the same thing there? IIRC Django can also use
SQLAlchemy but it didn't used to be very nice back in the day.

~~~
ris
> IIRC Django can also use SQLAlchemy

It's _possible_ to use sqlalchemy with django, but you're basically on your
own (don't expect to be able to use django-admin and other pluggable apps
etc.)

> it didn't used to be very nice back in the day

Having used both django and sqlalchemy ORMs very heavily, I've got to say that
django does what it does _extremely_ well and intuitively. For what the vast
majority of webapps need to do, the "obvious" way of constructing the query
causes django to do more or less the right thing. sqlalchemy on the other hand
I love for other reasons, but am frequently annoyed at how much faffing is
required to do the simple things in a concise way.

~~~
true_religion
One thing you can do is use SQLAlchemy while duplicating some code: keep
Django code for the things you want to do in Django (typically anything that
touches the admin or model forms), and use SQLAlchemy for everything else.

~~~
ris
Yyyyeah, but you _do_ end up duplicating potentially quite a lot of code,
especially if you're following the (IMHO nice) "fat models" approach.

I think a few people have thrown around the idea of creating a frontend to
sqlalchemy to allow it to understand django model definitions and QuerySet
semantics but I don't think anything's come of it.

------
holtalanm
One thing that is bugging me is that, while the article mentions DataLoader,
it kind of just breezes over it without explaining the reasons why DataLoader
isn't a good long-term solution.

> Besides the popular Dataloader approach, another very common way of tackling
> this problem is by taking a GraphQL query, and coming up with the exact SQL
> needed to resolve it

But why? Why not just use DataLoader? Is there something intrinsically wrong
with it that makes it a bad solution? I want to know! My understanding was
that DataLoader is extremely performant and works for 90% of the situations
where a join is needed to resolve a GraphQL query.

> Data fetching remains something to improve

Certainly this might be the case, but the article didn't address the most
popular form of data fetching and explain why a graphql-to-sql compiler would
be a better solution.

~~~
kevsim
We've been using DataLoader a lot and it's working OK. We were very careful
to dump all the SQL queries our app was making per request, to ensure we
weren't making more queries than necessary.

One thing I will say is that we do make a lot of "IN" queries as a result. You
fetch a bunch of rows with a DataLoader, and then you want to fetch a bunch of
rows from some other table based on those first results, so you end up with an
IN query where you might have crafted a JOIN if you were writing it by hand.

~~~
xyzzy_plugh
But does the query plan perform a join? The text of your query matters less
than the plan it generates.

~~~
kevsim
No, it can't, because it's actually making two separate queries. For example,
let's say we're fetching classified ads along with the users that posted them.
First your GraphQL resolvers fetch the ads (and at that point you don't know
that the users are being asked for). Then for each ad, you tell your
DataLoader to fetch the user associated with that ad. That ultimately results
in a single SQL query with an IN, like `SELECT * FROM users WHERE id IN (?)`,
where you pass in all the user IDs gathered from the results of the ads query.

However, another commenter pointed out above that some implementations allow
peeking into the GraphQL query. That way if you know you're going to need the
user anyway, you can already perform the join proactively when you fetch the
ads.

In a few cases we've done this because we always want the joined results so we
just do the join regardless of the query we get.
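
Concretely, the two query shapes might look like this (sqlite3 and made-up tables here, just to make the comparison runnable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE ads (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(10, "ada"), (20, "bob")])
conn.executemany("INSERT INTO ads VALUES (?, ?, ?)",
                 [(1, 10, "bike"), (2, 20, "sofa")])

# DataLoader shape: fetch the ads first, then one batched IN() query
# for all the users gathered from the first result set.
ads = conn.execute("SELECT id, user_id, title FROM ads").fetchall()
user_ids = sorted({user_id for _, user_id, _ in ads})
placeholders = ", ".join("?" for _ in user_ids)
users = conn.execute(
    f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids
).fetchall()

# Hand-written shape: a single JOIN does it in one round trip.
joined = conn.execute(
    "SELECT ads.title, users.name FROM ads "
    "JOIN users ON users.id = ads.user_id"
).fetchall()

print(users)   # [(10, 'ada'), (20, 'bob')]
print(joined)  # [('bike', 'ada'), ('sofa', 'bob')]
```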

~~~
xyzzy_plugh
Erm, are you telling me there is no way to pipeline the ads query to be inline
within the IN clause? Because that'd generate a join, generally.

I still don't understand why this abstraction is useful, if not harmful.

~~~
kevsim
I think there is a way to do it, that was the peeking I referred to. But it's
not how it works out of the box.

I think the abstraction is somewhat useful if your GraphQL queries vary
tremendously (one device asks for the user along with the ads, another device
doesn't). Expand this out to a bunch of different data types and you'd find
yourself writing a lot of variations of the queries. GraphQL with DataLoader
lets you write these things exactly once, albeit with potentially suboptimal
queries.

All of that being said, we're finding that our queries vary very little and
are somewhat questioning the value of data loaders and of GraphQL altogether
in our case.

------
arcticfox
Hasura is amazing, glad to see it featured here. The author never got around
to any specific downsides (only a general complaint for some reason), does
anyone have examples of where it struggles? I use it daily and have never run
into a limitation.

~~~
jakevoytko
One example is interacting with complexity introduced by outgrowing a single
database instance.

This scaling can work in a variety of ways: records might be horizontally
sharded across a bunch of database instances, caching might be layered above
the database to protect against hot queries, etc. The generation logic now
needs to understand how to partition a single GraphQL query across the shards
and caching layers.

It's definitely not impossible, and in some circumstances it may be easy. But
it couples your GraphQL implementation very tightly to your current database
system, in a way that may make it difficult to rearchitect that system at the
next order-of-magnitude scaling problem: "We really need to redo how we think
about sharding and caching, and our GraphQL's direct interaction with the data
layer is going to make this a challenge." Maybe it's easier if the company is
new and GraphQL-only. An older company with a mix of data access technologies
might wish the GraphQL layer left data-layer concerns to the data layer.

~~~
tango12
We think about this at Hasura deeply!

I know it's counterintuitive, but we think of this as an opportunity to solve
those problems at the Hasura layer. Sharding and caching are hard, but not
impossible to make declarative if you know enough about how the data is
modelled, how data ownership works (authorization), and how the cache is
managed (caching hot queries, caching complex but infrequent queries,
invalidating on time or on data).

I'm not trying to say the problem isn't hard, I'm trying to ask what it would
take to solve the problem. :) We're also actively working on solving them.
Imagine if those challenges got solved for a sufficient number of use cases!

Incidentally, databases are also getting better at preventing users from
outgrowing them while preserving the SQL API as much as possible, or providing
a transform. Citus / YugaByte / PlanetScale / Cockroach / Spanner, to name a
few.

~~~
sixdimensional
I worked with a team on this problem once before. It's definitely solvable,
though it takes some engineering time. Also, that team missed something pretty
important: you'll never solve this problem completely once data becomes too
big to dynamically join (or even cache) across heterogeneous systems - but I
think you could maybe solve those specific complex cases differently, using
event queues and materialized views.

Have you guys thought about looking at some other tools that might be
symbiotic? Hasura's interface would be an incredible companion to something
like Dremio (also open source), since it is a full SQL engine across a number
of heterogeneous systems, with probably the most advanced caching I have ever
seen in a system like it (Apache Arrow in-memory and columnar storage). If you
can adapt your query interface to tools like Dremio, you shouldn't have any
problem exploiting caching in those kinds of platforms - it might be even more
efficient than building it into Hasura for some use cases.

Edit: I think there is a newer type of interface architecture forming here,
and I think GraphQL represents one of the pieces of the puzzle, would be
really interested to see more detail on Hasura's take on that.

------
stephen
This is probably too early to share, but I've been fleshing out a Node/TS ORM
that integrates DataLoader from day 1:

[https://github.com/stephenh/joist-ts](https://github.com/stephenh/joist-ts)

Disclaimer: it's being used in low-volume production by us, but that's about
it, so ymmv. The "docs" are the sample app used for integration tests, etc.

I mention it b/c I agree with the article that GraphQL is super-easy to N+1,
but the Hasuras/Postgraphiles are a fairly different application architecture
than "I'd like some ActiveRecord-style domain objects".

Prisma is also promising, but AFAIU more of "a really great query layer" than
"how you'd write your domain objects", which is what I wanted to focus on.

I was originally a node/JS skeptic, but it is uncanny how well the JS
"limitation"/architecture of an event-loop + async/await basically provides
Haxl-style batching "for free" to the application programmer.
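
A toy illustration of that batching effect, written here in Python/asyncio for brevity (this is a minimal DataLoader-style batcher, not any library's real API):

```python
import asyncio

class MiniLoader:
    """Toy DataLoader: every .load() issued in the same event-loop tick is
    collapsed into one call to batch_fn. A sketch, not a real library API."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self.pending = []  # (key, future) pairs awaiting the next flush

    def load(self, key):
        loop = asyncio.get_running_loop()
        if not self.pending:
            # First load this tick: schedule one flush to run after the
            # currently scheduled callbacks.
            loop.call_soon(self._flush)
        fut = loop.create_future()
        self.pending.append((key, fut))
        return fut

    def _flush(self):
        batch, self.pending = self.pending, []
        keys = [k for k, _ in batch]
        for (_, fut), result in zip(batch, self.batch_fn(keys)):
            fut.set_result(result)

calls = []  # records the argument of each batched fetch

def fetch_users(ids):
    calls.append(ids)
    return [f"user-{i}" for i in ids]

async def main():
    loader = MiniLoader(fetch_users)
    # Three concurrent awaits collapse into one batched fetch.
    return await asyncio.gather(loader.load(1), loader.load(2), loader.load(3))

result = asyncio.run(main())
print(result)  # ['user-1', 'user-2', 'user-3']
print(calls)   # [[1, 2, 3]] - a single batched call
```

The event loop gives you a natural "end of tick" boundary at which to flush, which is what makes the batching feel free to the application programmer.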

~~~
sickcodebruh
Good luck with this, I really hope it works out. We've been using TypeORM with
Postgres for maybe a year now, and we keep finding little things that are
really surprising; it's hurting our faith in it.

~~~
latchkey
MikroORM is a nice alternative to TypeORM that is newer on the market. I tried
out an experiment with TypeORM and basic relations and things didn't work the
way I'd expect at all. Looked in the issue tracker and saw many open
unresolved issues.

Went looking around a bit more and found MikroORM. The developer is fantastic,
fixes issues and overall it is a better designed tool that just works the way
you'd expect.

Edit: I just noticed that stephen is the op, an old friend. It is worth
looking into Joist as well. =)

------
geuis
Looking for some informed opinions here. I've been building web applications
for 15-16 years. GraphQL was introduced a few years ago and, I'll admit, I
have a default bias against tech that comes out of Facebook. The one exception
is React: I've learned to use it extensively, and while I'm not a huge fan, it
gets the job done well enough.

When looking into GraphQL, and having toyed with it a little, it's never made
sense to me. It has always come across as an extra layer of obfuscation and
complexity on top of the data access layer. You have to write the backing
SQL/db queries anyway to support the GraphQL layer, so it seems a more
efficient use of time to just write good db queries.

Is there something I’m missing here or is the world finally catching up to
reality?

~~~
tango12
When you say GraphQL hasn't made sense to you, do you mean as an API consumer
/ frontend developer, or as someone who has to build the GraphQL API?

The benefits for API consumers and frontend developers are quite clear: easier
ergonomics for exploring the API, and the ability to colocate data fetching
alongside UI components so that a GraphQL client can automatically optimise
the data fetching required to load a screen/page.

The benefits of building a GraphQL API/service do seem harder to evaluate.

------
julianeon
One thing that may help people who are generally like 'huh?' when it comes to
GraphQL:

You can use GraphQL with SQL. This is about that.

However, you can also use GraphQL with NoSQL. AWS Amplify uses GraphQL with
DynamoDB, and that's a pretty common stack in the AWSiverse.

You can't use SQL with NoSQL, obviously.

So, if you're looking to learn more about GraphQL, just realize the lessons
here aren't universally applicable, though GraphQL-to-SQL is probably the best
implementation, imo.

~~~
postalrat
> You can use GraphQL with SQL.

What do you mean by that?

------
obi1kenobi
Lots of good points there. I'm one of the authors of the GraphQL compiler
Python package, and wrote up some of my thoughts here:
[https://twitter.com/PredragGruevski/status/1263916596940193792](https://twitter.com/PredragGruevski/status/1263916596940193792?s=20)

------
ldeangelis
Maybe not the best place to mention this, but there's something I don't like
about links on this blog: there's no way to see whether I've already visited a
link. It would be easy, I think, to just change the color of the underline on
a visited link.

~~~
xuorig_
Good call, I'll fix that.

~~~
ldeangelis
Thank you!

------
Maka_buka
In case you're on a .NET stack for your project, I'd recommend taking a look
at NReco.GraphQL. It's very easy to use: you just set up the schema in a JSON
file and a db connection string. The schema can be generated by a script if
you have lots of db tables. You can find a couple of nice posts about this
component on ITNext (on the Medium platform):
[https://itnext.io/graphql-api-for-sql-database-in-net-af2fe322fe28](https://itnext.io/graphql-api-for-sql-database-in-net-af2fe322fe28)

------
mbrello
Thanks, I have been thinking about this lately too.

