
Efficient GraphQL Queries in Ruby on Rails and Postgres - moritzplassnig
https://pganalyze.com/blog/efficient-graphql-queries-in-ruby-on-rails-and-postgres
======
xyzzy_plugh
> The following query produces the data that we want... we just need to figure
> out how to write a batch loader that generates the same result.

This is where the abstraction breaks down. Should you care what the query
ultimately looks like? For anything beyond a toy, you sure should. But how far
should you go to ensure the abstraction produces the implementation you want?
Why not just skip to the implementation you want?

Sooner or later, you need to break the abstraction. I'd rather just write the
SQL for all my endpoints.

I'm not bullish on GraphQL for the same reason I'm not bullish on ORMs. For
some things, an abstraction suffices, but it's far from a universal solution.
If I have to break out of the GraphQL abstraction, why bother in the first
place? Is there value in a mixed GQL/REST API?

> GraphQL can be as efficient as REST, but requires approaching optimizations
> from a different angle. Instead of upfront optimizations, we lazy-load data
> only when required, loading it in batches to avoid excess trips to the
> database.
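
(Concretely, the batching the article describes amounts to trading one lookup
per row for a single keyed lookup. A sketch of the shape, using the article's
events/categories example; this is not the article's actual query:)

      -- N+1: one category lookup per event
      SELECT * FROM categories WHERE id = 1;
      SELECT * FROM categories WHERE id = 2;
      -- ...once per event

      -- batched: one lookup covering every event in the result set
      SELECT * FROM categories WHERE id IN (1, 2, 3);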

Admittedly, the author does a good job of preserving the abstraction, but I'm
hesitant to expect most devs to dive this deep, and thus hesitant to expect
this level of quality/performance from a GraphQL backend.

~~~
atombender
The problem with "just SQL" is that you arguably want to compose queries.

A foundational observation that GraphQL makes is that every query is different
depending on the client's needs. You can do "SELECT * FROM users JOIN
companies ON companies.id = users.employer_id" and call it a day, but now
you're already probably overfetching, and the next evolution of your codebase
ends up accruing extra logic to add or remove features: Some queries want to
fetch just a subset of fields from each of those relations. Some queries don't
want to fetch any companies. Some want to paginate, filter, order in different
ways.
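
For example, the "same" data need quickly fans out into variants like these
(the extra columns here are made up purely to illustrate):

      -- one client only wants names, and no companies at all
      SELECT users.id, users.name FROM users LIMIT 20;

      -- another wants a couple of company fields, filtered and paginated
      -- (country and created_at are hypothetical columns)
      SELECT users.id, users.name, companies.name AS company_name
      FROM users
      JOIN companies ON companies.id = users.employer_id
      WHERE companies.country = 'NO'
      ORDER BY users.created_at DESC
      LIMIT 20 OFFSET 40;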

You can do all that in REST, of course. You can use an SQL builder to build
the final SQL query programmatically. You can add WHERE clauses and FULL OUTER
JOIN clauses to avoid N+1 queries on foreign associations, with some kind of
query parameter to define what associations the client wants. Field selection
is also doable with query parameters, and so on. You can also centralize a lot
of the boilerplate to do this (e.g. FULL OUTER JOIN clauses on associations,
if you do them, need to be untangled into object graphs after you get the flat
results back).

And so on.

What you end up with is pretty much GraphQL. You need code — identical for
every relation, modulo client-specifiable constraints and parameters — for
filtering, pagination, selection, joining, etc., all of those things. You need
some notation for letting the client specify what they want to fetch, both
fields and associations. Then, even when you have ended up with something that
approximates GraphQL, you have to write a custom client for it that only works
for this exact API. And then you have to write documentation. And tests, tests
which don't just exercise your "business logic", but also exercise all the
REST-based query API stuff.

The benefits to GraphQL are that it's standard, so you can piggyback off
existing libraries both on the client side (things like Apollo) and on the
backend, libraries that are already tested and documented. GraphQL itself
defines a schema, which serves as documentation and a way to machine-generate
clients. As a case in point, try out Prisma's GraphQL Playground app [1]. It's
basically a web-based IDE for GraphQL. You can point the app at any GraphQL
API, inspect the schema, write queries (with autocompletion of all fields,
queries, mutations, etc.), and see live results. It's a really productive
debugging tool.

That alone is worth a lot, and that's why I'm bullish on GraphQL.

[1] [https://github.com/prisma-labs/graphql-playground](https://github.com/prisma-labs/graphql-playground)

~~~
ako
Or you can use OData, which has the same benefits as graphql.

~~~
atombender
Maybe in principle, but I don't see OData having _anywhere_ near the traction
that GraphQL does.

------
sandstrom
We've looked at GraphQL but ended up going with Graphiti instead.

Only a few weeks since we implemented it, but happy so far. Feels like it's the
best of REST and GraphQL.

This page from Graphiti[1] explains the thinking behind it. Don't be
discouraged by the loud pink colors ;)

[1] [https://www.graphiti.dev/guides/why](https://www.graphiti.dev/guides/why)

~~~
SketchySeaBeast
I was excited to look at a REST/GraphQL alternative, but that website doesn't
fill me with a lot of excitement or enthusiasm. The styling choice is fairly
off-putting, and the first line of the "why" has a misspelling.

~~~
sandstrom
Yeah, I know, it's subjective but I also think the design is awful. It's good
software though!

------
bdcravens
I'm leaning towards letting the elements of my app that need to be in Rails be
in Rails, but moving the GraphQL layer to Hasura, bypassing the ActiveRecord
complexity. Has anyone else taken this approach?

~~~
aidos
Yes. Well, we’ve built a prototype recently that just uses Hasura for the
backend (don’t need anything other than data access at this stage).

It’s a really great engine and I’m glad we took a punt on it. Aside from the
expected behaviour, you get some nice bonus stuff like nested updates for
free.

We mostly tried it out because we normally use python and we wanted
subscriptions. The graphql story in python (especially async) isn’t amazing.
When we need more business logic, we’ll build it in python, and then you can
get Hasura to handle event delivery to it (to ensure each event is run exactly
once).

I’d recommend kicking the tyres to see if it works for you. The team are
awesome too, and moving quickly. I have no affiliation but I think it’s going
to do really well.

~~~
bdcravens
How's the performance? As I recall, the way it does relationships is pretty
sane, as opposed to AR's default of N+1 queries.

~~~
aidos
Performance is really good (not that we’re using it at scale - but apparently
even there it shines).

They use a really clever technique for loading, where not only do they pull
all the child data for multiple parent records as a single hit, they actually
run a single query for multiple _connected clients_.

So if you have 10,000 connected clients subscribed to the same graphql
queries, they build a temp table with 10k rows (one per client) and add all
the per-client variables. Then they execute the queries once to get all the
data for everyone at once (joining to the temp table and also joining to all
the tables for the access rules you set up in Hasura). Then they split it up
and send back the relevant chunks to each client. It’s much more performant
than Postgres RLS (row-level security) because the rules can be evaluated in
the joins in a way that works for every client at the same time.
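
Roughly the shape of the trick, as I understand it (a hand-written illustration
with made-up tables, not Hasura's actual generated SQL; a VALUES list stands in
for the temp table here):

      -- one row of per-client variables per connected subscriber...
      SELECT subs.client_id, hits.*
      FROM (VALUES (1, 'alice'), (2, 'bob')) AS subs(client_id, user_name)
      -- ...joined laterally to the subscribed query, so it runs once for everyone
      JOIN LATERAL (
        SELECT c.id, c.text
        FROM comments c
        JOIN users u ON u.id = c.user_id
        WHERE u.name = subs.user_name  -- per-client variables/access rules land here
        ORDER BY c.created_at DESC
        LIMIT 10
      ) AS hits ON true;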

Edit mobile typos

------
therusskiy
To solve N+1 in Ruby GraphQL automatically, I wrote a thingy that analyzes
which fields were requested, recursively, and then uses a preloader at the root
level to load everything in advance. It solves 80% of N+1 problems
automatically.

~~~
rytill
Don't know why this isn't built into GQL libraries - seems like the whole
point of GQL is to do these types of queries, and it's horribly inefficient by
default.

------
johnymontana
Another option for solving the N+1 query problem is to generate a single
database query at the root level by inspecting the selection set of the
GraphQL query. This is the approach used by Neo4j GraphQL (and other GraphQL
database integrations): [https://grandstack.io/docs/neo4j-graphql-js.html](https://grandstack.io/docs/neo4j-graphql-js.html)
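
In Postgres terms the same idea looks something like this (a rough analogue
with made-up tables, not what neo4j-graphql-js actually generates): the whole
nested selection becomes one root query.

      -- one round trip: the nested selection becomes a correlated json_agg
      SELECT u.id,
             u.name,
             (SELECT json_agg(json_build_object('id', c.id, 'text', c.text))
                FROM comments c
               WHERE c.user_id = u.id) AS comments
        FROM users u
       WHERE u.id = 1;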

~~~
yxhuvud
That might be doable in Ruby using the Sequel library, but if you are stuck in
a Rails codebase using ActiveRecord, the amount of query complexity that
approach can handle before it breaks down is not very big.

------
e12e
Somewhat on the side, but:

> By using the includes method we've been able to knock our queries down from
> six to two: The first to find the events, and the second to find the
> categories for those events.

Am I the only one that thinks rails/activerecord is crazy here for (often)
preferring two trips to the db over a join on indexed primary keys?

Are there really (sane) use-cases where that is better?

~~~
uhoh-itsmaciek
> Am I the only one that thinks rails/activerecord is crazy here for (often)
> preferring two trips to the db over a join on indexed primary keys?

You mean

    
    
      SELECT u.*, c.* FROM users u INNER JOIN comments c
        ON u.id = c.user_id WHERE u.id = <...>
    

?

If the average user record includes a non-trivial amount of data and there are
many comments, you're repeating a lot of information in the query results.

disclaimer: I'm currently doing some work for pganalyze.com (the blog host)

~~~
e12e
Yes.

I'm not sure about "non-trivial amount of data" - especially if we're talking
about a single row (single user) here? I suppose there might be a few blobs
of large json documents?

And I'm not sure I understand "repeating information in query results" -
surely we're talking about receiving a query, transforming it and responding
with the data?

In a rails+graphql ideal world, that'd maybe mean parsing the json query,
generating some simple ruby (active record) code, firing it off to the db
(letting ar/arel do its job and generate the sql), then serializing to json and
writing the response?

If there truly is non-trivial data in fields _not requested_, it might be
worth it to narrow the columns selected. But I doubt it, in the general
case.

~~~
uhoh-itsmaciek
> I'm not sure about "non-trivial amount of data" - especially if we're
> talking about a single row (single user) here? I suppose there might be a few
> blobs of large json documents?
>
> And I'm not sure I understand "repeating information in query results" -
> surely we're talking about receiving a query, transforming it and responding
> with the data?

For a query like

    
    
      SELECT u.*, c.* FROM users u INNER JOIN comments c
        ON u.id = c.user_id WHERE u.id = <...>
    

above, the user information is repeated for each comment, e.g.,

    
    
      u.id  u.name  c.id  c.text
      1     Bob     1     you are right
      1     Bob     2     you are wrong
      1     Bob     3     you are right again
    

If you have a hundred comments, and many user fields, this can be a
significant amount of data to pass around. Or maybe this is not what you
meant? I feel like we may be talking past each other. Can you give an example
of what you think the ideal sql should be for "give me user with id = 1 and
all their comments"?

~~~
e12e
No, you're probably right - I suppose the database drivers might naively
serialize that in a one-to-one representation (to be honest, I've never really
looked into the protocols that Postgres or MS SQL use for transferring data).

I didn't consider the kind of expansion you're alluding to - on the other hand,
I can't ever remember seeing an indication that the amount of data returned is
an issue with app performance.

But I've certainly seen the number of round-trips result in real issues. And
complicated joins, for that matter.

~~~
uhoh-itsmaciek
I've worked on a few Postgres drivers and the protocol is basically sending
data row-by-row, column-by-column. Sure, it could be more clever, but that's
time and cpu and memory spent being clever instead of moving data.

But you're right, there are always trade-offs, and there are some situations
where the current AR approach is not ideal, and a single join would be more
efficient. However, I suspect this is the way AR does it because it's right
enough often enough that it works well as the default approach.

------
chx
After
[https://stackoverflow.com/a/57247614/308851](https://stackoverflow.com/a/57247614/308851)
I am at a complete loss as to why I would want GraphQL. A client uses it, so I
obey, but I do not see the point. To quote the linked answer to save you a
click:

> GraphQL does not provide any built-in mechanisms for filtering, sorting,
> pagination or other arbitrary transformations of the response

~~~
lozenge
You wouldn't want to let people filter by an unindexed field, sort huge
result sets, or make your server hold onto a huge result set when they stop
making requests after the first few pages. GraphQL lets you express
what is permitted, without allowing more than your underlying data store can
support.

For example, from the GitHub GraphQL API you can, in a single request, get all
open pull requests on a repository and the last 10 comments on each one. You
can't get the last 10 comments on all of GitHub or the last 10 comments made
by a particular user, because they haven't exposed or even stored the data
that way.

In a REST API that would be dozens of separate requests, or the endpoint would
just not exist because they have never had the need for that exact combination
of records to be queryable.

