
The dark side of GraphQL: performance - kamranahmedse
https://twitter.com/benawad/status/1212392789637521410
======
jensneuse
The title is misleading. The post doesn't discover any dark sides of GraphQL.
The post is about a potential performance problem with a library that
implements the GraphQL spec. There might be a problem with the library itself.
There might be a problem with the use of said library. The author states that
it takes 19ms to fetch 20 recipes from a postgres database. This looks really
suspicious. Why does it take so long to fetch 20 indexed rows? Maybe there's
some general performance problem with the application?

~~~
ctvo
You assume the rows are indexed and that a ~20ms average for a database call
is "suspicious", but you're not concerned about the 400ms flamechart for
graphql-js doing validation shown in the thread?

graphql-js is the reference implementation of GraphQL, so it's not any random
library.

~~~
jensneuse
The graphql-js library focuses on correctness, not on performance. Facebook
doesn't invoke it at runtime, only at build time. They use persisted queries
only. If you want a high performance server runtime I wouldn't use Node.JS.
Especially for a complicated task like validating and resolving a GraphQL
query Node.JS is the wrong tool. It's too high level to tweak hot paths and
optimize the garbage collector. So no, I don't think the flame graph is
suspicious. In my language of choice (go) I could drill down memory and CPU
consumption for each line of code to find the bottleneck. Maybe this is
possible for Node.JS too, I don't know the tooling so well. I would suggest
that if such tool exists a detailed flame graph of the Node.JS application
might help understand the issue.
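
A minimal sketch of the persisted-query idea, in Python for brevity: the query is parsed and validated once when it is registered, and at runtime the server only looks up the prepared document by its hash. `PersistedQueryStore` and `parse_and_validate` are hypothetical names for illustration, not part of graphql-js or Apollo.

```python
import hashlib
from typing import Callable, Dict


class PersistedQueryStore:
    """Maps a query hash to a document that was prepared once, ahead of time."""

    def __init__(self, prepare: Callable[[str], dict]) -> None:
        self._prepare = prepare
        self._documents: Dict[str, dict] = {}

    def register(self, query: str) -> str:
        # Build-time step: pay the parse/validate cost once per distinct query.
        query_id = hashlib.sha256(query.encode("utf-8")).hexdigest()
        if query_id not in self._documents:
            self._documents[query_id] = self._prepare(query)
        return query_id

    def lookup(self, query_id: str) -> dict:
        # Runtime step: a dictionary lookup, no parsing or validation.
        return self._documents[query_id]


prepare_calls = 0


def parse_and_validate(query: str) -> dict:
    """Hypothetical stand-in for the costly parse + validate step."""
    global prepare_calls
    prepare_calls += 1
    return {"source": query, "validated": True}


store = PersistedQueryStore(parse_and_validate)
recipes_id = store.register("{ recipes { id title } }")
for _ in range(1000):
    document = store.lookup(recipes_id)
```

However many requests arrive, the expensive step runs once per distinct query rather than once per request.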

~~~
hn_throwaway_99
OP is using Apollo Server, which is by far the most common server
implementation for GraphQL. It may well be there are issues specific to
Apollo, but it's definitely worth getting to the bottom of based on how widely
used Apollo is.

There is nothing in the posts that identifies NodeJS as the culprit, and based
on the info I'd be very surprised if it was. It seems most likely that the
type validation is what is taking so much time. But then again, strong types
are one of the main benefits of GraphQL. If anything, I've found Node to be
one of the easiest and most "natural" server languages for GraphQL, and I have
implemented GraphQL servers in Node, Java and Python.

~~~
dnautics
Have you tried elixir's absinthe?

~~~
deathtrader666
I'm curious to find any performance comparisons between Elixir's Absinthe and
Hasura, on a Phoenix app.

~~~
dnautics
I would be extremely (but prepared to be) surprised if Phoenix + Hasura were
faster in terms of latency than Phoenix + Absinthe, since one has to enter and
exit two vms and the other doesn't, unless you're suggesting the frontend
issue graphql results directly to the hasura backend, bypassing phoenix.

~~~
deathtrader666
Two ways one could test:

1] REST client <-> Hasura <-> Phoenix

2] Phoenix-generated HTML <-> Hasura <-> Phoenix

------
CharlesW
Reading the thread, this isn't a "dark side of GraphQL" but a "dark side of
not understanding how to debug/improve performance in my software dependency".

~~~
tessting
Ben Awad is pretty knowledgeable in graphql. He is one of the most solid
youtube tutorial guys on it.

~~~
CharlesW
I have no doubt. The unfortunate bit is that his tweet uses "GraphQL" to refer
to a specific implementation (Apollo Server) of GraphQL rather than GraphQL
itself.

~~~
wp381640
It would be like posting "The dark side of SQL" for a slow MySQL query

~~~
NotSammyHagar
But there are dark sides to using sql, often from the abstraction that sql
provides.

Maybe the optimizer picks a poor plan and you can't figure out how to make it
work better. Maybe the schema has redundancy you can't change or the indexes
aren't suitable for that query. Maybe it's auto parameterizing constants and
the query with the problem has a parameter causing different behavior than the
original constant used in optimizing the query, or maybe your query with 1000
elements in an in list worked great in memsql or whatever but is slow
unexpectedly in the database you ported your app to. There are downsides to
everything.

------
picardo
I'm not sure if this is mentioned in the thread, but one of the reasons it
takes so long for the requests to return is that GQL initializes the entire
record in memory and then reduces it back to only the fields you wanted. This
can be a big problem if you have a deeply nested data model, and potentially
many results. The memory consumption can hit the roof. I find that the best
approach in those cases is to create a one-off REST endpoint (or to create a
field higher up the GQL hierarchy) and handroll the SQL query.

~~~
greenpizza13
Things have matured quite a bit. With Apollo Server it's possible to fully
understand which fields are being requested before creating and running, for
example, an SQL query. Fetching only the requested data for a given query
reduces in-memory footprint. Most people get the whole data object and then
allow GQL to select the subset of fields the user asked for, but for cases
where performance is a problem there is another solution.
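
Roughly, the idea could look like the sketch below (Python, for illustration only). It assumes the server library has already given you the list of requested field names; how you obtain that list depends on the library. `RECIPE_COLUMNS` and `build_select` are hypothetical names.

```python
from typing import Dict, Iterable

# Hypothetical whitelist mapping GraphQL field names to database columns.
RECIPE_COLUMNS: Dict[str, str] = {
    "id": "id",
    "title": "title",
    "description": "description",
    "steps": "steps_json",
}


def build_select(table: str, columns: Dict[str, str], requested: Iterable[str]) -> str:
    """Build a SELECT that fetches only the columns the client asked for.

    Column names come from the whitelist, never from the client, and `table`
    is expected to be a constant supplied by the server code.
    """
    selected = [columns[field] for field in requested if field in columns]
    if not selected:
        raise ValueError("no known fields requested")
    return f"SELECT {', '.join(selected)} FROM {table}"


sql = build_select("recipes", RECIPE_COLUMNS, ["id", "title"])
```

Here `sql` selects two columns instead of the whole row, so the large `steps_json` column is never read unless a client asks for `steps`.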

~~~
picardo
I haven't used Apollo Server lately. But the way you describe it doesn't
address the core issue, which is the initialization of the intermediate
objects in-memory. So just to give an example, if I wanted to query for the
projects of listings of my company, I can write it this way in GQL:

    
    
       me { company { listings { projects { id name } } } }
    

This will initialize: a User, a Company, Listings and Projects of all
listings.

I can also write this in SQL using a couple of joins and return an array. The
memory consumption is trivial in comparison to the original request.
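
A sketch of that SQL version, using Python's built-in `sqlite3` and a made-up schema (the table and column names are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users    (id INTEGER PRIMARY KEY, company_id INTEGER);
    CREATE TABLE listings (id INTEGER PRIMARY KEY, company_id INTEGER);
    CREATE TABLE projects (id INTEGER PRIMARY KEY, listing_id INTEGER, name TEXT);
    INSERT INTO users    VALUES (1, 1);
    INSERT INTO listings VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO projects VALUES (100, 10, 'Alpha'), (101, 11, 'Beta'), (102, 12, 'Other');
    """
)


def projects_for_user(connection: sqlite3.Connection, user_id: int) -> list:
    """Return (id, name) pairs for every project on the user's company's listings.

    Only flat tuples come back; no User, Company or Listing objects are built.
    """
    return connection.execute(
        """
        SELECT p.id, p.name
        FROM users u
        JOIN listings l ON l.company_id = u.company_id
        JOIN projects p ON p.listing_id = l.id
        WHERE u.id = ?
        ORDER BY p.id
        """,
        (user_id,),
    ).fetchall()


rows = projects_for_user(conn, 1)
```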

~~~
klarstrup
You can implement your GraphQL server to do either of those, it's not
inherent.

------
rubyn00bie
I guess I still don't quite get most GraphQL designs or why a lot of people
jump to implement it... I have always thought the big idea behind GraphQL is
you already have endpoints (likely rest) which are cached/optimized? And then
GraphQL becomes a layer over the top to map/reduce client requests for more
optimized request/response cycles for clients (and I guess decouple some
business logic)?

Which then makes me wonder why this is the "dark side" of GraphQL? Isn't this
just not optimizing a query somehow or using a cache effectively? Is it really
the nature of GraphQL that's causing this to be slow or just programmer error
[1]?

I've used GraphQL in production services as an alternative to a rest endpoint
(which I didn't care for) and I don't think the sheer nature of the
validations ever caused that much slowdown or, more plainly put, that
GraphQL's design would necessarily cause such poor performance on such a
small set of data.

/shrug I dunno, if this were me, I'd just assume I had written a bad query or
validation somewhere. And to be fair to the author, they only made a post on
twitter to reflect on their problem, not to say GQL has a dark side (at least
from what I read in the thread).

[1] We all have made programmer errors, are likely making some "now," and will
for sure make more in the future. No reason to feel bad about it, we're all
human :) Mistakes are just a part of life no matter how "good" at things we
are.

------
staticassertion
From the thread.

'Slow response times for large documents'

[https://github.com/graphql/graphql-js/issues/723#issuecomment-379090872](https://github.com/graphql/graphql-js/issues/723#issuecomment-379090872)

It seems that the graphql library is performing a lot of validation, and
that's slowing things down. I expect validation to be a pure-compute task, and
this is Javascript, so I suspect this is really a "working with large amounts
of data in Javascript is slow" issue - but that's just at a glance.

------
m_ke
Sounds to me like an issue that comes with coupling of validation with
serialization. A lot of these API frameworks combine the two, with the goal
of automating validation when receiving data from clients, but then also do
that validation when serializing response data, which should already be
validated if it's sitting in your database.

I've run into similar issues with FastAPI and DRF when dealing with really
large payloads.
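
A minimal sketch of keeping the two apart, assuming a made-up two-field schema: untrusted client input goes through validation, while rows read back from the database are only projected onto the response fields. None of these names come from FastAPI, DRF or any GraphQL library.

```python
from typing import Any, Dict, List

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS: Dict[str, type] = {"id": int, "title": str}


def validate(record: Dict[str, Any]) -> Dict[str, Any]:
    """Check every declared field; this is the per-record cost."""
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(record.get(name), expected):
            raise TypeError(f"{name} must be {expected.__name__}")
    return record


def accept_from_client(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Untrusted input: always validate before it reaches the database.
    return validate(dict(payload))


def serialize_from_database(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Data that was validated on the way in: only pick the response fields.
    return [{name: row[name] for name in REQUIRED_FIELDS} for row in rows]


stored = accept_from_client({"id": 1, "title": "Pancakes"})
response = serialize_from_database([{"id": 1, "title": "Pancakes", "internal": "x"}])
```

Whether skipping output validation is safe depends on how much you trust what is in the database.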

------
ex3ndr
Quite strange. The GQL server source code is literally just walking the fields
and resolving promises, very simple and straightforward.

We had something like this in our backend, but times this long usually meant
that something was hogging the event loop and blocking everything else from
executing.

It could be anything. For example, it could be async hooks, which can make
things ~1000 times slower if you are using a lot of promises (resolved fields
are often just promises), since the overhead is per promise. In general, in the
latest Node.js you can create a huge number of promises with little to no
overhead, but again, something may be wrong with the Node.js setup: some
library polluting the event loop, or something deeper in the Node.js
internals. It is not an issue with GQL itself: if you have GQL performance
issues, that means your server is super slow at processing just about
anything. Our team was shocked by the performance, and it turned out that
Node.js is super fast and it is some libraries (like sequelize) that kill the
performance, but GQL is not one of them.

------
hamandcheese
I’ve seen similar issues in graphql-ruby. Even if I hardcode the data in my
resolvers, it takes hundreds to thousands of ms to render a list with some
moderate nesting.

------
mattbillenstein
I've built fairly complex GQL backends using CPython + Graphene and never seen
something like this - if we had slowness it was because we were yet to
implement dataloader in some places.
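
For anyone unfamiliar with the pattern, here is a simplified, synchronous sketch of the batching idea. The real DataLoader libraries are asynchronous and collect keys automatically, so `BatchLoader` and `fetch_authors` are illustrative stand-ins rather than any library's API.

```python
from typing import Callable, Dict, Hashable, Iterable, List

AUTHORS = {1: "Ann", 2: "Bob"}  # stand-in for a database table
batch_calls = 0


def fetch_authors(ids: List[Hashable]) -> List[str]:
    """One round trip for many ids (think WHERE id IN (...))."""
    global batch_calls
    batch_calls += 1
    return [AUTHORS[i] for i in ids]


class BatchLoader:
    """Deduplicates keys and fetches all missing ones in a single batch."""

    def __init__(self, batch_fn: Callable[[List[Hashable]], List[str]]) -> None:
        self._batch_fn = batch_fn
        self._cache: Dict[Hashable, str] = {}

    def load_many(self, keys: Iterable[Hashable]) -> List[str]:
        keys = list(keys)
        # dict.fromkeys keeps first-seen order while dropping duplicates.
        missing = list(dict.fromkeys(k for k in keys if k not in self._cache))
        if missing:
            self._cache.update(zip(missing, self._batch_fn(missing)))
        return [self._cache[k] for k in keys]


loader = BatchLoader(fetch_authors)
recipe_author_ids = [1, 2, 1, 1, 2]  # five recipes, two distinct authors
names = loader.load_many(recipe_author_ids)
```

Resolving the author of each recipe one at a time would hit the database five times; here it is hit once.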

------
addityasingh
We faced similar issues at Zalando when trying to use GraphQL at scale, and to
mitigate this we built
[https://github.com/zalando-incubator/graphql-jit](https://github.com/zalando-incubator/graphql-jit).
Try it for your use case and let us know how it affects the performance.

------
gsvclass
Check out Super Graph; it's a GraphQL to SQL compiler and API service in Go. In
production mode it uses prepared statements, so there's no compiling and hence
very low latency.
[https://github.com/dosco/super-graph](https://github.com/dosco/super-graph)

------
dclowd9901
Forgive me if this sounds a bit “hindsight 20/20”, but I feel like performance
was always a lower consideration when it came to utilizing graphql. The win is
in reducing overhead around providing new endpoints.

Like React, it eschews performance for the sake of enterprise-level scaling.
This shouldn’t come as a surprise to anyone, given that both of these came
from one of the largest dev organizations in the world.

~~~
toomim
> performance was always a lower consideration when it came to utilizing
> graphql

That's strange, because I thought the main selling point was to consume only
the data you need. The client specifies exactly which fields it wants. Then it
doesn't over-fetch. To make things higher performance.

~~~
wolfgang42
GraphQL the protocol/language was designed for performance, but (when I tried
GraphQL, which was several years ago) the server-side implementations seem to
have had much less of a focus on it.

It's true that the _client_ doesn't over-fetch (and also doesn't need multiple
round-trips), but at least when I tried the gql-js library it required the
_server_ to over-fetch: it would ask for individual records, and then do the
field plucking/record joining itself; there was no way to intercept the query
along the way to find out which fields it needed so you could only fetch
those.

I get the impression that the server libraries were designed to work with a
document store or "fat" REST API that is only capable of taking a single ID
and returning the entire record. In this situation it makes sense to have a
separate middleware server to keep the big fetches and round-trips inside the
datacenter and only give the client exactly what it needs, and needing a
little more server power isn't a big deal. But, if you need to do something
more sophisticated (even something as simple as only fetching certain fields
from the datastore), they were no help whatsoever; when I was looking into it
there wasn't even a way to parse the query into an AST and do the rest of the
query planning yourself.

~~~
gavinray
Echoing this, GraphQL is just a specification, and it is up to library authors
how that spec is implemented.

I think there might be a disconnect or misunderstanding in the developer about
this. GraphQL is sort of like the Flux pattern for MVVM architecture. It isn't
so much a thing as an idea.

------
tuananh
without the actual code, this is as far as we can debug. can the author create
a small reproducible repo instead?

------
ggmartins
any chance to reproduce and post the HAR file? thanks

------
coding123
Hmmmm... if anything the performance of a graphql query should generally
outshine REST in nearly any category of performance. From the sound of things,
the performance issue doesn't make any sense. He's using Dataloader, and he is
certain it's not related to dataloader anyway. So maybe some dependency he's
using is the wrong version.

~~~
fastest963
Can you elaborate? I find it hard to believe that you can't build a REST API
that's faster than GraphQL given all of the bells and whistles that GraphQL
tacks on and that you could hand write the perfectly optimized REST endpoint.
What am I missing?

~~~
arthens
REST can definitely be faster than GraphQL, because as you mentioned GraphQL
is doing much more.

But as you start to re-use REST APIs across multiple pages/apps, you often end
up:

- over-fetching: you'll receive some data that you don't actually need, but
that is required by another page

- over-querying: the data you are over-fetching might have required extra
queries (or even worse, calls to an external system)

- cascading requests: if you are working with nested data you might have to
call the server multiple times, often in a sequential manner

Also in my experience the performance of REST APIs tends to get worse over
time, because as many developers work on the same APIs it's almost
impossible to keep these APIs lean. Just before Christmas we had to spend
some time figuring out why a fairly simple GraphQL request was taking almost 2
seconds. Turns out a developer had accidentally introduced an n+1 in the
underlying REST API. The n+1 was on a field/relationship that we didn't
need/use, but with REST you don't usually get to pick what you want to load
so...

Solving these problems with REST is possible but not trivial, while GraphQL
mostly solves them out of the box. So while REST could be faster than GraphQL,
in my experience it usually ends up being slower.

~~~
bdcravens
I think this gets to the underlying criticisms. GraphQL is not the replacement
for REST and by nature an improvement - it is an alternative to REST based on
a certain set of pain points. If you’re a large org with multiple consumers
and your API development is silo’ed, then GraphQL may solve a set of problems
you have that a small homogeneous team may not.

~~~
arthens
Correct, but it's worth noting that GraphQL has much more to offer than just
request performance. I'd argue that performance improvements are the less
interesting part of GraphQL.

The main selling points for me are:

- speed of development: once you have a complete graph, you can add new
pages/features in a fraction of the time, and often without touching the
backend at all

- type safety: you can generate typescript/flow types for your queries,
giving you type safety from db to client (assuming your backend has types)

- query co-location: you can have the query (or a fragment) inside or next to
the component that uses it. Need a new field in a specific component? Just
update the fragment, and any page that includes it will get it automatically

and last but not least developer experience. Having worked for a few years
using graphql + apollo + react + typescript (on both a personal and a
reasonably famous large website) I can honestly say that it feels like living
in the future. As a mostly backend developer, I have never enjoyed working in
the frontend so much.

~~~
Ozzie_osman
Well said. That's why my current team started using graphql. It actually had
very little to do with performance or anything backend-related, and was mostly
because of the same reasons (FE speed of development, type checking, etc).

