
GraphQL: A Retrospective - lchsk
https://verve.co/engineering/graphql-a-retrospective/
======
meritt
While this post is supposed to be about GraphQL, it seems to focus more on
the benefits of having a centralized request broker / gateway for your APIs
and nothing unique to GraphQL.

Their rule of "AVOID NESTING OBJECTS, JUST RETURN IDS OF RELATED OBJECTS"
definitely misses out on one of the core benefits of GraphQL: being able to
fetch related resources in a single request. Feels to me like they decided to
switch to GraphQL because it's supposedly better but are still using it
exactly like you would a REST API.

~~~
joekur
My understanding is that they are keeping their REST endpoints flat, however
their gateway graphql layer can still respond with nested resources - it would
require aggregating multiple REST requests together.

~~~
andrewingram
Right, this is the approach I've always advocated. I don't mention it
specifically in this post I wrote a while ago about optimising GraphQL, but my
implicit assumption was that you're building your GraphQL server on top of an
underlying platform:

[https://blog.apollographql.com/optimizing-your-graphql-reque...](https://blog.apollographql.com/optimizing-your-graphql-request-waterfalls-7c3f3360b051)

In real-world UIs, I've found that queries rarely end up being more than a few
levels deep and are relatively easily optimised as long as your internal APIs
can handle batches (easy for entities, harder for pagination). Additionally,
even though the only-return-IDs-for-relations pattern means you can't utilise
joins effectively, the upside is that you end up with much simpler database
queries that are easier to optimise at scale. My rule of thumb was that as
long as the query representing an entire screen could typically return in sub
100ms in production, it was acceptable (this was without any caching at the
GraphQL level, which I had planned but left the company before I could
implement it).
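
A minimal sketch of the pattern described above, assuming two flat internal services that accept batches of IDs. The `events` and `venues` data, the function names and the response shapes are all hypothetical stand-ins, not anything from the article:

```python
from typing import Dict, List

# Hypothetical in-memory stand-ins for two flat internal REST services.
EVENTS = {
    1: {"id": 1, "name": "Launch", "venue_id": 10},
    2: {"id": 2, "name": "Afterparty", "venue_id": 10},
    3: {"id": 3, "name": "Workshop", "venue_id": 11},
}
VENUES = {10: {"id": 10, "city": "London"}, 11: {"id": 11, "city": "Paris"}}

calls: List[str] = []  # records every simulated backend request


def fetch_events(ids: List[int]) -> List[dict]:
    """Simulates one batched request to the flat events service."""
    calls.append(f"events?ids={ids}")
    return [EVENTS[i] for i in ids]


def fetch_venues(ids: List[int]) -> Dict[int, dict]:
    """Simulates one batched request to the flat venues service."""
    calls.append(f"venues?ids={ids}")
    return {i: VENUES[i] for i in ids}


def resolve_events_with_venues(event_ids: List[int]) -> List[dict]:
    """Gateway-side resolver: nests venues inside events using two batched calls."""
    events = fetch_events(event_ids)
    # Collect the related IDs once, de-duplicated, rather than fetching per event.
    venue_ids = sorted({e["venue_id"] for e in events})
    venues = fetch_venues(venue_ids)
    return [
        {"id": e["id"], "name": e["name"], "venue": venues[e["venue_id"]]}
        for e in events
    ]


result = resolve_events_with_venues([1, 2, 3])
```

The underlying services only ever return IDs for relations, yet the gateway still answers with nested objects, at a cost of one request per level rather than one per entity.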

------
ergothus
I haven't yet had much luck in finding a GraphQL review that addresses my
general concerns (I'm frontend).

1) GraphQL almost always means everything is POST, so CDN and browser
caching of GET-like requests is gone (and ServiceWorker caching just got much
more complicated, nigh-impossible if CORS is involved). Everyone says "oh,
clients can do better caching", as if that's not true without GraphQL. Still,
the caching I mention might be trivial and mostly worthless. I'd just like to
see some actual inspection of the issue.

2) The models I've seen work well if your frontend is largely a thin skin over
the services with minimal business logic of their own. (this isn't GraphQL
directly, but the client libs that use it, but those exist because talking
GraphQL without them is more effort). Which is, of course, what we really
want. Business logic in the front end is always a painful idea. But it also
definitely happens, for real business reasons - are we making those cases
harder? How much so? With REST we have a lot more flexibility, it seems, even
if we choose to avoid using most of it.

~~~
013a
The caching story in GraphQL sucks. It's pretty good with frontend frameworks
like Apollo, but that's not a solution to the whole problem. Sure it works for
your website/app (if you've got one), but it does nothing to offer caching for
a generic API.

It's easy to argue that the main advantage of GraphQL is to reduce overfetching
fields on the data you need for your views. That's a great advantage. But how
much of the performance advantage gained from this is offset by the
substantially reduced backend cacheability of these requests? I would guess a
ton, especially with highly complex views that require lots of database pulls.

That isn't to say simple caching strategies aren't still possible (you can
encode GraphQL requests into GETs and just cache that url at the CDN layer,
this is part of the spec AFAIK). But when you have an open API serving many
users, where you can't predict what fields they're going to ask for (or even
the ORDERING of those fields in their request, which would change the request
body despite the response being the same!), this has to be a problem that
crops up pretty quick.
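
To illustrate the field-ordering point, here is a rough sketch assuming a cache that keys on the full GET URL. The `/graphql` path, the `query` parameter name and the `user` query are assumptions made for the example:

```python
import hashlib
from urllib.parse import urlencode


def cache_key(query: str) -> str:
    """Hash the GET URL, roughly as a naive URL-keyed cache would (an assumption)."""
    url = "/graphql?" + urlencode({"query": query})
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


# Two queries asking for the same data, with the fields in a different order.
query_a = "{ user(id: 1) { name email } }"
query_b = "{ user(id: 1) { email name } }"

same_key = cache_key(query_a) == cache_key(query_b)
```

Here `same_key` comes out `False`: the two requests carry the same information but occupy separate cache entries, so the hit rate drops as the variety of client queries grows.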

There's no HTTP-level solution to this. I doubt there's any solution that
would work well enough to be worth implementing. Which leads me to believe
that it's an intrinsic problem in GraphQL; the more freedom you give clients to
request whatever they want, the harder it becomes to guarantee performance for
the requests they're making. And GraphQL gives clients all the freedom in the
world.

Oh, and don't even get me started about the fact that because GraphQL stitches
together essentially depth-unlimited data from your data graph in one request,
there's no way to express different TTLs on each item returned, on the backend
_or_ the frontend. If you've got data that could TTL for 24 hours, but another
piece of data that TTLs for 60 seconds, you essentially have to specify the
cache-control to account for the smallest TTL.
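
As a small sketch of that last point, assuming each top-level field in a response has its own TTL in seconds (the field names are made up for illustration):

```python
from typing import Dict


def cache_control_for(field_ttls: Dict[str, int]) -> str:
    """A single response gets a single header, so the shortest TTL wins."""
    max_age = min(field_ttls.values())
    return f"max-age={max_age}"


# One field could live for 24 hours, the other only for 60 seconds.
header = cache_control_for({"profile": 24 * 60 * 60, "liveTicketCount": 60})
```

The 24-hour data ends up being refetched every minute along with the 60-second data.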

Overall, I think GraphQL is fine. But I also believe what we'll slowly
discover is that "Company X" simply fucked up their REST API, then will look
to GraphQL to solve all their problems. And it might solve some of them, but
then they'll have an even more complex system in place with even harder
problems. Facebook can solve those problems; us small shops can't. Better hope
Facebook shares their solutions with the world.

~~~
sethherr
Case in point, from the article:

> Any API change had to be deployed simultaneously to all services using that
> API to avoid downtime, which often went wrong and resulted in long release
> cycles.

There’s an easy solution to that problem, which is a versioned API.

~~~
np_tedious
Is it easy? Multiple api versions can be pretty annoying / tedious

------
AaronFriel
GraphQL is a curious beast. Developed by Facebook as part of one of its many
attempts to solve a fundamental problem for engineers: how to loosely couple
the frontend and backend, while maximizing query performance and allowing
flexibility in the frontend.

GraphQL does this, yes, but it's not particularly _smart_ about how caching
works or how to avoid the Select N+1 problem. Their solution* is the blunt
hammer that is Facebook's dataloader project which is basically: aggressively
cache data model, pretend databases and joins and SQL don't exist, throw
away any hope for ACID/consistency. Dataloader for example exposes all sorts
of new and exciting types of inconsistency. This is hand-waved away because, I
guess, consistency is boring and user expectations are low or irrelevant. (A
comment with a missing edge to a post is invisible, a post with a missing edge
to a comment has 0 comments. It'll all work itself out in the end.)

Curiously, Facebook went a long way down the road to fixing _this exact
problem_ on the backend with a library called Haxl, written in Haskell. Haxl
allows expressing relations between multiple data stores in a way that _looks_
like using an ORM, but under the hood creates a query and obviates the Select
N+1 problem: a function which appears to select a post and for each comment
retrieve an edge to the person who posted it will perform a single SELECT
against the database, maintaining consistency with that store. There's no
fundamental reason that couldn't be written in most dynamically typed
languages or ORMs (though Haskell provides some really nice type level
guarantees).

What's bizarre to me is that the former took off, and the latter is largely
unknown outside the Haskell community.

* - Other ORMs have recognized this, and there are efforts underway for GraphQL backends in Python (Graphene) and Ruby, at least, to solve this.

~~~
ctulek
I didn’t understand why you have to use caching with DataLoader. I am also not
sure about what you mean by inconsistency. The n + 1 query problem is usually
solved by making two consecutive queries to the database. One for the root
element and one for the total list of all edges, thanks to DataLoader. What is
the problem with that?
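
A minimal sketch of the two-query approach, using an in-memory SQLite database. This is not the DataLoader library itself, only the batching idea it relies on; the `posts` and `comments` tables are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
    INSERT INTO posts VALUES (1, 'first'), (2, 'second');
    INSERT INTO comments VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")


def load_posts_with_comments(conn):
    # Query 1: the root elements.
    posts = conn.execute("SELECT id, title FROM posts ORDER BY id").fetchall()
    post_ids = [pid for pid, _ in posts]

    # Query 2: every edge for all roots at once, instead of one query per post.
    placeholders = ",".join("?" for _ in post_ids)
    rows = conn.execute(
        f"SELECT post_id, body FROM comments "
        f"WHERE post_id IN ({placeholders}) ORDER BY id",
        post_ids,
    ).fetchall()

    # Group the edges by their parent in application code.
    by_post = {pid: [] for pid in post_ids}
    for post_id, body in rows:
        by_post[post_id].append(body)

    return [
        {"id": pid, "title": title, "comments": by_post[pid]}
        for pid, title in posts
    ]


result = load_posts_with_comments(conn)
```

However many posts there are, the database sees two statements rather than one per post.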

~~~
AaronFriel
Because you are now relying on consistency to be orchestrated in both the
database and the dataloader instance(s).

1\. If the dataloader isn't the sole service with database connections, the
cache will be invalid when other services interact with the database.

2\. Even if the dataloader is the sole mechanism for accessing the database,
you have to figure out how to scale that to multiple nodes and maintain cache
coherency on each.

3\. Even if you run just a single dataloader instance or figure out how to
ensure cache coherency, that layer is still oblivious to triggers on the
database and so you had better not use any advanced functionality there.

4\. Even if you strip away all of the low level SQL features and treat the
database as a dumb searchable KVS with a single dataloader instance (or cache
coherent layer in front of it), then you are still performing two queries and
mutations which occur in parallel with queries can result in non-repeatable
reads or phantom reads because the default in many GraphQL packages is that
each query processed with sub-queries runs with no transaction wrapped around
it.

5\. Even if you ensure that every GraphQL query gets a unique transaction, that
doesn't mean the DataLoader cache is _coherent_ with the database
transactions, and I haven't seen any papers or effort to verify that, so
there's no guarantee parallelization can't result in dirty reads.

6\. Okay, so you have a single threaded, single instance dataloader instance
with a mutex around a database connection that runs every GraphQL query's
subqueries in a transaction...

This is all fine if you're dealing with, well, comments and posts or other
trivium in which consistency isn't an issue. Which actually happens to be the
type of problems many large successful companies have to deal with.

But if you are dealing with financial data, medical data, scheduling of
resources, or anything else where the equivalent of "my friend posted but I
don't see it yet, therefore I can't comment on her post" or "my post loaded
but I don't see my friend's comment on it yet" is a real issue, it becomes a
minefield for consistency.
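
A rough sketch of the scenario in point 4, assuming a root query and an edge query with a write landing between them. The concurrent mutation is simulated on one SQLite connection purely to keep the example deterministic, and the schema with a denormalised `comment_count` is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, comment_count INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER);
    INSERT INTO posts VALUES (1, 1);
    INSERT INTO comments VALUES (1, 1);
""")

# Sub-query 1: resolve the root post.
(count_seen,) = conn.execute(
    "SELECT comment_count FROM posts WHERE id = 1"
).fetchone()

# A mutation lands between the two sub-queries (simulated here, not truly concurrent).
conn.execute("INSERT INTO comments VALUES (2, 1)")
conn.execute("UPDATE posts SET comment_count = 2 WHERE id = 1")

# Sub-query 2: resolve the edges for the same post.
comments_seen = conn.execute(
    "SELECT id FROM comments WHERE post_id = 1"
).fetchall()
```

The assembled response would claim one comment while listing two, because nothing ties the two reads to the same snapshot of the data.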

~~~
ctulek
I still don't understand what you expect DataLoader or GraphQL to solve for
you.

The list of problems you mentioned are not what DataLoader/GraphQL are trying
to solve. I'm not even sure there is an individual library that can solve
these problems. The solutions to them are at the architectural level and require
more discussion than the decision to use GraphQL/DataLoader or not.

------
ag56
Sounds like they implemented an ESB. What’s old is new.

[https://en.m.wikipedia.org/wiki/Enterprise_service_bus](https://en.m.wikipedia.org/wiki/Enterprise_service_bus)

~~~
LaGrange
To be fair, I'll take a GraphQL server over WebMethods.

------
danr4
Off-topic if employees of the company are here: Friendly suggestion to work on
the home page. I couldn't for the life of me understand what it is that
company is about. The headline says NOTHING, and the paragraph after it is so
confusing.

~~~
StevePerkins
[http://verve.co/our-products/](http://verve.co/our-products/)

> _VERVE EVENTS is the global market leader in word-of-mouth sales in the live
> entertainment industry. We use networks of advocates to sell products and
> experiences to their friends in exchange for rewards such as free tickets
> and backstage passes._

...

> _POLLEN is a community of influential young people who are passionate about
> sharing the best events. We handpick Members and, through our tools and
> support, make it easy for them to share their passion._

...

This _sounds_ like a hip, trendy veneer atop the concepts of affiliate sales,
and offshore farms of fake review writers. I suspect that their landing page
is vague by design.

------
fermigier
If you need some additional insights on how to integrate GraphQL and Python,
Patrick Arminio from the Verve team gave a talk at PyParis this month:

\- Slides:
[http://pyparis.org/static/slides/Patrick%20Arminio-1cba4f64....](http://pyparis.org/static/slides/Patrick%20Arminio-1cba4f64.pdf)

\- Video:
[https://www.youtube.com/watch?v=IA1TuKfVTlg&feature=youtu.be](https://www.youtube.com/watch?v=IA1TuKfVTlg&feature=youtu.be)

------
tmitchel2
This article is really hard to understand. Is the gateway just for the UI as
per normal or is it for all the backend services also?! The diagram doesn't
help much because it looks like the backend services can call the front-end
services. I don't quite get that a gateway can be on each backend service too.
Is that just to save a single network hop at the expense of every service
taking a dependency on the gateway and thus on every other service?

------
Udik
Whenever I see descriptions of adopted solutions like this one, I always feel
that someone is rushing towards a predetermined solution without having really
checked the available options. You have a starting point and a set of issues.
What are the minimal changes you can adopt to solve your problems, before
deciding to use a new concept/ framework/ pre-packaged solution? In this
specific case, two issues seem to stand out: the lack of a gateway, and the
issue of breaking api changes. Both seem to be easily solvable without the
need to jump to a completely different model. Maybe it was really the best
choice, but I would be more convinced if the complete set of problems with the
old architecture was clearly laid out and the possible options to solve them
one by one had been clearly analyzed.

------
andrewingram
One point of clarification (or correction, not quite sure) - using GraphQL as
an API Gateway (as opposed to the GraphQL server directly talking to the data
layer) for 1st-party clients is actually pretty common in the GraphQL world.

What's less common is using the GraphQL server for service-to-service
communication, though I've been aware of people using it this way for my
entire 3 years with GraphQL. I'm not yet convinced it's superior to
alternative solutions to this problem (like gRPC, Thrift, other API Gateway
patterns like JSON-API), but it could well be. I'm still happy using GraphQL
in its sweet spot (and not-coincidentally what it was designed for), which is
building an API for 1st-party websites and apps.

~~~
dobs
One service-to-service case where I've found GraphQL extremely useful is
report generation.

For example, in one case we had around a dozen services with fairly typical
REST APIs, a few of which we wanted to pull large inter-related sets of
information from. Using GraphQL allowed us to:

\- Have a single large-but-human-readable query to retrieve all report data.

\- Analyze deep nesting up front to determine opportunities for caching and
eager loading.

\- Abstract details regarding what data was coming from what service, plus
handle any quirks (e.g. inconsistent auth strategies) at the GraphQL layer.

Without having to make many modifications to the underlying services. We also
saw a two-order-of-magnitude performance improvement that would otherwise have
required building out a lot of service-specific awareness into our reporting
service.

Granted reporting is a unique case, but the simplified gateway layer and
potential for query analysis are also major advantages as the number of API
consumers grows.

------
revskill
The best part about GraphQL is that it's a query language for APIs. It was my
dream years ago to have a query language from my frontend code. Using REST
always seems a smell to me.

~~~
donatj
It's not _really_ a query language. Certainly not a general purpose query
language. It's more of a _somewhat flexible RPC_, honestly.

It smells like a query language from a distance, but when you get close it
smells much more of SOAP.

For instance say I'm receiving a list of widgets, but I only need the red
ones. Unless the API developers explicitly foresaw the need to include a color
filter, I can't filter that on their side. I have to get _all colors_ of
Widgets and filter it myself. The amount of unnecessary data then can really
multiply when you're getting the children and children's children of those
widgets.
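
A small sketch of what that looks like on the client, assuming the schema exposes no color argument. The payload shape and the `parts` child field are hypothetical:

```python
# Hypothetical response: every widget comes back, children included,
# because the schema offers no way to filter by color on the server.
widgets = [
    {"id": 1, "color": "red", "parts": [{"id": 11}, {"id": 12}]},
    {"id": 2, "color": "blue", "parts": [{"id": 21}]},
    {"id": 3, "color": "red", "parts": []},
    {"id": 4, "color": "green", "parts": [{"id": 41}, {"id": 42}, {"id": 43}]},
]

# The filter has to run on the client, after everything has been transferred.
red_widgets = [w for w in widgets if w["color"] == "red"]


def count_objects(items):
    """Counts each widget plus its nested parts."""
    return sum(1 + len(w["parts"]) for w in items)


transferred = count_objects(widgets)   # objects sent over the wire
needed = count_objects(red_widgets)    # objects the client actually wanted
```

In this toy payload 10 objects are transferred and only 4 are used, and the gap widens with every extra level of nesting.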

Having ported a number of API connectors from REST to GraphQL, I can say it
has certainly greatly reduced the number of requests I've needed to make, but
has often also greatly increased the amount of actual bytes I've received,
particularly the bytes I don't need.

~~~
wnevets
> It smells like a query language from a distance, but when you get close it
> smells much more of SOAP.

This smell is something I just can't move past when it comes to graphql.

