
Migrating to GraphQL: A Practical Assessment - eqcho4
https://arxiv.org/abs/1906.07535
======
desc
I can see absolutely no positive value whatsoever in GraphQL for internal
APIs. You're basically saying 'please, formulate any query you like, and I
will waste years of my life trying to optimise that general case'.

Seriously. For internal stuff you want to be _as specific as humanly
possible_. You want to optimise the living fuck out of that hot path thousand-
times-a-second query, and build an entirely separate service to handle that
particular URL if necessary.

For a public API, why would you ever encourage submission of arbitrary
queries? They will destroy your database servers.

Note that SQL vs. NoSQL is not even a competition here. An RDBMS can handle
_arbitrary_ queries much better than any brute-force map-reduce system, and
doesn't require spending thousands of man-hours writing your own query
planner. The difference is that it doesn't (always) automatically scale
horizontally.

So from where I'm sitting, GraphQL is nothing more than an invitation to
maintenance and performance headaches, predicated on the idea that everyone
scales infinitely horizontally and can brute-force every query.

Personally I prefer being able to serve a thousand queries a second from a
single server managing a 1TB database with a 50GB working set, with a latency
under a second even when (looking at the raw query) it 'should' touch more
rows than there are atoms in the universe.

[edit] In light of the replies, I should express some surprise that building
REST endpoints is expensive. If you can execute at runtime an automatic
combination of various other endpoints to produce a result, is it not equally
simple to generate the code necessary to do the same? That can at least be
examined and maintained more easily than dealing with the massive array of
possible variations to known-working queries which comes with modifying the
rules driving the GraphQL evaluator...?

~~~
jahewson
GraphQL was created at Facebook specifically for internal APIs. The motivator
was to avoid having to write a custom endpoint for every single REST query,
because otherwise REST is not efficient.

Note that GraphQL does _not_ allow the specification of arbitrary queries -
much like REST does not allow access to arbitrary resources - the server
defines what queries are available and the user can choose to request a
selection of them, and to pluck data from them (or follow links - simply a
JOIN).

In other words, GraphQL lets you combine a subset of pre-defined queries in
one go.
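
A minimal Python sketch of that idea, not how any real GraphQL server is implemented: the server owns a fixed table of root queries, and a request can only pick from that table and pluck attributes from the results. All names here (`ROOT_FIELDS`, `user`, `latest_news`, `execute`) are made up for illustration.

```python
# Hypothetical illustration: a fixed set of server-defined root queries.
# The names ROOT_FIELDS, user and latest_news are assumptions for this sketch.

USERS = {"123": {"id": "123", "name": "Ada", "email": "ada@example.com"}}
NEWS = [{"title": "Hello", "url": "http://example.com/1"}]

ROOT_FIELDS = {
    "user": lambda args: USERS.get(args["id"]),
    "latest_news": lambda args: NEWS[: args.get("limit", 10)],
}


def pluck(value, fields):
    """Keep only the requested attributes of a dict or a list of dicts."""
    if value is None:
        return None
    if isinstance(value, list):
        return [pluck(item, fields) for item in value]
    return {name: value[name] for name in fields}


def execute(request):
    """request maps a root query name to (arguments, requested attributes)."""
    result = {}
    for name, (args, fields) in request.items():
        if name not in ROOT_FIELDS:
            # Anything the server did not define is rejected, not improvised.
            raise KeyError(f"unknown query: {name}")
        result[name] = pluck(ROOT_FIELDS[name](args), fields)
    return result
```

Asking for `user` and `latest_news` together returns both in one response, while a request for a query the server never defined fails up front.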

~~~
desc
'Combine a subset of pre-defined queries': you mean run a bunch of queries and
perform the join locally? Like a poorly-designed app server 're-using' several
repository methods and doing joins in-memory?

I may be misunderstanding something here, but when 'general' queries are
combined externally a lot more work is done than is necessary. Which may be
fine for small intermediate sets. But treating it as any kind of general
solution is silly.

That said, people do similarly stupid things within individual codebases
running in individual services to a single database, so as usual it likely
comes down to how the tool is used rather than how it can be abused. Still,
spreading these things across services and processes looks like it only makes
abuse easier.

~~~
strken
Not sure I understand why in-memory joins are wrong by default, especially in
a large system with more than one data store.

Let's say you call UserService::batchGetUsers(userIds) to get a list of users
from a service backed by a MySQL DB, call
WidgetService::widgetsForUser(userId) to get a list of a user's widgets from a
separate service backed by a Redis cache and another MySQL DB, and return
them. What's the problem here? Lack of transactions? Unnecessary fields sent
down the wire? Something else?
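
For concreteness, a rough Python sketch of that scenario. The in-memory dicts stand in for the MySQL/Redis-backed services, and the snake_case function names are just renderings of the two calls above, so treat all of it as assumption.

```python
# Stand-ins for the two services described above; the in-memory dicts replace
# the real data stores and are assumptions made for this sketch.

USER_DB = {1: {"id": 1, "name": "ann"}, 2: {"id": 2, "name": "bob"}}
WIDGET_DB = {1: [{"id": "w1"}, {"id": "w2"}], 2: []}


def batch_get_users(user_ids):
    """One call for all users (UserService::batchGetUsers)."""
    return [USER_DB[uid] for uid in user_ids if uid in USER_DB]


def widgets_for_user(user_id):
    """One call per user (WidgetService::widgetsForUser)."""
    return WIDGET_DB.get(user_id, [])


def users_with_widgets(user_ids):
    """Join the two result sets in application memory."""
    users = batch_get_users(user_ids)
    # This issues one widget call per user, so the cost grows with the size
    # of the intermediate set, and nothing here runs in a shared transaction.
    return [{**user, "widgets": widgets_for_user(user["id"])} for user in users]
```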

~~~
emerongi
There is no problem, the same way GraphQL does not automatically allow
arbitrary queries.

The parent author holds some very stubborn beliefs about how systems are built
(his/her way is the correct way!), which is great for discussion, but probably
not the best example on how to actually build big systems.

------
rocmcd
I don't think there's any surprise that GraphQL reduces the size of the
responses compared to typical REST (excluding things like JSON:API). If that's
the measurement that's most important, then it seems GraphQL is the obvious
choice.

For myself, the 'practicality' of GraphQL would depend more on complexity of the
implementation, training of engineers, potential re-implementation of client
logic, ease and depth of debugging, performance, etc. It seems these days that
the size of the response is not typically a limiting factor in most
applications I've interfaced with (though maybe I've never been exposed to
that world before).

Can anyone speak to how a migration from REST to GraphQL went? My biggest
concern is around the complexity of the thing. It just seems so much more
complex than REST, but maybe I haven't spent enough time with it.

~~~
lwansbrough
We migrated from RESTful HTTP to GraphQL and then back to RESTful HTTP.
GraphQL is cool, but actually maintaining it is a nightmare and I think it
would be rare for the end product to turn out better. Here's a few of the cons
for us:

- GraphQL parsing and interpretation is considerably slower than RESTful
JSON. I'm talking an order of magnitude difference in .NET Core.

- The required POSTs cannot easily be (if at all?) cached by caching services
such as Cloudflare.

- It's more work to have to define what data you want than to just spit out
the data that's available. Having to continually update your client side
queries in order to fetch all the available data is tedious as hell. The
bandwidth improvements from the whole over/underfetching argument are not
worth the incurred performance hit.

- GraphQL promotes laziness about documentation because it's
"self-documenting" -- turns out most API users still struggle to understand
how it works and need better API guides anyway, so the reflection is largely
useless (it's about as good as those auto-generated Java docs you find on
Oracle's website).

- Users get RESTful. They know it, it's not a toy, and it just works. I'll
repeat this again because Silicon Valley doesn't seem to get it: GraphQL is
not user friendly.

~~~
atombender
> The required POSTs cannot be easily (if at all?) cached

GraphQL allows queries to be performed with GET:

    
    
      http://example.com/graphql?query=query{user{id}}
    

You want to use POST for mutations, but read queries can be run through GET
and cached just like REST. There's literally no difference -- it's HTTP, after
all.
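
A small Python sketch of building such a URL. The helper name and endpoint are made up; the `query` and `variables` parameter names follow the usual GraphQL-over-HTTP convention, and the raw example above would need percent-encoding in practice.

```python
import json
from urllib.parse import urlencode


def graphql_get_url(endpoint, query, variables=None):
    """Build a GET URL for a read-only GraphQL query.

    The endpoint and this helper are assumptions for the sketch; only the
    `query` / `variables` parameter names come from common convention.
    """
    params = {"query": query}
    if variables:
        # Sorted, compact JSON so the same variables always give the same URL,
        # which is what an HTTP cache keys on.
        params["variables"] = json.dumps(
            variables, sort_keys=True, separators=(",", ":")
        )
    return f"{endpoint}?{urlencode(params)}"
```

Two clients sending the same query text and the same variables end up with an identical URL, so a cache in front of the server can treat it like any other GET.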

~~~
lwansbrough
There is a difference. When "everybody" has a unique query, there's no cache.

~~~
alexchamberlain
In practice, doesn't everyone run the same query in production? If variables
are being used, they're the same degrees of freedom that REST had anyway,
right?

------
D3vMn9r
If anyone was around when SOAP was being pushed on the dev community as the
new way of structuring Web-like APIs, you may agree with me that SOAP was a
good idea, but too complex and slow to implement - with questionable immediate
/ long term benefits. I don't think SOAP is in real prod systems much anymore.
GraphQL, to me, inherits some issues of SOAP: complexity, non-mainstream
terminology ... So, if I develop just fine with REST, moving to GraphQL is an
added pain for me, and the dev community picks simple and straight-forward
stuff over complex - as SOAP's history teaches us.

~~~
smacktoward
SOAP, itself, wasn’t too bad. What was bad was all the enterprisey
architecture-astronaut cruft that people insisted on burying it under. (Kind
of the same thing that happened with Java EE.)

Beware if your favorite technology catches on in the BigCo world, because when
those people fall in love with something they end up hugging it so hard they
strangle it.

~~~
barrkel
That stuff buried it, but it was a dead horse already - object RPC is the
wrong mindset for internet RPC: it doesn't treat latency (and thus async calls)
and errors as first-class concerns.

------
ragerino
The author of the paper forgets that asking the API to return only specific
attributes of an object (tree) instead of all of them can also be achieved
with REST.

e.g.
/api/v1/getUser?id=1234&return=name,zipCode,workplace.address,workplace.zipCode
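
A rough Python sketch of how a server might honour such a `return=` parameter. `select_fields` and the sample record are hypothetical, and this only covers plain dotted paths, not arguments or lists.

```python
def select_fields(obj, paths):
    """Return a copy of obj containing only the requested dotted paths."""
    result = {}
    for path in paths:
        source, target = obj, result
        keys = path.split(".")
        # Walk down to the parent of the requested leaf, creating the same
        # nesting in the result as we go.
        for key in keys[:-1]:
            source = source[key]
            target = target.setdefault(key, {})
        target[keys[-1]] = source[keys[-1]]
    return result


# Hypothetical record; only the field names come from the URL above.
user = {
    "id": 1234,
    "name": "Jo",
    "zipCode": "1010",
    "phone": "555-0100",
    "workplace": {"address": "Main St 1", "zipCode": "1020", "floor": 3},
}
wanted = "name,zipCode,workplace.address,workplace.zipCode".split(",")
selected = select_fields(user, wanted)
```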

Not sure if that has changed in the meantime, but when I tried GraphQL for
the first time it was obvious to me that every service call has to be
implemented by hand. Depending on what functionality is required,
implementations of service calls can get quite complicated.

As someone coming from Java where we have ORM frameworks and tools like
Spring, I was surprised that Facebook didn't come up with something better.

But hey it's Facebook. A company where the CEO thinks it's cool maintaining a
huge codebase mostly written in PHP and C(++).

~~~
atombender
You'll end up inventing your own ad-hoc GraphQL that way. For example, GraphQL
supports this, trivially:

    
    
      query User {
        user(id: "123") {
          id
          name
          topPosts(limit: 10) { id, title }
          drafts: comments(published: false) { id, body }
          friends {
            id
            photo(size: "100x100") { url, width, height }
          }
          newestPosts(since: "3days", categories: ["news", "chat"]) {
            id, title
            forum { id, name }
            creator { id, photo(size: "100x100") { url, width, height } }
          }
        }
        latestNews(limit: 10) { title, url }
      }
    

All one query. Selecting a subset of attributes is a small part of GraphQL.
(gRPC also allows you to filter fields, but doesn't have structured querying.)

GraphQL is a protocol and a schema language, not an ORM. It's a way to provide
a common interface on top of any implementation. For example, in the above
query, users could be in the backend's own database, whereas "friends" could
be something stored in a completely different backend. The GraphQL server
could trivially act as a façade that federated/aggregated the results of both.

I would argue that GraphQL fulfills the objectives (or at least desired
features) of REST much better than what has been delivered thus far. For
example, REST's hypermedia aspect ("HATEOAS") has not been widely adopted —
because discoverability is poor and there's no standard protocol for
introspecting a schema. But look around and you can easily find a dozen
different GraphQL clients (my personal favourite being Prisma's GraphQL
Playground [1]) that can be pointed at any GraphQL-compliant server to run
queries and browse the schema.

Those who criticize GraphQL generally don't seem to realize how simple it is,
and how close it still is to the REST style of consuming and producing JSON.

[1] [https://github.com/prisma/graphql-playground](https://github.com/prisma/graphql-playground)

~~~
jayd16
I'd argue that the growing popularity of Swagger/OpenAPI is as close to
HATEOAS as we want to get.

~~~
trumpeta
Swagger/OpenAPI is very fragmented. It builds on JSON Schema by adding some
properties and removing others, then on top of that the AWS API Gateway again
removes some options and adds others. It's a total mess trying to navigate
which features I can and cannot use.

------
bencevans
For others, esp. on mobile:
[https://www.arxiv-vanity.com/papers/1906.07535/](https://www.arxiv-vanity.com/papers/1906.07535/)

------
he0001
Say I write an API that exposes the resources A, B and C. If I now use GraphQL
to access a subset of all of those resources, doesn’t this just mean that I
have written the wrong API? That I should ditch A, B and C and write a
resource D which does just what I wanted?

~~~
underwater
You can make an API that exposes exactly what a client needs, but then the
server/client are tightly coupled (and it's not RESTful). What happens when
you have multiple clients, and long lived releases with differing
requirements?

~~~
he0001
But that’s not what I mean. I mean that your original resources are not the
actual ones you should expose. Hence you have designed your API wrong. And
I don’t agree that a _resource_ becomes more tightly coupled if you write it
right. It is a resource after all.

~~~
tracker1
Just because requirements change doesn't automatically imply the original
design was bad.

~~~
he0001
But that’s the thing. Should you invest in an obsolete API or just change it?
If you don’t, you have both an obsolete API and a more complex solution
instead of just changing your API. And if you don’t change your API to adapt
to the newer requirements, but use Graphql to “bridge” that problem, you now
have a client that’s heavily invested in your old obsolete API with a
technology glued to it. To me, that sounds like a bad idea.

~~~
tracker1
In the case of GraphQL and in the most prolific examples (Netflix and
Facebook), you're not considering that the time to set up a GQL interface is
less in aggregate than modifying many, many APIs for data that is distributed
across varying data stores, often column/kv structures. These are also
organizations doing more than just public facing pieces. For example, Netflix
has to deal with art, media, and licensing that will vary by location,
language and even distribution models under different contract terms for a
different region. That's only one aspect of how things work.

In the end, an API written for one department may not match the needs of
another, and another still may need additional related aggregate data. In the
end, would you rather define your data sources once, or communicate API needs
across several departments and maintain 3x or more the surface area to support
them?

edit: I'm not saying GraphQL is appropriate for _all_ scenarios... but I'm
saying it is definitely better for many.

~~~
he0001
But graphql cannot do more than the API already does? It’s constrained to the
capabilities of the API. Such as if there’s no way of updating a field,
graphql won’t do that either? So it can only make over-fetching more
efficient? And you are saying it’s more complicated to create a new resource
than to incorporate several endpoints with GraphQL? I find that hard to
believe.

~~~
tracker1
Imagine the scenario, where you're a developer on one team and need to consume
data owned/orchestrated by another team. _YOU_ don't have access to their data
source... at best you get a custom API that will be updated when they're able
to, and they have different priorities from yours under a different internal
organization. With GQL, as long as the source data is defined, you get as much
or as little as your queries define. With API only, you're stuck explaining to
your boss why you have to wait for another team in another org to let you get
to it, because they have other priorities.

~~~
he0001
But Graphql doesn’t enable more functionality than the API already gives you.
You don’t get more access to data just because you use graphql? And a team
that gives direct access to their data is in very much deep trouble already. I
really don’t get what you are getting at with this.

~~~
pm90
> You don’t get more access to data just because you use graphql?

The data you expose via apis is often one “view” of the actual rich data store
you own. If a dependent team wants more data than your rest api provides, it
is blocked on you adding that data to your api.

If you instead provide a way to query your data store, the dependent team
isn’t blocked on you.

~~~
he0001
>If you instead provide a way to query your data store, the dependent team
isn’t blocked on you.

This means that you bypass the API and give direct access to the data? Why
have an API at all then?

Don’t you think that this means you tightly couple the data storage with the
client?

~~~
pm90
> This means that you bypass the API and give direct access to the data? Why
> have an API at all then?

False dichotomy.

You’re not bypassing the api. You’re exposing a richer api. Use the right
terms.

The purpose of exposing apis is not to limit the attributes of exposable data.
E.g. your definition of an object that the REST API represents may evolve over
time while the API does not, since they may not be in sync in the code. With
graphql this is handled automatically.

Of course it’s tightly coupling data storage, that is the point! The problem
with using rest apis when doing this is that you can’t then change your api
without breaking dependencies. Graphql recognizes that in a rapidly evolving
data model, trying to model a permanence is futile. Instead it gives you and
your clients tools to a) reduce expansive dependencies by querying only for
data you need and b) identify which dependencies are used and how
frequently, allowing you to deprecate attributes safely.
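
As a rough illustration of point b), a Python sketch that counts how often each schema field is requested. `FieldUsage` and the `Type.field` naming are assumptions for this example, not part of any GraphQL library.

```python
from collections import Counter


class FieldUsage:
    """Track which schema fields clients actually request (hypothetical)."""

    def __init__(self, schema_fields):
        self.schema_fields = set(schema_fields)
        self.counts = Counter()

    def record(self, requested_fields):
        """Call once per incoming query with the fields it selected."""
        self.counts.update(
            field for field in requested_fields if field in self.schema_fields
        )

    def unused(self):
        """Fields nobody has asked for: candidates for deprecation."""
        return sorted(self.schema_fields - set(self.counts))


usage = FieldUsage(["User.id", "User.name", "User.fax"])
usage.record(["User.id", "User.name"])
usage.record(["User.id"])
```

Because every query names the fields it wants, this kind of bookkeeping falls out of the request itself; with a fixed REST payload the server cannot tell which attributes a client actually reads.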

This is not a tool meant for enterprises abusing rest apis as “contracts”,
where every change needs to be communicated in advance yadda yadda. It’s
targeted towards consumers who are ok with and actually _want_ a richer api in
lieu of the stability provided by versioned apis. Good candidates for this use
case include internal teams and external, non contractual apis.

------
je42
There is also the opposite approach to GraphQL: just move as much stuff as
possible to the backend and make the frontend thin.

With that approach you also don't end up writing a lot of specialized REST
endpoints for the frontend. You can just use the queries directly and output
for example HTML (or json and render that into HTML on the client). You can
also do a lot more queries at once, because backend services usually have
more bandwidth and fewer connection limits than a frontend running inside a
browser.

You are in full control of the queries. Only UI related data is exposed to the
frontend.

One drawback is that the frontend team wouldn't do any business stuff anymore.
They would do more generic components that the backend can drive.

And depending on the implementation you need more server round trips.

On the other hand, there is less JS to load and execute by the browser, since
the backend is doing most of the orchestration and rendering.

~~~
tracker1
Servers are expensive, and if you do it all on the server, you spend more
scaling. Clients connecting have their own compute resources that are
effectively free in terms of growth of a system as a whole.

GQL allows for the client UI/UX to describe what it needs, and the backend
delivers that, within the context of well defined data structures that can
then be distributed.

There's a cost to setting this up, but in the case of Facebook or Netflix,
absolutely worth taking on. For a few hundred users, not so much.

~~~
je42
However, frontends are also expensive when you look at other metrics. For
example, they consume battery on a mobile device.

> GQL allows for the client UI/UX to describe what it needs, and the backend
> delivers that, within the context of well defined data structures that can
> then be distributed.

In the other model, it is done in the same way, except it is being executed in
the backend and not the frontend.

Both the frontend and the backend are general computing platforms, but they
have different characteristics in terms of performance, connectivity, control
etc.

Most UI/UX requirements are not so specific that only a frontend-heavy
implementation will be adequate to fulfil the requirements.

Please keep in mind that back in the old days, when we didn't have JS in the
browser, all html was rendered on the server and a thin client was rendering
it.

And even right now, architectures sometimes pop up that try to make the
client thin and push the majority of the computation to the backend, for
example Google's Stadia.

~~~
tracker1
I'm sorry, but the battery example just doesn't fly... the biggest battery
usage is the screen, if the user is looking at their phone, they're actively
using it.

Back in what old day? I've been at this since 1995, and have been using JS
pretty proactively since 1998 or so, it's been 20+ years now. Computers are
considerably more powerful and power efficient compared to 1998.

If you want to be able to use the internet on an 80386 without a math co...
have fun with that. I'll take modern tooling, with modern hardware.

~~~
je42
Generating UX/UI in the backend doesn't imply using old tooling.

React server side rendering, Vue's Nuxt, etc are all modern ways to render
UX/UI in the backend for example. And there are more.

The discussion of rendering server side goes deeper than just performance or
using a "modern" framework.

There is for example: how many teams can work on a single application at the
same time and deploy independently features in an efficient manner.

How can we share business processing flow requirements between the web version
of an application and its native mobile application counterparts in an
efficient way, without implementing everything three times?

Or other aspects mentioned here, like predictable performance etc.

Or choice of language: in the backend, your choice of languages is greater
and/or simpler than in the frontend.

You can add more monitoring in the backend about aspects of your application
than you can do in the frontend.

Obviously, the frontend has advantages as well. E.g. if something needs to be
calculated or animated, it is almost instantly available to the user when done
in the frontend.

Frontend can perform functions offline, can use device specific features: like
camera, contact list, push notifications etc.

Thus I prefer to look at the requirements and pick the tools for the job that
best fulfil all the requirements for a given application in a given
organization and strategy.

If that means I do more stuff in the frontend, sure, fine. But I wouldn't go
for GraphQL just because it is "modern". Being "modern" is not a solid reason
to choose a technology unless you want to attract developers that like "modern"
technologies. Which could be a valid reason as well for some organizations.

~~~
tracker1
And you don't think server rendering has a cost? Running a very rich UI client
side allows for tens of thousands of users per server; your suggestion reduces
that to dozens or a couple hundred. That means many times more servers... more
servers to configure, support, orchestrate, pay for. Meanwhile the clients are
idle resources that could have been used.

If _you_ are the one paying for the servers, and you want to pay for 10x+ the
servers to support your app, feel free.

------
np_tedious
How much bigger did the request bodies become?

