Migrating to GraphQL: A Practical Assessment (arxiv.org)
127 points by eqcho4 14 days ago | 101 comments



I can see absolutely no positive value whatsoever in GraphQL for internal APIs. You're basically saying 'please, formulate any query you like, and I will waste years of my life trying to optimise that general case'.

Seriously. For internal stuff you want to be as specific as humanly possible. You want to optimise the living fuck out of that hot path thousand-times-a-second query, and build an entirely separate service to handle that particular URL if necessary.

For a public API, why would you ever encourage submission of arbitrary queries? They will destroy your database servers.

Note that SQL vs. NoSQL is not even a competition here. An RDBMS can handle arbitrary queries much better than any brute-force map-reduce system, and doesn't require spending thousands of man-hours writing your own query planner. The difference is that it doesn't (always) automatically scale horizontally.

So from where I'm sitting, GraphQL is nothing more than an invitation to maintenance and performance headaches, predicated on the idea that everyone scales infinitely horizontally and can brute-force every query.

Personally I prefer being able to serve a thousand queries a second from a single server managing a 1TB database with a 50GB working set, with a latency under a second even when (looking at the raw query) it 'should' touch more rows than there are atoms in the universe.

[edit] In light of the replies, I should express some surprise that building REST endpoints is expensive. If you can execute at runtime an automatic combination of various other endpoints to produce a result, is it not equally simple to generate the code necessary to do the same? That can at least be examined and maintained more easily than dealing with the massive array of possible variations to known-working queries which comes with modifying the rules driving the GraphQL evaluator...?


GraphQL was created at Facebook specifically for internal APIs. The motivator was to avoid having to write a custom endpoint for every single REST query, because otherwise REST is not efficient.

Note that GraphQL does not allow the specification of arbitrary queries - much like REST does not allow access to arbitrary resources - the server defines what queries are available and the user can choose to request a selection of them, and to pluck data from them (or follow links - simply a JOIN).

In other words, GraphQL lets you combine a subset of pre-defined queries in one go.


> GraphQL was created at Facebook specifically for internal APIs.

AFAIK Facebook created GraphQL specifically for their gateway API - a service used as a facade between the internal service mes(s/h) and their clients - not for the internal services themselves. That's why things like schema stitching didn't come from FB - they weren't using it in that context.


This is my understanding also.

'Combine a subset of pre-defined queries': you mean run a bunch of queries and perform the join locally? Like a poorly-designed app server 're-using' several repository methods and doing joins in-memory?

I may be misunderstanding something here, but when 'general' queries are combined externally a lot more work is done than is necessary. Which may be fine for small intermediate sets. But treating it as any kind of general solution is silly.

That said, people do similarly stupid things within individual codebases running in individual services to a single database, so as usual it likely comes down to how the tool is used rather than how it can be abused. Still, spreading these things across services and processes looks like it only makes abuse easier.


Nope, it's meant for micro-service architectures where you have a lot of different API endpoints spread across different servers, databases, technologies, shards, etc. You need to get a user profile from one side, a list of news from the other, cross-reference that with a list of friends from a third source, and then combine all that into something usable in the client. GraphQL is a way to define how to merge those datasets without manually coding each step of taking field X from feed A and list Y from feed B and indexing them by Z.

Besides obviously solving problems for huge players like FB, it also helps with the common situation where the UI design keeps changing along the way and all of a sudden the front-end team needs the APIs to return a shitload of new relations that were not in the initial specs. In most startups I've worked at, 99% of my API time was spent reworking queries to include more data because the designers and UX people had changed their minds. GraphQL makes this fairly painless as long as you have a well-defined set of basic APIs. Then later, when the design gets more stable, you can locate the bottlenecks and rework them into more complex, but faster, API calls.


Not sure I understand why in-memory joins are wrong by default, especially in a large system with more than one data store.

Let's say you call UserService::batchGetUsers(userIds) to get a list of users from a service backed by a MySQL DB, call WidgetService::widgetsForUser(userId) to get a list of a user's widgets from a separate service backed by a Redis cache and another MySQL DB, and return them. What's the problem here? Lack of transactions? Unnecessary fields sent down the wire? Something else?
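A minimal sketch of that pattern in Python (the service names, data, and fields here are hypothetical stand-ins for the MySQL- and Redis-backed services described above):

```python
# Hypothetical stand-in for UserService::batchGetUsers (MySQL-backed).
def batch_get_users(user_ids):
    db = {1: {"id": 1, "name": "alice"}, 2: {"id": 2, "name": "bob"}}
    return [db[uid] for uid in user_ids if uid in db]

# Hypothetical stand-in for WidgetService::widgetsForUser (Redis/MySQL-backed).
def widgets_for_user(user_id):
    store = {1: ["w-100", "w-101"], 2: ["w-200"]}
    return store.get(user_id, [])

def users_with_widgets(user_ids):
    # The "in-memory join": one batch call per service, merged locally.
    users = batch_get_users(user_ids)
    return [dict(u, widgets=widgets_for_user(u["id"])) for u in users]

print(users_with_widgets([1]))
# [{'id': 1, 'name': 'alice', 'widgets': ['w-100', 'w-101']}]
```

Each data store only answers queries it is good at; the join across stores happens in the application layer.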


There is no problem, the same way GraphQL does not automatically allow arbitrary queries.

The parent author holds some very stubborn beliefs about how systems are built (his/her way is the correct way!), which is great for discussion, but probably not the best example on how to actually build big systems.


> the server defines what queries are available

>avoid having to write a custom endpoint for every single REST query

So what are you saving, really? Could you better implement GraphQL type functionality as a client side wrapper for traditional REST apis? Then you could keep the traditional tooling as well as simple query join semantics for client devs.


Oh really? Because I’ve yet to see anyone explain how joins can be accomplished with GraphQL.

We're building a GraphQL internal/external API. Our main app uses the same API that clients can hit directly. Our backend dev team spends a large fraction of our time building custom REST API endpoints. With GraphQL a lot of that work is being shifted to front-end devs and to previously non-dev account executives, and their clients directly. A project to create a report that used to involve many organization layers and take weeks, can now often be done on the phone by an account exec without a dev ever hearing about it. Clients have deeper and easier access to their data than our competitors yet offer.

Of course we're very concerned about the performance drawbacks and are trying to plan for them. We don't expect it to be web scale and aren't replacing our REST APIs that serve the public. Even so we expect to need to be smart about calculating query complexity.

So while most of the criticisms of GraphQL on this page do apply, for us the net value is looking quite positive.


I just don't get it. You still have to have provided the data for the client devs to query. Does opening up the full schema model make things easier to implement? Aren't you just kicking the can of api design down the road to the client devs causing thicker clients full of their own business logic?

I think you’re failing to understand how companies tend to work. The front-end teams and back-end teams are usually separate and want to work on different things. But for the front-end team to build an app with the data they need, they need the back-end team to change the REST API to provide different data. That kind of change is usually not very interesting for back-end teams, so they stall, while the front-end team waits for days on a change which they feel should be pretty damn small.

One easy way to fix this is by using GraphQL instead of REST. Now the front-end teams have a way to get the data they need and are thus unblocked.


Fair enough, it certainly seems to offer benefits re. flexibility and rapid development.

I'm just too used to being the person who then has to make the server deal with this particular use case (meaning a million possible variations on the same query) run a thousand times faster :P

So if it's understood at all stakeholder levels that there are tradeoffs, it could be a good tool. If. >_>


I’ve always thought the perceived rapidness is because you are now able to put code in the front end which really should exist in the back end. While at the beginning it’s faster development, you just delay the inevitable problems of spaghetti APIs.

I'm very surprised by this. I have built REST in the past and recently I built a large GraphQL service. I found GraphQL to be a lot less confusing both for front-end and for back end. The tooling is amazing, standardized schema introspection is great.

Sure, if you try to shoehorn an API that does not fit the database, it can be challenging, but you can always start with the simple case, doing multiple queries to the DB for one GraphQL request and only later see what is actually used and could benefit from optimizing.

You have more opportunities to optimize. With REST, if a client needs to list some resource and then access all of them one by one, it is going to be n+1 requests and you can't do anything on the back end to change that. With GraphQL you can look at the query holistically and optimize as needed.

All in all it feels to me that GraphQL gives the client the ability to better communicate the intent of what they are trying to do. Declarative over imperative.


GraphQL is bad for public APIs. It is good for precisely one thing: when the server and the client are controlled by the same entity, making updates to clients without having to add new internal APIs to the server.

But this one thing is so useful for almost everyone that for internal APIs using GraphQL is usually a no-brainer.

You actually don't want to be as performant as possible for internal APIs. There is a performance-flexibility trade-off involved and GraphQL lets you choose a different point on the Pareto frontier than maximum performance.


This was the most useful comment, should be at the top.

As far as I can tell, I agree.

An API is like a promise.

In the best case, people decide to use your API and build on it. Then, if for whatever reason and in whatever way, you break your API, you force everyone affected to reimplement at least to some degree.

With GraphQL you’ll be promising the sun the moon and the stars if you aren’t very careful. (Even if you are very careful you’re still promising a lot, though hopefully the available tools will help you out.)

With GraphQL you put yourself in a very tough position of either keeping big promises or breaking big promises.

Most of the time you are going to be better off keeping small promises.


For internal APIs, one essential value GraphQL provides is that it significantly reduces the number of read APIs. With REST or RPC the number of APIs explodes, because RPC is too specific while REST is not flexible enough, so people just keep rolling their own new-but-similar APIs.

For a front-end/public API, parsing queries against a queryable graph enables ad-hoc requests and co-location: a small number of resolvers replaces a lot of manually written controllers, which in turn removes a huge amount of accidental complexity.


How can graphql reduce the number of reads? If you are reading stuff you don’t need, why are you making those calls? And if there’s no API for you to get the data, how would graphql know about it?

It's mainly about reducing the number of read APIs. It can also reduce the number of reads through the dataloader.

Say there are 10 different pages that need to read the 'Article' resource: on the index page you need the title and the summary; on the details and edit pages you need the content; on the details page you also need comments.

The thing is, all of these pages read 'Article' - they need it - but each cares about only some parts of it, and some reads also require extra, possibly cascaded, joins with other resources.

In most medium-sized web apps people tend to write a bunch of different APIs to fulfill these needs; it's grunt work and hardly consistent, because they're duplicating things. Or even worse, they invent their own query formats - a half-baked GraphQL everywhere - which is inconsistent and sometimes buggy.

Our application has an estimated 3000 internal APIs as of last year, and it's becoming harder and harder to even find which API to use because there are so many nuances. Almost half are doing different kinds of reads. If we could make everything a queryable graph (ironically, almost all businesses are graphs, yet very few people explicitly treat them as graphs), there would be far fewer APIs and they would be easier to understand.


Again, I don’t see how it can reduce read APIs. If you haven’t read the data you can’t change the data.

Also, I fail to see how graphql can simplify an API: graphql operates on the actual APIs, so if you don’t understand those, you’ll probably not understand the graphql version of them either.


You can query the schema and get only the data you need. E.g. take a REST endpoint that returns a list of all users with all of their info. With graphql, if you only want the users’ names, you can ask for that and graphql will return only the names of all users, reducing the payload by a lot. Or use the schema to get only the data you need, perhaps eliminating the need to fetch all users’ names in the first place.
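The field-selection idea can be sketched in a few lines of Python (the data and resolver name here are made up for illustration):

```python
# Hypothetical full user records held by the server.
USERS = [
    {"id": 1, "name": "alice", "email": "a@example.com", "bio": "..."},
    {"id": 2, "name": "bob", "email": "b@example.com", "bio": "..."},
]

def resolve_users(selected_fields):
    # Return only the fields the query selected, shrinking the payload.
    return [{f: u[f] for f in selected_fields if f in u} for u in USERS]

# A query like `{ users { name } }` boils down to:
print(resolve_users({"name"}))
# [{'name': 'alice'}, {'name': 'bob'}]
```

The server still reads the full records; only the serialized response gets smaller, which is exactly the objection raised in the reply below.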

But that’s not reducing the complexity of the API, just the payload. If you remove the “load all users with everything in it” API, the backend still needs to load all users and then filter them to reduce the payload. If the backend doesn’t have that capability, graphql cannot ask for that functionality, so you still rely on the team to have implemented it.

Don’t move the goddamn goalposts. The responses are to the specific question of reducing read api payloads which you have admitted is what it does. It’s not to reduce the complexity of the api or whatever new arbitrary goal you invent next.

If you need to query for a nested relationship, that's multiple REST requests, such as "all posts by a user" and "replies to a given post". You could write up a single REST endpoint to support that, but now you're making a customized API endpoint to optimize for your current frontend needs. If you decide to display more or less, that API needs to be changed too.

With that said, I'm not a big fan of GraphQL, but this particular trade-off seems like a win over REST.


Are you saying that graphql bundles those multiple requests into a single, actual (not perceived) request?

From the front-end application's perspective, yes, it's a single request.

On the back-end, after the query is parsed it gets split into a bunch of requests, depending on your data source.

If the data source is in-memory, then good, everything's done.

If the data source is an RDBMS, it would naively result in an n+1 query. However, people almost always use a dataloader with GraphQL, which batches the lookups and converts the n+1 query into two queries. REST APIs usually only care about a single kind of resource per API, so if there are cascaded joins (e.g. get the user, and its posts, and their comments, and the comments' comments), GraphQL ends up with fewer overall requests if you think in terms of amortization.

If the data source is another REST API, and there are batch APIs for the resources, a similar approach to the RDBMS case can be taken.
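The dataloader trick can be sketched like this: collect all the keys requested during one resolution pass, then issue a single batched lookup instead of n follow-up queries. Everything below is an illustrative toy, not a real dataloader API:

```python
QUERY_LOG = []  # records every backend query issued

def batch_get_comments(post_ids):
    # One batched query: SELECT ... WHERE post_id IN (...)
    QUERY_LOG.append(("comments", tuple(post_ids)))
    data = {1: ["nice"], 2: ["+1", "hm"], 3: []}
    return {pid: data.get(pid, []) for pid in post_ids}

def resolve_posts_with_comments(post_ids):
    QUERY_LOG.append(("posts", tuple(post_ids)))   # query 1: the posts
    comments = batch_get_comments(post_ids)        # query 2: all comments, batched
    return [{"post": pid, "comments": comments[pid]} for pid in post_ids]

resolve_posts_with_comments([1, 2, 3])
print(len(QUERY_LOG))  # 2 queries total, instead of 1 + n
```

Without the batching step, each post's resolver would fire its own comments query, giving the 1 + n pattern the comment describes.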


Yes, that's the main sell of GraphQL I believe - the "graph" of the name. Your clients could theoretically make one API request for all their data needs by specifying the relationships they care about.

E.g. I want all of my posts and their respective comments and the users making those comments. You can narrow things down a lot by specifying fields: only give me the name and avatars of the commenters. With REST I've done this in the past with N API calls: give me my posts, iterate over the IDs of those posts to make N API calls for their comments, and another N calls for the comment user data I need.

At that point, a custom API endpoint could trim down the network calls a lot, and this is where GraphQL shines in comparison. You wrote a resolver for a user, comment and post in the backend, and GraphQL server frameworks can piece those together for you.


So the main selling point is to reduce traffic and let the backend sort out the n+1 and caching problems. I can see a bunch of problems with that, like required fields that the client must still supply, and error replies that end up very fragmented. And in a microservice environment where each resource, or a few of them, belongs to a single microservice, the client still needs to make several individual requests.

Also, bundling APIs together means that if you split an API between two clusters, all those graphql requests are now broken. But you don’t really know that, because it’s hard to know what queries a client has created.

Also, if someone creates a query that triggers a path where the n+1 problem absolutely wreaks havoc on the backing data storage, you could have serious problems. By fixing an API you get predictability. There’s also the mismatch between data versions, and the question of which version or view you are operating on.


I don't really understand it either. I thought the whole purpose of having an application layer on top of your database was to provide a specialized interface for your application. If you're just going to provide a general query language to the frontend what is the point of the application layer? It seems like you'll just end up writing a database on top of your database. If you are writing a general purpose interface now, why are you writing custom code? Couldn't it all be abstracted out into an RDBMS? Would an RDBMS with user login and permissions system meant to be accessed by the end user/frontend be better than everyone writing their own weird database wrapper? It all seems very strange to me.


GQL makes it convenient for the front-end to assemble a collection of queries needed to render the page in a single request, and then go about its rendering business upon response. The React-Apollo client I use also features an in-memory cache that can serve as a state store, often entirely obviating the need for libs like redux.


In the reference implementations of GQL, that “one query” from the client balloons out into possibly hundreds of queries out the back side of GQL.

Things like dataloader exist but the behavior of your schema gets harder and harder to reason about when different caching mechanisms get thrown on top.

I think everyone agrees that the client facing api for GQL is fantastic... maybe that means graphdbs are the next wave :P


For distributed data stores, with collectives of data across different databases/clusters it starts to make more sense.

If you're using a column store structure for most data, you're mainly doing individual lookups based on a single key, graph data in another key/keys, and related keys looked up separately. Each optimized for the single record(s).

Especially since, distributed collectively, this data will be accessed faster than a single RDBMS would be able to manage.

That said, if your application can/does use a single sql datastore, and you don't need that much scale, the effort to setup/configure GQL may not be worth it in a given instance.


To me, on the back-end, GraphQL means replacing controllers with query (GET) and mutation (POST/PUT/DELETE) resolvers, with very similar concerns, routing takes place in the GQL schema, and everything else stays the same.
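That mapping - controllers replaced by query/mutation resolvers, routing handled by the schema - can be sketched as a dispatch table (all names and data here are hypothetical):

```python
ARTICLES = {1: {"id": 1, "title": "Hello"}}  # toy data store

# Query resolvers play the role of GET controllers...
def resolve_article(args):
    return ARTICLES.get(args["id"])

# ...and mutation resolvers the role of POST/PUT/DELETE controllers.
def mutate_rename_article(args):
    ARTICLES[args["id"]]["title"] = args["title"]
    return ARTICLES[args["id"]]

# Routing lives in the schema, not in URL patterns.
SCHEMA = {
    "query": {"article": resolve_article},
    "mutation": {"renameArticle": mutate_rename_article},
}

def execute(operation, field, args):
    return SCHEMA[operation][field](args)

print(execute("query", "article", {"id": 1})["title"])  # Hello
```

The concerns (auth, validation, data access) stay where they were; only the entry points move from routes into the schema.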

You're optimising for the wrong things. An efficient server that vends the wrong data, or that saturates the client's bandwidth, is useless. Facebook invented GraphQL for two reasons:

* It allows rapid product iteration over the "social graph". It allows each product to have unique behaviours and data requirements, and new product capabilities can be rolled out quickly. Facebook's entire data layer (TAO, Ent framework) is optimised towards rapid iteration.

* It optimised performance for client load times. Mobile networks are high latency and low bandwidth. GraphQL allows each application to load data in the most efficient way possible.


I agree. I wrote a simple app that pulls data from a Postgres database and denormalizes it into Elasticsearch, and then the frontend code calls Elasticsearch directly. The frontend developers can tweak the config file however they like to create whatever data structure they want in Elasticsearch. This seems a lot simpler than anything offered by GraphQL. I wrote a bit about it here:

http://www.smashcompany.com/technology/caches-are-cheap-buil...


GraphQL is not an alternative to ElasticSearch. It doesn’t care how you fetch the data needed to resolve a query, and could as easily be pulling from ES if performance demands it.


No, the app I wrote is an alternative to GraphQL.

I'm conflicted. On the one hand I agree with you. On the other I think SQL works and isn't GraphQL just an abstraction approaching that? I think there's something here but I'm not necessarily sold.

We're throwing decades of web architecture to the wind here. Caching is an exercise left to the reader.

If we're going as far as GraphQL, why _not_ open your DB to the public?


I think GraphQL is highly misunderstood. I wrote about it extensively [1].

TL;DR: it's not really a query language like SQL. It's more of an opinionated RPC framework. I don't think building out one's entire data model in a GraphQL schema is a good idea in general, but that's also not the true value-add of the system.

[1] http://artsy.github.io/blog/2018/05/08/is-graphql-the-future...


Well this is the conflict. If its just a way to batch dependent queries there are things like capnproto that do that too.

Ideally though, you don't use dependent fetches and you have a bespoke endpoint that collapses the dependent operations possibly as tight as a single join query. If this is the case you're trying to solve, then why shouldn't we strive for something like a declarative query language?


I haven't looked at capnproto recently, but just because it solves a similar problem doesn't mean there's no room for a different system.

The bespoke endpoint thing is a big part of what GraphQL avoids. It gives you something explorable, but as a team, you can decide how comprehensive you want it to be. If you have a monolithic database, you can use SQL to explore it, but GraphQL often sits in front of a service layer, which connects to my next point.

GraphQL's mutations often map to entire processes, which might change a data store, like a DML SQL operation, but also have other sorts of effects.


GraphQL is still an API that sits in front of the business logic. E.g. fetching a like count on Facebook might map to a cache lookup or a complex query, and everything fetched respects the privacy business rules. Queries shouldn't map directly to a SQL query.

I can see the value, but I think it may well be limited... it takes a fair amount of work to setup a GraphQL data source for an application. That effort may or may not be worth the return, and will vary case by case. I can definitely see why Facebook and Netflix use it.

I don't think there's any surprise that GraphQL reduces the size of the responses compared to typical REST (excluding things like JSON:API). If that's the measurement that's most important, then it seems GraphQL is the obvious choice.

For myself, the 'practicality' of GraphQL would be more on complexity of the implementation, training of engineers, potential re-implementation of client logic, ease and depth of debugging, performance, etc. It seems these days that the size of the response is not typically a limiting factor in most applications I've interfaced with (though maybe I've never been exposed to that world before).

Can anyone speak to how a migration from REST to a GraphQL went? My biggest concern is around the complexity of the thing. It just seems so much more complex than REST, but maybe I haven't spent enough time with it.


We migrated from RESTful HTTP to GraphQL and then back to RESTful HTTP. GraphQL is cool, but actually maintaining it is a nightmare and I think it would be rare for the end product to turn out better. Here's a few of the cons for us:

- GraphQL parsing and interpretation is considerably slower than RESTful JSON. I'm talking an order of magnitude difference in .NET Core.

- The required POSTs cannot easily be (if at all?) cached by caching services such as Cloudflare

- It's more work to have to define what data you want than to just spit out the data that's available. Having to continually update your client side queries in order to fetch all the available data is tedious as hell. The whole over/underfetching argument is not worth the incurred performance hit nor the bandwidth improvements.

- GraphQL promotes laziness about documentation because it's "self-documenting" -- turns out most API users still struggle to understand how it works and need better API guides anyway, so the reflection is largely useless (it's about as good as those auto-generated Java docs you find on Oracle's website.)

- Users get RESTful. They know it, it's not a toy, and it just works. I'll repeat this again because Silicon Valley doesn't seem to get it: GraphQL is not user friendly.


> The required POSTs cannot be easily (if at all?) cached

GraphQL allows queries to be performed with GET:

  http://example.com/graphql?query=query{user{id}}
You want to use POST for mutations, but read queries can be run through GET and cached just like REST. There's literally no difference -- it's HTTP, after all.
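For instance, building such a cacheable GET URL with the Python standard library (the endpoint is the example.com one from the line above):

```python
from urllib.parse import urlencode

# Identical query strings yield identical URLs, so intermediaries
# (CDNs, reverse proxies, browser caches) can cache them like any GET.
query = "query { user { id } }"
url = "http://example.com/graphql?" + urlencode({"query": query})
print(url)
# http://example.com/graphql?query=query+%7B+user+%7B+id+%7D+%7D
```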


There is a difference. When "everybody" has a unique query, there's no cache.


In practice, doesn't everyone run the same query in Production? If variables are being used, they're the same degrees of freedom that rest had anyway, right?

Akamai can certainly cache POST requests based on the body of the request. And persisted queries can be added as a caching solution on top of GraphQL.

> GraphQL parsing and interpretation is considerably slower than RESTful JSON. I'm talking an order of magnitude difference in .NET Core.

Looks like an implementation problem. I couldn't measure a significant difference between the two for comparable requests in Java. Also, parsing/interpretation is almost never the bottleneck of an application.

> It's more work to have to define what data you want than to just spit out the data that's available. Having to continually update your client side queries in order to fetch all the available data is tedious as hell. The whole over/underfetching argument is not worth the incurred performance hit nor the bandwidth improvements.

"All the available data" is almost always an anti-pattern, so yes, this gets tedious. Frontends should have specific requirements for what data they need, and if you only request that data it isn't tedious at all - plus you don't waste any bandwidth. This is the "select * from" school of SQL all over again, combined with "what do you mean we have more than one client, with different requirements?" - now you either always send the superset or start an ad-hoc implementation of what GraphQL provides you, to only send the required data to each client.

> GraphQL promotes laziness about documentation because it's "self-documenting" -- turns out most API users still struggle to understand how it works and need better API guides anyway, so the reflection is largely useless (it's about as good as those auto-generated Java docs you find on Oracle's website.)

GraphQL self-documentation and exploration are sufficient if whoever uses the API understands the domain model underlying it. You will always have to teach people the domain model, but after they've understood this people can use self-exploration to find what they need in that model.

With REST APIs you have to document the domain model and then painstakingly each and every technical API endpoint, cause domain model and technical API always differ (usually even two endpoints of APIs cannot be called in the same fashion - what do you mean the parameter here is called maxResults now? It was called maxCount over there!)

> Users get RESTful. They know it, it's not a toy, and it just works. I'll repeat this again because Silicon Valley doesn't seem to get it: GraphQL is not user friendly.

No, they don't. They have learned to accept that companies only provide them REST and they have to live with it, even if it is bad. SOAP or GraphQL are both (in different ways) vastly superior, but one has fallen out of fashion and the other one is seen by some of the "REST is great" people as a toy, cause they think REST is some kind of holy grail.

Disclaimer: Does GraphQL solve all your problems? No. Nothing does, you will have to do it yourself, that's part of your job.


> Looks like an implementation problem. I couldn't measure a significant difference between both for comparable requests in Java. Also, parsing/interpretation is almost never the bottleneck of an application.

That would depend on your definition of significant, and the complexity of the AST you're parsing. Obviously I'm talking about the implementation I have available to me, but GraphQL would routinely take 100+ms to return the same data that ASP.NET could deliver in 10ms or less. And this does indeed make sense as it is simply more overhead. The more likely cause of the difference is that it's just complicated to optimize GraphQL graph queries. When you have a single endpoint, you know exactly what data you need and a human can optimize around that specific requirement, whereas GraphQL attempts to fetch each piece of the graph in isolation (or not, if you optimize for that, which is additional work.)

> "all the available data" is almost always an anti-pattern, so yes, this gets tedious.

That's not really what I meant. What I meant was: in order to fetch the data I want for a specific task, I have to list all of it. Which is silly. I know what I need to return on the server side, so why should I have to write it twice? This is especially cumbersome in the case of returning a Dictionary<string, string> or whatever it might be, where the object keys can change over time.

> This is the "select * from" school of sql all over again combined with "what do you mean we have more than one client with different requirements?" - now you either always send the superset or start an ad-hoc implementation of what GraphQL provides you to only send the required data to each client.

Indeed, this is a real problem, but typically if all your data is mapped well you shouldn't run into this very often. Most people who are implementing GraphQL on their servers don't have this problem. They're doing it because it's cool. Problems like this are what versioning is supposed to cover. If you're having to constantly change your API for different clients however, your product is not the API itself. Most people are just trying to create an API that people can pick up and use easily.

> GraphQL self-documentation and exploration are sufficient if whoever uses the API understands the domain model underlying it. You will always have to teach people the domain model, but after they've understood this people can use self-exploration to find what they need in that model.

> With REST APIs you have to document the domain model and then painstakingly each and every technical API endpoint, cause domain model and technical API always differ (usually even two endpoints of APIs cannot be called in the same fashion - what do you mean the parameter here is called maxResults now? It was called maxCount over there!)

If you haven't explained the purpose for each endpoint (or queries/mutations as GraphQL calls them) then you haven't documented it. So you must "painstakingly" document each "endpoint" in either case. Your API docs should be guidance, not just "listTransactions(): This lists the transactions". This is exactly the kind of thing I'm talking about. At the very least, it's the same amount of work. The difference between not-GraphQL and GraphQL is with GraphQL you've probably set up GraphiQL and now your model attributions are no longer co-located with your real documentation!

> No, they don't. They have learned to accept that companies only provide them REST and they have to live with it, even if it is bad. SOAP or GraphQL are both (in different ways) vastly superior, but one has fallen out of fashion and the other one is seen by some of the "REST is great" people as a toy, cause they think REST is some kind of holy grail.

This is spoken like someone who hasn't actually worked with developers who have to implement this stuff. Users _in general_ are confused by GraphQL. Especially, of course, novice developers. They really don't know where to start with it. Real REST is not something people advocate for these days. All I'm talking about is, users are happy when you provide them with well thought out endpoints that each have their own URL and cover the use cases they have.

Imagine if Stripe had started with GraphQL. The number one question they'd have received would have been: "what do you mean I should install the GraphQL library so I can hit your APIs?". APIs work best when they're SIMPLE, and GraphQL is anything but simple.


It depends on your backend architecture. We migrated a microservices (Go) architecture to GraphQL and it gets very complex with a lot of boilerplate. Perhaps part of that is GraphQL's and its supporting libraries' infancy, but supporting the ability to ~infinitely nest objects in a microservices architecture at scale makes every field addition feel like building a gigafactory.

On the plus side GraphQL is very simple and adaptable for frontends. It just comes at the cost of moving so much complexity to the backend.


I think it heavily depends on the server library for your language of choice. I've tried Sangria for Scala, graphql-java, and graphene for Python, and all of them are quite different, and mostly badly documented, but I found the Java implementation to be the easiest one to get started with.

Graphene seems like a nightmare but the documentation is improving and once you get used to the design it's actually not so bad and more feature complete than the others.

I think moving complexity to the backend is actually not such a bad idea, so the frontend can focus more on actually just displaying the information.


Using graphene at work.

Absolutely love graphql as a technology. The pagination and ability to structure a query to pull a lot of data is nice.

As for some of the other comments - it's true you can't just write any query: but you can always add new fields, lists, connections (basically a pagination-friendly list), etc against arbitrary things on the backend. It's your backend, you control what data you serve.

The graphql-python stack is layered like an onion (graphene on top, graphql-core inside). You will probably be doing something like flask-graphql or graphene-django on top of that.

There is a huge negative for python <-> graphql for us, and it's the error system and promises. Perhaps this is what graphql servers in JS are like, but graphql-core hijacks python's error system by wrapping fields in promises.

So graphql-core is acting very, very true to graphql's implementation in node, to the point of creating a mountain of breadcrumbs in Sentry.

It also overrides the "next" built-in and tries to emulate express-style callbacks. Another thing ported from node into python that doesn't translate well imo.

Aside from that though: graphene has been really speedy, fast to work with. Documentation is getting nicer. The developers on the issue tracker are very nice. And as a general graphql thing: https://github.com/graphql/graphiql is really nice!

And one more graphql thing: It's typed. In a big API, being able to lay out stuff like that goes a long way. We're generating typescript types via schema.graphql output, and response types via relay. In a real big frontend project, it pays off a lot.

I recommend giving graphql a shot.


That was our assessment, some gained front-end speed at the cost of a loss of back-end performance and simplicity - we ended up staying with bespoke REST endpoints because they're cheap and do the job.


I was at GitHub during their migration to GraphQL, and it was extremely expensive. But they had some weird ideas - there was a mandate to use GraphQL even to render server-side templates, which meant completely replacing the existing MVC architecture with a sort of fat-model DSL that implemented GraphQL resources. Of course, all of that DSL was developed in-house, so documentation was weak and it was impossible to debug...


I've found it reduces complexity in your app code (at least on the front end). You basically have no, or extremely minimal, data-fetching logic, whereas with REST you often find yourself making more requests in response to other requests, i.e. to traverse relations. GQL gives you that for free.

Also, the tooling is next level. GraphiQL alone is miles ahead of any REST tooling offering. Also, TypeScript generation. It has truly improved my workflow.


I find the lack of tooling the annoying part, actually... the necessity for 3rd-party frameworks like Apollo, tack-on modules like DataLoader... the lack of up-to-date, feature-parity server implementations besides the canonical JavaScript version... etc.


What’s the complexity you’re thinking of? In my experience, GraphQL is so much simpler than REST. There’s a whole category of thinking you don’t have to engage in (is this POST? PUT?). And to start I wouldn’t migrate from REST. I’d wrap a REST service with GraphQL and take it from there.


That essentially means you are still overfetching under the hood, but yeah -- for me the benefits are in its ergonomics anyway. The whole under/overfetching thing is overstated.


Locally, yes. But that’s not a big deal unless your network is that saturated. Also, with wrapping we’re just talking first steps. Anyway the overfetching thing is less about bandwidth and more about maintaining compatibility with older clients and knowing which clients require which fields. Since each GraphQL client explicitly states the fields it needs, it’s much easier to retire old fields.


> There’s a whole category of thinking you don’t have to engage in (is this POST? PUT?)

There’s a reason that category of thinking exists. But since GraphQL libs are now busy implementing caching on top of non-cacheable POST requests, this argument is lost on them.


> There’s a reason that category of thinking exists.

What's the reason?

> But since GraphQL libs are now busy implementing caching

What are you referring to? You mean like https://github.com/graphql/dataloader ?

> in top of non-cacheable POST requests, this argument is lost on them.

I think caching/serving HTTP results is a bad paradigm. It makes sense if you're serving HTML for a content site or something like that, but doesn't make sense when you're serving an API.


> It seems these days that the size of the response is not typically a limiting factor in most applications I've interfaced with

As a "for your consideration," optimizing for the _client's_ concerns might not be the true win, but if the server side were able to side-step some joins, it could be a win for the whole community, since it could -- in theory -- make everyone's experience better. The ability to push conditionals to the server side _could_ side-step multiple round trips, too: https://graphql.github.io/learn/queries/#directives
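The directives linked above look like this (adapted from the GraphQL documentation's example): `@include` keeps or drops a sub-selection based on a variable, so the client doesn't need a second round trip to decide what to fetch.

```graphql
query Hero($episode: Episode, $withFriends: Boolean!) {
  hero(episode: $episode) {
    name
    friends @include(if: $withFriends) {
      name
    }
  }
}
```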

Just this morning I tried out GitHub's GraphQL interface, and it for sure requires some thinking to reframe the question in terms of the GraphQL surface-area they expose, but GraphQL also has built-in schema discovery, which is something one would typically have to read the docs to access.

It's tremendously annoying that one must package a GraphQL query _inside_ a JSON field named `query` versus the much more sane `curl --data-binary 'query MyQuery { some fields }' https://example.com/graphql` but it seems that packaging is required for separating variables from the actual query itself: https://graphql.github.io/learn/queries/#variables (although having two endpoints, `/graphql` and `/graphql.variables`, or switching behavior based on content-type, would be amazing and, at least in theory, very little server-side work)
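Concretely, the required packaging is a single JSON POST body in which the query string, its variables, and (optionally) the operation name travel as separate fields — this is the conventional shape GraphQL-over-HTTP servers expect:

```json
{
  "query": "query MyQuery($id: ID!) { user(id: $id) { name } }",
  "variables": { "id": "123" },
  "operationName": "MyQuery"
}
```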

Speaking of switching endpoints, the GraphQL community claims that the API also needs much less versioning, getting one out of the business of `/v1/customer` and `/v2/customer` etc., but I don't have the experience to know how much of that is "in theory."


Increasing the number of endpoints queried from one to more than one would probably be a breaking change, which is not allowed in GraphQL.

If anyone was around when SOAP was being pushed on the dev community as the new way of structuring web-like APIs, you may agree with me that SOAP was a good idea, but too complex and slow to implement, with questionable immediate and long-term benefits. I don't think SOAP is in many real prod systems anymore. GraphQL, to me, inherits some of SOAP's issues: complexity, non-mainstream terminology... So if I develop just fine with REST, moving to GraphQL is added pain for me, and the dev community picks simple and straightforward stuff over complex, as SOAP's history teaches us.

SOAP, itself, wasn’t too bad. What was bad was all the enterprisey architecture-astronaut cruft that people insisted on burying it under. (Kind of the same thing that happened with Java EE.)

Beware if your favorite technology catches on in the BigCo world, because when those people fall in love with something they end up hugging it so hard they strangle it.


That stuff buried it, but it was a dead horse already - object RPC is the wrong mindset for internet RPC; it doesn't treat latency (and thus async calls) and errors as first-class concerns.

The core problem with SOAP was that it inherited too much naive RPC mindset - a simple object access protocol encourages you to think in terms of manipulating objects, a bit like COM, DCOM, CORBA encourage you to think of manipulating remote objects via apparently-local proxies.

Latency and failure are core to network (and especially internet) RPC, though, and something that presents as a method call is too leaky an abstraction. It inhibited people from structuring their network calls as batch operations (the document-orientation palaver was an attempt to change the conversation), from treating errors in a first-class way (errors in RPC are way, way more common than in in-process method calls, and require much more attention), and from using async calls (don't block the world due to latency, which is rarely a hard requirement locally).

All the standardization efforts as the CORBA etc. crowd piled on in and translated IDL (WSDL!), and layered authorization & authentication (rather than use existing HTTP idioms), transactions (WS-AT), etc. etc. just buried the already dying horse.


Out of GraphQL and REST, I'd say REST is much closer to SOAP than GraphQL is.

SOAP is essentially the same as REST -- you have request/response for every resource. It's just more verbose and inscrutable thanks to XML and all the schemas.

GraphQL on the other hand enables new interactions that were not possible or practical with either of the two.


Nah, SOAP is fundamentally different from REST in that it’s all organized around procedure calls. A SOAP API tends to be structured as a big bag of functions, whereas a REST API puts resources front and center. REST thinks in nouns, SOAP thinks in verbs.

The author of the paper forgets that asking the API to return only specific attributes of an object (tree), instead of all of them, can also be achieved with REST.

e.g. /api/v1/getUser?id=1234&return=name,zipCode,workplace.address,workplace.zipCode
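A minimal server-side sketch of that `return=` parameter in plain Python (all data and names illustrative) — though, as discussed elsewhere in the thread, this is the first step toward an ad-hoc GraphQL:

```python
def pick_fields(obj, fields):
    """Return only the requested dotted paths from a nested dict,
    e.g. fields=["name", "workplace.zipCode"]."""
    out = {}
    for path in fields:
        src, dst = obj, out
        parts = path.split(".")
        for key in parts[:-1]:        # walk down to the parent of the leaf
            src = src[key]
            dst = dst.setdefault(key, {})
        dst[parts[-1]] = src[parts[-1]]  # copy just the requested leaf
    return out

user = {"name": "Ada", "zipCode": "12345",
        "workplace": {"address": "1 Main St", "zipCode": "99999"}}
subset = pick_fields(user, ["name", "workplace.zipCode"])
# subset == {"name": "Ada", "workplace": {"zipCode": "99999"}}
```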

Not sure if that has changed in the meantime, but when I tried GraphQL for the first time it was obvious to me that every service call has to be implemented by hand. Depending on what functionality is required, implementations of service calls can get quite complicated.

As someone coming from Java where we have ORM frameworks and tools like Spring, I was surprised that Facebook didn't come up with something better.

But hey it's Facebook. A company where the CEO thinks it's cool maintaining a huge codebase mostly written in PHP and C(++).


You'll end up inventing your own ad-hoc GraphQL that way. For example, GraphQL supports this, trivially:

  query User {
    user(id: "123") {
      id
      name
      topPosts(limit: 10) { id, title }
      drafts: comments(published: false) { id, body }
      friends {
        id
        photo(size: "100x100") { url, width, height }
      }
      newestPosts(since: "3days", categories: ["news", "chat"]) {
        id, title
        forum { id, name }
        creator { id, photo(size: "100x100") { url, width, height } }
      }
    }
    latestNews(limit: 10) { title, url }
  }

All one query. Selecting a subset of attributes is a small part of GraphQL. (gRPC also lets you filter fields, but doesn't have structured querying.)

GraphQL is a protocol and a schema language, not an ORM. It's a way to provide a common interface on top of any implementation. For example, in the above query, users could be in the backend's own database, whereas "friends" could be something stored in a completely different backend. The GraphQL server could trivially act as a façade that federated/aggregated the results of both.
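A toy sketch of that façade idea, with in-memory dicts standing in for the two backends — this is plain Python to show the shape of a resolver, not any GraphQL library's real API:

```python
# Two "backends": the server's own user store, and a separate friends service.
USERS_DB = {"123": {"id": "123", "name": "Ada"}}
FRIENDS_SERVICE = {"123": [{"id": "456", "name": "Grace"}]}

def resolve_user(user_id):
    """Resolver for a `user` field: the base record comes from the
    local store, while `friends` is aggregated from another backend.
    The client sees one schema, one query, one response."""
    user = dict(USERS_DB[user_id])
    user["friends"] = FRIENDS_SERVICE.get(user_id, [])
    return user

result = resolve_user("123")
```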

I would argue that GraphQL fulfills the objectives (or at least desired features) of REST much better than what has been delivered thus far. For example, REST's hypermedia aspect ("HATEOAS") has not been widely adopted — because discoverability is poor and there's no standard protocol for introspecting a schema. But look around and you can easily find a dozen different GraphQL clients (my personal favourite being Prisma's GraphQL Playground [1]) that can be pointed at any GraphQL-compliant server to run queries and browse the schema.

Those that criticize GraphQL generally don't seem to realize how simple it is, and how close it still is to the REST style of consuming and producing JSON.

[1] https://github.com/prisma/graphql-playground


I'd argue that the growing popularity of Swagger/OpenAPI is close enough to HATEOAS as we want to get.

Swagger/OpenAPI is very fragmented; it builds on JSON Schema by adding some properties and removing others, and then on top of that the AWS API Gateway again removes some options and adds others. It's a total mess trying to navigate which features I can and cannot use.

Eeeeh, I hit the question of whether I should take an approach like that recently, and I think that writing out a custom format like that makes your code brittle and generally scales poorly across endpoints. We were actually considering one of the GraphQL look-alikes to step directly into a pre-rolled format for declaring the fields expected in the response, rather than roll our own format.

You absolutely can define reports as GraphQL entities (and really, table -> entity mapping should never be 1-to-1) and just let users leverage the field-filtering component of the transit format.

Lastly, PHP is pretty sweet nowadays, you should give it a try!



If I write an API that exposes the resources A, B and C, and now I use GraphQL to access a subset of all of those resources, doesn't this just mean that I have written the wrong API? That I should ditch A, B and C and write a resource D which does just what I wanted?

You can make an API that exposes exactly what a client needs, but then the server/client are tightly coupled (and it's not RESTful). What happens when you have multiple clients, and long-lived releases with differing requirements?

But that’s not what I mean. I mean that your original resources are not the actual ones you should expose. Hence you have designed your API wrong. And I don’t agree that a resource becomes more tightly coupled if you write it right. It is a resource after all.

Just because requirements change, doesn't automatically imply the original design was bad.

But that’s the thing. Should you invest in an obsolete API or just change it? If you don’t, you have both an obsolete API and a more complex solution instead of just changing your API. And if you don’t change your API to adapt to the newer requirements, but use Graphql to “bridge” that problem, you now have a client that’s heavily invested in your old obsolete API with a technology glued to it. To me, that sounds like a bad idea.

In the case of GraphQL, and in the most prolific examples (Netflix and Facebook), you're not considering that the time to set up a GQL interface is less in aggregate than modifying many, many APIs for data that is distributed across varying data stores, often column/kv structures. These are also organizations doing more than just public-facing pieces. For example, Netflix has to deal with art, media, and licensing that will vary by location, language and even distribution models under different contract terms for a different region. That's only one aspect of how things work.

In the end, an API written for one department may not match the needs of another, and another still may need additional related aggregate data. Would you rather define your data sources once, or communicate API needs across several departments and maintain 3x or more the surface area to support them?

edit: I'm not saying GraphQL is appropriate for all scenarios... but I'm saying it is definitely better for many.


But GraphQL cannot do more than the API already does? It's constrained to the capabilities of the API. If there's no way of updating a field, GraphQL won't do that either? So it can only make over-fetching more efficient? And by what you are saying, it's more complicated to create a new resource than to incorporate several endpoints with GraphQL; I find that hard to believe.

Imagine the scenario where you're a developer on one team and need to consume data owned/orchestrated by another team. YOU don't have access to their data source; you get a custom API that will be updated only when they're able to, since their priorities are different from yours and they're under a different internal organization. With GQL, as long as the source data is defined, you get as much or as little as your queries define. With API only, you're stuck explaining to your boss why you have to wait for another team in another org to let you get to it, because they have other priorities.

But GraphQL doesn’t enable more functionality than the API already gives you. You don’t get more access to data just because you use GraphQL? And a team that gives direct access to their data is in deep trouble already. I really don’t get what you are getting at with this.

> You don’t get more access to data just because you use graphql?

The data you expose via apis is often one “view” of the actual rich data store you own. If a dependent team wants more data than your rest api provides, it is blocked on you adding that data to your api.

If you instead provide a way to query your data store, the dependent team isn’t blocked on you.


>If you instead provide a way to query your data store, the dependent team isn’t blocked on you.

This means that you bypass the API and give direct access to the data? Why have an API at all then?

Don’t you think that this mean you tightly couple the data storage with the client?


> This means that you bypass the API and give direct access to the data? Why have an API at all then?

False dichotomy.

You’re not bypassing the api. You’re exposing a richer api. Use the right terms.

The purpose of exposing APIs is not to limit the attributes of exposable data. E.g. your definition of an object that the REST API represents may evolve over time while the API does not, since they may not be in sync in the code. With GraphQL this is handled automatically.

Of course it’s tightly coupling data storage, that is the point! The problem with using REST APIs when doing this is that you can’t then change your API without breaking dependencies. GraphQL recognizes that in a rapidly evolving data model, trying to model permanence is futile. Instead it gives you and your clients tools to a) reduce expansive dependencies by querying only for the data you need and b) identify which dependencies are used and how frequently, allowing you to deprecate attributes safely.
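The safe-deprecation part is built into the schema language itself: a field can carry the standard `@deprecated` directive, which introspecting clients and tools like GraphiQL surface automatically (type and field names below are illustrative):

```graphql
type User {
  id: ID!
  name: String!
  username: String @deprecated(reason: "Use `name` instead.")
}
```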

This is not a tool meant for enterprises abusing REST APIs as “contracts”, where every change needs to be communicated in advance, yadda yadda. It’s targeted towards consumers who are OK with, and actually _want_, a richer API in lieu of the stability provided by versioned APIs. Good candidates for this use case include internal teams and external, non-contractual APIs.


There is also the opposite of GraphQL: just move as much stuff as possible to the backend - make the frontend thin.

With that approach you also don't end up writing a lot of specialized REST endpoints for the frontend. You can just use the queries directly and output for example HTML (or json and render that into HTML on the client). You can also do a lot more queries at once, because the services backend have usually more bandwidth and less connection limits than a frontend running inside a browser.

You are in full control of the queries. Only UI-related data is exposed to the frontend.

One drawback is that the frontend team wouldn't do any business stuff anymore. They would build more generic components that the backend can drive.

And depending on the implementation you need more server round trips.

On the other hand, there is less JS to load and execute by the browser, since the backend is doing most of the orchestration and rendering.


Servers are expensive, and if you do it all on the server, you spend more scaling. Clients connecting have their own compute resources that are effectively free in terms of growth of a system as a whole.

GQL allows for the client UI/UX to describe what it needs, and the backend delivers that, within the context of well defined data structures that can then be distributed.

There's a cost to setting this up, but in the case of Facebook or Netflix, absolutely worth taking on. For a few hundred users, not so much.


However, frontends are also expensive when you look at other metrics. They consume battery, for example, on a mobile device.

> GQL allows for the client UI/UX to describe what it needs, and the backend delivers that, within the context of well defined data structures that can then be distributed.

In the other model, it is done in the same way, except it is being executed in the backend and not the frontend.

Both the frontend and the backend are general computing platforms, but they have different characteristics in terms of performance, connectivity, control, etc.

Most UI/UX requirements are not so specific that only a frontend-heavy implementation will be adequate to fulfil them.

Please keep in mind that back in the old days, when we didn't have JS in the browser, all HTML was rendered on the server and a thin client displayed it.

And even right now, architectures sometimes pop up that try to make the client thin and push the majority of the computation to the backend, like for example Google's Stadia.


I'm sorry, but the battery example just doesn't fly... the biggest battery usage is the screen, if the user is looking at their phone, they're actively using it.

Back in what old day? I've been at this since 1995, and have been using JS pretty proactively since 1998 or so; it's been 20+ years now. Computers are considerably more powerful and power-efficient compared to 1998.

If you want to be able to use the internet on an 80386 without a math co... have fun with that. I'll take modern tooling, with modern hardware.


Generating UX/UI in the backend doesn't imply using old tooling.

React server side rendering, Vue's Nuxt, etc are all modern ways to render UX/UI in the backend for example. And there are more.

The discussion of rendering server side goes deeper than just performance or using a "modern" framework.

There is, for example, the question of how many teams can work on a single application at the same time and deploy features independently in an efficient manner.

How can we share business-process flow requirements between the web version of an application and its native mobile counterparts in an efficient way, without implementing everything three times?

Or other aspects mentioned here, like predictable performance etc.

Or choice of language, in the backend your choice of languages to choose is greater and/or simpler than in the frontend.

You can add more monitoring in the backend about aspects of your application than you can do in the frontend.

Obviously, the frontend has advantages as well, i.e. if something needs to be calculated or animated, it is almost instantly available to the user when done in the frontend.

Frontend can perform functions offline, can use device specific features: like camera, contact list, push notifications etc.

Thus I prefer to look at the requirements and pick the tools that best fulfil all the requirements for a given application in a given organization and strategy.

If that means I do more stuff in the frontend, sure, fine. But I wouldn't go for GraphQL just because it is "modern". Being "modern" is not a solid reason to choose a technology, unless you want to attract developers that like "modern" technologies. Which could be a valid reason as well for some organizations.


And you don't think server rendering has a cost? Running a very rich UI client side allows for tens of thousands of users per server, your suggestion reduces that to dozens or a couple hundred. Meaning many times more servers... more servers to configure, support, orchestrate, pay for. Meanwhile the clients are idle resources that could have been used.

If you are the one paying for the servers, and you want to pay for 10x+ the servers to support your app, feel free.


How much bigger did the request bodies become?




Arxiv is free, there is a link to the paper on the right.


