
I can see absolutely no positive value whatsoever in GraphQL for internal APIs. You're basically saying 'please, formulate any query you like, and I will waste years of my life trying to optimise that general case'.

Seriously. For internal stuff you want to be as specific as humanly possible. You want to optimise the living fuck out of that hot path thousand-times-a-second query, and build an entirely separate service to handle that particular URL if necessary.

For a public API, why would you ever encourage submission of arbitrary queries? They will destroy your database servers.

Note that SQL vs. NoSQL is not even a competition here. An RDBMS can handle arbitrary queries much better than any brute-force map-reduce system, and doesn't require spending thousands of man-hours writing your own query planner. The difference is that it doesn't (always) automatically scale horizontally.

So from where I'm sitting, GraphQL is nothing more than an invitation to maintenance and performance headaches, predicated on the idea that everyone scales infinitely horizontally and can brute-force every query.

Personally I prefer being able to serve a thousand queries a second from a single server managing a 1TB database with a 50GB working set, with a latency under a second even when (looking at the raw query) it 'should' touch more rows than there are atoms in the universe.

[edit] In light of the replies, I should express some surprise that building REST endpoints is expensive. If you can execute at runtime an automatic combination of various other endpoints to produce a result, is it not equally simple to generate the code necessary to do the same? That can at least be examined and maintained more easily than dealing with the massive array of possible variations to known-working queries which comes with modifying the rules driving the GraphQL evaluator...?

GraphQL was created at Facebook specifically for internal APIs. The motivator was to avoid having to write a custom endpoint for every single REST query, because otherwise REST is not efficient.

Note that GraphQL does not allow the specification of arbitrary queries - much like REST does not allow access to arbitrary resources - the server defines what queries are available and the user can choose to request a selection of them, and to pluck data from them (or follow links - simply a JOIN).

In other words, GraphQL lets you combine a subset of pre-defined queries in one go.

> GraphQL was created at Facebook specifically for internal APIs.

AFAIK Facebook created GraphQL specifically for their gateway API - a service used as a facade between the internal service mes(s/h) and their clients - not for the internal services themselves. That's why things like schema stitching didn't come from FB - they weren't using it in that context.

This is my understanding also.

'Combine a subset of pre-defined queries': you mean run a bunch of queries and perform the join locally? Like a poorly-designed app server 're-using' several repository methods and doing joins in-memory?

I may be misunderstanding something here, but when 'general' queries are combined externally a lot more work is done than is necessary. Which may be fine for small intermediate sets. But treating it as any kind of general solution is silly.

That said, people do similarly stupid things within individual codebases running in individual services to a single database, so as usual it likely comes down to how the tool is used rather than how it can be abused. Still, spreading these things across services and processes looks like it only makes abuse easier.

Nope, it's meant for microservice architectures where you have a lot of different API endpoints spread across different servers, databases, technologies, shards, etc. You need to get the user profile from one side and a list of news from the other, then cross-reference that with a list of friends from a third source, and finally combine it all into something usable in the client. GraphQL is a way to define how to merge those datasets without manually coding each step of taking field X from feed A and list Y from feed B and indexing them by Z.

Besides obviously solving problems for huge players like FB, it also helps with the common situation where the UI design keeps changing along the way and all of a sudden the front-end team needs APIs to return a shitload of new relations that were not in the initial specs. In most startups I've worked at, 99% of my API time went to reworking queries to include more data because designers and UX people had changed their minds. GraphQL makes this fairly painless as long as you have a well-defined set of basic APIs. Later, when the design stabilises, you can locate the bottlenecks and rework them into more complex but faster API calls.
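To make the merging concrete, here's a rough sketch (in Python, with made-up feed shapes, not FB's actual implementation) of the "index by Z and combine" step a GraphQL gateway performs behind the scenes:

```python
# Hypothetical sketch: three independently fetched feeds, joined by user id.

def merge_by_key(profiles, news, friends):
    """Index each feed by user id and combine into one response shape."""
    news_by_user = {}
    for item in news:
        news_by_user.setdefault(item["user_id"], []).append(item["headline"])
    friends_by_user = {f["user_id"]: f["friend_ids"] for f in friends}
    return [
        {
            "name": p["name"],
            "news": news_by_user.get(p["id"], []),
            "friends": friends_by_user.get(p["id"], []),
        }
        for p in profiles
    ]

profiles = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
news = [{"user_id": 1, "headline": "launch"}, {"user_id": 1, "headline": "update"}]
friends = [{"user_id": 2, "friend_ids": [1]}]

merged = merge_by_key(profiles, news, friends)
print(merged[0])  # → {'name': 'Ada', 'news': ['launch', 'update'], 'friends': []}
```

The point of GraphQL here is that you declare the relationships once in the schema and get this merge step generated for you, instead of hand-writing it per page.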

Not sure I understand why in-memory joins are wrong by default, especially in a large system with more than one data store.

Let's say you call UserService::batchGetUsers(userIds) to get a list of users from a service backed by a MySQL DB, call WidgetService::widgetsForUser(userId) to get a list of a user's widgets from a separate service backed by a Redis cache and another MySQL DB, and return them. What's the problem here? Lack of transactions? Unnecessary fields sent down the wire? Something else?

There is no problem, the same way GraphQL does not automatically allow arbitrary queries.

The parent author holds some very stubborn beliefs about how systems are built (his/her way is the correct way!), which is great for discussion, but probably not the best example of how to actually build big systems.

> the server defines what queries are available

>avoid having to write a custom endpoint for every single REST query

So what are you saving, really? Could you better implement GraphQL type functionality as a client side wrapper for traditional REST apis? Then you could keep the traditional tooling as well as simple query join semantics for client devs.

Oh really? Because I’ve yet to see anyone explain how joins can be accomplished with GraphQL.

We're building a GraphQL internal/external API. Our main app uses the same API that clients can hit directly. Our backend dev team spends a large fraction of our time building custom REST API endpoints. With GraphQL a lot of that work is being shifted to front-end devs, to previously non-dev account executives, and to their clients directly. A project to create a report that used to involve many organizational layers and take weeks can now often be done on the phone by an account exec without a dev ever hearing about it. Clients have deeper and easier access to their data than our competitors offer.

Of course we're very concerned about the performance drawbacks and are trying to plan for them. We don't expect it to be web scale and aren't replacing our REST APIs that serve the public. Even so we expect to need to be smart about calculating query complexity.

So while most of the criticisms of GraphQL on this page do apply, for us the net value is looking quite positive.
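As a sketch of what "calculating query complexity" can look like in practice, here's a toy estimator (hypothetical field names and an assumed page size, not this commenter's actual code) that walks a query's selection tree and rejects queries over a cost budget before executing them:

```python
# Toy query-complexity estimator. A selection is modeled as a nested dict:
# {field_name: child_selection_or_None}. List fields multiply their
# subtree's cost by an assumed page size.

PAGE_SIZE = 25  # assumed default page size for list fields

def complexity(selection, list_fields=frozenset({"comments", "posts"})):
    total = 0
    for field, child in selection.items():
        child_cost = 1 + (complexity(child, list_fields) if child else 0)
        total += child_cost * (PAGE_SIZE if field in list_fields else 1)
    return total

query = {"article": {"title": None, "comments": {"body": None, "author": None}}}
cost = complexity(query)
assert cost <= 200, "query too expensive, reject before hitting the DB"
print(cost)  # → 77: title=1, comments=(1+2)*25=75, article wraps it all
```

Real servers do this against the parsed GraphQL document, but the shape of the calculation (recursive cost with multipliers on list fields) is the same.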

I just don't get it. You still have to have provided the data for the client devs to query. Does opening up the full schema model make things easier to implement? Aren't you just kicking the can of API design down the road to the client devs, causing thicker clients full of their own business logic?

I think you’re failing to understand how companies tend to work. The front-end teams and backend teams are usually different and want to work on different things. But for the front-end team to build an app with the data they need, they need the backend team to change the REST API to provide different data. That kind of change is usually not very interesting for backend teams and they stall, while the frontend team waits for days on a change which they feel should be pretty damn small.

One easy way to fix this is by using graphql instead of REST. Now the frontend teams have access to somehow get the data they need and are thus unblocked.

Fair enough, it certainly seems to offer benefits re. flexibility and rapid development.

I'm just too used to being the person who then has to make the server deal with this particular use case (meaning a million possible variations on the same query) run a thousand times faster :P

So if it's understood at all stakeholder levels that there are tradeoffs, it could be a good tool. If. >_>

I’ve always thought the perceived rapidness is because you are now able put code in the front end which really should be existing in the back end and while at the beginning it’s faster development, you just delay the inevitable problems of spaghetti APIs.

I'm very surprised by this. I have built REST in the past and recently I built a large GraphQL service. I found GraphQL to be a lot less confusing both for front-end and for back end. The tooling is amazing, standardized schema introspection is great.

Sure, if you try to shoehorn an API that does not fit the database, it can be challenging, but you can always start with the simple case, doing multiple queries to the DB for one GraphQL request and only later see what is actually used and could benefit from optimizing.

You have more opportunities to optimize. With REST, if a client needs to list some resource and then access all of them one by one, it is going to be n+1 requests and you can't do anything on the back end to change that. With GraphQL you can look at the query holistically and optimize as needed.

All in all it feels to me that GraphQL gives the client the ability to better communicate the intent of what they are trying to do. Declarative over imperative.

GraphQL is bad for public APIs. It is good for precisely one thing: when the server and the client are controlled by the same entity, making updates to clients without having to add new internal APIs to the server.

But this one thing is so useful for almost everyone that for internal APIs using GraphQL is usually a no-brainer.

You actually don't want to be as performant as possible for internal APIs. There is a performance-flexibility trade-off involved and GraphQL lets you choose a different point on the Pareto frontier than maximum performance.

This was the most useful comment, should be at the top.

As far as I can tell, I agree.

An API is like a promise.

In the best case, people decide to use your API and build on it. Then, if for whatever reason and in whatever way, you break your API, you force everyone affected to reimplement at least to some degree.

With GraphQL you’ll be promising the sun the moon and the stars if you aren’t very careful. (Even if you are very careful you’re still promising a lot, though hopefully the available tools will help you out.)

With GraphQL you put yourself in a very tough position of either keeping big promises or breaking big promises.

Most of the time you are going to be better off keeping small promises.

For internal APIs, one essential value GraphQL provides is significantly reducing the number of read APIs. With REST or RPC the number of APIs explodes, because RPC is too specific while REST is not flexible enough, so people just keep rolling their own new-but-similar APIs.

For a front-end/public API, parsing a query against a queryable graph enables ad-hoc queries and co-location, replacing a lot of manually written controllers with far fewer resolvers, which removes a huge amount of accidental complexity.

How can GraphQL reduce the number of reads? If you are reading stuff you don't need, why are you making those calls? And if there's no API for you to get the data, how would GraphQL know about it?

It's mainly about reducing the number of read APIs. It can also reduce the number of reads through the dataloader.

Say there are 10 different pages that need to read the 'Article' resource: on the index page you need the title and the summary, on the details and edit pages you need the content, and on the details page you also need comments.

The thing is, all of the pages read 'Article', they need it, but they only care about only some parts of it, some reads also require extra and possibly cascaded joins with other resources.

In most medium-sized web apps people will tend to write a bunch of different APIs to fulfill these needs. It's grunt work and hardly consistent, because they're duplicating things. Or even worse, they will invent their own query languages - a half-baked GraphQL everywhere, inconsistent and sometimes buggy.

Our application had an estimated 3000 internal APIs as of last year, and it's becoming harder and harder to even find which API to use because there are so many nuances. Almost half are doing different kinds of reads. If we could put everything into a queryable graph (ironically, almost all businesses are graphs, yet very few people explicitly treat them as graphs), there would be far fewer APIs and they'd be easier to understand.

Again, I don’t see how it can reduce read APIs. If you haven’t read the data you can’t change the data.

Also, I fail to see how GraphQL can simplify an API: GraphQL operates on the actual APIs, so if you don't understand those, you probably won't understand the GraphQL version either.

You can query the schema and get only the data you need. E.g. take a REST endpoint that returns a list of all users with all of their info. With GraphQL, if you only want the users' names, you can ask for just that, and GraphQL will return only the names of all users, reducing the payload by a lot. Or use the schema to fetch only the data you need, perhaps eliminating the need to fetch all the users' names in the first place.
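A toy illustration of that field-selection idea (made-up data; a real GraphQL server parses the query document rather than taking a field list, but the trimming step is the same):

```python
# The server still loads full records; only the requested fields go
# over the wire.

USERS = [
    {"id": 1, "name": "Ada", "email": "ada@example.com", "bio": "..."},
    {"id": 2, "name": "Bob", "email": "bob@example.com", "bio": "..."},
]

def select_fields(records, fields):
    """Trim each record down to the requested field set."""
    return [{f: r[f] for f in fields if f in r} for r in records]

# REST-ish: the whole object every time.
# GraphQL-ish: the equivalent of `{ users { name } }`.
print(select_fields(USERS, ["name"]))
# → [{'name': 'Ada'}, {'name': 'Bob'}]
```

Note this only shrinks the payload, not the server-side work, which is exactly the objection raised in the reply below.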

But that’s not reducing the complexity of the API, just the payload. If you remove the “load all users with everything in it” API, the backend still needs to load all users and then filter them to reduce the payload. If the backend doesn’t have that capability, GraphQL cannot ask for that functionality, so you still rely on the team to have implemented it.

Don’t move the goddamn goalposts. The responses are to the specific question of reducing read API payloads, which you’ve admitted is what it does. It’s not about reducing the complexity of the API or whatever new arbitrary goal you invent next.

If you need to query for a nested relationship, that's multiple REST requests, such as "all posts by a user" and "replies to a given post". You could write up a single REST endpoint to support that, but now you're making a customized API endpoint to optimize for your current frontend needs. If you decide to display more or less, that API needs to be changed too.

With that said, I'm not a big fan of GraphQL, but this particular trade-off seems like a win over REST.

Are you saying that GraphQL bunches those multiple requests into a single, actual, not perceived, request?

From the front-end application's perspective, yes, it's a single request.

And on the back-end, after the query is parsed, it gets split into a bunch of requests depending on your data source.

If the data source is in-memory, then good, everything's done.

If the data source is an RDBMS, it would naively result in an n+1 query. However, people almost always use a dataloader with GraphQL, which batches the lookups and converts the n+1 query into two queries. REST APIs usually only handle a single kind of resource per endpoint, so when there are cascaded joins (e.g. get the user and its posts and comments and the comments' comments), GraphQL ends up making fewer requests overall if you think in amortized terms.

If the data source is another REST API and there are batch APIs for the resources, a similar approach to the RDBMS case can be taken.
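The batching trick above can be sketched like this: a deliberately simplified, synchronous version of the dataloader pattern (real dataloaders are async and per-request, and the names here are made up):

```python
# Per-row lookups are queued, then resolved with one batched query
# instead of n+1 individual ones.

class Loader:
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # takes a list of keys, returns {key: value}
        self.queue = []

    def load(self, key):
        self.queue.append(key)
        return lambda: self.cache[key]  # deferred read, valid after dispatch()

    def dispatch(self):
        # one query, e.g.: SELECT * FROM posts WHERE user_id IN (...)
        self.cache = self.batch_fn(sorted(set(self.queue)))

POSTS = {1: ["a", "b"], 2: ["c"]}
calls = []

def batch_get_posts(user_ids):
    calls.append(user_ids)  # record how many DB round-trips happen
    return {uid: POSTS.get(uid, []) for uid in user_ids}

loader = Loader(batch_get_posts)
pending = [loader.load(uid) for uid in [1, 2, 1]]  # three per-row lookups
loader.dispatch()
print([p() for p in pending])  # → [['a', 'b'], ['c'], ['a', 'b']]
print(len(calls))              # → 1: a single batched query, not three
```

That single-batched-query behavior is what turns the n+1 pattern into "one query for the parents, one for the children".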

Yes, that's the main sell of GraphQL I believe - the "graph" of the name. Your clients could theoretically make one API request for all their data needs by specifying the relationships they care about.

E.g. I want all of my posts and their respective comments and the users making those comments. You can narrow things down a lot by specifying fields: only give me the names and avatars of the commenters. With REST I've done this in the past with a pile of API calls: one for my posts, then iterating over the IDs of those posts to make N calls for their comments, and another N calls for the comment user data I need.

At that point, a custom API endpoint could trim down the network calls a lot, and this is where GraphQL shines in comparison. You wrote a resolver for a user, comment and post in the backend, and GraphQL server frameworks can piece those together for you.
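A rough sketch of how those per-type resolvers compose (hypothetical data and resolver names, not any particular framework's API): the framework walks the query and calls one resolver per type, so a single request covers posts, their comments, and the commenters.

```python
# Stand-ins for data sources behind each resolver.
COMMENTS = {10: [{"author_id": 2, "text": "nice"}]}
USERS = {2: {"name": "Bob", "avatar": "bob.png"}}

def resolve_post(post):
    return {
        "title": post["title"],
        "comments": [resolve_comment(c) for c in COMMENTS.get(post["id"], [])],
    }

def resolve_comment(comment):
    u = USERS[comment["author_id"]]
    # Field selection: only name and avatar, as the query requested.
    return {
        "text": comment["text"],
        "commenter": {"name": u["name"], "avatar": u["avatar"]},
    }

result = resolve_post({"id": 10, "title": "hello"})
print(result["comments"][0]["commenter"]["name"])  # → Bob
```

In a real server the nesting is driven by the query document rather than hard-coded calls, but the composition is the same: each resolver only knows its own type, and the framework stitches them together.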

So the main selling point is to reduce traffic and let the backend sort out the n+1 and caching problems. I can see a bunch of problems with that, like required fields etc., which the client must also fulfil and which make for very fragmented error replies. And in a microservice environment where a single resource or a few resources belong to a single microservice, the client still needs to make several individual requests.

Also, bunching APIs together means that if you split an API between two clusters, all those GraphQL requests are now broken. But you don't really know that, because it's hard to know what queries a client has created.

Also, if someone creates a query that triggers a path where the n+1 problem wreaks havoc on the backing data storage, you could have serious problems. By fixing an API you get predictability. There's also the mismatch between data versions, and the question of which version or view you are operating on.

I don't really understand it either. I thought the whole purpose of having an application layer on top of your database was to provide a specialized interface for your application. If you're just going to provide a general query language to the frontend what is the point of the application layer? It seems like you'll just end up writing a database on top of your database. If you are writing a general purpose interface now, why are you writing custom code? Couldn't it all be abstracted out into an RDBMS? Would an RDBMS with user login and permissions system meant to be accessed by the end user/frontend be better than everyone writing their own weird database wrapper? It all seems very strange to me.

GQL makes it convenient for the front-end to assemble a collection of queries needed to render the page in a single request, and then go about its rendering business upon response. The React-Apollo client I use also features an in-memory cache that can serve as a state store, often entirely obviating the need for libs like redux.

In the reference implementations of GQL, that “one query” from the client balloons out into possibly hundreds of queries out the back side of GQL.

Things like dataloader exist but the behavior of your schema gets harder and harder to reason about when different caching mechanisms get thrown on top.

I think everyone agrees that the client facing api for GQL is fantastic... maybe that means graphdbs are the next wave :P

For distributed data stores, with data spread across different databases/clusters, it starts to make more sense.

If you're using a column-store structure for most data, you're mainly doing individual lookups on a single key, with graph data under other keys and related keys looked up separately, each optimized for single-record access.

Especially since distributed/collectively this data will be accessed faster than a single rdbms would be able to manage.

That said, if your application can/does use a single sql datastore, and you don't need that much scale, the effort to setup/configure GQL may not be worth it in a given instance.

To me, on the back-end, GraphQL means replacing controllers with query (GET) and mutation (POST/PUT/DELETE) resolvers with very similar concerns; routing takes place in the GQL schema, and everything else stays the same.

You're optimising for the wrong things. An efficient server that vends the wrong data, or that saturates the client's bandwidth, is useless. Facebook invented GraphQL for two reasons:

* It allows rapid product iteration over the "social graph". It allows each product to have unique behaviours and data requirements, and new product capabilities can be rolled out quickly. Facebook's entire data layer (TAO, Ent framework) is optimised towards rapid iteration.

* It optimised performance for client load times. Mobile networks are high latency and low bandwidth. GraphQL allows each application to load data in the most efficient way possible.

I agree. I wrote a simple app that pulls data from a Postgres database and denormalizes it into Elasticsearch, and then the frontend code calls Elasticsearch directly. The frontend developers can tweak the config file however they like to create whatever data structure they want in Elasticsearch. This seems a lot simpler to me than anything offered by GraphQL. I wrote a bit about it here:


GraphQL is not an alternative to ElasticSearch. It doesn’t care how you fetch the data needed to resolve a query, and could as easily be pulling from ES if performance demands it.

No, the app I wrote is an alternative to GraphQL.

I'm conflicted. On the one hand I agree with you. On the other I think SQL works and isn't GraphQL just an abstraction approaching that? I think there's something here but I'm not necessarily sold.

We're throwing decades of web architecture to the wind here. Caching is an exercise left to the reader.

If we're going as far as GraphQL, why _not_ open your DB to the public?

I think GraphQL is highly misunderstood. I wrote about it extensively [1].

TL;DR: it's not really a query language, like SQL. It's more of an opinionated RPC framework. I don't think building out one's entire data model in the GraphQL schema is generally a good idea, but that's also not the true value-add of the system.

[1] http://artsy.github.io/blog/2018/05/08/is-graphql-the-future...

Well, this is the conflict. If it's just a way to batch dependent queries, there are things like capnproto that do that too.

Ideally though, you don't use dependent fetches and you have a bespoke endpoint that collapses the dependent operations possibly as tight as a single join query. If this is the case you're trying to solve, then why shouldn't we strive for something like a declarative query language?

I haven't looked at capnproto recently, but just because it solves a similar problem doesn't mean there's no room for a different system.

The bespoke endpoint thing is a big part of what GraphQL avoids. It gives you something explorable, but as a team, you can decide how comprehensive you want it to be. If you have a monolithic database, you can use SQL to explore it, but GraphQL often sits in front of a service layer, which connects to my next point.

GraphQL's mutations often map to entire processes, which might change a data store, like a DML SQL operation, but also have other sorts of effects.

GraphQL is still an API that sits in front of the business logic. E.g. fetching a like count on Facebook might map to a cache lookup or a complex query, and everything fetched respects the privacy business rules. Queries shouldn't map directly to a SQL query.

I can see the value, but I think it may well be limited... it takes a fair amount of work to setup a GraphQL data source for an application. That effort may or may not be worth the return, and will vary case by case. I can definitely see why Facebook and Netflix use it.
