Principled GraphQL (principledgraphql.com)
288 points by debergalis 37 days ago | 83 comments



The main argument I have against "One Graph", is that it's not that uncommon to have two (or more) quite distinct views of the world.

At my last job, we were building a social shopping app. Behind the scenes, products were versioned so that we could deal with disputes related to attempts to defraud customers. This (along with several other things) meant that the logical internal abstraction of the data model for things like dispute dashboards was considerably more complicated compared to an abstraction that made sense for the clients (apps and website).

If we only had one graph, all the clients developers would have to develop around a data model that was far more unwieldy than they needed. But with two graphs, the world was a lot simpler (at the cost of having to maintain two graphs).


The RDF world has conclusively proven that there is more than "One Graph". (e.g. people try to make "One Graph" and their projects die; try to make as many graphs as there are points of view and the sailing is smooth)


> The RDF world

I'll never forgive them for taking a great concept and absolutely beating it to death with intellectual (for lack of a better word) wankery. We could have ubiquitous Datomic-style triple stores today if not for the Semantic Web researchers' need to generate pseudo-academic journal articles.


In the case of GraphQL, I'd be interested in seeing strategies for multiple graphs within a single codebase. Essentially being able to produce different schemas based on config. I know that superficially it's as simple as some "if" statements, but I'm curious about the maintenance/scalability side of it.


Many GraphQL libraries take some sort of schema definition and then serve it at a route (eg. /graphql). To support multiple schemas, you'd just write a different definition and serve it at a different route. How you resolve the fields is up to you, but both can use shared underlying business logic in these resolvers.

In terms of maintainability, you have to take care that your changes to the underlying business logic don't break assumptions of each schema. And if you want to evolve one schema (eg. say, deprecate a mutation argument, rename a field and deprecate the old naming), you have to ensure that your underlying business logic is backwards compatible for any other schemas (and their clients) relying on it.
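A minimal sketch of this pattern (all names here are hypothetical and the resolver shape is simplified; a real setup would hand each schema/resolver pair to a library like graphql-js, each mounted at its own route):

```javascript
// Shared business logic used by both schemas' resolvers.
function getProduct(id) {
  // Stub standing in for a real data store.
  return { id, name: 'Widget', versions: [{ rev: 1 }, { rev: 2 }] };
}

// Client-facing schema: versioning is an internal detail, so hide it.
const clientSchema = `
  type Product { id: ID! name: String! }
  type Query { product(id: ID!): Product }
`;
const clientResolvers = {
  product: ({ id }) => {
    const { versions, ...publicFields } = getProduct(id);
    return publicFields; // internal fields never leave this resolver
  },
};

// Internal schema (e.g. dispute dashboards): expose the version history too.
const internalSchema = `
  type ProductVersion { rev: Int! }
  type Product { id: ID! name: String! versions: [ProductVersion!]! }
  type Query { product(id: ID!): Product }
`;
const internalResolvers = {
  product: ({ id }) => getProduct(id),
};
```

Each schema would then be served at its own route, e.g. /graphql and /internal/graphql, with the duplication confined to the schema definitions rather than the business logic.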


How about having N base graphs and the ability to take unions of the graphs, as well as other kinds of algebra?

It is relatively easy to do this in the RDF world since the graph is composed of individual facts which may or may not be in a particular graph.


Which triplestore are you using that gives great front-end performance at load?


None of them are really "great". I get acceptable results with OpenLink Virtuoso if I give it a lot of RAM, tweak the configuration, and baby it when needed.


Is there any reason you couldn't have multiple graphs essentially overlaid on top of each other? With proper tooling you could expose different subsets of the same schema in different scenarios, and still have one unified graph underneath.


See my reply to another comment. But yeah, there's no fundamental reason. It's conceivable, though, that the difference goes beyond simple subsets of fields, with entire relationships being hidden.

Let’s say in your internal model there’s a relationship A - B - C. But for client apps it makes no sense to expose B, so instead you choose to represent the model as A - C. This is more than just a simple subset. Someone who understands graph theory better than I may be able to explain if there are elegant approaches to this.


Since GraphQL lets you select a subset of attributes, there is no reason you can't expose C on A directly, as well as exposing the indirection through B. Redundant attributes are A-OK.

You'd have something like this:

a { b { c { x } } }

as well as

a { c { x } }

or even:

a { cX }
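A resolver sketch of that shortcut (names invented for illustration): type A resolves `c` by walking through B internally, so clients never see B.

```javascript
// Hypothetical internal lookups; B exists in the data model
// but is never exposed to clients.
const getB = (aId) => ({ id: 'b1', cId: 'c1' });
const getC = (cId) => ({ id: cId, x: 42 });

// Resolvers for type A: clients can query `a { c { x } }`
// without ever seeing B.
const aResolvers = {
  c: (a) => getC(getB(a.id).cId),
  // or flattened further, as in `a { cX }`:
  cX: (a) => getC(getB(a.id).cId).x,
};
```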


Agree. And congrats on the round.


Thanks!


I agree. I think their recommendation is a bit overzealous.

I can see the argument if you have a web frontend that consumes data from multiple backend services – have one GraphQL service that manages them all instead of a GraphQL layer on each service.

But this breaks down greatly when you have different "Viewers". In a web app, the "Viewer" can be a logged in user. In an admin dashboard, the "Viewer" is very different – an employee acting on behalf of users. Service to service communication likely doesn't have a concept of a "Viewer".

I would propose that you have different schemas when you have these different views of the world or different permission boundaries. The business logic can be shared – you may just enforce different authorization checks at the GraphQL layer. You could also share GraphQL types that are common between schemas.
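As a rough sketch of that approach (names hypothetical), the business logic is one shared function, and each schema's resolver wraps it with its own authorization check and field filtering:

```javascript
// Shared business logic.
function fetchUser(id) {
  return { id, email: 'user@example.com', internalNotes: 'escalated twice' };
}

// Customer-facing schema: the Viewer is the logged-in user and may
// only read their own record, minus internal fields.
const customerResolvers = {
  user: ({ id }, viewer) => {
    if (viewer.userId !== id) throw new Error('forbidden');
    const { internalNotes, ...visible } = fetchUser(id);
    return visible;
  },
};

// Admin schema: the Viewer is an employee acting on behalf of users,
// and sees the internal fields too.
const adminResolvers = {
  user: ({ id }, viewer) => {
    if (!viewer.isEmployee) throw new Error('forbidden');
    return fetchUser(id);
  },
};
```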


From my (naive) understanding, it seems that one graph would not be exactly right; one graph per bounded context might be more likely. I think one graph fails when one entity means something different for different clients. Imagine a SaaS system where a "customer" means any currently paying user to the main product team, any prospective or current user to marketing, and, to enterprise sales, any enterprise user who has signed a deal or wants to sign one. There's just no way to map those three into a single customer type. Similarly, in a school management system, a student, with all their personal details, might be one entity for a school nurse or the dean, but n different entities for a teacher who teaches that one student n things. How would you map that to one graph?


Agreed. We decided to go with 2 views, an internal and an external. Currently the only GraphQL clients we have are customer facing so we just don’t model any internal details in the API. I expect we’ll introduce an internal one at some point.

Not only do these end up with very different data models, but they are also likely to have different access control (customer facing is basically all open, scoped to user, internal has many different layers and permissions), it’s likely to have different performance concerns, reliability, etc. That’s a lot of complexity you don’t need slowing you down when you’re building the other graphs.


I think they mean it technically only.

You have one root, but multiple nodes after that, each one essentially being another graph.


That's the general argument for key value storage versus normalized relational databases as well.


Can you elaborate on that?


On second thought, I may have oversimplified your point.

There's this trade off between having logically structured databases and having data stores that are faster to access. An all-too-superficial scan saw your point as just an iteration of that, which it may not be.

The word "graph" itself might have a different meaning here than in mathematics. Of course something like Wikidata is a directed multigraph (or a tuple of incidence relations on the same nodes). Still, I was under the impression that you were talking about optimizing data stores for access at the cost of having to explicitly maintain consistency in your code and not being able to rely on the database properties themselves -- like what happens when you move from SQL to Mongo.


Which graph technology were you using, and were the graphs maintained separately or via some kind of syncing?


I'm experiencing some frustration with this. Click on this principledgraphql.com site and you find it's 'Apollo.' Nearly everywhere you look on the Internet regarding GraphQL you find 'Apollo' being injected.

I'm using GraphQL in production systems. One is Node (graphql-js) and another is Java (graphql-java). I read the GraphQL specification and adopted the reference implementation, and it works great. Am I missing some enormous value by naively just using GraphQL without involving Apollo?

That is a sincere question. What is the deal here?


They're one of the few companies who are building GraphQL developer tools, so they have a financial incentive to have their name associated w/ GraphQL.

This piece, while somewhat valuable, is largely content marketing. First they sell you on the idea of "best practices." Then they'll follow up w/ a tool that, surprise, does all those "best practices" for you. "The modern marketer creates their own demand." Prisma is another GraphQL company that produces content like this.


It's called marketing and they're pretty good at it. They do provide some useful tools however if you want to use them and don't just blindly follow the docs (which are selling you on things you may/may not need).


Apollo's marketing game has been ridiculously on point. They are the only GraphQL client in the ecosystem that is fully compatible with any GraphQL compliant server [0]. And since it is pretty good, whenever one has to talk/market/propose anything about GraphQL in the community, they end up cross-marketing for Apollo.

[0] Relay has its own spec and can be used only if the server supports it. Other GraphQL clients are either not well maintained or do not support as many features as Apollo does (caching, subscriptions are two features that come to mind)


FWIW: You don't have to follow any of the Relay spec to use Relay. Though you'll lose many of the benefits if you don't at least use globally unique 'id' fields on each major type. Connections are optional, the node interface is optional.


Back in the Relay Classic days, the Apollo client library was a lot easier to use and a lot less heavyweight than Relay. My team ripped out Relay and replaced it with Apollo early in development because we felt Relay was too confusing. Relay Modern may be better; I haven't used it.

They're also intending to make most of their money by promoting GraphQL as a tool, so low rates of adoption are an existential risk for their company, while Facebook only needs GraphQL as a PR and recruitment tool.


Ironically Relay Classic only needed a babel plugin, whilst Modern has a separate compiler you need to use. That said, the main issue with Relay is that it still has poor documentation, so it can't really compete with Apollo in that regard.

But I'm not going back to a world without data-masking at the component level, so Relay is the only choice for me right now (though I wish it wasn't).


Thanks for pointing out that it's everywhere. I immediately closed the tab when I spotted the first "Apollo". Actually, when I saw "Principled" in the title, I almost didn't click it. Also, these frontend projects really like the word "modern" so much.


I'm trying to think back here, but I've been using GraphQL in production (at two different employers) for over 3 years, and I don't think there's any Apollo code in any of those systems.

Apollo Server is an improvement over baseline graphql-express, but for the most part it's nothing you can't easily add yourself. Having compatibility with Apollo Engine is useful if you end up wanting to use their monitoring tools. Pretty much everything else is just down to how much you want to buy into their ecosystem.


It depends on your needs; on the server side you are probably not missing much. Apollo tracing might be nice if you want some statistics on how your graph is consumed.

The client side depends on the client. We have an Angular SPA that extensively uses Apollo Client together with some code-generation tools. The benefits we see from this are:

* The whole schema is automatically defined in the frontend, so it is easy to keep types in sync between frontend and backend.

* .gql files are scanned, so services that consume the queries/mutations are automatically generated with correct typing.

* The Apollo Client cache makes it easier to keep the data updated across different components.

This greatly reduces the boilerplate code our developers have to create, and it makes it much easier to share type definitions from backend to frontend without having to worry about them getting out of sync. We are still exploring different tools, to see what we might benefit from.


1. It's marketing, as someone mentioned. If the first thing that comes up when you search for GraphQL is Apollo, it's good for business.

2. Marketing aside, they are one of the very few (if any) solutions that provide tools that fill the glaring gaps in GraphQL and make it easy(-ish): caching, auth etc.


Let's admit it - GraphQL is NOT simple-and-simply an easy replacement to REST.

I see its advantages, like having auto-documentation for each API. Thumbs up here.

Having said that, the number of hoops one has to jump through before completely adopting it makes it 'meh'. There are a lot of things: caching (client and server), N+1 queries, Apollo (why?), deeply nested queries, schema stitching. The amount of patchwork one needs to learn is not worth it.


> Let's admit it - GraphQL is NOT simple-and-simply an easy replacement to REST.

I disagree. GraphQL is more difficult to implement than "bad REST", but probably easier to implement than "good REST", partially because there's very little tooling around good REST implementations.

If you just want JSON over HTTP with some status codes, then sure GraphQL is loads more work, but that's definitely not REST.


If only Facebook were not behind GraphQL, this would have faded into oblivion (RIP Parse).

The innumerable patches/packages trying to fix what GraphQL lacks go to show it's a good idea at surface level yet poorly thought out from the ground up.


"If you just want JSON over HTTP with some status codes"

yes this

Qualifier: I have not been able to use GraphQL yet, and I'd still like to try it out at least before making any judgements.


GraphQL costs a lot more to do, but also gives you a lot more.

Just some benefits we've seen:

- Automatic codegen in the client removes a lot of boilerplate.

- Verification of schemas in CI makes sure we don't "go backwards" on the API inadvertently.

- Documentation tooling that makes it easy to show API structure.

- Easy to generate schemas automatically from underlying data models in the backend service (database tables, ORM models, enums, etc).

- Great type safety compared to JSON over HTTP.
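As a toy illustration of that last point (names invented for the example), a backend enum can be the single source of truth from which the GraphQL enum definition is emitted:

```javascript
// A backend enum that is the single source of truth.
const OrderStatus = Object.freeze({
  PENDING: 'PENDING',
  SHIPPED: 'SHIPPED',
  CANCELLED: 'CANCELLED',
});

// Emit a GraphQL SDL enum definition from it, so the schema can
// never drift out of sync with the backend's values.
function enumToSDL(name, values) {
  const body = Object.keys(values).map((v) => `  ${v}`).join('\n');
  return `enum ${name} {\n${body}\n}`;
}
```

The same idea extends to database tables and ORM models, where the generator walks columns or fields instead of enum keys.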


I disagree with the notion of `One Graph`.

We have multiple "gateways" for multiple backend services.

We have our main application, which has one graph, and we have multiple internal applications, each owning its own graph.

I don't think the main application needs to know about the other internal graphs, nor should it have access to them (it should not even discover them).


They advocate one graph because it's too hard to merge graphs with parent-child (many-to-one, many-to-many) relationships (recursively) in a flexible way. This is purely due to a technical limitation of GraphQL, IMO.


An alternative architecture (following the BFF idea):

* Non-GraphQL services (speaking REST, gRPC, whatever)

* GraphQL gateways, owned by frontend teams/applications

I.e. each frontend team/application owns its own schema. They can compose the backend services however they wish, into whatever is the most natural representation for their domain.


[author here]

Yup, we see this pattern a lot. You get the benefits of fewer bytes on the wire, typed APIs, elimination of data fetching code, etc.

But it leaves a lot on the table. The bigger wins come when new features can draw from all your data (in unanticipated combinations) without new API or services development, when you can make new data available for every team's use just by plugging it into the central graph, and when you have a birds-eye view of exactly how all your data is being used so you know where to make further investments.


Yeah, I'd also add that it's a huge help when ramping up new devs, since it gives them a way to understand your schema.


Don't know if this is related, but after initially being excited to hear about GraphQL, I was very disappointed to find that it doesn't seem to offer any way to (in effect) express joins.

Am I missing something here? Has this been addressed?

Thanks.


I would concur. The BFF pattern works really well with GraphQL.

I think 'One Graph' makes sense if that's how you model your data under the covers (e.g. Facebook's graph store)—but if you're in a service-laden world, it makes far less sense.


Likewise. Sometimes it doesn't make sense to stitch graphs together. We have a gateway that proxies different services and exposes their graphs individually.


Access could be limited by authorization. Role in JWT for instance.

You can't control malicious clients so I don't think the access part holds much water.

I think they are advocating a monolith deployment to maintain contracts across ownership and client expectations as a best practice, vs the version management of multiple graphs requiring different cross-sections of data for each client. It helps keep things consistent.

my 2 cents


One graph does not mean that the different parts have to know about each other. The different parts can be merged together without them knowing and form a single endpoint that the user can query without knowing where that data is coming from.
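In miniature (hypothetical names; real stitching or federation tooling such as graphql-tools or Apollo Federation works at the schema level rather than on plain objects), each part supplies its own resolvers and the gateway composes them into a single root:

```javascript
// Independently owned parts of the graph. Neither knows the other exists.
const billingResolvers = { invoices: () => [{ id: 'inv-1', total: 100 }] };
const catalogResolvers = { products: () => [{ id: 'p-1', name: 'Widget' }] };

// The gateway merges them into a single queryable root, so users hit one
// endpoint without knowing which service each field comes from.
const rootResolvers = { ...billingResolvers, ...catalogResolvers };
```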


I agree. If your data spans multiple logical core domains, you could potentially have one entry point per core domain; auxiliary data that spans domains can be supplied by each core domain's issuer in a manner that is logical for that domain.


Just out of curiosity, do you have multiple independent teams working on your app? Do you use a monorepo, and if so, do you feel like it makes refactoring and bug fixes more difficult?


Multiple independent teams might work on each of the graphs.

We don't use a mono-repo, we use a repo per service.

We have shared libraries like https://github.com/globality-corp/nodule-graphql that make generating gateways easy and straightforward.


One question I have is with the "One Graph" idea. I get the principle, and it does seem like it would be nice. However, it's hard for me to imagine what this looks like in practice.

I haven't done anything with GraphQL in a few years. But when I did, we basically had several teams in a part of our organization (one of many in a huge giant megacorp) which built an API with GraphQL. We shared that code base and did the changes in it we needed for different parts of the graph. Sometimes it was fine, and sometimes it didn't go so well (unexpected issues, unexpected deployments when a team didn't prepare properly, etc). So it was an OK idea, but it caused some friction.

A decent number of the problems were communication between teams, of course. But even if communication were as good as one could imagine, it seems like this model would become cumbersome at some point. And I can't imagine it spanning our entire organization, much less our entire megacorp across the board.

So assuming what we did wasn't good, what is a way you might make this work? Do you have a single code base like we did and just be more disciplined? Do you have a project whose job is to stitch together other schemas and pull in submodules? A monorepo for the entire company? Do you actually limit what the idea of the company is at some point to limit what the scope is that this should cover?

This turned out to be more of a ramble than anything. Oops.


[author here]

> So assuming what we did wasn't good, what is a way you might make this work? Do you have a single code base like we did and just be more disciplined? Do you have a project whose job is to stitch together other schemas and pull in submodules? A monorepo for the entire company? Do you actually limit what the idea of the company is at some point to limit what the scope is that this should cover?

I think these are great questions. One approach that looks promising is the idea of modular schemas that can be stitched together into a larger graph. Quite a few large organizations are doing this now. I recommend watching a recent talk by Martijn Walraven for more on this idea: https://www.youtube.com/watch?v=OFT9bSv3aYA


We do this at Artsy in production too, here's an overview of how it works for us: http://artsy.github.io/blog/2018/12/11/GraphQL-Stitching/


So the way we've currently been doing this at work is that we have a few different services that each maintain a portion of the One Graph that fits within their domain. The different APIs get merged together in a specialized service that understands the full picture.

The old way required a lot of manual work to pull in the different service schemas and do the stitching by hand. Luckily the team at Apollo and others are working on a new approach that automates a lot of the old work.

Shameless plug: I wrote a blog post that goes into some detail about what this can actually look like if you want to see more: https://medium.com/@aaivazis/a-guide-to-graphql-schema-feder...


Summary: have one company graph, implemented as federated implementations from different teams, with a registry for tracking the (iteratively evolving) schema. Use per-client access permissions and structured logging, incrementally improving performance as usage grows. Data graph functionality should be a separate tier rather than baked into every service.


We should probably add another point which is "you probably don't need Apollo tools".

I've been using GraphQL in production since 2015, and I feel like Apollo's only contribution to the ecosystem has been pushing a lot of marketing content around their hacky tools and "best practices", leading teams to poorly designed backends.


Yeah, I haven't seen a lot that has been particularly useful from Apollo so far.

They supposedly have a slightly different recommended way of doing the things Relay tries to do, but it's poorly documented and only compatible with Apollo, whereas the Relay spec is relatively clear (just the spec, I haven't used the Relay library).

Also, they seem entirely JS-focused. Our backend is Python and our frontend so far is Swift. They have no Python tooling, and their Swift codegen has a lot of issues; our iOS devs use it pretty reluctantly and replace a lot of parts of it. Even that tooling is all in JS, not Swift, which means our iOS builds suddenly need Node, Yarn, NPM, etc., adding a whole load more complexity, caching, and so on.


I'm currently working on building an open source library for building backend GraphQL APIs in Python. I'd love to hear your use cases! My email is in my profile, if you're interested.


We've been working on doing something similar to create "one graph" at my job (though using our own library instead of graphql [1]).

There are definite difficulties in doing this; you need strong cohesion between teams. If you try to do it in an agile manner, you're going to end up with a lot of duplication with minor differences. You'd need a way to flag existing similar data when someone tries to add new branches and nodes.

Our hope is to make our graph explorable and interactive so teams we don't have direct contact with, often in other countries, are able to find the data they would be interested in and see immediately how to read it out.

[1] https://dwstech.github.io/muster/


I think the single-graph idea is impractical in any large organization. Every centralized model deteriorates over time for many reasons: time pressure, carelessness, and widely varying competence levels across federated teams. A couple of years down the line everyone will hate it, and it'll be too big to discard.

Independent team structures and microservice-based architectures are an acknowledgment that such monoliths are impossible to build and maintain.


The point of federating your schemas across different services is that you can achieve the single graph that represents the organization without having the separate parts step on each other. It doesn't discard microservices; it embraces them.


There's nothing that says "we're really good at making front-end development easier" like a website where, even after I click into the main frame, it completely fails to respond to up/down/pgup/pgdown... I understand the whole "should work on mobile" thing, but it should also bloody well work on a desktop too :(


Thanks, looking into it. I agree, and I hope we're able to fix this. We're moving all of the Apollo docs to the same system. It's built on Gatsby, which uses GraphQL under the hood.


Cheers. I get really cranky about my keyboard not working (twitter's web UI manages to break it under a bunch of circumstances, which aggravates me at least once a week) but I'm well aware that there's always a billion things that need doing :)


Single graph: Large companies often do not have a single graph. Different experiences and use cases require necessary variation in how teams surface data and do experimentation independently from other teams.

Federated implementation: Implementation being spread across many teams is important, but if that happens through federated services being reused by multiple teams, it can create brittle dependencies.

Track schema in the registry: It's difficult to have a single source of truth when it is unable to capture the variation described in my first point. I don't think anyone has solved this well for really large systems. Good principle, but hard in practice!

I think it is really hard for a VC backed company selling a product to describe principles without being biased. This feels very biased toward selling their product.


Principle 8 mentions "demand control", including "estimating the cost of a query before performing it". This is very much in line with GraphQL API Management work we are doing at IBM Research. I recently wrote about this: https://www.ibm.com/blogs/research/2019/02/graphql-api-manag...

I wonder about the other proposal for demand control, namely that untrusted users "should only send queries that have been preregistered by the authenticated developer of the app". To me, that seems to somewhat negate a great advantage of GraphQL, namely giving clients more flexibility in deciding (on the fly) what data to fetch/mutate.


is graphql something anyone really needs unless they're facebook?


While using GraphQL does have some benefits when dealing with a massive amount of users, that's not the reason most people choose to use it. GraphQL is simply a drastically different (and IMO better than REST) model for clients to talk to servers.


I don’t think it is - mutations (and state management) are the biggest issue. It’s ok for reads though, although even there tooling for rest is better.


> I don’t think it is - mutations (and state management) are the biggest issue.

How so? I don't think GraphQL mutations are any worse than their REST equivalents, especially once you get more complex than "update these fields on this entity".


I agree with RussianCow, I don't think mutations are in any way harder in GraphQL than they are in REST.

IDK what you mean re: state management. That sounds like a client side concern, nothing to do with the protocol used to communicate with the server.

Additionally, in my experience, the client-side tooling is better for GraphQL, despite it being around for only a fraction of the time of REST. In fact, that's why I chose GraphQL for my projects. I wanted the tooling (Relay, Apollo) that works out of the box with GraphQL, and requires hacks to get it work with REST.


Well, I tested it by creating a little sample android app that fetched a collection of images for display in a grid.

For some reason I found this very simple scenario quite a hassle using a native GraphQL client for Android. And the alternative, ditching the client and making simple http requests was rather clunky when poking around what's returned.

Admittedly I only spent an hour or so on it but...REST just seems drastically simpler to get up and running with.


Have you had to maintain the GraphQL server?


Well, it's a fancier API tool, which smooths out the flow between the backend and API consumers (at a cost).

At its core, it is a project based around a _consumer-facing schema_ that can be queried. You put in upfront and ongoing work to give product/frontend/clients the ability to discuss data requirements around a first-class schema, and then, once the resolvers are written, to let them explore and experiment with the data, with queries that can be used directly in production.

From what I have seen, the first (only?) ones to feel the pain it solves are Product/Frontend/Clients.

Hopefully that ^ waves away some of the haze


I believe it makes life easier for consumers. But what if you have to maintain the API? And if it is that much harder to maintain, is it really a smart choice unless the surface area of your API is the size of, say, Facebook's?


You don't "need" almost anything; the point of new tools isn't usually that you need them, it's that they make your life easier.

Apollo Client makes a lot of things about frontend apps easier, which is possible because of the well-defined, typed interface that GraphQL enforces. No, you don't need it, and there are other ways (like maybe using Swagger) to get similar typing for a REST interface. But FWIW, I'm not familiar with any tools that are quite as comprehensive as Apollo out of the box that don't use GraphQL.


You are being downvoted but I think it's a pretty legitimate question and the answer to it is: Yes.

If you want to have a united interface to multiple services behind the scenes and incorporate ACL or any kind of top-level resource management, GraphQL is a great way to go about doing that.


> If you want to have a united interface to multiple services behind the scenes and incorporate ACL or any kind of top-level resource management, GraphQL is a great way to go about doing that.

I like GraphQL outside of these points as well because it's more opinionated than REST. It means you can get on with things instead of having long disagreements about what is and isn't RESTful.


What are the reasons for a backend service to delegate its authentication/authorisation to the GraphQL server in front of it? Making the GraphQL server a superuser for the service sounds wrong to me.


Just wanted to throw out that this is cool, amongst the critical feedback. Thanks, Apollo and GraphQL, for pushing the boundaries. It was fun using it when it was needed. I recently moved on to other pastures, but solid progress!


Anyone else's job have Cisco Umbrella flagging this page as possibly a malware distributor?


I think the author doesn't understand what "principled" means.


[flagged]


Sounds like you've been shadow-banned. You should contact hn@ycombinator.com and see if there's been a mistake.



