It's that the team maintaining the API is different from the team that needs changes to the API. Due to the scale of the organization the latter doesn't have the access or know-how to easily add fields to that API themselves, so they have to wait for the maintainers to add the work to their roadmap and get back to it in a few quarters. Relevant Krazam: https://www.youtube.com/watch?v=y8OnoxKotPQ
At a small start-up, if the GET /foo/:fooId/bar/ endpoint is missing a field baz you need, you can usually just add it yourself and move on.
- Front end devs save time by... sharing queries. So component B ends up fetching records it has no use for because it's sharing GQL with component A.
- Backenders never optimise column selection. You may think you are really optimising by sending a GQL query for one column, but the backend will go ahead and collect ALL the columns and then "filter" down the data that was asked for.
- Backenders can also forget to handle denormalisation. If you query related many-to-many records but the GQL only asks for the related ids, many implementations will go ahead and do a full join instead of just returning results from the bridge table.
- Frontenders aren't even aware you can send multiple GraphQL requests simultaneously.
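To make the two backend points above concrete, here is a minimal sketch using Python's built-in sqlite3. The tables, the `AUTHOR_COLUMNS` allowlist and both function names are made up for illustration; a real GraphQL server would get the requested field list from its own query parser.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT, bio TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author_book (author_id INTEGER, book_id INTEGER);
    INSERT INTO author VALUES (1, 'Ann', 'a long biography');
    INSERT INTO book VALUES (10, 'First'), (11, 'Second');
    INSERT INTO author_book VALUES (1, 10), (1, 11);
""")

# Allowlist of selectable columns; it also keeps field names out of raw SQL.
AUTHOR_COLUMNS = {"id", "name", "bio"}


def fetch_author(author_id, requested_fields):
    """Select only the columns the query asked for, not SELECT *."""
    columns = [f for f in requested_fields if f in AUTHOR_COLUMNS]
    if not columns:
        raise ValueError("no known columns requested")
    sql = f"SELECT {', '.join(columns)} FROM author WHERE id = ?"
    row = conn.execute(sql, (author_id,)).fetchone()
    return None if row is None else dict(zip(columns, row))


def fetch_book_ids(author_id):
    """Only ids were requested, so read the bridge table and skip the join."""
    rows = conn.execute(
        "SELECT book_id FROM author_book WHERE author_id = ? ORDER BY book_id",
        (author_id,),
    )
    return [book_id for (book_id,) in rows]


print(fetch_author(1, ["name"]))  # {'name': 'Ann'}
print(fetch_book_ids(1))          # [10, 11]
```

Neither function touches the `bio` column or the `book` table, which is the saving the query author was presumably hoping for.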
GraphQL is great, but any technology is limited by how well people can extract its value. I personally feel sometimes we'd be better off with REST, or at least we should make sure people receive the training to use GraphQL effectively.
An unfortunate problem that really only exists with Apollo. Facebook’s GraphQL client, Relay, does not have this issue as it requires each component to explicitly declare its data dependencies.
Citation needed. If two components call into the same data fetching utility for expediency's sake, and that utility queries data that one or both components do not need, you have this problem. What makes that uniquely likely with Apollo?
Whether it’s a React component, or just a plain old function, Relay has mechanisms to prevent the kind of problem you’re describing. It has a steep learning curve, but it’s very well considered.
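As a rough illustration of the idea (each component declares its own data dependencies and only sees what it declared), here is a toy model. This is not Relay's API; `COMPONENT_FIELDS`, `compose_query` and `mask` are hypothetical names invented for the sketch.

```python
# Toy model only: hypothetical names, not Relay's actual API.
COMPONENT_FIELDS = {
    "Avatar": {"id", "avatar_url"},
    "ProfileHeader": {"id", "name"},
}


def compose_query(components):
    """The parent fetches the union of what its children declared."""
    fields = set()
    for name in components:
        fields |= COMPONENT_FIELDS[name]
    return sorted(fields)


def mask(component, record):
    """Each component is handed only the fields it declared itself."""
    declared = COMPONENT_FIELDS[component]
    return {key: value for key, value in record.items() if key in declared}


record = {"id": 1, "name": "Ann", "avatar_url": "/a.png"}
print(compose_query(["Avatar", "ProfileHeader"]))  # ['avatar_url', 'id', 'name']
print(mask("Avatar", record))  # {'id': 1, 'avatar_url': '/a.png'}
```

Under this model a component cannot quietly depend on a field that some other component happened to request, so removing the other component's declaration cannot break it.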
In the scenario described, wouldn't this lead to the same problem because someone copied the code or reused a utility from another project and so they have at least the superset of those dependencies plus whatever new ones they added?
Making scalable, well performing queries work is nontrivial, particularly with the current ecosystem of GraphQL libraries. The main workaround provided for this appears to be directly mapping GraphQL to an ORM.
But more and more, I think Backends For Frontends solve this issue in a much better way. And of course that idea isn’t new and Yahoo for instance had that kind of architecture.
Frontend teams get to adjust a simple interface to their needs by themselves, and backend teams can provide more info through internal APIs with fewer restrictions than if those APIs were directly exposed to the outside.
Going with basic REST gives you simpler caching/optimisation paths, more straightforward mapping between the front request and the backend calls, and it makes it easier for other teams to look at what you’re doing and comment on/fix stuff as needed. GraphQL would be pure syntax sugar, and I’m not sure it would be worth the trade-off.
I disagree, at least compared to something like Apollo Server or Hasura.
The mapping is much easier, especially if you consider things like resolvers having parts of the schema going to different backend servers, API endpoints, separate caching, etc.
We tried to do this manually via a REST server, and found that we were just reinventing something like Apollo, but badly.
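A minimal sketch of the resolver mapping described here, assuming two made-up backends (`users_service` and `stats_service`) and a plain dict standing in for the resolver map. Apollo Server and Hasura have their own APIs; none of these names come from them.

```python
from functools import lru_cache

CALLS = []  # records which hypothetical backend was actually hit


def users_service(user_id):
    CALLS.append("users")
    return {"id": user_id, "name": "Ann"}


@lru_cache(maxsize=128)  # this backend gets its own, separate cache
def stats_service(user_id):
    CALLS.append("stats")
    return {"logins": 42}


# One resolver per schema field; each may point at a different backend.
RESOLVERS = {
    "name": lambda user_id: users_service(user_id)["name"],
    "logins": lambda user_id: stats_service(user_id)["logins"],
}


def resolve_user(user_id, selection):
    """Call only the resolvers for the fields the query selected."""
    return {field: RESOLVERS[field](user_id) for field in selection}


print(resolve_user(7, ["name"]))            # only the users backend is hit
print(resolve_user(7, ["name", "logins"]))  # both backends are hit
print(resolve_user(7, ["logins"]))          # stats answer comes from the cache
print(CALLS)                                # ['users', 'users', 'stats']
```

Doing the same thing by hand in a REST server means writing the selection, dispatch and caching logic yourself, which is roughly the reinvention mentioned above.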
With that structure you can have any number of layers between your front call and the different business abstractions, each represented by an API. Let’s say you want a user’s average engagement with a service: you’d hit the high level API, which will fetch access stats from another API, which relies on a lower level API, which goes through another separate layer managing DB cache, etc.
Most of these calls are of course internal to a data center.
So what you can do is some sort of generative graphql thingie when doing your initial iteration, with the client hitting whatever is convenient (in that situation you'd just expose the entire backend unprotected).
Once the needs have gelled, you strip it out and replace the graphql queries with bespoke API endpoints.
I'm a fan of YAGNI, but basic security, and not leaving your system trivially vulnerable to attacks, are a couple of exceptions in my mind.
The other part about team dependence is very true but it also shows a lack of knowledge/thinking/care by whoever formed the teams. It seemed for a while Amazon had things right both in terms of boundaries of teams and in terms of forcing people to use APIs- not sure what they do these days.
Also tends to use a lot less bandwidth.
Until they solve the stateful part, I’m not using it or recommending it to be used anywhere. Bandwidth is cheap, type safety is overrated, and compute is expensive.
Compare it to a database, what if you couldn't use random queries with SQL, but only had the option to call stored procedures?
The problem is when genericity diffuses its way into a large system it becomes impossible to maintain. How do you refactor a code base when everything everywhere is just SQL queries? If you want to change the schema how do you know you're not breaking anything? The short answer is you don't and so the software becomes incredibly brittle. The common workaround is testing but you can never test everything and now your tests also become coupled to everything else making things even more difficult to change.
The database in your example, while being generic, is already an abstraction of sorts. Now if you're building, let's say, gmail, the external users should see "create email" and "get all emails" rather than issuing SQL queries to the database. That makes it easier to change the two pieces (client and server in this simplified example).
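A small sketch of that narrow interface, using sqlite3 as a stand-in database. The `Mailbox` class and its table layout are hypothetical; the point is only that callers see two operations and never the SQL.

```python
import sqlite3


class Mailbox:
    """Callers see two operations; the schema behind them stays private."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE email (id INTEGER PRIMARY KEY, subject TEXT, body TEXT)"
        )

    def create_email(self, subject, body):
        cursor = self._db.execute(
            "INSERT INTO email (subject, body) VALUES (?, ?)", (subject, body)
        )
        return cursor.lastrowid

    def get_all_emails(self):
        rows = self._db.execute("SELECT id, subject, body FROM email ORDER BY id")
        return [{"id": i, "subject": s, "body": b} for i, s, b in rows]


mailbox = Mailbox()
mailbox.create_email("hello", "first message")
print(mailbox.get_all_emails())
```

Renaming a column or splitting the table only requires touching these two methods, whereas with client-issued SQL every caller would have to be found and checked.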
I don't even want to think about undoing this mess.
We don't use GraphQL, but we do use an API that is mostly generated from meta data about the schema and permissions on a per field basis, with the ability to override on a per table basis.
To the API consumer it's invisible if they're referring to something that refers directly to a real database column or to a method on a model class that doesn't correspond directly to the database (e.g. the user "password" attribute is
Effectively there are two schemas: the API schema and the database schema, it's just that the API schema is "prepopulated" from introspecting the database schema using Sequel (Ruby ORM), with the model classes translating between the two, with a synthesised default mapping.
The "API schema" includes more granular type information suitable for the frontend, and permissions, including type information returned to the frontend to provide default client side form rendering and validations (also enforced on the server, of course). It also auto-generates documentation pages with example data, inspired by Airtable's doc pages.
But key to avoiding what you describe is that these are all easily overridable defaults, and the permissions default to allowing no columns, so while the db schema introspection makes it quick to expose things where a direct mapping makes sense, it also makes it easy to avoid.
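To illustrate the "prepopulated but default-deny" idea, here is a rough sketch. Our real setup uses Sequel in Ruby; this stand-in uses Python's sqlite3 and `PRAGMA table_info` for the introspection, and the `users` table, `api_schema` structure and `readable` flag are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, password_hash TEXT)"
)
conn.execute("INSERT INTO users VALUES (1, 'ann@example.com', 'x')")


def introspect(table):
    """Build a default API schema from the DB schema, exposing nothing."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1]: {"type": row[2], "readable": False} for row in rows}


api_schema = introspect("users")

# Explicit overrides: a column is only exposed once somebody opts it in.
api_schema["id"]["readable"] = True
api_schema["email"]["readable"] = True


def read_user(user_id):
    """Return only the columns that have been opted in."""
    columns = [name for name, meta in api_schema.items() if meta["readable"]]
    sql = f"SELECT {', '.join(columns)} FROM users WHERE id = ?"
    row = conn.execute(sql, (user_id,)).fetchone()
    return None if row is None else dict(zip(columns, row))


print(read_user(1))  # {'id': 1, 'email': 'ann@example.com'}
```

A newly added column shows up in `api_schema` automatically but stays invisible to API consumers until its flag is flipped.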
Unlike GraphQL we explicitly avoided allowing you to request arbitrary complex sets of data, but what you can do is expose queries by defining model metadata for them that either wraps suitable views or custom queries. We have a UI to allow us to define and store custom queries and views for e.g. reporting needs, so we can prototype new API needs in the browser writing just queries with some metadata annotation.
It gets us the flexibility of being able to quickly return exactly the desired data, while retaining control over the queries and complexity.
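The stored-query approach might look roughly like this. It is a sketch under assumptions: the `QUERIES` registry, its metadata format and the `orders` table are hypothetical, and sqlite3 stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    INSERT INTO orders VALUES (1, 'ann', 10.0), (2, 'ann', 5.0), (3, 'bob', 2.0);
""")

# Hypothetical registry: each exposed query is named, fixed SQL plus metadata.
QUERIES = {
    "totals_by_customer": {
        "sql": (
            "SELECT customer, SUM(total) AS total FROM orders "
            "WHERE total >= :min_total GROUP BY customer ORDER BY customer"
        ),
        "params": {"min_total": float},
    },
}


def run_query(name, **params):
    """Run a registered query; clients can never send SQL of their own."""
    spec = QUERIES[name]  # unknown query names raise KeyError
    if set(params) != set(spec["params"]):
        raise ValueError("unexpected or missing parameters")
    bound = {key: spec["params"][key](value) for key, value in params.items()}
    cursor = conn.execute(spec["sql"], bound)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]


print(run_query("totals_by_customer", min_total=4))
# [{'customer': 'ann', 'total': 15.0}]
```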
Of course there are ways to prevent data from being returned but that’s fragile.
For example, field-level security pretty much means every field could be null at any time. Depending on your graphql server implementation, this might cause an entire request to fail rather than just that field to be omitted, unless you change your schema to where everything is nullable.
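Here is a simplified model of why nullability matters for this. It mimics, for a single object, the way a denied non-null field takes its parent down with it; the `execute` function, the `Denied` exception and the field names are made up, and a real GraphQL server handles nesting and error reporting in more detail.

```python
class Denied(Exception):
    """Raised by a resolver when the caller may not read a field."""


def execute(fields, resolvers, non_null):
    """Resolve one object; a denied non-null field nulls the whole object."""
    result = {}
    errors = []
    for field in fields:
        try:
            result[field] = resolvers[field]()
        except Denied as exc:
            errors.append(f"{field}: {exc}")
            if field in non_null:
                return None, errors  # null is not allowed here, so it bubbles up
            result[field] = None     # nullable: just omit the value
    return result, errors


def deny():
    raise Denied("not allowed")


resolvers = {"name": lambda: "Ann", "salary": deny}

print(execute(["name", "salary"], resolvers, non_null={"salary"}))
# (None, ['salary: not allowed'])
print(execute(["name", "salary"], resolvers, non_null=set()))
# ({'name': 'Ann', 'salary': None}, ['salary: not allowed'])
```

With `salary` declared non-null, one denied field costs the caller the `name` as well, which is why field-level security tends to push the whole schema towards nullable types.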
Checking every field can also easily lead to performance issues, because it’s not uncommon for a large, complex graphql request to have hundreds or thousands of fields (particularly when displaying lists of data to a user).
A proper solution to security/privacy issues should have sensitive data never reach the outermost GraphQL layer.
So the problem is with the existing tooling that enables GraphQL implementations, but like anything else, if that tooling is deficient, the entire approach is on shaky ground too.
At my current job, this was done before I was involved. It isn’t a deal breaker, but it throws away one of the best features of GraphQL.
In the end you just have every client implement the rules that should have been in an API tier (if they are competent), or worse, no validation at all, which gets you a giant mess.
For some reason I don’t think graphql actually works this way. Can’t quite put my finger on why allowing access to any column on a table might be a really bad idea.
I don't think GraphQL makes the problem worse, except by encouraging experimentation by putting an unusually powerful query language in the hands of the users.
If clever architects manage to expose the carefully segregated database of the small and secure authentication module, they cannot claim it was an accident or someone else's fault,