Having worked in big tech and small startups, I think GraphQL is a brilliant way to solve an organizational problem that massive tech companies have.

It's that the team maintaining the API is different from the team that needs changes to the API. Due to the scale of the organization the latter doesn't have the access or know-how to easily add fields to that API themselves, so they have to wait for the maintainers to add the work to their roadmap and get back to it in a few quarters. Relevant Krazam: https://www.youtube.com/watch?v=y8OnoxKotPQ

At a small start-up, if the GET /foo/:fooId/bar/ endpoint is missing a field baz you need, you can usually just add it yourself and move on.

That's the theory. My experience at both large and small organisations is that NONE of the theory makes it into practice.

Some reasons:

- Front end devs save time by.... sharing queries. So component B ends up fetching records it has no use for because it's sharing GQL with component A.

- Backenders never optimise column selection. You may think you are really optimising by sending a GQL query for one column, but the backend will go ahead and collect ALL the columns and then "filter" down the data that was asked for.

- Backenders can also forget to handle denormalisation. If you query related many-to-many records but the GQL only asks for the related IDs, many implementations will go ahead and do a full join instead of just returning results from the bridge table.

- Frontenders aren't even aware you can send multiple GraphQL requests simultaneously.
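A minimal sketch of the two backend points above (column selection, and answering an IDs-only many-to-many request from the bridge table), using an in-memory SQLite database. The tables and the `fetch_*` resolver functions are hypothetical and not tied to any particular GraphQL server library.

```python
import sqlite3

# Hypothetical schema: posts, tags, and a post_tags bridge table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT, author_id INTEGER);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
CREATE TABLE post_tags (post_id INTEGER, tag_id INTEGER, PRIMARY KEY (post_id, tag_id));
INSERT INTO posts VALUES (1, 'Hello', 'long body', 7), (2, 'Again', 'more text', 7);
INSERT INTO tags VALUES (10, 'graphql', 'x'), (11, 'rest', 'y');
INSERT INTO post_tags VALUES (1, 10), (1, 11), (2, 10);
""")

POST_COLUMNS = {"id", "title", "body", "author_id"}

def fetch_posts(requested_fields):
    """Select only the requested columns instead of SELECT * and filtering afterwards."""
    columns = [f for f in requested_fields if f in POST_COLUMNS]
    if not columns:
        raise ValueError("no selectable fields requested")
    # Column names are checked against an allowlist, so interpolating them is safe here.
    sql = f"SELECT {', '.join(columns)} FROM posts ORDER BY id"
    return [dict(zip(columns, row)) for row in conn.execute(sql)]

def fetch_tag_ids(post_id):
    """When only tag IDs are requested, the bridge table alone answers the query."""
    rows = conn.execute(
        "SELECT tag_id FROM post_tags WHERE post_id = ? ORDER BY tag_id", (post_id,)
    )
    return [tag_id for (tag_id,) in rows]

def fetch_tags(post_id):
    """Join to the tags table only when fields beyond the ID are requested."""
    rows = conn.execute(
        "SELECT t.id, t.name FROM post_tags pt JOIN tags t ON t.id = pt.tag_id "
        "WHERE pt.post_id = ? ORDER BY t.id",
        (post_id,),
    )
    return [{"id": tag_id, "name": name} for tag_id, name in rows]

print(fetch_posts(["id", "title"]))  # [{'id': 1, 'title': 'Hello'}, {'id': 2, 'title': 'Again'}]
print(fetch_tag_ids(1))              # [10, 11]
```

The point is that the resolver looks at the selection set before it builds SQL, rather than after the rows come back.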

GraphQL is great, but any technology is limited by how well people can extract its value. I personally feel sometimes we'd be better off with REST, or at least make sure people receive the training to use GraphQL effectively.

> Front end devs save time by.... sharing queries. So component B ends up fetching records it has no use for because it's sharing GQL with component A

An unfortunate problem that really only exists with Apollo. Facebook’s graphql client, relay, does not have this issue as it requires each component to explicitly declare its data dependencies.

> really only exists with Apollo

Citation needed. If two components call into the same data fetching utility for expediency's sake, and that utility queries data that one or both components do not need, you have this problem. What makes that uniquely likely with Apollo?

It’s more that compared to Relay, which is explicitly designed around preventing components using data they didn’t explicitly declare as a dependency, Apollo (along with most other GraphQL clients) doesn’t go that extra mile.

Whether it’s a React component or just a plain old function, Relay has mechanisms to prevent the kind of problem you’re describing. It has a steep learning curve, but it’s very well considered.
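A rough sketch of the idea, loosely modeled on Relay's fragments and data masking but not using Relay's actual API: each (hypothetical) component declares the fields it needs, the query fetches the union, and each component is handed only its own declared fields.

```python
# Hypothetical components and the fields each one declares (its "fragment").
FRAGMENTS = {
    "UserAvatar": {"id", "avatar_url"},
    "UserBio": {"id", "name", "bio"},
}

def compose_query(component_names):
    """The parent query fetches the union of the fields its children declared."""
    fields = set()
    for name in component_names:
        fields |= FRAGMENTS[name]
    return sorted(fields)

def mask(record, component_name):
    """A component only sees the fields it declared, even if more were fetched."""
    return {k: v for k, v in record.items() if k in FRAGMENTS[component_name]}

record = {"id": 1, "avatar_url": "a.png", "name": "Ada", "bio": "hello"}
print(compose_query(["UserAvatar", "UserBio"]))  # ['avatar_url', 'bio', 'id', 'name']
print(mask(record, "UserAvatar"))                # {'id': 1, 'avatar_url': 'a.png'}
```

Because `UserAvatar` can never read `bio`, it cannot silently come to depend on another component's data, and removing a field from a fragment shrinks the composed query.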

> Facebook’s graphql client, relay, does not have this issue as it requires each component to explicitly declare its data dependencies

In the scenario described, wouldn't this lead to the same problem because someone copied the code or reused a utility from another project and so they have at least the superset of those dependencies plus whatever new ones they added?

In many ways you also need to be a massive tech company to not create a massive scalability problem. The first time someone ships a shitty query to a large user base on a mobile app, you are now dealing with the consequences of a frontend engineer creating a bad query you cannot kill quickly anymore.

Making scalable, well-performing queries work is nontrivial, particularly with the current ecosystem of GraphQL libraries. The main workaround on offer appears to be directly mapping GraphQL to an ORM.
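To make the blow-up concrete, here is a rough sketch of how nested list fields multiply the number of objects a single query resolves. The page sizes, the `Selection` type and the `within_budget` check are all hypothetical; rejecting queries above a node budget is one common guard, not something any specific library is claimed to do this way.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Selection:
    name: str
    list_size: int = 1  # assumed items returned by a list field (1 for a single object)
    children: List["Selection"] = field(default_factory=list)

def estimate_nodes(sel: Selection, parent_count: int = 1) -> int:
    """Rough count of resolved objects: each list field multiplies everything beneath it."""
    count = parent_count * sel.list_size
    return count + sum(estimate_nodes(child, count) for child in sel.children)

def within_budget(sel: Selection, max_nodes: int) -> bool:
    return estimate_nodes(sel) <= max_nodes

# users(first: 100) { friends(first: 50) { posts(first: 20) { id } } }
query = Selection("users", 100, [Selection("friends", 50, [Selection("posts", 20)])])
print(estimate_nodes(query))           # 105100
print(within_budget(query, 10_000))    # False
```

One innocent-looking query from a mobile client can ask for over a hundred thousand objects, which is why it hurts when that client is already installed on users' phones.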

I used to see GraphQL (and, to an uglier extent, SOAP-like interfaces) as complicated solutions to the problem you describe.

But more and more, I think Backend For Frontends solve this issue in a much better way. And of course that idea isn’t new and Yahoo for instance had that kind of architecture.

Frontend teams get to adapt a simple interface to their needs by themselves, and backend teams can provide more info through internal APIs with fewer restrictions than if it were directly exposed to the outside.
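A minimal sketch of the Backend For Frontend shape described above. The two "internal services" are hypothetical stand-ins for network calls to backend-owned APIs; the BFF endpoint belongs to the frontend team and returns exactly what one page renders, dropping internal-only fields.

```python
# Hypothetical internal services; in a real system these would be HTTP/RPC calls.
def user_service_get(user_id):
    return {"id": user_id, "name": "Ada", "email": "ada@example.com", "internal_flags": ["beta"]}

def orders_service_list(user_id):
    return [
        {"id": 1, "user_id": user_id, "total_cents": 1250, "warehouse": "W3"},
        {"id": 2, "user_id": user_id, "total_cents": 990, "warehouse": "W1"},
    ]

def bff_account_page(user_id):
    """Frontend-owned endpoint: aggregates internal APIs into the shape the page needs."""
    user = user_service_get(user_id)
    orders = orders_service_list(user_id)
    return {
        "displayName": user["name"],
        "orderCount": len(orders),
        "totalSpentCents": sum(o["total_cents"] for o in orders),
    }

print(bff_account_page(42))  # {'displayName': 'Ada', 'orderCount': 2, 'totalSpentCents': 2240}
```

The internal services can keep returning richer data (flags, warehouse codes) because only the BFF sees it; changing the page's needs only touches the frontend team's code.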

I'm not following if you think GraphQL is a bad fit still, but we used GraphQL with the BFF pattern, and it was nice to use from Frontend to BFF. The backend services would use REST or whatever appropriate behind the BFF.

I see GraphQL as unneeded if you already have a BFF managed by front teams.

Going with basic REST gives you simpler caching/optimisation paths, more straightforward mapping between the front request and the backend calls, and it makes it easier for other teams to look at what you’re doing and comment on/fix stuff as needed. GraphQL would be pure syntax sugar, and I’m not sure it would be worth the trade-off.

> more straightforward mapping between the front request and the backend calls,

I disagree, at least compared to something like Apollo Server or Hasura. The mapping is much easier, especially if you consider things like resolvers having parts of the schema going to different backend servers, API endpoints, separate caching, etc.

We tried to do this manually via a REST server, and found that we were just reinventing something like Apollo, but badly.
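The resolver-map idea mentioned above, reduced to a sketch: each field is resolved by a different (hypothetical) backend, only the backends behind requested fields are called, and one field gets its own cache. This loosely mirrors how GraphQL servers organize resolvers; it is not Apollo Server's or Hasura's API.

```python
import functools

CALLS = {"profile": 0, "billing": 0}  # call counters, just to make the routing visible

def profile_backend(user_id):
    CALLS["profile"] += 1
    return {"name": f"user-{user_id}"}

def billing_backend(user_id):
    CALLS["billing"] += 1
    return "pro" if user_id % 2 == 0 else "free"

@functools.lru_cache(maxsize=1024)
def cached_plan(user_id):
    # Billing data changes rarely, so this field gets its own cache.
    return billing_backend(user_id)

RESOLVERS = {
    "name": lambda user_id: profile_backend(user_id)["name"],
    "plan": cached_plan,
}

def resolve_user(user_id, requested_fields):
    """Only the backends behind the requested fields are called."""
    return {name: RESOLVERS[name](user_id) for name in requested_fields}

print(resolve_user(2, ["name", "plan"]))  # {'name': 'user-2', 'plan': 'pro'}
print(resolve_user(2, ["plan"]))          # {'plan': 'pro'}, served from the cache
print(CALLS)                              # {'profile': 1, 'billing': 1}
```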

You can use GraphQL for BFF btw.

Do you have any reference material for the Yahoo architecture?

It was pretty basic, so I’m not sure how much it was talked about outside of Yahoo. I see this presentation http://www.radwin.org/michael/talks/php-at-yahoo-zend2005.pd... providing an overview, but otherwise the basic principle was to have each addressable web server host an Apache with a routing module, and that module would map a request path to a CGI file.

With that structure you can have any number of layers between your front call and the different business abstractions, all represented by an API. Let’s say you want a user’s average engagement with a service: you’d hit the high-level API, which fetches access stats from another API, which relies on a lower-level API, which goes through another separate layer managing the DB cache, etc.

Most of these calls are of course internal to a data center.
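A toy version of the layering described above, with in-process functions standing in for what were separate servers behind Apache. The routes, layer names and cached data are hypothetical; only the shape (path-to-handler routing, each layer calling the one below) follows the description.

```python
# Hypothetical lowest layer: a DB cache keyed by (kind, id).
_DB_CACHE = {("user", 7): {"sessions": [30, 45, 15]}}

def db_cache_layer(key):
    return _DB_CACHE[key]

def access_stats_api(user_id):
    """Lower-level API: raw session lengths in minutes."""
    return db_cache_layer(("user", user_id))["sessions"]

def engagement_api(user_id):
    """High-level API: built only on the access stats API, never on the DB directly."""
    sessions = access_stats_api(user_id)
    return {"user_id": user_id, "avg_session_minutes": sum(sessions) / len(sessions)}

# The routing module's job: map a request path to a handler (a CGI file in the original setup).
ROUTES = {"/api/engagement": engagement_api, "/api/access_stats": access_stats_api}

def route(path, **params):
    handler = ROUTES.get(path)
    if handler is None:
        return 404, None
    return 200, handler(**params)

print(route("/api/engagement", user_id=7))  # (200, {'user_id': 7, 'avg_session_minutes': 30.0})
```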

I think another thing with graphql is it reduces friction when trying to discover what your API should be.

So what you can do is some sort of generative graphql thingie when doing your initial iteration, with the client hitting whatever is convenient (in that situation you'd just expose the entire backend unprotected).

Once the needs have gelled, you strip it out and replace the graphql queries with bespoke API endpoints.

In my experience, "let's do something non-scalable and obviously wrong while we're exploring the problem space and replace it with something better before shipping" reduces to "let's do something non-scalable and ship to production" 100% of the time.

So you are saying it works? At least all the production systems where we’ve done this work fine, they’re just inefficient, feel vaguely dirty, and are unpleasant to work on. They’re still bringing in boatloads of money though.

I think the risk with unrestricted access to your database via GraphQL is that it's potentially a very easy DOS attack waiting to be abused, and potentially a security risk if there's some information you shouldn't be exposing (as a very easy example, allowing unrestricted access to a user database would reveal password hashes, reset tokens, etc., which should never be available in an API).

I'm a fan of YAGNI, but basic security and leaving your system trivially vulnerable to attacks are a couple of exceptions in my mind.

I just have never had this as a problem, having worked on many APIs at many companies. Usually we decide what we want to work on, and the frontend/full stack can read the documentation or chat with the backend engineers if it’s not clear. At no point is “discoverability” an issue.

The discoverability I’m talking about is not about knowing what the API is, it’s finding out what the API should be.

100% agree that it’s about dependence on other teams. That said, I’d much rather that we were communicating across a well defined api boundary, rather than a graphql api. You could, of course, very easily do this with an api layer in the middle.

Nobody seems to get the idea of building software out of pieces with well defined APIs any more </rant>. I would say it's not possible to build large software without adhering to this principle but I seem to be proven wrong. You can build large poor quality software and just throw more people at it.

The other part about team dependence is very true, but it also shows a lack of knowledge/thinking/care by whoever formed the teams. It seemed for a while Amazon had things right, both in terms of team boundaries and in terms of forcing people to use APIs; not sure what they do these days.

This is one of the strengths of gRPC, it forces and centralizes the (mostly type safe) API design from the get-go.

Also tends to use a lot less bandwidth.
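Part of the bandwidth difference comes from gRPC usually using Protocol Buffers: field names live in the shared `.proto` definition, so the wire only carries field numbers, wire types and compactly encoded values. A hand-rolled sketch of that wire format for a hypothetical `User` message (not the protobuf library itself) compared with the equivalent JSON:

```python
import json

def encode_varint(value: int) -> bytes:
    """Base-128 varint, as used by the protobuf wire format (non-negative ints only here)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_varint_field(field_number: int, value: int) -> bytes:
    return encode_varint((field_number << 3) | 0) + encode_varint(value)  # wire type 0: varint

def encode_bytes_field(field_number: int, data: bytes) -> bytes:
    # wire type 2: length-delimited (strings, bytes, nested messages)
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data

# Hypothetical schema: message User { int32 id = 1; string name = 2; }
binary = encode_varint_field(1, 150) + encode_bytes_field(2, "Ada".encode())
as_json = json.dumps({"id": 150, "name": "Ada"}).encode()

print(binary.hex())               # 0896011203416461
print(len(binary), len(as_json))  # 8 26
```

The flip side is the point made above about API design: a reader cannot decode those 8 bytes without the schema, so both sides are forced to agree on it up front.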

The biggest issue with GRPC is that it is only suitable for stateful languages (iow, languages that can hold values and share them between requests). GRPC is basically worthless, even unusable, for stateless languages. These stateless languages also don’t work well for websockets either, so it is what it is.

Until they solve the stateful part, I’m not using it or recommending it to be used anywhere. Bandwidth is cheap, type safety is overrated, and compute is expensive.

What's a "stateful language"? Can you give an example here? gRPC is orthogonal to whether an API relies on state or not.

Is this bait? Lol

What is a well defined API to access a lot of related datasets if you have 100s of external users, using it for 10s of different types of use cases?

Compare it to a database, what if you couldn't use random queries with SQL, but only had the option to call stored procedures?

It's the narrowest abstraction fitting those use cases. A database by its nature is a generic component. So sure, the piece of your software that's the "SQL database" has a SQL interface, but pretty quickly you'd want some abstractions on top of that around the different uses of that database.

The problem is when genericity diffuses its way into a large system it becomes impossible to maintain. How do you refactor a code base when everything everywhere is just SQL queries? If you want to change the schema, how do you know you're not breaking anything? The short answer is you don't, and so the software becomes incredibly brittle. The common workaround is testing, but you can never test everything, and now your tests also become coupled to everything else, making things even more difficult to change.

The database in your example, while being generic, is already an abstraction of sorts. Now if you're building, let's say, Gmail, the external users should see "create email" and "get all emails" rather than issuing SQL queries to the database. That makes it easier to change the two pieces (client and server in this simplified example).
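Sketching the Gmail example with a hypothetical `Mailbox` class: callers only ever see `create_email` and `get_all_emails`, so the SQLite schema behind them (an assumption for illustration) can change without touching client code.

```python
import sqlite3

class Mailbox:
    """Narrow API: callers get create_email / get_all_emails, never raw SQL."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE emails (id INTEGER PRIMARY KEY, owner TEXT, subject TEXT, body TEXT)"
        )

    def create_email(self, owner: str, subject: str, body: str) -> int:
        cur = self._db.execute(
            "INSERT INTO emails (owner, subject, body) VALUES (?, ?, ?)",
            (owner, subject, body),
        )
        self._db.commit()
        return cur.lastrowid

    def get_all_emails(self, owner: str) -> list:
        rows = self._db.execute(
            "SELECT id, subject FROM emails WHERE owner = ? ORDER BY id", (owner,)
        )
        return [{"id": email_id, "subject": subject} for email_id, subject in rows]

box = Mailbox()
box.create_email("ada", "Hi", "first message")
box.create_email("bob", "Other", "not for ada")
print(box.get_all_emails("ada"))  # [{'id': 1, 'subject': 'Hi'}]
```

Splitting `emails` into separate tables later only means rewriting these two methods; every caller keeps working, which is exactly what you lose when callers write SQL directly.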

If that field isn’t populated aren’t you in the exact same spot?

Let's say you need to get a field back that is already in the database table, but that wasn't previously returned by the GraphQL endpoint. All you have to do on the front end is ask for it and GraphQL will populate it for you on the server.

If your GraphQL schema is just a mapping of database tables, in my experience you are in for a world of hurt in the future.

At my workplace they made this decision before I started and I can fully agree with this. It's essentially a typed REST without any of the benefits. No joins, everything is multiple calls away to perform a "full" query.

I don't even want to think about undoing this mess.

It's possible to do this without it getting painful, but you need to annotate the database schema with a lot of meta data.

We don't use GraphQL, but we do use an API that is mostly generated from meta data about the schema and permissions on a per field basis, with the ability to override on a per table basis.

To the API consumer it's invisible whether they're referring to something that refers directly to a real database column or to a method on a model class that doesn't correspond directly to the database (e.g. the user "password" attribute is

Effectively there are two schemas: the API schema and the database schema, it's just that the API schema is "prepopulated" from introspecting the database schema using Sequel (Ruby ORM), with the model classes translating between the two, with a synthesised default mapping.

The "API schema" includes more granular type information suitable for the frontend, and permissions, including type information returned to the frontend to provide default client side form rendering and validations (also enforced on the server, of course). It also auto-generates documentation pages with example data, inspired by Airtable's doc pages.

But key to avoiding what you describe is that these are all easily overridable defaults, and the permissions default to allowing no columns, so while the db schema introspection makes it quick to expose things where a direct mapping makes sense, it also makes it easy to avoid.

Unlike GraphQL we explicitly avoided allowing you to request arbitrary complex sets of data, but what you can do is expose queries by defining model metadata for them that either wraps suitable views or custom queries. We have a UI to allow us to define and store custom queries and views for e.g. reporting needs, so we can prototype new API needs in the browser writing just queries with some metadata annotation.

It gets us the flexibility of being able to quickly return exactly the desired data, while retaining control over the queries and complexity.
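A small sketch of the "introspect the schema, but default-deny every column" idea. The comment above used Sequel (Ruby); here SQLite's `PRAGMA table_info` stands in for the introspection, and the `READABLE` permission metadata and function names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT,
                    password_hash TEXT, reset_token TEXT);
INSERT INTO users (name, email, password_hash, reset_token)
VALUES ('Ada', 'ada@example.com', 'not-a-real-hash', 'not-a-real-token');
""")

# Hypothetical per-field permission metadata. Anything not listed is denied by default.
READABLE = {"users": {"id", "name", "email"}}

def introspect_columns(table):
    """Column names as reported by the database itself (row[1] of PRAGMA table_info)."""
    return [row[1] for row in db.execute(f"PRAGMA table_info({table})")]

def api_schema(table):
    """Default API schema: introspected columns filtered by permissions (default deny)."""
    if table not in READABLE:
        raise PermissionError(f"table not exposed: {table}")
    return [col for col in introspect_columns(table) if col in READABLE[table]]

def list_rows(table, fields):
    exposed = set(api_schema(table))  # rejects unknown tables before any SQL is built
    denied = [f for f in fields if f not in exposed]
    if not fields or denied:
        raise PermissionError(f"fields not exposed: {denied or 'none requested'}")
    sql = f"SELECT {', '.join(fields)} FROM {table}"
    return [dict(zip(fields, row)) for row in db.execute(sql)]

print(api_schema("users"))                    # ['id', 'name', 'email']
print(list_rows("users", ["name", "email"]))  # [{'name': 'Ada', 'email': 'ada@example.com'}]
```

A newly added column, such as another token, stays invisible until someone explicitly grants it, which is the opposite of the "expose the whole table" failure mode discussed further down.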

A world where the front end can access any database field it wants sounds like a security / privacy nightmare to me.

Of course there are ways to prevent data from being returned but that’s fragile.

This isn’t remotely a problem. Field-by-field granular security is trivial to implement in GraphQL.

I have to disagree with you there. It is possible, but it causes other annoying problems.

For example, field-level security pretty much means every field could be null at any time. Depending on your graphql server implementation, this might cause an entire request to fail rather than just that field to be omitted, unless you change your schema to where everything is nullable.

Checking every field can also easily lead to performance issues, because it’s not uncommon for a large, complex graphql request to have hundreds or thousands of fields (particularly when displaying lists of data to a user).
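A simplified model of the nullability problem above, following the GraphQL spec's rule that a null (or error) in a non-null field propagates to the nearest nullable parent. This is a toy executor with a hypothetical schema and access policy, not a real GraphQL server.

```python
class FieldError(Exception):
    """A field resolved to null (here: access denied) and may need to propagate."""

def resolve(obj, selection, schema, can_read, errors):
    # selection: {field: sub_selection or None}; schema: {field: (nullable, sub_schema or None)}
    result = {}
    for name, sub_selection in selection.items():
        nullable, sub_schema = schema[name]
        try:
            if not can_read(name):
                raise FieldError(f"not authorized: {name}")
            value = obj.get(name)
            if value is not None and sub_selection is not None:
                value = resolve(value, sub_selection, sub_schema, can_read, errors)
            if value is None and not nullable:
                raise FieldError(f"null for non-null field: {name}")
        except FieldError as exc:
            if not nullable:
                raise                 # non-null field: the null propagates to the parent
            errors.append(str(exc))   # nullable field: only this field becomes null
            value = None
        result[name] = value
    return result

def execute(root, selection, schema, can_read):
    errors = []
    try:
        data = resolve(root, selection, schema, can_read, errors)
    except FieldError as exc:
        errors.append(str(exc))
        data = None
    return {"data": data, "errors": errors}

def can_read(field_name):
    return field_name != "email"  # hypothetical policy: email is restricted

ROOT = {"viewer": {"name": "Ada", "email": "ada@example.com"}}
QUERY = {"viewer": {"name": None, "email": None}}
strict = {"viewer": (True, {"name": (False, None), "email": (False, None)})}
relaxed = {"viewer": (True, {"name": (False, None), "email": (True, None)})}

print(execute(ROOT, QUERY, strict, can_read))   # viewer is wiped out entirely
print(execute(ROOT, QUERY, relaxed, can_read))  # only email is null
```

With `email` declared non-null, one denied field nulls out the whole `viewer` object; making it nullable keeps `name`, which is why field-level security tends to push schemas toward "everything nullable".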

Not to mention GraphQL wasn’t designed with security and user-state in mind. It was an afterthought that was bolted on, varying from framework implementation to implementation.

How the heck was that not on their minds from day 1? It's the most obvious question to ask about a project like that.

It’s from before the https-everywhere days, or around the same time letsencrypt was started up, IIRC. Back then, I feel like security wasn’t as big of an issue, at least for less sensitive things. Like literally the entire site would be http until you got to checkout and the only reason you had the certs was to be PCI compliant.

GraphQL is mostly concerned with the query semantics.

A proper solution to security/privacy issues should have sensitive data never reach the outermost GraphQL layer.

So the problem is with the existing tooling that enables GraphQL implementations, but like anything else, if that tooling is deficient, the entire approach is on the shaky ground too.

> unless you change your schema to where everything is nullable

At my current job, this was done before I was involved. It isn’t a deal breaker, but it throws away one of the best features of GraphQL.

In the end you just have every client implement the rules that should have been in an API tier (if they are competent), or worse no validation that gets you a giant mess.

How does GraphQL help here?

Assuming the backend actually supports mapping of that particular field.

For external users of the API this can be quite helpful when you’re looking for the password column on the users table.

For some reason I don’t think graphql actually works this way. Can’t quite put my finger on why allowing access to any column on a table might be a really bad idea.

Putting passwords in a database, and that database behind some kind of service that allows queries, is a stupid mistake that can be implemented with SOAP, CORBA, a remote shell, or any other protocol or API style.

I don't think GraphQL makes the problem worse, except by encouraging experimentation by putting an unusually powerful query language in the hands of the users.

Ancestors of your post are suggesting exposing entire DB schemas (I would assume mechanically). While that could also be the case with other protocols, typically an IDL is used to define the API layer separately. Of course, it’s completely possible to generate a WSDL, etc. from a DB schema, but in practice I’ve never seen it done.

My point is that passwords shouldn't be stored in the "normal" database where some clever architect might expose entire DB schemas to external access.

If clever architects manage to expose the carefully segregated database of the small and secure authentication module, they cannot claim it was an accident or someone else's fault.

>Can’t quite put my finger on why allowing access to any column on a table might be a really bad idea.


Why not just query the database directly then?

+1 This is it. Great for internal API's, not so much for public facing ones.
