
The GitHub GraphQL API - samber
http://githubengineering.com/the-github-graphql-api/
======
niftich
Neatly coinciding with GraphQL's announcement 'Leaving technical preview' [1].

In hindsight, sending a query, written in a query language from client to
server seems obvious. So obvious, that I think I've seen it before...

    select login, bio, location, isBountyHunter
    from viewer
    where user = ?

It's ironic to me that it took Facebook reinventing SQL (or a graph database
equivalent thereof [2][3][4]) and Github embracing it to legitimize this
practice, since if you were doing this before, you were judged in the eyes of
your peers and clients for not being "RESTful" (the fake-REST kind [5]), as if
everyone was just itching to PUT and DELETE blobs of JSON of your poorly
mapped resources to quasi-hardcoded, templated [6][7] URLs.

What's old is new again, but this time I'll take it.
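For comparison, the rough GraphQL equivalent of that pseudo-SQL (reusing the same field names, which do exist on GitHub's `viewer`):

```graphql
query {
  viewer {
    login
    bio
    location
    isBountyHunter
  }
}
```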

[1] [http://graphql.org/blog/production-ready/](http://graphql.org/blog/production-ready/)

[2] [https://neo4j.com/developer/cypher-query-language/#_about_cypher](https://neo4j.com/developer/cypher-query-language/#_about_cypher)

[3] [http://tinkerpop.apache.org/](http://tinkerpop.apache.org/)

[4] [https://www.w3.org/TR/sparql11-query/](https://www.w3.org/TR/sparql11-query/)

[5] [https://news.ycombinator.com/item?id=12479370#12480408](https://news.ycombinator.com/item?id=12479370#12480408)

[6] [http://swagger.io/specification/#pathTemplating](http://swagger.io/specification/#pathTemplating)

[7] [http://raml.org/developers/raml-200-tutorial#parameters](http://raml.org/developers/raml-200-tutorial#parameters)

~~~
esfandia
Also reminds me of CMIP and GDMO in old ITU network management standards in
the mid 90s. They were RESTful, object-oriented, and you could make some
pretty expressive queries with them. The standards failed probably because
they were way ahead of their time: too many new concepts, too complicated
compared to SNMP, and the documents were a very boring read.

~~~
fowl2
Or CAML... _hides_

------
brblck
Special shout out to the open-source contributors and members of the community
who helped us build this:

[https://github.com/rmosolgo/graphql-ruby](https://github.com/rmosolgo/graphql-ruby)

[https://github.com/shopify/graphql-batch](https://github.com/shopify/graphql-batch)

[https://github.com/github/graphql-client](https://github.com/github/graphql-client)

[https://github.com/graphql](https://github.com/graphql)

We <3 your work and are thrilled to have built this with you!

Please make sure to give us feedback during this alpha stage!
[https://platform.github.community/](https://platform.github.community/)

~~~
mcx
Are you guys using rails w/ graphql-ruby? Would love to see a blog post with
more details about the backend implementation!

~~~
mst
The post links [https://github.com/github/github-graphql-rails-example](https://github.com/github/github-graphql-rails-example) \- "a small app built with Rails that demonstrates how you might interact with our GraphQL schema."

~~~
jaredsohn
This just shows that they've made use of a graphql Rails client; the GP asked
about the backend which could be in a different language.

------
salex89
I'm impressed, but for other reasons. For one, I have no idea how to properly implement this. I mean, it really looks like a lot of trouble mapping this from GraphQL to... SQL? And what if the system is using some kind of NoSQL database which doesn't really have a very expressive query language, if any? Complexity just seems to explode. Somehow I feel there is also a risk of the client making a quite sub-optimal query, so probably some kind of policy should be implemented. All in all, there is a level of manageability that looks lost to me if GraphQL is implemented improperly, and to be honest, it looks like it is easy to get wrong. I'm really looking forward to some book or guide, since the implementation is puzzling to me.

~~~
postila
Has anybody considered this problem at all? (Giving too much flexibility to the client and allowing non-optimal queries, like joining several big tables or data collections without proper index support.) It's so weird that all the materials I've seen about GraphQL gloss over this question, which is essential for the future of this technology.

And it's so similar to the ORM issues the whole industry has experienced over the past 20 years. But perhaps more dangerous due to the public nature of many APIs.

~~~
orasis
Instead of thinking of it as parsing queries, think about it as nested RPC.
It's quite reasonable for an implementor to set a time limit or call limit to
keep algorithmic complexity attacks under control.
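As one illustration, a depth cap on the nested-RPC view is just a recursive walk over the query. A toy Python sketch, modeling a query as nested dicts (the field names and limit are invented, not from any real server):

```python
# Toy depth limit for a nested query, modeled as nested dicts.
# A real server would walk the parsed query AST instead.

MAX_DEPTH = 5  # invented limit for illustration

def query_depth(selection):
    """Nesting depth of a query selection (dict of sub-selections)."""
    if not isinstance(selection, dict) or not selection:
        return 0
    return 1 + max(query_depth(sub) for sub in selection.values())

def check_depth(selection, limit=MAX_DEPTH):
    """Reject queries nested deeper than `limit` before resolving them."""
    depth = query_depth(selection)
    if depth > limit:
        raise ValueError(f"query depth {depth} exceeds limit {limit}")
    return depth

# Something like { viewer { repositories { issues { title } } } }
query = {"viewer": {"repositories": {"issues": {"title": None}}}}
```

A call-count or wall-clock budget per request would work the same way: measure during resolution and abort once the budget is exceeded.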

~~~
postila
OK, but a nested loop loses to other join algorithms (like merge join or hash join) in so many situations when you deal with large datasets, right? So again, it will be inefficient by default in many cases.

~~~
JaRail
If you're talking about GraphQL, the implementation is undefined by default. It's just a query language spec. There isn't a "GraphQL Server" product. You can resolve data from many sources: a NoSQL database, a SQL-backed Hadoop cluster, etc. It's very much just the language the client talks to the server in.

If the client requests exactly what it needs, that shouldn't be more stressful
on the server-side than spamming REST requests for all the same resources.
Plus, it's easier to optimize when you know what the client wants. If there's
something expensive, you could, for example, cache/index something extra. If
the client were doing it themselves with a series of REST calls, you wouldn't
be able to understand the real use-case. Even if you did know what aggregation
they really needed, you wouldn't be able to fix the problem without updates to
both the service and the clients.

Either way, it's easier to set sane limits than craft un-DOSable APIs. There
is always a cost to satisfying queries. If you're trying to run a free
service, it's a much bigger concern. If you're paying the bill, you're
incentivized to investigate expensive/slow calls.

~~~
salex89
Disclaimer: I'm just talking from a REST developer perspective.

The nice thing about REST calls in their current form is that they are just that: calls. With proper monitoring you can see which ones you get more or less of, and with which parameters. They can be optimized as well as possible, but separately. You are right, it takes more analytics to figure out a series of calls (based on some token?) and maybe bundle them up by introducing a new endpoint (thus not breaking old clients).

But yet again, that is that one "query". With GraphQL it could be anything,
and that's what bugs me. I find it challenging, in a good way.

Another thing I'm not sure about is the queries themselves, or rather, the number of different ways you can write a query. Multiple users can request the same data, or almost the same, with queries written in different ways. Backend developers should then guarantee that those queries will be executed in a similar way, with predictable performance, much like SQL query optimizers do. I had the "joy" of working with a database that had hugely different performance with just trivial changes in the query (it was not relational; actually it is discontinued now, thankfully). It was a huge PITA. I wouldn't like to serve an API like that.

------
aturek
I'd love to hear how Github is doing ACL here. We came up with a pretty neat
solution on my team, which we have not yet open-sourced, for JS. But it was a
lot of first-principles design work; there don't seem to be any good examples.

This was pretty much all the documentation we had, and it's more a design analysis of edge-vs-node authorization: [https://medium.com/apollo-stack/auth-in-graphql-part-2-c6441bcc4302#.he0radbju](https://medium.com/apollo-stack/auth-in-graphql-part-2-c6441bcc4302#.he0radbju)

Edit: Our eventual solution looked a lot like

    class SomeTypeOfResolver {
      @allowIfAny(rule1, rule2, rule3)
      someProperty;

      @allowIfAll(rule4, rule5)
      otherProperty = defineRetrieverFunction();
    }

------
Kwastie
Since the announcement of GraphQL I've been waiting for some 'real world' APIs. (Sure, the Star Wars GraphQL APIs are fun.)

Does anyone know any best practices for adopting this in an existing application using a relational database (e.g. PostgreSQL)? I don't know how to implement this without causing N+1 queries (or worse).

For example:

    {
      Post {
        title,
        content,
        Author {
          name,
          avatar,
        },
        Comments(first: 10) {
          ..
        }
      }
    }

A naive implementation would cause a lot of queries: one query for each "edge".

~~~
brblck
Yup. That's something we can handle internally, under the hood, by batching database requests into a single query for all edges. This problem is actually _easier_ to solve in GraphQL than it is with a traditional REST API.
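The usual shape of that batching (the approach libraries like graphql-batch and DataLoader take) is to queue up keys during resolution and satisfy them all with one fetch. A toy Python sketch, with an in-memory dict standing in for the database:

```python
# DataLoader-style batching sketch: resolvers enqueue keys and receive
# thunks; one batched "query" later satisfies all of them (no N+1).

AUTHORS = {1: "ada", 2: "grace", 3: "alan"}  # stand-in for a database table

class AuthorLoader:
    def __init__(self):
        self.pending = []     # keys queued by field resolvers
        self.batch_calls = 0  # how many "database queries" we issued

    def load(self, author_id):
        """Queue a key; returns a thunk valid after dispatch()."""
        self.pending.append(author_id)
        return lambda: self.cache[author_id]

    def dispatch(self):
        """One fetch for all queued keys instead of one per key."""
        self.batch_calls += 1
        self.cache = {k: AUTHORS[k] for k in set(self.pending)}

loader = AuthorLoader()
thunks = [loader.load(i) for i in (1, 2, 2, 3)]  # four field resolutions
loader.dispatch()
names = [t() for t in thunks]
```

Four author lookups collapse into a single batched fetch; a SQL backend would issue one `WHERE id IN (...)` query here.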

~~~
arianvanp
Exactly, now you put the onus of solving N+1 on the server side, whereas in the old case the client would do N+1 requests...

------
bonaldi
Fascinating, this. But it glosses a bit over the cost of generating bespoke responses to every request. I wonder how it works when expensive queries are implied in the request. You also need smarter caching, I imagine.

~~~
brblck
This definitely enables a lot of opportunity to do both smarter querying and
smarter caching on the back-end.

While you can indeed perform larger, more complex requests, GraphQL by nature
forces queries to explicitly ask for everything you want to get back. As a
result, we're not wasting any capacity giving you back a bunch of data for an
entire object that you don't need like we would in a normal REST API request.

The thing that I'm most excited about with all this is the fact that we're
building new GitHub features internally on GraphQL as well. This means that
unlike a traditional REST API, there will no longer be any lag time between
features in GitHub and the GitHub API.

API is a first-class product now. API consumers get features as soon as
everyone else!

Please make sure to give us feedback during this alpha stage!
[https://platform.github.community/](https://platform.github.community/)

~~~
postila
There are a lot of cases where the user needs just 20 rows out of billions, but getting them is a very hard problem. One well-known example is the Twitter-like data model, where a user can "follow" thousands of others and you need to fetch the top N recent posts across all those thousands of streams.

In Postgres, the straightforward approach to querying such data is based on JOINs, and it's absolutely inefficient. It can be dramatically optimized with recursive CTEs, arrays, and a loose index scan approach, but a GraphQL backend will do straightforward JOINs by default, right?
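For what it's worth, the "top N recent posts across followed streams" query can avoid the giant join-then-sort in Postgres with a `LATERAL` subquery, which lets each followed stream use its own index scan. A sketch, with invented table and column names:

```sql
-- Top 20 recent posts across everyone a user follows: one small
-- indexed top-N scan per followed stream, assuming an index on
-- posts (author_id, created_at).
SELECT p.*
FROM follows f
CROSS JOIN LATERAL (
    SELECT *
    FROM posts
    WHERE posts.author_id = f.followee_id
    ORDER BY posts.created_at DESC
    LIMIT 20
) p
WHERE f.follower_id = $1
ORDER BY p.created_at DESC
LIMIT 20;
```

Whether a GraphQL resolver emits something like this or a naive JOIN is entirely up to the server implementor, which is rather the point of the thread.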

I hope GraphQL has (or will soon have) ways to override/redefine queries, but again, that leads us to the same "patch driven development" problems that everyone has hated in ORMs for decades. That's why I'm saying that GraphQL is "a new ORM", but more dangerous due to its openness and proximity to web users; it can bring even more dev and devops pain to the world than ORMs have over the decades.

------
avitzurel
I've been looking into implementing something like this @ Gogobot as well.
This eliminates the need for all `/v/` type API versioning. The client
requests what it needs for this request.

Experimenting with this we often saw 50-70% reduction in the payload being
sent to the clients in some requests. If I only need the first, last and
avatar from the User object there's no need for my response payload to suffer
because other requests need 30 fields from the same object.

Implementing this without causing a lot of N+1 queries is the tricky part and
that's where we're currently investing most of our time.

Awesome to see Github adopting this and releasing it to the public API.

~~~
brblck
In _theory_ a GraphQL API can operate version-less, utilizing things like deprecation notices and field aliasing to smooth over any rough edges. Once we see calls on a certain thing reach zero and sustain that level, we can actually remove it and never have to bump a version anywhere.

That's the dream. We'll see how reality plays out.

For reference, we actually launched with some deprecated fields (see
"databaseId" on the "Issue" type -- database IDs will be phased out for global
relay IDs eventually) if you want to see what they look like.
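In schema-definition terms, that kind of field-level deprecation is GraphQL's built-in `@deprecated` directive. A sketch against the `Issue` type mentioned above (the reason string is mine, not GitHub's):

```graphql
type Issue {
  id: ID!
  databaseId: Int @deprecated(reason: "Use the global Relay `id` instead.")
}
```

Introspecting clients can then surface the deprecation to developers while the field keeps working.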

~~~
leebyron
That theory is Facebook's practice. Four years later and we're still on GraphQL API version 1. We add things, which is safe; deprecate fields we want to remove; and delete them when hit rates drop to 0.

Granted, our clients are all Facebook engineers, so we have some pull in
helping the migration away from deprecated fields, and GitHub will have to
find the right process which works for a broader set of API consumers, but not
only is this theory a good one, it's considered GraphQL best practice.

------
vcarl
Awesome to see GraphQL get some mainstream adoption, hopefully this leads to
some more community tools for consuming it :) Relay is an awesome concept, but
the learning curve is pretty steep.

~~~
dan_ahmadi
Check out these other API consumption clients:
[http://graphql.org/code/#graphql-clients](http://graphql.org/code/#graphql-clients)

------
tiles
Is this then the GitHub v4 API? Should we expect the REST API to be deprecated
in the future?

------
dan_ahmadi
GraphQL increases speed: "Using GraphQL on the frontend and backend eliminates
the gap between what we release and what you can consume"

------
repole
I'm intrigued by GraphQL, but I don't understand what separates it from
passing "fields" and "embeds" parameters in a REST API. I don't see what about
it would be inherently easier to implement either.

In my free time I've sparingly been working on a project that does exactly this with a REST API [1]. It's in an entirely unfinished state, but the linked documentation is a decent example of the types of queries possible.

[1]
[http://drowsy.readthedocs.io/en/latest/querying.html](http://drowsy.readthedocs.io/en/latest/querying.html)

~~~
niftich
For me, it's easiest to pretend that GraphQL is a DSL (a domain-specific
language) that offers special syntax to make it easier to implement certain
things.

\- You can pretend that each GraphQL query is a JSON object (which, in fact, it actually is).

\- You can pretend that each GraphQL schema you declare is actually a JSON-Schema document, which some people use to specify, in a machine-readable way, what your API's inputs and outputs will look like.

\- You can pretend that each GraphQL resolver, which is the piece of code you have to write (on the server) to actually dig up the result of a query, is a function that parses your incoming JSON, validates it against your JSON-Schema, and then reaches out to your datastore to produce a result. You'd then have to construct another JSON document which matches your response schema, stuff the data in it, and return that to the user. Except that in GraphQL, you only have to supply the resolver; the rest is handled by the framework.
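To make the resolver idea concrete, here's a toy sketch in pure Python (no GraphQL library; the field names and data are invented). The "framework" part walks the requested fields, validates them, and assembles the JSON-shaped response; you only supply the tiny per-field resolvers:

```python
# Toy resolver dispatch: one small function per field, plus a generic
# executor that builds the response shape from the requested fields.

RESOLVERS = {
    "login":    lambda user: user["login"],
    "bio":      lambda user: user["bio"],
    "location": lambda user: user["location"],
}

def execute(requested_fields, user):
    """The 'framework': validate the request, call resolvers, shape the reply."""
    unknown = [f for f in requested_fields if f not in RESOLVERS]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    return {f: RESOLVERS[f](user) for f in requested_fields}

viewer = {"login": "octocat", "bio": "hi", "location": "SF"}
result = execute(["login", "location"], viewer)
```

A real GraphQL server does the same thing recursively over a typed schema, but the division of labor (you write resolvers, it writes responses) is the same.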

You can of course do this by hand and many people do (most obviously when you
see APIs that include arguments like "operator=eq" or "limit=100" or
"page=25"), but GraphQL gives you the tools to do this with less effort, and
end up with a cleaner API by passing everything in the query body. And the
GraphQL server saves you from having to manually build up the JSON text of
every single response.

Reading through your docs, I've seen this style of API in enterprise settings where there was a backing relational database and the designers were basically trying to expose the underlying database through HTTP. It can get the job done, but GraphQL gives you nicer abstractions, a cleaner way of passing parameters, and conveniences like a real type system (known on both the client and server side), and you only have to supply your resolver function implementations.

~~~
repole
I totally see the appeal on the surface level of a nicer/cleaner looking query
language and API.

Still, pretty much everything else you mentioned seems doable with REST. I'm
using Marshmallow schemas in Python which seem to act in a similar way on a
field by field basis as a GraphQL resolver does. I'm not sure what exactly I'd
be getting outside of a slightly nicer/cleaner abstraction by moving to
GraphQL, but maybe that's enough?

------
honzajde
GraphQL API Explorer wants this permission:

"Public and private: This application will be able to read and write all public and private repository data. This includes the following: Code, Issues, Pull requests, Wikis, Settings, Webhooks and services, Deploy keys."

Why, oh why?

~~~
helfer
Because GitHub's GraphQL API will let you do all of those things via the
GraphQL API Explorer ;-)

------
wehadfun
Slightly off topic but GraphQL vs Odata?

~~~
robzhu
They're similar; both technologies allow clients to specify the data they
need. I would say that GraphQL is more flexible and has a strong type system.
For example, the filter semantics are defined within OData, while in GraphQL,
you define how your data can be filtered within your schema.

As a personal opinion, I also feel OData exposes an API that is too tightly
coupled to the persistence layer. GraphQL objects and properties are all
backed by arbitrary "resolver" functions, which means you can stitch together
multiple/legacy backends to generate your response.

------
wehadfun
What method do you use to get data? You can't put a body in GET requests, so I assume with GraphQL you use POST to get data?

~~~
andyfleming
They use POST ([https://github.com/github/graphql-client/blob/9e1fa16cf88de450483f79f89f2be19fb0e09b4b/lib/graphql/client/http.rb#L61](https://github.com/github/graphql-client/blob/9e1fa16cf88de450483f79f89f2be19fb0e09b4b/lib/graphql/client/http.rb#L61)), which, as far as I know, is common for query APIs like this.
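For reference, a typical GraphQL-over-HTTP POST wraps the query in a small JSON envelope (the token placeholder here is illustrative):

```
POST /graphql HTTP/1.1
Host: api.github.com
Authorization: bearer <token>
Content-Type: application/json

{ "query": "query { viewer { login } }" }
```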

~~~
robzhu
You can also send a GraphQL query as part of the query string in an HTTP GET request: [http://graphql.org/learn/serving-over-http/#get-request](http://graphql.org/learn/serving-over-http/#get-request).

