How to GraphQL with Ruby, Rails, Active Record, and No N+1 (evilmartians.com)
171 points by progapandist on Nov 9, 2020 | 84 comments



GraphQL is so much heavier, code-wise, than REST for Rails apps that it feels like a bad fit to me. Notice that these examples have created a completely parallel schema - new classes for every object in the GraphQL graph, which are analogous but separate from the Rails models, each one acting like a combination of controller and presenter for its Rails model, and breaking the ORM encapsulation to write efficient SQL queries.

I think if GraphQL had existed in the early days of Rails, the natural assumption would have been that ActiveRecord model classes should double as GraphQL result objects, and there would be a nice DSL for specifying how to safely expose those objects to the API. But I haven't seen anyone try to build that - maybe the feeling is that Rails is _complete_, so new responsibilities need to live somewhere else.
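To make the idea concrete, here's a hypothetical sketch of such a DSL (all names invented for illustration, not any real gem's API): a mixin that lets a model class whitelist which attributes may double as API result fields.

```ruby
# Hypothetical sketch: a mixin that lets a model class double as an API
# result object by whitelisting which attributes may be exposed.
module ApiExposable
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def exposes(*fields)
      @exposed_fields = fields
    end

    def exposed_fields
      @exposed_fields || []
    end
  end

  # Serialize only the whitelisted attributes.
  def to_api
    self.class.exposed_fields.to_h { |f| [f, public_send(f)] }
  end
end

# A plain Ruby class standing in for an ActiveRecord model.
class User
  include ApiExposable
  exposes :id, :name # :password_digest stays hidden

  attr_reader :id, :name, :password_digest

  def initialize(id:, name:, password_digest:)
    @id = id
    @name = name
    @password_digest = password_digest
  end
end

user = User.new(id: 1, name: "ada", password_digest: "secret")
user.to_api # => {:id=>1, :name=>"ada"}
```

The point being that the whitelist lives on the model itself rather than in a parallel class hierarchy.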


The problem is GraphQL, fundamentally.

GraphQL creates the illusion for consumers that arbitrary queries are cheap, possible, and transparent. None of those things is true. Whether that reality is exposed to the client through more restricted REST endpoints or the backend has to support the fiction by absorbing all the performance considerations itself, the fact remains that some queries are cheap while others are near impossible to serve efficiently.

So GraphQL doesn't really _reduce_ complexity; it merely pushes it into other places. Whether the right place is the client or the server depends on your organization and the acceptable engineering tradeoffs, but there's no free lunch when it comes to data locality.


GraphQL, from what I see, is a solution designed by big tech that is still best suited to big tech. Facebook had their reasons for coming up with it, because their user base numbers in the billions.

"But my mobile app makes too many requests!" says the hopeful engineer, "and graphql means we can optimise the network!"

Well, yeah, it probably looks like that when you see one nice request from the browser dev-tools, rather than dozens, but that's not what it will look like on the server side, or in the database, when you can get into situations with quadratic queries, never mind N+1. And yes, I've seen that with my own eyes.

If you're sending a dozen or so requests to the server to render a screen in a mobile app, then the slowest thing might just be the stage where you spin up a connection and do the old TLS dance. Unless you're hitting Big Tech scale that's probably acceptable versus the insane amount of money you'll dump into building and maintaining a GraphQL-based architecture without any data proving that this approach would actually be an optimisation.

Just like Kafka, Kubernetes, etc. which have come out of incredibly large-scale projects and businesses...you don't have to and shouldn't use this stuff just because it's the popular thing to do. Startups and small businesses became successful without any of this.


The problem I've always encountered with this (which I 100% agree with), is that I've never met anyone in a decision making position who's cared or been open to discussing it. Most places seem to just pick something because they heard it's used by Facebook or Google, and they think cargo culting those companies might make them successful too.

Usually, any disagreement is seen as either contrarian or ignorant, and you end up building an over-engineered monstrosity because that's what they want.


> The problem I've always encountered with this (which I 100% agree with), is that I've never met anyone in a decision making position who's cared or been open to discussing it.

As someone who is nominally a decision maker (in that my job title has "architect" in it), I have the opposite experience, I guess: most of the pushes for GraphQL have come from devs desperate to have a magic bullet, rather than to rework an overly-granular API.

(Of course there are also the folks who want a tick on their CV but that's a whole other problem)


I sort of disagree. The big win from GraphQL (vs REST) is flexibility: You don't have to define all your endpoints up front. The client(s) can make the sorts of queries they want.

If you're building a single page app frontend where most of the business logic is on the client side, it makes sense to empower the client to define the types of queries it wants to make.

If anything, running a GraphQL architecture seems like less maintenance burden to me, because it makes the server side API less rigid, so the coupling between client and server becomes looser.


If you control the back and front end, I don't buy into this argument. You don't have to define all your REST endpoints up front, you can add them as needed.

If you're serving a public API and you really don't know up front what endpoints people will need, or like facebook you have tons of people with different needs, then maybe GraphQL makes sense.


When you have separate backend and frontend teams, Conway's Law starts to rear its ugly head. GraphQL can help smooth over some of that organizational friction and maybe keep the frontend team from making N+1 requests because they gotta get this feature out by the end of the sprint and backend's busy with its stuff. They tell themselves, "It'll be fine. It performs acceptably on my machine with my company internet connection."


And how often do you need that sort of decoupling at all?

Besides that, I disagree in principle because, like the OP said, it's just hiding the maintenance burden elsewhere. It doesn't go away.


The problem is the server has a connection to the database that’s infinitely more reliable than the client -> server.

I’ve dealt with garbage REST APIs with tons of sequential calls and in low reliability scenarios 1 call is a massive improvement over a scenario where 4-5 sequential calls need to be made and if/when one of them fails you need to start over.

Now the client has to handle result caching and all the different error states because some server dev couldn't be bothered to batch a few calls and return all the required models.


At my company, we benefit from loose coupling between components every time we'd like to release an update to one without a concomitant deploy to some other service.


I think there are some similarities between a "GraphQL vs REST" tradeoff and a "SQL vs NoSQL" one.

As a rule of thumb I would use a GraphQL API if I (as a server developer) don't know my clients and their API access patterns and do not want them to assemble what they want on their own in n+1 requests, particularly if they are somewhere in the internet and would suffer from network latency in each request. Likewise, I (as a database developer) would prefer a SQL database if its clients (e.g. independently developed inhouse applications) and their access patterns are unknown.

Conversely, as, e.g., a backend microservice developer, you mostly know the access patterns between a database and its clients, mostly because there is just one database client per database and both constitute the microservice. So you are safe to choose a NoSQL database here with, say, a document-oriented data model that is optimized for that single database client. With regard to the API of that microservice, you also know your clients (i.e. other microservices), so you can also optimize a RESTful (or even gRPC) API for them in order to avoid the over-/underfetching/n+1 problem. Even if you cannot optimize your internal RESTful API to satisfy each of your clients' requests in a single response, network latency is not such a big problem here because clients of a back-end microservice are often somewhat close to it.


Additionally, most real-world applications fall into very predictable patterns which can be encapsulated by REST API's with hand-tweaked SQL access patterns for speed. You don't generally need arbitrary query patterns from the client, and Rails/ActiveRecord already has tons of problems minimizing SQL calls. If you've built a decent backend API framework, making a new route should be quick, it's just access/auth boilerplate plus whatever retrieval is the most efficient.


I'm not really seeing the GraphQL code he presented as fundamentally different than serializers. I've used a similar pattern at several companies, and one of the biggest pains is making sure your lookups stay in sync with the serializers to avoid N+1.

This is simply recognizing the fact that many serialization patterns do not work in a black box. They inherently rely on coordination with SQL queries to load necessary data.


Schema duplication also exists in most of the popular python implementations I’ve seen although I don’t do much backend right now I might be mistaken.


At my last job, we just dynamically generated the GraphQL schema from our Rails models and our permission system. Once we had that in place we just developed as usual.
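The commenter's code isn't public, but a toy sketch of the approach (all names invented for illustration) might look like deriving the exposed field list from model column metadata filtered by a permission list:

```ruby
# Hypothetical sketch: generate GraphQL-ish field definitions from model
# column metadata plus a permission system, instead of hand-writing types.
MODEL_COLUMNS = { "User" => { id: "Int", name: "String", ssn: "String" } }
PERMITTED     = { "User" => [:id, :name] } # permission system's whitelist

def graphql_fields_for(model)
  # Only columns the permission system allows become API fields.
  MODEL_COLUMNS.fetch(model).select { |col, _type| PERMITTED.fetch(model).include?(col) }
end

graphql_fields_for("User") # => {:id=>"Int", :name=>"String"}
```

In a real Rails app the column metadata would come from `Model.columns_hash` rather than a literal hash.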


Any gem for that?


Unfortunately, I've switched jobs and they own the code I wrote. I could probably write something from scratch, using the same principles and be okay :thinking_face:


> have created a completely parallel schema - new classes for every object in the GraphQL graph, which are analogous but separate from the Rails models, each one acting like a combination of controller and presenter

And IMO this is good. Now you have a dedicated place (namely the `app/graphql` folder) for API-specific data mapping and validations. And in general GraphQL-Ruby gently pushes you to decouple your API from your database schema, so you will be able to refactor and optimize things under your application's hood later.


> the natural assumption would have been that ActiveRecord model classes should double as GraphQL result objects, and there would be a nice DSL for specifying how to safely expose those objects to the API.

GraphQL-Ruby Types _are_ that DSL. You specify which fields can be exposed to the world, and this is also the place where you can hide too-low-level specifics (e.g. join separate `amount` and `price` columns into a specific `Money` type). Then you pass an AR object to GraphQL-Ruby and it does what you said in the DSL.


In my experience, automatic eager loading is the most elegant solution to this problem. For ActiveRecord, I'd recommend Goldiloader [1] (a gem similar to ar_lazy_preload but with an emphasis on doing this automatically), while for Sequel, you can enable the TacticalEagerLoading plugin [2].

These tools don't require explicit `includes` or `preload` calls. They can infer the tables that need to be eager loaded through usage (e.g. if you iterate through users and every iteration requests the user's posts, the first iteration will eager load all posts for all those users). This fits really well with GraphQL because different clients might trigger requests for different associations. Automatic eager loading allows you to optimize for all those different cases without writing a single line of code.
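A toy sketch of that inference (not Goldiloader's actual implementation; all data hardcoded for illustration): records loaded together share a batch, and the first association access loads that association for every record in the batch, turning N+1 queries into 2.

```ruby
# Records created by the same query share a Batch; the first `posts` access
# eager loads posts for ALL users in the batch at once.
QUERY_LOG = []
POSTS_BY_USER = { 1 => ["Post A", "Post B"], 2 => ["Post C"] }

class Batch
  attr_reader :records
  def initialize
    @records = []
  end
end

class User
  attr_reader :id

  def initialize(id, batch)
    @id = id
    @batch = batch
    batch.records << self
  end

  def posts
    unless defined?(@posts)
      ids = @batch.records.map(&:id)
      # One IN query for the whole batch instead of one query per user.
      QUERY_LOG << "SELECT * FROM posts WHERE user_id IN (#{ids.join(', ')})"
      @batch.records.each do |u|
        u.instance_variable_set(:@posts, POSTS_BY_USER.fetch(u.id, []))
      end
    end
    @posts
  end
end

batch = Batch.new
users = [User.new(1, batch), User.new(2, batch)]
all_posts = users.flat_map(&:posts)
QUERY_LOG.size # => 1, not one query per user
```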

The other solutions are either way too explicit or might be eager loading too much. Things like graphql-batch don't even feel like ActiveRecord any more (and for a good reason - it's a different kind of abstraction, meant to batch all sorts of things, including HTTP requests).

There are more caveats when it comes to eager loading, mainly there are many things that can break eager loading. I recently wrote a blog post [3] about dealing with those issues.

[1] https://github.com/salsify/goldiloader

[2] https://sequel.jeremyevans.net/rdoc-plugins/classes/Sequel/P...

[3] https://lipanski.com/posts/activerecord-eager-loading


Great post! By the way, the reason ar_lazy_preload exists is that neither I nor anyone else around me had heard of goldiloader at the time ar_lazy_preload was created.


Came here to also recommend Goldiloader, but your comment and especially your blogpost summarizes it just fine.

We're using Goldiloader in a medium-sized Rails application with zero problems so far.


I can speak from experience having built and supported a complex commercial React + Rails GraphQL app over the past year or so. The Evil Martians blog posts were really helpful in getting started.

I have to say though, it's extremely un-railsy and the N+1 problem is hard to solve without creating heaps of extra loader types. The overarching issue is that from the front end perspective all queries are equal, where in fact some are much more expensive than others. I end up spending most of my time optimising the query resolvers to perform well for the set of queries the front end actually makes as it's not feasible to optimise for all possible outcomes.

The front end development experience was made marginally better, but it created far more problems than it was worth. It's harder to debug, it's harder to isolate performance problems, and the Rails documentation and battle-tested experience (not just todo apps) detailed in blogs and on stackoverflow is thin.

Unless you have a really good reason I would steer clear for now. Better to spend your time solving business problems than wrangling an immature framework.


I built something slightly different when I was working at IFTTT.

The main idea was to feed the Active Record preloader with hints from the GraphQL queries.

Of course this only works well when your GraphQL schema matches your database schema (which is most cases anyway in my experience).

https://github.com/nettofarah/graphql-query-resolver

Here's a video of a talk I gave about the approach at GraphQL Europe in 2017:

https://www.youtube.com/watch?v=TIzEZJuDpIQ


> which is most cases anyway in my experience

Which is interesting, because every other comment here is saying it shouldn't.


I agree with you. Not many things in life are done as they were supposed to be done. Hindsight is 20/20.

I built this tool to help with existing codebases. At IFTTT specifically, we saw an instant 60% reduction in database IOPS. Not too bad for 1~2 days of coding.


Unless GraphQL implementations include some sort of query optimiser, it's very hard to see how they actually do anything, if you're left to constantly supervise the back end in this way.


I really wish GraphQL weren't named in a way that seems to cause confusion about what it's for and its relation to SQL.

It's not a replacement for SQL. It's a replacement for REST. And the intent is to provide a statically typed, holistic interface and encourage best practices in API design. It's also quite nice because there are clients that plug into it and offer substantial benefits as a result, like Relay.

There are lots of ways people write resolution methods for GraphQL, and this article provides a good number of them. You can think of them as query optimization approaches if it helps. But it's not quite like SQL, where in SQL you say whatever you like and it just gets figured out. GraphQL forces an amount of intention and asks schema writers to consider the most concrete use cases possible rather than just enabling super generic ones.


I don't think GP is confused.

GraphQL lets you write queries for objects. This fundamentally changes how you architect your app and database.

In a REST API, you turn a finite list of known query patterns (endpoints) into DB queries, which you can test, hand-write, and optimize.

If you let end users query your object model, it takes far more work to ensure they don't do something very expensive.


My cynical take is that frontend devs will do this anyways, just with api calls from the client and probably repeat the same calls a few times per page load. But hey, the backend metrics look great, who cares that the page takes 10 seconds to load.


Totally agree, Graphql pushes the problem to the backend, where arguably, it's easier to mitigate after bottlenecks are discovered.

Seems like it postpones premature optimisation... which is a good thing in my opinion


The amount of work it might take is generally proportional to the amount of flexibility you give the user. You don't have to (and in fact are generally discouraged to) offer end users a degree of flexibility that makes your life harder. To be especially clear about this: do not mirror your DB schema in your GQL schema; it's not worth it.

But, even should you have to support a complex schema, the fine article showcases a number of great mitigations that cover basically every possible issue.

The only issue that I don't think is covered here is that collecting all this data up and sending it all at once can sometimes be slow or even time out, and there's no mechanism really to allow GraphQL to defer the collection of some fields until they're ready. It's coming very soon (in the form of @defer; to the spec, to graphql-js, to Relay, and to others) but it's not quite here yet.


There seem to be a lot of engineering practices you need to understand and follow to use GraphQL effectively, which makes me naturally suspicious of the technology. I judge technologies for most (non-niche) use cases by how easy it is to shoot yourself in the foot: the easier it is, the less I rate the technology for broad usage.

The article assumes simple schemas as well: a parent with many child relationships could lead to some complex resolvers. Unless I misunderstand the article, many of the strategies here already exist in ORMs (e.g. lazy subselects), which often don't perform as well as a crafted query given the multiple round trips to the db. They also require more code and exhibit branching, which means performance is a function of the user's query. If you know your user, that's fine, but then I don't see the advantage vs a straight REST or RPC call, which would result in simpler code anyhow. It seems to work, IMO, in cases where the service isn't critical, your model/schema is simple, or you prefer flexibility of data access over reliability/predictability.


> If you let end users query your object model, it takes far more work to ensure they don't do something very expensive.

How so? There are existing tools that prevent both N+1 problems (Dataloader) and complex, recursive queries (depth and complexity limits).
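For anyone unfamiliar with the Dataloader pattern, here is a minimal sketch of the idea (illustrative only, not graphql-batch's or graphql-ruby's actual API): resolvers register the keys they need and get back lazy handles, and one batch function runs once with all unique keys.

```ruby
# Toy dataloader: collect keys, resolve them all in a single batch call.
class TinyLoader
  def initialize(&batch_fn)
    @batch_fn = batch_fn
    @keys = []
  end

  # Returns a lazy handle; forcing it triggers the (memoized) batch load.
  def load(key)
    @keys << key
    -> { resolved.fetch(key) }
  end

  private

  def resolved
    @resolved ||= @batch_fn.call(@keys.uniq)
  end
end

calls = 0
loader = TinyLoader.new do |ids|
  calls += 1 # stands in for one SELECT ... WHERE id IN (ids)
  ids.to_h { |id| [id, "user-#{id}"] }
end

handles = [1, 2, 1].map { |id| loader.load(id) }
values  = handles.map(&:call) # => ["user-1", "user-2", "user-1"]
calls # => 1: the batch function ran exactly once
```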


If GraphQL asks one to consider the most concrete use cases possible... what is the point? I'm aware it's just an alternative to bespoke endpoints, REST or otherwise, but the problem is it promises a lot of stuff that is really hard to actually fulfil in reality. At which point, what is it actually getting you?


+1 on this. GraphQL is a great way to actually decouple your API from the underlying database schema. When you start thinking not about specific REST endpoints but about the API data model in general, it turns out that it is much better to hide many database-level specifics and provide a more natural data model to clients.


But you're still actually getting that data from somewhere (maybe even several places). You absolutely have to consider deeply the performance characteristics of complicated operations across these datastores to be able to deliver a sane API. There's only so much you can hide when you allow arbitrary joins and filters etc.


It seems dangerous to assume that GraphQL decouples your API from the database, considering that there are still performance implications.

If your GraphQL schema requires expensive queries because it has a different schema than your database, is it really decoupled?


The same advice applies to REST.


While it may be tempting to do so, don't create wide open "flexible" GraphQL queries that you don't know the use cases of yet, supporting arbitrarily deep levels of nesting, aggregation, and parameterization. That's how you paint yourself into a corner, fast.

It's much easier to evolve from a very restrictive schema to a more open schema as the use-cases present themselves, than it is to move clients from an open schema to a more restrictive one because your backend is choking trying to support the cardinality of edge cases introduced by it.


That’s kinda hard not to support. The second you have a relation modeled bidirectionally in GraphQL, you support infinite nesting (well, there are complexity guards, but alas).
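Those complexity guards amount to measuring the nesting of the parsed query and rejecting anything past a limit, which is the idea behind graphql-ruby's `max_depth` setting. A sketch, with the query represented as a nested Hash for illustration:

```ruby
# Depth guard sketch: compute how deeply a query's selections nest.
def depth(selection)
  return 0 if selection.empty?
  1 + selection.values.map { |sub| depth(sub) }.max
end

# user -> posts -> author -> posts: the bidirectional relation nests forever,
# but the guard cuts it off.
query = { "user" => { "posts" => { "author" => { "posts" => {} } } } }
depth(query) # => 4, so a max_depth of 3 would reject this query
```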


The primary “thing GraphQL does” is provide a data shape contract between servers and clients, with type safety, introspection, and a fairly human-friendly query syntax.


> with type safety

GraphQL provides one-way (server-bound) run-time type assertions.

I constantly see developers that should know better confuse validation with type checking. I've had developers, in earnest, ask how they can use TypeScript on run-time data inputs. I have to break the bad news to them and simply tell them: no, you can't do that. That's not how type systems work.

Out of the box, GraphQL provides no validation, that I know of, beyond the elementary data types. Anyone that knows a thing about XSS or SQL-injection can tell you that a string is not always just a string. Anyone using C/C++ that has dealt with stack overflows and code injection can tell you that an integer is not always an integer. Sometimes it's a pointer. Oops.

You're going to hand-code validations, each and every time. Because no tool knows how many characters your DB is set up to handle for a username, for example. GraphQL and TypeScript won't save you.


I understand the distinction you're making, but I think it's fuzzier than you seem to think. Yes, GraphQL responses tend to go over the wire in a format without any type checking (like JSON). But GraphQL tooling and libraries can use GraphQL schemas to ensure that application code that consumes GraphQL responses will never see a value with a type that contradicts the GraphQL schema.

GraphQL tooling cannot literally prevent JSON from having a string value where there's supposed to be a number value, but it can ensure that application code never has runtime type errors. Is that "validation rather than type checking"? Sure, I suppose so, but my application code that consumes GraphQL responses will either get some representation of an error, or a representation of a successful response that is guaranteed to have the types I expect.

And yes, there are plenty of other data formats and tools that provide the same sort of thing: some representation of a schema, some non-application code that checks runtime data against that schema, and a contract with the application code that each response will be either an error or a type-safe successful response.
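A minimal version of that contract, sketched in plain Ruby (schema representation and names are illustrative, not any real library's API): check the decoded response against the types the schema promised, so downstream application code sees either a typed value or an error.

```ruby
# Schema-driven runtime validation: the types the GraphQL schema promised.
SCHEMA = { "id" => Integer, "name" => String }

def validate!(response, schema)
  schema.each do |field, type|
    value = response.fetch(field) { raise "missing field: #{field}" }
    raise "#{field}: expected #{type}" unless value.is_a?(type)
  end
  response
end

validate!({ "id" => 1, "name" => "ada" }, SCHEMA) # passes through unchanged

begin
  validate!({ "id" => "1", "name" => "ada" }, SCHEMA)
rescue RuntimeError => e
  e.message # => "id: expected Integer"
end
```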


I don’t disagree but can’t you kinda get a lot of this with swagger / openapi aswell?


The discoverability and querying part of GraphQL seems to be a net loss, and the RPC part seems to be a net win. I think REST is going to lose a big chunk of its popularity, but that it will be to gRPC, not GraphQL.


I found both the discoverability and querying in GraphQL much better compared to REST. Can you explain what you mean in more detail?


Discoverability with GraphiQL and Intellisense aims to make it self-documenting. It doesn't work out that well and you wind up needing documentation anyway.

Queries over GraphQL often wind up being handled by custom resolvers and being about the same as remote procedure calls. It is kind of neat how it separates the queries from the mutations. Blitz does that, but with GraphQL being optional:

https://blitzjs.com/docs/query-resolvers

gRPC just provides an efficient transport and has the client library provide the interface. It works out pretty well for google APIs.


I agree that more comprehensive documentation is often necessary. But still, having GraphiQL that lets me discover all possible queries with their types and a small explanation for each field is a great win. I like that much better than, for example, generated Swagger documentation. So how exactly is this a net loss? At worst it doesn't improve the situation - but how does it make it worse?


In general I would not recommend bolting on GraphQL to an existing model unless you're willing to invest in either time to build out the backend query optimization or tolerance of suboptimal querying.

The Ruby GraphQL ecosystem is also lacking in decent loaders, and such a loader would likely not integrate with ActiveRecord.

Also, remember: GraphQL is a query language; you're responsible for locating the data on the backend.


Building a Firebase-like direct frontend access to your database tables is far from the only use case of GraphQL. Many (probably most) apps inevitably need backend logic that does more than just pass through and format data for the frontend. GraphQL can still have a lot of other benefits in terms of enforcing data structure and eliminating other types of boilerplate.


My understanding is that Hasura more or less "compiles" SQL, rather than chaining models together like Rails does.


Yeah, we use Hasura for some endpoints and it's great for rapid development, but still has a lot of sharp (or indeed dull) edges.


Hey! I'm from Hasura.

Would love to hear what's been missing / painful and factor that into the roadmap.


A lot of random issues, some of which I think are already fixed, like migrations being run out of order. Remotes being hit even for null foreign keys kills off some queries for us. Remotes not being supported in subscriptions (sort of understandable, but it still causes inconsistencies in user experience). We've had to be very strict with naming conventions for relationships etc. or we find we're constantly dealing with merge conflicts, but that's as much our workflow issue. There's poor support for Postgres enums. That's the stuff I can remember off the top of my head, and none of it's really a killer. Without something like Hasura my interest in GraphQL drops close to zero, to be clear, so it's providing a lot of value.


Crystallising architecture around the front-end is one of those anti-patterns my grandmother warned me about.


Why not? The front end is closer to your users. Shouldn't that be the focus?


Analogies about proximity are beguiling but misleading. They’re all hat and no cattle. Streamlining the ticket office won't make your trains run on time, and you can’t just put more lipstick on a dead pig, since ultimately it lets the tail wag the dog.

Which is to say, a great user experience comes from the heart, not the skin, of your product.


As opposed to REST which, in practice, seemingly defines the network interface by what was convenient to the database layout?


The naive bijection of REST to table CRUD is both commonplace, and deeply flawed. Fortunately, databases do not have preferences; people do. That's one illustration of why I've found oppositional mindsets unhelpful to design processes. Heck, it's why they executed Socrates.

My first recommendation in any design process is, assume that Conway's law will define your interfaces.


We use GraphQL heavily to power web apps and Chrome extensions. To me the biggest benefit, as others have mentioned in this thread, is the strong contract it provides between FE and BE. We have a mix of CRUDy queries and queries tailored specifically for certain pages.

The FE team loves that they can easily look up what exactly the BE returns and include tools in their build pipeline to validate the queries they write against our production schema. Combined with TS then checking that you use the returned objects correctly this is a major win.

We actually rarely go and optimise on the field level, because fetching unnecessary objects from the database is often faster than creating two queries that will both be called 95% of the time (for example, adding a join to the main query instead of fetching additional fields by id in a different resolver). However, if the FE really doesn't need the additional data, then at least it isn't sent over the network and the FE doesn't have to parse it from JSON.


EdgeDB [1] is an opinionated database and language built on top of Postgres that solves the N+1 problem. Using a framework on top of that is a stack that I definitely want to give a try for a simple project. A lot can be done within edgedb, the language offers powerful functions and expressive types.

[1] https://www.edgedb.com/


There is also the abandoned graphql-preload gem with an absolutely amazing API (it wraps graphql-batch under the hood):

https://github.com/ConsultingMD/graphql-preload

It is very sad that its creators have left and the current owners aren't responding.


The proper solution is using well-designed GraphQL-to-PostgreSQL software like Hasura or PostGraphile.

There's no reason to use Rails for that (or, well, use it for anything at all, given the terrible language it's based on and its outdated architecture).


How do you feel about using SQL for everything, eg using Postgres policies for RBAC, or PL/pgSQL to generate passwords?

In principle, I love the idea of Postgraphile but this is what turned us off.


In my experience, GraphQL enables smaller, less experienced teams to build complex web apps faster. (As long as you use decent frameworks and tooling.)

There are definite downsides versus eg REST (notably performance, which becomes harder to reason about), but it’s an acceptable trade-off for us.

I’m also optimistic about the great tooling that is improving all the time - eg Hasura, Postgraphile, Graphene-SQLAlchemy all solve N+1 today.


The Python GraphQL library Graphene-Django includes built-in support for dataloaders: you just tell it how to load a list of objects given their ids (by default, using the foreign key id as the primary key). Works beautifully.


I haven't had a chance to dig into dataloaders much. Is the basic idea that you perform one query per "layer"? Maybe not as great as just using something like A.objects.prefetch_related(related_objects__other_object), but better than N+1.


(Sorry for the delay here) At face value, it can issue one more query per layer, but you get a lot of other benefits by breaking down your logic in this way. For example, you're less inclined to use database joins that may not scale as you grow. Another benefit is that you can very easily switch storage layers. For instance, maybe you fetch one layer from your SQL database, and another from a KV store database. Or maybe one layer comes from the cache, etc.


Another way is to run the Super Graph service alongside your Rails app. Super Graph can decode and use Rails cookies, and it's written in Go, so it's very fast and lightweight. Super Graph automagically converts GraphQL queries into a single efficient SQL query. https://github.com/dosco/super-graph


Huh. "Open your SQL database to the world" is the thing that Rails has carefully avoided... It's not immediately obvious to me how this tool deals with, like, security, or any kind of data hiding.


Postgres supports json_agg which solves n+1 and also cartesian product problem. It's neat, hasura does it nicely for example.


How does json_agg solve the n+1 problem?


It lets you eager load more complex queries.

If you are eager loading multiple joined relationships, that would normally create a result set of n x m x k x ... rows. With json_agg, each relation can instead be a single JSON column in the row.
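For illustration, a Postgres query along these lines (table and column names assumed) returns each user once, with that user's posts collapsed into a single JSON array column instead of a users x posts cartesian product:

```sql
-- Sketch: one round trip, one row per user, posts aggregated as JSON.
SELECT u.id,
       u.name,
       COALESCE(
         json_agg(p.* ORDER BY p.id) FILTER (WHERE p.id IS NOT NULL),
         '[]'::json
       ) AS posts
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
GROUP BY u.id, u.name;
```

The FILTER clause keeps users without posts from getting a `[null]` array.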


In this case the row size can become huge. Also, how to make sure it's consistent? (I mean foreign keys/unique indexes/etc)


I was mostly talking about the read side, not the mutation side.


Ok, hear me out:

Graph databases + simple API > GraphQL

Practically every important scenario uses related data. Graph databases are awesome at that.


Get most of the benefits of graphql while still using rest: https://www.graphiti.dev/


so much this.


Has anyone gone the route of leaving GraphQL out of Rails, and using a dedicated GraphQL server like Hasura?


We looked at that, but with a large Rails app that already wraps the db in a ton of business logic, we needed to wrap the Rails app in GraphQL, not the db. To start with some other server would mean either recreating that business logic, or creating yet another API to wrap the Rails app. So we went with graphql-ruby and it's doing what we hoped.


I don't get the need for GraphQL


[flagged]


If you're gonna go that direction, at least get the geography right. Ruby is Japanese, not Chinese.



