Hacker News new | past | comments | ask | show | jobs | submit login

“ Application developers should care about data – not SQL”

Anyone who has been bitten by an ORM generating inefficient SQL (and subsequent having to learn and care about SQL) knows the above not to be true.




Hard disagree. If you’re not leveraging the features of your SQL database to make your life easier, your queries faster, and keep your data consistent & valid, then you probably didn’t need SQL to begin with.

Which, yes, is a common state for many applications to be in. Wresting with SQL while using it very poorly in an effort to act like they’re not.

[edit] disagree with the quotes statement, that is, agree with the post. I worded that poorly.

[edit edit] I've been thinking about what you get from using an RDBMS while avoiding writing or knowing much about SQL or using any of the probably-extremely-nice features of your particular SQL DB (why is everyone always so eager to be DB-agnostic? If you do it right your DB will survive, and greatly ease, several rewrites of your application!) and I'm coming up with:

1) basic locking (I assume you don't want to know how to actually use transactions, beyond what your ORM does automatically, so you're just getting the basics), and

2) some probably-badly-insufficient indices, and

3) an ORM should at least get you a little normalization if you just follow patterns from its docs & examples, I suppose, though you're gonna need to understand some SQL to get much benefit out of it in your DB design and in your use of the DB, so...

An entire RDBMS seems like serious overkill if that's all you're really using.


This statement is certainly provocative (great that it was the first thing picked up here :D) but I'm happy to explain our rationale for this a bit more.

SQL is an impressive technology and has stood the test of time! Yet, we claim that it's not the best tool for application developers who are paid to implement value-adding features for their organizations.

SQL is complex, it's easy to shoot yourself in the foot with and its data model (relational data / tables) is far away from the one application developers have (nested data / objects) when working in JS/TS. Mapping relational data to objects incurs a mental as well as a practical cost!

This is why we believe that in the majority of cases (which for most apps are fairly straightforward CRUD operations) developers shouldn't pay that cost. They should have an API that feels natural and makes them productive. That being said, for the 5% of queries that need certain optimizations, Prisma allows you to drop down to raw SQL and make sure your desired SQL statements are sent to the DB.

I see Prisma somewhat analogous to GraphQL on the frontend, where a similar claim could be: "Frontend developers should care about data, not REST endpoints". GraphQL liberates frontend developers from thinking about where to get their data from and how to assemble it into the structures they need. Prisma does the same by giving application developers a familiar and intuitive API.


> SQL is an impressive technology and has stood the test of time! Yet, we claim that it's not the best tool for application developers who are paid to implement value-adding features for their organizations.

I'm not sure about that. We've recently switched from JavaScript based querying code to mostly raw SQL, and we've reduced our code to about 25% of what it was, and it's much simpler to understand than it was before.

> I see Prisma somewhat analogous to GraphQL on the frontend, where a similar claim could be: "Frontend developers should care about data, not REST endpoints".

I'm not sure about Prisma, but IMO that GraphQL model isn't great. Realistically (for performance, etc) it will make a difference where that data came from. Not for super simple queries, but super-simple queries are super simple to do with REST/SQL anyway. I also feel like this distinction between front-end and back-end developers isn't great.

The GraphQL approach also leads to deployment issue with non-web clients. e.g. it can take a day to get an iOS build released, and there is no way to force users of older clients to update, so this can take months. So fixing a bad query is difficult. With a proper backend you can just swap out the query.


I think the strongest argument for or against an ORM is that it changes the structure of your code, how you reason about your data.

Things like performance, ease of use, reusability, schema generation etc are minor points and not the big issue you need to think about.

Personally I feel that an ORM will lead your project in the wrong direction and your database schema will suffer.

ORMs will usually mix both reading and writing within the same class, but those are not the same thing.

Readers will have different views of your model depending on who they are or what time it is. Writing on the other hand must fulfill your constraints.

Other common mistakes of ORMs is mapping one class with one table, but your data consist of series of relationships, that can’t be explained with a class.

An ORM creates an illusion that your data is actually the class entities in your repository. This can have the effect of creating constraints in your own mental model of your data, thus making it harder to evolve your schema because you are to fixed on how your classes are designed.


Great comment! I’ve been thinking about a tangentially related subject:

TypeScript is like a relational database (and other structural statically typed languages) of your application behavior, allowing you to query the current possible states of your data at any given point in your application code.

I’ve been thinking about this more since TypeScript got template-string generics, and folks have experimented with eg type-checking SQL strings back into application code.

This is the piece that draws me to an ORM: that it brings the “relational model” into application code, NOT the “object” part of an ORM.

Does anyone have any opinions about which language ecosystems create the most effective type-mappings between a RDBMS and application language?

Part of me also wonders if there’s a ton of time lost on mapping between languages like this when the tooling that’s really missing (afaik) is better tooling for producing, consuming, and generally interacting with SQL and db schemas as the data runs through your application pipeline.


> We've recently switched from JavaScript based querying code to mostly raw SQL

Curious what your use case is, because in every codebase I've worked on, people pretty quickly get tired of writing out SELECT * FROM [table] WHERE id=? and UPDATE [table] SET field=? and looping over database cursors all day, and you end up with a half-done, buggy non-ORM.


We're using knex.js, so we're not messing around with database cursors (we get an array of objects), and we don't have to write out the columns for inserts/updates. We also sometimes use the knex query builder for super trivial selects from a single table.

But for anything more complex than that (e.g. SELECT queries joining several tables) we're using knex.raw so the query itself is raw SQL where we're very much thinking in the relational model rather than in terms of objects.


Ahh that makes more sense. My preference has been to use an ORM for your normal CRUD, but write more advanced queries using SQL (or sometimes just write the where clause using SQL, if the select is otherwise simple.) I am not a fan of query builders.


> Yet, we claim that it's not the best tool for application developers who are paid to implement value-adding features for their organizations.

You have two options when marrying RDBMS SQL and OO: either mismanage the relational data so that developers can use a class hierarchy, or stop using a class hierarchy to represent data.

IME, applications come and go. They get rewritten, thrown away, obsoleted. The database, however, is there to stay. Mismanaging the data purely so that the application developers don't have to touch all that icky relational stuff almost always results in more work for less returns.


Prisma does not use a class hierarchy that are mapped directly to tables. Instead we take advantage of structural typing to give you a very smooth developer experience that is faithful to the relational model.


I think SQL gets a lot of undeserved praise that I’m having a difficult time understanding. The only impressive thing about SQL is its prevalence but that’s a pretty poor yardstick unless one thinks that an appeal to popularity is an indicator of quality.

Now let me count the ways in which SQL is bad:

- it composes poorly due to its unwieldy cobolesque syntax

- it is a leaky abstraction revealing a lot of underlying implementation tradeoffs

- it doesn’t properly implement Cobb’s relational model

- it is poorly standardized with a ton of proprietary extensions and alterations present in virtually every implementation

- it is still poorly supported by tools because the model metadata lacks any standard interface to make universal tooling possible

A lot of the praise for SQL is just bandwagon hopping and cargo cult behaviour or a lack of vision by most people of how things could be much better


The great thing about SQL is the relational model. You typically can't compose parts of SQL queries easily from the programming language calling it, but SQL queries themselves compose incredibly flexibly. The result of of a SELECT query is itself a "relation" that can be queried just like a table.

I do wish someone would create a compile-to-sql language that adds variables and basic conditional support. I think T-SQL (SQL Server) has a lot of this, but it's missing in the Open Source world.


> I think T-SQL (SQL Server) has a lot of this, but it's missing in the Open Source world.

Every database with a procedural language based on SQL (Oracle PL/SQL, Postgres PL/pgSQL, etc.) has it, T-SQL just doesn’t segregate procedural and declarative code the way most engines do. Mostly, I think this is a negative for T-SQL, but sometimes segregation means there’s no way to do one-off procedural scripts without making and then calling a procedure; but Postgres, for example, since 9.0 has the DO statement for this use case (which supports any procedural language installed in the DB, but defaults to PL/pgSQL.)


I don't think SQL does compose well. Queries aren't first-class values in SQL.

I can write a query that queries the result of another query, but I can't store a query and then pass that somewhere to query from.


Maybe the WITH statement and CTEs can help you, if I understood correctly your complaint.

Check the example at the beginning of https://www.postgresql.org/docs/13/queries-with.html


What are you comparing it to? What is the better alternative?


SQL isn't the only RDMS language, but it is the one with 50 years of incumbency.

Postgres before it was called 'PostgreSQL', used something called QUEL, which is quite a bit cleaner. There is also Microsoft's LINQ.

A typical programming language can express all that SQL can. In a way, an ORM is a cross-compiler from your programming language to SQL.


> A typical programming language can express all that SQL can. In a way, an ORM is a cross-compiler from your programming language to SQL.

SQL has always been a sort a magical, black-box in that you have very limited control over the query planner, and have to hint at it to do the right thing. (I guess `EXPLAIN` allows you to peak in the box a little)

I found [Apache Calcite](https://calcite.apache.org/docs/algebra.html) rather interesting. It provides the primitives that a DBMS is built from.

I would be interested to compare how PSQL is built in comparison. I would love a more layered/pluggable database that everyone can build off of. I think FoundationDB had this approach.


LINQ is pretty much a clone of SQL. Its nice, I agree, but do you think its materially different?

SQL could benefit from being language-integrated, yes. Very few languages can model relational algebra types though, if thats a thing you care about (it certainly is for me).


An ORM API on the other hand is completely meaningless adhoc "whatever works". It doesn't have grounding in anything.


I'm making no statements about ORMs. I'm just picking apart the mindless "SQL is the bee's knees" mantra so prevalent on HN.


Would be interested in hearing what database and query language you think are better?


I think Datalog as implemented by Datomic has a much better syntax though I don't like Datomic per se.


Oh don’t get me wrong - ORMs have their place and do enable higher agility. They do seem magical the first time you encounter them.

What I object to is the blanket “shouldn’t care about sql” statement because that’s what empowers people to use the ORM indiscriminately without understanding what’s under it (an understanding for which SQL is relevant) and then it’s the non-value-adding developers’ job to come in and untangle the mess, usually could have been avoided by dropping down one level, looking at the SQL that was generated (or maybe analyzing the query plan - again kind of hard if you don’t understand SQL) and realizing the ORM is doing something crazy.


Exactly. ORMs, especially powerful and mature ones like DjangoORM or Active Record, can help with productivity and maintainability and for more complex use cases you have the chance to switch to raw SQL; However, it’s important for developers to know what’s happening under the hood and to know what will happen if they use a certain feature of an ORM. I can’t even count the number of times that when one of my coworkers and I tried to fix a performance issue, and after digging deep into queries and mechanics of the ORM, we’ve realized how ORM heed so much complexity from them and, how a certain data structure design and coding in a certain way can result in an inefficient data flow.


Actually nested objects kinda suck even for frontend devs, if you're trying to keep things in sync efficiently.

Often times it's much better to be able to lookup some object by ID of the entity from some Map, than consuming endpoints returning some crazy nested partial data for the current view.

Depends on how much your app relies on client side caching and incremental sync of data.


As soon as you need a lot of complex data all displayed in one place plus high performance, you're gonna find you need a list, not a hierarchy. You need to be able to treat the data coming in like a stream, not to traverse anything. Read the row, maybe do some state-machine stuff to decide how to treat it, then drop all that on the floor and move to the next row.

This is more-true the less efficient the language is that you're using. I see people screw this up when writing e.g. dashboards that source their data from SQL, while leaning on an ORM, all the time, and it always kills performance (talking tens of seconds to minutes for things that should take a single-digit count of seconds, at worst).

What's good at turning complex relational data into a list, and fast? Yep, SQL.


That too. Though I used efficiently in a sense of minimizing the communication between client/server by normalizing data and just sending changes.

Which is easier if you keep data normalized on the client side too, and look up related objects in a Map when needed, instead of keeping multiple copies of objects representing the same entity everywhere in your client code in some/vasrious denormalized forms.

It would actually be very nice for some use cases if I had a SQL interface to local data on the client side too, so that I can query/join them up arbitrarily as needed from what's loaded up to client storage. That's unpleasant to do in the browser with current platform APIs.

Basically I want WebSQL back in some form, instead of this IndexedDB thing. :)


> if you're trying to keep things in sync efficiently

I think we are all abusing GraphQL in this sense. Caching GraphQL data is a world of hurt.

The easiest way to sync is if your data is in the same model as the way it is stored, which is normalized if using SQL. Even when we do normalize most people will represent a 1-M as an array of foreign keys: `posts.comments = [1, 2, 3]` when in the db we use a foreign key on the `comment` entity: `comment.post_id = 1`. This causes such headaches for optimistic UI updates. It's a mess.

We should be using GraphQL to retrieve data and then render it directly.

Ultimately, I think we should be implementing our GraphQL API client-side against a local SQL DB that acts as a pass-through cache. Then your entire app runs offline and instantly respond to queries eliminating the need for a separate client-side cache, and you don't have to worry about about data model mismatch anymore.

> Depends on how much

I think all apps want this - unless they are SSRing with <100ms on every action.


Ha :) Yeah, I just replied in a similar vein in the other reply to my original comment. Agreed!


I keep sayin' it - SQL in the browser!

I'd love to get rid of all these client-side state management libraries and just drop an SQL DB in there and allow components to subscribe to changes to queries on record updates, and then sync it to a cloud DB.

One interesting problem I came across is how to know when your local DB has enough records to be able to run a given query. E.g. If your team has 100K tasks in their cloud DB, and you run `SELECT * FROM tasks WHERE tasks.assignee = user.id`, and then another query of `SELECT * FROM tasks WHERE tasks.assignee = user.id AND tasks.completed = true`, the second task is a subset of the first, so it doesn't need to hit the server except to check if any updates were made to the tasks table. Then apply this to all queries including ones with joins. Actually some fun relational algebra can be used figure it out - one of the perks of working with SQL. The problem is actually similar to how to [incrementally maintain a materialized view](https://wiki.postgresql.org/wiki/Incremental_View_Maintenanc...) which Postgres actually does not support (Oracle does).


If everything is simple, then so are the SQL queries.


I agree.

That statement is so wrong. All ORMs suffer from leaky abstraction. And good ORMs would never hide that fact, nor would they try to deny native access to the database and force a all-or-nothing principle onto the developer.

If "application developers" care about data that is stored in a relational database, they have to care about the database itself and the access patterns (SQL).

ORMs _only_ provide convenience! They cannot substitute knowledge about the underlying technologies. And they cannot hide complexity, but shift it to the ORM.


I agree, and find the reluctance to care about SQL interesting.

Sure, SQL is not perfect, and has it’s flaws. But instead of learning an established, declarative language designed for data access, there are developers that would much rather learn new tools and libraries to avoid it. If you need to use another programming language for data access, you’re off looking for another ORM. Or the next, better ORM of the year comes out.

There are stored procedures at my work that are more than 20 years old. They don’t need to change when the application code calling it is uplifted.

It’s not that much different than a modern React dev not wanting to understand how a plain HTML form works.


Right. Data management is a hard problem. Just look at SQL query planners, they are insane - but they are not perfect, and they only work on the premises you provide (schemas, indexes, memory, etc). The idea that you could introduce a library that fixes all data management issues for you is not well considered.


Yeah, ORMs were marketed as "if your database changes, you don't have to change anything!". What about if your programming language changes? You have to learn an entirely new ORM each time. What about all of the people who have to read your code? Which ORM's do they know? Everyone knows SQL.


> What about if your programming language changes?

Very rarely does an organization completely rewrite their products in completely different languages. And if they do, I'd hope they take the due diligence to investigate the tools they plan to switch to and make sure they are a good fit.

> You have to learn an entirely new ORM each time.

You have to learn an entirely new everything each time. ORM's aren't anything different. New language = new paradigm, new routing, new templating, new everything.

> What about all of the people who have to read your code?

What about them?

> Which ORM's do they know?

I'd assume that the team would have conversations to gather information on what ORM's everyone knows and decide which is best for their team. I don't see how this is an issue. Do you just randomly pull people into your company and just like, let them wander aimlessly?

> Everyone knows SQL.

No they don't. And if they did, a lot of stuff isn't huge massive complicated apps, most are basic CRUD apps where ORM's do all the lifting for 98% of the stuff needed done. A lot of us haven't touched SQL in so long that we couldn't construct a simple SELECT statement if we tried. I know because I wrote SQL from 2000-2014, but then tried to write a simple SELECT statement in a `pg` CLI to fetch some stuff and completely forgot how to do it.


The statement is true, IMO. The problem is that ORMs don't ever solve the problem (and I don't believe they CAN).

We should only have to care about data and not SQL. But the reality is that we very much do have to care about SQL.


I agree. Clean and efficient abstractions (oxymoron?) to handle data management just doesn't exist. ORMs are great until they're not, at which point you do need to understand SQL and idiosyncrasies of the underlying relational database.

I tend to prefer avoiding ORMs as a preference because of this but unfortunately, development expectations in the current culture don't pad time enough to do this, so you're often reliant on an ORM and wait until things break before someone paying the bills will be swayed to allow developers to go in and clean things up. The "wait until it breaks to go fix and redesign everything" mentality isn't unique to ORMs though, it plagues any abstraction, framework, library, whatever that gets enough done to supply something mostly functional in a shorter time scale.


Not just this - I'm currently working on an application built by people with no understanding of databases beyond "they're kind of like key value stores". No attention is given to transactions, stale updates, locking mechanisms or anything more complicated than the equivalent of SELECT and UPDATE. It is an HTTP service running on many threads, so the problem is only amplified.

If you're going to use a SQL database, you need to understand what it is and what rules to obey to get its benefits. Hiding that complexity away makes people think they're writing a fast, concurrent high-availability (TM) service while it's actually a ball of mud that sometimes gets into the right shape.


While some people are hard lined enough on their anti-ORM stance that it sort of becomes weird, I agree with you that my beef with ORMs comes from being burned before a couple of times by a super inefficient aggregation ActiveRecord did on GROUP BY queries that it 1. not only took a really long time to figure out why a particular page in our app was loading slow but 2. we ended up having to write raw SQL to fix it.

I think the answer of whether to use one depends on the type and load/volume of app you're working with combined with the dynamics, size, and skill level of your team(s). I'm extremely comfortable writing, profiling, query planning, and debugging SQL queries. Others aren't, and therefore having an ORM to query data in the DB with the syntax of the language you're using in your projects makes way more sense, if nothing other in order to speed your team up.


Not sure when you used ActiveRecord, but slow query logging helps a lot to identify these issues. I also appreciate that in the docs for ActiveRecord they make it very clear that jumping to raw SQL is absolutely fine and normal, and they make the interface for doing that very nice.


I have yet to find the person who, being comfortable with both raw SQL and a particular ORM, to, when faced with the choice of learning a new ORM or using raw SQL, reach for the new ORM. To be honest, I haven't met in person anyone who reached for the familiar ORM, either, for anything complex in a team environment, but that's a more aggressive statement.

I mention this because to my mind it reinforces my belief that learning an ORM (which I've done in the past, and then never used again) isn't worth the effort; learning SQL is, if for no other reason than it's portable (and you'll have to learn it anyway when the ORM's abstraction leaks).


The answers is both. Devs cannot work on a system (involving "data") without knowing about that data/how it works/how it is structured and how to store and retrieve it. Pretending you can do one without the other leads to a boatload of issues in the medium and long term.


if you define yourself as a developer do yourself a favour and learn SQL. It's probably much easier than whatever programming language you use and will avoid you many headaches.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: