How I Reduced My DB Server Load by 80% (schneems.com)
201 points by itsderek23 on Oct 9, 2017 | 110 comments

Note that Postgres supports functional indexes, and the case in the post (a `lower(column)=`-clause not being able to utilise the column's index) is the example used in the documentation: https://www.postgresql.org/docs/current/static/indexes-expre...

Was coming here to post that.

Being able to create indexes based on the results of a function is the solution to so many problems. You can even index the result of an XPath function on a huge, compressed XML document.

Partial functional indexes are amazing. You can create an index that not only elides execution of the function, but automatically only includes the subset where that function would be relevant. Makes the index smaller, to boot.

    create index repo_name_lower on repos(LOWER(name)) where name IS NOT NULL;
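For readers without a Postgres instance handy, the same trick can be sketched with Python's stdlib sqlite3 (SQLite has supported expression indexes since 3.9); the table and column names here are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE repos (name TEXT)")
con.executemany("INSERT INTO repos VALUES (?)", [("Rails",), ("Django",)])

q = "EXPLAIN QUERY PLAN SELECT * FROM repos WHERE lower(name) = 'rails'"

# Without an index, the lower(name) predicate forces a full table scan.
plan_before = [row[-1] for row in con.execute(q)]

# An expression index on lower(name) lets the same query use an index.
con.execute("CREATE INDEX repo_name_lower ON repos(lower(name))")
plan_after = [row[-1] for row in con.execute(q)]

print(plan_before)  # e.g. ['SCAN repos']
print(plan_after)   # e.g. ['SEARCH repos USING INDEX repo_name_lower (<expr>=?)']
```

The exact plan strings vary by SQLite version, but the scan-to-search change is the point.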

Partial unique-constraint indexes are useful too.
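A hedged sketch of that, again using stdlib sqlite3 (which supports partial indexes) with a made-up soft-delete schema: uniqueness is enforced only for rows matching the index's WHERE clause:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (email TEXT, deleted INTEGER DEFAULT 0)")
# Uniqueness applies only to live rows; soft-deleted rows don't count.
con.execute(
    "CREATE UNIQUE INDEX users_email_live ON users(email) WHERE deleted = 0"
)

con.execute("INSERT INTO users(email) VALUES ('a@example.com')")
# A soft-deleted duplicate is fine -- it's outside the partial index.
con.execute("INSERT INTO users(email, deleted) VALUES ('a@example.com', 1)")

# A second live duplicate violates the partial unique index.
try:
    con.execute("INSERT INTO users(email) VALUES ('a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```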

This story supports my growing theory that you should put as much of your app's rules in the database as you can.

There were three problems with having the rule in Rails:

1. The need for an index was easily overlooked.

2. The rule would be bypassed if a different app used the same database.

3. The rule wasn't even foolproof. Only a database constraint would guarantee uniqueness when two users are saving at the same instant.
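Point 3 can be illustrated with a toy sqlite3 sketch (the schema is invented): the check-then-insert validation is racy, while the database constraint catches the duplicate regardless of what the application believed:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (email TEXT UNIQUE)")

def app_level_save(email):
    # Rails-style validation: check, then insert. Racy, because another
    # request can insert the same email between the check and the write.
    exists = con.execute(
        "SELECT 1 FROM accounts WHERE email = ?", (email,)
    ).fetchone()
    if exists:
        return "taken"
    con.execute("INSERT INTO accounts VALUES (?)", (email,))
    return "saved"

print(app_level_save("a@example.com"))  # saved
# Simulate the racing request that slipped past the app-level check:
try:
    con.execute("INSERT INTO accounts VALUES ('a@example.com')")
    backstop = "missed"
except sqlite3.IntegrityError:
    backstop = "constraint caught the duplicate"
print(backstop)
```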

The problem is, SQL is hard. We should not forget how much a programmer must learn. For example: Ruby, Rails, Linux command line, HTML, CSS, JavaScript, vi, how to exit vi, etc. Each of these takes years to master.

SQL is especially SQuirreLy. However, it can't possibly be worse than learning the myriad JavaScript frameworks and complicated server-build tools that are completely optional for 99% of us. My advice: Don't do a SPA. Spend your time on SQL instead :D

Maybe I'm crazy, but I thought the whole point of putting validation and constraint capabilities into RDBMSs was because it was more efficient to do so, and that this has been pretty well understood for like... eons.

If you didn't want to write SQL and use constraints, then why would you even use a database (over just plain files)? That's what they are designed for. RDBMS developers are probably all sitting around giggling (nihilistically) over experiences with users who think their database is slow on account of their failure to actually use it in the way it was optimized to be used.

I believe the primary driver here is the overwhelming reliance upon ORMs when building anything with your average web framework—be it Python, Ruby, JavaScript, or whatever. Sure, these ORMs may, and usually do, offer ways to write and execute queries directly in SQL, but a vast majority of current developers seems to always reach for the ORM. Now, for simple CRUD apps, sure, that’ll help get you going quickly without having to think outside your chosen language. But this then snowballs into never thinking outside your chosen language. When ORMs are the average developer’s introduction and sole interface to using databases, we wind up with all the StackOverflow questions about how to write query X in ORM Y, when it could have been written as an SQL statement, sproc, or function, and executed directly (and probably more efficiently). It’s both amusing and sad when I see or hear of someone being unable to interact with their db directly. I can only suspect they actually believe the ORM is the way to interact with their db. Oh, they may know SQL exists, but that’s about it.

This problematic state of affairs is only compounded by the fact that ORMs don’t properly and easily handle taking constraint, validation, and other data-focused application logic and actually writing it into the schema (some ORMs are better/worse at this than others, of course). They all seem to just default to letting as much of it be application-level code as possible. This then leaves new developers with the impression this is where such code belongs, so they never experience anything telling them it should be anywhere else. It’d be great to see more education offered within ORM docs, but the really important stuff seems to often be there only if you already know what you’re looking for. I’ve seen so many models defined over the years without indexes on the most obvious columns that are queried a bajillion times. Or maybe you can set a unique constraint on certain fields without a bunch of effort—sqlalchemy is decent at this in the Python world—but then you have to have unique validators for forms, rather than relying on the db to complain about a violation of the constraint, and the ORM catching it in a friendly way devs can bubble up when needed. Oh, and then some insist on adding client-side validation of that same constraint. So, you take a constraint that matters at the db level, and is already easily handled at the db level, and people are writing client and server validation to enforce a condition the db is already supposed to be enforcing.

There’s madness here. Maybe it’s just me, though.

Well, there are problems with that approach too.

At the far end, any programming logic you put in the DB will be impossible to scale if needed. You can only have one DB, but any number of Rails servers.

Database constraints are a matter of preference and skill set. They make some things much harder when you have to delete things in certain convoluted orders, and I can get the correctness through tests. Your decisions may differ. That's fine.

You should always put indexes on things you query by. Be vigilant. This is the one case where "only optimize when you have a performance problem" rule doesn't apply.

You can certainly have more than one db.

  > any programming logic you put in the DB will be impossible to scale if needed.
  > You can only have one DB, but any number of Rails servers.
The difference in processing time between a database with no rules and one with all of them is usually negligible.

  > You should always put indexes on things you query by
I have put indexes on things, and PostgreSQL still decided it was more efficient to do a sequential scan. There are cases where they don't make a difference ("low cardinality" or something like that). Also, indexes slow down writes and eat disk space. Usually each is just a little. But it can add up sometimes.

The advice I see most is to run EXPLAIN and judge from that. Even after you've added the index and run ANALYZE, run EXPLAIN again to make sure that the database is using the index and that it makes a difference.
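A minimal before/after of that workflow, sketched with stdlib sqlite3 (EXPLAIN QUERY PLAN is SQLite's rough analogue of Postgres's EXPLAIN; table names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id INTEGER, body TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?)",
                [(i % 100, "x") for i in range(1000)])

def plan(sql):
    # Last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

q = "SELECT * FROM events WHERE user_id = 42"
before = plan(q)                  # full table scan
con.execute("CREATE INDEX events_user_id ON events(user_id)")
con.execute("ANALYZE")            # refresh the planner's statistics
after = plan(q)                   # should now mention the index
print(before, after)
```

Checking the plan again after adding the index is the important part: the planner is free to ignore an index it considers unhelpful.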

Indexes are useful for speeding up select statements, but they carry a significant cost on insert and update.

Sometimes it's better not to index, if you have a table that is updated often and queried in varying, complex, rarely repeated ways.

Yeah, I'm aware of that, and I'm sure these cases exist, but after 20 years around these things I have yet to see one in real life.

FWIW I've worked for multiple companies where tables couldn't reasonably be indexed.

Special cases though, usually write-mostly analytic tables that are queried once in a blue moon and have incredibly high write traffic (~millions of rows per hour).

> Don't do a SPA. Spend your time on SQL instead :D

I wholeheartedly agree that developers learn SQL way too little and way too late (I blame ORM/AR overuse early on) but SPA/SQL are possibly the least mutually exclusive issues I can imagine.

All business/data rules go into, and are enforced by, the database.

This was learned the hard way: creative sales people manipulating sales data (via ODBC & Excel on NT3.51 no less). Since all logic was in the applications, they had a field day when they bypassed those.

This was 1996. Not much has changed.

...well why wouldn't you put your constraints in the DB? Really, why wouldn't you?

Off topic: learning to exit vi takes "years to master"?

No way: you can just kill -9 it from another terminal.

The generic, editor-agnostic solution: kill pid -1

If exiting vi takes years... I shudder to think how long it'll take someone to learn the "Write with Sudo" trick.

The "vi, how to exit vi" part made me laugh.

Joke aside, the basics of SQL can be learned in a very short time. Before fancy ORMs, all novice programmers managed to get by using SQL in bug-ridden PHP projects ;). I, for example, got my introduction from querying a national database leak and then writing a simple blog with tricks I'd learned from sqlzoo.

Yes. Before my first Oracle project I did a week's course on PL/SQL and a week on Forms, and that worked out fine.

I've seen a litany of these kinds of posts and I'm always amazed by two things when I see them posted on HN:

1. A cadre of diehards can't wait to post how amazing Postgres is or would be for whatever it is the OP is doing (as an aside, why isn't Postgres more popular if it's so amazing?); and

2. How averse people are to actual SQL.

Years ago I dealt with this crap in the Java world back when Hibernate and the like were all the rage. I was always amazed at how much confirmation bias there seemed to be. People decided these ORMs were amazing and then completely ignored all the bugs introduced by this layer and effort spent trying to figure out what the ORM was doing and how to make it do the right thing.

Back in the day I always liked a Java data mapper framework called iBatis (now dead, replaced by Mybatis it seems), which was pretty simple. Write some SQL in an XML file and call that SQL from your Java code. It was parameterized (so no SQL injection issues) and you could still do some funky things with discriminated types and the like. Plus, analytics were super easy because you knew how often each query was called and how long it took. Also, you could easily EXPLAIN PLAN those queries if you even had to (usually needed indexes were obvious).

Compare this to the auto-generated SQL from the likes of Hibernate. ugh.

I've come to the conclusion that people have this tendency to decide X is bad and then go completely out of their way to avoid X. You see it with SQL and ORMs. It largely explains (IMHO) things like JavaScript and GWT.

At least half the time "X is bad" really means "I don't understand X and I don't want to learn it".

Joel Spolsky's "leaky abstractions" is good and time-honoured advice.

Take the Hibernate example. Once you bought into that framework you had to do all your data access that way or you broke the caching. That's mostly bad.

People also overestimate their needs. They rush to create Hadoop clusters and distributed NoSQL solutions because, you know, relational DBs can't keep up with their "Big Data" (which means, millions of rows) when in fact you can dump billions of rows into a single MySQL instance.

To be blunt, I see a lot of irony in your post.

> People decided these ORMs were amazing and then completely ignored all the bugs introduced by this layer and effort spent trying to figure out what the ORM was doing and how to make it do the right thing.

For every person that did this, I've run into at least one other that was convinced that ORMs were the most evil thing ever and that they should roll their own little Object Mapper. Every one of them would then completely ignore all the bugs introduced and effort spent trying to train new developers on their slightly unique thing and how to coerce it into doing the right thing.

"I've come to the conclusion that people have this tendency to decide X is bad and then go completely out of their way to avoid X. [...] Take the Hibernate example. Once you bought into that framework you had to do all your data access that way or you broke the caching."

There is nothing about Hibernate that requires you to fully buy into caching, or even to fully buy into its abstraction. In fact, I've generally avoided caching in Hibernate in order to retain the flexibility to do what I needed without it when I needed to.

Usually about 90% of the code ends up being really boring CRUD with trivial queries. The last 10% could be implemented with whatever crazy approach made sense.

And this isn't to knock those other frameworks either. iBatis and quite a few other early Java ORMs were quite good and arguably better than Hibernate was at the time, but Hibernate was marketed more effectively.

+1 for MyBatis. Good library that. If you're looking for something similar these days, JDBI [1] is great too.

[1] - http://jdbi.org/

> 1. A cadre of diehards can't wait to post how amazing Postgres is or would be for whatever it is the OP is doing (as an aside, why isn't Postgres more popular if it's so amazing?); and

I'm a happy MySQL user, all of my side projects and on the job work is done in MySQL. That said, how else would Postgres become more popular if there isn't some level of evangelism to spread the word? I like reading about Postgres features and maybe someday I will switch.

For now, with my/our needs, MySQL is fine

Wasn't aware of Mybatis, thanks :) With Clojure, I really like HugSQL for similar reasons (You're dealing with real SQL, rather than another fat layer(s) of abstraction).

Same here, rapidly growing startup. Hibernate introduced many performance problems; we replaced it with JDBCTemplate, which was much easier to reason about. Also, Hibernate upgrades were a pain.


http://codemonkeyism.com/beware-magical-code/ (2010)

I liked iBatis quite a bit too. Basically mapping hand bombed SQL to functions instead of objects (using XML ugh, but that was the norm at the time).

Hibernate was just horrible and I had to watch my entire team adopt it (EJB 1 era).

Exactly why ORMs are a bad idea. I've always wondered whether ORMs help or harm. I feel like the one reason to use it is if you have a development team that isn't capable of writing SQL which in itself is bad.

Well the problem is, if you don't use an ORM, you'll invent one yourself...only poorly. And new hires will end up taking a ton of time to learn this "custom" orm framework of yours.

And there are many bad, or simply "too simple", ORMs out there that really don't help you much. I haven't really found one better than Perl's DBIx::Class (though outside of Java, Python, and Ruby, I haven't really looked).

Case in point, I recently found this little gem[0] allowing easy correlated subqueries. I basically took a webpage that was loading in over 2 minutes and reduced it to about 100ms; the outputted SQL was about 2 pages long (from about half a page of custom resultset orm code). I used several correlated subqueries to sort of pivot part of a table (well, several tables actually). The original author of the code I was working on was fetching entire tables of data, with each row fetching (joining) to another entire table (it did this at several levels really), just to sum some values. This was all valid ORM code (even with handy 'if' statements checking every row id to do an ORM join (yes, he didn't even use a dang where clause!)), but it showed a true lack of knowledge of the ORM at hand and even SQL in general. Nonetheless, I saved the day :)

[0] https://blog.afoolishmanifesto.com/posts/introducing-dbix-cl...

You can absolutely build a product on top of an SQL database, by just writing the SQL queries you need.

As a bonus, if you're thinking ahead, you can avoid writing queries that will perform poorly. If you're not thinking ahead, at least you can more easily fix the ones that do.

I've never really understood why people want to interact with their database outside of its native query language. There's just so much mismatch between an ORM and the underlying interface, it feels like fighting all the time. And I don't feel like it's really saving any time either. The one thing that's sort of nice is it's easier to write a general filtering function, but SQL filtering lets you write so many things that won't perform well, making that easier isn't really helping anybody.

I've been thinking about writing SQL queries again recently because I'm using too many ORMs and at least one of them is unnecessarily complicated.

Writing queries is easy. Validating data is not too difficult. Building objects/whatever from the resultsets is not so easy anymore, especially with joins.

By the way, the mistake this post is about is a classic with Rails. You have to enforce uniqueness with a unique index in the db, not by making queries first. You want to protect your data anyway, regardless of which application you start building your db for. Others will follow, possibly in different languages.

I don't want a full-fledged ORM, and I sort of like SQL, but just normalizing nested objects to put them in the database and then unnormalizing them on the way out is really annoying to do in raw SQL in my experience. Every time you want to store a new object in the database, it feels like so much effort to write a bunch of boilerplate functions. Or do you have a solution to this?

I enjoy using SQL in an environment with first-class hash/map/whatevers; I don't like to build up an object. SELECT X, Y, Z FROM ... gets you an array of things with X, Y, and Z keys.

Turning your hash into parameters for an insert isn't going to be great fun, although perl hash slices aren't bad. I think I've seen something about named SQL variables instead of just ?; that might be usable with a hash directly?
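In Python's stdlib sqlite3, at least, named placeholders can indeed be fed straight from a hash/dict; a small sketch with an invented table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE locations (city TEXT, country TEXT)")

params = {"city": "Oslo", "country": "NO"}
# Named placeholders (:city, :country) pull values straight from the dict,
# so a hash maps onto an INSERT with no positional bookkeeping.
con.execute("INSERT INTO locations VALUES (:city, :country)", params)

fetched = con.execute("SELECT city, country FROM locations").fetchone()
print(fetched)  # ('Oslo', 'NO')
```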

If you're using python, you can use named tuples for everything. It's really nice! Check out the psycopg2 namedtuple connection.

I just glanced at this, and maybe I don't understand, but it sounds like psycopg2 namedtuples only work on one table at a time; is this right?

The namedtuple is just a datastructure interface with psycopg2. For instance, you could write an arbitrary query "SELECT left.x AS cats, right.created_at FROM left JOIN right ON etc" and you'd get namedtuples with attrs `cats` and `created_at`. Similarly, you can insert namedtuples just like rows, since they are iterables, or as named parameters with the nt._asdict() method. It's great!
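For those outside Postgres/psycopg2, roughly the same ergonomics can be sketched with stdlib sqlite3 and a namedtuple row factory (a naive version that rebuilds the class per row; table and data are invented):

```python
import sqlite3
from collections import namedtuple

def namedtuple_factory(cursor, row):
    # Build a namedtuple class from the cursor's column names.
    # (A real implementation would cache the class per query.)
    fields = [col[0] for col in cursor.description]
    return namedtuple("Row", fields)(*row)

con = sqlite3.connect(":memory:")
con.row_factory = namedtuple_factory
con.execute("CREATE TABLE pets (name TEXT, species TEXT)")
con.execute("INSERT INTO pets VALUES ('Misu', 'cat')")

r = con.execute("SELECT name, species FROM pets").fetchone()
print(r.name, r.species)  # attribute access, like psycopg2's namedtuple rows
print(r._asdict())        # and back to a dict, handy for named-parameter inserts
```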

> if you don't use an ORM, you'll invent one yourself

Not true. You may use a graph database, or write SQL statements & apply functional transformations. Or you might be using a platform like smalltalk & just save your image. There are other paradigms besides OOP & relational databases.

> new hires will end up taking a ton of time to learn this "custom" orm framework of yours.

Or your new hire might really know how to leverage the particular database you're using, but you force him to throw away a lot of his experience & to start over from scratch & instrument the database through an ORM layer. There are a lot of ORMs, their APIs may even differ, so someone who knows Propel still has to stop & read the docs if you're using Doctrine, for example.

> it showed a true lack of knowledge of the ORM at hand and even SQL in general.

Quite the opposite, anyone who can craft a query that complex obviously knows the tool well. Their real problem is knowing when to not use that tool. Or they just write bad code, regardless of what tool/language they're using, for whatever reason.

I think the scope of the discussion is whether to use an ORM when using a database that supports SQL.

Or businesses that write their own database wrappers "because we might switch to Oracle one day and this will isolate the code from that change"

> You may use a graph database

There you go. I need 6 more months to deliver now.

ORMs try to solve a hard problem, and the devil is in the detail.


It surely is hard. But is it worth solving?

The price you pay for a one-size-fits-all solution is that you lose the flexibility of arranging your queries the way you want. You'll automatically lose a lot of performance (that is not always important), but you also lose a kind of code clarity.

If you don't get a one-size-fits-all solution, you will have to write the translations by yourself. That leads to more code, but in a better structure because you don't need to beat your code until it fits the ORM's model.

All said, I do think an ORM is not something useful by itself, but may be part of a very useful solution. Python Django's CRUD generation and Haskell Persistent (not really ORM, but very similar) type checking are examples of that.

>And new hires will end up taking a ton of time to learn this "custom" orm framework of yours

Or your new hire actually learns how to use the tech that makes their code work. I could use fewer devs who have no idea how the code they write actually works.

> Well the problem is, if you don't use an ORM, you'll invent one yourself...only poorly

There is always a trade-off. The code you wrote, even if it is a poor re-implementation, you know inside out, since you wrote it. The third-party code may be better, but people usually make a lot of implicit assumptions and end up not using it correctly, so you end up in situations like this.

For DB layer, I usually prefer custom code so I know exactly what is going on under the hood.

The best of both worlds is just to... actually understand the ORM you choose to use.

I'm always confused by how reluctant folks are, in general, to read underlying libraries.

You don't even have to read the library source, just verify the SQL statements generated. Most ORMs let you drop into raw SQL for advanced clauses and sub-statements. For me the value of an ORM is composable business logic. For example, if whereHasLotsOfTrees is a complicated SQL subquery and has its business logic definition changed, I can edit only one function and have the new logic work everywhere.
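A toy sketch of that composability idea in plain Python, using the hypothetical whereHasLotsOfTrees rule from the comment as a reusable (fragment, params) pair shared by every query that needs it:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE forests (name TEXT, tree_count INTEGER)")
con.executemany("INSERT INTO forests VALUES (?, ?)",
                [("Sherwood", 50000), ("Tiny Grove", 12)])

def where_has_lots_of_trees():
    # The business rule lives in exactly one place; every query reuses it.
    return "tree_count > :min_trees", {"min_trees": 1000}

frag, params = where_has_lots_of_trees()
names = con.execute(f"SELECT name FROM forests WHERE {frag}", params).fetchall()
(count,) = con.execute(
    f"SELECT count(*) FROM forests WHERE {frag}", params
).fetchone()
print(names, count)  # [('Sherwood',)] 1
```

Changing the rule's threshold in one function updates every call site, which is the property the comment is after.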

I think you might be confusing 2 different ideas. An ORM is an Object Relational Mapper. Lots of companies build very big software without using an ORM. And the very concept of an ORM becomes meaningless if you use a language that rejects Object Oriented Programming. For instance, when using Clojure, some teams use Korma:


Korma is a DSL. And I think you are confusing the concepts of ORM and DSL. I'd agree that most teams, over time, tend to develop (or borrow) their own DSLs for interacting with the database. And I've seen teams that use the term DSL loosely to refer to any code they write to interact with the database, so long as they are doing more than concatenating strings of SQL. But they don't necessarily use ORMs.

While I agree with this, the quality of ORMs vastly differs. For instance, the ORM style that wraps sql in a dsl is strictly better than one that encourages model lifecycle callback hell, as activerecord does. The latter is never necessary to work with a database and is a terrible pattern long-term for composing database interactions: it is extremely difficult to reason about what will actually happen inside the transaction (or immediately after).

Tl;dr you aren’t wrong, but many home-rolled ORMs are simpler and much easier to use than, say, activerecord.

I think that QueryDSL on top of JPA on Scala is just about as good as it gets.

It is also complex and requires that you really master all the ingredients... If SQL is a mystery to you, it will just be a painful experience.

> if you don't use an ORM, you'll invent one yourself...only poorly.

My favourite SQL library is Clojure's HugSQL, which avoids this issue by letting you write SQL in .sql files and then generating functions for you that execute these queries. https://www.hugsql.org/
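The --name: convention is simple enough that a toy parser fits in a few lines; here's a hedged Python sketch (not HugSQL itself, just the idea, with invented query names):

```python
import re

SQL_FILE = """\
--name: get-location
SELECT * FROM locations WHERE id = :id;

--name: list-locations
SELECT * FROM locations ORDER BY country, state, city;
"""

def load_queries(text):
    # Split on "--name:" markers; map each name to its SQL body.
    queries = {}
    for block in re.split(r"^--name:\s*", text, flags=re.M):
        if not block.strip():
            continue
        name, _, body = block.partition("\n")
        queries[name.strip()] = body.strip()
    return queries

SQL_CMDS = load_queries(SQL_FILE)
print(SQL_CMDS["list-locations"])
```

A real library would additionally generate one function per name, but the lookup table is the core of it.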

> Exactly why ORMs are a bad idea

I'm not a full-out fan for ORMs but... this is not an ORM issue. It's an issue with an ORM of the ActiveRecord variety that pushes database-logic back to code level for some strange reason. The ORM I'm most familiar with (Doctrine2, Object Mapper pattern) doesn't do this. It does other weird stuff, but not this.

Just so. SQLAlchemy and many other ORMs allow you to simply define an index. ActiveRecord always reimplemented far too much of the database. I'm not sure whether that was due to the state of MySQL functionality at the time or what.

ORMs are a great idea and can save a lot of time and prevent a lot of errors. However, they are not a silver bullet. If you have performance bottlenecks you still need to know SQL and should be looking at the SQL that the ORM generates. There is also a 'mixed-mode' ORM or a lightweight ORM such as Dapper. Dapper allows you to write the queries in SQL but it makes it trivially easy to convert those to actual objects and allow you to parameterize them.

I don't really understand why people throw the baby out with the bathwater like that. What's the problem of using an ORM for 99% of your queries and doing the other 1% in raw SQL? Many ORMs (like Django's) even help you by mapping the results back into models automatically[1].

[1] https://docs.djangoproject.com/en/1.11/topics/db/sql/#mappin...

I use something like Dapper on one of my services. I don’t consider it “throwing the baby out with the bath water”. My experience with ORMs is that they waste far more time debugging odd behavior and working around quirks and poor generated queries than they save in coding effort. I’ve never felt that ORMs actually saves me that much effort except for the boilerplate of deserializing into objects. Everything else I feel is a net negative. Writing SQL generally isn’t that hard. For the cases where it is hard, ORM generally does a poor job anyway.

The search for the holy grail continues..

SQL strings in source code are worse than ORMs, even if you move it out to a resources file of some kind.

No one likes copying the SQL command to a SQL IDE, making some changes, running it to make sure it works, and then copying it back to source. If you don't do that, you're giving up all the productivity that syntax checkers provide and risking typos or other dumb mistakes.

There is a middleground too - for instance, sqlalchemy in python allows you to use code to build queries. It's python, it does quoting right (lowering lots of sqli concerns), and it has an orm layer on top of the query builder.

You get to use the orm for a lot of braindead simple, and mild complexity stuff, and can drop down to the sql builder as it makes sense for complexity or performance reasons.

It also understands the various databases very well, so if you choose e.g. postgres, you can access most of the features without needing much (if any) "raw sql".

I adopted an idea (I don't remember the original library where I saw it) where ALL the sql strings are in a single .sql file.

Exactly as with sql upgrade scripts.

It look like this:

    --name: post-event


    --name: get-location
    SELECT * FROM "Location"
        WHERE id = @id;


    --name: list-location
    SELECT * FROM "Location"
    ORDER BY country, state, city;

Then I just parse this file (note the names with --) once and use it like this (in F#):

    module Location =
    let ENTITY = "Location"
    let SQL_LIST = SQL_CMDS.["list-location"]
    let SQL_BYID = SQL_CMDS.["get-location"]

    type LocationRecord = {id:int64 option; address:string; country:string; state:string; city:string; version:int64}

    type LocationQuery =
        | All
        | ById of int64

    let query q =
        use con = openConn()

        let toRec = Db.toRecord<LocationRecord>

        match q with
        | All ->
            Db.select con SQL_LIST []
            |> Seq.map toRec

        | ById(theId) ->
            let id = [P("@id", theId)]
            Db.select con SQL_BYID id
            |> Seq.map toRec
And plug in a micro-ORM (mainly just a very thin layer over ADO.NET in .NET; I do similar in Swift and Python).

This took me just a few hours. It lets me test the sql easily. I can build the sql exactly as I want.

The only con is the repetition in the scripts - because SQL is a terrible language that does not allow composability, similar to CSS. Probably I will later use a template parser (mustache or similar), but I think this is the closest to the holy grail ;)

So, you update a 5,000+ line file every time you need to change a query?

What do you do when you have five similar queries, and each needs to be changed if one changes?

This seems like it might work for smaller projects, but I'd be concerned about using this strategy on a larger one.

Here's a clojure library designed around it. (https://github.com/layerware/hugsql)

Not a fan of this strategy in particular, but I can see arguments for it.

>What do you do when you have five similar queries, and each needs to be changed if one changes?

In short, I truly use the DB to full effect.

I don't follow the mantra of using a DB "only" as a dumb datastore - something I feel validated about when people using NoSQL have no option but to use all of it.

This means that I use views and stored procedures/functions.

This cuts out a lot. Proper modeling of tables also helps (a lot). I refactor DBs.

I mean, I truly treat a DB as "code".

Obviously, this strategy is not flawless, and it would benefit from some kind of ORM-ish layer to auto-generate the mechanical parts of the queries. I'm building that part myself.

But so far, it has shown itself to be good.

We are happily using JOOQ with Java + Postgres.

I strongly recommend it. Having statically typed representation of the schema has many pros and the trade-off is a little overhead when changing the db schema to regenerate classes.

Some of the benefits include knowing exactly what part of the code stops compiling after changing the db schema, easily map POJOs to-from query results.

Also, there's simply no pressure that you have to be absolutely certain what each and every knob on your ORM does, or else you have bad data in your database. I've been burned before by Hibernate defaults, and the thought of a junior dev creating the wrong conf is just not worth it.

The database is basically the place where you least want a risk of misuse to have any unseen side effects.

>No one likes copying the SQL command to a SQL IDE, making some changes, running it to make sure it works, and then copying it back to source.

Tbh that's currently my preferred methodology, just because it's usually nicer to clean up a mistaken query in something like sequel-pro, and it's generally easy to verify correct output visually.

And I've had to do the same thing with ORMs like sqlalchemy, and the only real difference is I first have to get the ORM to tell me what it generated (which is a pain, sometimes).

IntelliJ tools are capable of parsing, linting, and autocompleting SQL in strings.

To my own surprise, it has become my favorite way to handle queries with multiple/variable joins.

Also, in languages that support HEREDOCS, if you call the HEREDOC "SQL" then VIM and the Intellij editors will syntax-highlight the SQL.

It's not only that: it connects to your database and you have autocompletion for tables, columns, and any schema objects, as well as linting. All that inside any string.

For example:

$sql = "select myColFromTableB from tableA";

Inspection will give you an "unknown column" warning.

(ps: sorry for late reply)

Thank you, I did not know that! It is not working for me, but I'll poke and peek around for a setting.

Maybe I'm doing it wrong, but I like to keep simple queries in an ORM, and then execute anything complicated as a stored procedure.

No, that's a perfectly practical approach. You're much less likely to make mistakes on procedure names vs. full queries. The trade off is managing procedure versioning and deployments of procedures.

Ultimately it's a style preference but I've seen larger projects with thousands of lines of SQL in source with some of those queries spanning 6-7 joins with dozens of where clauses (plus string manipulation functions, ugh), so that is why I'm pretty fiercely against SQL in strings.

Sometimes I see monster queries like that and think that they could have saved themselves several lines of code if they were willing to export the data and run some of the logic in the native language.

Sometimes it seems like people get hung up on doing everything in one query and end up making a Frankenstein's monster that nobody will want to touch when it inevitably breaks on the next database upgrade. Although sometimes the system forces this behavior by ingesting the output of the query automatically.

Well, it depends.

At least in my experience, assuming those queries are written "correctly" (that is, in a set-based manner rather than a C-in-SQL manner), leaving them in the database is usually vastly more performant and easier to verify for correctness. I've written a lot of many-hundred-line monsters with joins, CTEs, table-valued functions, and a whole lot of other stuff piled into them that were both clearer and much faster than the arcane application logic they replaced. Usually if it's used in the application, some management type will also want some kind of report based on pretty much the same query, too. And having it as SQL also makes it much quicker to compare the before-and-after if any changes are made.

Of course, YMMV and what I said isn't true all the time, and would certainly also depend on your database server - if queries did break between upgrades I'd be much more hesitant to rely on the db server.

But either way, I wouldn't dump a monster like that right in source code - that's crazy!
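
As a toy illustration of how a CTE keeps a set-based query legible, here is a sketch using Python's stdlib sqlite3 as a stand-in for a real database server; the `sales` schema and data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales(id INTEGER PRIMARY KEY, rep TEXT, amount REAL)")
conn.executemany("INSERT INTO sales(rep, amount) VALUES (?, ?)",
                 [("ann", 10.0), ("ann", 30.0), ("bob", 5.0), ("bob", 15.0)])

# The CTE names an intermediate result instead of nesting subqueries,
# which is part of why large set-based queries can stay readable.
rows = conn.execute("""
    WITH per_rep AS (
        SELECT rep, SUM(amount) AS total
        FROM sales
        GROUP BY rep
    )
    SELECT rep, total FROM per_rep WHERE total > 10 ORDER BY rep
""").fetchall()
assert rows == [("ann", 40.0), ("bob", 20.0)]
```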

What do you mean by "set-manner than a C-in-SQL manner"?

(Not OP, but I feel like I've seen this phenomenon before.) In C, the natural idiom would be to read in a giant file line by line, doing something with each line and eventually doing something else based on some summary of everything read so far. Lots of dumb SQL code also does something like this, e.g. by generating some ginormous join and then whittling it down to something usable. Better SQL code filters before the join, and often uses indexes to do so.

There are people in this world who do things like use cursors in place of joins, for example.
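
To make the "set-based vs. row-by-row" distinction concrete, here is a hedged sketch using stdlib sqlite3 (the schema and data are invented): both approaches compute the same sum, but the cursor-style version pays one query per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers(id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders(id INTEGER PRIMARY KEY, customer_id INT, total REAL);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, "EU" if i % 2 else "US") for i in range(1, 101)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, (i % 100) + 1, float(i)) for i in range(1, 501)])

# Set-based: one statement; the filter is in the WHERE clause, so the
# planner joins only the relevant rows (and could use an index on region).
set_based = conn.execute("""
    SELECT COALESCE(SUM(o.total), 0)
    FROM orders o JOIN customers c ON c.id = o.customer_id
    WHERE c.region = 'EU'
""").fetchone()[0]

# Cursor-style "C in SQL": iterate row by row in the application and
# fire one aggregate query per customer -- same answer, far more round trips.
cursor_style = 0.0
for (cid,) in conn.execute("SELECT id FROM customers WHERE region = 'EU'"):
    (s,) = conn.execute(
        "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
        (cid,)).fetchone()
    cursor_style += s

assert set_based > 0
assert abs(set_based - cursor_style) < 1e-9
```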

The latency of the network is comparatively very high.

This is why the N+1 problem is such a big deal, and why you can get results orders of magnitude faster if you query through the database.
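
A minimal sketch of the N+1 pattern, again with stdlib sqlite3 and invented `posts`/`comments` tables: the naive loop issues 1 + N queries (each of which would pay network latency against a real server), while the JOIN fetches everything in one round trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts(id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments(id INTEGER PRIMARY KEY, post_id INT, body TEXT);
""")
conn.executemany("INSERT INTO posts VALUES (?, ?)",
                 [(i, "post %d" % i) for i in range(1, 51)])
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)",
                 [(i, (i % 50) + 1, "hi") for i in range(1, 501)])

queries = 0

# N+1: one query for the posts, then one more query per post.
posts = conn.execute("SELECT id FROM posts").fetchall()
queries += 1
for (pid,) in posts:
    conn.execute("SELECT body FROM comments WHERE post_id = ?", (pid,)).fetchall()
    queries += 1
assert queries == 51  # 1 + N round trips

# Set-based: a single JOIN returns the same data in one round trip.
rows = conn.execute("""
    SELECT p.id, c.body FROM posts p JOIN comments c ON c.post_id = p.id
""").fetchall()
assert len(rows) == 500
```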

> No one likes copying the SQL command to a SQL IDE, making some changes, running it to make sure it works, and then copying it back to source.

And how exactly are you making sure your ORM code works?

I rely on the simple operations to be pretty reliable. Most have some way to get to the actual SQL that will be sent to the DB though, so you can check that if you're suspicious.

That sounds more like a tooling problem.

Actually, I started to put SQL directly into my source code. IntelliJ, for instance, has really good support for embedded SQL code (syntax highlighting, auto-completion, executing queries including placeholders etc.).

SQL strings should always be in the hands of a DBA, not a developer. The RDBMS should get your request in the form of a SP call and get the data you need in the form of cursors.

So, SQL strings belong to the metal, not the UI.

There are plenty of devs out there perfectly capable of writing SQL (and plenty of DBAs who are actually just developers). It's not rocket science and is entirely grounded in CS.

Sounds like you should face your fears and learn SQL if you think it's so arcane that somebody else needs to write every and all queries for you. It's not as hard as you think.

Though there are certainly cases where it helps to be able to find someone more experienced and knowledgeable than you.

Or perhaps you're a DBA that's tooting your horn so loudly that you don't believe anyone can do what you do.

So wrong.

I thought this thinking was 10 years dead at this point.

That ... depends where you work. Some places (I've, uh, heard, yes) have DBAs that won't let the devs touch the database and either enforce ORM usage or vet every query before allowing it to release.

And, to be fair, having seen some of the abominations people have committed with SQL, I only half blame them.

Well, it's simply an organizational decision that a company is making at a certain scale. Division of labor and expertise. Fair enough.

But the issue is with saying "no developer should be writing SQL strings, ever".

It's like saying "no designer can handle HTML, leave that to the professionals".

I think ORMs are there to make developing easier, but that doesn't obviate the need for the user to understand how the underlying database works. For one, based on my understanding of their use case, they could've used a validation with :on => :create only. Likewise, if case-insensitivity is needed, they can enforce that on record insertion or, if they're using something else like MySQL, just use a case-insensitive collation. The fact that I know how to write SQL doesn't mean it's always easier/faster/better than using an ORM...
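
Enforcing case-insensitive uniqueness in the database itself, as the functional-index comments up-thread suggest, can be sketched with stdlib sqlite3 (assumes SQLite ≥ 3.9 for expression indexes; this mirrors the `repos(LOWER(name))` index idea but is not the article's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repos(id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# Functional unique index: the database, not the application,
# guarantees case-insensitive uniqueness of repo names.
conn.execute("CREATE UNIQUE INDEX repos_name_lower ON repos(LOWER(name))")

conn.execute("INSERT INTO repos(name) VALUES ('Rails')")
try:
    # Differs only by case, so the expression index rejects it.
    conn.execute("INSERT INTO repos(name) VALUES ('rails')")
    raised = False
except sqlite3.IntegrityError:
    raised = True
assert raised
```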

> I think ORMs are there to make developing easier

Only for object-oriented apps, where you want to tightly link behavior to your objects. For other kinds of app architectures, there may be no relational/object-oriented mismatch to smooth out.

ORMs let you write a lot less code since you don't have to do the object-relational mapping yourself. That's just one of several reasons I usually recommend them. The issue is, as you've pointed out, that people sometimes use ORMs as an excuse not to understand SQL, which is a terrible idea. But the bottom line is that you have to optimize all your database calls, including ones implemented with an ORM.

ORMs aren't the problem, they're simply a tool. Developers who never bothered to learn SQL, indexing, relational theory or schema design are the problem. Incidentally, this aversion to learning is also how we ended up with MongoDB.

I think ORMs are very nice if you take the time to occasionally profile your DB and look at the SQL they generate. You can still optimize slow queries by hand-writing SQL. I definitely prefer an ORM codebase over one with direct SQL statements. Best in my view is to abstract all database access into a separate layer. A little more work, but much easier to maintain and tune if needed.

Sounds like stating that we should all use assembler.

No, you could write raw SQL (we do this a lot in our Rails app) or perhaps push your abstractions out to stored procedures (not a good option IMO, but maybe better than letting my framework come up with the best SQL).

ORMs are great for prototyping new business systems. The performance is not great, but when your priority is to quickly model business logic into something functioning, there are few better tools for it.

However, if you have well-defined requirements and are expecting high-volume traffic from the get-go, then it could be wise to skip the ORM.

ORMs aren't a bad idea; they just aren't a drop-in replacement for in-memory data structures for traditional imperative languages. They drop in rather well for functional languages though. Also, many ORMs are simply flat-footed, but we shouldn't fault ORMs generally any more than we should call C "slow" because some compiler is particularly bad.

It can be nice to have the whole stack type-checked, so you know the query is good as you write it. There are a few languages/libraries where you can do this while also writing SQL directly, so you can kind of get the best of both worlds. Like F# SQL type providers, where you can get compile-time (or editor-time) checks of your queries against the DB or a schema.

If you're in that position, run away and join the Foreign Legion to forget the "horror".

All web developers ought to know the basics of SQL, i.e. how to SELECT, INSERT, and UPDATE.

Well, this actually had nothing to do with using an ORM. If he had put that same validation into an SQL call and run it on update, it would have had the same effect.

> Rarely do we consider how one query or a series of queries could interact to slow down the whole site.

That doesn't seem right to me. It's almost always something to consider when designing the data model. Maybe I'm being uncharitable, but this seems to me to be equivalent to claiming we rarely consider the use of an algorithm or interactions between algorithms and data structures when writing some code. I mean, for toys that's fine, but I wouldn't defer this discussion for something I intended for public use.

> It turns out it was coming from this line in my model. This innocuous little line was responsible for 80% of my total database load. This validates call is Rails attempting to ensure that no two Repo records get created with the same username and name. Instead of enforcing the consistency in the database, it put a before commit hook onto the object and it’s querying the database before we create a new repo to make sure there aren’t any duplicates.

I still can't believe that Rails even attempts this. It's simply not possible to do this kind of enforcement in a race-free manner using SELECT statements with Postgres.
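
The race is easy to demonstrate by hand-interleaving two "requests" against stdlib sqlite3 (a simplified stand-in; the schema and repo name are invented). Both pass the validation-style SELECT before either inserts, so only a unique constraint actually prevents the duplicate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repos(id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

def taken(name):
    return conn.execute("SELECT 1 FROM repos WHERE name = ?", (name,)).fetchone() is not None

# Two interleaved requests, each doing the validation-style check first:
assert not taken("acme/widgets")  # request A: name looks free
assert not taken("acme/widgets")  # request B: name still looks free
conn.execute("INSERT INTO repos(name) VALUES ('acme/widgets')")  # A inserts
conn.execute("INSERT INTO repos(name) VALUES ('acme/widgets')")  # B inserts too
dupes = conn.execute(
    "SELECT COUNT(*) FROM repos WHERE name = 'acme/widgets'").fetchone()[0]
assert dupes == 2  # the SELECT-based check raced; a duplicate got through

# With a UNIQUE constraint, the database itself rejects the second insert:
conn.execute("DELETE FROM repos")
conn.execute("CREATE UNIQUE INDEX repos_name_key ON repos(name)")
conn.execute("INSERT INTO repos(name) VALUES ('acme/widgets')")
try:
    conn.execute("INSERT INTO repos(name) VALUES ('acme/widgets')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
assert rejected
```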

I really like all these debug and analysis reports coming into HN lately.

Yeah - I read some of them, and while I don't necessarily 'get' all of it, there's a lot to be learned in all these things, and it's so good that people spend the time to cover in depth what the issues were and how to fix them.

For all the "ORMs are bad" people:

> I’ve been seeing that stray 30s+ spike in request time daily for months, maybe years. I never bothered to dig in because I thought it would be too much trouble to track down. It also only happened once a day, so the impact to users was pretty minimal.

TL;DR - spend time configuring it, write efficient code, and understand what your 3rd party code is actually doing.

TL;DR: your ORM might generate heavyweight queries.

Edit: in this example, the case-insensitive uniqueness validation in ActiveRecord.

Rails is a pretty popular and widely used framework. How come something like this was not caught earlier? Presumably it affects all Rails applications.

Why would anyone code this kind of inefficiency instead of using built-in constraints? Has the code been reviewed and tested? Too many questions.

It's surprising, given how popular Rails was and still is, that something which should have been caught in the early days of Rails is only being discovered years later. Aren't all the production apps seeing this? Didn't Twitter see this?

The real concern is that a lot of highly promoted technologies on HN do not get the proper technical scrutiny that one should take for granted in a technical forum, and increasingly hype is conflated with quality.

Those who ignore Codd will learn it the hard way, forcefully.

That just defines what relational is; it doesn't prescribe it as a best practice. There are use cases where it's ill-advised, like EAV.

TL;DR: guy has a scheduled job which puts load on the db; instead of implementing rate limiting, he removes a query which normally runs for 1.9ms.

I've been doing this every day for the past 10 years. This is my bread and butter. http://www.jonathanlevin.co.uk
