
My experience is strongly flavoured by developing and optimizing on the edge of what the database is able to do; and my performance problem isn't scaling up millions of tiny trivial queries, but getting big hairy dynamically generated queries to execute efficiently within a deadline.

No ORM comes remotely close to being able to solve this kind of problem. The kinds of things I need to consider:

- analyzing select / predicate / order / etc. for tables used and selectively including tables

- forcing query evaluation order by creating a subtable query with conditions, projecting out an id, and rejoining in the next query up

- analyzing predicates and rewriting predicate trees to push predicates down into subtable queries, or convert predicates into joins, or replace references to columns on other tables with denormalized columns

- gathering heuristics about query set size and changing query generation technique

- splitting queries into parallelizable queries and composing result set client-side
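To make the second bullet concrete, here's a minimal sketch of "project out an id and rejoin" using Python's sqlite3 with invented table names; the exact nesting that pays off is of course engine- and schema-specific:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, total REAL);
INSERT INTO customers VALUES (1, 'EU'), (2, 'US'), (3, 'EU');
INSERT INTO orders VALUES (1, 1, 10.0), (2, 2, 20.0),
                          (3, 3, 30.0), (4, 1, 40.0);
""")

# One flat query: the planner chooses which table gets reduced first.
flat = """
SELECT o.id, o.total
FROM orders o JOIN customers c ON c.id = o.customer_id
WHERE c.region = 'EU' AND o.total > 15
ORDER BY o.id
"""

# Staged rewrite: apply the selective predicate in an inner query that
# projects out only ids, then rejoin one level up. The nesting pins
# down which relation is reduced before the join.
staged = """
SELECT o.id, o.total
FROM (SELECT id FROM customers WHERE region = 'EU') eu
JOIN orders o ON o.customer_id = eu.id
WHERE o.total > 15
ORDER BY o.id
"""

rows_flat = cur.execute(flat).fetchall()
rows_staged = cur.execute(staged).fetchall()
assert rows_flat == rows_staged  # same answer, different query shape
```

Both forms return the same rows; the point of the rewrite is control over evaluation order, not the result.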

An ORM doesn't really help me. Even a library tuned for symbolic manipulation of SQL doesn't help hugely; my life is made much simpler when I can freely extend the SQL AST classes with my own analytic and rewrite methods.
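By "extend the AST classes" I mean something like this hypothetical, stripped-down sketch: the node types (`Col`, `Pred`, `Select`) and the `push_down` method are all invented for illustration, but they show why owning the classes matters — analysis and rewrite logic live as methods on the nodes:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Col:
    table: str
    name: str

@dataclass
class Pred:  # a binary comparison predicate
    col: Col
    op: str
    value: object

    def tables(self) -> set:
        return {self.col.table}

@dataclass
class Select:
    columns: List[Col]
    table: str
    where: List[Pred] = field(default_factory=list)  # implicitly AND-ed

    def push_down(self, sub: "Select") -> List[Pred]:
        """Move predicates that touch only sub's table into sub."""
        keep, moved = [], []
        for p in self.where:
            (moved if p.tables() == {sub.table} else keep).append(p)
        self.where = keep
        sub.where += moved
        return moved

# Usage: push the region predicate from the outer query into a
# subquery over customers ("c"), leaving the total predicate outside.
outer = Select([Col("o", "id")], "o",
               [Pred(Col("c", "region"), "=", "EU"),
                Pred(Col("o", "total"), ">", 15)])
inner = Select([Col("c", "id")], "c")
moved = outer.push_down(inner)
```

A real implementation needs disjunctions, multi-column predicates, and correlated references, but the shape is the same: walk, classify, relocate.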

Isn't this the type of thing that 'the' (or a) query optimiser is meant to do?

With a suitably tweaked ORM, you can pass platform-specific hints down to the DB. The platform's query optimiser should be (obviously up for debate!) the one that's taking a query and working out the most efficient way of executing it.

Not saying it will always pick the best execution, but if you're getting to the point of supplanting the optimiser you're starting to replace the database itself.

At which point you're probably beyond the realm of what an ORM can and should reasonably be doing; you're starting to write your own custom persistence platform.

No, query planners are generally hopeless; for my use cases, it's trivial to get orders of magnitude better performance with different query structures.

If you have just a predicate, or a single join, most query planners are just about ok; with multiple tables and predicates touching those tables, there's ever increasing risk it will start the join on the wrong table and produce too many rows somewhere in the plan. Engines like PostgreSQL are unpredictably stupid (but sometimes very smart), while MySQL is predictably stupid (but consistently fast when you keep it simple and avoid specific constructs).

When designing for interactive use, you want to design for a maximum run time on the query. If that means using a query which is often not the fastest, but is never slower than X, it's the query to use. Smart planners are more of a hindrance than a help, because they reduce predictability.

The worst case is when a planner starts with a small row set (planners usually like starting with small row sets) and joins against a big table expecting a similarly small joined set, but ends up with an enormous result set because of the cardinality of the join. Estimating join cardinality is a tough problem for database vendors to solve, and it's critical to figuring out the correct join order - particularly since the number of join strategies grows with the factorial of the number of tables to join, so the planner can't spend much time on it. Meanwhile, the application developer may very well know the designed cardinalities and a join sequence that minimizes the chance of a blowup, the most important thing for interactive use.
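The cardinality blowup is easy to reproduce on paper. A common textbook estimate assumes uniformly distributed join keys, which a skewed ("hot") key breaks badly — a self-contained sketch with made-up data:

```python
from collections import Counter

# Two tables joined on a skewed key: key 0 is "hot" in both sides.
a = [0] * 1000 + list(range(1, 101))   # 1100 rows, 101 distinct keys
b = [0] * 1000 + list(range(1, 101))

# Textbook uniform-distribution estimate: |A| * |B| / max(ndv(A), ndv(B))
ndv = max(len(set(a)), len(set(b)))
estimate = len(a) * len(b) // ndv      # ~12k rows

# Actual join cardinality: sum over keys of count_a(k) * count_b(k)
ca, cb = Counter(a), Counter(b)
actual = sum(ca[k] * cb[k] for k in ca)  # the hot key alone contributes 10^6
```

Here the estimate is off by roughly two orders of magnitude — exactly the kind of surprise that makes a planner pick a join order it would never have chosen with accurate counts.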

Well yes, this is exactly what a query optimiser is meant to do.

What you're meant to do is add a few indexes, maybe sprinkle some hints into your query (platform specific) based on your rough knowledge of the statistics of your data, and let the optimiser/planner work out the best way to execute it.
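For a concrete flavour of "platform specific": SQLite spells an index hint as `INDEXED BY` (and its opposite as `NOT INDEXED`); MySQL has `USE INDEX` and `STRAIGHT_JOIN`, Oracle uses `/*+ ... */` comment hints. A runnable sketch with an invented table, using SQLite's syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, k INT, v INT);
CREATE INDEX t_k ON t(k);
INSERT INTO t VALUES (1, 10, 100), (2, 20, 200), (3, 10, 300);
""")

# Hint the planner toward the index, or forbid index use entirely.
# Same answer either way; only the access path differs.
hinted = "SELECT id FROM t INDEXED BY t_k WHERE k = 10 ORDER BY id"
unhinted = "SELECT id FROM t NOT INDEXED WHERE k = 10 ORDER BY id"

rows_hinted = cur.execute(hinted).fetchall()
rows_unhinted = cur.execute(unhinted).fetchall()
assert rows_hinted == rows_unhinted
```

The portability cost is visible even in this toy: every engine spells its hints differently, which is why an ORM has to "pass them down" rather than abstract them.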

Now yes, the optimiser may sometimes get it hilariously wrong, but I guess that's the price you pay for a relatively portable (across DB platforms, with some tweaking of the hints) solution.

The risk of going too far down the rabbit hole of dynamically rewriting the SQL is that you're essentially trying to second-guess the query optimiser/planner, and most people making software are going to do a poor job building a 'shadow optimiser' based on experience of a specific platform and how it behaves in observed scenarios. If the people building databases are finding it hard, there's probably a reasonably good reason.

I'd also note that this isn't really a problem with ORMs per se, but with relational DB platforms and their performance when running arbitrarily complex SQL.

You can concretely measure how much time it takes to scan X million rows of a table; how much slower it is for every join you add; how much slower an index is to scan for every column that's added to it; how much slower it gets for every column you add to the select clause; etc. And determining optimal join order in my case isn't particularly hard (it's a lot like a star schema; the key thing is not to bring in the whole of the big table in the middle).
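"Concretely measure" lends itself to a tiny harness. A sketch with sqlite3 and made-up table sizes — the absolute timings are machine-dependent, so the only claim baked in is that you measure, not guess:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INT)")
cur.executemany("INSERT INTO t VALUES (?, ?)",
                ((i, i % 97) for i in range(200_000)))
conn.commit()

# Time a scan at a few row counts; cost should grow roughly linearly
# with rows scanned, and the constant factor is what you record.
timings = {}
for rows in (50_000, 100_000, 200_000):
    t0 = time.perf_counter()
    (count,) = cur.execute(
        "SELECT COUNT(*) FROM (SELECT v FROM t LIMIT ?)", (rows,)
    ).fetchone()
    assert count == rows
    timings[rows] = time.perf_counter() - t0
```

The same harness extends naturally to the other axes mentioned: add a join, widen the index, widen the select list, and re-measure.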

I was a compiler engineer in my previous job; transforming and optimizing expression trees are my bread and butter. I'd give myself good odds of improving MySQL for my particular use case - I'd know where to look, what stats to collect and how to approach things differently. But likely my optimizations would be pretty specific to my situation, and not everybody would be happy with more predictable, sometimes slower performance.

The query optimizer faces the same limitations as an optimizing compiler - improving the query as much as it can given limited information, in as short a time as possible.

And it's frequently unsuccessful, for the exact same reasons as optimizing compilers - proper query optimizations require more information and time than what is available to those optimizers.

Hand tuned code created by an expert will always be faster than that produced by an optimizing compiler; hand tuned SQL created by an expert will always be faster than that produced by the query optimizer.
