How Does the SQL Server Query Optimizer Work? (xameeramir.github.io)
86 points by xameeramir on Jan 19, 2017 | 49 comments



Tangentially, I've always wondered why databases today still aren't capable of being self-tuning. In 2016 we're still manually creating indexes, and databases still use the exact same physical data structures for everything they store, no matter if they're storing 1,000 rows or 1 billion.

The problem isn't too dissimilar from the challenge that tracing profilers try to solve. One key difference, of course, is that dynamically adapting the physical storage strategy to usage patterns is relatively expensive, resource-wise. But not every problem needs to be solved for those users with 1 billion rows. I would argue that most applications — web apps, anyway — have datasets that are large enough to optimize, yet still small enough that a dynamic, self-tuning system could be entirely feasible.

The resource problem can be mitigated in several ways. One is to off-load processing horizontally, so that while a master is chugging away, a slave is building a table using a different strategy. Another is to simply perform expensive optimizations during off-peak hours.

Also, I find it interesting that databases still operate in a mode where reads and writes happen in the same physical store. I have an idea for a database that separates the two: You have the transactional, highly consistent, non-horizontally-scalable OLTP-style "transaction store" where a client carefully constructs reads and writes against the absolute truth, but where reads aren't super efficient; and then you have the "query store", which is a read-only, highly replicated and sharded set of mirrors that uses highly optimized data structures created on demand based on the topology of the data and usage patterns (in particular, the same data can potentially be indexed in multiple ways (B-tree, compressed columns, etc.) until the profiler finds the best strategy). The mirrors' replication should be tunable so that some "hot" subsets can be immediately consistent, and others can use slower batch transfers to reduce lock contention on updates.
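A toy sketch of that transaction-store/query-store split, just to make the idea concrete (all class and field names here are hypothetical; a real system would replicate asynchronously and choose index structures from observed usage):

```python
from collections import defaultdict

class TransactionStore:
    """Authoritative, write-optimized store: just an append-friendly dict."""
    def __init__(self):
        self.rows = {}      # primary key -> row dict
        self.version = 0    # bumped on every write

    def write(self, pk, row):
        self.rows[pk] = row
        self.version += 1

class QueryStore:
    """Read-only mirror that rebuilds its denormalized indexes in batches."""
    def __init__(self, source):
        self.source = source
        self.synced_version = -1
        self.by_country = defaultdict(list)  # one possible index strategy

    def refresh(self):
        # Batch "replication": rebuild indexes from the current truth.
        if self.synced_version == self.source.version:
            return
        self.by_country.clear()
        for pk, row in self.source.rows.items():
            self.by_country[row["country"]].append(pk)
        self.synced_version = self.source.version

truth = TransactionStore()
truth.write(1, {"name": "alice", "country": "NO"})
truth.write(2, {"name": "bob", "country": "US"})

mirror = QueryStore(truth)
mirror.refresh()
print(mirror.by_country["NO"])   # -> [1]
```

The point of the sketch is that the mirror pays the indexing cost in batches, off the write path, which is the tunable-consistency idea in miniature.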

Right now, when people build scalable systems, they often apply this design by putting the truth in PostgreSQL or whatever, and then indexing it into Elasticsearch or Solr. But there are huge benefits to combining the two into a single, unified system.


Because it doesn't really matter. A tree works just fine as an index, be it 100k or 1 billion records. The other part you mention - self-tuning - is also quite clear imho: either the DB is so small that indexes genuinely do not matter (and this can be surprisingly large today) or it's large enough that you don't want the DB to reorganize itself on a whim. We all know "smart" systems, and we've all experienced how flaky their outputs are. You don't want a DB that quite literally scratches its scratch storage because of that.

And if we are talking about a serious need for indexes, then we almost certainly are talking about subtle tradeoffs that are unlikely to be resolved the way we'd hope by algorithms that optimize aggregate metrics (e.g. average/median query time, cache efficiency, etc.) -- consider complicated reporting queries that don't run often, but take a while (and since they are seldom run, they have little influence on the database's performance metrics). A "smart" DB could decide to drop an index because doing so accelerates a common write operation, and no important queries needed that index anyway. But now your complex reporting queries are all turned into nested full-table scans. Oops...


There could be a lot of room for optimization if instead of saying "create index blah" you said "I want these queries to be fast: ..."

It's easy to lock an optimization plan in place. It's absolutely trivial, so there's no reason to be afraid of your DB reorganizing itself. Just tell it not to if you are satisfied with how it runs.

The problem is that people tend to need a ton of expertise and awareness when they restructure their data manually. Did they build the right indexes? Well... the DB doesn't know. It just builds what you tell it to. It could know how to optimize access, if only you gave it example queries rather than directives over its internal structure.

This is the advantage of using declarative language in lieu of imperative language. The DB is supposed to be a data manager. You should say "I want to run these queries, and I have this much memory for you" and it should know how to get your results as fast as possible.


The SQL Server Database Tuning Adviser will happily create all the indexes, stats, and what not, for a given workload. And yeah, you can speed up your queries (the workload) as a result. The problem is that there's always a trade off - those queries may now be fast, but inserts and updates end up being slow due to the burden of index maintenance.

There are also maintenance views that can be queried to see if SQL Server is detecting any missing indexes, or whether there are indexes that are not being used - but again these have to be taken with a grain of salt, as the stats get reset each time those indexes are rebuilt.

> Did they build the right indexes? Well... the DB doesn't know.

The DB does in fact know - looking at the query plan for a given query will outline if there are any obvious missing indexes - but having the DB go ahead and build indexes by default for every given query is dangerous. When you go through 10 iterations of an ad-hoc query refining it down to what you actually want, how's the DB to know that the first 9 don't count? Etc..
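A cheap way to see this "the DB does know" behavior for yourself, using SQLite as a stand-in (SQL Server's equivalent is the missing-index hint in the graphical plan; table and column names below are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail)
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT * FROM users WHERE email = 'a@b.c'"
before = plan(q)   # full table scan: "SCAN users"
con.execute("CREATE INDEX ix_email ON users(email)")
after = plan(q)    # now an index search via ix_email
print(before)
print(after)
```

The plan flips from SCAN to SEARCH the moment the index exists, which is exactly the signal a tuning tool reads.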

> You should say "I want to run these queries, and I have this much memory for you" and it should know how to get your results as fast as possible.

That is in essence what the query optimizer attempts to do, though it has breaking points where it will settle for 'good enough' because optimization of the query itself becomes too expensive. Sometimes people just write bad queries.


My point is that you don't build indexes for every query. You build them for the important queries, and you let the user tell you which queries are important, rather than which indexes to build.
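Something like this could sit behind an "I want these queries to be fast" declaration. A crude sketch with SQLite standing in for a real engine (table, index, and workload here are all invented): try each candidate index against the declared workload, time it, keep the winner.

```python
import sqlite3, time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b INTEGER, c TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)",
                [(i % 100, i % 997, "x" * 10) for i in range(50_000)])

# The "important queries" the user declared, instead of CREATE INDEX DDL.
workload = ["SELECT COUNT(*) FROM t WHERE a = 42",
            "SELECT COUNT(*) FROM t WHERE b = 123"]
candidates = {"idx_a": "CREATE INDEX idx_a ON t(a)",
              "idx_b": "CREATE INDEX idx_b ON t(b)"}

def run_workload():
    start = time.perf_counter()
    for q in workload:
        for _ in range(20):
            con.execute(q).fetchall()
    return time.perf_counter() - start

# Measure the workload under each candidate index in isolation.
timings = {}
for name, ddl in candidates.items():
    con.execute(ddl)
    timings[name] = run_workload()
    con.execute(f"DROP INDEX {name}")

best = min(timings, key=timings.get)
print("winner:", best, timings)
```

A real advisor would of course cost this with estimates rather than brute-force timing, and would weigh write overhead too, but the interface ("here are my queries") is the interesting part.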

Because honestly, the only reason any user cares that an index exists is that they need it for performance. It should be an implementation detail. Something better than an index might come along, and we've all got to run Victorian-age schemas because we've written "create index" all over the place.


Exactly! We're so stuck in the implementation language of the 1980s: index, tablespace, partition, hash-vs-merge-vs-nestloop join .... that we lose sight of what matters. "I want these queries to be fast" ... genius.


No. Not true at all. At around a billion records, you start getting into distributed/scale-out systems, and distributed indexes are performance death whether for OLTP or analytics. Even before you cross into distributed query processing, the random access nature of indexes levies a heavy tax. Indexes are a tiny piece of this issue. They address the problem of finding a single needle in a haystack, but if you need to aggregate lots of data, indexes are largely worthless.

What about different compression schemes for different sets of "column" values - with the idea that you execute against the compressed values, rarely ever decompressing. Even column store systems that allow you to specify a compression scheme aren't adaptive. Why the hell not? This is almost 2020.
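For what it's worth, "executing against the compressed values" can be sketched in a few lines: dictionary-encode a column and evaluate an equality predicate on the integer codes, never decompressing non-matching rows (the data and names are invented for illustration):

```python
values = ["ACTIVE", "CLOSED", "ACTIVE", "PENDING", "ACTIVE", "CLOSED"]

# Build the dictionary once (the "compression"); dict.fromkeys keeps
# first-seen order, so codes are assigned deterministically.
dictionary = {v: i for i, v in enumerate(dict.fromkeys(values))}
codes = [dictionary[v] for v in values]   # column stored as small ints

# The predicate "status = 'ACTIVE'" becomes one dictionary lookup plus
# integer comparisons against the codes; no row is ever decoded.
target = dictionary["ACTIVE"]
matching_rows = [i for i, c in enumerate(codes) if c == target]
print(matching_rows)   # -> [0, 2, 4]
```

An adaptive system could swap this scheme (dictionary, run-length, delta, ...) per column based on the observed value distribution, which is exactly what doesn't happen today.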


> At around a billion records, you start getting into distributed/scale-out systems, and distributed indexes are performance death whether for OLTP or analytics. Even before you cross into distributed query processing, the random access nature of indexes levies a heavy tax.

I assume you mean once you process/aggregate a billion records. Having a billion records in a single table on a single, non-distributed system is not an issue at all.


> At around a billion records, you start getting into distributed/scale-out systems

Uhh, a billion-row table you can handle on a laptop easily. Probably in SQLite.


> At around a billion records, you start getting into distributed/scale-out systems

Yeahhh.... no. I remember running a 250-300 million row Postgres table of Wikipedia links on my 2010 Dell XPS and it functioning fine, if a bit slow (it had a huge index on an array column). Good machine, but ancient now.


A profoundly important observation. Yes, using a limited set of physical structures simplifies the code and, in some cases, streamlines execution. But that's putting lipstick on a pig. With a handful of exceptions, database architectures were set in stone in the mid-1980s in Wisconsin. Which is ridiculous. The machines that they were designed to optimize don't even exist anymore. Take simple storage tiering as an example. Why don't infrequently accessed column values migrate to slower, cheaper storage automatically? Sure, databases have concepts of tablespaces and partitions, but those are WAY too coarse-grained for today's environments. And there are more than 100 issues like this. In short order, NVMe SSD will be ubiquitous, with 3D XPoint on the horizon. The data structures that make sense for spinning disk are just not equally optimal for the performance characteristics of SSD, RAM, and cache.

Regarding your point about reads and writes on the same store, you might take a look at the new "HTAP" systems (hybrid transactional and analytic). The VectorWise guy has a "new" (probably old by now) project called HyPer.


”I've always wondered why databases today still aren't capable of being self-tuning”

I think that’s because it would only be truly beneficial once you have huge tables, and you really wouldn’t want your database to suddenly decide “let me store this table ordered by its first column, as that seems to be faster for the queries I run now” if that choice leads to degraded performance for hours while the database reorganises tables, creates indexes, etc.

And it can get even worse; that decision might make the queries it runs _now_ go from “more than fast enough” to “way more than fast enough”, but also slow down the queries you run infrequently from “will do” to “glacial”.

I think we would be better off with a database that can easily generate a report showing queries run in the last few months, with suggested schema changes and their estimated impact on performance. SQL Server has the query analyzer, but I think that could be taken much further and made easier to use.

Edit: I see that, while I wrote the above, jws wrote that Digital's RDB had something like that.


"Tangentially, I've always wondered why databases today still aren't capable of being self-tuning."

It's still quite hard to optimize queries especially if there are numerous JOINs involved. For reference, please check Dr. David DeWitt's PASS Keynote slides http://www.slideshare.net/GraySystemsLab/pass-summit-2010-ke... and his additional papers on database systems: http://gsl.azurewebsites.net/People/dewitt.aspx.

I can't find the video of the keynote online, I believe it's only available to PASS Summit attendees, but if you can find someone with access it's a superb talk. And he didn't even address parallelism or distributed processing as optimizations. Microsoft has already added a new cardinality estimator to SQL Server 2014 and it's still not perfect.

"In 2016 we're still manually creating indexes, and databases still use the exact same physical data structures for everything they store, no matter if they're storing 1,000 rows or 1 billion."

As of SQL Server 2014 there are 3 different physical storage methods: row store, columnstore, and in-memory hash table store (formerly called Hekaton). Each type of storage has completely different optimization methods and are best suited for specific types of data and queries. Dr. DeWitt's page has additional detail on those, and you can watch his PASS Summit Keynote on Hekaton on Youtube: https://youtu.be/aW3-0G-SEj0?t=1336

With all this in mind, self-tuning is going to take quite a while to get to something that provides consistent improvement over detailed analysis and testing. And new technology can completely change how one would approach optimization: https://blogs.msdn.microsoft.com/bobsql/2016/11/08/how-it-wo...


> It's still quite hard to optimize queries especially if there are numerous JOINs involved

Sigh. In the real world (meaning 99+% of applications) there are a few big sets of data and lots of smaller sets of reference data. The thing that matters for performance is reducing the cardinality of the big stuff as quickly as possible. Which is a huge constraint on the search space.

Sure, in some theoretical sense, join ordering needs an A* algorithm or simulated annealing or some other esoteric approach to search. But in practical sense, it is almost a non-issue.

And here's the thing. Those systems with advanced search and costing for join ordering STILL make incredibly stupid decisions more than 3% of the time. And a bad decision can take hours to days and dominate all the good decisions.

DeWitt made incredible contributions to database architectures. He's a great guy and I'm a huge fan. But that was 30 years ago. Where are today's DeWitts?


> DeWitt made incredible contributions to database architectures. He's a great guy and I'm a huge fan. But that was 30 years ago. Where are today's DeWitts?

He just left Microsoft's Jim Gray Labs 1-2 years ago and is now at MIT. He also led the teams that developed Hekaton and PolyBase for SQL Server in the past 8-10 years. He's still making plenty of advances.


There have been a lot of tuning systems that take a workload and try to optimize things like index structures. They are tools that get your database "tuned" rather than "self-tuning".

However, I think this (very recent) research paper is a much better peek at how self-tuning might work:

https://blog.acolyer.org/2017/01/17/self-driving-database-ma...


> Tangentially, I've always wondered why databases today still aren't capable of being self-tuning

All the mainstream commercial DBs will happily generate you a script with self-tuning advice, you just need to read it and if you agree, run it.


This is probably the hottest topic for database service providers. Operating hundreds of thousands or millions of databases yields valuable collateral: operational data (what users are doing, metadata, table shapes, query patterns, application behavior, reasons for blocking, IO patterns, memory patterns, everything). Take the tuning wizards of 5 years ago, add machine learning on vast amounts of operational data, add 'shadow' A/B testing (run a secondary instance shadowing the same workload, on a copy of the database, but, say, apply an index recommendation to the shadow instance and measure the impact). When I left Microsoft one year ago, this whole area was seen as a gold mine and there was massive investment into it. You can already see such offerings as Azure SQL Database Advisor[1]. I'm convinced Amazon is doing its own research in this space, especially for Aurora.

We're talking much more than just index optimizations. Think automatic OLTP vs. OLAP pattern detection and migration of data into columnstores or in-memory tables. Automatic detection of cold vs. hot data and migration to-from expensive fast near storage (SSD) vs. cheap remote storage (S3 like). Automatic resizing of instance size. Materialized views. On-the-fly query rewrite. The field is wide open. And you have the best experts in the field looking at your database.

By contrast, traditional on-premise auto-tuning did not make much progress over the past 10-15 years. There are reasons for this: a small operational dataset (one site only), high risk of regression (difficult to do A/B testing), scarcity of experts. The best avenue for on-premise, I predict, will be the trickle-down effect of migrating service know-how and technologies to on-premise offerings. We're also going to see the know-how and experience from operating the service pushed up into the app stack (think EntityFramework v. 2018?), helping developers avoid the common pitfalls.

BTW, only 3 days ago Adrian Colyer's morning paper discussion was Self-driving database management systems [2]

[1] https://docs.microsoft.com/en-us/azure/sql-database/sql-data... [2] https://blog.acolyer.org/2017/01/17/self-driving-database-ma...


Very interesting, thank you.


Back in the mid '90s Digital's RDB (an early SQL database) had a companion product, Rdb Expert, which would analyze a live database over a period of time and make a script to set up its indices (ordered and hashed) and table placement rules to optimize a database.

It wasn't as good as a really good expert, but it was at least a 90% solution.

As I recall it was pretty good at getting a tuple and its one to many associated tuples into the same disk page so you needed fewer disk reads to access data. These were disks that were capable of about 35 seeks per second, so it really mattered.

(The drive I'm thinking of is the RA90. 1.2GB in a half of a 6U or so space. 32kg. 280 watts.)


mssql has this in the Database Engine Tuning Advisor

https://msdn.microsoft.com/en-us/library/ms174202.aspx


I'm running SQL Databases on Azure SQL, and it gives you recommendations to add indexes, and various other optimizations, that take 1 click to have happen. I get an email, and I can have the index created via a link in my email.

It's pretty fantastic.


It's based on functionality that all exists in the traditional on-prem SQL Server as well - certainly not the e-mail and point-and-click creation, but you can definitely query SQL Server DMVs for missing index information[1].

As with any suggested indexes though, they have to be weighed against their effect on other workloads. Additional indexes can have a big impact on inserts / updates / deletes.

[1] https://blogs.msdn.microsoft.com/bartd/2007/07/19/are-you-us...


Definitely true - it can't be mindless, but it's nice that it's put in front of my face w/ no effort.


> Tangentially, I've always wondered why databases today still aren't capable of being self-tuning.

http://pelotondb.io


It's still trivially easy to trick SQL Server into using the wrong index, let alone have it create its own.


If this tickles your fancy, you should check out this site http://use-the-index-luke.com or a book by the same author https://www.amazon.com/Performance-Explained-Everything-Deve...

He goes into great detail explaining exactly how SQL indices work and how to leverage that knowledge when writing queries.
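One of the central points he makes is that a composite index is only usable from its leftmost column(s). A quick illustration with SQLite standing in (made-up table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer_id INT, order_date TEXT, total REAL)")
con.execute("CREATE INDEX ix ON orders(customer_id, order_date)")

def plan(sql):
    return " ".join(r[3] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Leading column present: the composite index can seek.
p1 = plan("SELECT * FROM orders WHERE customer_id = 7")
# Only the second column: the index can't seek, so the table is scanned.
p2 = plan("SELECT * FROM orders WHERE order_date = '2017-01-19'")
print(p1)
print(p2)
```

The same leftmost-prefix rule holds for B-tree indexes in SQL Server, Postgres, and MySQL, which is why column order in a composite index matters so much.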


I have a blog article that covers a bit more details: http://rusanu.com/2013/08/01/understanding-how-sql-server-ex...


I remember the first time that I read this. I still reference it many years later.


> The reality is that, even after more than 30 years of research, query optimizers are highly complex pieces of software which still face some technical challenges.

Yep. It still amazes me how queries can go from absolutely great to scanning tables with a row added in the wrong place. WITH RECOMPILE still scares the heck out of me. It can really help when distributions change, but it can also kill your performance with a bad query plan choice.

In addition to the hints shown, you can also force certain indexes. This didn't seem to be as big a problem on SQL Server as on its parent codebase, Sybase ASE. Sybase made some really, really bad choices on big tables.
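In T-SQL that forcing is done with a WITH (INDEX(...)) table hint; SQLite's analogue is INDEXED BY, which makes for an easy demo of pinning the optimizer's choice (toy table, invented names):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INT, b INT)")
con.execute("CREATE INDEX ix_a ON t(a)")
con.execute("CREATE INDEX ix_b ON t(b)")

def plan(sql):
    return " ".join(r[3] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Let the optimizer pick whichever index it costs cheapest...
p_auto = plan("SELECT * FROM t WHERE a = 1 AND b = 2")
# ...or pin the choice, as an index hint does in T-SQL.
p_hint = plan("SELECT * FROM t INDEXED BY ix_b WHERE a = 1 AND b = 2")
print(p_auto)
print(p_hint)
```

The usual caveat applies: a pinned index is a bet against future data distributions, which is part of why hints scare people.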

No amount of optimization will fix a bad schema design.

This is also a field where talking to people about how you built your machines and where the indexes are located can make a world of difference. If you cannot have the whole database on the fastest storage, then at least put the indexes there.


> It still amazes me how queries can go from absolutely great to scanning tables with a row added in the wrong place.

I think that this is a bit of an oversimplification, though I understand the sentiment. Statistics skew is one of the primary reasons for such a dramatic shift in query planning and adding a row can create such a problem if you're already at some sort of skew tipping point and that one row pushes you over the edge.

It would be nice to know more about where the edge is and how to plan to ensure you don't drive your application's/server's performance off the cliff, though.


> I think that this is a bit of an oversimplification, though I understand the sentiment.

No, it really isn't. Cache/memory isn't a sliding scale, and once you exceed it you fall off the cliff. Database queries can fall off a cliff if you exceed your memory. I've watched a query that was executing in a couple of seconds drop into many-minutes territory because it had to start paging.

There is a real machine with limited memory running your queries and it has been my experience that warnings are rare.


Even if you have tons of RAM and the data IS there, performance is still disastrous if you have to perform seven-, eight- or nine-digit logical reads (millions or billions). This happens when SQL Server assumes it will use a loop join because it thought there was only 1 row to join. Oops.
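The arithmetic behind that "oops" is simple: a nested loop without an index on the inner side does one inner pass per outer row, so a bad cardinality estimate multiplies the cost, while a hash join stays roughly additive. A toy cost comparison (numbers are illustrative):

```python
def nested_loop_cost(n_outer, n_inner):
    # One full inner pass per outer row (no usable inner index).
    return n_outer * n_inner

def hash_join_cost(n_outer, n_inner):
    # Build the hash table once, then probe once per outer row.
    return n_outer + n_inner

est_outer, real_outer, inner = 1, 100_000, 1_000_000
planned = nested_loop_cost(est_outer, inner)   # looks cheap at plan time
actual = nested_loop_cost(real_outer, inner)   # catastrophic at run time
hashed = hash_join_cost(real_outer, inner)     # what a better plan costs
print(planned, actual, hashed)
```

Five orders of magnitude between the planned and actual cost, from one wrong row estimate.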


Oh yeah, memory fails aren't the only cause, but they do tend to hit you very quickly. Nothing like doing production patches because you went from comfortably able to get done in the time period needed to "won't finish until next week". I can just see the developer's face when seeing the loop join where it shouldn't be. Almost as bad as seeing FSM (full sort merge) on Ingres or a full table scan on ASE or SQL Server.

That's really why I recommend SQL writers really study the plan generated. It's the closest thing to studying assembly to learn what the optimizer is thinking.

I still think the old Ingres plan viewer was pretty neat.


I appreciate your response and I agree with you. However, you don't really address my point, which is that statistics skew can cause variations in query planning. Your scenario assumes a fixed data size with a homogeneous (or at least "known") distribution of keys. In your original comment, you noted that adding a row "in the wrong place" can cause seeks to go to scans.

As I thought through it, I realized you can get a good sense of statistics skew by getting the counts and distinct counts of each value for your keys. If you have a lot of variation, then you may be able to reasonably expect query plan variations.
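That per-value count check is easy to run anywhere. Here's a sketch with SQLite and a deliberately skewed key (one "whale" value plus many light ones; the data is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id INT)")
# One heavy user plus 1000 light users -> heavily skewed key.
rows = [(1,)] * 9000 + [(u,) for u in range(2, 1002)]
con.executemany("INSERT INTO events VALUES (?)", rows)

# Per-value counts for the key, heaviest first.
counts = con.execute(
    "SELECT user_id, COUNT(*) AS n FROM events "
    "GROUP BY user_id ORDER BY n DESC LIMIT 3").fetchall()
# Average rows per distinct value, i.e. what a naive planner assumes.
avg = con.execute(
    "SELECT AVG(n) FROM (SELECT COUNT(*) n FROM events GROUP BY user_id)"
).fetchone()[0]
print(counts[0], avg)   # top value holds 9000 rows vs an average near 10
```

When the top value is ~900x the average, a plan that is right "on average" (index seek) is exactly wrong for the heavy value, which is the tipping-point behavior described above.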


I was looking at my original post you responded to. I was talking about catastrophic changes to speed based on data and machine considerations. They are not the only causes of problems but are often overlooked because people forget that this stuff runs on real hardware with definite limits beyond the query plan statistics.


There is a nice post: SQLskills procs to analyze data skew and create filtered statistics [1] that links to a session where statistics skew is explained and some scripted solutions provided to kind of cover more rows with filtered stats.

[1]: http://www.sqlskills.com/blogs/kimberly/sqlskills-procs-anal...


There's a really nice article about the Postgres query optimizer, which goes into much more detail about the algorithms used (it's likely that at least the basic ideas are shared with SQL Server, though I can't say for sure). I really like the exposition in this:

http://jinchengli.me/post/postgres-query-opt/


As a "newer" engine, SQL Server has a couple of tricks that aren't found in most other systems, including intra-query parallelism across multiple threads. This capability requires occasional reshuffling of data between threads, which makes SQL Server's internals a little more like a distributed DBMS (think Netezza/Teradata) than the wonderful, but ancient, Postgres. All of which means that SQL Server has a larger palette of execution options and some different optimization decisions than PG.



thanks for that!


There's very little actual data here...

Oracle has a bit more data on their website: https://docs.oracle.com/database/121/TGSQL/tgsql_optcncpt.ht...


I think "SQL Server Database Engine" refers to MS SQL here, I'm not sure how useful the Oracle stuff is for MS SQL Server (I have never used Oracle)


They all operate the same way, give or take.


I have a guide that shows the equivalent for MySQL. It's kind of split between these two pages:

http://www.unofficialmysqlguide.com/introduction.html

http://www.unofficialmysqlguide.com/server-architecture.html


Is there a Postgres equivalent to reoptimize prepared statements?


I think you can just prepare them again, no? Of course, the onus is then on poor you to know when that might make sense because of changes to the referenced tables. Though Postgres is so incredibly open, you could probably do that with a trigger or a hook into analyze or vacuum or copy-from.


Prepared statements are re-planned on use if the underlying statistics change.
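The statistics side of that is observable in SQLite too (a rough stand-in for Postgres here): ANALYZE refreshes the planner's statistics, and statements planned afterwards see the new numbers. A small sketch:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INT)")
con.execute("CREATE INDEX ix_a ON t(a)")
# 1000 rows over only 5 distinct key values.
con.executemany("INSERT INTO t VALUES (?)", [(i % 5,) for i in range(1000)])

con.execute("ANALYZE")   # collect fresh statistics
# The planner's view of the index lives in sqlite_stat1:
# (table, index, "rows avg_rows_per_distinct_key").
stats = con.execute("SELECT * FROM sqlite_stat1").fetchall()
print(stats)
```

With 1000 rows and 5 distinct values, the stats tell the planner the index returns ~200 rows per key, so it may rightly prefer a scan; the same mechanism is what lets a re-planned prepared statement change its mind.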


Ah. Thank you for the correction!



