> > The same query's behaviour can be utterly different depending on the distribution of the data it runs on, and the parameters, and the currency of the statistics, and maybe memory and cpu pressure, and more
> Yeah, but it is tough and time consuming to test all those parts.
GP was saying that because it is so tough and time consuming to test this, the best thing to do is just to look at the slowest queries and optimize them, instead of trying to identify problems by reasoning from first principles.
To clarify an earlier comment where I said "The same query's behaviour can be utterly different depending on...": what I was saying is that the same query might be executed completely differently depending on what the optimiser deems best. It might use a merge join with an index, or it might (depending on the data distribution) use a loop join and ignore the index(es) entirely. Same query, different execution.
This is why you should (almost) never use hints unless you know a lot more about the data than the DB server does, which is rare (or unless the optimiser is being consistently stupid and very sub-optimal - that happens). If you add an index hint the DB will be forced to use it even if it reduces performance, and it can reduce performance a lot. If you force join orders explicitly you can catastrophically fuck performance - I think I've only ever done that once in production.
So profile, add indexes accordingly, but typically don't force them to be used; let the DB use them, or not, as it sees fit. Save hints for especially problematic situations.
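To make the "hints" part concrete, here's roughly what forcing an index or a join order looks like in T-SQL. This is just an illustrative sketch; table, index and column names are made up:

    -- Normal case: the optimiser decides whether the index is worth using
    SELECT o.OrderId, o.Total
    FROM dbo.Orders AS o
    WHERE o.CustomerId = 42;

    -- Index hint: the plan must use IX_Orders_CustomerId even when a scan would be cheaper
    SELECT o.OrderId, o.Total
    FROM dbo.Orders AS o WITH (INDEX (IX_Orders_CustomerId))
    WHERE o.CustomerId = 42;

    -- Join-order hint: joins are performed in the order written, which can go badly wrong
    SELECT c.Name, o.Total
    FROM dbo.Customers AS c
    JOIN dbo.Orders AS o ON o.CustomerId = c.CustomerId
    WHERE c.Region = N'EU'
    OPTION (FORCE ORDER);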
I see you deal with these issues too. In my case, SQL Server provides storage for a 3rd party application (Microsoft Dynamics AX). It is customizable, but only up to a point. I want to share some experience.
What we have is a huge covering index, consisting of 6+ key columns and many INCLUDE columns. The query executes millions of times per day - actually per hour (which implies the query plan MUST be cached and the query parameterized) - and the table itself is close to a billion rows. SQL 2008 R2. And the SQL optimizer may sometimes reason: "hmm, that index takes up many bytes, let's use the clustered index - not so effective for lookups, but anyway." And then performance tanks, because, yeah, for that particular parameter value the reasoning was true, but not for the other 99% of cases. In these cases, forcing a particular index is a lifesaver - along with plan guides, which enable forcing that index on that query without touching the application.
So what did I want to share? Sadly, even a covering index may in some cases not be chosen by the query engine.
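For anyone who hasn't used plan guides: here's a rough sketch of what "forcing that index on that query without touching the application" can look like. All names are hypothetical and the statement is heavily simplified; the @stmt text has to match the statement the application actually sends, character for character, and the @params types are placeholders:

    EXEC sp_create_plan_guide
        @name = N'PG_Force_Covering_Index',   -- hypothetical guide name
        @stmt = N'SELECT SUM(COL1) FROM X WHERE OTHERCOL1=@P1 AND OTHERCOL2=@P2',
        @type = N'SQL',                       -- guide a parameterized ad-hoc statement
        @module_or_batch = NULL,              -- NULL: the batch is the statement itself
        @params = N'@P1 int, @P2 int',        -- must match the app's parameter list exactly
        @hints = N'OPTION (TABLE HINT (X, INDEX (IX_X_Covering)))';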
If the covering index is almost as fat as the table, there may be little benefit in it. If the original table can just fit in RAM but the original table + covering index together can't, they may be fighting for space in RAM, which means hitting disk, which means slooow. But it's hard to diagnose from a distance.
> The query executes millions of times per day, actually per hour (that implies query plan MUST be cached and query parametrized) ...
Not at all! See below. (Edit: I see what you're saying. Still, see below. Option 2 might be best here, but it depends.)
> And then performance tanks, because, yeah, for that particular parameter value the reasoning was true, but not for 99% other cases
We had billion-row searches, and the cost of a recompile might be a second or two, but the cost of a bad query (run from a previously cached plan) can be vastly larger. Especially here, as you say the cached plan doesn't suit the other 99% of executions.
Two possible solutions:
1. Force a recompile every time: <https://docs.microsoft.com/en-us/sql/relational-databases/st...> (the ... WITH RECOMPILE clause). This really worked for us. Note that CPU is proportionately cheaper if you have multiple cores, and who doesn't these days, so recompiles are cheap on such machines. Recompile works very well IME!
2. More advanced. If you know the kind of data the query will hit, write identical queries in stored procs with different names, and consistently use a given proc for the given expected parameter(s). Each proc compiles and stores its own query plan, so you have multiple query plans ready to go (a rough sketch of both options follows this list).
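A minimal sketch of both options, with made-up table and proc names:

    -- Option 1: recompile on every call; the plan is built for the actual parameter values
    CREATE PROCEDURE dbo.SumSales_Recompile
        @CustomerId int
    WITH RECOMPILE
    AS
        SELECT SUM(Amount) FROM dbo.Sales WHERE CustomerId = @CustomerId;
    GO

    -- Option 2: identical bodies, different names, so each proc gets its own cached plan;
    -- the caller picks the proc that matches the expected data shape
    CREATE PROCEDURE dbo.SumSales_SmallCustomer @CustomerId int
    AS
        SELECT SUM(Amount) FROM dbo.Sales WHERE CustomerId = @CustomerId;
    GO
    CREATE PROCEDURE dbo.SumSales_HugeCustomer @CustomerId int
    AS
        SELECT SUM(Amount) FROM dbo.Sales WHERE CustomerId = @CustomerId;
    GO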
Further, make sure your stats are a) present and b) up-to-date. Bad stats = train-wreck query plans.
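Checking and refreshing stats is straightforward; dbo.X below stands in for whatever the hot table is:

    -- When were the statistics on the table last updated?
    SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS last_updated
    FROM sys.stats AS s
    WHERE s.object_id = OBJECT_ID(N'dbo.X');

    -- Refresh them; FULLSCAN is the most expensive but most accurate option
    UPDATE STATISTICS dbo.X WITH FULLSCAN;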
Further redux, don't assume that a covering index is all good news. I'd have to look at the query plan (edit: what I'm saying is that multiple "thinner" covering indexes may be better than one fat one).
1 & 2 may be complementary rather than totally exclusive but nevver used both together.
Nah. That may take multiple milliseconds. When fine-tuned, this query runs in <1ms. It would be an option if it executed thousands of times per hour, but not millions.
> write identical queries in stored procs with different names
Changing a 3rd party vendor application is not an option - at least not for a DBA. Well, maybe it could be an option when there are really no other options and the business starts paying the vendor tons of money. But that's a hack - better to hack SQL directly than to implement these ugly hacks in a vendor app, hacks that may be applicable only to a particular case and a particular data distribution, with hardcoded stuff.
> Further, make sure your stats are a) present and b) up-to-date
Yep, that's valid - for some tables there is even an agent job which does this fairly regularly.
> Further redux, don't assume that a covering index is all good news. I'd have to look at the query plan.
Query is very simple:

    SELECT SUM(COL1), SUM(COL2), ...snip..., SUM(COL20)
    FROM X
    WHERE OTHERCOL1=@P1 AND OTHERCOL2=@P2 AND OTHERCOL3=@P3
      AND OTHERCOL4=@P4 AND OTHERCOL5=@P5 AND OTHERCOL6=@P6
      AND (OTHERCOL7=@P7 OR (OTHERCOL8=@P8 AND OTHERCOL9=@P9))
      AND OTHERCOL10>@P10 AND OTHERCOL11>@P11 AND OTHERCOL12<=@P12
      AND (OTHERCOL13<=@P13 OR OTHERCOL14<>@P14)
I don't know the exact purpose of the query, but I think it's part of a process that rolls up some transactional data into fewer rows (recalculates item quantities).
The covering index provides a simple seek from a single index. Having a clustered index with that many key columns would be bad for every other index, which would have to carry all those columns and use them for lookups. But this is a business-critical table and this is what it takes to keep SQL Server and the application ticking. Of course, there are probably better solutions if you own the application and can rearchitect it, etc. But here, operations must be transactionally correct, locking must be minimized, etc.

If execution time grows to 20ms (I just checked how much time it takes for this query to generate a query plan), that means more than 20x the total duration and a MUCH longer process. For a particular day, I see ~15.2 million executions between 08:00 and 20:12. That is 3h2m total duration with an average 0.7ms execution time (excellent). Generating a query plan for every execution would be a disaster and I'd get a call for an incident. For other queries, 20ms execution time is not a big deal.
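For readers following along, the kind of covering index being described might look roughly like this (index name made up, column list abbreviated like the query above): the six equality columns form the key, and the remaining predicate and SUM columns ride along as INCLUDEs so the whole query can be answered from a single index seek.

    CREATE NONCLUSTERED INDEX IX_X_Covering
        ON X (OTHERCOL1, OTHERCOL2, OTHERCOL3, OTHERCOL4, OTHERCOL5, OTHERCOL6)
        INCLUDE (OTHERCOL7, OTHERCOL8, OTHERCOL9, OTHERCOL10, OTHERCOL11,
                 OTHERCOL12, OTHERCOL13, OTHERCOL14,
                 COL1, COL2, /* ...snip... */ COL20);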
I'll add that for this particular query, it is rare that it caches a bad plan. But in those cases we got an incident from the customer, and thus we had to prevent it from happening from time to time.
I don't want to argue, because we have different experiences, situations, applications, possibilities, designs and data. What you write are valid points, applicable to most cases. But I keep going with replies because I've learned a ton from HN, and I think our discussion may help someone learn a thing or two about SQL Server. And it's nice to finally talk to someone who runs into this stuff.
Gotcha. It's a very different scenario from what I imagined and my suggestions don't apply. I also agree about your limited control over rewriting, and with the comment on clustered indexes.
I do wonder how it is possible to run that query on a billion rows and typically get a sub-millisecond response, but however you did it, well done!