Reproducing a bug and showing query optimizer hotfixes at work (11 minutes)

Here’s the bug we reproduced

KB 3198775 - FIX: Inefficient query plan is used for a query requiring order by partitioning column of a table with a single partition

Transcript

We are going to create a test database, to test out different settings with query optimizer hotfixes, and I’m going to go ahead and name my database “QueryOptimizerHotfixes”, so it’s completely obvious what this database is being used for.

We’ve created the database and we’re using it. And we’re going to reproduce a bug. This is because we’ve had, even since SQL Server 2016 RTM, we’ve had fixes already released, not a ton, but a few where to fix the issue, you need to not only apply an update to SQL Server, and oftentimes these are released in cumulative update, but to make the optimization change take effect we have to not only have the update applied, we also have to enable the query optimizer hotfixes setting.

What we’re going to do is set up a repro scenario so that we can see the bug, then prove whether or not different things fix the bug by enabling query optimizer hotfixes

The KB article for this bug is 3198775, and it is about a query plan that you get if you’re using a partition table, but the table only has one partition. And this is totally an edge case, right? That’s what these are, is they’re edge cases.

We’re creating what I think of is the loneliest partition function, it is empty, there are no boundary points in there; we’re just going to have one single partition because we haven’t applied any boundary points to actually partition things by. So I have my partition function, that’s my algorithm. I’m going to create my partition scheme for my partition function, and I’m just putting everything in the primary file group; this is not a best practice, right, but we’re going super simple for our bug repro here.

And now, we are going to create a table on our partition scheme that references our partition function and our partitioning column is actually named “Partitioning Column.”

This is not meaningful data in this table at all. Our table is called “Let’s Talk About Query Optimizer Hotfixes.” We’re going to put a million rows into that table just so that we have some data in there for our SQL Server optimizer to look at as it’s planning out our query, so put in some rows in there and we’re going to create some indexes on it.

This specific bug is that SQL Server has a non-clustered index that it can use for a query, but it doesn’t realize that the ordering of the non-clustered index supports the ordering in the query, and it erroneously adds a sort operator that it shouldn’t need to the query. Now I’m going to start out at compatibility level 130, which is the latest and greatest for my database. And I have query optimizer hotfixes off. So what this means, I’m at compat level 130, so any fixes released prior to SQL Server 2016 that required 4199, those fixes are all just on by default, but post-RTM 2016 fixes, like the one I’m reproing, they are not on, because I have it off at the database scope.

We’re going to use a low permission account for our testing, just so we can prove that that hint doesn’t require elevated permissions…

…so I am going to create a login, if it doesn’t exist, for a low permission user. Then create a user in my database that relates to the login, grant select on my table to the login, and also give it showplan, so that we can see execution plans when we’re running from that login. So let’s set up our low permission user, and now we are going to impersonate low permission user.

I have said I want everything after this, until I revert and back it out, I want to be low permission user, that’s us, we are low permission user now.

Let’s repro that bug!

Here is my repro query that I’m going to use, and I’m using a recompile in here only for the purposes of demo.

I’m looking at this plan, and we’re going to be looking is there a sort operator in it, or not? When we see the sort operator, we’re seeing the bug, and I just want to make it clear that I’m not reusing the plan ever, we’re seeing the plan for the exact syntax that we’re using. So just to make that super obvious is the only reason I’m using recompile hints here.

We have the database scope off for hotfixes. We don’t have any hints on; we don’t have any trace flags on right now, and let’s turn on actual execution plans and run this query.

And here we are, this is how we know we’ve hit the bug.

SQL Server says, okay you’ve got an order by in your query, I need to do a sort. In actuality I have an index that physically has that sort in the index so it doesn’t really need to do this operator, it’s doing an extra step it doesn’t need to do here.

So we have the bug.

Can, can we fix the bug?

Well, let’s first try out the old fix of using query trace on, with trace flag 4199. It’s that sysadmin thing.

Only sysadmins can turn on trace flags, and we’re running as low permission user.

Low permission user isn’t anything close to sysadmin, so let’s go ahead and revert and let’s try out that workaround.

The classic workaround for this, I’m going to create a procedure…

…and now I’m back running as my Kendar account, who is a sysadmin, Kendar sysadmin. I’m going to create my workaround procedure name, “dbo.workaround”, only the, I use only the worst naming. And I’m putting the query trace on hint in the procedure. I’m going to grant execution for the procedure to low permission user.

I haven’t raised, I haven’t made them sysadmin at all, I just gave them permission, and now let’s go ahead and once again impersonate low permission user.

Can they run the procedure? We still have our execution plans on. We’re going to see our actual execution plans. So we run it; they have permission to run it, and that sort operator is gone.

For this stored procedure, it has the trace flag on. It isn’t enabled at the instance level, and the bug is fixed, so as long as we can use stored procedures, we can work around this, but we don’t always use stored procedures. Let’s look at the properties of the select operator here. Looking at the property of the select operator, notice that it does actually say “hey, there is a trace flag on here.” You used trace flag 4199; it was applied when generating this execution plan, so that’s kind of cool that we can see that in the properties of the execution plan.

This new syntax in SQL Server 2016 Service Pack One, “use hint” is really cool…

… even if Intellisense doesn’t believe in it yet in my version of Management Studio. I’m still running as low permission user, and this gives me the ability to say I want to turn on query optimizer hotfixes just for this query, even if it’s not in a stored procedure, and the bug is fixed.

The hotfixes took care of it. Looking at the properties of my stored procedure though, notice that we don’t have that trace flag. I mean, I didn’t use a trace flag, I used a query hint, and I can see that, right in the T-SQL snippit in the plan, if I highlight here, we can see that enable query optimizer hotfixes, ope, let’s rehover and get our tooltip back is part of the T-SQL there. So it is in my query, it doesn’t show as a trace flag in the same way in the plan.

Let’s go back to my high permission account

I’ve reverted out of the impersonation. I’m no longer low permission user. And I am going to turn on query optimizer hotfixes at the database level, and say for every query compiled in the context of this database, turn on query optimizer hotfixes so that I don’t have to apply a query to, or I don’t have to apply a hint to every query.

We’re impersonating low permission user again, and now when I just run the query, in the context of query optimizer hotfixes database, it’s on for the entire database and I don’t hit the bug any more. Now I, every query compiled in this database is going to use the optimizer hotfixes if they are enabled.

If I am in that scenario of where I want to test a query, what if query optimizer hotfixes weren’t on?

I’m going to use tempdb as my context, and now I’ve changed my query to use three part naming to say okay, I am going to compile my query in tempdb because I used tempdb, but I actually am using a table over in the other database. How does this compile now? Looking at this and looking at the plan, I can see oh, if I compile this in the context of a database where this isn’t on, that isn’t using query optimizer hotfixes, now I actually am getting a different plan and I could compare the performance of one way versus the other. So really like the fact that we have the database level scope of this now, to give us more options, not only on implementation, but also on testing out those optimizer hotfixes.