Solutions with index hints and local variables (13 minutes)

Tradeoffs, tradeoffs!

The downside of hinting an index by name

In the demo I show hinting an index by name. If the index is renamed, or if the index is dropped and replaced by an index with a different name, our query would start failing.

I don’t show this in the demo, but it is also possible to hint a clustered index by the number 0 or the number 1. If you are interested in doing that, read up on why Paul White recommends that you use 1 as the hint (or avoid hinting the index): http://sqlblog.com/blogs/paul_white/archive/2010/09/23/a-tale-of-two-index-hints.aspx

Why do FORCESCAN table hints exist?

I remember being a little puzzled by this hint when it was added in SQL Server 2008 R2 SP1, which is probably why I like this demo query so much! FORCESCAN hints are useful for situations just like the one we have here: where SQL Server estimates the number of rows.

The situation in our query is interesting, though, because the rows are only underestimated for some of our FirstNames – like ‘Matthew’. For a FirstName like ‘Fay’, the estimate is actually pretty good!

That means that for our query, FORCESCAN is a pretty blunt instrument: it’s better for some names than others!

To read about the details and limitations of FORCESCAN hints, check out Books Online here: https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-table

Transcript

There are many ways that you take a crack at this query and change it.

I’m going to show you some solutions for this, but not every possible solution.

Know that not everything I’m showing you is good!

I’m going to show you some weird stuff, and these have pros and cons. Some of this stuff I’m showing you here is NOT stuff I would do. I’m not advocating all these solutions, but they are things that I have found people doing both on the internet and in real life production environments, so we’re talking about what the pros and cons are.

And some of them are just plain interesting I think!

You may come up with even more that are interesting, too. One thing we could do is: we could use an index hint on the index name itself.

Index hint #1

I could say, when you’re accessing this big detailed table, I want you to use the clustered index, which means don’t go use that non-clustered index and do the key lookup on it. If I compile this procedure, which is called NameCountByGender_indexhint, I am going to make sure that I am running from a cold cache. Now I’m going to run this for @FirstName = ‘Matthew’, and I do have my actual execution plans on. This finishes in about four seconds.

I have forced SQL Server to use the clustered index, because I hinted it by name. If I look in the properties of my plan, I can see that I didn’t force a scan, but I did force the index. Forced index is true here. This makes everybody use the index, it’s true, even for “small” names.

I’m going to clear my cache again. Even these smaller names like ‘Fay’, I’m saying ‘Fay’, go ahead and exec with recompile, come up with a fresh plan for this, don’t just reuse the plan that I left in my execution plan cache for Matthew.

But even when I run to run this for Fay, I’ve said you’ve got to use the clustered index. So she has to scan the clustered index, too. She was really fast when she was allowed to do those lookups because there just aren’t that many Fays, but now she’s stuck doing that clustered index, scan, too, because I forced the index. So I do get consistent performance for this. Before that nonclustered index was created, Fay took four seconds then as well, so I’ve made things back to normal and it’s as fast as it was before, but it’s not as fast as it could be for people like Fay.

Here’s the things that I don’t like about what I did

I specified the index by name. If that index gets renamed, or maybe the clustered index is dropped and a different clustered index is created, my query’s going to fail, because SQL Server is going to be like: “You forced me to use an index and an index with that name isn’t there.”

Also, what if our nonclustered indexes changed and columns are added to it, and it becomes the perfect index for this? Our query won’t just adapt to it. It’s stuck using that clustered index.

So I would avoid this.

There are better way to do this.

One way that is still kind of forcing things, it has a little more subtlety… I’m not going to say this IS subtle, but it has a little more subtlety… is to use a different hint.

We’re going to create this procedure which uses the FORCESCAN hint

This is a table hint. I am forcing a scan just on the table where I have this hint and I’m not forcing a scan on ref.FirstName.

When I run the procedure NameCountByGender_FORCESCAN, I’m going to cruelly obliterate my whole data cache for the instance and then run this procedure for Matthew. I’m letting SQL Server use what index it wants to use for that table, but I’m saying you’ve got to scan it.

When I put that requirement in, in this case, it says oh, well, if I’m going to have to scan an index, I may as well scan the index that has all the data I need, which results in me getting the performance that I had before for Matthew.

SQL Server is also saying hey, I could use a better index here with the green hint as well. I do still have that problem for Fay. I just dumped out my data cache for the whole instance irresponsibly, but for the sake of my testing, and I’m going to say hey, don’t reuse Matthew’s plan. It’s still in the execution plan cache, I didn’t clear that, but compile your own plan for Fay. I forced her to scan as well, but {her query took 4 seconds}, which is the performance you would have had before.

So as compared to hinting the index by name… I mean this does still have some cons. This isn’t great for Fay, and this will adapt a little bit if the perfect index is added.

Let’s say the perfect nonclustered index is added. Well, SQL Server can use it, but it is forced to scan it. That may be a little faster than scanning the clustered index, depending on the different sizes of the indexes, but it’s probably not awesome.

This is a little better, but still not awesome

If I’m going to do an index hint, I would prefer this one because it’s a little more adaptable.

I think the cons are a little less risky, but if I am going to start using hints like this, I better have a practice where my whole team knows that whenever we change an index, we need to look in our codebase for the queries that use that table and look at the hints they have, because we may want to at that point remove those hints. Because they actually don’t help it at that point. They might slow it down instead, so there is definitely is a process overhead to this.

Another solution that might be a possibility is to break the query into two steps ourselves. One way to do that is by using a local variable

Now beware, this isn’t necessarily going to be awesome.

I’m creating this new stored procedure named NameCountByGender_variable, and in it I declare a local variable in the body of the stored procedure. I then figure out what the FirstName might be by querying ref.FirstName by itself. then I plug that into the WHERE clause of my query so I don’t have a join anymore. I’m saying: I’m going to figure out the FirstNameId, then put that in the where clause. This might seem like it’s the perfect answer, but wait, hold steady, it’s not as simple as it seems.

We’re going to clear out our buffer pool for the instance, dropping those clean buffers, then we are going to run our variable procedure for our good friend Matthew, all 1.45 millions of our Matthew friends. Oh, we’re at six seconds….. This didn’t finish in four seconds! Matthew has taken 12 seconds to run.

We have a two-part query now. Our query plan has definitely changed, because, you know, we don’t have a join anymore. We’ve rewritten this.

We have an index seek in our first query that looks up the name in ref.FirstName, and looking at our properties, we can see our query times stats since we have an actual plan.  This was fast. Our first query is NOT why this is slow, our elapsed time is 1 millisecond there.

Our second query is where we spent 12 seconds of our elapsed time, and it’s the one that took a while. Looking in it, we did a key look up into FirstNameByBirthDate_1966_2015. We don’t have two nested loops in here because we took care of the first one in our first query, but we had to do a key lookup to get Gender. We did an index seek on that index on FirstNameId, then we did a key lookup to get Gender.

Let’s look at our properties here, let’s bring up the tool tip and zoom in a little bit here. Our estimated number of executions was that familiar 6,016.2. Of course it actually had to do it 1.45 million times.

Oh, what’s up here? Well, what’s up here is the way we wrote the query.

When you declare a local variable in the body of the procedure – this is not a parameter {like the parameters at the top of the procedure}. This is declared in the body of the procedure. These are anonymous to SQL Server.

In other words, it does not ‘sniff’ the value of that parameter when it it set by this first select query. It has no idea what it in this @FirstNameId variable at this point because we declared it in the body of the stored procedure and then populated it/ So it does our familiar dance of saying: “Well, I don’t know what it is. So for any average FirstNameIdea about how many rows do they have?” Our old familiar, it’s anonymous, much like it was anonymous when we did the join. There is a work around, but it has a notable problem.

The work around for this with a local variable…

I’m going to recreate our stored procedure with the work around. The work around is to add an OPTION RECOMPILE to the query where we used the local variable.

That option recompile is saying– for this query, we have a query hint of option recompile on this query. For this query, optimize it for this run, and don’t reuse anyone else’s plan. Don’t let anyone else reuse this. When we’re doing that, SQL Server will look at the value of @FirstNameId for that run, but it means every time we run this query this part of the query is going to burn CPU by compiling. Let’s first make sure this works.

I’ve recreated the procedure name, NameCountByGender_variable. Recreating it means we’re going to get a new plan. I’m also going to make sure we’re running from a cold cache, and now I’m going to go ahead and run it for Matthew. When I run it for Matthew, it takes five seconds for it to run.

Our first query does that index seek and going on down to our second query, let’s see if we can drag this up and get a little more room on the screen.

Sure enough, it says oh, we should do a clustered index scan for all of these Matthews. Our estimate is, you know, it was able to look at that FirstNameId and say oh, I should use the histogram here and look it up. It is going to, for different names… if we run this again for Fay, the recompile hint means Fay will get a different plan, but the problem with this is:

Once we start doing this like this, our scenario is that this query doesn’t run super frequently, but when we start using recompile hints to solve problems, it sneaks into more and more queries. It’s like a fever.

It spreads among our code and sure enough, soon we are burning more CPU.

It’s really tricky, especially before Query Store in SQL Server 2016, if we aren’t running Query Store at the database level in 2016 or higher, it’s really hard to monitor these queries with recompile hints: how slow or fast they are or how much CPU they’re using.

It’s very tricky, and more CPU usage equals more licensing dollars. I tend to avoid recompile hints, they’re a last resort for me. When I do use them, I’m always thinking about: is there a better way I can tune this for consistent performance without having to use recompile, because of these challenges?

There are more use cases that we’re going to look at, and we’re going to talk about them in the next video.