When I’m tuning a query, how do I figure out where to start?
One of the first questions that I ask when I’m tuning a query is I look at the context of the query. Is the query in stored procedure? Is it part of a function?
When the query is normally run against the environment, how is it run, is one of my first questions, because this particular query uses a local variable for year to rank.
This isn’t in the context of a stored procedure, this is just declare year to rank as a naked statement in a session in Management Studio, which makes it a local variable, not a parameter. Parameters are treated differently by SQL Server. And if this query is something that is a part of a larger stored procedure and has just been copied out of it, and year to rank is a parameter in that stored procedure I want to test it in a context where it is a parameter. I want to test it in the same way that it’s run.
If this is run by an application that connects to SQL Server and says I have a parametrized query, and here are the parameters with their data types and here are their values, I need to find a way to mimic that if I’m testing it in Management Studio, and using a local variable is not the same.
Because these can behave differently, for fun we are going to test this in different ways
So we are actually going to test this once, we’re going to start out assuming it’s run as a local variable, like we have here in Management Studio. We’re going to tune it that way.
And then after we find some quick solutions with that way, we’re going to switch over, we’re going to turn it into a stored procedure, we’re going to see, is it different when we make it a parameter in this case? Sometimes it can be dramatically different, it is not always dramatically different, so we’ll test and see in this case.
Is it dramatically different if we make this a parameter and then we will tune it for this stored procedure there. That’s one of the places I generally start is saying, okay am I testing this, as the same, with the same if it has parameters like this. Is it run usually with a parameter or is it run with a local variable, ’cause what I have here is a local variable.
The next question I ask is…
I look at the query, and especially if I’m trying to do something like reduce logical reads. I look at the query and I ask myself the question, “Okay where does this narrow down the rows that it’s using?”
And we are using this year 1991, we don’t want the rank for every single year. Because we have data on this table from like 1880 to 2016. I don’t want the rank for 1880, I want the rank for 1991 and then some years offset from that. So that’s where we’re filtering out. Let’s take a look at our execution plan. I’m going to enable actual execution plans. The box is already checked there but it is this little box up there. I’m going to say: Run this query and then show me SQL Server, how you executed this query. What went on behind the scenes in running this?
This query hasn’t been tuned yet, this is our original version. We just want to get the execution plan. Which as you can see, is a very simple plan. This is one of the fun things with tuning queries, is that we look at a plan like this and we’re like, “Oh, where do I start?”
There are many branches to this plan…
It’s not the most complicated plan ever but you know there isn’t just like five different operators to look at, we have choices. Well let’s return to that question. Where is it drawing the data from and where is it narrowing down? A natural place to look for this, it can often be useful to go to the top right, operator slash branch of a plan and look there. This is sometimes referred to as the driver table of the plan, this guy at the top left or, I don’t know if it’s a guy. It contains babies of multiple genders actually. But we’re doing here at the top right, operator and the plan we’re doing a clustered index scan of first name by year.
Are we narrowing down the rows there? If I hover over this and look at the tool tip there, I notice that there are no predicates here. It is a clustered index scan, so there’s no seek predicate, but in addition there is no hidden filter or another predicate just hanging out in here. It can hide filters in here, and they just show up as predicate, where it’s like, okay I scanned everything but then I was checking what year are you in. There’s none of that here.
We can see that, if we hover over this line in fact, 1.8 million rows, almost 1.9 million rows are flowing out of here. And all 1.9 million rows in the table are being sorted, and then passing through multiple segment operators and a sequence project. Those are a part of our windowing function who’s doing the rank.
And then past that we go into a compute scaler and then finally a filter operator. We have all the rows. We have big arrow going into the filter and then little arrow coming out of the filter. So if we hover over the filter, and look at the filter’s predicate it is filtering on, are you the right report year and is your rank less than or equal to 10?
We are narrowing down the rows kind of late
Now looking at is your rank less than equal to 10, after you do the ranking makes sense. We have to rank things before we can filter on it. But are you in the report year 1991? We are ranking rows for every year from 1880 to 2016, and then filtering them, by saying, “Oh, I only want 1991.” I would rather filter out 1991 and then do the ranking for that and then grab the top 10, right? So this one predicate, we have two predicates on this filter, we want one of these to be pushed deeper down, and to happen back here before we push all the data into the windowing function operators. That is the top branch of the plan.
If I look at other branches of the plan, hey there’s another clustered index scan of first name by year that if I highlight it, does not have a predicate anywhere to be seen on it. There’s no predicate here. And it’s pushing all the data into, it’s sorting it segment, it’s doing the ranking, it’s doing window function operators and so on for every branch of the plan. Even though I only have one CTE in the query. Even though in my query, I only have dense rank here at the top and I’m pulling back from this CTE and aliasing it in different ways.
Just because I wrote dense rank once in the query, doesn’t mean it’s only going to appear once in the plan because look, I’ve got, for different branches of this query plan, it is scanning the whole table, figuring out the rank for every row in that table again and again and again. And the beauty of execution plans is that it makes it clear to me that that is happening. It isn’t just running the CTE a single time. Okay, well that’s interesting.
One thing that maybe strikes me as a quick fix… (recompile)
If I’m looking for quick fixes, and I’m thinking about how local variables are treated, local variables are treated by SQL Server as anonymous. When SQL Server looks at this local variable, it doesn’t look in there and be like, “Oh, you contain the year 1991, I’ll optimize for that.” It doesn’t do that. Local variables are a mysterious realm where it’s like, “Oh I won’t peek in there unless you tell me to.”
And the way we can tell it to, is by adding a recompile hint. I’ve got it here as a query hint.
When I put option recompile, what SQL Server’s going to do is it’s going to say, “Oh, I’m optimizing a plan just for you. I’m not going to reuse this for anyone else, this is just your plan. I’m going to go ahead and look inside your local variable.”
It changes the way it treats that local variable and it’s going to look in there and notice that the year is 1991. And if we do that, if I, just with this change. The only thing we’ve changed is adding on the option recompile. If I run this now and look at the execution plan, I may see something different in that top branch.
Now I get my results back, hello Michael. You’re still number one with me Michael. If I go over here to the right, I can see that I do have a change in the top branch of my plan. The branch below it still has a clustered index scan, but notice that the arrows, the size of the arrows in this branch with the clustered index scan are larger than the arrows in this branch. The top branch, my driver table, is now a clustered index seek.
And if I look at what the seek predicate is, I can see that the seek predicate is on 1991. I added the recompile hint. The recompile hint, had SQL Server look at the comp, crack open the local variable and say, “Oh, I’ll optimize you for 1991 and switch that in.” And so now it’s like, by switching that in and making that explicitly a literal value, SQL Server could figure out, “Oh, I can push this down to this branch of the plan.
For 1991 I now have a seek, but all my other branches still have a problem. They still have clustered index scans on them. So I am still doing a lot of IO on this. I still haven’t succeeded in, reducing IO. I’m going to turn on set statistics, IO on for my session, and let’s just see how far recompile got us. I don’t think we’re going to, still scanning the table multiple times in many of those branches, so we haven’t gotten below 500 yet, but how far have we gotten? Well, we’re at 20885 for first name by year.
Next step: simplify the joins
With recompile, we’ve gotten it to take care of the rank by year aliased as start year. How about these other guys? Well, when we look at this join here, when we look at the join right below this, what we’re doing here is we’re saying startyear.reportyear plus 10. We’re taking a column, and we’re doing addition on that column.
Now we know that startyear.reportyear, looking at our predicates, we know that startyear.reportyear equals year to rank. We can simplify that in our head, but SQL Server doesn’t do as good a job simplifying it. But we can simplify it for it, by saying: Okay, I’m going to change the way that I write these joins. I’m going to simplify these predicates and instead of, you know before I had a column name plus 10, now I’m just going to say year to rank plus 10 because we know that startyear.reportyear equals year to rank. So instead of having SQL Server try to do math on a column, which it is not good at in a comparison like this, we’re going to just say, okay look at year to rank. And further we still have the recompile hint in here.
Which means that year to rank won’t be treated as a local variable, SQL Server will look at the value and see, oh year to rank equals 1991, and it will swap that in. With recompile changed as well as, wow that was fast. With recompile changed as well as the join simplified, this gets us down to under 500 logical reads. Looking at our execution plan, we have lots of changes in here. We still have a clustered index seek on our top branch, and we narrow down and say, we only want to rank rows for 1991 in this branch. And now we have clustered index seeks on all the other branches as well. And if we look at the properties of them we can see this one is for 1971. The way that we rewrote the join to put the arithmetic not on a column, we’re not comparing a column that we’re adding something to work, we’re comparing a literal, we’re comparing, a local variable that we have done math on.
And the recompile hint lets SQL Server crack open the local variable and see the value, so it’s just doing 1991 minus 20 on this branch and it says, “Oh, that’s 1971, I will compare that “to firstnamebyyear.reportyear.”
So it’s having to in each of these branches, it’s only taking the rows it needs for that branch through the window function operators now. It was doing that before but it was taking every time it was taking all the rows from the table through there. And that makes the query, much, much more efficient.
If we don’t want to use the local variable and recompile hint, a very similar approach is to just go literal
In this query I have commented out. We do not have the local variable anymore. And instead of using a local variable, we’ve just said where, startyear.repotyear is 1991. And then in every single join we, I for fun left the math in there. Of course we could do the math, but I left it in there so it’s 1991 plus 20, 1991 minus 20, or minus 10. Putting literals in, similarly if we look at the execution plan and the performance for this, we are, and we need to actually have our statistics IO on to see the performance there in the messages tab. We are at less than 500 reads, and when we look at the plan, we do have all of our nice clustered index seeks. You get a clustered index seek, and you get a clustered index seek.
This literal approach is essentially kind of where recompile got us with the local variable. But these approaches, they do work, they do meet the challenge. These are quick fixes that meet the challenge, but they aren’t the greatest tricks to use in tuning your production code long-term. Because what we’ve done here is, by putting, by changing our query to use literal numbers, if I change the year, I’m going to change the query and get a completely different plan.
If I put a recompile hint on my query I’m saying, compile a fresh plan every time you run and don’t let it, nobody’s going to use it, this is just for me. That can be fine in some situations. If have a data warehouse where occasionally we have queries run and they’re long queries, but occasionally we compile a new query, it may not be a big deal to compile them fresh if the number of queries run is relatively low. But there’s fewer and fewer environments like that.
More and more we have lots of queries running, and we have mixed workloads with OLTP happening and larger queries happening too. And if in those environments we start putting recompile hint everywhere, or we don’t parametrize our queries properly, we start getting lots of compiles and lots of execution plans and cache if we don’t use parameters at all. And that literally can cost us money because compiling’s very CPU intensive.
We pay for SQL Server, by the CPU. So if we have this, if one of our tuning tricks we use all the time is recompile hints, that does over time in these busier environments, that does literally cost us money so, it’s not that I never use recompile hints, but they are really one of the last resorts I would use.
I’m interested in trying to find a stable solution that doesn’t require a recompile hint, if I can.