Demo: Stepping through a tuning process (11 minutes)

Part of the "Speed Up the 'Popular Names' Query" SQLChallenge (46 minutes)


Let’s return to the first question we asked after figuring out whether the query is run with a local variable or not.

That first question was: where the query narrows down the data, how can I make that part more efficient?

We found that the narrowing happens at the driver table in the query, and our big question is: can we push the filter on report year down into a clustered index seek?

One step in tuning that can help with that is to isolate the part of the query that’s having the problem, work on it as a unit on its own, get it tuned, and then add back the rest of the query, which may require rewrites based on our changes. The part of the query we’re working on here is the driver query, where the branches start.

I’ve got this CTE

This is unchanged from our original version, because I first want to see whether simplifying the query down to just this part reproduces the same problem I’m seeing in the larger query. We’ve got our CTE, we’re selecting from it where report year equals the year to rank, and we only want the top 10.

This is being created as a temporary procedure named rank by year core, and I’m only doing that because I don’t want to have to clean up later. Like a temp table, a temporary procedure is automatically cleaned up when I disconnect. I may also want to keep my existing rank by year procedure around rather than alter it, so as I work through the smaller pieces, I’m using temporary procedures purely for testing convenience. So we create this as our temporary procedure.
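As a rough sketch, the isolated driver query in a temporary procedure might look like this. The table and column names here (agg.FirstNameByYear, NameCount, and so on) are assumptions for illustration, not verbatim from the demo:

```sql
/* Sketch of the isolated driver query as a temporary (#-prefixed) procedure.
   Schema and column names are assumed for illustration. */
CREATE PROCEDURE #RankByYearCore
    @YearToRank INT
AS
BEGIN
    WITH RankByYear AS
    (
        SELECT
            ReportYear,
            FirstNameId,
            Gender,
            RANK() OVER (PARTITION BY ReportYear
                         ORDER BY NameCount DESC) AS RankThatYear
        FROM agg.FirstNameByYear
    )
    SELECT TOP (10)
        ReportYear, FirstNameId, Gender, RankThatYear
    FROM RankByYear
    WHERE ReportYear = @YearToRank;  /* predicate sits outside the CTE */
END
GO
```

Because the report year predicate sits outside the CTE, SQL Server ranks every year before filtering, which is what reproduces the scan-plus-filter shape of the larger query.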

When I run this temporary procedure, I want to know: am I reproducing the problem where all of the rows flow into the filter because the report year predicate hasn’t been pushed down? And yes, I have reproduced the problem in this smaller bit of code.

That’s great, because now I can try to tune just this bit of code. What might I change?

I could move my predicate that represents the filter…

I am now saying I’m only going to rank data for one report year, 1991. This is going to change a lot of other things in the query when I add them back, because before, this ranked everybody; now I’m narrowing it down to 1991 and ranking only the rows for that year. I also only want to return the top 10.

That filter is still out there because it’s looking at the result of the rank column and putting a predicate on that. Having rewritten the query this way (I’ve named this version rank by year core predicate moved), I can look at the execution plan and ask: is my filter being pushed down to the clustered index seek? And yes, yes it is.
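Moving the predicate into the CTE might look something like this, using the same assumed schema as above:

```sql
/* Sketch: the ReportYear predicate moved inside the CTE, so the
   optimizer can push the filter down to a clustered index seek. */
CREATE PROCEDURE #RankByYearCorePredicateMoved
    @YearToRank INT
AS
BEGIN
    WITH RankByYear AS
    (
        SELECT
            ReportYear,
            FirstNameId,
            Gender,
            RANK() OVER (PARTITION BY ReportYear
                         ORDER BY NameCount DESC) AS RankThatYear
        FROM agg.FirstNameByYear
        WHERE ReportYear = @YearToRank  /* predicate moved in here */
    )
    SELECT TOP (10)
        ReportYear, FirstNameId, Gender, RankThatYear
    FROM RankByYear;
END
GO
```

With the filter inside the CTE, only the one year’s rows are ranked, rather than ranking everything and filtering afterward.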

My clustered index operator is not a scan; it is a clustered index seek, and the seek predicate is on year to rank.

Rewriting the TSQL there is helpful, and I can actually simplify it further as well.

I still had partition by report year in here as part of my rank, but since I’m limiting the query to just one report year, partitioning by report year isn’t needed.

The SQL Server optimizer in this case was actually smart enough to figure that out on its own: I don’t get an execution plan change by doing this; it’s the same cost and the same operators. But removing this TSQL (and eventually removing the comment) makes the query easier to read, and I may as well simplify what I can.

This version keeps report year moved into the CTE itself, and we’re just removing partition by report year because it isn’t needed once we’ve put where report year equals year to rank in there and we’re only looking at one year. Looking at our execution plan once again, we’ve only got 25,000 rows flowing through there instead of 1.8 million.
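The simplification is just to the OVER clause; a sketch of the before and after:

```sql
/* Before: partitioning by ReportYear, even though the WHERE clause
   now guarantees only one report year is present. */
RANK() OVER (PARTITION BY ReportYear ORDER BY NameCount DESC)

/* After: with a single year in play, the partition clause is redundant
   and can simply be dropped. Same plan, same cost, easier-to-read TSQL. */
RANK() OVER (ORDER BY NameCount DESC)
```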

At this point in my tuning process, I decided that I wanted to rewrite this a little bit.

This was just a personal preference. I said, “You know, I don’t think I really want to work with that CTE. I think when I rebuild the query, I want to work with this as a derived table.” So I don’t have a CTE up here anymore; I took the logic from the CTE and put it into a derived table named start year. Inside my driver table, I’m still doing that filter and only ranking one year in there, and I’ve still got the logic outside of it saying I want the top 10.
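Rewritten as a derived table, the core might look like this sketch (schema and column names still assumed):

```sql
/* Sketch: the CTE logic rewritten in place as a derived table
   named StartYear, with the filter kept inside the driver. */
SELECT TOP (10)
    StartYear.ReportYear,
    StartYear.FirstNameId,
    StartYear.Gender,
    StartYear.RankThatYear
FROM
(
    SELECT
        ReportYear,
        FirstNameId,
        Gender,
        RANK() OVER (ORDER BY NameCount DESC) AS RankThatYear
    FROM agg.FirstNameByYear
    WHERE ReportYear = @YearToRank   /* filter stays inside the derived table */
) AS StartYear;
```

A CTE and a derived table are logically equivalent here; the derived-table form just makes it easier to repeat the pattern when the other year-shifted tables are added back.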

Looking at this and testing it, I want to know: after rewriting it as a derived table, is everything still cool?

Yes, I still have a clustered index seek, and yes, my filter on the rank is still at the same point in the plan, so this did what I wanted even after rewriting it as a derived table.

Now it’s time to add more of the query back in

I’ve got the core of the query tuned, so now I need to start adding back the other parts of the query, rewriting them as I go if they need it based on the change I’ve already made.

In this case, I’m going to add in the table aliased as 10 years later, which has an inner join in the original query, and ref.firstname, which also has an inner join in the original query, so that one’s easy.

It’s simple to add ref.firstname back in: that’s just a join from start year on first name ID. For the table aliased as 10 years later, I’m now making that a derived table too. It used to be a reference to the CTE, but now I’m explicitly putting the windowing function rank in there. The window function is being run in all the branches of the plan anyway, so I may as well be explicit about it in my TSQL, and I’m explicitly saying: run this window function for 10 years after the year to rank parameter.

It’s going to limit the data down to one year and rank just that year by name count descending. As I add in this logic, I’ve also added the related columns to the select query: startyear.gender, the start year’s rank for that year, and the 10 years later rank as well as the first name.
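Putting those pieces together, the query with the two inner joins added back might look like this sketch. The aliases and join columns are assumptions based on the description, not verbatim:

```sql
/* Sketch: ref.FirstName and a TenYearsLater derived table added back,
   each derived table with its own explicit RANK() window function. */
SELECT TOP (10)
    fn.FirstName,
    StartYear.Gender,
    StartYear.RankThatYear     AS StartYearRank,
    TenYearsLater.RankThatYear AS TenYearsLaterRank
FROM
(
    SELECT FirstNameId, Gender,
           RANK() OVER (ORDER BY NameCount DESC) AS RankThatYear
    FROM agg.FirstNameByYear
    WHERE ReportYear = @YearToRank          /* the start year */
) AS StartYear
JOIN
(
    SELECT FirstNameId, Gender,
           RANK() OVER (ORDER BY NameCount DESC) AS RankThatYear
    FROM agg.FirstNameByYear
    WHERE ReportYear = @YearToRank + 10     /* ten years after */
) AS TenYearsLater
    ON  StartYear.FirstNameId = TenYearsLater.FirstNameId
    AND StartYear.Gender      = TenYearsLater.Gender
JOIN ref.FirstName AS fn
    ON StartYear.FirstNameId = fn.FirstNameId
ORDER BY StartYear.RankThatYear;
```

Each derived table filters on its own report year before ranking, so each one can get its own clustered index seek.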

The columns in the select can change how the query is executed, and change its performance as well, so don’t save them all to add at the end; you can get a big surprise where everything changes when you finally add the select columns back. I like to add them back in as I go, and examine performance along the way.

Now we’re getting not only the rank for this year, but the rank 10 years later, as well as the names.

And the answer to our question is: yes, we are getting seeks on 10 years later as well as on the start year. This is looking good, looking really good.

At this point we are well below 500 logical reads, but we haven’t added all the joins back in.

The next step is to add the outer joins, following the same pattern. The difference is that these tables had left joins in the original query, so I’m replicating that. For 10 years prior, we have a select query with the window function in there doing the rank, and the pattern is year to rank minus 10, and so on for the other derived tables. So I have a lot of derived tables here, and I’m following the same pattern to add them all in.
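The left-join pattern for the optional years might look like this fragment, which would be repeated for each year offset (names and offsets are assumptions for illustration):

```sql
/* Sketch: the left-join pattern for the optional year-shifted tables.
   Each derived table follows the same shape, changing only the offset. */
LEFT JOIN
(
    SELECT FirstNameId, Gender,
           RANK() OVER (ORDER BY NameCount DESC) AS RankThatYear
    FROM agg.FirstNameByYear
    WHERE ReportYear = @YearToRank - 10     /* ten years prior */
) AS TenYearsPrior
    ON  StartYear.FirstNameId = TenYearsPrior.FirstNameId
    AND StartYear.Gender      = TenYearsPrior.Gender
/* ...and similarly for @YearToRank + 20, @YearToRank - 20, and so on. */
```

Left joins preserve the start-year rows even when a name doesn’t appear in the shifted year, matching the behavior of the original query.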

Now that I’ve compiled this procedure, I’ve got all the tables back in, with their related columns from the select as well, including all the ranking columns.

Let’s give it a go, test it, and see how we did.

We’ve got data back for all of those different years. Looking at our messages tab: yes! We have 375 logical reads on one table and 29 on another, so we are under 500 logical reads with no recompile hint and no literal values. And how does that look in our plan? Let’s navigate over to one of the clustered index seeks.

When we look at the seek predicates, we can see there’s no recompile hint telling SQL Server to pull out the value and make it a literal; this is using the parameterized year to rank. And this plan is reusable, so it is really, really cool that we now have an actual parameterized query that follows best practices for SQL Server. Here’s the one for 20 years later. Very, very cool that we got that done.

Would this approach also work with a local variable?

We tuned this in the context of a procedure. If I change this around, take what I just tuned, and go back to using an anonymous local variable, do I still get performance that good? We could have followed a similar tuning path with the local variable; we just didn’t. We proceduralized it.

But if I test this now with a local variable, I can see that yes, I still get under 500 logical reads. It can meet the challenge with a local variable, without any recompile hints, but with a significant TSQL rewrite. I’ve got all of my seeks going on there. Effectively, what I’ve done is write the query in a way that forces SQL Server to seek only to the data it needs to rank, and not to rank any of the years we don’t care about in this query.
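The local-variable test might look like this sketch, with the same assumed schema; the shape of the query is what matters, not the variable versus the parameter:

```sql
/* Sketch: the rewritten driver query tested with a local variable
   instead of a procedure parameter. Because the derived table filters
   on ReportYear before ranking, SQL Server can still seek even though
   it can't sniff the variable's value at compile time. */
DECLARE @YearToRank INT = 1991;

SELECT TOP (10)
    StartYear.FirstNameId,
    StartYear.Gender,
    StartYear.RankThatYear
FROM
(
    SELECT FirstNameId, Gender,
           RANK() OVER (ORDER BY NameCount DESC) AS RankThatYear
    FROM agg.FirstNameByYear
    WHERE ReportYear = @YearToRank
) AS StartYear
ORDER BY StartYear.RankThatYear;
```

The seek here comes from the query's shape, not from sniffed values, which is why it holds up with no recompile hints and no literals.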