Dig into the plan and test a nonclustered index (11 minutes)

This video has captions you may enable in the viewer, plus a written transcript below.

Now, let's dig into that execution plan for the baseline and see what's going on! Once we know the lay of the land, er, query plan, then it's time to test and see how much of a benefit a disk-based rowstore nonclustered index will give us.

Looking at our execution plan

This top branch of the plan up here is looking at ref.FirstName.

I'm joining and saying, "okay, I want to decode the first name." That's what the top block (region) of the plan up there is. I am getting parallelism: we can see that from all our double yellow lines. But look at this bottom branch! I keep going to the right. I mean, wow!

So – I haven't given it any nonclustered indexes – we are doing a clustered index scan on our clustered primary key of agg.FirstNameByYearState. And now we're doing a big old sort, and that's leading into segment operators and a window spool operator. I have four cores, and we are using all of them.

So our sort: it thinks – and this is estimated, cost is always an estimate, even in an "actual" plan – it thinks that sort is going to be really expensive.

Then, so, segment: what are we doing here? Let's look at some properties. For our sort, it thought this was going to be fifty-nine percent of the cost of the plan.

But let’s look at our actual time statistics

This is in row mode, so our actual elapsed time here is cumulative for this operator and its child operators.

So by the time this finished – this sort, plus the clustered index scan and everything beneath it – our elapsed time is five-point-six seconds. Our total elapsed time for this whole query is more than twenty seconds: for this run it was thirty-three seconds. So only a small fraction of that was spent on the sort.
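If you'd rather see those totals as text output instead of hunting through operator properties, SET STATISTICS TIME is a quick cross-check. A minimal sketch, assuming you're running the baseline query from your own session in Management Studio:

    -- Print CPU time and elapsed time for each statement to the Messages tab
    SET STATISTICS TIME ON;

    /* ... run the baseline query here ... */

    SET STATISTICS TIME OFF;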

What were we sorting by?

If we expand the Order By down here – I'm in the Order By for my sort – I'm sorting by my 'partition by' stuff: my FirstNameId, my StateCode, my Gender, and then I'm ordering in my window function by ReportYear. So all of this stuff in my 'over' clause, where I'm partitioning by FirstNameId, StateCode, and Gender, and ordering by ReportYear, is what it's having to sort here.

And this makes sense! For each row, I need to add up, within the group for that first name, StateCode, and Gender, what my running total is through that year. So I've gotta sort by those things.
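The full procedure isn't on screen in this shot, but based on the columns called out above, the windowed sum is shaped roughly like this (a sketch only – the TotalSoFar alias is mine, and the real query's column list, filtering, and join to ref.FirstName will differ):

    SELECT
        fnby.FirstNameId,
        fnby.StateCode,
        fnby.Gender,
        fnby.ReportYear,
        /* Running total of NameCount per first name / state / gender, ordered by year */
        SUM(fnby.NameCount) OVER
            (PARTITION BY fnby.FirstNameId, fnby.StateCode, fnby.Gender
             ORDER BY fnby.ReportYear) AS TotalSoFar
    FROM agg.FirstNameByYearState AS fnby;

Everything in that OVER clause – the three partitioning columns plus ReportYear – is exactly what the sort operator has to order by.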

It then feeds that into a segment operator. This segment operator – what's this one doing? Look, there's a 'group by' on this segment operator! We are grouping by FirstNameId, StateCode, and Gender.

I'm doing a running total for this year, and – running total, right – I want this year and all the years prior. That determines what I need to group by: I've sorted by all of these things plus ReportYear, and now in my first segment I'm starting to figure out my windows.

What's our actual time now? We are once again in row mode, and we can see that at the top – let me zoom out here and prove it: the actual execution mode is row.

So when we're in row mode, our actual elapsed time covers the operator and its children, so it includes that sort. Now we're up to six seconds. The next segment operator – okay, we're up to six-and-a-half seconds. Everything is in row mode in this plan.

Now, when I look at what I'm grouping by here – it's like the battle of the column widths now! This one is saying: okay, I'm working more on my running total, I'm bringing ReportYear back in. And we've already sorted by that, which makes it easy.

Now we go to our first window spool

What is a window spool? This is what expands that set of rows. So we've ordered everything, we've been segmenting the data, and now we're actually using our window spool.

What is taking up all our time? Well, wow: okay, this is the operator and its children. Its child had six-point-five seconds. Now, by the time we get done with this window spool: awww, twenty-four seconds!

The work of sorting is a lot, BUT actually working through the window spool, expanding all the rows, and figuring everything out is even more. Even though the thing is parallel, we're using seventy-five, almost seventy-six seconds of actual CPU time because we're parallel, and we get done with it in about twenty-four seconds of elapsed time.

We're still pushing rows along – as we finish different windows, it's able to push things into the stream aggregate – but, you know, it takes a long time to finish up that window function work.

So maybe, by creating a nonclustered index, we can get rid of that sort operator and make the work of that segment operator easier.

SQL Server thought that sort operator was gonna be a lot of work. To me, it looks like that estimated cost is a bit high compared to everything else.

Maybe we can make the sort go away by pre-sorting the data in a nonclustered rowstore "POC" index

When you're designing these – Itzik Ben-Gan wrote a great article about this – essentially what we're going to do is… you know how we saw those columns that it's grouping by? We're just going to index based on those.

In Itzik’s article, he said: there’s an acronym you can use to remember this. And I think acronyms do make things easy.

Here's the acronym: P.O.C.

  • P - We’re going to index for the partitioning columns, and put those in the key of our index.
  • O - We’re going to index for the ordering columns. We’re going to put those in the key for our nonclustered index.
  • C - And then we’re going to index for anything else we need to cover the window function.

So let's look back up at our window function here in our top pane and expand our code in our procedure.

Here is the 'partition by': these columns – FirstNameId, StateCode, Gender – are suitable for the key.

Then we've got our 'order by' column, ReportYear: let's put that in the key.

NameCount: now we need to cover anything else the window function needs in the index. We're summing up NameCount for our running total, so we also need that in the index. But we don't need to order by it, so we can put it in as an included column in our index.

That’s what we mean by partition by, order by, covering.

So we’re going to create an index!

FirstNameId, StateCode, and Gender are our key columns. ReportYear: we tack it onto the end of that key for our 'order by', and then we cover NameCount as our included column.
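In code, that POC index looks something like the following sketch – the index name here is just my own placeholder:

    /* P + O columns in the key: partitioning columns first, then the ordering column.
       C: include NameCount so the SUM is covered without touching the base table. */
    CREATE NONCLUSTERED INDEX ix_FirstNameByYearState_POC
        ON agg.FirstNameByYearState (FirstNameId, StateCode, Gender, ReportYear)
        INCLUDE (NameCount);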

This rowstore index isn't huge; it just takes a couple of seconds to create. What we're hoping to do is pre-sort the data to make this all a lot faster.

Now, how much faster is this going to make it? I'm going to turn actual plans on and run this again. When we were looking at the query time stats before, that run did take the thirty-three seconds. It often takes about twenty-six seconds on my machine; the thirty-three was a little bit slow. Twenty-six, twenty-seven seconds is normal.

When we were looking at the elapsed time in the execution plan properties, by the time we were at the sort, we were only six-and-a-half seconds in. So if all the index does for us is eliminate that sort, we shouldn't expect a huge gain. Sure enough… now we're at twenty-nine seconds.

So we’re a little bit faster, but this is not like a massively fast query at this point

Looking at our plan, I do have a little warning here: let's look at what the warning is. It's a warning on my select operator there. And sometimes with the properties pane, you have to click around for it to actually show the properties we want. The properties of the select operator – there – it's showing us what we want now.

If I go down to warnings here, it will give me details. I'll pop it up, and it's just saying that I was granted memory to do things like in-memory sorts for all that data, and I didn't use much of it.

So it's not that– if we'd had too LITTLE memory, I might think, maybe that slowed us down. But nothing else is running on my test instance, so I didn't have to wait to get my memory. My memory grant was excessive, I'm not arguing that. But I didn't have to wait to get it, because it's not like I had to compete for that grant with a super active server. So I'm not super concerned about the memory grant being what made me slow.
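If you do want to double-check grants while a query is running, a quick look at sys.dm_exec_query_memory_grants shows how much memory was requested, granted, and actually used, and whether the session had to wait for its grant. A minimal sketch:

    /* One row per active memory grant; wait_time_ms shows whether the query queued for memory */
    SELECT session_id,
           requested_memory_kb,
           granted_memory_kb,
           max_used_memory_kb,
           wait_time_ms
    FROM sys.dm_exec_query_memory_grants;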

Looking at my plan, let's go all the way to the right here – there's a lot of windowing function work in there!

Now, we did use our index

We've got our nonclustered index in play, and it is totally fair that it had to scan it. We want to do a running total for every single row in here and figure out, okay, which rows have passed the threshold we passed in for a given FirstNameId, Gender, and StateCode, and what year did it happen in? So we've got to look at every single row.

We do not have a sort operator before we go into our windowing function. We go straight from a parallelism operator into our window function. Fun with drawing straight lines here – and when we are on this segment operator, let's look at our query time stats.

Our actual time statistics at the point of this segment operator: it's four-point-four seconds before this operator finishes.

Looking at the window spool operator, sure enough, it's taking a long time to complete this window spool. So that hasn't changed.

By creating this index, we did improve something! We got rid of the sort, which had to happen before things got fed into the window spool. But the window spool operator is still slow. Well, what can we do to speed this up?