A hint leads to a bright idea for a rewrite!
One of my favorite things about using hints is that they can lead to fresh ideas.
If our current code won’t do something, the hint will often raise an important point about why SQL Server is avoiding that option. This can lead to a new view about how to do things differently — and often that’s faster.
What’s batch mode?
Batch mode is a way where operators can process a “batch” of 900 rows at once. By using this batch, CPU overhead is reduced, and doing things like grouping and sorting big ranges of data can be much more efficient.
Which operators can use batch mode execution and the aggregate pushdown features?
As of this writing, aggregate pushdown works with MIN, MAX, SUM, COUNT, AVG, in specific situations. (Data types matter!)
Learn more about aggregate pushdown and batch mode in this article from Microsoft.
Batch mode and columnstore indexes are available in all SQL Server editions as of SQL Server 2016 SP1, but Aggregate Pushdown is an Enterprise only feature. Learn more about columnstore features by Edition here.
Gotcha: batch mode is currently only available when a columnstore index is present
This is weird: SQL Server can use batch mode on operators in a query even if it does NOT choose to use the columnstore index anywhere in the plan. Things like sorts and hash joins and even windowing function operators can use batch mode. But it can only happen if you’ve created a columnstore index.
This means that you may see different performance on a table after you add a columnstore index, even if the columnstore index isn’t being used.
This also means that there are hacks you can use to get batch mode with rowstore indexes, without a “real” columnstore index.
What does that mean for this demo, and for the future?
Batch mode doesn’t make a huge performance difference for the “NameRank” query in this demo, although some operators do get batch mode due to the presence of a nonclustered columnstore index.
There are other cases where batch mode can make a significant performance benefit, though! (See the “hacks” link above for a demo of that.)
Microsoft has announced plans to make batch mode more widely available in future versions of SQL Server (post SQL Server 2017).
One thing I might wonder looking at my table is…
Why isn’t SQL Server using my nonclustered columnstore index?
On every execution plan we’ve looked at for this query, I haven’t seen my nonclustered columnstore index in there, and there is one on agg.FirstNameByYearStateWide.
Well, whenever I’m wondering why SQL Server isn’t using an index, if I have the ability to test– an interesting thing to do is to use an index hint to force it to use the index in my test.
Looking at the execution plan, I may see something that points out to me exactly why it wasn’t making that choice, so it can be very, very interesting. I’m using my table variable test here, and this step where it puts the data into the table variable, that’s the one that takes so long, so I’m saying, when you’re calculating the rank, use the columnstore index on that part of the query. I haven’t applied any hints to the second part of the query that joins back to it.
So, giving it a run, let’s see how it performs but also what looks different in our execution plan that might explain choices it’s been making. Our run time isn’t too fantastic.
We’re over seven seconds of duration and CPU time. Looking at the plan, it did use my columnstore index, but then it pushed everything out into a sort operator.
Huh, let’s take more of a look at this columnstore index scan
Looking at the properties, sure enough, we were in Batch mode, which is a super fast, cool mode it can use, but there’s something interesting that’s missing from this look at our columnstore index scan.
I am running Developer edition, which has the same features as Enterprise edition, and columnstore indexes in Enterprise Edition have this really cool aggregate predicate push-down feature, where it can actually take a bunch of the work of grouping and aggregating and push it down into the columnstore index in itself, and I don’t have any indication that that happened here. Looking at my sort operator and scrolling down, I can expand on the Order By and look at and see that the columns that it was sorting by were ReportYear, StateCode, Gender, and NameCount descending.
It’s doing all of this sorting in memory, so it’s really just able in this case to use the columnstore index for data retrieval, and then it’s having to do a big sort and then push data into a window aggregate, and then that filter. So it wasn’t able to actually use any of the magic from the columnstore, and it had to resort everything as an independent operator.
This makes me realize I have seen that magic happen in columnstore before when I was doing Group By queries
I’m looking at ranks of number one based on Name Count Descending. Maybe if I rewrite my query with a Group By and a Max Operator, I can get that columnstore index magic to happen.
So what I’ve got in this query rewrite is I’m grouping by ReportYear, StateCode, and Gender, and then for that I’m finding the max name count. I have this as a sub-query, and I’m saying, name this max count, join back to agg.FirstNameByYearWide on the ReportYear, StateCode, and Gender for the row where the NameCount equals my max name count. I’m grabbing all those reporting columns still, I’m still joining to ref.FirstName, it’s just instead of using a windowing function I’m using a rewrite that uses max name count.
Executing my query, wow, the data’s back already, and I have an elapsed time of just 685 milliseconds.
Looking at my execution plan, I don’t have any warnings, at all, I’ve got a lot of nested loop joints, and I have a columnstore index scan. I didn’t even hint that this time, it just naturally went for it.
Looking at the properties on my columnstore index scan, I can see that this time, yep, I’ve got Batch mode. This time I do have information for the actual number of locally aggregated rows on my columnstore index scan.
It was able to push some of that grouping board work down into the columnstore index itself
It fed everything out of that columnstore index scan into a hash match aggregate, and I can see on my hash match aggregate, looking at my hash keys build, that it did have to do some work in aggregating here. But some of that was also handled in the columnstore index. Looking at the number of rows that it was estimating in different places, I can see that it had a really good idea with this query syntax of how many rows it was going to have to deal with, and that it made those decisions to make the nested loop joint with pretty good estimates, I mean, incredibly accurate estimates in this case, of how many rows it was going to deal with.
So it had better insight using the current version of the cardinality estimator with a different syntax here.
Well, maybe I just have this nonclustered columnstore index here on my test instance
Maybe that index isn’t in production yet. What if I want to know, how does my rewrite perform without that? Do I have to drop the nonclustered columnstore?
I do not. There is a query hint I can use to say, just ignore any nonclustered columnstore index that you find on this table, and I’m going to go ahead and run the grid and say, how do you do if you use the new syntax which uses Group By MAX but no columnstore.
Well, in this case, I get a runtime of just over two seconds, better than I’ve gotten with any other hint so far except for when I was using that nonclustered columnstore, and looking at my execution plan, it’s quite similar to the one that used columnstore except I have an index scan of a nonclustered row store index feeding into a stream aggregate operator.
But even without that nonclustered columnstore, SQL Server is able to have a better execution plan for this case with a Group By and a max
Now I’m not saying that windowing functions are bad. Not at all, windowing functions are great and you can do some really cool things with them. In this case, I have a different syntax that gives me faster performance, and the idea to use that syntax came in my case from using an index hint saying, hey, why doesn’t the columnstore index work well with my current windowing function? But, sometimes, windowing functions will still be the right choice. I don’t want to turn you completely off windowing functions, it’s just that in my case they didn’t get the best performance.