Batch mode hacks, and what does the future hold? (9 minutes)

Note: following this recording, Microsoft introduced batch mode on rowstore as part of SQL Server 2019.

Batch mode hacks

Read about Itzik Ben-Gan’s cool hack with the filtered nonclustered columnstore index in his own words in the article, What You Need to Know about the Batch Mode Window Aggregate Operator in SQL Server 2016.

What exactly can SQL Server 2014 execute in batch mode? Read this answer by Paul White in which he covers the hack that works for SQL Server prior to 2016.

Microsoft has announced that future versions of SQL Server (later than SQL Server 2017) will expand support for using batch mode beyond tables that have columnstore indexes.

Transcript

Itzik Ben-Gan is kind of famous for this hack, he’s famous for lots of cool things, one of them being this hack. We got the ability to have filtered, nonclustered columnstore indexes in SQL Server 2016.

Itzik figured out that you could create a filtered nonclustered columnstore index with an impossible filter

In my case, I’m saying only put rows in my nonclustered columnstore index when FirstNameId equals negative one and FirstNameId equals negative two.

Now FirstNameId is never going to be equal to these two values at the same time so this nonclustered columnstore index which creates immediately is going to have no rows in it, ever. It doesn’t have any rows now, and nothing’s ever going to qualify, but it is still technically a nonclustered columnstore index.

So this is totally a hack, but it’s really genius, it makes me laugh

When we look at the metadata for this, in sys.dm_db_partition_stats sure enough it has zero rows and it just has a very minimal page count to it. Well let’s run just our original stored procedure against it the one that originally took over a minute to run. Well now it takes only two seconds, and sure enough there is no nonclustered columnstore index to be seen in here, but what we do see are, our very cool batch mode 2016 Window Aggregate operator.

We have this filtered nonclustered columnstore index that’s never going to have any rows on it because we put this crazy filter where nothing’s ever going to qualify and we were able to get those batch mode operators on it. So very, very clever hack from Itzik Ben-Gan.

There’s another hack that folks were using before 2016 and I’m actually not sure who came up with it

Itzik Ben-Gan has written about this, he had a note that he learned about this from Niko, I’ve also seen Paul White reference this, I don’t know if Paul learned it, I don’t know who came up with this, it is genius, and is also a little crazy.

I mean I think it’s wonderful, but…

These are all hacks, these are all technically not supported, I have never seen Microsoft endorse these!

This one is to create a table that we are not really going to use, and we’re not even going to put any rows in it.

I’m creating a table named hack and I’m putting a nonclustered columnstore index on hack. I could put a clustered columnstore index, I chose to put a nonclustered columnstore index, choose the flavor you like best.

We are going to create a variation of our procedure named PopularNamesWithABogusJoin. And PopularNamesWithABogusJoin is just like our original procedure, but it has a bogus join in it.

We are doing a left join to dbo.hack where one equals zero. Now when SQL Server actually optimizes our procedure, it is going to LOL at this and be like “Seriously, I can just throw that out. “I don’t need to consider your bogus join at all, but I will go ahead and optimize the rest of your query"because I’m very tolerant of your stupid bogus joins.”

We are not going to use hack at all, but hack has a nonclustered columnstore index on it.

I’m going to go ahead and execute PopularNamesWithABogusJoin with our usual threshold. This took more than a minute on the first run. When I look at my statistics IO, note that there’s no mention of my hack table.

We have a tempdb worktable, we’ve got FirstName, FirstNameByYearState We’ve got no… SQL Server looked at hack and it was just like, “I don’t need you.” It didn’t fall for that.

But it did say, “Oh, there was a table referenced in this query that has… I smell a columnstore index in the neighborhood so I can use our awesome batch mode Window Aggregate operator.”

You might wonder, as I did, could I do this hack with a temp table?

Well sure, let’s create a temp table named hack with a nonclustered columnstore index on it, and now I’m creating PopularNamesWithABogusJoinToATempTable on it and it is much like the previous query but I am doing a left join to temp table hack on one equals zero. So this will be optimized out as well and sure enough when I run PopularNamesWithABogusJoinToATempTable on it, I get no columnstore index in here, no mention of my temp table, I do still get a work table just for my temp table work of my Window Aggregate operators, still having to use some tempdb space, but I get my batch mode Window Aggregate operator too.

So should you use these hacks, and what hacks should you use?

Creating the bogus table named hack, whether it is a temp table or whether it is an actual table, that required changing the code in my procedure, even though I’m doing a bogus join, I had to actually put it in the T-SQL.

But if you really want to use this hack and you’re on SQL Server 2014 or SQL Server 2012, that’s your choice, right? Because in those versions, you don’t have the filtered nonclustered columnstore index.

Itzik’s beautiful, and let me get back to my code sample here of Itzik’s beautiful filtered nonclustered columnstore index here it is, if you’re on SQL Server 2016 and higher, putting this on any table, even a tiny little table that you’re joining to, would allow SQL Server to choose batch mode.

You are still, if you use it, choosing a hack that is not technically supported by Microsoft and if you run into problems, I think what they’re going to tell you is maybe you shouldn’t use weird nonclustered columnstore indexes that you read about on the internet that don’t contain any rows

So it’s a hack, if you use it, you bare the risks for that hack, but filtered columnstore indexes are a documented cool feature, so it is something that you absolutely can do if you’re ready to bare the risk. Looking to the future, will we have to think about whether or not we want to do hacks to get batch mode on queries with rowstore indexes? Well, quite possibly not forever.

Here is a bug on Microsoft Connect that Niko filed saying “Hey it would be great…” Or a suggestion rather, this is not a bug, you can file suggestions as well, and this was filed as a suggestion, didn’t want to imply that this was rude, but a suggestion of “Hey, we would love to have batch mode support for rowstore.”

Now, promisingly, there is a response from Microsoft on this that says “Hey I’m working on this right now, it’s not going to be in the next version.” And at the time this was written, that makes me think that the next version was SQL Server 2017. But it does say, “Hey, I’d like to talk to you Niko about use cases…”

It’s a little complex but the good news is this is someone who was actually working on this, which is very, very cool news and very promising, and who knows if we are going to see this, I don’t have any inside information on this, but I’m really hopeful based on this comment, and this makes me think that okay one, we may not have to do these wackadoodle hacks, or consider doing them, but also hey, it is absolutely worth our time to really get use to these execution plan operators and understanding the difference between batch mode and row mode operators in our plans because I sure hope we’ll be having more and more access to batch mode operators soon.

Note: following this recording, Microsoft introduced batch mode on rowstore as part of SQL Server 2019.