Columnstore: 0 partitions accessed– but rowgroup elimination! (15 minutes)

Part of "Execution Plans: Partitioned Tables and Columnstore Indexes (1 hour 30 minutes)"


So, why in the world would SQL Server lie to you?

Well, it doesn’t really lie to you. I think there is some sense behind this, but it kind of feels like a lie when you’re first looking at it.

We’re now going to let SQL Server use our nonclustered columnstore index

I’m going to run this query with actual plans on: let’s go ahead and turn them on. This time I don’t have a hint in there saying, “don’t use the nonclustered columnstore.”

We are allowing SQL Server the choice about whether or not it wants to use the nonclustered columnstore. The FakeBirthdateStamp column is in the nonclustered columnstore– and it's also the column the table is partitioned by.

Now, in this first query I am using that incorrect data type– the value is going to be automatically changed into DATETIME2 with precision seven, which is more precise than FakeBirthdateStamp, whose type is DATETIME2 with precision zero.
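As a sketch– the table name here is a stand-in, since the real one isn't shown in this excerpt– the query looks something like:

```sql
SELECT COUNT(*)
FROM dbo.FirstNameByBirthDate  /* hypothetical table name */
WHERE FakeBirthdateStamp > CAST('2015-01-01' AS DATETIME2);
-- DATETIME2 with no precision specified defaults to DATETIME2(7),
-- which is MORE precise than the column's DATETIME2(0), so the
-- comparison happens at precision seven and the optimizer can't do
-- partition elimination on this predicate.
```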

We shouldn’t get partition elimination on this first query, but wow! Our results came back super fast.

What happened in our execution plan?

Well, first up, SQL Server did use our columnstore index: We have a columnstore index scan here. Let’s see, actually, how long did that take?

I’m going to click on our select operator, I’m going to right click and say– well, I’m going to right click and get a tool tip :) I’m going to right click and say, ‘Properties’. And then, when we look at our– if we can make our properties window actually stick around there– when we look at our Query Time Stats, our elapsed time is only 293 milliseconds.

Wow! And our CPU time is 318 milliseconds. Well, that’s super impressive.

How did this happen? Did we somehow get partition elimination?

Clicking on my columnstore index scan, there’s something really weird going on here

I have a columnstore index scan, but my Actual Partition Count is zero.

And this is– when I first saw this I was like, “You can’t be telling me the truth here!” Because to me, my first view of this was the Actual Partition Count of zero implied that it didn’t have to use any of the partitions, which implies that it didn’t do data access.

But it DID do data access, because it produced rows! Actual number of rows is 819,192, and then actual number of locally aggregated rows is about 3.1 million. So, you definitely looked at the nonclustered columnstore index, SQL Server.

What the heck are you talking about with Actual Partition Count zero?

Well, these numbers are all numbers we’re going to decode here.

An Actual Partition Count of zero, what I think of that as now is: I wasn’t able to do partition elimination, but I maybe was able to work some other magic. That’s how I currently translate that.

819,192 rows is the number of rows that this operator is sending to its next door neighbor, the hash match. If I now highlight this arrow and look at how many rows are coming out of the columnstore index scan, 819,192 rows are being passed into the hash match for the hash match to count.

Remember, all my query’s doing is a SELECT COUNT(*).

But, there are more than 819,192 rows that are counted!

If we look at our result, right, our results are 3.9 million rows. Where are the rest?

The rest of them are in those ‘locally aggregated rows’

There is some magic in columnstore indexes where SQL Server’s able to push some aggregation down into the columnstore index, and it’s better at doing this in Enterprise Edition. I’m running Developer Edition, which has all the features of Enterprise Edition.

So, some of the rows, it was actually able to push down and count them while it was just looking at the columnstore index itself. It didn’t have to make the hash match operator count them, because it counted them in the columnstore index scan itself. But, 819,000 rows, it wasn’t able to do that magic with, so it had to actually hand them off to the hash match operator.

I’ll give you a little spoiler here: that 819,000– these rows here that it couldn’t push down and count in the columnstore index– were sitting in the delta store. They haven’t been compressed into columnstore segments in a rowgroup; they’re still sitting in that b-tree delta store.

They didn’t have the columnstore magic in them, it’s like oh, okay, these haven’t been compressed, they’re in an open rowgroup, I’m going to have to send them over for the hash match aggregate to go ahead and count them the old fashioned way.

Batch mode operators and Actual Time Stats

If we look at the properties of the hash match aggregate, it is able to use super cool batch mode, which is very, very cool. Batch mode is also on our columnstore index scan, right? We look at it, it’s doing batch mode, it’s the same as the hash match operator.

When we are looking at our Actual Time Statistics on an operator– this is our columnstore index scan, and since it is batch mode, whenever you’re looking at batch mode, these are reported just for the operator, never for any children of the operator. Now, this operator doesn’t have any children, so it took 273 milliseconds of elapsed time. The hash match aggregate, when we look at its elapsed time– see, it does have a child. The columnstore index scan is to the right of it, right?

So it does have somebody sort of beneath it, but it certainly isn’t reporting data for its child. It was able to count those 819,000 rows super fast. Super, super fast.

We then look at the compute scalar. The compute scalar is also in batch mode, and it also was just, incredibly– didn’t have a lot to do, it was able to be really, really fast. And then next we come to the parallelism operator. Now, here, you’re looking at this and we’re like, okay, 292 milliseconds.

Well, what does that mean? Well, down here, when we look at the actual time stats for the operators to the right, we had 273 milliseconds, then we added on one millisecond there, zero there– it doesn’t quite add up, we may have rounded up a little. But basically, the parallelism operator is row mode, and its elapsed time cumulatively reflects the time of the child operators to the right of it– though it hasn’t done the same for CPU time.

Then, when we look at the select operator, we can get, in the Query Time Stats, the aggregate information for everybody– which is the 293 milliseconds of elapsed time we saw above.

Row mode and batch mode operators display things differently

Batch mode operators think only of themselves; row mode operators may think of others, depending on their position in the execution plan.

We went up to the select operator to get the total from Query Time Stats.

Well, so this is kind of weird.

We had no partition elimination on this, but it was super fast

Part of that is because it was able to do that local aggregation, but the number of locally aggregated rows is still not the total number of rows in the table, right? The number of locally aggregated rows on this guy was only 3.1 million. That’s not all of the rows in the table.

How did it not have to read far more rows?

There are some things that the execution plan isn’t showing us

We can see them in another way, I’m going to turn off actual execution plans, just because we’re going to look at something else.

In this example, I’m going to run the exact same query, same data type mismatch, letting it use the nonclustered columnstore index, but I’m turning on statistics IO and statistics time for my session– telling SQL Server to give me a messages tab with information about the IO and CPU usage for my query.
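As a sketch of that setup– again with a stand-in table name– it looks something like:

```sql
SET STATISTICS IO, TIME ON;

SELECT COUNT(*)
FROM dbo.FirstNameByBirthDate  /* hypothetical table name */
WHERE FakeBirthdateStamp > CAST('2015-01-01' AS DATETIME2);

SET STATISTICS IO, TIME OFF;
-- The Messages tab now reports logical reads (the b-tree delta store)
-- plus a line about the columnstore, e.g. "segment reads 4, segment skipped 230".
```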

And I can see something really interesting in here. I mean, first of all, the query is, once again, really, really fast, which is great. I can see two rows about data access for the table. The first row says I had 2,757 logical reads. Those are reads that were coming from that b-tree delta store.

The second one tells me about access to that nonclustered columnstore index. I read from four segments, I skipped 230 segments.

Now, the terminology here is very confusing, because you might be thinking, “you said that segments are column segments, what’s up with that?”

This is using an incorrect term in the output here, and that was confirmed in a blog post by Sunil Agarwal, a program manager on the SQL Server team.

Rowgroup elimination

This should say that rowgroups read was four, and rowgroups skipped was 230.

A rowgroup is a collection of column segments– more precisely, a closed, compressed rowgroup is a compressed collection of column segments.

So, what it’s saying– these rowgroups have metadata. When SQL Server closes and compresses a rowgroup, it looks at those individual column segments and it knows metadata about them, it figures out information about the lowest and highest values in them.

So, it’s got partitions, and a partition may have more than one rowgroup in it– but for each rowgroup, if I’m running a query that says, “I want FakeBirthdateStamp after 2015,” SQL Server can go through all the rowgroups and just quickly check the metadata and ask: do you have any data after 2015 in you?

Based on the minimum and maximum values it recorded when it compressed the rowgroup, it can decide: do I need to look at that rowgroup at all?

It was able to figure out, hey, there’s only going to be four rowgroups for the compressed data– for the closed compressed groups– there’s only four of them that could possibly contain these rows I need to count. I don’t even need to go into the others.

Which is very, very cool!

We couldn’t see that in the execution plan.
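If you’re curious, that min/max metadata is exposed at the segment level in sys.column_store_segments. A sketch of a query against it (the index name is a stand-in):

```sql
-- Segment-level metadata per column segment. min_data_id / max_data_id
-- hold the values in an encoded form, but this is the metadata SQL Server
-- consults when deciding whether it can skip a rowgroup.
SELECT p.partition_number, s.column_id, s.segment_id,
       s.row_count, s.min_data_id, s.max_data_id
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p
    ON s.partition_id = p.partition_id
JOIN sys.indexes AS i
    ON p.object_id = i.object_id AND p.index_id = i.index_id
WHERE i.name = 'NCCX_FirstNameByBirthDate'  /* hypothetical index name */
ORDER BY p.partition_number, s.column_id, s.segment_id;
```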

When I see, on a columnstore index, an Actual Partition Count of zero, what I think is: okay, we didn’t get partition elimination, but we may have gotten rowgroup elimination– I just can’t see that in this part of the plan.

So, that’s very, very awesome. But, let’s prove a little bit more.

Let’s look at the delta store and see if those numbers I told you pan out

We can look at sys.column_store_row_groups, which is a built-in DMV. For some reason, IntelliSense doesn’t give its name the same highlighting as other DMVs at this point, but it’s still a DMV, and it is documented. We’re just going to query it and say: for our index, we want to look at the rowgroups and see what’s in them. We can see that we do have one open rowgroup.
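A sketch of that DMV query (the index name below is a stand-in):

```sql
-- One row per rowgroup: state_description is OPEN for the b-tree
-- delta store, COMPRESSED for closed, compressed rowgroups.
SELECT rg.partition_number, rg.row_group_id, rg.state_description,
       rg.total_rows, rg.deleted_rows
FROM sys.column_store_row_groups AS rg
JOIN sys.indexes AS i
    ON rg.object_id = i.object_id AND rg.index_id = i.index_id
WHERE i.name = 'NCCX_FirstNameByBirthDate'  /* hypothetical index name */
ORDER BY rg.partition_number, rg.row_group_id;
```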

An open rowgroup is that b-tree holding the delta store. It is partition number 52, and it has 819,192 total rows in it. We haven’t done any deletes. If you look down through the closed rowgroups, you’ll notice I haven’t done any updates or deletes in there.

Deleted rows have to be tracked in a special way in columnstore indexes. What we were seeing in the execution plan when we saw that 819,192 rows was: hey, there’s an open rowgroup. That’s just stored as a b-tree, I’ve got to go count those in the hash match operator because I can’t work my push down predicate columnstore magic on them.

Let’s run a similar query and say okay, I want to add up everything in partition number 51. Because partition number 51, all of these guys in partition 51, they are compressed and closed. Based on my boundary points that I looked at in the previous query, partition number 51 is all my data for 2015, so does that add up to our other number?
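A sketch of that check, using the same DMV (index name again a stand-in):

```sql
-- Sum the rows in the compressed rowgroups of partition 51.
SELECT SUM(rg.total_rows) AS total_rows_partition_51
FROM sys.column_store_row_groups AS rg
JOIN sys.indexes AS i
    ON rg.object_id = i.object_id AND rg.index_id = i.index_id
WHERE i.name = 'NCCX_FirstNameByBirthDate'  /* hypothetical index name */
  AND rg.partition_number = 51
  AND rg.state_description = 'COMPRESSED';
```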

We’re going to sum up the total rows in partition 51 and sure enough, that is my familiar 3.1 million rows. Those were the rows in the compressed, closed rowgroup made up of compressed column segments– to get my terminology right– that SQL Server was able to count very efficiently with that push down predicate, and it was also able by looking at the metadata on those column segments in the rowgroup to be able to figure out: I can just eliminate all the other rowgroups.

Very, very clever indeed.