The "Max / Min" problem query (8 minutes)

Part of "Problem Queries in Table Partitioning (1 hour 30 minutes)"

Performance for this simple query changes dramatically when you partition the table…

And not in a good way.

Vote to improve this behavior

Do you think the optimizer should be smarter about this? Tell Microsoft by voting up this bug.


This problem query is the MAX, MIN query

This query is looking for the highest value of the FirstNameId column.

Now, it’s kind of an arbitrary query in our case. I’m just showing I’ve got a column in the table that’s an INT, and I’m looking for the highest value.

Both tables have a nonclustered rowstore index on FirstNameId

On my nonpartitioned table I’ve got a normal old, nonclustered rowstore index on FirstNameId, and I’ve put this same index on my partitioned table.

Let’s run the queries against both tables

I’ve got my two queries set up here in Management Studio.

The top query is against the nonpartitioned table, the bottom query is against the partitioned table.

I’m telling it for now, don’t go use the columnstore index. Let’s test that rowstore index on FirstNameId.

Make sure my actual execution plans are turned on and we’ll start our execution.

The results from the nonpartitioned table are back right away, super fast.

And wow, we’re still waiting for the query against the partitioned table. That took a little bit of time.

Let’s look at our execution plans and see what happened

Just from the plans, from the estimated cost we can see that when SQL Server looked at these two statements it immediately knew that the query against the partitioned table was going to be a lot more work. Those estimated costs turned out to be right based on what we saw, right?

Let’s take a look at the timing, and again I had run these queries before.

The data was already in memory for these guys.

Our top query, highlighting the select operator and going on down to query time stats, it’s actually not registering on the millisecond scale here. It was really fast.

Our second query, looking at its query time stats it took more than 11 seconds and it did go parallel. It used all four of my cores. Things went wrong. Looking at the operators on the right, even the top query did do an index scan.

Against our nonpartitioned table we have an index scan operator.

You may have the impression that scans are bad, seeks are good – not always the case!

Some scans can be incredibly efficient. Taking a look at this scan, the actual number of rows it brought back is one. We’re going to scroll down on this guy. Look at the number of rows read. The number of rows read is also one.

This scan operator did not read the whole index because SQL Server looked at this and figured out: you want the highest value for FirstNameId. You’ve got a nonclustered index, a rowstore nonclustered index on FirstNameId. Now the index is ascending, but I have the ability to read backwards, so I can go to the top end of that tree and do a super efficient backwards scan and pick up the top row. Very, very efficient. Very, very cool.

When we partitioned the index on FirstNameId, we do not have the same plan

We do have an index scan operator. Let’s take a look at the partition count. That index scan touched all 56 partitions in the index. It didn’t bring back one row for each. We don’t have 56 rows coming out of this guy.

If we scroll down and look at this guy, the number of rows read is way higher and there’s no little indication that it did a backwards scan. It doesn’t say forward it just doesn’t say backward in the plan. When it says backward, it’ll tell you. We have the absence of that.

So with this index– and it is the same! Looking here, I’ll prove the object that we used, let’s make our index name a little bit bigger, it is the index on FirstNameId, it did pick the right index and it did a scan on it, but this scan is different.

Let’s take a look at the slides and see what’s going on

Here’s a drawing of the nonclustered index before partitioning. The rowstore index on FirstNameId, SQL Server says hey, I could just go to the high end of this and do that really efficient backwards scan. There are some limitations to backwards scans. That operator, that zone in the execution plan, not the whole plan but just that part, it can only be single threaded.

Then hey, looking at this operation that’s totally fine, right? Like I only read one row. I only need one thread so that seems like a fine deal in this case.

Once we partitioned this index on FirstNameId, it became more complicated. We have multiple partitions, partitioned in this case on FakeBirthDateStamp. We’ve got these chunks of years. For every year of babies born, we’ve got a little B tree based on FirstNameId. SQL Server could find data in any of these partitions. It’s going to have to check them, but it decided to go do a scan operator and to do a forward scan against every single partition in the table.

This is a little bit derpy because we could do that same single threaded scan of just the top page– admittedly in every single partition, but hey doing that one time took not even a whole millisecond. So doing it 50 times is going to be a lot faster than 11 seconds, right?

The optimizer just didn’t figure out how to do this. There is a bug filed on this: Specific “hey, when we partitioned the table, scenarios that are specific to our MIN, MAX, or TOP queries got slow.”

Because this same issue can happen with MIN as well as TOP. If we rewrote this as a TOP, we would have the exact same problem.

This bug, if you look at the date this bug was opened in 2006. It is still active. You can vote for it if you think this is a problem, and I encourage you to.

We do have some work arounds. If you look at the workarounds number there, there’s three listed. Essentially there’s a big workaround pattern for T-SQL that Microsoft recommended on this bug, and let’s take a look at that option as well as other options to tune this MAX problem.