Why Partitioning Makes Some Queries Harder to Optimize (4 minutes)

This lesson is an overview. For detailed examples of regressed queries in table partitioning, complete with demos, check out the course Tuning Problem Queries in Table Partitioning.

Transcript

What about the cases where partitioning makes things slower?

I don’t have an exhaustive list of these. There’s always, whenever we change a data structure, we’re going to have to optimize the queries differently, and we may get a regression. We may get a performance improvement in some cases, too.

There are some well-known cases where there are optimization problems

The most famous of these with partitioned tables is if I want to find the maximum value in an index, or the minimum value in an index, or some top values in an index based on order by. If I have a simple nonclustered index on say, CreateDate, if that was… and I want to know the most recent you know, what’s the maximum CreateDate of the table?

If I have a nonclustered index on that and it’s not partitioned, it’s going to be nice. It’s going to contain the date field, and then just the minimum other fields that it has to have to exist as the index. It will sneak in the clustering key and stuff but, it won’t be a super huge index. And SQL Server can just go straight to one end of the index and be like and be like BOOP! right there, here’s your max value.

If I have a partitioned, nonclustered index on CreateDate it has in this case, let’s say it’s broken it into each CustomerId as CreateDate, right? I have partitioned your CustomerId.

Now we would think, okay well I do have to check every partition, but I would think it’s going to be able to do a seek in each partition; what’s the top value, what’s the top value and then do an overall what’s the top value. We have a little bit of an optimization issue in SQL Server where it’s not that smart. And it will scan every single partition on that nonclustered index. There are some T-SQL rewrites you can do to make it smarter, but it’s not a simple rewrite.

I have some detail on this in the class, Tuning Problem Queries in Table Partitioning-- I have a demos for that. But, yeah, not that fun.

So simple queries that were really fast before that used TOP MIN MAX, we may have to do rewrites on or create that in an online index. We could create that index that doesn’t partition on the table, but then no switching in, no switching out, no truncated wow that index is created and enabled so that’s kind of a bummer.

There are also some patterns where non partitioned, nonclustered index may be able to use an operator like a stream aggregate to do grouping operations. And the cool thing about a stream aggregate operator, is it knows how the data is sorted before it goes into the operator. That’s why it can use the stream aggregate, and the data can flow it through.

Whereas if we don’t have the ideal sorting order going into the stream aggregate, say because we are partitioned by a column that isn’t specific to our query and we don’t have guaranteed overall sort order. We may have to use a blocking operator like a hash match aggregate, and in some cases that may slow down our query. So just by that breaking data into a bunch of chunks it is natural and proper that in this case, the query optimizer might not have certain options just because we have changed the sort order of the table by breaking data into chunks based on a given column.

We may have to do some work with those queries when we partition something. And on certain things we may have to do some work arounds to make them faster again. They are just certain types of queries.