Expiring an Automatic Correction With Update Statistics (10 minutes)

WARNING: I do NOT currently recommend enabling the Automatic Plan Correction feature due to issues described in my post, Automatic Plan Correction Could be a Great Auto Tuning Feature for SQL Server. Here’s why it Isn’t.

Now another thing you might wonder is: we’re auto-tuning and this sounds great.

Performance was slow for a while, and it’s auto-tuned this, and it has turned out that it’s faster, but…

What if my data changes in a way that we could get an even FASTER plan?

What if my data distribution changes notably, and with different statistics on the object, a fresh optimization would give us a whole new plan? Or maybe an index is added and we have a different opportunity.

For these reasons, automatic plan correction doesn't just leave the forced plan in there forever. It can come unstuck by something as simple as a statistics update, because SQL Server needs to check: hey, could we do even better?

Statistics updates are frequently done as part of nightly maintenance by many DBAs, so if you think about this…

This could lead to a pattern where auto-tuning is only really in effect at certain times of the day, depending on how often the queries that get auto-tuned are running, because it needs to see a certain sample of executions before it kicks in.

If you want to test this, you might just run UPDATE STATISTICS very simply and say: okay, I'm going to see if it came unstuck. I just updated statistics, and now I'm even going to clear out the procedure cache in my database, so it's not just going to reuse the plan in cache. I'm doing this for testing purposes only; I'm not saying this command is good to run against all your production databases. Now I'm going to run the query and look at the actual plan. I turn on my actual plans, I run it, and... wait a second, that only took five seconds. I see a window aggregate operator. I got the fast plan!
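
If it helps to see that test as commands, here's a minimal sketch. The table name is a placeholder (dbo.FirstNameByBirthDate is hypothetical; substitute whatever table your query actually reads), and clearing the procedure cache is strictly a demo step.

```sql
-- Update statistics on the table the query reads (placeholder table name).
UPDATE STATISTICS dbo.FirstNameByBirthDate;

-- Testing only: clear the plan cache for this database so we know
-- we aren't simply reusing the plan that's already in cache.
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;
```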

Looking at the properties here on my select, ‘use plan’ = true.

My auto-tuning wasn’t removed by that UPDATE STATISTICS

Well, that’s for a reason. In SQL Server 2012, a change was made to how UPDATE STATISTICS impacts your server.

If data hasn’t changed in the table then the UPDATE STATISTICS doesn’t invalidate the execution plans, and it doesn’t invalidate this tuning. It’s just like, oh, you updated statistics, but you know actually, nothing’s changed. So, bless your heart, we’ll give you a nod and we’re just going to go about our business. To actually invalidate the tuning, I need to at least pretend to modify some data.

What I’m going to do here is fake delete some data. I’m deleting some data from the table here, but then I’m rolling it back.

Even this modification that doesn’t ever commit is enough to make the UPDATE STATISTICS command real.
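
Here's a minimal sketch of that trick, using the same placeholder table name. The delete never commits, but the rollback still leaves the modification counter incremented, so the next UPDATE STATISTICS counts as a real change.

```sql
-- Pretend to modify data: the rollback undoes the delete, but the table's
-- modification counter still goes up (placeholder table name).
BEGIN TRAN;
    DELETE TOP (1000)
    FROM dbo.FirstNameByBirthDate;
ROLLBACK;

-- Now the statistics update actually invalidates plans, and the auto-tuned forcing with them.
UPDATE STATISTICS dbo.FirstNameByBirthDate;
```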

In most of your production databases, you are going to have modifications happening against the tables that count, unless you have a read-only type system. So if you do update stats on a nightly basis, a lot of the time it would have an effect and actually do something.

So, this is more of what you’d see in most cases.

Now that I’ve actually made UPDATE STATISTICS do something, I’m going to run my query again with threshold = null

Let’s see if this takes 3 seconds or 5 seconds. Errrrr, okay, we’re already at ten seconds. We are in fact, now that I have updated statistics, we are getting the slow plan again. Now that I ran this again, now that I’ve ACTUALLY updated statistics in a way that invalidated things, yes, my auto-tuning is gone, I am back to my slow plan, which took thirty seconds to run this time.

I’ve got my sequence project, my segment operators and my window spools again. What do the recommendations look like? Now that this has been reverted, looking at my recommendations again, I can scroll over and see that my current state is that is has expired.

It expired, due to: StatisticsChanged.

I love that it gives me this detail and says, I’ve got this recommendation still here, and here’s why it was meaningful, but statistics changed, so I can’t do it right now.
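
If you'd rather see that state in T-SQL than by scrolling across the grid, here's a sketch against sys.dm_db_tuning_recommendations. The JSON paths match the state and details documents SQL Server stores in that view.

```sql
-- Each tuning recommendation, why it was created, and its current state
-- (for example Expired / StatisticsChanged after the stats update).
SELECT
    name,
    reason,
    score,
    JSON_VALUE(state,   '$.currentValue') AS current_state,
    JSON_VALUE(state,   '$.reason')       AS state_reason,
    JSON_VALUE(details, '$.planForceDetails.queryId')           AS query_id,
    JSON_VALUE(details, '$.planForceDetails.recommendedPlanId') AS recommended_plan_id
FROM sys.dm_db_tuning_recommendations;
```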

I have to keep watching things and see, do I really want to do this again?

What does my report look like now?

If I go into my Queries With High Variation report, hey, my check mark is gone. Now, I'm looking at the last hour here, and some of my data's rolled off. What I'm going to do is go up here and hit the configure button, which has decided to hide because of screen width, so I get to it from the down arrow. It's going to bring up a very large dialog box.

Now, due to font size, I actually can't see the OK button, so we're going to have to take some things on faith here. I don't want to just see the last hour; I actually want to set this to 2 hours. Even though I can't see the bottom down here, we're going to take a leap of faith and just hit enter so that we can see a bigger window. Oh, and we actually... let's see... it doesn't want to show me. I think we've had some aggregation go on here. It's always fun with these reports, especially when you can't see the whole window. Let's try this one more time and see what we can see: CPU time and standard deviation, last twelve hours. When I look at the last twelve hours' granularity, now I do see all four of them.

Notice that I don’t have any check marks

I can still see, for this blue plan... if I look at the properties, notice it still says 'not forced', but if I look at the properties on it, I can still see that in the past... let's make sure I'm looking at the right one. Which plan are we on? This is the slow plan. It has low variation, so this plan was not forced. I'm going to briefly hover over some of these guys to figure out who is who. Okay, so this guy right here, the green bubble: remember, I'm doing the graph by standard deviation, not by average duration. So this green bubble is the one whose average CPU time is just 2 seconds. It's the fast plan.

The blue bubble, the light blue bubble, is the one whose average CPU time is much longer. It is the slow plan. Because I'm graphing by standard deviation, not by duration, it was hard for me to figure out who was who. So, here is our fast plan, and we can still see it does say 'not forced' there, but if I go back over to the properties for the T-SQL operator, this was the original one from before the plan was forced, so it's not saying 'use plan' is true. Now we're going to go to the dark blue one... Who are you? Okay, you are also a version of the fast plan.

I’m trying to see if I can find you a used plan is true. YES!!! Plan_id 15.

And I should’ve just remembered the plan_id. Even though it says not forced here, this one was auto-tuned: it says used plan equals true.

As you can see, the reports can get confusing, really easily, at least for me. I like looking in the T-SQL as well. The report does at least tell me quickly, visually, whether something is currently being forced, by whether or not I see a check mark.

After that, when something isn’t being forced, figuring out the history there can be a little complicated because used plan = true may be present even when it says not forced down there. I find that a little frustrating.

John asks: are these recommendations just adjusting the plan, or actually making suggestions about the SQL joins, etcetera?

It’s not rewriting the T-SQL, so in terms of thinking about how optimization happens, right? These plans, they do use different joins– they use different PHYSICAL join implementations. So, this plan here, the one that’s fast, that uses the window aggregate operator, it has a nested loop join. The one who is slow, which is the light blue one, it actually has a hash match operator. Those are both different ways, different physical joins that can be implemented for the exact same T-SQL, just depending on, in this case, it’s really a matter of, okay, depending on how many rows I think are going to come through here, do I really want to do a nested loop join and go have to look up stuff in ref.FirstName for every single one of those rows.

So it’s not rewriting the T-SQL, it’s saying: okay, this interpretation of the T-SQL, this plan that I optimized, I’m going to fit the query into this plan. It is still paying attention to those statistics to adjust the cost of the plan, and things like that– which is good, because if it didn’t pay attention to the statistics when it was forcing the plan, we would end up, quite possibly, with not enough memory grant, et cetera, et cetera.

So, it is forcing the plan, but with some sensitivity to needing to allocate the right amount of memory when optimizing for it, or at least trying to, if that makes sense.
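
For context, the mechanism automatic plan correction uses under the covers is ordinary Query Store plan forcing, which you can also do by hand. A sketch, where the query_id value is a placeholder and plan_id 15 is the forced plan we saw in the report:

```sql
-- Manually force a plan for a query (the same mechanism FORCE_LAST_GOOD_PLAN uses).
EXEC sp_query_store_force_plan @query_id = 1, @plan_id = 15;

-- And un-force it again when you no longer want the plan pinned.
EXEC sp_query_store_unforce_plan @query_id = 1, @plan_id = 15;
```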