A test: what if I update fewer rows? (6 minutes)

Part of "The Case of the Slow Temp Table: A Performance Tuning Problem (50 minutes)"


In the demos in this session, I use EXEC WITH RECOMPILE to test and get a fresh plan for the procedure. Note that if you are using nested procedures or dynamic SQL, this only forces a recompile on the outermost “layer” you are calling.


One thing I wondered: I looked at that table variable plan, the one that is really fast and I said, “What if I update fewer rows?”

One of the things with table variables if it doesn’t have that recompile hint is it way underestimates how many rows are in there.

So I wondered: instead of updating 800,000 rows out of a million rows, what if I only update 8,000 rows?

It is going to be faster just because I’m doing less work, but am I going to see that extra IO in the execution plan?

Let’s test here and see.

First, we’re going to update our user database table with @RowID=8,000

I’m saying execute with recompile because I don’t want to reuse the plan that’s in cache for 800,000 rows, I want you to come up with a new plan… and also don’t leave it around for anyone else to reuse. Just give me a special plan. That’s why I’m doing exec with recompile.

This takes 78 milliseconds to do the update. Of course we’re updating far fewer rows, so it makes sense that this used to be 1.5 seconds.

Looking at my execution plan for the update, the user database table is now getting a narrow plan.

Now we have a clustered index update. We don’t have a separate operator finding the rows, we have a clustered index update that does have a seek predicate on it to find the 8,000 rows on it.

Looking at how much IO it does, the properties of our operator, we’re doing 538 logical reads to find those 8,000 rows. This looks like it’s going to be our fast case again, right?

Let’s see how the temp table does

Running it again with recompile, and this it doesn’t take very long– we’re not dealing with that many rows– but let’s see if the IO characteristics are different. Oh, this is looking good. I got a nice, narrow plan here. I didn’t get one step to find the rows and then one step to update the rows. Sometimes that wide plan is fast, but previously for the temp table it was slow.

Looking at the properties of my clustered index update operator, we didn’t do millions of logical reads, we only did 537 logical reads.

This narrow plan it looks like our temp table is much less pathological, it’s not doing tons and tons of extra IO when it has this narrow plan, at least for the case of 8,000 rows, that’s interesting information.

I’m going to run a table variable test…

Get a nice, fresh plan for 8,000 rows… it also gets a narrow plan. It got a narrow plan before, so that’s not anything new. Looking at the properties and the logical reads, we are at 537 logical reads again.

What about when we use a recompile query hint on the update of that table variable?

Well we’re still at 79 milliseconds.

Oh this is interesting, we did go back to a wide plan, with that recompile hint.

When it came to the temp table it was like, “Okay I’ll go with a narrow plan,” but for the table variable {with recompile} we have clustered index seek here to find the rows broken out as a separate step.

Our actual IO statistics, it’s not just 537, that’s 24,537! Then on the clustered index update, are we doing more IO looking at the IO there? Oh we’re doing another 24,000 logical reads there.

This is still pretty fast because we’re only dealing with 8,000 rows, but the pattern of extra IO that at some scale leads to slowness, that pattern is still happening for the table variable when we update it with a recompile hint.

What if I only update one row for the table variable? I’ve still got my recompile hint in there, I’m not doing 8,000 rows, I am simply updating a single row. I still get my wide plan for that. And let’s just for fun let’s look at the IO that we see on it. For the clustered index seek my actual IO statistics are seven logical reads there, and then you know that’s not that many but it kind of just interesting to see how many we get. And then we get three logical reads on the update as well. Definitely not too terrible to live with when you’re updating one row, but that pattern that at scale gets slower, and slower, and slower.

We have this inflated IO on multiple operators still happening even at a small scale when we’re using a table variable as well as a recompile query hint.