The fast alternative to NOLOCK: indexing (5 minutes)

Transcript

What’s an alternative?

Let’s look at the query that we’re doing, summing up the TotalNameCount.

I’m going to do Ctrl + L to look at the query execution plan for what we’re doing to sum up TotalNameCount. Ref.FirstName is a very small table, it does not have many pages at all. We are scanning the entire clustered index of that.

The clustered index of the table is keyed on a column named FirstNameID, so all the data is ordered by FirstNameID, not by TotalNameCount. But for every row, there’s a column named TotalNameCount. And to do a sum without a WHERE clause, we do have to look at every row. But… ref.FirstName has far more columns than the TotalNameCount column.

One thing we could do is give it something smaller to scan

I’m going to create an index on the TotalNameCount column. This isn’t going to produce a seek because I still have to look at every row to do this sum, but instead of looking at the base table itself, which has many columns, I am looking at a non-clustered index.

Now this non-clustered index is still being touched

I have actually added a structure that is another structure that has to be updated, because what’s being updated here is TotalNameCount– and when this is happening, SQL Server has to update TotalNameCount in all the indexes in which it’s present.

So I haven’t really created a completely separate path, but I’ve created a smaller, faster thing for me to scan.

The idea here is, sometimes, by indexing, I can reduce blocking

Sometimes I can, depending on what columns are being used, sometimes I can eliminate blocking by creating an index!

But in this case I can just make the blocking faster by speeding up my query who does the reads.

Or, I can reduce the duration of the blocking by giving a better index to my query who’s doing the reads. Let’s see if we can prove this.

I have my query, no NOLOCK hint this time, we’re back in read committed, and we’re going to do another 200 runs. Looking in the old Activity Monitor at our waits, I do have some lock waits here. But the lock wait times are less than they had when I was scanning the clustered index, I had more work to do. I had more pages to read. I had a longer period where I could be blocked.

And sure enough, if I go back here, my duration this time was 13 seconds, in read committed. We were getting 17 seconds before {the index was created}. So read committed with a good index took 13 seconds (seven seconds if my host isn’t using a lot of resources and creating some chaos in my guest).

If I look at the data I counted in read committed, I get correct information

200 times, I counted the same name count.

Now full disclosure: If we have the non-clustered index and I use the NOLOCK hint, it’s going to be even faster, right? Because I’m saying “Okay, you have fewer pages to read” AND I’m going to use a hint. So I’m going to go back up, and I’m going to say, “I want to run the query with NOLOCK” and this time, it can choose that better index. I’m going to go ahead and re-run that again. And now I’m saying, “Okay, just read the data however it is, and you’ve got an index.” And now this was just six seconds. Yes, saying, “I don’t care if the data is correct” is going to make me fast.

But when I look at what data did I get, because I used the NOLOCK hint, I may be returning a TotalNameCount that is wrong, and it may be that my CTO looks at just a report of “How are we doing?” and she sees a number that says “We have this many.” Then she runs the report again in five minutes and somehow now we get a number that completely conflicts, and our application, and our data, looks like maybe something is badly wrong in there.

So NOLOCK: the reason people like it is it really does make things faster. But it’s got this baggage about potentially, when we don’t expect it – because we usually don’t expect, “Hey, we hit an exception and this data isn’t committing” – then things start coming back with data that is not committed and consistent data.