A common error with NOLOCK – and more alternatives (6 minutes)

Notes

Regarding Error 601, when I’m talking about modification to page “structure”, I mean which pages are in use by the index or heap. It’s always 8K pages in these disk based indexes that we’re looking at, I don’t mean to imply that the page structure itself changes.

Transcript

There is a common error you can get with allocation order scans: error 601

This error says: I could not continue this scan with NOLOCK due to data movement.

Back in the SQL Server 2005 days, I used to get this error all the time. It has become much less common, and I was doing some searching on it at one point. I came across this Stack Exchange question. In this Stack Exchange question, in a comment, Remus Rusanu, who at the time was working on the SQL Server engine group, explained that there are changes that the SQL Server team made in SQL Server 2012 that make this error much less common. It is still possible for it to happen, but it’s much less common in SQL Server 2012 and higher.

If you think about what we were talking about when we looked at that IM page, it has to do with: okay, even if I’m saying it’s okay if the data is garbage, there could be changes that are being made that impact {which pages the index is using} that make me confused, and okay, I don’t even know what page to go to next from the IM page. I might lose my place, for example, which is what that means.

When you are NOLOCKing a query, you have to consider that you potentially may need to catch and handle error 601

You may need to retry your statement if SQL Server just loses its place.

A recap of what we have talked about

When do we use NOLOCK? Well, ask yourself, is it okay if the data is just crap? And I actually use the crass term in this, because I think NOLOCK or read uncommitted is a pretty crass thing. Sometimes we need to be crass, and I think about, okay.

What if the CTO or the CIO of this company, what if they see wrong data in this and they ask me about it? Is that going to be okay there? Also, does using NOLOCK here solve a specific, important problem? Does it have a reason? What often happens is that, in cases where NOLOCK solves one problem, it starts being added to lots of things as preventive medicine. And it seems like, okay, what’s the big deal? Because people don’t understand that the data can just be wrong. Then later, when weird data starts showing up, it’s very hard to troubleshoot and there’s a lot of code that has this in it.

Also, there’s an alternative I’m going to talk about called read committed snapshot isolation

Read committed snapshot isolation affects your default database isolation level and can help fight blocking.

There are certain race conditions you have to worry about, but the thing is, even if you enable this, NOLOCK hints are still honored, and you’ll still get dirty reads because SQL Server says: oh, well you want to do a dirty read! Here, I’ll do it. You’ve hinted it. Yeah, your default isolation level is optimistic, but you’ve said you want dirty reads, so I’ll do that for you.

So don’t use NOLOCK “just in case.”

I always think about is there a cleaner solution to the problem?

Here is a list of cleaner solutions.

Instead of using NOLOCK, instead of using dirty reads, I would rather, if my problem is blocking in deadlocks, I would rather solve or prevent the problem with indexes that help the queries get to the data in a targeted efficient way. The faster our queries are, the less surface area there is for blocking. If I hold my locks for less time, or if I need to read for less time, potentially I can reduce locking.

Optimistic locking, with either read committed snapshot isolation, or snapshot isolation. There is versioning overhead for these. For snapshot isolation, you do have to explicitly change isolation level for, say, your reports, and say: I want you to use snapshot isolation level. For read committed snapshot isolation, you have to make sure you don’t hit race conditions. That’s a whole ‘nother course on okay, how do I use these? It’s more of a larger, longer term solution, but it can- because in optimistic locking, readers don’t block writers and writers don’t block readers- we can get consistent results without all this blocking. So it can absolutely be worth the work to get that in there instead of just NOLOCKing everything.

If I want the allocation order scans, I would much rather use a TABLOCK hint. If you are using optimistic locking, if you are using RCSI, you may need to use a TABLOCKX hint. There are details on why that is in Paul White’s blog on the allocation order scans, it’s a great post.

Whenever it is customer data that we are using NOLOCK on, I always want something signed off from a very important person saying yep, it’s okay if this data is just plain wrong. It’s okay for that situation. And I have actually gotten that signature before in some cases, where it really legitimately is okay. That very important person will review it and say, and they’re usually glad to be asked, they’re very glad that if there is a question about returning inconsistent data, that you’re having it validated at higher levels of the company, because it isn’t trivial when your customers are seeing it.

Thanks for taking this course about the dirty secrets of NOLOCK. If you’ve got questions or comments, I would love to get them on the course lessons, and I would love to hear feedback from the course in the survey as well. Hope to see you in another course soon!

Back to: The Dirty Secrets of NOLOCK (50 minutes) > Learn

5 Comments. Leave new

  • Great course as always

    Reply
  • NIck Fairway
    July 9, 2020 10:41 am

    Hi Kendra. I have been a SQL developer since 1996 and have generally hated the liberal use of NOLOCK, and developers refusing to remove them. This course helped to cement my view – and I even learned a few things on the way! Thanks

    There is one related issue which I have never got around to proving but it is to do with scans and NOLOCK and I wanted to put this past you. Is it possible that if one process is doing a read uncommitted table scan e.g. for the SUM of a numeric column and another process does an insert or update which causes a page split (This could even be an update on another [varchar] column on the table to the one you are summing) that the table scan could either read that page twice or not all depending on where it was in the scan? I have seen 1 occasion where incorrect results certainly pointed to this, although I did not manage to recreate it at the time.

    Reply
    • Hi Nick,

      Yes, modifications can cause NOLOCK to miss rows or read rows twice based on the scenario you describe. Even without a page split, an update can physically relocate a row in the index being scanned and cause this.

      The same phenomena can also occur under the default implementation of read committed, because read committed tends to hold granular locks on a row or a page and the data movement can occur elsewhere in the index being scanned.

      Hope this helps!
      Kendra

      Reply
      • NIck Fairway
        July 9, 2020 2:17 pm

        Thanks Kendra.

        When I mention to Devs and even DBAs (who tend to care more about the performance than the accuracy of their database systems) that NOLOCK is ‘Sacrificing accuracy for the sake of performance’, they usually respond with something like ‘We rarely rollback a transaction so it is right almost all the time, and the system has to perform all the time’. It is very hard to get people to change their habits!

        Reply

Leave a Reply to wondi wolde Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Menu