Demo: Read Committed returning incorrect data (20 minutes)

Part of "Read Committed is Bonkers - Webcast Recording (46 minutes)"


We are now connected to a database– that exact database with those baby names.

I have restored a fresh copy of the database.

Before I create that index, we are going to look at the clustered index structure of the table itself

I’m looking at the sys.dm_db_index_physical_stats dynamic management view. You might use this sometimes to check fragmentation in a table.

I am running it in a detailed way as they look at all of the pages of the ref.FirstName object. It’s in the current database. And this 1 here is the index ID. 1 if a table has a clustered index 1 is always the index ID of that. So I’m saying look at the clustered index. The NULL here is for partitions– It’s not a partitioned table, so look at everything– it by default it has one partition, I haven’t made it any fancier.

Look at all the pages and tell me, how many levels this index has, how many pages are at that level, and how many rows do you have. Looking at just how our clustered index is structured, this thing isn’t very deep.

This index just has two different levels. We have a root page, which is exactly one page, and it knows about 471 FirstNameIDs. It indicates where those 471 pages are. It knows about 471 pages of FirstNameIDs and it knows the FirstNameID that starts on that page. In the leaf of our table we have 96,174 rows: that’s the table itself.

We’re going to create a nonclustered index, and we’re not putting all of the columns from the table in it

Just like in that slide, I’m just indexing FirstName. We’re going to look very similar query at the table, but instead of looking at the clustered index, I have a subquery in here that says I want to look at the index_id for the index I just created who has this name. Now I’m looking at the physical structure of my nonclustered index as shown by sys.dm_db_index_physical_stats and for my nonclustered index it is smaller because it has less information in it. It doesn’t keep first year reported and last year reported. Again it’s just two levels. We don’t have intermediate levels in this guy.

We have one root page, we have 247 pages in the leaf of the index that has all of our first names on it. All right, so I’ve got 247 pages that have an alphabetical list of my first name.

What we’re going to do is we’re going to move one name back and forth in this index

We’re going to bounce it between the left side of the index and the right side of the index. And of course these are all on pages. They aren’t really left and right, but when we picture how the pages are linked together I think it’s easier to picture left and right. Let’s look at a little bit of the data. And I’m running a query SELECT FirstName FROM ref.FirstName.

Let’s actually run it this time. I turned on Actual Execution Plans. Let’s prove it’s using our index, our nonclustered index to look at this.

And yes, when I run this query, it SQL Server says oh you just want the first name. The easiest, the smallest item that can return that to me is this nonclustered index on first name. So I am going to scan that nonclustered index. How can I tell if its a forward or backwards scan? If I look in the Properties, and I use my Properties window a lot, so I tend to keep it on the left.

There’s nothing here that says the scan is forward. When it has chosen to do a backward scan it will say scan direction is backwards. So the lack of something who says scan direction equals backward indicates to me that it is a forward scan.

But it is reading from our nonclustered index. It is not choosing to read from the Clustered index itself.

Our first row in the table, lets bring our results back, alphabetically, now I haven’t put in order by in this query, I’m just saying, let’s go back to our query here. If I don’t write order by I’m not guaranteed order. I happen to have the data ordered, but that is true that in SQL Server if you don’t specify an order by you are not guaranteed that your data will be returned in that order. But it happens to return it in order. I get lucky this time and my first name is Aaban and if I go to the very end of this list my last name is Zzyzx. (Data entry error or real last name, I don’t know. I do like some of the names that start with Z though, I think those are actually pretty cool.)

All right, we’re using that first name.

When we update a FirstName, it moves to a different physical page in our nonclustered index

What I want to show is– when I update this first name, Aaban, A-A-B-A-N, when I change it to another name I want to show you that it moved from the page at the beginning of the index, it physically moves to another page.

This is a disk based row store index. In other words, a classic index. It’s not some fancy Hekaton index. This is a disc based rowstore index and the way that works is if I update one of these names, it will, in my indexes where that name is present, it may have to physically move if it impacts the index structure.

And this time it does impact the index structure because I say in this index, it is ordered by first name.

So if I change the first name A-A-B-A-N to Z-Z-Z-A-B-A-N it is going to have to move to the end of the index. And we can prove that that happens. Right now our name is A-A-B-A-N.

Using PhysLocFormatter to read the page number for the row

Now I’m going to run this query using an undocumented command in SQL Server. We’re using this PhysLocFormatter function in SQL Server. This isn’t documented. It’s not for production usage, it’s just for geeking out on things like this. Because what I’m saying is I want to know physically where is the row located that you’re showing me. So for the row where the first name is either A-A-B-A-N or Z-Z-Z-A-B-A-N, right now its A-A-B-A-N. Show me physically where it is.

And right now it’s in data file 1. I just currently have one data file in this database. It is on page 9832 and it is, there’s little slots that actually indicate, okay where on the page is this located, and right now it’s in slot 0. So that is where it’s physically located now. And I’m going to copy that out and note it down. So we’re going to copy that with headers and we’re going to paste it fresh here, just to see here’s where it’s located now. I am now going to go ahead and update the row and move it to the end of this rowstore nonclustered index and SQL Server very quickly is able to do that.

The execution plan for the update

Looking at my execution plan we’ll see that SQL Server did have to update more than our index. First SQL Server said okay, I’m going to do an index seek. I need to find the row. Because I’m saying where the first name equals A-A-B-A-N. And it says okay, is there a row, I need to identify, is there a row to update. It’s not just going to assume. So it finds that row and then it does an update here. This does say, it’s interesting when we look at this, right, it says Clustered Index Update and you’re like oh, well, clustered index update? What, doesn’t it have to update the nonclustered index too?

If we hover over this and we look at the bottom of this, it’s got two index names in here. This is what’s called a narrow update plan. It is sometimes able to, sometimes decides to say, oh, I just am going to represent all of the indexes I’m updating in this plan in one operator and if you look at what the object actually is, this is our clustered primary key, the table itself, and this is our nonclustered index.

Wherever this name is physically stored, it has to update them. And in our nonclustered index everything is ordered by that name. It determines which page it is on and it actually has to move the page. If I check now and say okay, for any row with one of these names we have changed the name, this row is now named Z-Z-Z-A-B-A-N. It is still in data file 1 but it on page 10131 in slot 12.

This used to be on a totally different page. It had to physically move it as part of that update operation.

Here’s where things get interesting

Because of this physical movement and the fact that as read committed rolls through a table it only locks what it’s reading at the moment–whether it’s doing a page level read or a row level read– this means if data is moving around, and I’m only locking what I’m seeing at the moment, I can see data more than once.

Or I can miss data, because I’m only ensuring that I’m reading committed data at the very moment that I’m doing the read.

Let’s go ahead and update the row again

We’re going to move it back. Now I have made a change and it’s back to being A-A-B-A-N again. If I look at what page has the row now, oh, I have just rerun the update. I’m like oh, it’s still named Z-Z-Z-A-B-A-N. Let’s run the correct update this time. That time, so what I did that time, let’s actually talk through what I did there. I reran an update statement and in fact, let’s show the one I actually did, it did not find a row to update that time because the name was currently Z-Z-Z-A-B-A-N. It didn’t update any rows so the row didn’t move. It stayed exactly where it was.

But this time if I update the row and switch the name back to A-A-B-A-N, now if I look, where is the row now? It is now in data file 1, 9832, slot 0. And this is in fact exactly where it started out. It was able to be like, oh, there is in fact room on this page, where you used to be, and I can put you back there on that page this time. So there was room in it’s old spot back on that page.

Let’s start bouncing this row back and forth

What, we’re going to do in this demo, is we’re going to take this code and we’re going to start it up in a session of its own, we’re going to uncomment it, and what this code does is it just runs over and over again.

I’m running the undocumented GO number command here to just say run this a whole bunch and every time it runs it’s going to run to this batch, the first statement in the batch says okay, move it to the right end of the index, move it to that end of the index, and the next command just moves it back. So this row is going to be bouncing back and forth in this index, this nonclustered index. Of course, it is also getting updated in the table itself, but our query, who we will be testing at the time, is going to just be looking at this nonclustered index.

We’re going to count rows at the same time

I’m doing Control + L here to look at estimated plan. This is a query that just scans the nonclustered index on FirstName. What we’re going to do back in this session is we’re going to run a loop to test. We’re going to create a little table named NameCount. And NameCount just records how many queries are counted as being in ref.First.Name. If I run this right now there are 96,174 rows. That is the actual name count of this table. I’m going to turn off my Actual Execution Plans here. That’s important for running this loop.

We’re going to count 10,000 times just how many rows are in the table.

Notice that we are not inserting or deleting any rows

There is always going to be the exact same number of names in this table. I’m just changing what one of the names is. But I am not adding anything, I am not deleting this to anything, I only running update commands that don’t, they just change what the name is. So while those updates are happening, we’re just going to keep counting, 10,000 times, how many first names are in this table.

We’re going to start our updates here. And our updates are running. And while our updates are running we’re going to go ahead and create our NameCount table. And then we are going to go ahead and run this loop. It takes around a minute to finish. So I did turn my execution plans off, that massively slows down the experiment.

And what is happening while I’m doing this is my update command is having to take out locks to modify the data while it’s moving it. But it can just go and say okay, I’m going to lock you and move it, go ahead and say okay over here now we have this name and the index, and then release all the locks. It doesn’t have to scan the index while it’s doing the update. It doesn’t have to scan the whole nonclustered index because in seek, find the row, do the locks, figure out where to go it, lock there and then let go.

But our read query, who’s counting, is having to scan the index to count it

When we do a count star, SQL Server can’t just rely on metadata because metadata, things like statistics, they describe the data but they aren’t guaranteed to be correct at all. Because keeping them completely up to date and completely correct would be very performance intensive. So things like statistics, not guaranteed to be correct. When we are counting we are scanning that nonclustered index.

SQL Server is smart enough to, when it’s figuring out what am I going to do to count these names, it is smart enough to say okay, well, I could scan the clustered index of the table or I could scan the nonclustered index and it would be faster to scan the nonclustered index because it’s physically smaller.

It looks at the page count and says there’s fewer pages to count. We did have some blocking necessarily while that happened under read committed because these locks are being used to protect it and because somebody’s going to wait for a lock to be available, our reader would sometimes block the writer, our writer would sometimes block the reader. So our loop finished there. It took about 90 seconds this time. And it varies, kind of depending on timing.

What did it see?

Well, just a reminder there were always the same amount of names in this database. We didn’t insert rows, we didn’t delete rows. It got the correct count, we ran this 10,000 times. It got the correct count less than, it didn’t even get it more than 50% of the time. It often saw that there were fewer names, and it often saw that there were more names.

Both of these are completely wrong

We got incorrect results an awful lot. And the thing is, nothing in this query said hey, this is not the right result. It had no idea that those were not the right result. I’m going to go ahead and stop our update there, our work there is done. But because of the way that read committed only locks the rows it’s with, or the pages it’s with depending on how it’s actually running it, data changing, which is very normal for data to change, may cause incorrect results, and how would your user know that they had been given something that is just wrong?

So that’s a big problem.

Read committed has really, really big problems

Which are very, right, that was a very simple demo. That was really easy to demonstrate it being wrong a lot. So here’s how this happens in a disk based rowstore index. Because that’s what our demo was on today. A disk based nonclustered index using rowstore technology. In other words, the indexes we’ve been using and loving for years in SQL Server. We started counting the names and the count is scanning the leaf of that nonclustered index. And meanwhile the update comes in and it changes the name of that baby. And it changes it to a value that is physically located, this happens sometimes, ahead of us in the index.

When it changes the name from Aaban to ZZZaban it physically moves from one end of the index to the other. And this can happen while we are in the middle of our scan. So we saw it and counted it when it was Aaban. And now we saw it and counted it again when it’s ZZZaban. We counted the same name twice.

Running through that again, this can happen easily any time

It’s very common for index scans to happen while something is in the middle of an index scan, if it’s using read committed, data can move and can be seen twice.

What if this was some index that described money in different accounts, based on an account name, and the account name got changed?

We could show that for a group of accounts we have a completely incorrect amount of money in them. This is not a trivial issue at all because now I have just seen the same information twice but I don’t know that it was, I have no way to know that that was the same row that has been moved. Similarly, sometimes I’m doing my scan and when I’m here, the baby is currently named ZZZaban, right, the name is currently starting with a Z at the very end of that index.

There is no Aaban where the bird is right now. So it doesn’t see Aaban. It doesn’t count Aaban. And when I’m in the middle of my scan the update happens and ZZZaban becomes Aaban. So by the time I get to Zzaban, I don’t know that at the time I started my read there was a row there, because it’s just not there anymore.

The update has said oh, no, no, no, it’s not here anymore, that’s not valid data, don’t read that. And in this case I completely miss it. So, if I had an index that was based on account name and had say, included things like money in the account, and the account got named, this time I would have completely missed the money and would have come up short on that. Can be a really big problem but I would believe the data. Unless I had contradictory data from a different query in front of me I would just believe it was right. Now this is just one of many, many examples of incorrect data in read committed.

In the sort of more in depth version of this, we look at the fact that columnstore indexes are not immune to this problem

You can hit very big problems with inconsistent data in columnstore indexes so just using columnstore, if the data is being updated, right? If the data is read only, then it’s pretty consistent. But if the data is being updated you can also hit inconsistencies in columnstore with read committed.

There are also other issues where even just running a single query can return combinations of data that never existed in the database

If we’re using joins, depending on the kind of query execution plan we’re getting, read committed can actually see, when locking and blocking happens, it can see and return combinations of data that actually never, never were written in the database. So it can return what I would describe as an even garbagier form of garbage data. Even, less correct, and it’s because of the way that read committed uses blocking and the fact that we can have different branches of a query execution plan reading data for the same query.