Tag Archives | Isolation Levels

The Case of the Blocking Merge Statement (LCK_M_RS_U locks)


Recently I got a fun question about an “upsert” pattern as a “Dear SQL DBA” question. The question is about TSQL, so it lent itself to being answered in a blog post where I can show repro code and screenshots.

Here’s the Scenario in the Anonymized Question

We have a lookup table, which is the parent table in a foreign key relationship with a child table. The child table has lots more rows. When data comes in, we need to:

  • Check the Parent Table (the lookup table) to see if the ParentValue is already present.
  • If ParentValue is not present, insert a row for the new ParentValue into the Parent Table. This happens very rarely, but we need to check for it.
  • Then insert a row into the Child Table, using the ParentId.

The Problem: Blocking against the Parent Table

When run under a few sessions, locking and blocking issues were creeping up fast. A merge statement was being used to check for the values in the Parent Table and insert them when they weren’t present.

Let’s Look At the Code

This creates a database named upsert_test, then creates ParentTable and ChildTable objects in the database. Finally, a Foreign Key is created on ChildTable, referencing the ParentTable.

IF DB_ID('upsert_test') is not null
BEGIN
    USE MASTER;
    ALTER DATABASE upsert_test SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    DROP DATABASE upsert_test;
END
GO

CREATE DATABASE upsert_test;
GO

use upsert_test;
GO

CREATE TABLE dbo.ParentTable
(
    ParentId int IDENTITY(1,1) NOT NULL,
    ParentValue varchar(50) NOT NULL,
    CONSTRAINT pk_ParentTable PRIMARY KEY CLUSTERED (ParentId ASC)
);
GO

CREATE TABLE dbo.ChildTable
(
    ChildId INT IDENTITY(1,1) NOT NULL,
    ParentId INT NOT NULL,
    ChildValue VARCHAR(50) NOT NULL,
    CreatedDate DATE NOT NULL CONSTRAINT DF_Work_created DEFAULT (getdate()),
    CONSTRAINT PK_Work PRIMARY KEY CLUSTERED (ChildId ASC)
);
GO

ALTER TABLE ChildTable ADD CONSTRAINT FK_Work_Source FOREIGN KEY (ParentId) REFERENCES ParentTable (ParentId);
GO

Here’s the Upsert (aka Merge Query)

A stored procedure is used to handle incoming values.  It uses MERGE to look for matching rows in ParentTable, and insert when not matched.

IF OBJECT_ID('dbo.DoStuff') IS NULL
    EXEC ('CREATE PROCEDURE dbo.DoStuff as RETURN 0;');
GO

ALTER PROCEDURE dbo.DoStuff (
    @ParentValue varchar(50),
    @ChildValue varchar(50)
)
AS

    MERGE ParentTable with (HOLDLOCK) as p
    USING (SELECT @ParentValue NewParentValue) as new
        ON p.ParentValue = new.NewParentValue
    WHEN NOT MATCHED THEN
    INSERT (ParentValue) VALUES (new.NewParentValue);

    INSERT INTO ChildTable (ParentId, ChildValue)
    SELECT p.ParentId, @ChildValue
    FROM ParentTable p
    WHERE p.ParentValue=@ParentValue;
GO

Why is that HOLDLOCK Hint in the Merge Query?

My reader quite rightly used this hint in their merge query. Although MERGE looks like a single query, it’s really just “syntactic sugar”: behind the scenes, SQL Server carries it out more like a separate select and insert, which leaves a window for race conditions. Developers are advised to use HOLDLOCK to avoid race conditions with MERGE.
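
To see why, here’s a rough sketch of the check-then-insert logic hiding inside that MERGE. This is my own illustration, not the exact plan SQL Server generates. Without HOLDLOCK, two sessions can both pass the “not matched” check for the same value and both try to insert it:

/* Hypothetical illustration of the race condition MERGE can hit without HOLDLOCK */
DECLARE @ParentValue varchar(50) = 'Stuff';

IF NOT EXISTS (SELECT 1 FROM dbo.ParentTable WHERE ParentValue = @ParentValue)
BEGIN
    /* Another session can insert the same ParentValue right here,
       after our check but before our insert */
    INSERT dbo.ParentTable (ParentValue) VALUES (@ParentValue);
END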

I asked one clarifying question — was the lock wait type they were seeing “LCK_M_RS_U”?

It was.

This confirmed that HOLDLOCK and merge were slowing them down instead of helping them.

Let’s Populate Some Rows for Testing and Reproduce the Blocking

exec dbo.DoStuff @ParentValue='Stuff', @ChildValue='Things';
GO
exec dbo.DoStuff @ParentValue='MoreStuff', @ChildValue='MoreThings';
GO
exec dbo.DoStuff @ParentValue='EvenMoreStuff', @ChildValue='EvenMoreThings';
GO

exec dbo.DoStuff @ParentValue='EvenMoreStuff', @ChildValue='EvenMoreThings x 2';
GO
exec dbo.DoStuff @ParentValue='EvenMoreStuff', @ChildValue='EvenMoreThings x 3';
GO

/* Create 5000 more ParentValues */
SET NOCOUNT ON;
DECLARE @namevarchar varchar(50), @i int=1
BEGIN TRAN
    WHILE @i <= 5000
    BEGIN
        SET @namevarchar= cast(RAND() AS VARCHAR(50));
        EXEC dbo.DoStuff @ParentValue=@namevarchar, @ChildValue='Whatever';
        SET @i=@i+1;
    END
COMMIT
GO 

To see the blocking issue, just run the following code in three session windows at the same time. Note that we’re running this over and over with the same ParentValue, and the ParentValue of “Stuff” is already in the table. This will not have to insert any rows into ParentTable.

SET NOCOUNT ON;
exec dbo.DoStuff @ParentValue='Stuff', @ChildValue='Things';
GO 1000000

Here’s what the blocking looks like in Adam Machanic’s sp_WhoIsActive:

[Screenshot: sp_WhoIsActive output showing LCK_M_RS_U range lock waits]
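
If you don’t have sp_WhoIsActive handy, the waiting tasks DMV shows the same thing. Here’s a rough sketch to run while the three sessions are going:

/* Spot the range lock waits without sp_WhoIsActive */
SELECT wt.session_id,
    wt.wait_type,
    wt.wait_duration_ms,
    wt.blocking_session_id
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.wait_type = 'LCK_M_RS_U';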

HOLDLOCK = Serializable Isolation Level = Key Range Locks

The holdlock hint is a way to get serializable isolation level in SQL Server for a specific table, without having to change the isolation level for your entire session. Serializable is the highest isolation level in SQL Server using pessimistic locking.

When you “HOLDLOCK”, you tell SQL Server to protect any rows you read with a range lock, just in case someone comes along and tries to change one or sneak one in.

That means that even when you’re just reading ParentTable and not inserting a row, you’re taking out a key range lock. You’re willing to fight other users over those rows to protect your statement.
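
You can catch those key-range locks in sys.dm_tran_locks while the sessions are running. A quick sketch:

/* Key-range locks show up with request_mode values like RangeS-U */
SELECT request_session_id,
    resource_type,
    request_mode,
    request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID('upsert_test')
    AND request_mode LIKE 'Range%';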

There are two parts to getting around the blocking and making this faster…

Index The Parent Table (Solution Part 1)

Currently, the only index on ParentTable is on ParentId.

Even if ParentTable is tiny, if we’re frequently accessing the table and looking up a ParentValue, we’ll benefit from creating a nonclustered index on that column. We should also allow only unique values into ParentValue for data integrity reasons. A unique nonclustered index is exactly what we need:

CREATE UNIQUE NONCLUSTERED INDEX ix_ParentTable_ParentValue on dbo.ParentTable(ParentValue)
GO

In my simple three session test, this makes the merge statement very efficient, and performance goes way up. You can no longer catch those LCK_M_RS_U waits in sp_WhoIsActive. However, I’m still concerned about them, and would still…

Ditch the Merge (Solution Part 2)

The “merge” command in SQL Server is often a let-down for folks. The syntax is confusing, most folks find out about the race conditions/concurrency issues the hard way, and the biggest problem is that it often seems “better” than other TSQL options because it was introduced as an enhancement in SQL Server 2008… but it isn’t always the better choice.

In this case, ditching the merge gives me more granular control of when I want to use that high level lock on ParentTable. The code is longer mostly because of a lot of comments.

ALTER PROCEDURE dbo.DoStuff (
    @ParentValue varchar(50),
    @ChildValue varchar(50)
)
AS
    DECLARE @ParentId INT;

    /* ParentId is very rarely new, so check for it first with only a shared lock */
    SELECT @ParentId=ParentId
    FROM dbo.ParentTable
    WHERE ParentValue=@ParentValue;

    /* Insert the new value if we have to. */
    /* Use the SELECT WITH UPDLOCK in case of race conditions */
    /* Get the new ParentId so we don't have to rejoin back to the table */
    IF @ParentId IS NULL
    BEGIN
        DECLARE @OutputVal TABLE (ParentId INT)

        INSERT dbo.ParentTable (ParentValue) 
        OUTPUT inserted.ParentId INTO @OutputVal(ParentId)
        SELECT x.newval
        FROM (SELECT @ParentValue as newval) as x
        LEFT JOIN dbo.ParentTable as p WITH (UPDLOCK, HOLDLOCK) on 
            x.newval=p.ParentValue
        WHERE p.ParentValue IS NULL;

        /* We are only ever going to have one row in @OutputVal */
        SELECT @ParentId=ParentId
        FROM @OutputVal;

    END

    INSERT INTO dbo.ChildTable (ParentId, ChildValue)
    SELECT @ParentId, @ChildValue;
GO

In our scenario, it’s rare for new ParentValues to come in. So I’ve used a pattern to try to use as many shared locks against ParentTable as possible, stick with the Read Committed Isolation level, and still protect against race conditions. The pattern is essentially this:

  1. Check if ParentValue already exists (shared read locks only)
  2. If this is the rare case that a ParentValue does not exist…
    • Insert the row into ParentTable
    • Protect against race conditions by inserting with a SELECT joining to ParentTable with UPDLOCK, HOLDLOCK in case the same new row happens to come in on two sessions at the same time
    • Use the OUTPUT clause to get the new @ParentId so we don’t have to join to ParentTable again in the next step
  3. Insert into ChildTable

Rough Comparison: Do These Changes Help?

I didn’t do thorough load testing on this. I just ran the call to dbo.DoStuff above in three sessions in a 4 core VM on my MacBook and looked at the BatchRequests/sec performance counter in SQL Server. Here’s what I saw:

Setup | Batch Requests/sec with No Nonclustered Index on dbo.ParentTable | Batch Requests/sec with Nonclustered Index on dbo.ParentTable
Merge | 313 avg | 4000+
Insert with Left Join | 495 avg | 4000+

In this test case, adding the nonclustered index makes a bigger difference than changing the TSQL. But I would still move away from merge, because I want to be able to control when anything tougher than a read lock is being taken out against ParentTable — that’s very attractive since new values come in rarely in this case. The more concurrent sessions that are running this, the more that will help.
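
If you want to eyeball Batch Requests/sec the same way, the counter is also exposed in a DMV. It’s a cumulative counter, so sample it twice and divide the difference by the elapsed seconds. A rough sketch:

/* Batch Requests/sec is cumulative: sample twice and diff to get the rate */
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Batch Requests/sec';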

Don’t Forget to Handle Errors

Error handling is important! The code in this post doesn’t have it for simplicity reasons. If you need a starter guide for error handling, check out Erland Sommarskog’s excellent whitepaper.

Further Tuning Thoughts

This code can be tuned further, but I’d want to set up a really clean load test using application servers (not SSMS) against a proper instance (maybe NOT on a MacBook). I would look at:

  • Whether validating ParentValue could be done in memory in the client application tier. Avoiding constant data access against dbo.ParentTable is attractive if that’s possible.
  • Wait statistics during execution to point to the next thing to tune.

What Do You Think of Merge?

Do you love or hate merge statements in SQL Server?

How to Choose Between RCSI and Snapshot Isolation Levels


SQL Server offers two flavors of optimistic locking for traditional disk-based tables: Read Committed Snapshot Isolation (RCSI), and Snapshot Isolation. They are each great tools to reduce blocking and make applications faster, particularly in databases that have lots of tiny reads and writes that need to be quick, or mixed-use patterns (lots of little reads and writes + larger reporting queries).

Both isolation levels are available in SQL Server Standard Edition and Web Edition as well as Enterprise. That’s right, they’re cheap and easy. Each of them is controlled by database-level settings, and both default to disabled for new user databases when you install SQL Server and leave the default settings on the model database.

When should you pick one or the other? And when might you enable both?

Cheat Sheet for Snapshot and RCSI

Lots of details are below, but you don’t have time to read all that. Here’s what you want to know:

# | Scenario | Read Committed Snapshot (RCSI) | Snapshot
1 | Existing OLTP application databases that have report queries running against them | No. Risk of race conditions unless you test everything. | Yes (for reports)
2 | New OLTP databases with fresh code you're writing | Yes. If you're not going to test then you'll have incorrect data anyway. | Yes (for reports that contain multiple queries and need data to be consistent from a single point in time)
3 | Replication subscriber databases with big reporting queries or ETL scans | Yes | Yes (for reports that contain multiple queries and need data to be consistent from a single point in time)
4 | Vendor application databases | Ask the vendor if they support RCSI. If they don't understand the question then ask if they also write their product for Oracle or Postgres. (If so then they probably support RCSI.) | Not unless their install instructions specify it. (Requires code changes.)
5 | Pure data warehouses | No | No

And there you have it!

Read Committed Snapshot Isolation (RCSI): Your New Default Isolation Level

As soon as you enable Read Committed Snapshot on a database, modifications start noting timestamps on the rows in your database recording when the change happened. Modifications where there’s a ‘previous’ version (data that was deleted, or the ‘before’ value in an update) store that prior version in the row versioning store in tempdb.

After this is enabled, all your queries automatically start to use those little versioning timestamps in the database– no code changes required! If someone else is actively changing data, the queries automatically go use those versions in tempdb instead of waiting for the lock to be released.

Under RCSI, readers cannot block writers, and writers cannot block readers. That’s extremely powerful, and it’s awesome.

RCSI really shines for these scenarios:

  • Databases behind websites and busy user applications where there are frequent updates plus a large amount of reads (OLTP, or OLTP mixed with reporting)
  • Replication subscriber databases where replication is pumping in frequent changes, and readers are running reports or scanning data for ETL

Enable RCSI with ALTER DATABASE SET READ_COMMITTED_SNAPSHOT ON

Here’s the basic command:

ALTER DATABASE SQLIndexWorkbook SET READ_COMMITTED_SNAPSHOT ON;
GO

But that may not work. To enable RCSI, you must be the only active user in the database. That means you have to basically kill off everyone else temporarily. This command can do the trick, but be very careful:

ALTER DATABASE SQLIndexWorkbook SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
GO

The ‘WITH ROLLBACK IMMEDIATE’ part of the command essentially kills everyone else in the database. Immediately.

Except it might not be so immediate if they were doing a large piece of work that has to be rolled back. All rollback is single threaded. Depending on what they were doing, that might be a long wait.

You can check current activity in the database with Adam Machanic’s sp_WhoIsActive.
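
Once the ALTER completes, it’s worth confirming the setting actually took. A quick check against sys.databases:

/* Verify RCSI is enabled */
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = 'SQLIndexWorkbook';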

The Problem with RCSI: Race Conditions and Incorrect Data

There’s one big problem with RCSI for existing applications. It isn’t related to performance — yeah, RCSI will increase use of tempdb, but it’s pretty easy to speed up tempdb using SSDs in many cases, and the version store is really only needed for updates and deletes, not inserts. You can solve tempdb issues fairly quickly.
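
If you’re curious how much version store you’re actually using after enabling RCSI, the Transactions performance counters can tell you. A simple sketch:

/* Rough check on version store size in tempdb */
SELECT counter_name, cntr_value AS size_kb
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Transactions%'
    AND counter_name = 'Version Store Size (KB)';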

The problem is worse, because it’s about incorrect data, and it’s not a bug. Your application code may have been written to be dependent on pessimistic locking, and enabling RCSI can change your application’s logic.

Let’s look at an example using pseudo-code. Let’s say our company sells bunny slippers online. Right now we’re running a huge coupon-based sale where bunny slippers are 99% off, but only for the first 100 customers.

Our code follows this pattern (a rough T-SQL sketch follows the list):

  1. Select the number of coupons used so far from dbo.Coupons
  2. If the number of coupons used is less than 100 then update the number of coupons used (coupons used +1) in dbo.Coupons
  3. [Proceed with checkout…]
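
In rough T-SQL, the pattern looks something like this. The dbo.Coupons table and its column names are just my invention to make the pseudo-code concrete:

/* Hypothetical sketch of the coupon pattern; table and column names are made up */
DECLARE @CouponsUsed int;

BEGIN TRAN

    SELECT @CouponsUsed = CouponsUsed
    FROM dbo.Coupons
    WHERE CouponCode = 'BUNNYSLIPPERS99';

    IF @CouponsUsed < 100
    BEGIN
        UPDATE dbo.Coupons
        SET CouponsUsed = CouponsUsed + 1
        WHERE CouponCode = 'BUNNYSLIPPERS99';

        /* Proceed with checkout... */
    END

COMMIT

The correctness of this pattern leans on that first SELECT waiting behind anyone who is mid-update on the coupon row.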

Before You Enable RCSI (Read Committed)…

Under the default read committed isolation level, here’s how this works:

# | Pattern | Locking Impact | What if Someone Else is Updating?
1 | Select the number of coupons used so far from dbo.Coupons | Shared locks for reading | Will be blocked and wait if the coupon is being updated
2 | If the number of coupons used is less than 100 then update the number of coupons used (coupons used +1) in dbo.Coupons | Update locks | Will be blocked and wait if someone else is updating

After You Enable RCSI…

The pattern changes in a very important way:

# | Pattern | Locking Impact | What if Someone Else is Updating?
1 | Select the number of coupons used so far from dbo.Coupons | Shared locks for reading | Will NOT be blocked if the coupon is being updated. Will read the committed count of coupons from the version store and proceed to the next step immediately.
2 | If the number of coupons used is less than 100 then update the number of coupons used (coupons used +1) in dbo.Coupons | Update locks | Will be blocked and wait if someone else is updating. (But it's too late! We may have already read an incorrect count.)

If you turn on RCSI without testing, things might be fine at first. But when things get busy, if you hit this race condition, then you could use more coupons than you were supposed to.

This could be a very expensive mistake if you’re selling more than bunny slippers.
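
For what it’s worth, if changing the code is an option, the usual way to keep a read-then-update pattern honest under RCSI is to take an update lock on the initial read, the same trick the upsert procedure earlier on this page uses. A hedged sketch against my made-up coupon table:

/* UPDLOCK makes the read take an update lock instead of a versioned read,
   so two sessions can't both act on the same stale coupon count */
DECLARE @CouponsUsed int;

SELECT @CouponsUsed = CouponsUsed
FROM dbo.Coupons WITH (UPDLOCK)
WHERE CouponCode = 'BUNNYSLIPPERS99';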

For existing applications,  you have to ask:

  • Was this application written to also support Oracle or Postgres? Those use optimistic locking, so if so, everything is probably fine.
  • What are the risks that we’ll hit a race condition like this?
  • What should we test to mitigate the risks?
  • Who’s going to do the testing, and what bribes, er, gifts do they accept?

If the risks are high and you can’t do enough testing, never fear: SNAPSHOT isolation might still help.

Snapshot Isolation: When You Can’t Test Everything, or Need Transaction Level Consistency

Snapshot Isolation is fantastic for existing applications, because it doesn’t automatically apply to every query that runs. And for many existing applications, the problem is really that some long reporting queries block little reads and writes, or vice versa. So all you have to do is tell the reports to use Snapshot Isolation.

The other great use for Snapshot Isolation is multi statement reports. Sometimes a report has multiple areas — there’s more than one chart, or a diagram and then a chart. And if they’re not consistent with a single point in time, the report can be wrong (revenue doesn’t add up between them, etc).  Snapshot isolation helps with this: if all the queries in your report are wrapped in a transaction, they will all be consistent as of the time the transaction began.

RCSI does not help with this problem. Under RCSI, all individual queries are consistent with the time that query began, regardless of whether they are wrapped in a larger transaction or not.

Enable Snapshot with ALTER DATABASE SET ALLOW_SNAPSHOT_ISOLATION ON

You need to first enable Snapshot for the database, like this:

ALTER DATABASE SQLIndexWorkbook SET ALLOW_SNAPSHOT_ISOLATION ON;
GO

Good news, there’s no need for any ‘ROLLBACK IMMEDIATE’ on this one. Snapshot isolation is the smarter child in the family. It will go into a pending state and wait for any running transactions to finish up, then switch on. (I love you, snapshot isolation.)
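
You can watch that pending state from sys.databases while it waits:

/* IN_TRANSITION_TO_ON means snapshot is waiting for open transactions to finish */
SELECT name, snapshot_isolation_state_desc
FROM sys.databases
WHERE name = 'SQLIndexWorkbook';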

Now for the bad news. You have to change the code if you want it to use snapshot. To use this, each session must run:

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
GO

This typically means that you need to segment the queries you’d like to run under snapshot into their own connection pool as well, because resetting a connection doesn’t reset the isolation level at the same time.
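
Here’s roughly what a multi-query report looks like under snapshot. The table and column names are placeholders; the point is that both SELECTs see data from a single point in time:

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRAN

    /* Both queries in the transaction are consistent with one another */
    SELECT SUM(SaleAmount) AS TotalRevenue
    FROM dbo.Sales;

    SELECT ProductName, SUM(SaleAmount) AS Revenue
    FROM dbo.Sales
    GROUP BY ProductName;

COMMIT
GO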

If you want to turn off SNAPSHOT isolation, make sure you remove all those “SET TRANSACTION ISOLATION LEVEL SNAPSHOT” commands from your code — otherwise queries will start failing.

Why Can’t We Have Both?

You can! It is more and more common to have “mixed workloads” against a database, where lots of little transactional queries co-exist with large reporting queries. People just can’t wait for an ETL to run to get their reports anymore. If you write the code to support RCSI (no race conditions), you can use it to prevent blocking.

And you may still want to use SNAPSHOT for multi-query reports so that everything in the report is consistent with one another.

Bonus: enabling both Read Committed Snapshot and Snapshot Isolation against a database will not cause more versions to be created against the version store, either.  They both use the same versions. The main difference will be that if your SNAPSHOT multi-query transactions tend to run long, the version store in your tempdb may be a bit slower to be cleaned up.

It Pays to Be Optimistic

I’ve solved lots of problems with RCSI and Snapshot Isolation over the years. These are great tools, and they aren’t going anywhere. If you’re a SQL Server DBA or Developer, get to know them as part of your configuration toolkit.