How to Choose Between RCSI and Snapshot Isolation Levels

reads-vs-writes-RCSI

SQL Server offers two flavors of optimistic locking for traditional disk-based tables: Read Committed Snapshot Isolation (RCSI), and Snapshot Isolation. They are each great tools to reduce blocking and make applications faster, particularly in databases that have lots of tiny reads and writes that need to be quick, or mixed-use patterns (lots of little reads and writes + larger reporting queries).

Both isolation levels are available in SQL Server Standard Edition and Web Edition as well as Enterprise. That’s right, they’re cheap and easy. Each of them are controlled by database level settings, and default to disabled for new user databases when you install SQL Server and leave the default settings on the model database.

When should you pick one or the other? And when might you enable both?

Cheat Sheet for Snapshot and RCSI

Lots of details are below, but you don’t have time to read all that. Here’s what you want to know:

#Scenario Read Committed Snapshot (RCSI) Snapshot
1Existing OLTP application databases that have report queries running against them No. Risk of race conditions unless you test everything. Yes (for reports)
2New OLTP databases with fresh code you're writing Yes. If you're not going to test then you'll have incorrect data anyway. Yes (for reports that contain multiple queries and need data to be consistent from a single point in time)
3Replication subscriber databases with big reporting queries or ETL scans  Yes  Yes (for reports that contain multiple queries and need data to be consistent from a single point in time)
4Vendor application databases Ask the vendor if they support RCSI. If they don't understand the question then ask if they also write their product for Oracle or Postgres. (If so then they probably support RCSI.) Not unless their install instructions specify it. (Requires code changes.)
5Pure data warehouses No No

And there you have it!

Read Committed Snapshot Isolation (RCSI): Your New Default Isolation Level

As soon as you enable Read Committed Snapshot on a database, modifications start noting timestamps on the rows in your database of when the changed happened. Modifications where there’s a ‘previous’ version (data that was deleted, or the ‘before’ value in an update) are stored in the row versioning store in tempdb.

After this is enabled, all your queries automatically start to use those little versioning timestamps in the database– no code changes required! If someone else is actively changing data, the queries automatically go use those versions in tempdb instead of waiting for the lock to be released.

Under RCSI, readers cannot block writers, and writers cannot block readers. That’s extremely powerful, and it’s awesome.

RCSI really shines for these scenarios:

  • Databases behind websites and busy user applications where there are frequent updates plus a large amount of reads (OLTP, or OLTP mixed with reporting)
  • Replication subscriber databases where replication is pumping in frequent changes, and readers are running reports or scanning data for ETL

Enable RCSI with ALTER DATABASE SET READ_COMMITTED_SNAPSHOT ON

Here’s the basic command:

ALTER DATABASE SQLIndexWorkbook SET READ_COMMITTED_SNAPSHOT ON;
GO

But that may not work. To enable RCSI, you must be the only active user in the database. That means you have to basically kill off everyone else temporarily. This command can do the trick, but be very careful:

ALTER DATABASE SQLIndexWorkbook SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
GO

The ‘WITH ROLLBACK IMMEDIATE’ part of the command essentially kills everyone else in the database. Immediately.

Except it might not be so immediate if they were doing a large piece of work that has to be rolled back. All rollback is single threaded. Depending on what they were doing, that might be a long wait.

You can check current activity in the database with Adam Machanic’s sp_WhoIsActive.

The Problem with RCSI: Race Conditions and Incorrect Data

There’s one big problem with RCSI for existing applications. It isn’t related to performance — yeah, RCSI will increase use of tempdb, but it’s pretty easy to speed up tempdb using SSDs in many cases, and really tempdb is primarily used for updates and deletes, not inserts. You can solve tempdb issues fairly quickly.

The problem is worse, because it’s about incorrect data, and it’s not a bug. Your application code may have been written to be dependent on pessimistic locking, and enabling RCSI can change your application’s logic.

Let’s look at an example using pseudo-code. Let’s say our company sells bunny slippers online. Right now we’re running a huge coupon-based sale where bunny slippers are 99% off, but only for the first 100 customers.

Our code follows this pattern:

  1. Select the number of coupons used so far from dbo.Coupons
  2. If the number of coupons used is less than 100 then update the number of coupons used (coupons used +1) in dbo.Coupon
  3. [Proceed with checkout…]

Before You Enable RCSI (Read Committed)…

Under the default read committed isolation level, here’s how this works:

#Pattern Locking Impact What if Someone Else is Updating?
1Select the number of coupons used so far from dbo.Coupons Shared locks for reading Will be blocked and wait if coupon is being updated
2If the number of coupons used is less than 100 then update the number of coupons used (coupons used +1) in dbo.Coupon Update locks Will be blocked and wait if someone else is updating

After You Enable RCSI…

The pattern changes in a very important way:

#Pattern Locking Impact What if Someone Else is Updating?
1Select the number of coupons used so far from dbo.Coupons Shared locks for reading Will NOT be blocked if coupon is being updated. Will read the committed count of coupons in version store and proceed to the next step immediately.
2If the number of coupons used is less than 100 then update the number of coupons used (coupons used +1) in dbo.Coupon Update locks Will be blocked and wait if someone else is updating (But it's too late! We may have already read an incorrect count.)

If you turn on RCSI without testing, things might be fine at first. But when things get busy, if you hit this race condition, then you could use more coupons than you were supposed to.

This could be a very expensive mistake if you’re selling more than bunny slippers.

For existing applications,  you have to ask:

  • Was this application written to also support Oracle or Postgres? Those use optimistic locking, so if so, everything is probably fine.
  • What are the risks that we’ll hit a race condition like this?
  • What should we test to mitigate the risks?
  • Who’s going to do the testing, and what bribes gifts do they accept?

If the risks are high and you can’t do enough testing, never fear: SNAPSHOT isolation might still help

Snapshot Isolation: When You Can’t Test Everything, or Need Transaction Level Consistency

Snapshot Isolation is fantastic for existing applications, because it doesn’t automatically apply to every query that runs. And for many existing applications, the problem is really that some long reporting queries block little reads and writes, or vice versa. So all you have to do is tell the reports to use Snapshot Isolation.

The other great use for Snapshot Isolation is multi statement reports. Sometimes a report has multiple areas — there’s more than one chart, or a diagram and then a chart. And if they’re not consistent with a single point in time, the report can be wrong (revenue doesn’t add up between them, etc).  Snapshot isolation helps with this: if all the queries in your report are wrapped in a transaction, they will all be consistent as of the time the transaction began.

RCSI does not help with this problem. Under RCSI, all individual queries are consistent with the time that query began, regardless of whether they are wrapped in a larger transaction or not.

Enable Snapshot with ALTER DATABASE SET ALLOW_SNAPSHOT_ISOLATION ON

You need to first enable Snapshot for the database, like this:

ALTER DATABASE SQLIndexWorkbook SET ALLOW_SNAPSHOT_ISOLATION ON;
GO

Good news, there’s no need for any ‘ROLLBACK IMMEDIATE’ on this one. Snapshot isolation is the smarter child in the family. It will go into a pending state and wait for any running transactions to finish up, then switch on. (I love you, snapshot isolation.)

Now for the bad news. You have to change the code if you want it to use snapshot. To use this, each session must run:

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
GO

This typically means that you need to segment out the queries which you’d like to use snapshot into their own connection pool as well, because resetting a connection doesn’t reset the isolation level at the same time.

If you want to turn off SNAPSHOT isolation, make sure you remove all those “SET TRANSACTION ISOLATION LEVEL SNAPSHOT” commands from your code — otherwise queries will start failing.

Why Can’t We Have Both?

You can! It is more and more common to have “mixed workloads” against a database, where lots of little transactional queries co-exist with large reporting queries. People just can’t wait for an ETL to run to get their reports anymore. If you write the code to support RCSI (no race conditions), you can use it to prevent blocking.

And you may still want to use SNAPSHOT for multi-query reports so that everything in the report is consistent with one another.

Bonus: enabling both Read Committed Snapshot and Snapshot Isolation against a database will not cause more versions to be created against the version store, either.  They both use the same versions. The main difference will be that if your SNAPSHOT multi-query transactions tend to run long, the version store in your tempdb may be a bit slower to be cleaned up.

It Pays to Be Optimistic

I’ve solved lots of problems with RCSI and Snapshot Isolation over the years. These are great tools, and they aren’t going anywhere. If you’re a SQL Server DBA or Developer, get to know them as part of your configuration toolkit.

 

, , , ,

18 Responses to How to Choose Between RCSI and Snapshot Isolation Levels

  1. Michael J Swart February 18, 2016 at 8:28 am #

    Hey Kendra,

    Is it true that once we begin using RCSI or Snapshot isolation, we have to be disciplined about monitoring for long-running queries that could cause large growth in tempdb’s version-store?

    How tricky is that?

    • Kendra Little February 18, 2016 at 1:27 pm #

      Great question. The cases I know of where there’s been perf impact has been when cleanup has been blocked for days (not minutes). But good apps have bad moments when you least expect it, so worth monitoring.

      It’s super easy — you set up a SQL agent alert job based on the performance counter “longest transaction running time” on the SQL server transactions object. The counter updates every 60 seconds. Works for both RCSI and snapshot (even though bol says RCSI only, it’s always worked for snapshot for me even with RCSI off). https://msdn.microsoft.com/en-us/library/ms189038.aspx

      Baseline it first before setting a high sev alert though. Things like index maintenance can cause spikes.

      There are other counters for version store size you can baseline and alert on, too, if you want to get fancy. I like the Tran running time for simplicity.

  2. MickeyStuewe February 18, 2016 at 9:28 am #

    Hi Kendra,

    It sounds like RCSI would work really nicely on a database that uses solely views that are referenced in Tibco reports. Is there anything I should be concerned about when the views reference another database that is replicated on the same server or Linked Servers (which I’m trying to get rid of)?

    Thanks,
    Mickey

    • Kendra Little February 18, 2016 at 1:18 pm #

      You’ll have read committed still against the other databases, unless you’re hinting otherwise or they also have RCSI turned on. If queries were slow because of references to tables in other databases/ blocking there, it would delay version store cleanup. That’s all that comes to mind.

  3. Michael Williamson February 18, 2016 at 11:39 am #

    Great article. I have read that errors can occur on “wide rows” (> 8046 max bytes) when using RCSI. I’m currently at a client that I believe would benefit from moving to RSCI, but unfortunately they have some gigantic tables (e.g. 182 columns in one table). Do you know what types of errors might be encountered and the associated “risk”? Thanks!

  4. Kalen July 3, 2016 at 2:04 pm #

    Hi Kendra

    You said both options “default to disabled when you install SQL Server”. Since these are database level settings, you might want to say that they default to disabled when a new database is created (because that is the way model is created). In a new installation, it does turn out that both master and msdb have the snapshot_isolation_state set to 1.

    Thanks for the nice chart!

    ~Kalen

    • Kendra Little July 5, 2016 at 7:42 am #

      Thanks for the comment, Kalen. I tweaked the wording a bit to make it clear that it’s about user databases and comes from model.

      I have the “when you install” in there just to try to differentiate from Azure SQL Database. I’m trying to think of the best term for that — I used to use something like “boxed product” install, but then nobody ever really sees a box anymore. “When you install SQL Server yourself” isn’t really great either, since so many people use hosting providers, but it’s the best I have for now.

  5. Jurgen wolf November 15, 2016 at 8:06 am #

    Hi Kendra,

    You mentioned in the cheat sheet that both options are a no for a data warehouse.

    I would like to know, if that is a case of it shouldn’t be used at all or rather can but need not be used? If you can use it, in which scenarios would you not use it and which would you? Wouldn’t a data warehouse also benefit from it though to some degree, especially if queries are happening during the ETL load period?

    Thanks

    • Kendra Little November 15, 2016 at 10:10 am #

      Hi Jurgen,

      The cheat sheet is general patterns. There might be cases where I’d use RCSI and/or Snapshot in a data warehouse, it’s just not the usage pattern it was designed for. The biggest issues that come to mind are:

      * Large number of versions created in version store– particularly if you have a complex dimension load, this might slow it down
      * Long running transactions in batch processes might prevent version cleanup

      There are many cases where optimistic locking might work well in a data warehouse, if things like this aren’t a problem. It’s just not a “natural” fit, and I would test it very carefully.

  6. Jim Harrington May 26, 2017 at 4:05 pm #

    Hello Kendra,

    Since yesterday I have been puzzling over a surprise with how the SNAPSHOT transaction behaves. (But thanks to this I did find your site today 🙂

    In SSMS (under SQL Server 2008) I can start a SNAPSHOT transaction and WAITFOR DELAY 15 seconds, click to a second tab (i.e. session), UPDATE a four-row testing table and then click back to the first tab to wait. After the DELAY, the still going SNAPSHOT transaction SELECTs the rows of the test table but returns their UPDATEd versions, even though the UPDATE happened in another session after the transaction began. This would seem not to keep the “a single point in time” promise of the SNAPSHOT isolation level.

    But if the SNAPSHOT transaction, before the DELAY, selects the four rows (or even SELECT COUNT(*)s the testing table), afterwards the SELECT statement running again in the same transaction will return as expected the unchanged versions.

    It seems the SNAPSHOT is being taken when the table is first touched. Does this fit with what you have seen? Thank you! -Jim

    • Kendra Little May 30, 2017 at 8:30 am #

      Hi Jim,

      Thanks for your note! I think I may do a blog post on this because it’s a fine point that I admit I haven’t thought much about. Typically when I’ve used snapshot isolation, I’ve used it for something like reports, where we want each statement to have consistent data with one another and dodge blocking problems, and we’re running a SELECT immediately after beginning the transaction.

      This is documented deep in the whitepaper on row versioning isolation levels in a little paragraph titled ‘Understanding the “Beginning” of a Transaction’. It says:

      … the version that a transaction will use is based on the first statement that accesses data, and not the BEGIN TRAN that creates the transaction.

      I do commonly say things like “snapshot transactions see data consistent with the start of the transaction.” (And so does this whitepaper!) But technically in this case “start of the transaction” refers to the first data access within the transaction.

      I am going to try to change the way I talk/write about this to be a little more clear in the future, because it is an interesting point, depending how you’re using it!

      • Jim Harrington May 30, 2017 at 4:44 pm #

        Hi Kendra,

        Welcome and thank you for your kind reply and for the whitepaper reference! What I am seeing is the engine’s design, I now understand.

        About the whitepaper’s point, “the version that a transaction will use is based on the first statement that accesses data,” I inadvertently found today one other necessary condition: It seems that the data access must apply a lock. Continuing with the test table of my original question to you, if that initial SELECT from the test table at the beginning of the transaction uses a NOLOCK or READCOMMITTED table hint, the snapshot is not taken and the SELECT after the WAITFOR returns the updated row versions.

        Knowing all of this brings to mind today a workaround for a SNAPSHOT transaction comprising a series of (maybe drawn out) SELECT statements. Immediately after BEGINning the SNAPSHOT, the following example IF statement can very quickly tickle all of the tables which need to stay contemporary to each other over the course of the transaction. No hints, deferring to SQL Server for locking. Happily, it is fast even with a many-rowed table (). (In a local system, table [Production Order Confirmation] has 1.86 million rows while [Material Document Item] has 14 million.)

        BEGIN (snapshot) TRANSACTION

        IF EXISTS(SELECT * FROM [test 1]) AND
        EXISTS(SELECT * FROM [Production Order Confirmation]) AND
        EXISTS(SELECT * FROM [Material Document Item (a)]) AND so on . . .

        SELECT ‘HELLO’

        (The IF needs to lead to something, hence the greeting.) All of the real SELECT statements would follow, finally concluding with the COMMIT, of course. Thanks much! -Jim

        • Kendra Little May 31, 2017 at 3:14 pm #

          “…if that initial SELECT from the test table at the beginning of the transaction uses a NOLOCK or READCOMMITTED table hint, the snapshot is not taken and the SELECT after the WAITFOR returns the updated row versions.”

          The difference you’ve noticed isn’t that it needs to take a lock. Using those hints is changing the isolation level so it is NOT snapshot isolation anymore. Read uncommitted (nolock) and read committed still use locks — essentially what you’re doing with those hints is lowering the isolation level for the statement you’re hinting so it’s not snapshot isolation.

          You also only need to do data access under snapshot isolation against ANY object in the database. This sets the reference version. (It’s not set per table.)

          Can I ask what use case you’d have where you’d be starting a transaction under snapshot isolation but not doing any data access using snapshot right away?

          • Jim Harrington May 31, 2017 at 5:33 pm #

            “You also only need to do data access under snapshot isolation against ANY object in the database.”

            Oh I see now.

            “Can I ask . . . ?” Thank you.

            The snapshot transactions in question for me either select from the same table twice with different (though overlapping) criteria or go against a pair of tables one at a time that are maintained together by another session’s single snapshot transaction. Both cases have some SQL’ing between the successive SELECTs at issue. SQL Server I had assumed assures consistency and you have made plain that it really does 🙂

            One more scenario occasioned my post last Friday. Three SQL Server sessions INSERTing rows to a narrow table WITH (TABLOCK) which lives to list recently changed production order operations (with columns [Production Order Number] and [Operation Number]), not caring if the order operation is already listed there — the primary key also includes one more column of type Timestamp. A fourth session periodically SELECTs what is there, including the maximum (i.e. most recent) Timestamp value for each order operation so that later on it can DELETE the rows it selected and earlier ones but leave alone any subsequently inserted ones. (Handle those the next time.)

            •Is that SELECT statement best within a SNAPSHOT transaction or should I play it safe and duke it out with the INSERTs, simply going with a TABLOCK and no transaction?• I was trying to verify that the SNAPSHOT always would be taken between all of those INSERTs, not somehow in the midst of one. Which approach would you see as the smoothest? Thanks for the space here! *//**

Trackbacks/Pingbacks

  1. Choosing Between Optimistic Concurrency Levels – Curated SQL - February 19, 2016

    […] Kendra Little has a cheat sheet for comparing the two optimistic concurrency levels: […]

  2. Deadlock Code for WideWorldImporters - by Kendra Little - September 13, 2016

    […] are your options for fixing this deadlock? Test out optimistic locking, query rewrites, setting deadlock priority, and creating indexes to see what works (I’m not […]

  3. Challenge Accepted – A Shot of SQLEspresso - September 14, 2016

    […] blog was a HUGE help. You can read at her blog all the details as to why this is important here: http://www.littlekendra.com/2016/02/18/how-to-choose-rcsi-snapshot-isolation-levels/ . Once those to options were implemented the replication ran seamlessly and the blocking […]

Leave a Reply