How to Choose Between RCSI and Snapshot Isolation Levels
SQL Server offers two flavors of optimistic locking for traditional disk-based tables: Read Committed Snapshot Isolation (RCSI), and Snapshot Isolation. They are each great tools to reduce blocking and make applications faster, particularly in databases that have lots of tiny reads and writes that need to be quick, or mixed-use patterns (lots of little reads and writes + larger reporting queries).
Both isolation levels are available in SQL Server Standard Edition and Web Edition as well as Enterprise. That’s right, they’re cheap and easy. Each of them are controlled by database level settings, and default to disabled for new user databases when you install SQL Server and leave the default settings on the model database.
When should you pick one or the other? And when might you enable both?
Cheat Sheet for Snapshot and RCSI
Lots of details are below, but you don’t have time to read all that. Here’s what you want to know:
|Existing OLTP application databases that have report queries running against them||No. Risk of race conditions unless you test everything.||Yes (for reports)|
|New OLTP databases with fresh code you’re writing||Yes. If you’re not going to test then you’ll have incorrect data anyway.||Yes (for reports that contain multiple queries and need data to be consistent from a single point in time)|
|Replication subscriber databases with big reporting queries or ETL scans||Yes||Yes (for reports that contain multiple queries and need data to be consistent from a single point in time)|
|Vendor application databases||Ask the vendor if they support RCSI. If they don’t understand the question then ask if they also write their product for Oracle or Postgres. (If so then they probably support RCSI.)||Not unless their install instructions specify it. (Requires code changes.)|
|Pure data warehouses||No||No|
And there you have it!
Read Committed Snapshot Isolation (RCSI): Your New Default Isolation Level
As soon as you enable Read Committed Snapshot on a database, modifications start noting timestamps on the rows in your database of when the changed happened. Modifications where there’s a ‘previous’ version (data that was deleted, or the ‘before’ value in an update) are stored in the row versioning store in tempdb.
After this is enabled, all your queries automatically start to use those little versioning timestamps in the database– no code changes required! If someone else is actively changing data, the queries automatically go use those versions in tempdb instead of waiting for the lock to be released.
Under RCSI, readers cannot block writers, and writers cannot block readers. That’s extremely powerful, and it’s awesome.
RCSI really shines for these scenarios:
- Databases behind websites and busy user applications where there are frequent updates plus a large amount of reads (OLTP, or OLTP mixed with reporting)
- Replication subscriber databases where replication is pumping in frequent changes, and readers are running reports or scanning data for ETL
Enable RCSI with ALTER DATABASE SET READ_COMMITTED_SNAPSHOT ON
Here’s the basic command:
ALTER DATABASE SQLIndexWorkbook SET READ_COMMITTED_SNAPSHOT ON; GO
But that may not work. To enable RCSI, you must be the only active user in the database. That means you have to basically kill off everyone else temporarily. This command can do the trick, but be very careful:
ALTER DATABASE SQLIndexWorkbook SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE; GO
The ‘WITH ROLLBACK IMMEDIATE’ part of the command essentially kills everyone else in the database. Immediately.
Except it might not be so immediate if they were doing a large piece of work that has to be rolled back. All rollback is single threaded. Depending on what they were doing, that might be a long wait.
You can check current activity in the database with Adam Machanic’s sp_WhoIsActive.
The Problem with RCSI: Race Conditions and Incorrect Data
There’s one big problem with RCSI for existing applications. It isn’t related to performance – yeah, RCSI will increase use of tempdb, but it’s pretty easy to speed up tempdb using SSDs in many cases, and really tempdb is primarily used for updates and deletes, not inserts. You can solve tempdb issues fairly quickly.
The problem is worse, because it’s about incorrect data, and it’s not a bug. Your application code may have been written to be dependent on pessimistic locking, and enabling RCSI can change your application’s logic.
Let’s look at an example using pseudo-code. Let’s say our company sells bunny slippers online. Right now we’re running a huge coupon-based sale where bunny slippers are 99% off, but only for the first 100 customers.
Our code follows this pattern:
- Select the number of coupons used so far from dbo.Coupons
- If the number of coupons used is less than 100 then update the number of coupons used (coupons used +1) in dbo.Coupon
- [Proceed with checkout…]
Before You Enable RCSI (Read Committed)
Under the default read committed isolation level, here’s how this works:
|Select the number of coupons used so far from dbo.Coupons||Shared locks for reading||Will be blocked and wait if coupon is being updated|
|If the number of coupons used is less than 100 then update the number of coupons used (coupons used +1) in dbo.Coupon||Update locks||Will be blocked and wait if someone else is updating|
|Select the number of coupons used so far from dbo.Coupons||Shared locks for reading||Will NOT be blocked if coupon is being updated. Will read the committed count of coupons in version store and proceed to the next step immediately.|
|If the number of coupons used is less than 100 then update the number of coupons used (coupons used +1) in dbo.Coupon||Update locks||Will be blocked and wait if someone else is updating (But it’s too late! We may have already read an incorrect count.)|
If you turn on RCSI without testing, things might be fine at first. But when things get busy, if you hit this race condition, then you could use more coupons than you were supposed to.
This could be a very expensive mistake if you’re selling more than bunny slippers.
For existing applications, you have to ask:
- Was this application written to also support Oracle or Postgres? Those use optimistic locking, so if so, everything is probably fine.
- What are the risks that we’ll hit a race condition like this?
- What should we test to mitigate the risks?
- Who’s going to do the testing, and what bribes gifts do they accept?
If the risks are high and you can’t do enough testing, never fear: SNAPSHOT isolation might still help
Snapshot Isolation: When You Can’t Test Everything or Need Transaction Level Consistency
Snapshot Isolation is fantastic for existing applications, because it doesn’t automatically apply to every query that runs. And for many existing applications, the problem is really that some long reporting queries block little reads and writes, or vice versa. So all you have to do is tell the reports to use Snapshot Isolation.
The other great use for Snapshot Isolation is multi statement reports. Sometimes a report has multiple areas – there’s more than one chart, or a diagram and then a chart. And if they’re not consistent with a single point in time, the report can be wrong (revenue doesn’t add up between them, etc). Snapshot isolation helps with this: if all the queries in your report are wrapped in a transaction, they will all be consistent as of the time the transaction began.
RCSI does not help with this problem. Under RCSI, all individual queries are consistent with the time that query began, regardless of whether they are wrapped in a larger transaction or not.
Enable Snapshot with ALTER DATABASE SET ALLOW_SNAPSHOT_ISOLATION ON
You need to first enable Snapshot for the database, like this:
ALTER DATABASE SQLIndexWorkbook SET ALLOW_SNAPSHOT_ISOLATION ON; GO
Good news, there’s no need for any ‘ROLLBACK IMMEDIATE’ on this one. Snapshot isolation is the smarter child in the family. It will go into a pending state and wait for any running transactions to finish up, then switch on. (I love you, snapshot isolation.)
Now for the bad news. You have to change the code if you want it to use snapshot. To use this, each session must run:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT; GO
This typically means that you need to segment out the queries which you’d like to use snapshot into their own connection pool as well, because resetting a connection doesn’t reset the isolation level at the same time.
If you want to turn off SNAPSHOT isolation, make sure you remove all those “SET TRANSACTION ISOLATION LEVEL SNAPSHOT” commands from your code – otherwise queries will start failing.
Why Can’t We Have Both?
You can! It is more and more common to have “mixed workloads” against a database, where lots of little transactional queries co-exist with large reporting queries. People just can’t wait for an ETL to run to get their reports anymore. If you write the code to support RCSI (no race conditions), you can use it to prevent blocking.
And you may still want to use SNAPSHOT for multi-query reports so that everything in the report is consistent with one another.
Bonus: enabling both Read Committed Snapshot and Snapshot Isolation against a database will not cause more versions to be created against the version store, either. They both use the same versions. The main difference will be that if your SNAPSHOT multi-query transactions tend to run long, the version store in your tempdb may be a bit slower to be cleaned up.
It Pays to Be Optimistic
I’ve solved lots of problems with RCSI and Snapshot Isolation over the years. These are great tools, and they aren’t going anywhere. If you’re a SQL Server DBA or Developer, get to know them as part of your configuration toolkit.