
Why You Should Switch in Staging Tables Instead of Renaming Them (Dear SQL DBA Episode 26)

Over the years, I’ve come across a pattern fairly frequently: an application in an OLTP database periodically creates new tables, loads some data into them and fixes it up, then does a switcheroo and replaces old tables with the new tables.

This can cause major problems with blocking if anyone else is querying the table. The problem looks like this:

  1. Innocent query / queries are selecting from the current production table
  2. Switcheroo process comes along and tries to do an sp_rename
  3. Switcheroo process is blocked and is waiting on a schema modification lock
  4. Switcheroo process starts blocking EVVERRRYBODYYYY else who wants to query the table

Sometimes this clears quickly, but it varies depending on how long the queries take and how complex the Switcheroo process is. On a busy system, a big blocking backlog can even potentially cause THREADPOOL waits, which means the SQL Server generally feels like it’s not working for anyone.

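This is a tough problem, because you can’t get around it with isolation levels: even read uncommitted/NOLOCK queries require a schema stability (Sch-S) lock, and a schema modification (Sch-M) lock cannot be granted while any Sch-S lock is held. You also can’t get around it with optimistic locking, because queries reading row versions still take schema stability locks.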
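To watch this chain form on a test instance, a query along these lines against sys.dm_tran_locks may help. It is a minimal sketch rather than part of the original sample: it assumes you run it in the database that holds the table, and that the table is named dbo.ProductionTable as in the sample further down. It lists every object-level lock request on that table, so you can see the waiting schema modification (Sch-M) request and the requests queued behind it.

```sql
/* Assumption: dbo.ProductionTable is the table being swapped.
   Note: metadata functions such as OBJECT_ID can themselves wait
   if a Sch-M lock on the object is currently held. */
SELECT
    tl.request_session_id,
    tl.resource_associated_entity_id AS locked_object_id,
    tl.request_mode,    /* for example Sch-S, Sch-M, IS */
    tl.request_status   /* GRANT, WAIT, or CONVERT */
FROM sys.dm_tran_locks AS tl
WHERE tl.resource_type = N'OBJECT'
    AND tl.resource_database_id = DB_ID()
    AND tl.resource_associated_entity_id = OBJECT_ID(N'dbo.ProductionTable')
ORDER BY tl.request_status, tl.request_session_id;
GO
```

A row with request_mode = Sch-M and request_status = WAIT, followed by a growing list of other WAIT rows, is the pattern described above.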

In the past, I wrote that if you have to do this switcheroo, sp_rename is better than ALTER SCHEMA TRANSFER, but it still has a bunch of problems.

Today, a little light bulb went on. There is a better way to do this than to use sp_rename if you have existing code using this pattern!

Note: If you can avoid the ‘switcheroo’ pattern altogether and simply create and manage multiple versions of your tables, and have your application use the latest version, that is best because it avoids the locking problem entirely. The trick in this post is for existing codebases where sp_rename or ALTER SCHEMA TRANSFER is already in use, and you need something to mitigate blocking problems in the short term.

Here’s a 12 minute video talking through the optional pattern. You can also scroll down below the video to read a written version of the solution. If you enjoy the video, you might like to subscribe to the podcast. I would also love a review on iTunes!

This pattern works in SQL Server 2014 and higher. And it even works in Standard Edition of 2014.

Some folks will see the word ‘Switch’ in this pattern and assume the pattern that I’m suggesting is Enterprise Edition only for versions before SQL Server 2016 SP1.

However, oddly enough, you can use partition switching even in Standard Edition, as long as the tables only have one partition.

And all rowstore tables have at least one partition! That happens automagically when you create a table.
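You can check this for yourself with sys.partitions. A small sketch, assuming a table named dbo.ProductionTable (the name used in the sample below) exists in the current database; an unpartitioned table should show a single row per index, each with partition_number = 1.

```sql
/* List the partitions behind each index (or heap) of one table */
SELECT
    OBJECT_NAME(p.object_id) AS table_name,
    p.index_id,
    p.partition_number,
    p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.ProductionTable')
ORDER BY p.index_id, p.partition_number;
GO
```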

Here’s the pattern: Use SWITCH PARTITION with WAIT_AT_LOW_PRIORITY

If you’d like to play around with this in full, I’ve got a big old code sample in a gist for you to use on your test instance. But here’s the part that contains the magic:

BEGIN TRAN

    ALTER TABLE dbo.ProductionTable SWITCH PARTITION 1 TO dbo.ProductionTableOld PARTITION 1
        WITH ( WAIT_AT_LOW_PRIORITY ( MAX_DURATION = 1 MINUTES, ABORT_AFTER_WAIT = BLOCKERS ));  

    --Anyone who tries to query the table after the switch has happened and before
    --the transaction commits will be blocked: we've got a schema mod lock on the table

    ALTER TABLE dbo.StagingTable SWITCH PARTITION 1 TO dbo.ProductionTable PARTITION 1;

COMMIT

Some notes:

  • For Standard Edition below SQL Server 2016 SP1, the syntax is a little different: ALTER TABLE dbo.ProductionTable SWITCH TO dbo.ProductionTableOld;  (Thanks to Ben for pointing this out in the comments!)
  • The reason this solution helps is that WAIT_AT_LOW_PRIORITY won’t cause the big blocking chain behind it if this gets blocked. It will sit by the side MUCH more gracefully, even if it can’t get a schema modification lock. (Here’s a Microsoft post that goes into a lot of detail about wait at low priority – if you want the nitty gritty.)
  • This also helps because you get to pick how long it waits, and what it does after that time is up. In this sample I say to kill off the blockers. That’s not going to be a good choice all the time: maybe you’d like it to just sit there waiting at low priority, or to give up itself. You get the choice.
  • You probably do want to use an explicit transaction with this, unless it’s OK that someone queries the table while it’s empty, between the switching. And if that’s the case, you’d want the WAIT_AT_LOW_PRIORITY options on the second switch.
  • This only works when the tables are in the same filegroup. Otherwise it wouldn’t be a metadata only change.
  • You have to create matching indexes on your staging table and production table to make the switching work. (But you were doing that work before you renamed the new table in, anyway.) I kept the sample gist super simple, but you can add indexes to that if you want to see it in action.
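As a variation on the sample above, here is a sketch of the "give up itself" choice: ABORT_AFTER_WAIT = SELF instead of BLOCKERS. The table names are the ones from the sample; the TRY/CATCH wrapper and SET XACT_ABORT are my assumptions about how you might want to handle the failure, not part of the original gist.

```sql
SET XACT_ABORT ON;

BEGIN TRY
    BEGIN TRAN;

    /* Wait politely for up to 1 minute. If the schema modification lock
       still is not available, this statement fails instead of killing
       the sessions that are in its way. */
    ALTER TABLE dbo.ProductionTable SWITCH PARTITION 1 TO dbo.ProductionTableOld PARTITION 1
        WITH ( WAIT_AT_LOW_PRIORITY ( MAX_DURATION = 1 MINUTES, ABORT_AFTER_WAIT = SELF ));

    ALTER TABLE dbo.StagingTable SWITCH PARTITION 1 TO dbo.ProductionTable PARTITION 1;

    COMMIT;
END TRY
BEGIN CATCH
    /* Nothing was switched: roll back and let the caller decide
       whether to retry later */
    IF @@TRANCOUNT > 0 ROLLBACK;
    THROW;
END CATCH;
GO
```

The third option, ABORT_AFTER_WAIT = NONE, keeps waiting after MAX_DURATION expires, but as a normal-priority request, so the blocking chain can form again at that point.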

What if I want to truncate the data, instead of switching it out?

The TRUNCATE TABLE command doesn’t have the WAIT_AT_LOW_PRIORITY option. Using it would put you right back in your big old blocking chain problem. A DELETE statement will be logged and also has blocking problems.

Instead, you can use the SWITCH pattern above, and then just immediately truncate dbo.ProductionTableOld. As long as that table isn’t being read by anyone, you don’t have a blocking problem truncating it.
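In code, that is just one more statement after the switch transaction commits. A minimal sketch, reusing the table names from the sample above:

```sql
/* Run this after the COMMIT of the switch transaction.
   Assumption: nothing else reads dbo.ProductionTableOld, so the
   schema modification lock TRUNCATE needs is granted right away. */
TRUNCATE TABLE dbo.ProductionTableOld;
GO
```

Emptying the table at this point also prepares it for the next run, because the target of a SWITCH has to be empty.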

Anyone think this is a terrible idea?

There could well be something I’m missing about this solution. Happy to hear about it in the comments if you see a problem!

Collecting the Blocked Process Report (XEvents and Server Side Trace)

I’m a big fan of the built-in Blocked Process Report in SQL Server. It’s come in handy for troubleshooting blocking situations for me many times.

I wanted a friendly way to share code to configure and manage the Blocked Process Report, so I’ve created a gist on GitHub sharing TSQL that:

  • Enables the Blocked Process Report (BPR)
  • Collects the BPR with an Extended Events trace
  • Collects the BPR using a Server Side SQL Trace (in case you don’t care for XEvents or are running an older version of SQL Server)
  • Lists out the Extended Events and SQL Traces you have running, and gives you code to stop and delete traces if you wish

View or download the code from GitHub, or get it below.

Tools to help decode the Blocked Process Report after you collect it

/***********************************************************************
Copyright 2016, Kendra Little - littlekendra.com
MIT License, http://www.opensource.org/licenses/mit-license.php
***********************************************************************/
/***********************************************************************
FIRST, TELL SQL SERVER TO ISSUE THE BLOCKED PROCESS REPORT
***********************************************************************/
/* Check if there are any pending configuration items before you start */
/* Take care of those before proceeding if they exist */
SELECT *
FROM sys.configurations
where value <> value_in_use;
GO
/* Show 'advanced options' -- the BPR setting is advanced! */
/* Warning: RECONFIGURE pushes through ALL pending changes! */
IF (SELECT value_in_use FROM sys.configurations
where name=N'show advanced options') <> 1
BEGIN
EXEC ('EXEC sp_configure ''show advanced options'', 1;');
EXEC ('RECONFIGURE');
END
/* Set the blocked process threshold (seconds) to a value of 5 */
/* or higher to tell SQL Server to issue blocked process reports. */
/* Set this back to 0 at any time to stop blocked process reports. */
EXEC sp_configure 'blocked process threshold (s)', 5;
GO
RECONFIGURE;
GO
/* You're not done-- you must configure a trace to pick up the
Blocked Process Report.
You may use either:
* SQL Trace (server side trace recommended)
* Extended Events
*/
/***********************************************************************
NEXT, TRACE THE BLOCKED PROCESS REPORT
THIS OPTION USES XEVENTS.
YOU CAN USE A SERVER SIDE SQL TRACE INSTEAD THOUGH (SCROLL DOWN)
***********************************************************************/
/* Pre-requisites and notes:
Configure 'blocked process threshold (s)' to 5 or higher in sp_configure
This works with SQL Server 2014 and higher
Change the filename to a relevant location on the server itself
Tweak options in the WITH clause to your preference
Note that there is no automatic stop for this! If you want that, use a
Server Side SQL Trace instead.
THIS CREATES AND STARTS AN EXTENDED EVENTS TRACE
*/
/* Create the Extended Events trace */
CREATE EVENT SESSION [Blocked Process Report] ON SERVER
ADD EVENT sqlserver.blocked_process_report
ADD TARGET package0.event_file
(SET filename=
N'S:\XEvents\Blocked-Process-Report.xel', max_file_size=(1024),max_rollover_files=(4))
/* File size is in MB */
WITH (
MAX_MEMORY=4096 KB,
EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
MAX_DISPATCH_LATENCY=120 SECONDS /* 0 = unlimited */,
MAX_EVENT_SIZE=0 KB,
MEMORY_PARTITION_MODE=NONE,
TRACK_CAUSALITY=OFF,
STARTUP_STATE=ON)
GO
/* Start the Extended Events trace */
ALTER EVENT SESSION [Blocked Process Report]
ON SERVER
STATE = START;
GO
/* Drop the trace when you're done with a command like this:
DROP EVENT SESSION [Blocked Process Report] ON SERVER
GO
*/
/***********************************************************************
THIS OPTION USES A SERVER SIDE SQL TRACE TO PICK UP THE BPR.
YOU ONLY NEED THIS *OR* THE XEVENTS TRACE -- NOT BOTH
***********************************************************************/
/* Modified from a script generated from SQL Server Profiler */
/* Pre-requisites and notes:
Configure 'blocked process threshold (s)' to 5 or higher in sp_configure
This works with SQL Server 2005 and higher
Change the filename to a relevant location on the server itself
Tweak options to your preference (including the end date)
THIS CREATES AND STARTS A SERVER SIDE SQL TRACE
*/
declare @rc int;
declare @TraceID int;
declare @maxfilesizeMB bigint;
declare @TraceEndDateTime datetime;
declare @TraceFilename nvarchar(500);
declare @rolloverfilecount int;
set @TraceEndDateTime = '2020-12-12 00:00:00.000';
set @maxfilesizeMB = 1024;
set @TraceFilename = N'S:\XEvents\Blocked-Process-Report';
set @rolloverfilecount = 4;
/* Create the basic server side trace */
exec @rc = sp_trace_create
@TraceID output,
@options = 2 /* trace will use rollover files */,
@tracefile = @TraceFilename,
@maxfilesize = @maxfilesizeMB,
@stoptime = @TraceEndDateTime,
@filecount = @rolloverfilecount;
if (@rc != 0) goto error;
/* Add the blocked process report event and collect some columns */
declare @on bit
set @on = 1
exec sp_trace_setevent @TraceID, 137, 1, @on
exec sp_trace_setevent @TraceID, 137, 3, @on
exec sp_trace_setevent @TraceID, 137, 12, @on
exec sp_trace_setevent @TraceID, 137, 15, @on
exec sp_trace_setevent @TraceID, 137, 26, @on
/* Start the trace */
exec sp_trace_setstatus @TraceID, 1
/* Return the trace id to the caller */
select TraceID=@TraceID
goto finish
error:
select ErrorCode=@rc
finish:
go
/***********************************************************************
NOT SURE HOW MANY XEVENTS TRACES YOU'RE RUNNING?
USE THIS TO CHECK, AND POSSIBLY STOP AND DELETE.
***********************************************************************/
/* List Extended Events Traces which are currently started.
Built-in sessions include:
system_health
sp_server_diagnostics session
hkenginexesession
telemetry_xevents
*/
SELECT
name,
pending_buffers,
create_time,
session_source
FROM sys.dm_xe_sessions;
GO
/* Plug the trace name you want to stop and drop into
the commands below */
ALTER EVENT SESSION [Blocked Process Report]
ON SERVER
STATE = STOP;
GO
DROP EVENT SESSION [Blocked Process Report] ON SERVER
GO
/***********************************************************************
NOT SURE HOW MANY SERVER SIDE TRACES OR PROFILER TRACES YOU'RE RUNNING?
USE THIS TO CHECK, AND POSSIBLY STOP AND DELETE.
***********************************************************************/
/* Want to clean up a server side trace for the Blocked Process Report, or anything else? */
/* This will list all Server Side Traces (whether or not they have started) */
/* The default trace is usually trace id=1,
it will show as having no stop time and have a path like
D:\MSSQL\DATA\MSSQL13.MSSQLSERVER\MSSQL\Log\log_123.trc
*/
SELECT *
FROM sys.traces;
GO
/* To stop a trace, get the id from the query above */
/* Stop the trace by setting it to status = 0 */
EXEC sp_trace_setstatus @traceid = ? , @status = 0;
GO
/* Delete the trace by setting the status to 2 */
EXEC sp_trace_setstatus @traceid = ? , @status = 2;
GO
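Once the Extended Events session has written some files, you can peek at the raw reports with sys.fn_xe_file_target_read_file. This is a small sketch that is not part of the gist above; it assumes the file path used in the CREATE EVENT SESSION statement, so change it to match yours.

```sql
/* Read every rollover file written by the [Blocked Process Report] session.
   Assumption: the files live in S:\XEvents, as in the session definition. */
SELECT
    CAST(xf.event_data AS xml) AS event_xml,
    xf.file_name
FROM sys.fn_xe_file_target_read_file(
    N'S:\XEvents\Blocked-Process-Report*.xel', NULL, NULL, NULL) AS xf;
GO
```

Each row holds one event as XML; click a value in Management Studio to open the full blocked process report.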

 

 

Decoding Key and Page WaitResource for Deadlocks and Blocking

If you use SQL Server’s blocked process report or collect deadlock graphs, occasionally you’ll come across things that look like this:

waitresource=PAGE: 6:3:70133

waitresource=KEY: 6:72057594041991168 (ce52f92a058c)

Sometimes there’s more information in the massive monster of XML that you’re scanning through (deadlock graphs have a resource list that helps reveal the object and index name), but sometimes there isn’t.

Here’s a reference on how to decode them.

All of this information is out there on the internet already in various places, it’s just spread out! I’m going to pull the whole thing together, from DBCC PAGE to hobt_id to the undocumented %%physloc%% and %%lockres%% functions.

First we’ll talk through PAGE lock waits, then we’ll hit the KEY lock waits.

1) “waitresource=PAGE: 6:3:70133” = Database_Id : FileId : PageNumber

If your query was waiting on a page level lock, SQL Server gives you the page address.

Breaking “PAGE: 6:3:70133” down, we’ve got:

  • database_id=6
  • data_file_id = 3
  • page_number = 70133
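If you decode these often, you can let T-SQL do the splitting. The sketch below is one possible approach, not something from the original post: it strips the "PAGE:" prefix, swaps the colons for dots, and borrows PARSENAME to pick out the three parts. The variable names are made up for illustration.

```sql
DECLARE @waitresource nvarchar(100) = N'PAGE: 6:3:70133';

/* '6:3:70133' once the prefix and leading space are removed */
DECLARE @address nvarchar(100) = LTRIM(REPLACE(@waitresource, N'PAGE:', N''));

/* PARSENAME counts parts from the right: 1 = page, 2 = file, 3 = database */
SELECT
    CAST(PARSENAME(REPLACE(@address, N':', N'.'), 3) AS int) AS database_id,
    CAST(PARSENAME(REPLACE(@address, N':', N'.'), 2) AS int) AS data_file_id,
    CAST(PARSENAME(REPLACE(@address, N':', N'.'), 1) AS int) AS page_number;
GO
```

For the example string this returns 6, 3 and 70133.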

1.1) Decode the database_id

Find the database name with this query:

SELECT 
    name 
FROM sys.databases 
WHERE database_id=6;
GO

That’s the WideWorldImporters sample database on my SQL Server instance.

1.2) Look up the data file name — if you’re interested

We’re going to use the data file id in the next step to find the name of the table. You can just move on. But if you’re curious about the name of the data file, you can look it up by using the database and plugging the data file id into this query:

USE WideWorldImporters;
GO
SELECT 
    name, 
    physical_name
FROM sys.database_files
WHERE file_id = 3;
GO

In WideWorldImporters, this is the data file named WWI_UserData, and I restored it to C:\MSSQL\DATA\WideWorldImporters_UserData.ndf. (Whoops, you caught me putting files on my system drive! Oh no! The embarrassment).

1.3) Get the name of the object from DBCC PAGE

We know this is page # 70133 in data file 3 in the WideWorldImporters database. We can look at that page with the undocumented DBCC PAGE and Trace Flag 3604.

Note: I prefer running DBCC PAGE against a restored backup elsewhere, because it’s not supported. In some cases, running DBCC PAGE can cause stack dumps.

/* This trace flag makes DBCC PAGE output go to our Messages tab
instead of the SQL Server Error Log file */
DBCC TRACEON (3604);
GO
/* DBCC PAGE (DatabaseName, FileNumber, PageNumber, DumpStyle)*/
DBCC PAGE ('WideWorldImporters',3,70133,2);
GO

Scrolling down in the output, I can find the object_id and IndexId:

[Screenshot: DBCC PAGE output showing the object_id and IndexId values]

Whew, almost there!

I can now find the table and index name with this query:

USE WideWorldImporters;
GO
SELECT 
    sc.name as schema_name, 
    so.name as object_name, 
    si.name as index_name
FROM sys.objects as so 
JOIN sys.indexes as si on 
    so.object_id=si.object_id
JOIN sys.schemas AS sc on 
    so.schema_id=sc.schema_id
WHERE 
    so.object_id = 94623380
    and si.index_id = 1;
GO

And behold, this lock wait was on the PK_Sales_OrderLines index on the Sales.OrderLines table.

Note: In SQL Server 2014 and higher, you could also find the object name using the undocumented sys.dm_db_database_page_allocations dynamic management object. But you have to query all the pages in the database, which seems not as awesome against large databases, so I listed the DBCC PAGE method.
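For completeness, here is roughly what the sys.dm_db_database_page_allocations approach could look like. Treat it as a sketch under assumptions: the function is undocumented, so its parameters and column names may change, and passing NULL for the object, index and partition means it walks every allocated page in the database.

```sql
USE WideWorldImporters;
GO
/* Undocumented function: use on a test instance or restored backup.
   NULL, NULL, NULL = all objects, all indexes, all partitions. */
SELECT
    OBJECT_SCHEMA_NAME(pa.object_id) AS schema_name,
    OBJECT_NAME(pa.object_id) AS object_name,
    pa.index_id
FROM sys.dm_db_database_page_allocations(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS pa
WHERE pa.allocated_page_file_id = 3
    AND pa.allocated_page_page_id = 70133;
GO
```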

1.4) Can I see the data on the page that was locked?

Well, yes. But … do you really need to?

This is slow even on small tables. But it’s kinda fun, so… since you read this far… let’s talk about %%physloc%%!

%%physloc%% is an undocumented piece of magic that will return the physical record locator for every row. You can use %%physloc%% with sys.fn_PhysLocFormatter in SQL Server 2008 and higher.

Now that we know that the page lock wait was on Sales.OrderLines, we can see all the data in that table on data file = 3 and page number = 70133 with this query:

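USE WideWorldImporters;
GO
/* fn_PhysLocFormatter returns (file:page:slot). The trailing colon in the
   pattern keeps pages like 701330 from matching by accident. */
SELECT 
    sys.fn_PhysLocFormatter (%%physloc%%) AS physical_location,
    *
FROM Sales.OrderLines WITH (NOLOCK)
WHERE sys.fn_PhysLocFormatter (%%physloc%%) like '(3:70133:%';
GO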

Like I said, this is slow even on tiny tables. I’ve added NOLOCK to the query because while we want a glance at this info, we have no guarantee that it’s the way it was earlier when the blocking happened anyway– we’re guessing, so we may as well do dirty reads.

But woo hoo, it gives me a clean display of the 25 rows which the query was fighting for:

[Screenshot: query results showing the rows stored on the page]

That’s enough detail on waitresource=PAGE. What if you were waiting on a KEY?

2) “waitresource=KEY: 6:72057594041991168 (ce52f92a058c)” = Database_Id, HOBT_Id (Magic hash that you can decode with %%lockres%% if you really want)

If your query was trying to lock a row in an index and was blocked, you get a totally different style of address.

Breaking “6:72057594041991168 (ce52f92a058c)” down, we’ve got:

  • database_id = 6
  • hobt_id = 72057594041991168
  • magic hash value = (ce52f92a058c)

2.1) Decode the database_id

This works the exact same way it did for the page example above! Find the database name with this query:

SELECT 
    name 
FROM sys.databases 
WHERE database_id=6;
GO

In my case, that’s still the WideWorldImporters sample database.

2.2) Decode the hobt_id

We need to use that database, and then query sys.partitions, with some helper joins to figure out the table and index name…

USE WideWorldImporters;
GO
SELECT 
    sc.name as schema_name, 
    so.name as object_name, 
    si.name as index_name
FROM sys.partitions AS p
JOIN sys.objects as so on 
    p.object_id=so.object_id
JOIN sys.indexes as si on 
    p.index_id=si.index_id and 
    p.object_id=si.object_id
JOIN sys.schemas AS sc on 
    so.schema_id=sc.schema_id
WHERE hobt_id = 72057594041991168;
GO

That tells me that the query was waiting for a lock on Application.Countries, using the PK_Application_Countries index.

2.3) Now for some %%lockres%% magic – if you want to figure out which row was locked

If I really want to know exactly which row the lock needed, I can decode that by querying the table itself. We can use the undocumented %%lockres%% function to find the row equal to that magic hash value.

Note that this is going to scan the table, and on large tables that might not be so awesome all the time:

SELECT
    *
FROM Application.Countries WITH (NOLOCK)
WHERE %%lockres%% = '(ce52f92a058c)';
GO

I added NOLOCK to this query (as Klaus Aschenbrenner suggested on Twitter) because locking can be an issue — and in this case, you’re looking to get a glance at the data as it is now, not as it was earlier when the transaction ran– so I don’t think data consistency is a big issue.

Voila, the row we were fighting for appears!

[Screenshot: query result showing the matching row]

Credits and more reading

I’m not sure who first documented many of these things, but here are two posts on some of the less documented nitty gritty that you may enjoy:

Setting up Free Blocking Alerts and Deadlock Monitoring (Dear SQL DBA Episode 17)

What tools in SQL Server will notify you about blocking and help track the queries behind your toughest blocking and deadlocking problems?

Watch the 21 minute video, subscribe to the podcast, or read the episode notes and links below.

Dear SQL DBA,

What is the best way to set up blocking and deadlock alerts on the server? I want to be notified automatically without any impact on the prod server.

I have tried alerts with SQL server performance condition alerts with SQL server agent. They do not show the queries or tables involved etc?

Thanks,

All Blocked Up


Woo hoo, I love this question!

So first off, I’m going to answer this discussing the free, built-in tools with SQL Server. If you have a budget for custom monitoring tools, you can buy fancy tools that have customized notifications for blocking and which capture the queries and plans involved. If that’s the case, set up free trials against a test system.

But not everyone has budget for every single SQL Server instance. So it’s extremely useful to know what SQL Server offers to help you with this.

And by the way, if you’re going to be at the SQLPASS Summit in Seattle in just under a month, I’m giving a session that shows different blocking scenarios. Come see my session, The Great Performance Robbery: Locking Problems and Solutions on Thursday, Oct 27th at 10:45AM in room 6C.

Free, simple blocking notifications

I like to set up blocking notifications with a simple SQL Server Agent alert on the “SQLServer:General Statistics: Processes blocked” performance counter.

This will not give you the queries involved in the blocking — don’t worry, we’ll cover tracking that in the next step.

This alert is low impact and it will let you know when you need to look at the SQL Server.

To get this alert to work, you’ll need to:

  • Configure the SQL Server Agent service to auto start
  • Set up database mail, enable it on the SQL Server Agent, and then restart the SQL Server Agent
  • Configure an Operator in the SQL Server Agent
  • Create a new ‘performance’ style alert, base it on the “Processes Blocked” counter, and tell it who to notify

A few things to note:

  1. The SQL Agent doesn’t poll counters constantly – and we want this to be lightweight, so that’s a good thing. It will only poll every 15-30 seconds, and there’s no published guaranteed SLA on that polling frequency.
  2. If you really need something more sensitive and reliable, you need a monitoring system fully independent / outside of the SQL Server to be polling in and checking it for availability and performance.
  3. You can configure the alert to only fire every X minutes. I highly recommend that, so you don’t get an inbox of alerts every 20 seconds.
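If you would rather script the alert than click through the GUI, the steps above can be sketched with msdb.dbo.sp_add_alert and sp_add_notification. Several things here are assumptions: the alert name, the operator name 'DBA Team' (which must already exist), the threshold of 0, and above all the exact @performance_condition string, which can differ between versions and for named instances. Scripting out an alert you created in the GUI is the safest way to confirm it on your server.

```sql
USE msdb;
GO
/* Assumption: default instance. The object part of the condition
   may need a different prefix on your server. */
EXEC dbo.sp_add_alert
    @name = N'Blocked Processes',
    @enabled = 1,
    @delay_between_responses = 300, /* seconds: at most one alert every 5 minutes */
    @performance_condition = N'SQLServer:General Statistics|Processes blocked||>|0';
GO
/* Assumption: an operator named 'DBA Team' is already configured */
EXEC dbo.sp_add_notification
    @alert_name = N'Blocked Processes',
    @operator_name = N'DBA Team',
    @notification_method = 1; /* 1 = email */
GO
```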

Create some blocking in a test database or in tempdb and make sure the alert works.

I have example code to create blocking and deadlocks for your dev environments in my post, Deadlock Code for the WideWorldImporters Sample Database.

For production databases, you can create a temp table and write similar code to create blocking in those.
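A minimal sketch of what that could look like, using a throwaway table in tempdb. The table name dbo.BlockingTest is hypothetical, and the session labels in the comments tell you which window to run each part in.

```sql
/* Session window #1: create a tiny table and hold a lock on one row */
USE tempdb;
GO
CREATE TABLE dbo.BlockingTest (
    Id int NOT NULL PRIMARY KEY,
    Val int NOT NULL
);
INSERT dbo.BlockingTest (Id, Val) VALUES (1, 0);
GO
BEGIN TRAN
    UPDATE dbo.BlockingTest SET Val = Val + 1 WHERE Id = 1;
/* Leave the transaction open for now */

/* Session window #2: this should sit there blocked until session #1 finishes
USE tempdb;
SELECT Val FROM dbo.BlockingTest WHERE Id = 1;
*/

/* Session window #1, after the alert has fired: clean up
ROLLBACK;
DROP TABLE dbo.BlockingTest;
*/
```

Leave the second session blocked for longer than the alert's polling interval so the Agent has a chance to notice it.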


Finding the queries involved with the Blocked Process Report

OK, we’ve got notifications. We need SQL Server to give us more information on who is involved in the blocking.

I like to use the built-in Blocked Process Report for this. This has been in SQL Server for a long time, and it’s extremely useful.

The Blocked Process Report shows you the “input buffer” of the commands involved – it may be partial information and not the full text of the query. It will also show you the login name for who is running what, and the type of lock requests involved.

The Blocked Process Report is pretty lightweight, because SQL Server has to frequently wake up and look for blocking situations that can’t resolve themselves. By default, the deadlock monitor wakes up every 5 seconds and looks around to see if there is a deadlock which it needs to break. You may enable a feature called the ‘Blocked Process Report’ that tells SQL Server to additionally issue a report on any blocking which it finds.

To get this to work, you need to:

  • Enable the sp_configure option for the blocked process threshold. This defaults to ‘0’, which is off. You configure the threshold to the number of seconds you’d like the threshold to be. This should be a value of 5 or higher, because making the deadlock monitor run constantly could tank your performance. A good ‘starting’ value is 30 seconds.
  • You also need to set up a trace to collect an event called the ‘blocked process report’. Setting the threshold causes the event to be output, but SQL Server won’t collect it for you unless you start a SQL Trace or an Extended events trace that collects that event.

Once you have the trace file, you can copy it off of the production server to interpret it.

Michael J Swart has written a great free tool called the Blocked Process Report Viewer to help interpret the blocking chains. It’s free at https://sqlblockedprocesses.codeplex.com.

The viewer makes it easier to see the root of the blocking chain and who was blocking whom.

This trace is pretty lightweight, but with any trace you want to make sure that you don’t add a bunch of events that you don’t need, and that you periodically clean up the files and don’t let it impact drive space.

When I talk about running traces, I don’t mean running Profiler

We’re entering dangerous territory here. Whenever you talk about tracing in SQL Server these days, someone gets offended.

Here’s what you need to know. There are two main ways to run a trace:

  1. SQL Trace. This is the old school option. You can run this using…
    1. The Profiler client (I don’t like this option)
    2. A Server Side trace scripted out from Profiler (much better!). You can get up to speed on Server Side Traces reading this article on generating a server side trace by Jen McCown. (Note that she wrote this article back in 2009. That’s fine, SQLTrace hasn’t been changing since then.)
  2. Extended Events. This is much easier to use on SQL Server 2012 and higher than in previous versions because a GUI was introduced in Management Studio for it under Object Explorer.

I do not like leaving the Profiler application running because I’ve seen it do everything from slowing down performance to filling up drives over the years. And creating Server Side traces isn’t very hard if you do want to use SQL Trace.

I personally only like to have a trace running if I know I need it and am going to look at it. So I only enable this when I have a blocking problem. Whenever you choose to leave a trace running, you need to periodically check in on the files it’s created and clean up after it.

Detecting and handling deadlocks

What about locking problems where SQL Server has to step in and kill one of the queries?

You have a few built in options about how to get info on this. There are some trace flags that you can turn on which cause some information about who is involved in the deadlock to be printed to the SQL Server Error Log. This isn’t my preferred option because the information is very hard to parse through and read.

I find it more helpful to get a ‘deadlock graph’, which is a picture of how the locking fight went down.

On older versions of SQL Server, you can capture the deadlock graph with a server side trace.

On newer versions of SQL Server, you can capture this with an Extended Events trace.
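As a rough sketch of the Extended Events route: the session below collects the xml_deadlock_report event to a file. The session name and file path are assumptions (the path simply follows the S:\XEvents convention used elsewhere on this page), so adjust both to suit your server.

```sql
/* Create an Extended Events trace that collects deadlock graphs */
CREATE EVENT SESSION [Deadlock Graphs] ON SERVER
ADD EVENT sqlserver.xml_deadlock_report
ADD TARGET package0.event_file
(SET filename = N'S:\XEvents\Deadlock-Graphs.xel',
    max_file_size = (256), /* MB */
    max_rollover_files = (4))
WITH (STARTUP_STATE = OFF);
GO
/* Start it when you need it */
ALTER EVENT SESSION [Deadlock Graphs]
ON SERVER
STATE = START;
GO
```

The built-in system_health session also picks up xml_deadlock_report, so it is worth looking there first for a deadlock that has already happened.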

A great resource for deciding how to capture deadlock information is Jonathan Kehayias’ excellent Simple Talk article, Handling Deadlocks in SQL Server.  He covers how to collect the graphs, shows examples of how they look, and gets you started tackling them.

If you get through this point and need to get really fancy with deadlocks, Michael J Swart recently wrote about using Event Notifications to collect execution plans related to deadlocks in his post, “Build Your Own Tools”. Just don’t try to run before you walk: this is pretty advanced stuff and you need to be comfortable using Service Broker, which is part of Event Notifications behind the scenes.

Updates, Oct 12, 2016:

  • Deadlock alerting: While it’s always better for the application itself to catch, handle, and log/alert on failed queries (not just deadlocks, but timeouts and anything else)… if your application can’t do that, and you can’t change it, check out this SQL Magazine article by Michael Campbell on alerting on deadlocks from SQL Server.
  • Also, thanks to Johan for pointing out that the free SQL Sentry Plan Explorer client tool is also able to display deadlock graphs. Just save them as an XDL file and you can open and view them.

Quick recap – getting more info on blocking

A fast rundown of the free, built-in tools we covered:

  • I like to use a simple, light, performance alert in the SQL Server agent for notification about blocking
  • I like the Blocked Process report to find out who’s blocking whom – collected by a server side SQL Trace or Extended events
  • I find collecting deadlock graphs with either a server side SQL Trace or Extended Events to be the most helpful way to figure out who’s involved in the nastiest blocking tangles.

Want to submit a Dear SQL DBA Question?

Want clarification on some of the basics? Got a question that jumped into your mind reading or listening to this? I’d love to hear it– asking is always free and easy!

Deadlock Code for the WideWorldImporters Sample Database

If you haven’t checked out Microsoft’s new WideWorldImporters sample database for 2016, it’s a pretty cool new little database. The database makes it easy to play around with new 2016 features, and it even ships with some cool little executables to run inserts in the “workload-drivers” folder.

I recently put together some code to reproduce a simple deadlock in WideWorldImporters. This isn’t related to the new features at all– I was just exploring the sample database and writing some demos for locking and blocking. Deadlocks are easier to understand when you have hands-on code that you can step through and watch it happen live, rather than just being mystified by it when it happens later!

Setup – Restore WideWorldImporters

All you need to do is download the WideWorldImporters-Full.bak backup (121MB) and restore it to a test instance with Developer Edition running.


Let’s Cause a Deadlock

You will use two session windows to create your deadlock.

In session window #1, run this code:

USE WideWorldImporters;
GO

/* Run the BEGIN tran and the first statement in this session.
We're taking out a lock on the Countries table */
WHILE @@TRANCOUNT > 0 ROLLBACK
BEGIN TRAN

    UPDATE Application.Countries
    SET LatestRecordedPopulation = LatestRecordedPopulation + 1
    WHERE IsoNumericCode = 840;

Leaving that there, open session window #2, and run this code:

SELECT CityName, StateProvinceName, sp.LatestRecordedPopulation, CountryName
FROM Application.Cities AS city
JOIN Application.StateProvinces AS sp on
    city.StateProvinceID = sp.StateProvinceID
JOIN Application.Countries AS ctry on 
    sp.CountryID=ctry.CountryID
WHERE sp.StateProvinceName = N'Virginia'

The code in session window #2 should be blocked — you’ll see it sitting there executing.

Back in session window #1, run this:

    UPDATE Application.StateProvinces
    SET LatestRecordedPopulation = LatestRecordedPopulation +1
    WHERE StateProvinceCode=N'VA'
COMMIT
GO

Session window #1 should tell you that 1 row was affected. It’s done!

Look back in session window #2, though. You should see the following message:

Msg 1205, Level 13, State 51, Line 1

Transaction (Process ID 54) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

 A “deadly embrace”

SQL Server’s deadlock monitor woke up, looked around, and saw that our two session windows were stuck. They each were requesting locks that the other session wouldn’t give up, and if the deadlock monitor didn’t break the deadlock, they’d be stuck there forever.

I didn’t set the deadlock priority on any of my transactions, so the deadlock monitor picked the session that it thought would be the least work to roll back, and it became the victim.
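If you want to influence that choice while experimenting, SET DEADLOCK_PRIORITY is the session-level setting to try. A minimal sketch, assuming you run it in session window #1 before the BEGIN TRAN in the repro above; with it in place, session #1 should become the preferred victim, but rerun the repro to confirm rather than taking that for granted.

```sql
/* Session window #1: volunteer this session as the deadlock victim.
   Valid settings include LOW, NORMAL (the default) and HIGH. */
SET DEADLOCK_PRIORITY LOW;
GO
/* ...now run the BEGIN TRAN and the first UPDATE from the repro... */

/* Put the session back to the default when you are done */
SET DEADLOCK_PRIORITY NORMAL;
GO
```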

How to learn more about deadlocks

This repro code will get you started. It’s rerunnable: you can keep deadlocking and deadlocking those sessions.

To teach yourself more about how the locks work:

  • Run sp_WhoIsActive with @get_locks=1 from a 3rd session window at different points and observe lock grants and waits between the sessions
  • Set up an extended events session to pick up the deadlock graph, reproduce the deadlock, and teach yourself to decode the graph. (Erin Stellato has a free Stairway to Extended Events if you’re just getting started with those.)
  • What are your options for fixing this deadlock? Test out optimistic locking, query rewrites, setting deadlock priority, and creating indexes to see what works (I’m not saying that all of them do in this case)
  • What is Countries_Archive, and why does it show up in the lock graphs?
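For the extended events item in the list above, something like the following could be a starting point. This is only a sketch: the session name CaptureDeadlocks is made up for illustration, and I'm using the in-memory ring_buffer target so no file path is needed on a test instance.

```sql
/* Hypothetical session name; pick whatever you like. */
CREATE EVENT SESSION [CaptureDeadlocks] ON SERVER
ADD EVENT sqlserver.xml_deadlock_report
ADD TARGET package0.ring_buffer;
GO

ALTER EVENT SESSION [CaptureDeadlocks] ON SERVER STATE = START;
GO

/* ...reproduce the deadlock in session windows #1 and #2, then
   pull back what the ring buffer has collected as XML... */
SELECT CAST(t.target_data AS xml) AS ring_buffer_xml
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
    ON s.address = t.event_session_address
WHERE s.name = N'CaptureDeadlocks'
  AND t.target_name = N'ring_buffer';
GO

/* Clean up when you are finished. */
ALTER EVENT SESSION [CaptureDeadlocks] ON SERVER STATE = STOP;
DROP EVENT SESSION [CaptureDeadlocks] ON SERVER;
GO
```

Click the XML in the results grid to open it; the deadlock graph sits inside the xml_deadlock_report event.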

Happy learning!

Testing an Insert for Race Conditions with Ostress.exe


Whenever we have multiple sessions modifying data, things get tricky. When we have a pattern of “check if the data exists and then do a thing,” multiple sessions get even more complicated.

It’s not very practical to try to open a bunch of sessions in SQL Server Management Studio to run commands in a loop. It’s hard to manage and if you’re like me, you find a way to crash SSMS doing it.

It’s much easier to test for race conditions using the free OStress.exe tool from Microsoft. Download OStress.exe as part of the RML Utilities for SQL Server (x64) here.

Our Example: Insert a Row if It Doesn’t Exist Already

We have a simple table named dbo.RaceConditionTable with two columns: RaceConditionId and RaceConditionValue. RaceConditionId is the clustered Primary Key. A unique index on RaceConditionValue prevents duplicate rows from being inserted.

Here’s how to create it:

IF DB_ID('RaceConditionTesting') is not null
BEGIN
    USE master;
    ALTER DATABASE RaceConditionTesting SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE RaceConditionTesting;
END
GO

CREATE DATABASE RaceConditionTesting;
GO

use RaceConditionTesting;
GO

CREATE TABLE dbo.RaceConditionTable
(
    RaceConditionId int IDENTITY(1,1) NOT NULL,
    RaceConditionValue varchar(50) NOT NULL,
    CONSTRAINT pk_RaceConditionTable PRIMARY KEY CLUSTERED (RaceConditionId ASC)
);
GO

CREATE UNIQUE INDEX ix_RaceConditionTable_RaceConditionValue on dbo.RaceConditionTable(RaceConditionValue);
GO

/* Insert one row */
INSERT dbo.RaceConditionTable (RaceConditionValue) VALUES ('Foo');
GO

Our First Attempt: A Single Select Statement

Here’s the code to insert rows into dbo.RaceConditionTable. dbo.InsertIfNotExists contains just one INSERT/SELECT statement.

Note that this is all one statement. There’s no “IF” statement. There are also no hints.

IF OBJECT_ID('dbo.InsertIfNotExists') IS NULL
    EXEC ('CREATE PROCEDURE dbo.InsertIfNotExists as RETURN 0;');
GO
ALTER PROCEDURE dbo.InsertIfNotExists (
    @RaceConditionValue varchar(50)
)
AS
    SET NOCOUNT ON;

    INSERT dbo.RaceConditionTable (RaceConditionValue) 
    SELECT x.newval
    FROM (SELECT @RaceConditionValue as newval) as x
    LEFT JOIN dbo.RaceConditionTable as p on 
        x.newval=p.RaceConditionValue
    WHERE p.RaceConditionValue IS NULL;
GO

Our Simple Test Harness: dbo.RaceConditionTest and Ostress.exe

Now we need a way to test this. dbo.RaceConditionTest is a procedure that looks at dbo.RaceConditionTable, picks the highest value for RaceConditionId in the table, adds one to it, and turns that into a character value for RaceConditionValue.

We are going to run this procedure in OStress.exe across multiple threads. Those threads will all be generating values for RaceConditionValue using the same table, so several of them will often come up with the same value at the same time. Will the simple INSERT/SELECT statement use the right locks to stop two sessions from inserting the same value? If not, some of the inserts will fail, because our unique index does not allow duplicate values.

IF OBJECT_ID('dbo.RaceConditionTest') IS NULL
    EXEC ('CREATE PROCEDURE dbo.RaceConditionTest as RETURN 0;');
GO
ALTER PROCEDURE dbo.RaceConditionTest 
AS
    DECLARE @namevarchar varchar(50)
    SELECT @namevarchar = CAST(MAX(RaceConditionId)+1 AS varchar(50))
    FROM dbo.RaceConditionTable

    EXEC dbo.InsertIfNotExists @RaceConditionValue=@namevarchar;
GO

I’ll run OStress.exe with this command:

ostress.exe -Q"exec RaceConditionTesting.dbo.RaceConditionTest" -n6 -r100 -o"c:\ostressoutput"

This tells OStress to run the RaceConditionTest procedure on six threads, try to run it 100 times on each thread, and to put output files in c:\ostressoutput.

Test Results: The Simple Insert/Select

OStress.exe spews insert failures everywhere with our first INSERT statement. The message “Cannot insert duplicate key row in object ‘dbo.RaceConditionTable’ with unique index ‘ix_RaceConditionTable_RaceConditionValue'” is everywhere:

ostress-insert-select-duplicate-key

Here’s the insert command we’ve been using, to review:

    INSERT dbo.RaceConditionTable (RaceConditionValue) 
    SELECT x.newval
    FROM (SELECT @RaceConditionValue as newval) as x
    LEFT JOIN dbo.RaceConditionTable as p on 
        x.newval=p.RaceConditionValue
    WHERE p.RaceConditionValue IS NULL;

The OStress.exe results show that the locks taken by the LEFT JOIN to dbo.RaceConditionTable, which checks whether the row exists, can be acquired and released before the INSERT runs. This means that the following can happen:

  • Session A sees that key 266 does not exist and prepares to insert
  • Session B sees that key 266 does not exist and prepares to insert
  • Session A runs its insert
  • Session B attempts to run its insert, but fails because of a duplicate key error

Revised Code: Higher Locks with UPDLOCK

What if we tell SQL Server to take out a higher lock when it reads the row? If we block when we read, maybe that will solve the problem by making others wait. Let’s test it. Here’s our revised code:

IF OBJECT_ID('dbo.InsertIfNotExists') IS NULL
    EXEC ('CREATE PROCEDURE dbo.InsertIfNotExists as RETURN 0;');
GO
ALTER PROCEDURE dbo.InsertIfNotExists (
    @RaceConditionValue varchar(50)
)
AS
    SET NOCOUNT ON;

    INSERT dbo.RaceConditionTable (RaceConditionValue) 
    SELECT x.newval
    FROM (SELECT @RaceConditionValue as newval) as x
    LEFT JOIN dbo.RaceConditionTable as p WITH (UPDLOCK) on 
        x.newval=p.RaceConditionValue
    WHERE p.RaceConditionValue IS NULL;
GO

The only change here is that the LEFT JOIN now has a hint telling SQL Server to take out an update lock (UPDLOCK) when it reads the row.

Test Results: UPDLOCK

Re-running OStress.exe, I can see that I haven’t fixed the issue. I’ve still got unique key violations galore:

ostress-insert-select-updlock-duplicate-key

Even though I’m asking for update locks, the following race condition pattern can still occur:

  • Session A reads with UPDLOCK, sees that key 266 does not exist (so there is no row for the update lock to stay on), and prepares to insert
  • Session B reads with UPDLOCK, sees that key 266 does not exist (so there is no row for the update lock to stay on), and prepares to insert
  • Session A runs its insert
  • Session B attempts to run its insert, but fails because of a duplicate key error

We need a lock that covers the spot where the new key will go, and we need to hold that lock.

More Revised Code: Higher Locks with UPDLOCK, Lock Duration with HOLDLOCK

We can protect against race conditions like this by telling SQL Server to take out more locks when it processes the existence check and to hold them for the duration of the transaction. Here’s the revised code:

IF OBJECT_ID('dbo.InsertIfNotExists') IS NULL
    EXEC ('CREATE PROCEDURE dbo.InsertIfNotExists as RETURN 0;');
GO
ALTER PROCEDURE dbo.InsertIfNotExists (
    @RaceConditionValue varchar(50)
)
AS
    SET NOCOUNT ON;

    INSERT dbo.RaceConditionTable (RaceConditionValue) 
    SELECT x.newval
    FROM (SELECT @RaceConditionValue as newval) as x
    LEFT JOIN dbo.RaceConditionTable as p WITH (UPDLOCK, HOLDLOCK) on 
        x.newval=p.RaceConditionValue
    WHERE p.RaceConditionValue IS NULL;
GO

Now the LEFT JOIN has hints telling SQL Server to take out an update lock (UPDLOCK) when it reads the row AND to protect that key range for the duration of the transaction (HOLDLOCK). So we’re asking for serializable behavior on this table reference and telling SQL Server to lock it like we’re modifying, no matter what.
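If you'd like to see what those hints actually lock, one way is to run just the existence check inside an open transaction and look at sys.dm_tran_locks before rolling back. This is a sketch for a test instance; the value 'NotThereYet' is made up and assumed not to exist in the table.

```sql
USE RaceConditionTesting;
GO

BEGIN TRAN;

    /* Existence check only, with the same hints as the procedure.
       'NotThereYet' is a hypothetical value assumed to be missing. */
    SELECT p.RaceConditionId
    FROM dbo.RaceConditionTable AS p WITH (UPDLOCK, HOLDLOCK)
    WHERE p.RaceConditionValue = 'NotThereYet';

    /* What is this session holding while the transaction is open? */
    SELECT resource_type, request_mode, request_status, resource_description
    FROM sys.dm_tran_locks
    WHERE request_session_id = @@SPID
      AND resource_database_id = DB_ID();

ROLLBACK;
GO
```

I'd expect a KEY resource with a key-range request_mode in the output, which is the lock that keeps another session from slipping the same value in. Try it again with only UPDLOCK and compare what is left.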

Test Results: UPDLOCK, HOLDLOCK

Re-running OStress.exe, I get nice clean output this time:

ostress-insert-select-updlock-holdlock

No key insert violations! I get the same thing running across 20 threads, 1,000 executions per thread.

There Are Downsides to Taking Out High Locks and Holding Them Longer

Higher locks that are held longer can mean more blocking, and of course that can slow you down. If new rows come in relatively rarely, I probably want to make my code a bit more complex and only take out the higher locks when I really have to. I also want to make sure my indexes are optimized to help me read and lock the smallest range possible for speed. Check out a sample of that scenario in my prior post.
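As a rough illustration of "only take out the higher locks when I really have to", here is one possible shape. It is a sketch, not the code from the prior post: the procedure name dbo.InsertIfNotExistsCheckFirst is made up, and it assumes most calls are for values that already exist.

```sql
IF OBJECT_ID('dbo.InsertIfNotExistsCheckFirst') IS NULL
    EXEC ('CREATE PROCEDURE dbo.InsertIfNotExistsCheckFirst as RETURN 0;');
GO
ALTER PROCEDURE dbo.InsertIfNotExistsCheckFirst (
    @RaceConditionValue varchar(50)
)
AS
    SET NOCOUNT ON;

    /* Cheap check with ordinary read locks.
       If the value is already there, we are done. */
    IF EXISTS (
        SELECT 1
        FROM dbo.RaceConditionTable
        WHERE RaceConditionValue = @RaceConditionValue
    )
        RETURN 0;

    /* The value looked missing, so check again under UPDLOCK, HOLDLOCK
       and insert in the same statement. The re-check matters: another
       session may have inserted the value since the cheap check ran. */
    INSERT dbo.RaceConditionTable (RaceConditionValue)
    SELECT x.newval
    FROM (SELECT @RaceConditionValue as newval) as x
    LEFT JOIN dbo.RaceConditionTable as p WITH (UPDLOCK, HOLDLOCK) on
        x.newval=p.RaceConditionValue
    WHERE p.RaceConditionValue IS NULL;
GO
```

The trade-off is an extra read for every genuinely new value, so it is worth running through OStress.exe against your own workload before deciding it is a win.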