Tag Archives | blocking

Collecting the Blocked Process Report (XEvents and Server Side Trace)

I’m a big fan of the built-in Blocked Process Report in SQL Server. It’s come in handy for troubleshooting blocking situations for me many times.

I wanted a friendly way to share code to configure and manage the Blocked Process Report, so I’ve created a gist on GitHub sharing TSQL that:

  • Enables the Blocked Process Report (BPR)
  • Collects the BPR with an Extended Events trace
  • Collects the BPR using a Server Side SQL Trace (in case you don’t care for XEvents or are running an older version of SQL Server)
  • Lists out the Extended Events and SQL Traces you have running, and gives you code to stop and delete traces if you wish

View or download the code from GitHub, or get it below.

Tools to help decode the Blocked Process Report after you collect it

/***********************************************************************
Copyright 2016, Kendra Little - littlekendra.com
MIT License, http://www.opensource.org/licenses/mit-license.php
***********************************************************************/
/***********************************************************************
FIRST, TELL SQL SERVER TO ISSUE THE BLOCKED PROCESS REPORT
***********************************************************************/
/* Check if there are any pending configuration items before you start */
/* Take care of those before proceeding if they exist */
SELECT *
FROM sys.configurations
where value <> value_in_use;
GO
/* Show 'advanced options' -- the BPR setting is advanced! */
/* Warning: RECONFIGURE pushes through ALL pending changes! */
IF (SELECT value_in_use FROM sys.configurations
where name=N'show advanced options') <> 1
BEGIN
EXEC ('EXEC sp_configure ''show advanced options'', 1;');
EXEC ('RECONFIGURE');
END
/* Set the blocked process threshold (seconds) to a value of 5 */
/* or higher to tell SQL Server to issue blocked process reports. */
/* Set this back to 0 at any time to stop blocked process reports. */
EXEC sp_configure 'blocked process threshold (s)', 5;
GO
RECONFIGURE;
GO
/* You're not done-- you must configure a trace to pick up the
Blocked Process Report.
You may use either:
* SQL Trace (server side trace recommended)
* Extended Events
*/
/***********************************************************************
NEXT, TRACE THE BLOCKED PROCESS REPORT
THIS OPTION USES XEVENTS.
YOU CAN USE A SERVER SIDE SQL TRACE INSTEAD THOUGH (SCROLL DOWN)
***********************************************************************/
/* Pre-requisites and notes:
Configure 'blocked process threshold (s)' to 5 or higher in sp_configure
This works with SQL Server 2014 and higher
Change the filename to a relevant location on the server itself
Tweak options in the WITH clause to your preference
Note that there is no automatic stop for this! If you want that, use a
Server Side SQL Trace instead.
THIS CREATES AND STARTS AN EXTENDED EVENTS TRACE
*/
/* Create the Extended Events trace */
CREATE EVENT SESSION [Blocked Process Report] ON SERVER
ADD EVENT sqlserver.blocked_process_report
ADD TARGET package0.event_file
(SET filename=
N'S:\XEvents\Blocked-Process-Report.xel', max_file_size=(1024),max_rollover_files=(4))
/* File size is in MB */
WITH (
MAX_MEMORY=4096 KB,
EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
MAX_DISPATCH_LATENCY=120 SECONDS /* 0 = unlimited */,
MAX_EVENT_SIZE=0 KB,
MEMORY_PARTITION_MODE=NONE,
TRACK_CAUSALITY=OFF,
STARTUP_STATE=ON)
GO
/* Start the Extended Events trace */
ALTER EVENT SESSION [Blocked Process Report]
ON SERVER
STATE = START;
GO
/* Drop the trace when you're done with a command like this:
DROP EVENT SESSION [Blocked Process Report] ON SERVER
GO
*/
/***********************************************************************
THIS OPTION USES A SERVER SIDE SQL TRACE TO PICK UP THE BPR.
YOU ONLY NEED THIS *OR* THE XEVENTS TRACE -- NOT BOTH
***********************************************************************/
/* Modified from a script generated from SQL Server Profiler */
/* Pre-requisites and notes:
Configure 'blocked process threshold (s)' to 5 or higher in sp_configure
This works with SQL Server 2005 and higher
Change the filename to a relevant location on the server itself
Tweak options to your preference (including the end date)
THIS CREATES AND STARTS A SERVER SIDE SQL TRACE
*/
declare @rc int;
declare @TraceID int;
declare @maxfilesizeMB bigint;
declare @TraceEndDateTime datetime;
declare @TraceFilename nvarchar(500);
declare @rolloverfilecount int;
set @TraceEndDateTime = '2020-12-12 00:00:00.000';
set @maxfilesizeMB = 1024;
set @TraceFilename = N'S:\XEvents\Blocked-Process-Report';
set @rolloverfilecount = 4;
/* Create the basic server side trace */
exec @rc = sp_trace_create
@TraceID output,
@options = 2 /* trace will use rollover files */,
@tracefile = @TraceFilename,
@maxfilesize = @maxfilesizeMB,
@stoptime = @TraceEndDateTime,
@filecount = @rolloverfilecount;
if (@rc != 0) goto error;
/* Add the blocked process report event and collect some columns */
declare @on bit
set @on = 1
exec sp_trace_setevent @TraceID, 137, 1, @on
exec sp_trace_setevent @TraceID, 137, 3, @on
exec sp_trace_setevent @TraceID, 137, 12, @on
exec sp_trace_setevent @TraceID, 137, 15, @on
exec sp_trace_setevent @TraceID, 137, 26, @on
/* Start the trace */
exec sp_trace_setstatus @TraceID, 1
/* Return the trace id to the caller */
select TraceID=@TraceID
goto finish
error:
select ErrorCode=@rc
finish:
go
/***********************************************************************
NOT SURE HOW MANY XEVENTS TRACES YOU'RE RUNNING?
USE THIS TO CHECK, AND POSSIBLY STOP AND DELETE.
***********************************************************************/
/* List Extended Events Traces which are currently started.
Built-in sessions include:
system_health
sp_server_diagnostics session
hkenginexesession
telemetry_xevents
*/
SELECT
name,
pending_buffers,
create_time,
session_source
FROM sys.dm_xe_sessions;
GO
/* Plug the trace name you want to stop and drop into
the commands below */
ALTER EVENT SESSION [Blocked Process Report]
ON SERVER
STATE = STOP;
GO
DROP EVENT SESSION [Blocked Process Report] ON SERVER
GO
/***********************************************************************
NOT SURE HOW MANY SERVER SIDE TRACES OR PROFILER TRACES YOU'RE RUNNING?
USE THIS TO CHECK, AND POSSIBLY STOP AND DELETE.
***********************************************************************/
/* Want to clean up a server side trace for the Blocked Process Report, or anything else? */
/* This will list all Server Side Traces (whether or not they have started) */
/* The default trace is usually trace id=1,
it will show as having no stop time and have a path like
D:\MSSQL\DATA\MSSQL13.MSSQLSERVER\MSSQL\Log\log_123.trc
*/
SELECT *
FROM sys.traces;
GO
/* To stop a trace, get the id from the query above */
/* Stop the trace by setting it to status = 0 */
EXEC sp_trace_setstatus @traceid = ? , @status = 0;
GO
/* Delete the trace by setting the status to 2 */
EXEC sp_trace_setstatus @traceid = ? , @status = 2;
GO

 

 

Should I change the ‘locks’ configuration in SQL Server?

I recently got a fantastic question from a reader regarding lock usage in SQL Server. Here’s the question:

One of my production databases has a total lock count around 25,000 (select count(*) from sys.dm_tran_locks). The configuration setting for locks is set to the default of zero. This lock count is due to multiple procedures which frequently run and use the same 2-3 tables, repeatedly taking out and releasing locks. Do I need to change the configuration for locks or look into the SP’s so they can finish more quickly, rather than creating locks?

Our friend is correct to leave the ‘locks’ setting in sp_configure alone

The default setting of zero lets SQL Server manage the memory for locks dynamically. When you use this dynamic setting, SQL Server will:

  • Allocate memory for 2,500 locks on startup
  • Acquire more memory for locks when it needs to do so (unless you’re in a memory pressure situation and it would cause paging)
  • Not allocate more than 60% of the memory allocated to the instance for locks
  • Trigger lock escalation when lock memory hits 40% of the memory allocated to the instance
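
If you’re curious where your own instance stands, here’s a quick sketch to check the ‘locks’ setting and see how much memory the lock manager is using right now. (The pages_kb column in sys.dm_os_memory_clerks requires SQL Server 2012 or higher.)

/* Check the 'locks' setting: 0 means dynamic lock memory management */
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = N'locks';
GO
/* How much memory is the lock manager using right now? */
SELECT type, SUM(pages_kb) AS lock_memory_kb
FROM sys.dm_os_memory_clerks
WHERE type = N'OBJECTSTORE_LOCK_MANAGER'
GROUP BY type;
GO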

If you change the setting for ‘locks’, you’re giving SQL Server a maximum number of locks that it can use.

Microsoft recommends leaving this at zero.

You could change ‘locks’ to raise the number of locks allowed. But is it a good idea to use more than 60% of your memory for locks? Even if you’re using that as a temporary band-aid, leaving less than 40% of your memory for the buffer pool (data cache), execution plans, and everything else is going to be a world of hurt for most instances.

You could change ‘locks’ to lower the number of locks allowed. But once SQL Server hits that cap and can’t allocate more locks, the queries asking for them are going to throw errors and fail. That’s not attractive, either.

So if you’re concerned about the number of locks you have, changing the ‘locks’ configuration setting isn’t likely to help you out.

The ‘locks’ configuration is also marked as slated to be removed in a future version. Microsoft doesn’t want you to be dependent on it.

What’s the memory overhead of those locks?

Each lock uses 96 bytes of memory. On the instance in question, 25,000 locks = 2,400,000 bytes.

That’s only 2.3 MB of memory devoted to locks. Even though 25K sounds like a lot, the memory footprint for that is pretty darn small.
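
If you’d like to do the same math against your own instance, here’s a quick back-of-the-envelope query using that 96-bytes-per-lock figure:

/* Rough estimate of current lock memory, at ~96 bytes per lock */
SELECT
    COUNT(*) AS lock_count,
    COUNT(*) * 96 / 1024. / 1024. AS approx_lock_mb
FROM sys.dm_tran_locks;
GO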

I checked back with our questioner, and their instance has 32GB of memory. That’s a pretty small amount in the grand scheme of things (as of SQL Server 2014, Standard Edition can use up to 128GB of memory for the Buffer Pool), but 2.3 MB isn’t anything to worry about, percentage wise.

Do you have a high number of locks because you need better indexes?

Good indexes can dramatically reduce your lock overhead. Here’s a simple example using the SQLIndexWorkbook sample database.

For this sample query, run under the default read committed isolation level:

SELECT COUNT(*) FROM agg.FirstNameByYear WHERE FirstNameId=1;
GO

When this needs to do a clustered index scan, it requires 5,061 page locks.

After creating a nonclustered index on FirstNameId, the query requires only one page lock.
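
The index I used for this test looked something like this (the index name is just what I happened to pick):

CREATE NONCLUSTERED INDEX ix_FirstNameByYear_FirstNameId
    ON agg.FirstNameByYear (FirstNameId);
GO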

Indexes that help SQL Server find rows more quickly can dramatically reduce the number of locks that are taken out.

Are you waiting for locks because of blocking?

SQL Server is fast at acquiring locks — unless there’s a conflict, and you have to wait to get the lock because someone else is already using it.

In this case, the first step is to figure out when there is blocking, and who is blocking whom. I like to use alerts and the Blocked Process Report to help figure this out.
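
If you just want a quick look at who’s blocking whom right this second, a simple sketch against the DMVs works too (the Blocked Process Report is still better for catching blocking you weren’t sitting there watching for):

/* Who is blocked, who is blocking them, and what are they running? */
SELECT
    r.session_id,
    r.blocking_session_id,
    r.wait_type,
    r.wait_time AS wait_time_ms,
    t.text AS query_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
GO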

Do you have a high number of locks because of estimate problems?

One reason you might get a high number of locks is an inefficient execution plan based on poor estimates. If SQL Server thinks it’s only going to get a small number of rows, it may design a plan based on “lookups”. If it turns out that it’s got a lot more rows than it thought, it might have to execute this loop over and over and over– slowly looping and acquiring lots of little locks.

In this case, the stored procedures using this database are making heavy use of table variables. Table variables often lead to incorrect estimates in complex procedures, and could result in inefficient plans.

In this case, I wasn’t too worried about the 25,000 locks, but I thought it was possible that the performance of the procedures might be able to be improved if they have better estimates. I recommended:

  1. Testing out the procedures in a dev environment with temporary tables instead of table variables
  2. Evaluating how the procedures use indexes after the change — they likely will need different indexes for the new execution plans

If you have heavy use of table variables and can’t test out temporary tables, you can test out Trace Flag 2453 on SQL Server 2012 SP2 and higher. This trace flag doesn’t give table variables the full statistics support which temporary tables have, but does try to make SQL Server smarter about the number of rows in the table variable.
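
Here’s a minimal sketch of how you might kick the tires on that in a dev environment. The table variable and query here are made up for illustration, and DBCC TRACEON without -1 only enables the flag for your session (you need permission to run it):

/* Dev-only sketch: enable trace flag 2453 for this session, then compare plans */
DBCC TRACEON (2453);
GO
DECLARE @ids TABLE (id INT PRIMARY KEY);
INSERT @ids (id)
SELECT object_id FROM sys.objects;

/* With 2453 on, SQL Server can recompile based on the table variable's actual
   row count instead of guessing that it holds one row */
SELECT COUNT(*)
FROM @ids AS i
JOIN sys.columns AS c ON c.object_id = i.id;
GO
DBCC TRACEOFF (2453);
GO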

Disclaimer: changing from table variables to temporary tables doesn’t always make things faster. I wasn’t doing live troubleshooting here and I didn’t have actual execution plans– it’s possible that the rowsets are small and the table variables were doing well. You never know until you test!

Sometimes you should just take out a tablock

I don’t think this is the case for the person asking this question, but there are some cases when you just want to go ahead and take out an exclusive lock on a table. Not only can it simplify the number of locks for the table, but it can help make data loading more efficient.
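
A data load with an explicit table lock might look something like this (the table names here are made up for illustration):

/* Take one exclusive table-level lock up front instead of thousands of row
   and page locks. With the right prerequisites, TABLOCK can also allow
   minimal logging for the load. */
INSERT INTO dbo.SalesStaging WITH (TABLOCK)
    (OrderId, Amount)
SELECT OrderId, Amount
FROM dbo.SalesImport;
GO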

Decoding Key and Page WaitResource for Deadlocks and Blocking

If you use SQL Server’s blocked process report or collect deadlock graphs, occasionally you’ll come across things that look like this:

waitresource=PAGE: 6:3:70133

waitresource=KEY: 6:72057594041991168 (ce52f92a058c)

Sometimes there’s more information in the massive monster of XML that you’re scanning through (deadlock graphs have a resource list that help reveal the object and index name), but sometimes there isn’t.

Here’s a reference on how to decode them.

All of this information is out there on the internet already in various places, it’s just spread out! I’m going to pull the whole thing together, from DBCC PAGE to hobt_id to the undocumented %%physloc%% and %%lockres%% functions.

First we’ll talk through PAGE lock waits, then we’ll hit the KEY lock waits.

1) “waitresource=PAGE: 6:3:70133” = Database_Id : FileId : PageNumber

If your query was waiting on a page level lock, SQL Server gives you the page address.

Breaking “PAGE: 6:3:70133” down, we’ve got:

  • database_id=6
  • data_file_id = 3
  • page_number = 70133

1.1) Decode the database_id

Find the database name with this query:

SELECT 
    name 
FROM sys.databases 
WHERE database_id=6;
GO

That’s the WideWorldImporters sample database on my SQL Server instance.

1.2) Look up the data file name — if you’re interested

We’re going to use the data file id in the next step to find the name of the table. You can just move on. But if you’re curious about the name of the data file, you can look it up by using the database and plugging the data file id into this query:

USE WideWorldImporters;
GO
SELECT 
    name, 
    physical_name
FROM sys.database_files
WHERE file_id = 3;
GO

In WideWorldImporters, this is the data file named WWI_UserData, and I restored it to C:\MSSQL\DATA\WideWorldImporters_UserData.ndf. (Whoops, you caught me putting files on my system drive! Oh no! The embarrassment).

1.3) Get the name of the object from DBCC PAGE

We know this is page # 70133 in data file 3 in the WideWorldImporters database. We can look at that page with the undocumented DBCC PAGE and Trace Flag 3604.

Note: I prefer running DBCC PAGE against a restored backup elsewhere, because it’s not supported. In some cases, running DBCC PAGE can cause stack dumps.

/* This trace flag makes DBCC PAGE output go to our Messages tab
instead of the SQL Server Error Log file */
DBCC TRACEON (3604);
GO
/* DBCC PAGE (DatabaseName, FileNumber, PageNumber, DumpStyle)*/
DBCC PAGE ('WideWorldImporters',3,70133,2);
GO

Scrolling down in the output, I can find the object_id and IndexId:

[Screenshot: DBCC PAGE output showing the Metadata: ObjectId and Metadata: IndexId values]

Whew, almost there!

I can now find the table and index name with this query:

USE WideWorldImporters;
GO
SELECT 
    sc.name as schema_name, 
    so.name as object_name, 
    si.name as index_name
FROM sys.objects as so 
JOIN sys.indexes as si on 
    so.object_id=si.object_id
JOIN sys.schemas AS sc on 
    so.schema_id=sc.schema_id
WHERE 
    so.object_id = 94623380
    and si.index_id = 1;
GO

And behold, this lock wait was on the PK_Sales_OrderLines index on the Sales.OrderLines table.

Note: In SQL Server 2014 and higher, you could also find the object name using the undocumented sys.dm_db_database_page_allocations dynamic management object. But you have to query all the pages in the database, which seems not as awesome against large databases — so I listed the DBCC PAGE method.
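
If you’d like to try that route anyway, here’s a sketch (it’s undocumented, so behavior may vary by version):

/* SQL Server 2014+: find the object that owns a specific page.
   This enumerates page allocations for the whole database -- slow on big databases. */
SELECT object_id, index_id, partition_id
FROM sys.dm_db_database_page_allocations
    (DB_ID(N'WideWorldImporters'), NULL, NULL, NULL, N'LIMITED')
WHERE allocated_page_file_id = 3
    AND allocated_page_page_id = 70133;
GO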

1.4) Can I see the data on the page that was locked?

Well, yes. But … do you really need to?

This is slow even on small tables. But it’s kinda fun, so… since you read this far… let’s talk about %%physloc%%!

%%physloc%% is an undocumented piece of magic that will return the physical record locator for every row. You can  use %%physloc%% with sys.fn_PhysLocFormatter in SQL Server 2008 and higher.

Now that we know that the page lock wait was on Sales.OrderLines, we can see all the data in that table on data file = 3 and page number = 70133 with this query:

Use WideWorldImporters;
GO
SELECT 
    sys.fn_PhysLocFormatter (%%physloc%%),
    *
FROM Sales.OrderLines (NOLOCK)
WHERE sys.fn_PhysLocFormatter (%%physloc%%) like '(3:70133%'
GO

Like I said, this is slow even on tiny tables. I’ve added NOLOCK to the query because while we want a glance at this info, we have no guarantee that it’s the way it was earlier when the blocking happened anyway– we’re guessing, so we may as well do dirty reads.

But woo hoo, it gives me a clean display of the 25 rows which the query was fighting for:

[Screenshot: query results showing the 25 rows on page 3:70133, each with its physical location]

That’s enough detail on waitresource=PAGE. What if you were waiting on a KEY?

2) “waitresource=KEY: 6:72057594041991168 (ce52f92a058c)” = Database_Id, HOBT_Id (magic hash that you can decode with %%lockres%% if you really want)

If your query was trying to lock a row in an index and was blocked, you get a totally different style of address.

Breaking “6:72057594041991168 (ce52f92a058c)” down, we’ve got:

  • database_id = 6
  • hobt_id = 72057594041991168
  • magic hash value = (ce52f92a058c)

2.1) Decode the database_id

This works the exact same way it did for the page example above! Find the database name with this query:

SELECT 
    name 
FROM sys.databases 
WHERE database_id=6;
GO

In my case, that’s still the WideWorldImporters sample database.

2.2) Decode the hobt_id

We need to use that database, and then query sys.partitions, with some helper joins to figure out the table and index name…

USE WideWorldImporters;
GO
SELECT 
    sc.name as schema_name, 
    so.name as object_name, 
    si.name as index_name
FROM sys.partitions AS p
JOIN sys.objects as so on 
    p.object_id=so.object_id
JOIN sys.indexes as si on 
    p.index_id=si.index_id and 
    p.object_id=si.object_id
JOIN sys.schemas AS sc on 
    so.schema_id=sc.schema_id
WHERE hobt_id = 72057594041991168;
GO

That tells me that the query was waiting for a lock on Application.Countries, using the PK_Application_Countries index.

2.3) Now for some %%lockres%% magic – if you want to figure out which row was locked

If I really want to know exactly which row the lock needed, I can decode that by querying the table itself. We can use the undocumented %%lockres%% function to find the row equal to that magic hash value.

Note that this is going to scan the table, and on large tables that might not be so awesome all the time:

SELECT
    *
FROM Application.Countries (NOLOCK)
WHERE %%lockres%% = '(ce52f92a058c)';
GO

I added NOLOCK to this query (as Klaus Aschenbrenner suggested on Twitter) because locking can be an issue — and in this case, you’re looking to get a glance at the data as it is now, not as it was earlier when the transaction ran– so I don’t think data consistency is a big issue.

Voila, the row we were fighting for appears!

[Screenshot: query results showing the row in Application.Countries that matches the %%lockres%% hash]

Credits and more reading

I’m not sure who first documented many of these things, but here are two posts on some of the less documented nitty gritty that you may enjoy:

Setting up Free Blocking Alerts and Deadlock Monitoring (Dear SQL DBA Episode 17)

What tools in SQL Server will notify you about blocking and help track the queries behind your toughest blocking and deadlocking problems?

Watch the 21 minute video, subscribe to the podcast, or read the episode notes and links below.

Dear SQL DBA,

What is the best way to set up blocking and deadlock alerts on the server? I want to be notified automatically without any impact on the prod server.

I have tried alerts with SQL server performance condition alerts with SQL server agent. They do not show the queries or tables involved etc?

Thanks,

All Blocked Up


Woo hoo, I love this question!

So first off, I’m going to answer this discussing the free, built-in tools with SQL Server. If you have a budget for custom monitoring tools, you can buy fancy tools that have customized notifications for blocking and which capture the queries and plans involved. If that’s the case, set up free trials against a test system.

But not everyone has budget for every single SQL Server instance. So it’s extremely useful to know what SQL Server offers to help you with this.

And by the way, if you’re going to be at the SQLPASS Summit in Seattle in just under a month, I’m giving a session that shows different blocking scenarios. Come see my session, The Great Performance Robbery: Locking Problems and Solutions on Thursday, Oct 27th at 10:45AM in room 6C.

Free, simple blocking notifications

I like to set up blocking notifications with a simple SQL Server Agent alert on the “SQLServer: General Statistics: Processes Blocked” performance counter.

This will not give you the queries involved in the blocking — don’t worry, we’ll cover tracking that in the next step.

This alert is low impact and it will let you know when you need to look at the SQL Server.

To get this alert to work, you’ll need to:

  • Configure the SQL Server Agent service to auto start
  • Set up database mail, enable it on the  SQL Server Agent, and then restart the SQL Server Agent
  • Configure an Operator in the SQL Server Agent
  • Create a new ‘performance’ style alert, base it on the “Processes Blocked” counter, and tell it who to notify (a T-SQL sketch follows the notes below)

A few things to note:

  1. The SQL Agent doesn’t poll counters constantly – and we want this to be lightweight, so that’s a good thing. It will only poll every 15-30 seconds, and there’s no published guaranteed SLA on that polling frequency.
  2. If you really need something more sensitive and reliable, you need a monitoring system fully independent / outside of the SQL Server to be polling in and checking it for availability and performance.
  3. You can configure the alert to only fire every X minutes. I highly recommend that, so you don’t get an inbox full of alerts every 20 seconds.
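
If you’d rather script the alert than click through the GUI, here’s a hedged sketch using msdb’s stored procedures. The operator name is a placeholder, and the counter object may differ on your instance (named instances use an MSSQL$InstanceName prefix instead of SQLServer):

/* Create a performance-condition alert that fires when Processes blocked > 0 */
EXEC msdb.dbo.sp_add_alert
    @name = N'Blocked Processes Detected',
    @performance_condition = N'SQLServer:General Statistics|Processes blocked||>|0',
    @delay_between_responses = 900 /* wait 15 minutes between notifications */;
GO
/* Tell it who to notify -- 'DBA Team' is a placeholder operator name */
EXEC msdb.dbo.sp_add_notification
    @alert_name = N'Blocked Processes Detected',
    @operator_name = N'DBA Team',
    @notification_method = 1 /* 1 = email */;
GO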

Create some blocking in a test database or in tempdb and make sure the alert works.

I have example code to create blocking and deadlocks for your dev environments in my post, Deadlock Code for the WideWorldImporters Sample Database.

For production servers, you can create a scratch table (a global temp table works well, since both sessions need to see the same table) and write similar code to create blocking against it.
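
Here’s a minimal blocking sketch along those lines, using a global temp table so both sessions can see it (the names are made up):

/* Session 1: create the table, then leave an update open in a transaction */
CREATE TABLE ##block_demo (i INT);
INSERT ##block_demo (i) VALUES (1);
BEGIN TRAN
    UPDATE ##block_demo SET i = i + 1;
/* don't commit yet */

/* Session 2: this SELECT will block behind the open update */
SELECT i FROM ##block_demo;

/* Back in Session 1, once the alert has fired: clean up */
COMMIT
DROP TABLE ##block_demo;
GO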


Finding the queries involved with the Blocked Process Report

OK, we’ve got notifications. We need SQL Server to give us more information on who is involved in the blocking.

I like to use the built-in Blocked Process Report for this. This has been in SQL Server for a long time, and it’s extremely useful.

The Blocked Process Report shows you the “input buffer” of the commands involved – it may be partial information and not the full text of the query. It will also show you the login name for who is running what, and the type of lock requests involved.

The Blocked Process Report is pretty lightweight, because SQL Server already has to wake up frequently and look for blocking situations that can’t resolve themselves. By default, the deadlock monitor wakes up every 5 seconds and looks around to see if there is a deadlock it needs to break. Enabling the Blocked Process Report tells SQL Server to additionally issue a report on any blocking it finds during those checks.

To get this to work, you need to:

  • Enable the sp_configure option for the blocked process threshold. This defaults to ‘0’, which is off. Set the threshold to the number of seconds of blocking that should trigger a report. This should be a value of 5 or higher, because making the deadlock monitor run constantly could tank your performance. A good ‘starting’ value is 30 seconds.
  • You also need to set up a trace to collect an event called the ‘blocked process report’. Setting the threshold causes the event to be output, but SQL Server won’t collect it for you unless you start a SQL Trace or an Extended events trace that collects that event.

Once you have the trace file, you can copy it off of the production server to interpret it.
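
If you collected it with Extended Events, one way to crack open the .xel files and pull out the report XML is sys.fn_xe_file_target_read_file. Here’s a sketch -- adjust the file path to wherever you copied the files:

/* Read the blocked process report XML back out of the Extended Events file target */
SELECT
    CAST(event_data AS XML).value('(event/@timestamp)[1]', 'datetime2') AS event_time,
    CAST(event_data AS XML).query('(event/data[@name="blocked_process"]/value/blocked-process-report)[1]')
        AS blocked_process_report
FROM sys.fn_xe_file_target_read_file
    (N'S:\XEvents\Blocked-Process-Report*.xel', NULL, NULL, NULL);
GO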

Michael J Swart has written a great free tool called the Blocked Process Report Viewer to help interpret the blocking chains. It’s free at https://sqlblockedprocesses.codeplex.com.

The viewer makes it easier to see the root of the blocking chain and who was blocking whom.

This trace is pretty lightweight, but with any trace you want to make sure that you don’t add a bunch of events that you don’t need, and that you periodically clean up the files and don’t let it impact drive space.

When I talk about running traces, I don’t mean running Profiler

We’re entering dangerous territory here. Whenever you talk about tracing in SQL Server these days, someone gets offended.

Here’s what you need to know. There are two main ways to run a trace:

  1. SQL Trace. This is the old school option. You can run this using…
    1. The Profiler client (I don’t like this option)
    2. A Server Side trace scripted out from Profiler (much better!). You can get up to speed on Server Side Traces by reading this article on generating a server side trace by Jen McCown. (Note that she wrote this article back in 2009. That’s fine, SQL Trace hasn’t changed since then.)
  2. Extended Events. This is much easier to use on SQL Server 2012 and higher than in previous versions because a GUI was introduced in Management Studio for it under Object Explorer.

I do not like leaving the Profiler application running because I’ve seen it do everything from slowing down performance to filling up drives over the years. And creating Server Side traces isn’t very hard if you do want to use SQL Trace.

I personally only like to have a trace running if I know I need it and am going to look at it. So I only enable this when I have a blocking problem. Whenever you choose to leave a trace running, you need to periodically check in on the files it’s created and clean up after it.

Detecting and handling deadlocks

What about locking problems where SQL Server has to step in and kill one of the queries?

You have a few built in options about how to get info on this. There are some trace flags that you can turn on which cause some information about who is involved in the deadlock to be printed to the SQL Server Error Log. This isn’t my preferred option because the information is very hard to parse through and read.
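
For reference, these are the classic trace flags people enable for that (the -1 turns them on globally):

/* Write deadlock details to the SQL Server Error Log */
DBCC TRACEON (1204, -1); /* deadlock info organized by node (older format) */
DBCC TRACEON (1222, -1); /* more detailed, XML-style deadlock info */
GO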

I find it more helpful to get a ‘deadlock graph’, which is a picture of how the locking fight went down.

On older versions of SQL Server, you can capture the deadlock graph with a server side trace.

On newer versions of SQL Server, you can capture this with an Extended Events trace.
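
Here’s a sketch of a dedicated Extended Events session for deadlock graphs. (The built-in system_health session already captures xml_deadlock_report, so on recent versions you may be able to just query that instead.) Change the file path to fit your server:

CREATE EVENT SESSION [Deadlock Graphs] ON SERVER
ADD EVENT sqlserver.xml_deadlock_report
ADD TARGET package0.event_file
    (SET filename = N'S:\XEvents\Deadlock-Graphs.xel',
     max_file_size = (256), max_rollover_files = (4))
WITH (STARTUP_STATE = ON);
GO
ALTER EVENT SESSION [Deadlock Graphs] ON SERVER STATE = START;
GO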

A great resource for deciding how to capture deadlock information is Jonathan Kehayias’ excellent Simple Talk article, Handling Deadlocks in SQL Server.  He covers how to collect the graphs, shows examples of how they look, and gets you started tackling them.

If you get through this point and need to get really fancy with deadlocks, Michael J Swart recently wrote about using Event Notifications to collect execution plans related to deadlocks in his post, “Build Your Own Tools“. Just don’t try to run before you walk: this is pretty advanced stuff and you need to be comfortable using Service Broker, which is part of Event Notifications behind the scenes.

Updates, Oct 12, 2016:

  • Deadlock alerting: While it’s always better for the application itself to catch, handle, and log/alert on failed queries (not just deadlocks, but timeouts and anything else)… if your application can’t do that, and you can’t change it, check out this SQL Magazine article by Michael Campbell on alerting on deadlocks from SQL Server.
  • Also, thanks to Johan for pointing out that the free SQL Sentry Plan Explorer client tool is also able to display deadlock graphs. Just save them as an XDL file and you can open and view them.

Quick recap – getting more info on blocking

A fast rundown of the free, built-in tools we covered:

  • I like to use a simple, light, performance alert in the SQL Server agent for notification about blocking
  • I like the Blocked Process report to find out who’s blocking whom – collected by a server side SQL Trace or Extended events
  • I find collecting deadlock graphs with either a server side SQL Trace or Extended Events to be the most helpful way to figure out who’s involved in the nastiest blocking tangles.

Want to submit a Dear SQL DBA Question?

Want clarification on some of the basics? Got a question that jumped into your mind reading or listening to this? I’d love to hear it– asking is always free and easy!

Deadlock Code for the WideWorldImporters Sample Database

If you haven’t checked out Microsoft’s new WideWorldImporters sample database for 2016, it’s a pretty cool new little database. The database makes it easy to play around with new 2016 features, and it even ships with some cool little executables to run inserts in the “workload-drivers” folder.

I recently put together some code to reproduce a simple deadlock in WideWorldImporters. This isn’t related to the new features at all– I was just exploring the sample database and writing some demos for locking and blocking. Deadlocks are easier to understand when you have hands-on code that you can step through and watch it happen live, rather than just being mystified by it when it happens later!

Setup – Restore WideWorldImporters

All you need to do is download the WideWorldImporters-Full.bak backup (121MB) and restore it to a test instance with Developer Edition running.


Let’s Cause a Deadlock

You will use two session windows to create your deadlock.

In session window #1, run this code:

USE WideWorldImporters;
GO

/* Run the BEGIN tran and the first statement in this session.
We're taking out a lock on the Countries table */
IF @@TRANCOUNT > 0 ROLLBACK /* clean up any transaction left open from a previous run */
BEGIN TRAN

    UPDATE Application.Countries
    SET LatestRecordedPopulation = LatestRecordedPopulation + 1
    WHERE IsoNumericCode = 840;

Leaving that there, open session window #2, and run this code:

SELECT CityName, StateProvinceName, sp.LatestRecordedPopulation, CountryName
FROM Application.Cities AS city
JOIN Application.StateProvinces AS sp on
    city.StateProvinceID = sp.StateProvinceID
JOIN Application.Countries AS ctry on 
    sp.CountryID=ctry.CountryID
WHERE sp.StateProvinceName = N'Virginia'

The code in session window #2 should be blocked — you’ll see it sitting there executing.

Back in session window #1, run this:

    UPDATE Application.StateProvinces
    SET LatestRecordedPopulation = LatestRecordedPopulation +1
    WHERE StateProvinceCode=N'VA'
COMMIT
GO

Session window #1 should tell you that 1 row was affected. It’s done!

Look back in session window #2, though. You should see the following message:

Msg 1205, Level 13, State 51, Line 1

Transaction (Process ID 54) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

 A “deadly embrace”

SQL Server’s deadlock manager woke up, looked around, and saw that our two session windows were stuck. They each were requesting locks that the other session wouldn’t give up– and if the deadlock manager didn’t break the deadlock, they’d be stuck there forever.

I didn’t set the deadlock priority on any of my transactions, so the deadlock manager picked the session that it thought would be the least work to roll back– and it became the victim.
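
If you’d rather nominate the victim yourself, a quick sketch:

/* Run this in whichever session you're willing to sacrifice, before its transaction starts */
SET DEADLOCK_PRIORITY LOW; /* accepts LOW, NORMAL, HIGH, or an integer from -10 to 10 */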

How to learn more about deadlocks

This repro code will get you started. It’s rerunnable: you can keep deadlocking and deadlocking those sessions.

To teach yourself more about how the locks work:

  • Run sp_WhoIsActive with @get_locks=1 from a 3rd session window at different points and observe lock grants and waits between the sessions
  • Set up an extended events session to pick up the deadlock graph, reproduce the deadlock, and teach yourself to decode the graph. (Erin Stellato has a free Stairway to Extended Events if you’re just getting started with those.)
  • What are your options for fixing this deadlock? Test out optimistic locking, query rewrites, setting deadlock priority, and creating indexes to see what works (I’m not saying that all of them do in this case)
  • What is Countries_Archive, and why does it show up in the lock graphs?

Happy learning!

The Case of the Blocking Merge Statement (LCK_M_RS_U locks)


Recently I got a fun “Dear SQL DBA” question about an “upsert” pattern. The question is about TSQL, so it lent itself to being answered in a blog post where I can show repro code and screenshots.

Here’s the Scenario in the Anonymized Question

We have a lookup table, which is the parent table in a foreign key relationship with a child table. The child table has lots more rows. When data comes in, we need to:

  • Check the Parent Table (the lookup table) to see if the ParentValue is already present.
  • If ParentValue is not present, insert a row for the new ParentValue into the Parent Table. This happens very rarely, but we need to check for it.
  • Then insert a row into the Child Table, using the ParentId.

The Problem: Blocking against the Parent Table

When run from a few concurrent sessions, locking and blocking issues crept up fast. A merge statement was being used to check for the values in the Parent Table and insert them when they weren’t present.

Let’s Look At the Code

This creates a database named upsert_test, then creates ParentTable and ChildTable objects in the database. Finally, a Foreign Key is created on ChildTable, referencing the ParentTable.

IF DB_ID('upsert_test') is not null
BEGIN
    USE MASTER;
    ALTER DATABASE upsert_test SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    DROP DATABASE upsert_test;
END
GO

CREATE DATABASE upsert_test;
GO

use upsert_test;
GO

CREATE TABLE dbo.ParentTable
(
    ParentId int IDENTITY(1,1) NOT NULL,
    ParentValue varchar(50) NOT NULL,
    CONSTRAINT pk_ParentTable PRIMARY KEY CLUSTERED (ParentId ASC)
);
GO

CREATE TABLE dbo.ChildTable
(
    ChildId INT IDENTITY(1,1) NOT NULL,
    ParentId INT NOT NULL,
    ChildValue VARCHAR(50) NOT NULL,
    CreatedDate DATE NOT NULL CONSTRAINT DF_Work_created DEFAULT (getdate()),
    CONSTRAINT PK_Work PRIMARY KEY CLUSTERED (ChildId ASC)
);
GO

ALTER TABLE ChildTable ADD CONSTRAINT FK_Work_Source FOREIGN KEY (ParentId) REFERENCES ParentTable (ParentId);
GO

Here’s the Upsert (aka Merge Query)

A stored procedure is used to handle incoming values.  It uses MERGE to look for matching rows in ParentTable, and insert when not matched.

IF OBJECT_ID('dbo.DoStuff') IS NULL
    EXEC ('CREATE PROCEDURE dbo.DoStuff as RETURN 0;');
GO

ALTER PROCEDURE dbo.DoStuff (
    @ParentValue varchar(50),
    @ChildValue varchar(50)
)
AS

    MERGE ParentTable with (HOLDLOCK) as p
    USING (SELECT @ParentValue NewParentValue) as new
        ON p.ParentValue = new.NewParentValue
    WHEN NOT MATCHED THEN
    INSERT (ParentValue) VALUES (new.NewParentValue);

    INSERT INTO ChildTable (ParentId, ChildValue)
    SELECT p.ParentId, @ChildValue
    FROM ParentTable p
    WHERE p.ParentValue=@ParentValue;
GO

Why is that HOLDLOCK Hint in the Merge Query?

My reader quite rightly used this hint in their merge query. Although MERGE looks like a single query, it’s actually just “syntactic sugar”. Behind the scenes, merge can be implemented as a select and an insert in two separate commands. Developers are advised to use HOLDLOCK to avoid race conditions with MERGE.

I asked one clarifying question — was the lock wait type they were seeing “LCK_M_RS_U”?

It was.

This confirmed that HOLDLOCK and merge were slowing them down instead of helping them.

Let’s Populate Some Rows for Testing and Reproduce the Blocking

exec dbo.DoStuff @ParentValue='Stuff', @ChildValue='Things';
GO
exec dbo.DoStuff @ParentValue='MoreStuff', @ChildValue='MoreThings';
GO
exec dbo.DoStuff @ParentValue='EvenMoreStuff', @ChildValue='EvenMoreThings';
GO

exec dbo.DoStuff @ParentValue='EvenMoreStuff', @ChildValue='EvenMoreThings x 2';
GO
exec dbo.DoStuff @ParentValue='EvenMoreStuff', @ChildValue='EvenMoreThings x 3';
GO

/* Create 5000 more ParentValues */
SET NOCOUNT ON;
DECLARE @namevarchar varchar(50), @i int=1
BEGIN TRAN
    WHILE @i <= 5000
    BEGIN
        SET @namevarchar= cast(RAND() AS VARCHAR(50));
        EXEC dbo.DoStuff @ParentValue=@namevarchar, @ChildValue='Whatever';
        SET @i=@i+1;
    END
COMMIT
GO 

To see the blocking issue, just run the following code in three session windows at the same time. Note that we’re running this over and over with the same ParentValue, and the ParentValue of “Stuff” is already in the table. This will not have to insert any rows into ParentTable.

SET NOCOUNT ON;
exec dbo.DoStuff @ParentValue='Stuff', @ChildValue='Things';
GO 1000000

Here’s what the blocking looks like in Adam Machanic’s sp_WhoIsActive:

[Screenshot: sp_WhoIsActive output showing sessions waiting on LCK_M_RS_U range locks]

HOLDLOCK = Serializable Isolation Level = Key Range Locks

The holdlock hint is a way to get serializable isolation level in SQL Server for a specific table, without having to change the isolation level for your entire session. Serializable is the highest isolation level in SQL Server using pessimistic locking.

When you “HOLDLOCK”, you tell SQL Server to protect any rows you read with a range lock– just in case someone comes along and tries to change one or sneak  one in.

That means that even when you’re just reading ParentTable and not inserting a row, you’re taking out a key range lock. You’re willing to fight other users over those rows to protect your statement.
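
You can watch those range locks yourself while the repro runs -- here’s a sketch to run from a separate session:

/* Look for RangeS-S and RangeX-X request modes while dbo.DoStuff is hammering away */
SELECT request_session_id, resource_type, request_mode, request_status, COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID(N'upsert_test')
GROUP BY request_session_id, resource_type, request_mode, request_status
ORDER BY request_session_id;
GO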

There are two parts to getting around the blocking and making this faster…

Index The Parent Table (Solution Part 1)

Currently, the only index on ParentTable is on ParentId.

Even if ParentTable is tiny, if we’re frequently accessing the table and looking up a ParentValue, we’ll benefit from creating a nonclustered index on that column. We should also allow only unique values into ParentValue for data integrity reasons. A unique nonclustered index is exactly what we need:

CREATE UNIQUE NONCLUSTERED INDEX ix_ParentTable_ParentValue on dbo.ParentTable(ParentValue)
GO

In my simple three session test, this makes the merge statement very efficient, and performance goes way up. You can no longer catch those LCK_M_RS_U waits in sp_WhoIsActive. However, I’m still concerned about them, and would still…

Ditch the Merge (Solution Part 2)

The “merge” command in SQL Server is often a let-down for folks. The syntax is confusing, most folks find out about the race conditions/concurrency issues the hard way, and the biggest problem is that it often seems “better” than other TSQL options because it was introduced as an enhancement in SQL Server 2008… but it isn’t always the better choice.

In this case, ditching the merge gives me more granular control of when I want to use that high level lock on ParentTable. The code is longer mostly because of a lot of comments.

ALTER PROCEDURE dbo.DoStuff (
    @ParentValue varchar(50),
    @ChildValue varchar(50)
)
AS
    DECLARE @ParentId INT;

    /* ParentId is very rarely new, so check for it first with only a shared lock */
    SELECT @ParentId=ParentId
    FROM dbo.ParentTable
    WHERE ParentValue=@ParentValue;

    /* Insert the new value if we have to. */
    /* Use the SELECT WITH UPDLOCK in case of race conditions */
    /* Get the new ParentId so we don't have to rejoin back to the table */
    IF @ParentId IS NULL
    BEGIN
        DECLARE @OutputVal TABLE (ParentId INT)

        INSERT dbo.ParentTable (ParentValue) 
        OUTPUT inserted.ParentId INTO @OutputVal(ParentId)
        SELECT x.newval
        FROM (SELECT @ParentValue as newval) as x
        LEFT JOIN dbo.ParentTable as p WITH (UPDLOCK, HOLDLOCK) on 
            x.newval=p.ParentValue
        WHERE p.ParentValue IS NULL;

        /* We are only ever going to have one row in @OutputVal */
        SELECT @ParentId=ParentId
        FROM @OutputVal;

    END

    INSERT INTO dbo.ChildTable (ParentId, ChildValue)
    SELECT @ParentId, @ChildValue;
GO

In our scenario, it’s rare for new ParentValues to come in. So I’ve used a pattern to try to use as many shared locks against ParentTable as possible, stick with the Read Committed Isolation level, and still protect against race conditions. The pattern is essentially this:

  1. Check if ParentValue already exists (shared read locks only)
  2. If this is the rare case that a ParentValue does not exist…
    • Insert the row into ParentTable
    • Protect against race conditions by inserting with a SELECT joining to ParentTable with UPDLOCK, HOLDLOCK in case the same new row happens to come in on two sessions at the same time
    • Use the OUTPUT clause to get the new @ParentId so we don’t have to join to ParentTable again in the next step
  3. Insert into ChildTable

Rough Comparison: Do These Changes Help?

I didn’t do thorough load testing on this. I just ran the call to dbo.DoStuff above in three sessions in a 4 core VM on my MacBook and looked at the BatchRequests/sec performance counter in SQL Server. Here’s what I saw:

Setup | Batch Requests/sec, no nonclustered index on dbo.ParentTable | Batch Requests/sec, with nonclustered index on dbo.ParentTable
Merge | 313 avg | 4000+
Insert with Left Join | 495 avg | 4000+

In this test case, adding the nonclustered index makes a bigger difference than changing the TSQL. But I would still move away from merge, because I want to be able to control when anything tougher than a read lock is being taken out against ParentTable — that’s very attractive since new values come in rarely in this case. The more concurrent sessions that are running this, the more that will help.

Don’t Forget to Handle Errors

Error handling is important! The code in this post doesn’t have it for simplicity reasons. If you need a starter guide for error handling, check out Erland Sommarskog’s excellent whitepaper.
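
As a bare-minimum starting point (just a sketch -- Erland’s paper covers the real pattern), something like this wrapped around the procedure body:

BEGIN TRY
    BEGIN TRAN;
    /* ... the inserts from dbo.DoStuff would go here ... */
    COMMIT;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK;
    THROW; /* re-raise so the caller knows the work failed (SQL Server 2012+) */
END CATCH;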

Further Tuning Thoughts

This code can be tuned further, but I’d want to set up a really clean load test using application servers (not SSMS) against a proper instance (maybe NOT on a MacBook). I would look at:

  • Whether validating ParentValue could be done in memory in the client application tier. Avoiding constant data access against dbo.ParentTable is attractive if that’s possible.
  • Wait statistics during execution to point to the next thing to tune.

What Do You Think of Merge?

Do you love or hate merge statements in SQL Server?