How To Fix Your Deadlocks

A common problem I see in systems is deadlocks.  In a lot of cases the DBA isn’t fully aware of them because applications silently retry deadlocked transactions.  I consider retry logic a hack that papers over a real problem, one that can actually be fixed in most cases.
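For context, the retry “hack” usually looks something like the following sketch (the table, retry count, and statement are illustrative, borrowed from the examples later in this post):

```sql
-- A sketch of typical deadlock retry logic: catch error 1205 and retry.
-- Table name and retry count are illustrative.
DECLARE @retries int = 0;
WHILE @retries < 3
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        UPDATE CustomerTest
        SET ModifiedDate = GETDATE()
        WHERE AccountNumber = 'AW00012597';
        COMMIT TRANSACTION;
        BREAK;  -- success, stop retrying
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        IF ERROR_NUMBER() = 1205  -- this session was chosen as the deadlock victim
            SET @retries += 1;    -- swallow the error and try again
        ELSE
            THROW;                -- re-raise anything that isn't a deadlock
    END CATCH;
END;
```

This keeps the application limping along, but the underlying contention is still there.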

In this post I will explain what a deadlock is, show the most common scenarios that cause them, and explain how to fix the ones that occur most frequently.

I’m not going to go into every single detail of reading the deadlock graph; however, I am going to show you what you need to look for to quickly get to a fix.

What is a deadlock?

A deadlock occurs when two sessions each request a lock on a resource that the other session already holds.  Neither session can proceed until the other releases its lock, so without intervention both would wait forever.

There are several different scenarios I see that cause deadlocks:

  1. Two sessions attempting to lock the same rows in different orders.
  2. Two or more sessions performing full scans and locking pages.
  3. Overly restrictive transaction isolation levels.
  4. Schema locks.

Two sessions attempting to lock same rows in different order

This is pretty much the textbook explanation of a deadlock.  In this example I will show how two records (AccountNumber AW00012597 and AW00012596) are updated by two different sessions which results in a deadlock.

SESSION 60:
begin transaction
update CustomerTest
set ModifiedDate = getdate()
where accountnumber = 'AW00012597'

The above statement completes without issue.

SESSION 63:
begin transaction
update CustomerTest
set ModifiedDate = getdate()
where accountnumber = 'AW00012596'

The above statement also completes without issue.

SESSION 63:
update CustomerTest
set ModifiedDate = getdate()
where accountnumber = 'AW00012597'

At this point SESSION 63 will hang because it is waiting on SESSION 60 to commit or roll back the transaction holding a lock on AccountNumber AW00012597.

SESSION 60:
update CustomerTest
set ModifiedDate = getdate()
where accountnumber = 'AW00012596'

At this point SESSION 63 will receive a deadlock error (error 1205, marking it as the deadlock victim).  SQL Server detected that each session was waiting on a lock held by the other, which would leave both sessions hanging indefinitely.  To prevent an indefinite hang, SQL Server rolls back one of the sessions as soon as it detects the cycle.

How often does this scenario occur?  Not very often in my experience.  Since this scenario isn’t that common I’ll skip details on fixing this and move on to the next scenario where I will provide more details.

Two or more sessions are performing full scans and locking pages

I see this scenario occur quite frequently.  If you’re new to resolving deadlocks it can be difficult to grasp, because the two processes performing updates typically aren’t even updating the same rows.

To start, I’m going to drop an index on the table from the previous example: the index on the CustomerTest.AccountNumber column.
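For reference, the drop looks something like this (the index name here is made up; substitute the actual name on your system):

```sql
-- Index name is hypothetical; use the real name of the index on AccountNumber
DROP INDEX IX_CustomerTest_AccountNumber ON dbo.CustomerTest;
```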

In this example we will perform a similar update; however, we will be updating three different rows: AW00012594, AW00012595, and AW00012598.

SESSION 57:
begin transaction
update CustomerTest
set ModifiedDate = getdate()
where accountnumber = 'AW00012594'

SESSION 62:
begin transaction
update CustomerTest
set ModifiedDate = getdate()
where accountnumber = 'AW00012598'

Session 62 now hangs.  We’ll look at the output of sys.dm_exec_requests to see why this session is hanging.

select session_id, blocking_session_id, wait_type, wait_resource
from sys.dm_exec_requests
where session_id = 62

Session 57 is blocking session 62 despite the two sessions updating different rows.  Also note it is waiting on a page lock.

Let’s take a look at the execution plan this query is taking.  We can do this by getting the plan handle from sys.dm_exec_requests.

select CONVERT(XML, b.query_plan)
from sys.dm_exec_requests a
cross apply sys.dm_exec_text_query_plan(a.plan_handle, a.statement_start_offset, a.statement_end_offset) b
where a.session_id = 62

Click on the XML to bring up the plan:

As you can see the CustomerTest clustered index is being scanned.  This is the result of us dropping the index.

We still haven’t generated the deadlock. Let’s perform one more update on session 57:
update CustomerTest
set ModifiedDate = getdate()
where accountnumber = 'AW00012595'

Session 57 updated successfully, but this occurred after session 62 received a deadlock error and as a result had its transaction rolled back.

Since a deadlock occurred, we can try collecting it from the system health event session.  The system health session is an Extended Events session that runs by default starting with SQL Server 2008.  Unfortunately, a bug in the SQL Server 2008 and SQL Server 2008 R2 ring buffer makes it an unreliable source of deadlock traces on those versions.  If you are on 2008 or 2008 R2, I’d suggest enabling trace flag 1222 to write deadlock traces to the SQL Server error log – or you can create an Extended Events file target for the system health session that mimics the one that comes with SQL Server 2012 (I’ll create a future post on how to do this).
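Enabling the trace flag for all sessions looks like this (you can also add -T1222 as a startup parameter so it survives restarts):

```sql
-- Write deadlock graphs to the SQL Server error log; -1 applies it server-wide
DBCC TRACEON (1222, -1);
```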

For SQL Server 2012 and later, I would suggest finding the deadlock in the system health session by using the file target (as opposed to the ring buffer). This query won’t work in SQL Server 2008 or 2008 R2.

SELECT
object_name, CONVERT(XML, event_data), file_name, file_offset
FROM sys.fn_xe_file_target_read_file('system_health*.xel', NULL, NULL, NULL)
WHERE object_name = 'xml_deadlock_report'

There is a lag before the system health session flushes to file, so the deadlock might not appear immediately.

From looking at the XML in the deadlock trace there are a couple of areas we want to focus on:

First, note that there are three processes (i.e. sessions) in this deadlock trace; each is contained within a <process> element.  Also note that all of them are contending for page locks.  This is a clue that the lock issue is probably plan related, so we should get the execution plan for each of the statements involved.

Earlier I showed you how to get the execution plan from sys.dm_exec_requests.  Unfortunately, sys.dm_exec_requests only contains data for requests that are currently executing, so we can’t use it to go back in time and see what the plan was when the deadlock occurred.  That leaves you with three options:

  1. Use SSMS to get an estimated or actual plan.
  2. Find the plan in the plan cache.
  3. Use a good APM tool that historically tracks execution plans collected from the plan cache (preferably SQLGrease).

Getting an estimated or actual execution plan from SSMS just requires that you have the query and can set up the data appropriately to capture the plan.  The drawback is that the plan might not be accurate for a variety of reasons (I’ll save those for a future post).

Getting the plan from the plan cache is the approach I prefer.  The catch is that the plan cache is relatively transient: the plan used at the time of the deadlock may no longer be present, and the plan currently in the cache may differ from the one in use when the deadlock occurred.

If we want to pull the statements from the plan cache, we will need the sql_handle and statement_start_offset.  These can be found within the <process><executionStack><frame> element of the deadlock trace.  You’ll notice there are multiple frame elements in the executionStack.  Use the sqlhandle and stmtstart that appear in the top frame of each executionStack:
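If you’d rather not eyeball the XML, a query along these lines pulls the top frame’s sqlhandle and stmtstart for each process (a sketch; it assumes you’ve pasted the deadlock graph into an XML variable):

```sql
-- @deadlock holds the xml_deadlock_report value captured earlier
DECLARE @deadlock XML = '<deadlock>...</deadlock>';  -- paste your deadlock graph here

SELECT
    p.value('@id', 'varchar(50)')                                   AS process_id,
    p.value('(executionStack/frame/@sqlhandle)[1]', 'varchar(130)') AS sqlhandle,
    p.value('(executionStack/frame/@stmtstart)[1]', 'int')          AS stmtstart
FROM @deadlock.nodes('/deadlock/process-list/process') AS pl(p);
```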

Two processes were executing the same (update) statement so they both have the same stmtstart and sqlhandle.  There was a third process that wasn’t included in my example but it got tangled up in our deadlock – this process was executing a different statement (a select).

Now that we have the sqlhandle and stmtstart values we can try pulling the plans from the plan cache:

select convert(xml, query_plan), eqs.creation_time from sys.dm_exec_query_stats eqs
cross apply sys.dm_exec_text_query_plan(eqs.plan_handle, eqs.statement_start_offset, eqs.statement_end_offset) eqp
where sql_handle = 0x020000008104db03891ae1aac3d2458e779a35979d84bdb40000000000000000000000000000000000000000
and statement_start_offset = 36

Run the above statement for each sqlhandle and stmtstart pair we found in the deadlock trace.  This will return the execution plans.

The creation_time column indicates when the plan entered the cache.  Compare it to the timestamp in the xml_deadlock_report: if creation_time is prior to the deadlock’s timestamp, you can be confident this is the plan that was in use when the deadlock occurred.  If it isn’t, but a plan was still returned, it is likely a better approximation than a plan generated from SSMS (I’ll have to do a future post on why I hold this opinion).

Looking at the plan for the update statement we see a clustered index scan on CustomerTest:

Looking at the select statement we also see a clustered index scan on CustomerTest; however, there is an index suggestion embedded in the plan.

How do we fix this?  Fix the scan.  In this case the index suggestion fixes the problem.  After creating the index, we are able to run through our test scenario without issue.

Don’t laugh at me for being lazy and not changing the name of the index.  Well, okay laugh at me.
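In case you can’t make out the screenshot, the fix is along these lines (the index name and key column come from our example; your missing-index suggestion may include additional key or included columns):

```sql
-- Recreate an index on AccountNumber so the updates seek instead of scan
CREATE NONCLUSTERED INDEX IX_CustomerTest_AccountNumber
    ON dbo.CustomerTest (AccountNumber);
```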

What are some other scenarios that might lead to a scan?

  1. Implicit Conversions (read my post for this)
  2. Unindexed Foreign Keys (read my post for this)
  3. Stale or missing statistics
  4. Parameter sniffing

You really should read my post on unindexed foreign keys if you have deadlocks that involve delete statements.

Overly restrictive transaction isolation levels

The third common reason I see for deadlocks is using more restrictive transaction isolation levels than necessary.  A frequent example is applications using serializable transactions when they don’t need to.  This is usually fairly easy to spot when you inspect a deadlock trace.

Open the deadlock trace and look for the <process> element.  Within the process element you should see an isolationlevel attribute:

Anything more than read committed (2) should be questioned, serializable transactions in particular.  If you haven’t read my other blog post: serializable is the default behavior when managing transactions with .Net TransactionScope objects. You can read more about it here.
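You can also check the isolation level of live sessions directly via sys.dm_exec_sessions (the values map 1 = read uncommitted, 2 = read committed, 3 = repeatable read, 4 = serializable, 5 = snapshot):

```sql
-- Which user sessions are running at an elevated isolation level?
SELECT session_id, program_name, transaction_isolation_level
FROM sys.dm_exec_sessions
WHERE transaction_isolation_level > 2  -- anything above read committed
  AND is_user_process = 1;
```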

On occasion I see deadlocks where a SELECT statement collides with UPDATE statements even though the UPDATEs are performing index seeks.  Most of the time these turn out to be reporting-related queries that touch large amounts of data.  Whether reporting belongs in a separate database is a discussion for another day; a quicker fix is to use NOLOCK hints in the SELECT queries, if you can live with reading data that isn’t yet committed.
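A NOLOCK hint on the reporting query looks like this (columns and filter are illustrative, using the table from our earlier examples):

```sql
-- Dirty reads: acceptable for some reporting queries, never for anything transactional
SELECT AccountNumber, ModifiedDate
FROM CustomerTest WITH (NOLOCK)
WHERE ModifiedDate >= DATEADD(day, -30, GETDATE());
```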

Schema locks

If you are making index changes to tables, you increase the likelihood of deadlocks.  Since schema changes aren’t typically part of regular everyday processing, I won’t spend much time addressing these.

Deadlock oddities

A session can deadlock on itself.  Strange, right?  I wish I still had an example of this since it is very rare; however, I’ve seen a poorly written query take a parallel execution plan and deadlock on itself (the parallel worker threads deadlocked with each other).

SQLGrease and deadlocks

How can SQLGrease help you with fixing your deadlocks?  It collects all the historical information you will need to fix your deadlocks, as well as perform a lot of the tedious work of sorting through deadlock graphs.  Watch our demo to see more.
