Sitecore is a .Net based content management system. It uses SQL Server for its backend database. I recently got asked to help investigate a performance issue on a new implementation that was being performance tested. In this post I’ll describe the behavior causing the performance issue, and how through proper configuration this could be avoided.
Finding the problem
The first thing that was identified was a large amount of locking. The blocked queries were performing inserts to the EventQueue table. The query text of it is below.
INSERT INTO [EventQueue](
[Id], [EventType], [InstanceType], [InstanceData], [InstanceName], [RaiseLocally], [RaiseGlobally], [UserName]
)
VALUES(
@id, @eventType, @instanceType, @instanceData, @instanceName, @raiseLocally, @raiseGlobally, @userName
You can identify this by looking in sys.dm_exec_requests and joining to sys.dm_exec_sql_text or by using whatever is your monitoring tool of choice.
SELECT er.start_time,
er.total_elapsed_time, er.status,
er.cpu_time, er.wait_type,
er.wait_time, er.wait_resource,
er.session_id, er.blocking_session_id,
SUBSTRING(st.text, (er.statement_start_offset/2)+1,
((CASE er.statement_end_offset
WHEN -1 THEN DATALENGTH(st.text)
ELSE er.statement_end_offset
END - er.statement_start_offset)/2) + 1) AS statement_text,
convert(XML, qp.query_plan) as execution_plan
FROM sys.dm_exec_requests er
INNER JOIN sys.dm_exec_sessions s on er.session_id = s.session_id
cross apply sys.dm_exec_sql_text(er.sql_handle) st
cross apply sys.dm_exec_text_query_plan(er.plan_handle,
er.statement_start_offset,
er.statement_end_offset) qp
WHERE er.session_id != @@spid AND s.is_user_process = 1
order by 1
After reviewing the output of the query it appeared the root blocker(s) were performing a select similar to the following.
SELECT [EventType], [InstanceType], [InstanceData], [InstanceName], [UserName], [Stamp], [Created]
FROM [EventQueue]
WHERE [Stamp] >= CAST (@fromTimestamp AS TIMESTAMP)
AND ([EventType] = @eventType0
OR [EventType] = @eventType1
OR [EventType] = @eventType2
OR [EventType] = @eventType3
OR [EventType] = @eventType4
OR [EventType] = @eventType5
OR [EventType] = @eventType6)
ORDER BY [Stamp]
The select above was taking on average 8 seconds to complete. During that period it queued hundreds of sessions in a blocked state that were performing inserts to EventQueue.
Fixing the Problem
The root cause was the long running SELECT statement causing blocking. The execution plans were examined and it didn’t appear there was a way to improve the plan via hints, indexing, index rebuilds etc. I noticed the EventQueue had over 4 million rows in it. It seemed odd that a “Queue” table would be so large. After doing some googling on Sitecore EventQueue, articles popped up regarding data retention on the EventQueue. One in particular I looked at was Brian Pedersen’s blog post on this. Apparently if you do not configure data retention properly for the EventQueue, it will continue to grow up to a point you will likely see the behavior I’m describing. The recommendation was for the table to have about 7000 rows.
Conclusion
I was surprised to see Sitecore have performance problems given there are some pretty large implementations of it. Fortunately this was a simple configuration change that resolved the issue. If you have a Sitecore implementation it wouldn’t hurt to get a row count on your EventQueue table just to validate you’re not headed for problems – just be sure to use a NOLOCK hint when you do it. Finding this problem was not very difficult since we were able to reproduce the issue as part of a scripted load test. Finding this in a production environment with no performance monitoring tools in place would have been much more difficult – this is why there is value in monitoring your SQL Server instances.
How can SQLGrease help you get to the root cause of your performance problems? Watch our demo video.
Watch us diagnose a query timeout with SQLGrease:
See how you can get query level wait events without Query Store: