What is a query_hash or query_plan_hash and why is it useful?

If you have ever queried sys.dm_exec_requests or sys.dm_exec_query_stats you may have noticed the query_hash and query_plan_hash columns.  If you’re familiar with a sql_handle and plan_handle the query_hash and query_plan_hash should be easy to understand.  A query_hash is a computed value that points to a query irrespective of literal values.  Similarly a query_plan_hash is a computed value that points to an execution plan irrespective of the literal query text it is associated with – it only relies on the operations and order of them within the plan.

Why is a query_hash or query_plan_hash useful?

There’s a lot of scenarios that make query_hashes and query_plan_hashes useful.

If you are looking for non-parameterized SQL cluttering your plan cache they can be really handy.  Take a look at one of my past posts where I show how to make use of them to force parameterization on non-parameterized SQL with template plan guides.

I’ve seen a customer use a query_plan_hash to raise alerts when certain plans that are known to be problematic show up in the plan cache.  This was helpful for a scenario where plan guides could not be used (due to the semi-parameterized nature of the query).

On occasion I may make notes on queries I’ve spent time tuning and want to go back and check on them.  A query_hash or query_plan_hash is a good bookmark for going back to look for the queries especially if there is some dynamic/non-parameterized component to them.

Query_hashes and query_plan_hashes are useful for performance monitoring.  When taking literal value  differences out of the picture it makes it easy to see which queries are impacting your system the most.  SQLGrease (shameless plug) monitors queries and execution plans based on query_hashes and query_plan_hashes.  This helps surface issues with non-parameterized SQL that otherwise would not bubble to the top.  This also helps keep the amount of data monitored fairly compact when a system doesn’t use parameterized SQL or stored procedures.

Here is an example of a query used to find the top 5 queries using total elapsed time (we don’t monitor this way in SQLGrease – this is just an oversimplified example):

SELECT TOP 5 total_elapsed_time AS [Total Elapsed Time],
SUBSTRING(st.text, (qs.statement_start_offset/2)+1,
((CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(st.text)
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY total_elapsed_time DESC;

We can rewrite this to use query_hash to take into account performance of non-parameterized queries.

SELECT TOP 5 SUM(total_elapsed_time) total_elapsed_time,
query_hash INTO #top_query_hashes FROM sys.dm_exec_query_stats AS qs
GROUP BY qs.query_hash ORDER BY SUM(total_elapsed_time) DESC
-- now get the first query from each of these
SELECT tqh.total_elapsed_time, tqh.query_hash,
(select top 1
SUBSTRING(st.text, (qs.statement_start_offset/2)+1,
((CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(st.text)
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2) + 1)
from sys.dm_exec_query_stats qs cross apply sys.dm_exec_sql_text(qs.sql_handle) st
where qs.query_hash = tqh.query_hash) as statement_text
FROM #top_query_hashes tqh
ORDER by tqh.total_elapsed_time desc
drop table #top_query_hashes

You can see that I changed the query to run in two steps.  One that populates a temp table with the top queries and a second that gets the query text based on the hashes (I’ll admit this is not the most efficient approach).  If we compare the output of the two queries, we see different results for our top 5 queries by elapsed time.  The output of these two queries is below:

The top set of results if where we don’t group by query_hash.  In the bottom set of results we grouped by query_hash.  Notice the 4th result in the bottom set doesn’t appear in the top set?  Also make note that the query has non-parameterized SQL.

How is a query_hash different from a sql_handle?

A sql_handle points to a batch or stored procedure submitted to SQL as a whole.  The query_hash points to a single statement within the batch or stored procedure.  I think the best analogy is a query_hash points to a sql_handle, statement_start_offset, statement_end_offset.

Here’s an example query that helps illustrate this for a given sql_handle:

SELECT SUBSTRING(st.text, (qs.statement_start_offset/2)+1,
((CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(st.text)
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2) + 1) AS statement_text,
qs.sql_handle, qs.statement_start_offset, qs.statement_end_offset, qs.query_hash
FROM sys.dm_exec_query_stats qs cross apply sys.dm_exec_sql_text(qs.sql_handle) st
where qs.sql_handle = 0x030026004E5D3F0983742D0001A9000001000000000000000000000000000000000000000000000000000000

Here is the output of the query above (the sql_handle I’m testing with is particular to a system I’m using for this demo, so this won’t return anything meaningful for you unless you replace the sql_handle with something from your system).

Notice how the sql_handle is the same across the statement_text of each query.  The statement_start_offset, statement_end_offset varies based on the location of the statement_text within the batch or stored procedure.  Each statement_text has a separate query_hash associated with it.  If you look at what the sql text looks like unparsed you’ll see the following:

select text from sys.dm_exec_sql_text(0x030026004E5D3F0983742D0001A9000001000000000000000000000000000000000000000000000000000000)


You can see that the output of this query was the full DDL used to create the stored procedure containing each of the individual statements listed above.

Of course the main differentiation is still that a query_hash is a generalized view of a part of the statement pointed to by a sql_handle; however, it’s important to note this relationship.

How is a plan_hash different from a plan_handle?

The query_plan_hash points to a particular query within an execution plan represented by a plan_handle.  This can best be described by calling sys.dm_exec_text_query_plan with a statement_start_offset and statement_end_offset vs retrieving it with sys.dm_exec_query_plan.  I wrote a post detailing why you’re better off calling sys.dm_exec_text_query_plan rather an sys.dm_exec_query_plan.

Where do I see a query_hash and query_plan_hash?

As mentioned earlier they will show up in sys.dm_exec_query_stats and sys.dm_exec_requests.  Additionally you can find them in:

  • Execution plan XML starting in SQL 2012 (Open the plan as xml and look for the StmtSimple element in the XML.  It will be in the attributes: QueryHash and QueryPlanHash. 
  • Extended events starting in SQL 2012 (not available in 2008 or 2008 R2) – I should note they are stored as a 64 bit Integer so it takes some deciphering to make them usable.
  • They don’t exist in SQL 2005

That last bullet was to point out that this article is probably useless if you are still on SQL 2005 or prior.  I should have probably mentioned that in the first paragraph and not wasted your time this long.

Does everything in the plan_cache get a query_hash and query_plan_hash?

No.  Statements that do not need to be optimized don’t seem to get a query_hash or query_plan_hash.  Things like FETCH statements, DDL, BACKUP, RESTORE, SET, etc don’t get a query_hash or query_plan_hash.  In these cases you will see a value of 0x0000000000000000 for the query_hash and query_plan_hash.  I’ve not been able to find any solid documentation in regards to how a query_hash or query_plan_hash get generated; however, I suspect (or at least am pretty sure) it gets generated as part of the query optimization process.

Will the same query generate the same query_hash and query_plan_hash across different servers?

So long as they are running the same version of SQL yes.  The same query should also generate the same query_hash and query_plan_hash across different databases (given the plans match).  I’ve noticed that when upgrading SQL I’ve seen different query hashes appear for the same queries run in both versions.  I’m not sure if this occurs between every version of SQL, but I can say don’t rely on the query_hash being consistent.  I suspect the compatibility level plays a factor in this, but I’ve not had enough free time to experiment with this.

Some query_hash and query_plan_hash oddities I’ve seen

Something odd I’ve noticed is when a query uses plan guides, in some but not all cases the query_hash and query_plan_hash magically seem to be the same value.  Try querying your plan cache for this if you have plan guides.  I’m not sure if only certain types of plan guides cause this (In the example here this was a XML plan guide).  If you’ve seen this please share in the comments!

select * from sys.dm_exec_query_stats
where query_hash = query_plan_hash AND query_hash != 0x0000000000000000

Another oddity I see on occasion in sys.dm_exec_requests is the query_hash and query_plan_hash will show up as 0x0000000000000000 despite them being statements that get optimized (i.e. they are not things like backups or DDL).  I suspect this is a factor of catching the executing request at an early stage in the optimization process and in turn returns empty values.

Back in the original incarnation of Azure SQL Database sys.dm_exec_requests always returned 0x0000000000000000 and 0x0000000000000000 for query_hash and query_plan_hash.  All that is long gone now, but I think it was interesting that it was exposed in sys.dm_exec_query_stats but hidden in sys.dm_exec_requests.

Something that is a little odd but I suppose not totally unexpected is these two queries generate the same query_hash.  This is more evidence that the hash gets generated as part of the query compilation process.

select * from CustomerTest WHERE AccountNumber = N'AW00014588' or AccountNumber = N'AW00014579'

select * from CustomerTest WHERE AccountNumber in (N'AW00014523', N'AW00014583')

Conclusion

I find a lot of people aren’t aware of this feature in SQL and I alway find it unfortunate given it is a great way to track and aggregate query performance.    Happy tuning.

 

Watch us diagnose a query timeout with SQLGrease (notice we use query hashes!):

See how you can get query level wait events without Query Store: