Tuning Substring Query Without Changing the Query

 Introduction

The blog shows how to tune a query using substring in the WHERE clause. The tuning solution is to use indexing on a computed column.

Performance Issue

A General Ledger financial report execution in Financial system takes a long time finishing more than 4 hours.

Performance Analysis

The first step in the performance tuning is to identify a long running query. Sometimes, it can be multiple queries or a single query. In general, the best practice is always starting with the top offending query to tune to get the bang for the buck. From the performance analysis, the culprit is the query listed below. The reason is the substring query in the WHERE clause takes a long time to run because the query execution plan uses Index Scan rather than Index Seek. To make it worse, the report calls the query several thousand times. For example, if the query execution time is 10 seconds, when running for 1000 times, it will be 10 x1000 = 10,000 seconds.

The reason substring function is used because the column ACCOUNTDISPLYVALUE has multiple values combined in one column. In this case, the query is searching for the Account Number which is the first six characters. The long running query is listed below:

set statistics io on

go

select sum(ACCOUNTINGCURRENCYAMOUNT)

from [d365].[GeneralJournalAccountMultiCompanyEntries]

where substring([ACCOUNTDISPLAYVALUE], 1, 6)  = ‘877601’

To get the baseline, run the query to get the I/O statistics shown below.

Table ‘GeneralJournalAccountMultiCompanyEntries’. Scan count 3, logical reads 133296, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

As suspected, the execution plan uses Index Scan rather than Index Seek. The Index Scan has to read 5.5 million rows. It is definitely NOT an optimized plan.

 

 

Tuning Process

To goal to tune the query is to have an index that can be used and show the execution plan using Index Seek.

To improve the performance, there are several possible solutions.

Solution #1 – Persisted Split Column During Data Loading

This approach is to add another column with the value of the substring and persist during the data loading. After that, add an index to the column. The downside of this approach requires several efforts:

  1. To modify the data loading to include pre-populating the new column.
  2. Change the query in the report which is not possible if it is a vendor code.

This solution was the original idea to tune the query until I had a conversation with my friend and CEO of SQLGrease Franklin Yamamoto who suggested to look at Computed Column which leads to Solution #2.

Solution #2 – FBI Approach

The reason I call it FBI approach which stands for Function Based Index. It is the technology and term used in Oracle way back in 90s. Franklin and I knew each other way back in mid 2000 when we worked in a project to tune Oracle databases. One of our tuning approach at the time was using Function Based Index.

Let’s dive into how it works. The first step is to add a computed column in the table.

alter table [d365].[GeneralJournalAccountMultiCompanyEntries] add [ACCOUNT] AS (substring([ACCOUNTDISPLAYVALUE], 1,6)  );

The next step is to add an index on the computed column.

create index IX_GJAMCE_ACCOUNT_INCL_ACCTCURRNUM ON [d365].[GeneralJournalAccountMultiCompanyEntries] ([ACCOUNT]) INCLUDE([ACCOUNTINGCURRENCYAMOUNT]);

Run the same query and get the I/O stats:

Table ‘GeneralJournalAccountMultiCompanyEntries’. Scan count 1, logical reads 155, physical reads 0, page server reads 0, read-ahead reads 1, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

Notice the logical read is significantly reduced from 133,296 pages to 155 pages. In addition, the query does not need to change to use the new computed column, such as

… WHERE  [ACCOUNT] = ‘877601’

Looking at the execution plan, after adding the computed column and the index, it uses Index Seek.

Hover to the Index Seek step, the number of Rows Read is reduced from 5.5 million reads to 38 K reads. Also, the graph below confirms the execution plan picks up the new Index:  IX_GJAMCE_ACCOUNT_INCL_ACCTCURRNUM created on the computed column.

Recap

The solution #2 to use the computed column is a much simpler solution. The reason is there is no need to change the original query, redesigning the table and the ETL process. By adding the computed column matching the substring pattern in the where clause and creating an index on the computed column, the execution plan knows how to pick up the computed column index.

After putting the solution #2, the GL report finishes in 25 minutes comparing to more than 4 hours prior to the tuning. The solution is not limited to the substring function; it can be extended to other functions in the WHERE clause.