Index seeks and NULL values

A long time ago I heard or read that you couldn’t index a NULL value – I didn’t think that was correct.  The other day I saw a missing index (as identified by the optimizer) where the first column was looking for a NULL value.  I was working on optimizing a process that was pretty key to a customer’s business so I wanted to be absolutely sure of what the effect of indexing for a NULL value would be.

The example data

I’m using my CustomerTest table in this example.  I’m setting one row out of 4.6 million to have a NULL value in the AccountNumber column.  The number of NULL values in the table relative to the total row counts is key to making this example work.  Additionally I added a second predicate to the where clause just to prove that the index is being seeked on the columns following the first column.  Here is the query:

select count(1)
from CustomerTest where AccountNumber IS NULL and TerritoryID = 6

From looking at the execution plan you can see there is a missing index suggested.  You can also see the CustomerTest clustered index is being scanned.

The statistics time and statistics IO output shows 36,340 logical reads, 766 milliseconds of CPU time and an elapsed time of 2.099 seconds:

The missing index suggested by the optimizer is as follows:
CREATE NONCLUSTERED INDEX []
ON [dbo].[CustomerTest] ([AccountNumber],[TerritoryID])

After creating the index and re-running the query, things look quite different. The execution plan now shows a seek on our newly created index.

If you are paying attention you’ll notice I was lazy and didn’t name the index.

Reviewing the seek predicates shows that the index seek occurred on both columns:

The statistics time and statistics IO output looks much better:

We now have 3 logical reads, cpu time of 0 milliseconds, and 16 milliseconds of elapsed time.  This proves that an index seek can be performed on a NULL value.

Final thoughts

I think it is important to point out getting a seek to occur on a NULL value will depend on how many NULL values there are relative to the total number of rows in a table.  If a majority of the rows are NULL a seek likely wouldn’t occur.