This post is geared towards explaining why indexes are necessary in layman’s terms. My goal is for this to be understandable to somebody that doesn’t necessarily have a database background.
The Phonebook Example
Finding a phonebook on my front doorstep is a nuisance these days. I’ve quit picking them up over the years to try and make a point to the people that deliver them. Unfortunately they don’t seem to be getting the message. but I will use it to illustrate the value of an index.
If I need to find the phone number for an individual, let’s use Navin R. Johnson for example. I’d start somewhere in the middle of the phone book since J is roughly in the middle of the alphabet. If I flip to the P’s I know I’ve gone too far. If I flip back to the F’s I’ve gone too far in the other direction. Eventually I will find Navin R. Johnson’s phone number. Because the phone book is ordered by name we are able to accomplish this process of elimination.
B tree index
Computers perform this process of elimination by using a binary tree index (B tree index). A binary tree index cuts the result set of what you are looking for in half to narrow where to find the page your phone number is located on. The diagram helps illustrate a binary decision tree.
The binary decision tree starts at the top (O). If the value we are looking up is less than O (<O), we follow the tree to the left. Traversing the tree following this binary (yes/no) logic will eventually take you to a page that contains the information you are looking for.
Reverse Phone Number Lookup
If you need to lookup an entry in the phone book by a phone number rather than by a name this can certainly be done. The phone number is contained somewhere in the phonebook at least once (worse case more than once). This would require me to flip through every page until I locate an entry with the phone number on it. If there can be more than one entry with the same phone number I would need to flip through every page in the phone book just to be sure I didn’t miss a matching entry. As you can see this would be quite time consuming. The example below shows how we scanned all the pages just to be sure we found all instances of a phone number.
This is why doing lookups for data that is not indexed causes a “scan”. In the example above we are scanning all the pages of the phone book to look for a phone number. To avoid this it would be necessary to create a secondary index which points to the same data.
Clustered Indexes Vs Non-clustered Indexes
A clustered index contains the data that you are looking for within the index. A phone book is basically a clustered index. When you locate the key you are looking for, the data associated with that key is in the same location. For example we were looking for Navin R. Johnson. When we found his information, we didn’t need to go look in a different book to get his phone number and address. A non-clustered index contains a reference to the data you are looking for, but not the data itself. We can create a non-clustered secondary index for performing the phone number reverse lookup.
In this example we are looking up the number 404-980-2312. When we perform the search we traverse the index until we reach our result. Our result is a pointer to Navin R. Johnson. We would then need to perform a second lookup by name (Navin R. Johnson) in the phone book index to get the rest of his information (i.e. Address).
Parting thoughts
This example is extremely oversimplified. I left out details so far as how data is stored in pages for the sake of simplicity. My main goal of this post was to describe how indexes are used to locate data quickly.
Watch us diagnose a query timeout with SQLGrease:
See how you can get query level wait events without Query Store: