Sign in

    Top Five Considerations for Database Index Design in SQL Server

    By: Ben Richardson

    Database indexes are used to speed up different table operations. However before you create an index, it is important to know if you really need an index? And if you need to create an index what are the important points that must be kept in mind? This is where database index design comes in.

    This article aims to answer these questions about database index design and throw some light onto some of the major considerations that a database developer should take into account when designing an Index.

     

    1. Table Size

    The first question that a database developer must ask before creating an index is whether or not the table is large enough to efficiently use indexes. If the table size is small, SQL Server engine can scan the complete table more quickly than searching the table through an index. Indexes in such case have no use and create an overhead while performing database operations.

     

    2. Column Types

    Indexes should be created on a primary key column or any column that contains unique values and that has a NOT NULL constraint. Furthermore, it is advisable to create indexes on numeric columns as numeric columns tend to have more unique values compared to non-numeric columns. Poor database index design uses indexes on columns that have a very few unique entries and can result in very time consuming queries.

    Consider a table named Patients which contains hundreds of thousands of records. The Patients table would contain a column called “Gender” which can have only two unique values “Male” and “Female”. If you create an index on the “Gender Column”, the records will be sorted in ascending or descending alphabetical order.

    So if you have a million records in the Patients table and the number of male and female patients is equal, in the index the first half million records will have a gender “Female” and the second half million will have gender “Male”. Now if you want to search for a female that exists at the 490,000th row of the female records, the SQL Server Engine will have to scan through 490,000 records. On the other hand, with unique numeric values the search can be extremely fast since SQL Server indexes are stored in the form of B + Trees, and so numeric values in the tree nodes can speed up database operations.

     

    3. Number of Indexes

    Officially you can create one clustered index and as many non-clustered indexes as you like for each database table. However, it is good database index design to create one clustered index and only a limited number of absolutely necessary non-clustered indexes. Creating too many non-clustered indexes can actually slow down Update and Insert operations because when a record is updated or inserted and a column value is changed, all the associated indexes have to be updated.

    Consider a scenario where we have two non-clustered indexes, the first index sorts the records by age and the second index sorts the records by both gender and age.

    Here is the first index:

    Age

    Record Address

    10

    Record address

    22

    Record address

    29

    Record address

    32

    Record address

    33

    Record address

    36

    Record address

    40

    Record address

    49

    Record address

    54

    Record address

    59

    Record address

     

    And here is the second:

    Gender Age Record Address
    Female 10

    Record address

    Female 29

    Record address

    Female 33

    Record address

    Female 40

    Record address

    Female 54

    Record address

    Male 22

    Record address

    Male 32

    Record address

    Male 36

    Record address

    Male 49

    Record address

    Male 59

    Record address

    Now if a record with age 40 has to be updated to age 15 for some reason, then the first index will have to be updated to move the record from the 7th position(40) to second position in order to keep the index sorted. Similarly in the second index, the record in 4th index will be moved to the second index. A lot of reshuffling has to take place. Therefore it is wise to keep the number of indexes to a minimum for the columns that are regularly updated when thinking about database index design. Also one column should not be used in multiple non-clustered indexes.

     

    4. Storage Location of Indexes

    The storage location of an index can affect the performance of the queries that use the index and so is also part of good database index design. By default a clustered index is stored in the same filegroup as the table on which the index is created. For non-clustered indexes, the index can be stored in the same filegroup or in different filegroups spanning multiple disk drives. The query performance of non-clustered indexes can be significantly improved by storing non-clustered indexes on multiple disk drives. This is because the input/output performance of the query will be improved as a result of the data being distributed on different areas of the drive.

    The default storage location of indexes can also be changed by specifying a value for the FILLFACTOR option. Since, indexes are physically stored in the form of B+ Trees, the index data is stored on leaf pages.  With the FILLFACTOR option, you can set the percentage of the leaf-level pages to be filled. For instance, if you set the value of FILLFACTOR to 70%, only 70% of the total space of leaf-level page will be filled by index data. The remaining 30% will be left for automatic growth of index data in future.

     

    5. Index Types

    Another extremely important consideration in database index design is the type of index to use. In an earlier article (add a link to the article “When to use Clustered or Non-Clustered Index”) I explained the difference between clustered and non-clustered indexes. I also explained what they are and how they can be used. The decision whether to choose a clustered or a non-clustered index is crucial and should be carefully thought through.

    The following points should be kept in mind while deciding which index type to choose.

    1. For the columns that are used in SELECT/JOIN/GROUP BY/BETWEEN queries, use clustered indexes.
    2. Use non-clustered indexes for columns where you only want to retrieve values from that specific column and not from the other columns of the same row. SELECT queries that retrieve multiple records using a non-clustered index can be slow because the SQL Server engine first searches the column values on which the index is created and then using the row reference for the column value, the records from actual database tables are retrieved.
    3. For the columns that often undergo INSERT and UPDATE operations, use a non-clustered index. Make sure not to use one column in multiple non-clustered indexes since that can slow down update queries. Clustered indexes can be slow for INSERT/UPDATE operations because the complete row has to be updated instead of only a single column value as is the case with non-clustered indexes. 
    4. Since you can only create one clustered index, in their case where you need multiple indexes, use non-clustered indexes. However, if disk space is a major concern, keep the number of non-clustered indexes to a minimum.

    Other Considerations

    Although these are the five most important parts of database index design they are not everything. It is important to specify the correct order of the columns in indexes. As a rule of thumb, the columns that are used for decision making in WHERE clauses, and conditions such as greater than (>), less than (<) etc, should be placed before the columns not involved in these clauses. In a case of multiple columns in the WHERE clause, the most distinctive column names should be mentioned earliest in the Index definition.

    Apart from database index design, query design also plays an important role in the efficient use of index design. For optimized index maintenance instead of writing multiple queries that operate on a small number of rows, try to write fewer queries that affect larger numbers of table rows.

     

    Conclusion

    This article explains some of the major considerations that a database developer must take into account when looking at database index design. The article also explains the rationale behind these considerations and contains further suggestions to make sure that your database index design is efficient.

     

    January 9, 2020 10:57:10 AM PST
    Ben Richardson

    Written by Ben Richardson

    Ben Richardson runs Acuity Training a leading provider of SQL training the UK. It offers a full range of SQL training from introductory courses through to advanced administration and data warehouse training https://www.acuitytraining.co.uk/technical-training-courses/sql/. Acuity has offices in London and Guildford, Surrey. He also blogs occasionally on Acuity’s blog: https://www.acuitytraining.co.uk/news-tips/

    Topics