Database Query Performance Optimization Tips: Index Concepts, Clustered, Non-Clustered


June 13, 2023 Program

Notes on optimizing query performance with indexes, and on the differences between clustered and non-clustered indexes.

Foreword


When a table holds a large amount of data, CRUD operations can become unexpectedly slow even when the T-SQL is syntactically correct. This is often closely related to how the table's indexes are configured. I believe it is essential to set up the right indexes from the beginning, when designing the schema, and to build good habits in this area.

Index


An index is a data structure that speeds up queries in a database. It is typically created on one or more columns of a table so that rows meeting certain conditions can be located and retrieved quickly. In short, it trades space for time: with well-designed indexes, a query can navigate a B-tree structure straight to the data it needs instead of scanning the entire table.

In SQL Server Management Studio (SSMS), expanding the Indexes folder under a table in Object Explorer shows the list of that table's indexes.

The main index types include Clustered Index, Non-Clustered Index, Unique Index, Primary Key Index, and Full-Text Index.

This article primarily discusses:

Clustered Index: A clustered index determines the physical storage order of the data rows in a table. Each table can have only one clustered index, because the rows themselves can be stored in only one order.

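Non-Clustered Index: A non-clustered index is a structure separate from the table's data rows. It holds the index key values, each with a row locator that points to the corresponding data row (the clustered index key if the table has a clustered index, or a row ID if the table is a heap), which allows fast lookups. A table can have multiple non-clustered indexes.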
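As a rough illustration of the "one clustered, many non-clustered" rule, the T-SQL sketch below builds a hypothetical dbo.DemoOrders table (all table and index names are made up for this example), adds one clustered and two non-clustered indexes, shows that a second clustered index is rejected, and lists the table's indexes from the sys.indexes catalog view. It assumes SQL Server 2016 or later for DROP TABLE IF EXISTS.

```sql
-- Hypothetical demo table; DROP TABLE IF EXISTS assumes SQL Server 2016+.
DROP TABLE IF EXISTS dbo.DemoOrders;

CREATE TABLE dbo.DemoOrders (
    OrderId    INT          NOT NULL,
    CustomerId INT          NOT NULL,
    OrderDate  DATETIME2    NOT NULL,
    Status     NVARCHAR(20) NOT NULL
);

-- The single clustered index: it defines the order of the data rows themselves.
CREATE CLUSTERED INDEX CIX_DemoOrders_OrderId ON dbo.DemoOrders (OrderId);

-- Several non-clustered indexes can exist alongside it.
CREATE NONCLUSTERED INDEX IX_DemoOrders_CustomerId ON dbo.DemoOrders (CustomerId);
CREATE NONCLUSTERED INDEX IX_DemoOrders_OrderDate  ON dbo.DemoOrders (OrderDate);

-- A second clustered index is rejected because rows can be stored in only one order.
BEGIN TRY
    CREATE CLUSTERED INDEX CIX_DemoOrders_OrderDate ON dbo.DemoOrders (OrderDate);
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();  -- report the error instead of aborting the script
END CATCH;

-- List the table's indexes from the catalog view.
SELECT name, type_desc, is_unique, is_primary_key
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.DemoOrders')
  AND index_id > 0;  -- index_id 0 would be the heap entry
```

The PRINT in the CATCH block should report that SQL Server cannot create more than one clustered index on the table, and the final SELECT should return one CLUSTERED row and two NONCLUSTERED rows.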

Clarifying a Few Common Questions

  1. Is it better to have more indexes on a table?

Not necessarily. More indexes can speed up reads, but each index takes additional disk space, and every INSERT, UPDATE, and DELETE must also maintain each affected index, which can slow down writes to the table.
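One way to check whether an index is worth its write overhead is to compare how often it is read with how often it is maintained. The query below is a sketch against the sys.dm_db_index_usage_stats dynamic management view. It assumes you have permission to query that view (typically VIEW SERVER STATE). Its counters only cover activity since they were last reset, for example by a service restart.

```sql
-- Read vs. write activity per index on user tables in the current database.
-- The DMV only has rows for indexes used since its counters were last reset,
-- hence the LEFT JOIN and ISNULL defaults.
SELECT
    t.name      AS table_name,
    i.name      AS index_name,
    i.type_desc,
    ISNULL(s.user_seeks, 0) + ISNULL(s.user_scans, 0) + ISNULL(s.user_lookups, 0) AS read_count,
    ISNULL(s.user_updates, 0) AS write_count   -- inserts, updates, and deletes by user queries
FROM sys.indexes AS i
JOIN sys.tables AS t
  ON t.object_id = i.object_id                -- user tables only
LEFT JOIN sys.dm_db_index_usage_stats AS s
  ON  s.database_id = DB_ID()
  AND s.object_id   = i.object_id
  AND s.index_id    = i.index_id
WHERE i.index_id > 0                          -- skip heaps (index_id 0)
ORDER BY write_count DESC, read_count ASC;
```

An index near the top of this list with a high write_count and a read_count close to zero may deserve a review. A short observation window can be misleading, though.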

  2. What type of index is PRIMARY KEY?

By default, SQL Server enforces a PRIMARY KEY with a unique index. If the table has no clustered index yet and NONCLUSTERED is not specified, that index is created as a clustered index. The primary key is not always the best column to cluster on, so evaluate whether this default suits the table.
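Here is a small sketch of both options, using hypothetical tables dbo.DemoPkDefault and dbo.DemoPkNonclustered (SQL Server 2016+ assumed for DROP TABLE IF EXISTS). The first primary key gets the default clustered index. The second is declared NONCLUSTERED, which leaves the clustered index free for a different column (a hypothetical CreatedAt column).

```sql
DROP TABLE IF EXISTS dbo.DemoPkDefault;
DROP TABLE IF EXISTS dbo.DemoPkNonclustered;

-- No index type given and no clustered index yet: the PK becomes the clustered index.
CREATE TABLE dbo.DemoPkDefault (
    Id   INT          NOT NULL CONSTRAINT PK_DemoPkDefault PRIMARY KEY,
    Name NVARCHAR(50) NOT NULL
);

-- NONCLUSTERED keeps the PK unique but leaves the clustered index for another column.
CREATE TABLE dbo.DemoPkNonclustered (
    Id        INT       NOT NULL CONSTRAINT PK_DemoPkNonclustered PRIMARY KEY NONCLUSTERED,
    CreatedAt DATETIME2 NOT NULL
);
CREATE CLUSTERED INDEX CIX_DemoPkNonclustered_CreatedAt
    ON dbo.DemoPkNonclustered (CreatedAt);

-- Compare the index types chosen for the two tables.
SELECT OBJECT_NAME(object_id) AS table_name, name, type_desc, is_primary_key
FROM sys.indexes
WHERE object_id IN (OBJECT_ID(N'dbo.DemoPkDefault'), OBJECT_ID(N'dbo.DemoPkNonclustered'));
```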

  3. What type of index is FOREIGN KEY?

Creating a FOREIGN KEY constraint does not automatically create an index on the referencing columns. If those columns will be used frequently in JOIN operations, it is recommended to index them.
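Here is a minimal sketch with hypothetical header and line tables (SQL Server 2016+ assumed). The FOREIGN KEY constraint on its own leaves OrderId unindexed, so a non-clustered index is added for it explicitly.

```sql
DROP TABLE IF EXISTS dbo.DemoOrderLines;   -- drop the child first; it references the parent
DROP TABLE IF EXISTS dbo.DemoOrderHeaders;

CREATE TABLE dbo.DemoOrderHeaders (
    OrderId INT NOT NULL CONSTRAINT PK_DemoOrderHeaders PRIMARY KEY
);

CREATE TABLE dbo.DemoOrderLines (
    LineId  INT NOT NULL CONSTRAINT PK_DemoOrderLines PRIMARY KEY,
    OrderId INT NOT NULL
        CONSTRAINT FK_DemoOrderLines_Headers REFERENCES dbo.DemoOrderHeaders (OrderId),
    Qty     INT NOT NULL
);

-- The FOREIGN KEY above created no index on OrderId. This index lets JOINs on OrderId
-- (and the check SQL Server runs when a header row is deleted) seek instead of scan.
CREATE NONCLUSTERED INDEX IX_DemoOrderLines_OrderId ON dbo.DemoOrderLines (OrderId);

SELECT h.OrderId, l.LineId, l.Qty
FROM dbo.DemoOrderHeaders AS h
JOIN dbo.DemoOrderLines  AS l ON l.OrderId = h.OrderId;
```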

  4. Is a Clustered Index the same as a UNIQUE Index?

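Not necessarily. A clustered index can be built on a column with duplicate values. In that case, SQL Server internally adds a hidden uniqueifier to rows with duplicate keys so that each row can still be identified. Conversely, a UNIQUE index does not have to be clustered.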
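To illustrate, this sketch (hypothetical dbo.DemoVisits table, SQL Server 2016+ assumed) creates a non-unique clustered index and inserts the same key twice without error. If the index had been created with CREATE UNIQUE CLUSTERED INDEX, the second row would be rejected as a duplicate key.

```sql
DROP TABLE IF EXISTS dbo.DemoVisits;

CREATE TABLE dbo.DemoVisits (
    City      NVARCHAR(50) NOT NULL,
    VisitedAt DATETIME2    NOT NULL
);

-- Clustered but not UNIQUE: duplicate City values are allowed.
CREATE CLUSTERED INDEX CIX_DemoVisits_City ON dbo.DemoVisits (City);

-- The same key value twice; this succeeds because the index is not unique.
INSERT INTO dbo.DemoVisits (City, VisitedAt)
VALUES (N'Taipei', '2023-06-01'), (N'Taipei', '2023-06-02');

SELECT name, type_desc, is_unique
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.DemoVisits');
```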

Principles of Index Selection


The general principle is to index the columns that are queried most often and have the fewest repeated values (higher uniqueness, also called selectivity).
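Selectivity is one common way to measure uniqueness: the number of distinct values divided by the number of rows. The sketch below computes it for two columns of a hypothetical dbo.DemoCustomers table that holds four made-up rows (SQL Server 2016+ assumed).

```sql
DROP TABLE IF EXISTS dbo.DemoCustomers;

CREATE TABLE dbo.DemoCustomers (
    CustomerId INT           NOT NULL,
    Email      NVARCHAR(100) NOT NULL,
    City       NVARCHAR(50)  NOT NULL
);

INSERT INTO dbo.DemoCustomers (CustomerId, Email, City) VALUES
    (1, N'a@example.com', N'Taipei'),
    (2, N'b@example.com', N'Taipei'),
    (3, N'c@example.com', N'Tainan'),
    (4, N'd@example.com', N'Tainan');

-- Selectivity = distinct values / total rows (multiplying by 1.0 avoids integer division).
-- The closer to 1, the fewer rows each value matches.
SELECT
    COUNT(DISTINCT Email) * 1.0 / COUNT(*) AS email_selectivity,  -- every Email is unique
    COUNT(DISTINCT City)  * 1.0 / COUNT(*) AS city_selectivity    -- each City appears twice
FROM dbo.DemoCustomers;
```

Email comes out at 1 and City at 0.5, so by this principle Email is the better candidate for an index used in lookups.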

Creating a Clustered Index

SQL
CREATE CLUSTERED INDEX IX_ClusteredIndexName
ON YourTableName (ColumnName);

A clustered index determines the order in which data rows are physically stored in the table. Suitable columns for a clustered index include:

  1. Columns used in WHERE conditions that match many rows with the same value, e.g., WHERE city = 'Taipei'. Because rows are stored in key order, the matching rows sit next to each other.
  2. Columns frequently used in ORDER BY clauses.
  3. Columns used in range queries, e.g., WHERE [time] BETWEEN '20230101' AND '20231231'.
  4. Columns commonly used in JOIN clauses.
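To show the range-query case, the sketch below clusters a hypothetical dbo.DemoEvents table on its EventTime column (SQL Server 2016+ assumed). It uses a half-open range (>= start AND < the next day) instead of BETWEEN ... '20231231'. This is a common precaution when the column stores a time of day.

```sql
DROP TABLE IF EXISTS dbo.DemoEvents;

CREATE TABLE dbo.DemoEvents (
    EventId   INT IDENTITY(1, 1) NOT NULL,
    EventTime DATETIME2          NOT NULL,
    Payload   NVARCHAR(100)      NULL
);

-- Clustering on EventTime stores rows in time order, so a date range corresponds to
-- a contiguous run of rows that can be reached with one seek and read in order.
CREATE CLUSTERED INDEX CIX_DemoEvents_EventTime ON dbo.DemoEvents (EventTime);

INSERT INTO dbo.DemoEvents (EventTime, Payload) VALUES
    ('2022-12-31T23:00:00', N'before range'),
    ('2023-03-15T08:30:00', N'in range'),
    ('2023-12-31T18:00:00', N'in range, late on the last day'),
    ('2024-01-01T00:00:00', N'after range');

-- BETWEEN '20230101' AND '20231231' would stop at 2023-12-31 00:00 and miss the
-- 18:00 row; the half-open range below includes the whole last day.
SELECT EventId, EventTime, Payload
FROM dbo.DemoEvents
WHERE EventTime >= '20230101' AND EventTime < '20240101'
ORDER BY EventTime;  -- same order as the clustered key, so an extra sort is typically unnecessary
```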

In SQL Server, if you define a PRIMARY KEY without specifying CLUSTERED or NONCLUSTERED and the table does not already have a clustered index, the primary key's index is automatically created as a clustered index.

Using a randomly generated GUID (for example, one produced by NEWID()) as a clustered primary key can cause performance problems, because GUID values are random rather than ever-increasing. Even when rows are written in sequence, a row inserted later may have a GUID that sorts before those of earlier rows. A clustered index keeps rows stored in key order, so new rows must be inserted into the middle of existing pages instead of being appended at the end. This causes page splits and index fragmentation, which can significantly hurt both write and query performance. For more information, refer to the resource What are the best practices for using a GUID as a primary key, specifically regarding performance?
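The sketch below compares a clustered primary key filled by NEWID() with one filled by NEWSEQUENTIALID(). SQL Server provides NEWSEQUENTIALID() to generate increasing GUIDs, and it can only be used in a DEFAULT constraint. The table names, row count, and filler column are hypothetical choices for the demo, and SQL Server 2016+ is assumed. After loading both tables, the sketch reads avg_fragmentation_in_percent from sys.dm_db_index_physical_stats.

```sql
DROP TABLE IF EXISTS dbo.DemoGuidRandom;
DROP TABLE IF EXISTS dbo.DemoGuidSequential;

-- Random GUIDs: each new key lands at an arbitrary position in the clustered index.
CREATE TABLE dbo.DemoGuidRandom (
    Id     UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID()
           CONSTRAINT PK_DemoGuidRandom PRIMARY KEY CLUSTERED,
    Filler CHAR(200)        NOT NULL DEFAULT 'x'   -- widens rows so pages fill up sooner
);

-- Increasing GUIDs: new rows are normally appended at the end of the index.
CREATE TABLE dbo.DemoGuidSequential (
    Id     UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID()
           CONSTRAINT PK_DemoGuidSequential PRIMARY KEY CLUSTERED,
    Filler CHAR(200)        NOT NULL DEFAULT 'x'
);

-- Load the same number of rows into each table (5,000 is an arbitrary demo size).
DECLARE @i INT = 0;
BEGIN TRANSACTION;
WHILE @i < 5000
BEGIN
    INSERT INTO dbo.DemoGuidRandom DEFAULT VALUES;
    INSERT INTO dbo.DemoGuidSequential DEFAULT VALUES;
    SET @i += 1;
END;
COMMIT TRANSACTION;

-- Fragmentation of each clustered index (index_id 1).
SELECT N'DemoGuidRandom' AS table_name, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.DemoGuidRandom'), 1, NULL, 'LIMITED')
UNION ALL
SELECT N'DemoGuidSequential', avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.DemoGuidSequential'), 1, NULL, 'LIMITED');
```

The exact numbers depend on the server, but the NEWID() table would be expected to show noticeably higher fragmentation. NEWSEQUENTIALID() values are only increasing on one machine and can restart from a lower range after Windows restarts, so this approach reduces the problem rather than eliminating it.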

Creating a Non-Clustered Index

SQL
CREATE INDEX index_name
ON table_name (column_name);

To create a composite index on multiple columns:

SQL
CREATE INDEX idx_column1_column2
ON your_table (column1, column2);

Non-clustered indexes do not change the order in which the table's data rows are stored. Each one keeps its own sorted structure of key values, with locators that point back to the rows.
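To make the composite-index case concrete, here is a sketch on a hypothetical dbo.DemoSales table (SQL Server 2016+ assumed). It builds a non-clustered index keyed on (Region, SaleDate) and uses the INCLUDE clause to carry Amount in the index. It then contrasts a query that filters on the leading key column with one that filters only on the second.

```sql
DROP TABLE IF EXISTS dbo.DemoSales;

CREATE TABLE dbo.DemoSales (
    SaleId   INT           NOT NULL CONSTRAINT PK_DemoSales PRIMARY KEY,
    Region   NVARCHAR(20)  NOT NULL,
    SaleDate DATE          NOT NULL,
    Amount   DECIMAL(10,2) NOT NULL
);

-- Key (Region, SaleDate): entries are sorted by Region first, then by SaleDate.
-- INCLUDE stores Amount at the leaf level without making it part of the key, so
-- queries that need only these columns can be answered from this index alone.
CREATE NONCLUSTERED INDEX IX_DemoSales_Region_SaleDate
    ON dbo.DemoSales (Region, SaleDate)
    INCLUDE (Amount);

-- Can seek on the index: filters on the leading key column (plus the second one).
SELECT SaleDate, Amount FROM dbo.DemoSales
WHERE Region = N'North' AND SaleDate >= '2023-01-01';

-- Generally cannot seek on it: SaleDate alone is not the leading key column,
-- so the optimizer is likely to scan this index or the clustered index instead.
SELECT Region, Amount FROM dbo.DemoSales
WHERE SaleDate = '2023-06-13';

-- Key columns (key_ordinal 1, 2) vs. included columns (key_ordinal 0).
SELECT c.name AS column_name, ic.key_ordinal, ic.is_included_column
FROM sys.index_columns AS ic
JOIN sys.indexes AS i ON i.object_id = ic.object_id AND i.index_id = ic.index_id
JOIN sys.columns AS c ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.DemoSales')
  AND i.name = N'IX_DemoSales_Region_SaleDate';
```

The column order of a composite index matters. Entries are sorted by the first key column, so a filter on a later column alone generally cannot use a seek on that index.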

Display Estimated Execution Plan (Ctrl + L)


Whether or not you have created indexes, you can view the execution plan when writing complex T-SQL queries to find which parts of the query account for most of its cost.

Clicking the Display Estimated Execution Plan button on the SSMS toolbar (or pressing Ctrl + L) shows the estimated execution plan for each query in the editor. The plans appear in the Execution Plan tab of the results pane, one per query.
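Alongside the graphical plan, SET STATISTICS IO ON reports the logical reads for each table in the Messages tab, which makes before-and-after comparisons easy. This sketch uses a hypothetical dbo.DemoPlanCheck table with 1,000 generated rows (SQL Server 2016+ assumed). It runs the same lookup before and after adding a clustered index. Running it with Include Actual Execution Plan (Ctrl + M) enabled in SSMS also shows the operators.

```sql
DROP TABLE IF EXISTS dbo.DemoPlanCheck;

CREATE TABLE dbo.DemoPlanCheck (
    Id   INT          NOT NULL,
    Code NVARCHAR(20) NOT NULL
);

-- 1,000 generated rows: Id 1..1000 and Code 'C1'..'C1000'.
INSERT INTO dbo.DemoPlanCheck (Id, Code)
SELECT n, N'C' + CAST(n AS NVARCHAR(10))
FROM (
    SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b
) AS nums;

SET STATISTICS IO ON;   -- logical reads per table appear in the Messages tab

-- Heap with no index: the plan should show a Table Scan.
SELECT Code FROM dbo.DemoPlanCheck WHERE Id = 500;

CREATE CLUSTERED INDEX CIX_DemoPlanCheck_Id ON dbo.DemoPlanCheck (Id);

-- Same lookup again: expect a Clustered Index Seek and typically fewer logical reads.
SELECT Code FROM dbo.DemoPlanCheck WHERE Id = 500;

SET STATISTICS IO OFF;
```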

To be honest, I'm still learning and don't yet fully understand the plans of highly complex queries. In general, though, I can clearly tell whether a query is using an index seek or an index scan, which gives a basic understanding of how the query is executed.

Clustered Index Seek and Clustered Index Scan

  1. Clustered Index Seek:

A clustered index seek uses the index key value: it walks the index's B-tree from the root down to the leaf page that holds the requested key, then reads only the data pages it needs. It is typically an efficient access method, especially when the query's conditions can filter on the index key.

In general, a Clustered Index Seek in the plan indicates that the query is using the clustered index you created. If the query's conditions do not match the clustered index key, however, the engine cannot seek on the clustered index. Examples include a filter on a column that is not part of the key, or one that skips its leading column. In that case the engine may scan the entire table or use another index, such as a non-clustered index, instead.

  2. Clustered Index Scan:

A clustered index scan reads through the entire clustered index (that is, the whole table) and checks each row against the query conditions. Because every page must be read, a scan can take much longer, especially on large tables. Clustered index scans typically occur when the query has no filter the index key can be used for, when the query needs most or all of the table anyway, or when no other index supports the query. Because the clustered index contains every column, scanning it can always answer the query.

In summary, a clustered index seek uses the key value to go straight to the matching data, while a clustered index scan reads through the entire index to find rows that match the conditions. Seeks are usually much cheaper when a query needs only a small fraction of the rows, and scans take longer as the table grows. If scans account for a significant share of a query's time, consider adding other indexes or redesigning the indexing strategy.
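Both operators can be reproduced on a tiny hypothetical table. In the sketch below (SQL Server 2016+ assumed), dbo.DemoProducts is clustered on ProductId. With the estimated or actual execution plan displayed, the first two queries would typically show a Clustered Index Seek. The third would show a Clustered Index Scan, because Category is not part of any index key.

```sql
DROP TABLE IF EXISTS dbo.DemoProducts;

CREATE TABLE dbo.DemoProducts (
    ProductId INT          NOT NULL CONSTRAINT PK_DemoProducts PRIMARY KEY CLUSTERED,
    Name      NVARCHAR(50) NOT NULL,
    Category  NVARCHAR(20) NOT NULL
);

INSERT INTO dbo.DemoProducts (ProductId, Name, Category) VALUES
    (1, N'Keyboard', N'Input'),
    (2, N'Mouse',    N'Input'),
    (3, N'Monitor',  N'Display');

-- Equality on the clustered key: typically a Clustered Index Seek.
SELECT Name FROM dbo.DemoProducts WHERE ProductId = 2;

-- Range on the clustered key: still a seek, followed by an ordered range read.
SELECT Name FROM dbo.DemoProducts WHERE ProductId BETWEEN 1 AND 2;

-- Filter on a column outside any index key: every row must be checked,
-- so the plan shows a Clustered Index Scan.
SELECT Name FROM dbo.DemoProducts WHERE Category = N'Input';
```

Adding a non-clustered index on Category (for example, CREATE INDEX IX_DemoProducts_Category ON dbo.DemoProducts (Category);) is one way to give the third query something to seek on. On a table this small, however, the optimizer may still prefer a scan.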

Conclusion


Index design is a challenge that almost every backend engineer will face. Even at companies with dedicated database administrators, engineers should still study and understand it, and I'm working to improve my own knowledge in this area. Otherwise, it can be frustrating when your code looks perfectly written but performs unexpectedly slowly, only for you to discover that the problem is actually in the database.🤣

SQL




Alvin

Software engineer, interested in financial knowledge, health concepts, psychology, independent travel, and system design.
