SQL Server has been around over 30 years, and I’ve been working with SQL Server for almost as long. Kalen covers scans in Part One of SQL Server Internals: Problematic Operators.
I’ve seen a lot of changes over the years (and decades!) and versions of this incredible product. In these posts I’ll share with you how I look at some of the features or aspects of SQL Server, sometimes along with a bit of historical perspective.
Tuning your SQL Server queries is one of the best things you can do for better performance and optimization of SQL server diagnostics. But tuning is a huge topic! Knowing exactly how to tune in the best possible way requires not only a thorough knowledge of your data and your workload, but knowledge of how SQL Server actually makes its plan execution choices. So, what can you do if you’re not an expert on SQL Server Internals? One thing you can do is rely on people who are experts, as well as tools written by experts. Tools like Quest Spotlight Cloud Tuning Pack can give you some great suggestions to get started on the road to better query performance. Of course, no external tool knows your data and all the details of all your workloads, so thorough testing of any suggestion you decide to implement is always recommended.
In these posts on problematic operators, I will assume you have some basic knowledge of SQL Server index structures. Here is some information that will be helpful:
- A table without a clustered index is called a heap and has no ordering. There is no first row or last row. A heap is just a bunch of rows in no particular order.
- The leaf level of a clustered index is the table itself. (It is not a copy of the table, it IS the table.) The rows of the index are logically ordered by whatever column was defined to be the clustered index key.
- The leaf level of a nonclustered index contains an index row for every row in the table. The rows contain the nonclustered key columns and are logically ordered in the order the keys are specified. In addition to the key columns, the nonclustered index rows contain a ‘bookmark’ that points to the referenced row in the table. The bookmark can be in one of two forms:
- If the table has a clustered index, the bookmark is the clustered index key. (If the clustered index key is part of the nonclustered index key, it will not be duplicated.)
- If the table is a heap, the bookmark is a Row ID, or RID, which specifies the physical location of the row. The location is usually specified as FileNum:PageNum:RowNum.
SQL Server’s own tools provide multiple ways to view the query execution plan that the optimizer decided to use for a particular query. With the addition of Quest Spotlight Tuning Pack, you can get even more information about your plans.
The following code creates copies of two tables in the AdventureWorks database (I am using AdventureWorks2016, but you could use another version).
DROP TABLE IF EXISTS SalesHeader;
DROP TABLE IF EXISTS SalesDetail;
SELECT * INTO SalesDetail
Now execute a query that joins the two tables together, after turning on “Include Actual Execution Plan”
SELECT h.SalesOrderID, OrderDate, ProductID, UnitPrice, OrderQty
FROM SalesHeader h JOIN SalesDetail d
ON h.SalesOrderID = d.SalesOrderID
WHERE SalesOrderDetailID < 100;
Quest Spotlight Tuning Pack will report an issue with the query, so you can click on “View Analysis” and choose the option “Execution Plan”. You should see the following:
Understanding Table Scans
First, I want to go out on a limb and say there is no plan operator that is always bad! Why would the optimizer add it to your query plan if it was bad? It might indicate there is room for improvement in your data or index structures, but in itself it is not bad.
In the example above, the Tuning Pack seems to be spotlighting the table scans, indicating that they may be problematic. But it is not always true that table scans are problematic. A much worse situation would be to use a nonclustered index seek for a query that accesses every row in the table. For this particular query, I would agree that the scan might not be a good thing because we are only interested in a few rows in the SalesDetail table (99 out of 121,317 rows, or less than a tenth of a percent.)
So, we could look at the suggestions in the Analysis pane for building indexes. The suggestion for the SalesDetail table is to build a nonclustered index on the SalesOrderID column (the column in the JOIN clause) and INCLUDE every other column in the table that is returned by the query. The suggestion for the SalesHeader table is a nonclustered index on the SalesOrderDetailId column, which is the column in the WHERE clause, and INCLUDE the OrderDate column, which is the only other column returned from this table.
What if our query was slightly different? What if I had run this query using SELECT * instead of a specific column list. If you try it, and look at the recommendations, it suggests using the INCLUDE for every column in the table other than the single key column. Although such an index might make this particular query run a bit faster, it could end up slowing down other queries, in particular your UPDATE queries. This index basically is just a copy of the table, because the leaf level of the index will contain every single column in the table. If you see recommendations like this, suggesting an index that includes all the columns in the table, I definitely recommend stepping back a bit and not blindly creating it.
Query tuning for your SQL server diagnostics involves not only managing indexes, but also managing the queries themselves. For this particular query, we might actually be better off rewriting the query to NOT use the SELECT * to return every row in the table. Returning only a small subset of the columns might be sufficient, and then a much narrower index would suffice, as in the first example.
Would either of these indexes actually be a good index to create? The narrower index will be smaller overall and will be less impacted by updates to the data. An index on all the columns is like a second copy of the table, sorted in a different order than the table itself. There are situations where having a ‘second copy’ of the table in a different order can be useful, but there will be a lot of overhead for data modification operations. The only way to know for sure it to try out the recommendations on a test system with a representative workload. Only you know your data and your queries, so try it and see!
Understanding Index Scans
As I mentioned above, table scans are not always a bad thing. But what about index scans? Because a clustered index leaf level is the table itself, a clustered index scan is the same as a table scan! if a table scan is bad, a clustered index scan is just as bad. But it’s not always bad. Again, you need to test it on your system.
The recommendations from the SQL Server Engine that Quest Spotlight Tuning Pack shows you never suggest a clustered index. it may suggest a nonclustered which includes every column in the table (as mentioned earlier), which is just a duplicate of the table. Figuring out the best column or columns for your clustered index is a big topic in itself, so I won’t go into that here.
What is a seek? A seek operation in a plan means SQL Server is using the ordered data in the index tree to find a row, a set of rows, or the starting and/or stopping point in a range of rows. In general, using a nonclustered index seek is a perfectly reasonable operation if you are returning just a very small percentage of rows from a table. But a seek is not a good choice for a query that is returning LOTS of rows from a table. How many is LOTS? There is no simple answer but if your query is returning more than a few percent of the rows, you should make sure you test the index suggestions thoroughly. Sometimes a table scan, or clustered index scan, is better than an index seek. (For one such example, see my blog post here).
Tools such as Quest Spotlight Tuning Pack can give you great suggestions to get started on your tuning journey with SQL server diagnostics, but the more you know about how SQL Server indexes and the SQL Server optimizer work, the better you’ll be able to evaluate those suggestions for your queries and your data, and possibly even come up with suggestions of your own.
In the following posts in this series, I’ll tell you about other problematic operators that might show up in your query plans, so check back soon!