The Clustered Index is not the Primary Key

I was reading through a list of links for Database Weekly and ran across this script from Pinal Dave, looking for tables where the clustered index isn’t the PK. It struck me that this is one of those facts I consider to be so simple, yet I constantly see people confusing. If you click the Primary Key icon in the SSMS/VS designers, or you specify a PK like this:

    SomeUniqueVal INT PRIMARY KEY

What will happen is that a clustered index is created on this field by default. It’s not that the PK must be clustered, but that SQL Server does this if you don’t tell it otherwise. Tables should have primary keys, and while you can debate that, most knowledgeable SQL Server people I know want a PK on tables. There are exceptions, but if you can’t name them now, use a PK.
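To make the default visible, here’s a minimal sketch (the table and constraint names are made up for illustration) showing the shorthand next to an explicitly NONCLUSTERED choice:

```sql
-- The shorthand PK gets a clustered index by default.
CREATE TABLE dbo.DefaultDemo
(
    SomeUniqueVal INT PRIMARY KEY   -- clustered, unless you say otherwise
);

-- Declaring the PK as NONCLUSTERED keeps the key
-- but leaves the clustering choice open.
CREATE TABLE dbo.ExplicitDemo
(
    SomeUniqueVal INT NOT NULL
        CONSTRAINT PK_ExplicitDemo PRIMARY KEY NONCLUSTERED
);
```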

However, the PK isn’t necessarily a clustered index. They are separate concepts. The PK can be clustered or non-clustered, and what you choose is up to you. However, I like Kimberly Tripp’s advice: choose the clustering key separately from the PK, and keep it unique, fixed, and narrow. If they turn out to be the same, fine, but don’t assume they must be. Choose what works well for your particular table, which means thinking a bit.
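As one possible sketch of that advice (the table and names here are hypothetical), you might put the PK on the natural business key and cluster on a narrow, fixed, ever-increasing surrogate:

```sql
CREATE TABLE dbo.Customers
(
    CustomerID   INT IDENTITY(1,1) NOT NULL,
    CustomerCode VARCHAR(20)       NOT NULL
        CONSTRAINT PK_Customers PRIMARY KEY NONCLUSTERED
);

-- The clustering key: unique, fixed, and narrow.
CREATE UNIQUE CLUSTERED INDEX CIX_Customers_CustomerID
    ON dbo.Customers (CustomerID);
```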

You get one clustering key, and it’s worth spending five minutes debating the choice with a DBA or developer, or even post a note at SQLServerCentral. Changing the choice isn’t hard, but it can interrupt your clients’ work on your database, so try to make good design choices early, without blindly accepting defaults. It’s worth a few minutes of your time to make a good choice.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.0MB) podcast or subscribe to the feed at iTunes and LibSyn.

Rebuilding a Heap–Don’t Do It

I saw someone mention recently that you can run a REBUILD on a heap table. I hadn’t realized that, but when I looked in BOL, I saw that indeed you can run this on a heap. That’s interesting, in that you can now move the pages in a heap around to remove fragmentation. At first glance that sounds good, but I wasn’t sure.  A little more searching was required.

The authoritative source for me on many things like this is SQLskills, and sure enough, Paul Randal has a myth piece on this. Paul has a great explanation, but basically if you rebuild the HEAP structure, you are creating work in that every non-clustered index also has to be rebuilt. Why? The pointers back to the heap pages, which are locations, will change.
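The statement in question is the table-level rebuild; this sketch uses a made-up table name:

```sql
-- Rebuilding a heap compacts its pages, but every non-clustered index
-- on the table is rebuilt too, because the row locators
-- (file:page:slot) change when the heap pages move.
ALTER TABLE dbo.MyHeap REBUILD;
```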

What about adding a clustered index and dropping it? Nooooooo, and again, I learned something new. This causes two rebuilds of the non-clustered indexes: they are rebuilt when the clustered index is added, and rebuilt again when the table changes back to a heap (to get the heap locations). That’s crazy, and certainly not what we want.

The short answer here is that you want a clustered index, for the maintenance reasons here, as well as others. If you don’t have a fundamental reason not to create a clustered index, just add one.

Just do it.
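If a table is a heap today, adding a clustered index is a single statement. The names here are made up; pick a key that is unique, narrow, and fixed:

```sql
CREATE CLUSTERED INDEX CIX_MyHeap_MyHeapID
    ON dbo.MyHeap (MyHeapID);
```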

Reframing to Overcome Filtered Index Limitations

I’m continuing on with the April Blogger Challenge in this post.

Turning the Problem Around

In the last post, I wrote about limitations in filtered indexes. I proposed that my table had lots of data with NULL or blank spaces in the gender column, and I wanted to avoid indexing those rows. I tried this code:

CREATE INDEX Users_Gender_Filtered2
 ON Users (gender)
  WHERE (gender IS NOT NULL OR gender = ' ');

However, that failed. I can’t combine two criteria with an OR in a filtered index. But I can use one set of criteria, so I need to re-frame the problem.

If I look at the data, I have four choices: ‘M’, ‘F’, NULL, and ‘ ‘. I can group two of those choices together, looking for a positive (matching) set of data rather than a negative (non-matching) set. The easy way to do that is with an IN clause.

CREATE INDEX Users_Gender_Filtered2
 ON Users (gender)
  WHERE gender IN ('M', 'F');

This works, and I’ve got a filtered index. In the real world, I’d actually drop the first index (gender is not null), and only go with this one.



Quick and Easy Blogging

This post occurred to me as I was writing the other post. I almost added a note on turning the index around, but realized this is a separate topic, and it makes a nice second post. This post required less than 10 minutes.

This post continues the April Blogger Challenge from Ed Leighton-Dick, aimed at new bloggers, though anyone is welcome. I’m trying to motivate and cheer people on.

April Blogger Challenge 4–Filtered Index Limitations

The April Blogger Challenge is from Ed Leighton-Dick and aimed at new bloggers, but anyone is welcome. I’m trying to motivate and cheer people on.

Filtered Index Limits

I ran across a limitation recently with filtered indexes and was surprised. Apparently, you can’t make complex filters in your definitions.

In the last post, I wrote about creating a filtered index on gender, looking for non-NULL values in a table. That code looked like this:

CREATE INDEX Users_Gender_Filtered
  ON Users (gender)
  WHERE gender IS NOT NULL;

However, what if I find that I have lots of NULL values, but also lots of spaces, ‘ ‘, in my table, because this is an optional field and the data entry code changed at some point? Neither of these values actually helps me find the rows with an ‘M’ or an ‘F’.
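For reference, the mock table I’m assuming in these examples looks something like this (the original definition wasn’t shown, so this is a guess):

```sql
CREATE TABLE Users
(
    UserID INT IDENTITY(1,1) PRIMARY KEY,
    gender CHAR(1) NULL   -- 'M', 'F', NULL, or ' '
);
```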

I tried creating an index like this:

CREATE INDEX Users_Gender_Filtered2
 ON Users (gender)
  WHERE gender IS NOT NULL OR gender = ' ';

That didn’t work, so I tried parentheses.

CREATE INDEX Users_Gender_Filtered2
 ON Users (gender)
  WHERE (gender IS NOT NULL OR gender = ' ');

That didn’t work either. No matter how I moved the parentheses around, SQL Server wouldn’t accept a complex (if two criteria count as “complex”) filter.

This is a limitation of filtered indexes right now. I’m not sure if it will be changed, though I would like it to be. There are other limitations, so read the docs carefully, and think about what might work well in your environment.



Quick and Easy Blogging

This post occurred to me while I was writing some code. I mocked up a table in about 2 minutes, and then ran a quick search on the Internet. Reading a few links was about 10 minutes and then testing the code (including dropping the table and recreating it a few times) was less than 5 minutes. All told, I solidified some knowledge and completed this in about 20 minutes. I also have drafts and ideas from this post for 2 other posts that cover this same topic in a similar way.

Look for the other posts in the April challenge.

The DBA Team #1–Code and Slides

Our first DBA Team event, in Richmond, VA, just before SQL Saturday #187, went well. Overall, I think our experiment was a success, and we’re already talking about where and when we might do this again.

In the meantime, we didn’t make a separate site for this series of events, being an experiment and all. I’m adding this post as a placeholder for the various slide decks and code.

Core Monitoring for SQL Server (Steve Jones)

Good DBAs ensure that they are always aware of the state of their instances. All systems should have monitoring in place, not just so you know when things go wrong, but so you understand what a normal workload looks like and can plan for the future. This session will cover the basics of monitoring a SQL Server system and the various metrics you should be tracking.

Getting Started with SQL Server Backup (Grant Fritchey)

Backups are fundamental to protecting the investment your business has in its data and they’re the foundation of disaster recovery planning. We’ll go over best practices for database backups, to ensure you’re establishing that foundation correctly within your systems. This introductory level session covers full, log, and differential backups, as well as restores and restores to a point in time. Come along to be sure you’ve got the right protection in place for your systems.

Understanding Database Corruption (Grant Fritchey)

A DBA’s primary purpose is to ensure that the information in their charge is accessible by the correct people within their organization. Despite everything you do to make sure you’ve got your servers configured, monitored, and tuned, with well-tested backups in place, you can still lose data through corruption. But what is corruption in a database? This session lays out exactly where database corruption can come from, how to find out exactly where the corruption is within a database, and  the methods you have available to recover from database corruption.

Indexing for SQL Server (Steve Jones)

Indexes are important for improving the performance of your queries, but they add overhead to your server and require maintenance. This session examines how indexes work and the basic maintenance that you should perform to ensure your system is running at its peak level.

Dropping Indexes

While working on some demos recently, I needed to drop an index for a test. I executed this generic statement for an index I’d just created.

    DROP INDEX ix_IndexName

Needless to say I was surprised when I got this error:

Msg 159, Level 15, State 1, Line 1

Must specify the table name and index name for the DROP INDEX statement.


I haven’t done much index maintenance in the last few years, but since I had specified the name, and I expected names to be unique, I was surprised. That’s not the case, however, since indexes aren’t seen as objects.

I created the index in AdventureWorks with this code:

    CREATE INDEX ix_IndexName
     ON Sales.SalesOrderHeader ( [TerritoryID], [ShipMethodID], [SubTotal], [Freight] )
     INCLUDE ([SalesOrderNumber], [CustomerID]);

As a test, I then added this index:


    CREATE INDEX ix_IndexName
     ON Production.Product ( [Name], [ListPrice] );

Same name, different table.

A quick check in sys.objects surprised me.

    select *
      from sys.objects
      where name = 'ix_IndexName'

This returned no results. Hmmm, let’s investigate further. I next decided to check sys.indexes.

    select *
      from sys.indexes
      where name = 'ix_IndexName'

This returned two results: two entries, with two different object_ids. I wondered what those objects were, so I ran more code:

    select *
      from sys.objects
      where object_id in (1010102639, 1717581157)

I received the two tables back as the objects.


This surprised me, though I’m sure I’ve read the details in a book at some point, or even seen the documentation in sys.indexes. The entry for name says it is unique only within the space of the object, which would be the parent table.

I had assumed that indexes were objects, but they aren’t. They are an attribute of an object, and as such, I needed this code to remove my index:

    -- cleanup
    DROP INDEX ix_IndexName
     ON Sales.SalesOrderHeader;
    GO

Update: As noted in a few comments, you can also drop the index as:

    DROP INDEX Sales.SalesOrderHeader.ix_IndexName

And, of course, I needed to drop my test object.

    DROP INDEX ix_IndexName
     ON Production.Product;

Review Your Indexing

How often do you re-examine your indexes?

In the latest versions of SQL Server, there are some amazing new features. Many of them expand the capabilities of SQL Server, but some are added to let us dive more deeply into how the system works. A couple of the newer DMVs are fantastic tools for finding indexes that are unused, duplicate, or unneeded. If you’re not using sys.dm_db_index_usage_stats or sys.dm_db_missing_index_details, you should dig into them a little and learn how they work. However, running a diagnostic query to find unused indexes and then immediately dropping those indexes is a bad idea. You need to make sure those indexes aren’t merely rarely or lightly used, perhaps only by an occasional but important process.
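As a starting point, a query along these lines (a sketch; add filters and columns to taste) shows how often each index has been read versus written since the instance last started:

```sql
-- Reads (seeks + scans + lookups) versus writes per index,
-- for the current database, since the last service restart.
SELECT  OBJECT_NAME(i.object_id) AS table_name,
        i.name                   AS index_name,
        us.user_seeks, us.user_scans, us.user_lookups,
        us.user_updates
FROM    sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats us
        ON us.object_id   = i.object_id
       AND us.index_id    = i.index_id
       AND us.database_id = DB_ID()
WHERE   OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY us.user_updates DESC;
```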

I thought about this recently while giving a talk on maintenance. Indexes require routine maintenance, and many of us schedule rebuilds or reorganizes in our databases to ensure that fragmentation doesn’t become an issue. That’s a good start, but there’s more you can do.

Every month or two, you should schedule time to analyze your indexes. Capture a workload from a trace and analyze it with the Database Engine Tuning Advisor. Take the results and compare them to your current indexing schema. Make a judgement or two on which indexes are used by different queries and spend a few hours testing changes to your systems. You might need new indexes, you might want to remove old indexes that aren’t being used, or you might decide to add a column or an INCLUDE to an existing index.

There are lots of articles on SQLServerCentral and blogs on indexing that can help you learn more about what changes might improve performance, but ultimately you will really need to test any changes on your own systems. With a little practice, you can build a short routine that allows you to take a few hours every month and analyze a few indexing changes, perform a little testing, and perhaps greatly improve the performance of your applications.

Steve Jones

The Voice of the DBA Podcasts

We publish three versions of the podcast each day for you to enjoy.

Trace Flag 2371 and Statistics

One of the issues that I see published often on forums like SQLServerCentral is the advice to update statistics on your tables if you have strange performance issues, or sudden changes in performance. Statistics are important for the query optimizer, and you should understand the basics of how they work.

However, there’s a problem with statistics: they get out of date. SQL Server will update statistics automatically, but it doesn’t do this constantly. By default, it does it after 20% of the table changes. If you have 1,000 rows, that means 200 rows changed (or added) can trigger the update. If your table gets 50 changes every couple of days, you’ll have statistics updated about every week.

If you have 1 million rows (think large, historical data here), then those same 50 changes won’t trigger a statistics update for a long time; 200,000 changes will be needed.

There’s a trace flag, 2371, that can help with the minimum needed to trigger a stats update (or you can handle this with your own jobs). Enabling it makes the threshold dynamic: the larger the table, the smaller the percentage of changes needed to trigger an update.
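Enabling the flag, and checking how stale a table’s statistics are, might look like this sketch (dbo.SomeBigTable is a made-up name; sys.dm_db_stats_properties requires SQL Server 2008 R2 SP2 / 2012 SP1 or later):

```sql
-- Enable trace flag 2371 instance-wide (or use -T2371 as a startup parameter).
DBCC TRACEON (2371, -1);

-- Row counts, last update time, and modifications since the last update.
SELECT  s.name AS stats_name,
        sp.last_updated,
        sp.rows,
        sp.modification_counter
FROM    sys.stats s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE   s.object_id = OBJECT_ID('dbo.SomeBigTable');
```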

What you do really depends on your issues. If you find poor performance in queries, look for wildly incorrect estimates of rows in your query plans. If you find that your statistics aren’t being updated, or not updated enough, you might enable the trace flag, or create your own job to update statistics manually.
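A manual job can be as simple as this sketch (the table name is assumed; FULLSCAN trades time for accuracy):

```sql
UPDATE STATISTICS dbo.SomeBigTable WITH FULLSCAN;
```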

Note that if you are rebuilding indexes, you don’t need to also update statistics on the columns in the index; those statistics are updated as part of the index rebuild.

Common SQL Server – Not Indexing FKs

This series looks at common mistakes that I see many people making in SQL Server.

Foreign Keys

It’s way too often that I see people building databases without declared referential integrity (DRI). Even when people set a primary key on tables, they often ignore foreign keys, the linkages that tie related tables together.

However, even when people have declared a FK, they often don’t create an index on that column. Perhaps they assume that SQL Server will create the index like it does for PKs, but it does not.

If I create these two tables and join them with a FK:

CREATE TABLE [dbo].[Products](
    [ProductID] [int] NOT NULL,
    [ProductName] [varchar](50) NULL,
    CONSTRAINT [PK_Products] PRIMARY KEY CLUSTERED ( [ProductID] ASC )
);

CREATE TABLE [dbo].[ProductDetails](
    [ProductDetailID] [int] NOT NULL,
    [ProductID] [int] NULL,
    [SKU] [varchar](50) NULL,
    [Price] [numeric](18, 2) NULL,
    CONSTRAINT [PK_ProductDetails] PRIMARY KEY CLUSTERED ( [ProductDetailID] ASC )
);

ALTER TABLE [dbo].[ProductDetails] WITH CHECK ADD CONSTRAINT [FK_ProductDetails_Products] FOREIGN KEY([ProductID])
REFERENCES [dbo].[Products] ([ProductID]);

ALTER TABLE [dbo].[ProductDetails] CHECK CONSTRAINT [FK_ProductDetails_Products];

If I go and check indexes on ProductDetails, I find that there is only one index, the index for the PK.


Why is this a problem? It’s because of performance. We should realize that indexes speed up performance by reducing the amount of work that SQL Server has to do.

With FK columns, what I’ve often found with child tables is that I know the value of the FK column I am searching for and don’t need to join with the parent table. However without an index on the FK column, this query requires a table scan.

select SKU
     , Price
  from ProductDetails pd
  where pd.ProductID = 3

If you are creating FKs in your database, don’t forget to index them where appropriate.
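For the example tables above, the fix is a one-line index; the index name is my own choice:

```sql
CREATE INDEX IX_ProductDetails_ProductID
    ON dbo.ProductDetails (ProductID);
```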

Auto Creation

I’ve seen some people ask why SQL Server doesn’t automatically create indexes on those FK columns. I am torn on this, but I like the 80/20 rule: if 80% of tables would benefit from it, I think it should be done. I am leaning towards some intelligent mechanism to do this.

The main issue is that you might not want just an index on the FK column. You might want some sort of covering index that includes columns in addition to the FK column to prevent key/bookmark lookups to the clustered index. If you can avoid those, you can drastically increase performance.
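For the query above, a covering variation might look like this sketch (names assumed from the example tables):

```sql
-- Covers the ProductID search and returns SKU and Price
-- without a key lookup back to the clustered index.
CREATE INDEX IX_ProductDetails_ProductID_Covering
    ON dbo.ProductDetails (ProductID)
    INCLUDE (SKU, Price);
```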

There is also the chance that with your query load, you never use these indexes. That can be horrible for performance as well since there is overhead to maintain these indexes on all insert/update/delete operations.

The Advice

Look at the queries that are coming into your database. Check the missing index DMVs, and if you find that the FK columns are being used, index them.
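One way to read the missing-index DMVs is a query along these lines (a sketch; the numbers reset on restart, and suggestions should be reviewed, not applied blindly):

```sql
SELECT  d.statement AS table_name,
        d.equality_columns,
        d.inequality_columns,
        d.included_columns,
        gs.user_seeks,
        gs.avg_user_impact
FROM    sys.dm_db_missing_index_details d
JOIN    sys.dm_db_missing_index_groups g
        ON g.index_handle = d.index_handle
JOIN    sys.dm_db_missing_index_group_stats gs
        ON gs.group_handle = g.index_group_handle
ORDER BY gs.user_seeks * gs.avg_user_impact DESC;
```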

If you’re not sure, or don’t know how to look for missing indexes, here’s a reference.