The Difference Between Change Tracking and Change Data Capture

Change Tracking (CT) and Change Data Capture (CDC) were both added to SQL Server in 2008. At first it seems like these two items ought to be synonyms, but they’re separate features. They are similar, but there are some differences, and you might choose to use them in different situations.

Change Tracking (CT)

CT is not as well known as CDC, and I see it talked about less. This is really a feature that allows the net changes made to data to be easily returned from a query. This only lets you know that a particular row has changed since your last query. You have no idea

  • how many times it’s changed
  • the various change values over time

The queries you run will return a table that lets you know which rows have changed since the last check, and then let you know the type of DML change. You need to join this table with the source table to actually get the data.

This is really useful for those applications that cache data and periodically query to update their caches. Lots of .NET frameworks allow this, and it’s a great way to limit the load on your database server.

Change Data Capture (CDC)

CDC is more well known, and seemed like a great tool when I first saw it, but like many useful enhancements, there is a bit of complexity that you have to work through in order to understand and use this feature.

CDC is a little more complex to implement, and it creates a bit more data in your database. You get a change table that is a copy of your table, along with a few additional columns that contain metadata. For each DML operation, you get a row(s) added to the change table. Inserts get one row (new data). Deletes get one row (old data) and updates get two rows (old and new data).

This gives you lots of history and information about your table, but it’s a lot of data. The amount can grow quickly in a busy database, so you need to be sure that you extract the information you need and prune the change tables periodically.

Which One?

Which feature should you use? Is CT better than  CDC? They work differently, and they capture different amounts of data. There’s an entry in BOL that compares them, and you should understand the differences. However you really need to spend time working with both to make a decision about which one meets your needs.

Pick a table or two, enable one, test with some workload changes, then evaluate. Then repeat with the other. You might find that you need to use CT in some places, and CDC in others, depending on the downstream processing of the data.

SQL Server 2008 – CDC Retention

I had done a little work with CDC last year, experimenting with the way that it handles changes in your database. Someone had asked me the question about the retention period since the default of 3 days was not sufficient for their environment.

BOL lists the default retention period as three days, and this is based on a cleanup job that goes through and removes data from the tables. However you can alter that with a stored procedure: sys.sp_cdc_change_job.

There are a number of parameters for this procedure, most dealing with managing the load of the cdc jobs. You can alter the cleanup or capture jobs with this, but for the retention period, you want to change the cleanup job.

@retention is the parameter that indicates the number of minutes that change rows are going to be retained in the CDC tables. If you pass in a NULL, the old retention period is retained. You can include the number of minutes you’d like, up to 52494800 (100 years). I would recommend you choose something less.

The other thing to be aware of is that you have various intervals that are set, and also a @threshold, which determines how many rows are deleted on each scan. If you limit the cleanup to something less than continuously operating, then be aware that if @threshold is too low, it might not delete all the rows. If you are changing these values to manage the load of deletions, be sure that you are monitoring how many rows are changing as it might grow over time.