Change Data Capture vs Change Tracking

Make sure to check out the comments on this one!

A short conversation on Twitter Monday night reminded me of this topic. SQL 2008 introduced two new features with similar names and similar-sounding functionality, which are actually quite different. I had learned about these once upon a time and then largely forgotten the details until I took a recent course with Paul Randal (blog | twitter) and Kimberly Tripp (blog | twitter), so I’m guessing others could use a little refresher as well.

So here is a quick taxonomy of Change Data Capture and Change Tracking. I’m going to keep this short, but if you really want all the details, skip to the links at the bottom for complete information.

Change Tracking (CT)

Change Tracking is a synchronous mechanism which modifies change tracking tables as part of ongoing transactions to indicate when a row has been changed. It does not record past or intermediate versions of the data itself, only that a change has occurred. It is recommended to use snapshot isolation with Change Tracking! (See the links below for details on why.)

Change Data Capture (CDC)

Change Data Capture is asynchronous and uses the transaction log in a manner similar to replication. Past values of data are maintained and are made available in change tables by a capture process, managed by the SQL Agent, which regularly scans the T-Log. As with replication, this can prevent re-use of parts of the log.

If you suffer from excessive log reuse waits, you can post a comment on the SQLAgentMan blog and ask for your log back.

Comparing Editions, Methodology, and Notes…

The following recap is largely a paraphrasing of the prior two paragraphs, with some Edition info thrown in:

Feature Change Tracking Change Data Capture
Supported Editions Express, Workgroup, Web, Standard, Enterprise, DataCenter DataCenter, Enterprise
Summary Tracks when a row has changed. Which columns changed can also be tracked. Tracks when data has changed and includes the values as well. Entire table or subset of columns can be captured.
Methodology Synchronous with DML, records change tracking info as part of transaction Asynchronous from Transaction Log
Recommendations Use with snapshot isolation. Note: places some limitations on DML (see links below) Use esp for large scale data warehouse applications. Note: can be used in a DB which is also a replication publisher if need be.

Links/More Information

Previous Post
DBA Checklists: Check out Free OneNote in Office Live
Next Post
Error Configuring DataCollector: A collection set cannot start without a schedule

Related Posts

No results found

6 Comments. Leave new

  • Just one correction: Snapshot isolation is recommended for both systems, depending on how you’re acquiring the most recent version (LSN in CDC, change version in CT).

    FYI, I’m doing a presentation at PASS that compares and contrasts both of these technologies in depth.

    Reply
  • A popular way to use CDC is to implement it on a replication subscriber so downstream consumers of the data can easily pull data incrementally. This has led to a major pain point for users however. If you have to reinitialize the subscription or you re-apply a snapshot, every record will show as having been updated and automated processes will try to download the entire data set.

    Reply
  • Adam, I’m interested in more details about why snapshot isolation is required for CDC. Paul’s technet article doesn’t list it as recommended and I haven’t seen that in BOL, so curious.

    Robert, we have a team here who is looking at that very option. The re-init scenario is definitely the biggest discussion topic!

    Reply
  • Kendra,

    The reason it’s suggested for CT is the workflow: You call the CHANGETABLE function, followed by a call to CHANGE_TRACKING_CURRENT_VERSION. Doing it in a snapshot transaction ensures A) That the current version number returned won’t change between the call to the CHANGETABLE function and the call to the CURRENT_VERSION function, and B) That you won’t block transactions (the same thing could be done with a serializable transaction if you didn’t care about blocking). You could reverse the process and call CHANGE_TRACKING_CURRENT_VERSION first, then use CHANGETABLE with an additional filter in the WHERE clause, which should eliminate the need for a snapshot transaction. But most people won’t do that.

    The same workflows are applicable to CDC, the difference being that the people who wrote the initial white paper on that topic didn’t notice, so no one is repeating it as a best practice 🙂 … in the CDC case, you have a very similar two choices: Call fn_cdc* based on a max LSN less than or equal to the current date, then call fn_cdc_get_max_lsn to retrieve the maximum LSN, or call fn_cdc_get_max_lsn() first, then use that LSN for the fn_cdc* function. A snapshot transaction is required in the first workflow. Perhaps not in the second. But either way it’s just as recommended (or not) as in CT — it all depends on how you’re pulling back the data.

    Hope that makes sense?

    Reply
  • If we have always on enabled (sql server 2014 beyond), is it possible to get change tracking information of primary server’s tables on secondary servers?

    Reply
    • Hi there,

      I can’t recall testing this personally, but per the documentation, “Change tracking data must always be obtained from the primary replica. An attempt to access change data from a secondary replica will result in the following error:
      Msg 22117, Level 16, State 1, Line1
      For databases that are members of a secondary replica (that is, for secondary databases), change tracking is not supported. Run change tracking queries on the databases in the primary replica.”

      Reply

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Menu