Along with Slow Changing Dimensions, the Change Data Capture is one of the most important mechanisms to handle changes. While Slow Changing Dimensions define how you want to treat changes in you Dimensions attributes, Change Data Capture allows you to define how you want to handle the extraction of changed records from the source systems.
As you can imagine, the extraction of data from operational sources is very important in Extract, Transform and Load operations using Integration Services or any other provider and this is because you do not want to process all the data again when you run your packages, you just want to process the records that had suffered changes (inserted, Updated and Deleted). If you schedule the execution of you ETL process every day at 2 in the morning for the processing of a four year operational system database, you don´t want to process the data from those four years, you just want to process the records that have changed since the last night, this way your ETL process will be faster and much more efficient.
This way I will show you how you can enable this change tracking mechanism in SQL Server 2008+ editions. Let’s start by opening the SQL Server Management Tools to run some queries. The first step is running the following SQL Query script. In your case, you just need to change the database name. the sp_cdc_enable_db is the store procedure used by SQL Server to enable CDC in the current database.
USE MYDB; -- DATABASE_NAME GO --Enable CDC on the database EXEC sys.sp_cdc_enable_db; GO
After running the previous command, SQL Server will create a special schema “cdc” which will be used to store all the object it needs to manage the mechanism. In these objects you will be able to find all your “shadow tables”. You are now able to check if the CDC is correctly enabled for that particular database. To do it, you can run the following SQL query which will retrieve the name and the value of the “is_cdc_enabled” attribute for your current database.
USE MYDB; -- DATABASE_NAME GO --Check CDC is enabled on the database SELECT name, is_cdc_enabled FROM sys.databases WHERE database_id = DB_ID();
In order to the CDC mechanism work you need to make sure that your SQL Agent is running for the target SQL Server instance. You can activate in directly from the SQL management Tools or using the SQL Server Configuration Manager.
Now it’s time to show you how you can start monitoring the changes of a table. Has I have already told you, this will create a shadow tables in the “cdc” schema with all your source table columns plus some special “cdc” ones.
The mechanism will than work in a very single algorithm, when a record of each data manipulation language (DML) operation applied to the table is written to the transaction log, the change data capture process retrieves this information from the log and writes it to these shadow tables that are accessed by using a set of functions.
To enable it in a table you need to run the stored procedure “sys.sp_cdc_enable_table” with some special parameters as you can see in the following code block.
USE MYDB; GO --Enable CDC on a specific table EXECUTE sys.sp_cdc_enable_table @source_schema = N'dbo' ,@source_name = N'MY_SOURCE' ,@role_name = N'cdc_Admin' ,@capture_instance = N'MY_TABLE' ,@supports_net_changes = 0;
This stored procedure as you can see has multiple parameters. In the previous example I have just used a set of them, and you have other that you can use to enhance your CDC mechanism over a table. Lets understand them all:
sys.sp_cdc_enable_table [ @source_schema = ] 'source_schema', [ @source_name = ] 'source_name' , [ @role_name = ] 'role_name' [,[ @capture_instance = ] 'capture_instance' ] [,[ @supports_net_changes = ] supports_net_changes ] [,[ @index_name = ] 'index_name' ] [,[ @captured_column_list = ] 'captured_column_list' ] [,[ @filegroup_name = ] 'filegroup_name' ] [,[ @allow_partition_switch = ] 'allow_partition_switch' ]
- Source schema: schema in which the source table belongs
- Source Name: source table on which to enable change data capture
- Role Name: database role used to gate access to change data
- Capture Instance: Name of the capture instance used to name instance-specific change data capture objects. If not specified, the name is derived from the source schema name plus the source table name in the format schemaname_sourcename.capture_instance
- Supports Net Changes: The
supports_net_changesoption enables you to retrieve only the final image of a row, even if it was updated multiple times within the time window you specified. This parameter can only be activated if you have a primary key
defined in the source table. (1- enabled; 0 – not enabled)
- Index Name: unique index to use to uniquely identify rows in the source table.
- Captured Column List: Source table columns that are to be included in the change table. captured_column_list is nvarchar(max) and can be NULL. If NULL, all columns are included in the change table
- Filegroup Name: Filegroup to be used for the change table created for the capture instance.
- Allow Partition Switch: Indicates whether the SWITCH PARTITION command of ALTER TABLE can be executed against a table that is enabled for change data capture. Default =1
Now you can use several commands to evaluate if CDC if correctly working for your table. The first option is to use the is_tracked_by_cdc attribute in the “tables” system table of SQL Server. To use it run the following command which has you can see only shows you if CDC is running or not.
USE MYDB; GO --Check CDC is enabled on the table SELECT [name], is_tracked_by_cdc FROM sys.tables WHERE [object_id] = OBJECT_ID(N'dbo.MY_TABLE');
Another option that will give you more information on the tracking is using the built in CDC stored procedure sp_cdc_help_change_data_capture which you give you a lot more information like which columns are being tracked, the capture instances available (You can have two per table) and some other that you can explore. To run it, you can use the following code:
USE MYDB; GO <span style="font-family: Consolas;">--Use the built-in CDC help procedure to get more information EXECUTE sys.sp_cdc_help_change_data_capture @source_schema = N'dbo', @source_name = N'MY_TABLE'; GO</span>
With the source tables being tracked you can now take advantage of the shadow tables directly from SQL Server, because it allows you to query them and see the changes that have been occurring. The CDC mechanism will create and name these shadow tables with a standard: “Source table name” + “_CT”, like for instance “myTable_CT”, which means that by knowing the source table name you also know the shadow table name. To query it you can use a simple SQL Select like the following:
SELECT * FROM cdc.MY_TABLE_CT
At this stage and because we haven´t made any changes into the source table your result should be an empty table. Now try to make a simple DML update against the source table and then re-execute the previous query.
The columns is this “shadow” table are very important, for example to use in Integration Services because it will use them to know for instance what operations has a record had. Let’s understand all of them. The __$start_lsn and __$seqval columns identify the original transaction and order in which the operations occurred. The __$operation column shows the source operation that caused the change (1 = delete, 2 = insert, 3 = update (before image), 4 = update (after image), and 5 = merge). The __$update_mask column contains a bit mask indicating which specific columns changed during an update.
That’s it. Your CDC mechanism is working and you can use it as you need.