I completed some basic testing of CDC capture performance to get a baseline on how quickly CDC processes changes and makes them available in the target CDC change tables.
This isn't a deep dive, and performance can change drastically based on several factors:
- Number of transactions and transaction type/size (several updates in a single transaction vs. one transaction per statement)
- CDC capture settings (maxtrans / maxscans / polling interval); these can all be tuned based on the workload above (see the sketch after this list)
- Columns captured on the tables
- Number of CDC-enabled tables
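For reference, here's a minimal sketch of how to inspect those settings; it assumes a CDC-enabled database and read access to msdb (the database name is a placeholder):

```sql
-- View the current capture/cleanup job settings.
-- maxtrans, maxscans, and pollinginterval are the knobs mentioned above.
USE YourCdcDatabase;  -- placeholder name
EXEC sys.sp_cdc_help_jobs;

-- Or query the job metadata in msdb directly:
SELECT job_type, maxtrans, maxscans, continuous, pollinginterval
FROM msdb.dbo.cdc_jobs;
```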
There’s an excellent whitepaper on this topic if you want more information.
For the test setup:
- All tests were done on a 64-core / 700 GB memory machine
- Tests were run against a table named 'Listing' (100+ million records)
- Default capture settings (which we currently have in prod)
- Inserts/updates were not batched; each insert/update ran in its own transaction scope. All inserts/updates were candidates to be captured by CDC
- The capture job was caught up and then stopped prior to starting each test. sys.dm_cdc_log_scan_sessions was used to capture accurate start/end times (see the sketch after this list)
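To illustrate the methodology, a rough sketch of the stop/measure pattern used per test (the database name is a placeholder; sys.sp_cdc_stop_job / sys.sp_cdc_start_job and the DMV are standard CDC objects):

```sql
USE YourCdcDatabase;  -- placeholder name

-- Let the capture job catch up, then stop it before generating the test load.
EXEC sys.sp_cdc_stop_job @job_type = N'capture';

-- ... run the insert/update workload here, one transaction per statement ...

-- Restart capture and let it chew through the backlog.
EXEC sys.sp_cdc_start_job @job_type = N'capture';

-- Accurate start/end times and transaction counts per log scan session:
SELECT session_id, start_time, end_time, duration, tran_count, command_count, latency
FROM sys.dm_cdc_log_scan_sessions
ORDER BY start_time DESC;
```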

Key Takeaways:
- CDC skips non-CDC-related transactions very efficiently (Tests 1, 2, 6).
- (03/31/2021 Update): I did hit an issue with a user DB where high-throughput inserts that generated a massive amount of log (500 GB of log per day) slowed down CDC. The table that generated the log was a non-CDC table, which invalidates Test #6; the test performed here may not have pushed the log hard enough to uncover the delay. The fact that SQL did skip 40 GB of log (Test #1) may point to a mechanism where large single transactions / log blocks are skipped efficiently, versus thousands of small transactions that each need to be examined by CDC.
- Processing duration seems to increase roughly linearly (10x inserts = 10x delay), which means the rate of processing was consistent across these tests (Tests 3, 4, 5).
- The CDC baseline from these tests seems to be ~800 CDC transactions processed per second (see the throughput query after this list)
- This makes CDC prone to additional delays when there are large spikes / massive updates that produce transactions above that baseline
- CDC delay impacts all CDC tables. We can’t have one SLA for Table 1 and another for Table 2 (Test 7)
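To reproduce the throughput estimate on another workload, a hedged way to derive transactions processed per second from the same DMV (skipping empty or instantaneous sessions to avoid dividing by zero):

```sql
-- Approximate CDC throughput per log scan session (duration is in seconds).
SELECT session_id, start_time, end_time, tran_count, duration,
       tran_count * 1.0 / duration AS tran_per_sec
FROM sys.dm_cdc_log_scan_sessions
WHERE duration > 0
  AND tran_count > 0
ORDER BY start_time DESC;
```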
There are a bunch of knobs that can be tweaked based on workload (maxtrans / maxscans / polling interval); a sketch of how to change them follows. I haven't tested these, as the tests are time-consuming and tuning is really an iterative exercise based on the workload.
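For completeness, here is what tuning those knobs looks like; the values below are illustrative only, not recommendations (defaults are maxtrans = 500, maxscans = 10, pollinginterval = 5), and changes only take effect after the capture job is restarted:

```sql
EXEC sys.sp_cdc_change_job
    @job_type        = N'capture',
    @maxtrans        = 1000,  -- transactions processed per scan cycle
    @maxscans        = 100,   -- scan cycles per capture pass
    @pollinginterval = 2;     -- seconds between log scans in continuous mode

-- Restart the capture job so the new values take effect.
EXEC sys.sp_cdc_stop_job  @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'capture';
```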
Potential outage scenarios for CDC:
- SQL Agent issues
- CDC bug
- Data loss if the CDC capture instance needs to be rebuilt because of one of the issues above (the prior CDC instance could be dropped and its changes skipped)
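A minimal monitoring sketch for catching a couple of these failure modes; the job-name pattern follows the standard cdc.&lt;database&gt;_capture convention:

```sql
-- Recent errors raised by CDC log scan sessions (empty when healthy).
SELECT entry_time, error_number, error_severity, error_message
FROM sys.dm_cdc_errors
ORDER BY entry_time DESC;

-- Check whether the SQL Agent capture job is currently running.
-- (sysjobactivity can have one row per agent session; filter to the latest as needed.)
SELECT j.name, ja.run_requested_date, ja.stop_execution_date
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.sysjobactivity AS ja
  ON ja.job_id = j.job_id
WHERE j.name LIKE N'cdc.%[_]capture';
```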