Andrew Alumkal

SQL Server Engineering

CDC Performance Insights

I completed some basic testing of CDC capture performance to get a baseline on how fast CDC processes changes and makes them available in the target CDC table.

This test isn’t a deep dive, and performance can change drastically based on several factors:

  • Number of transactions and transaction type/size (several updates in a single transaction vs. one transaction at a time)
  • CDC capture settings (maxtrans / maxscans / polling interval) – these are all settings that can be tuned based on the workload above
  • Columns captured on the tables
  • Number of different CDC-enabled tables
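For reference, the capture job settings mentioned above can be inspected from the documented msdb.dbo.cdc_jobs table; a minimal sketch:

```sql
-- Inspect the CDC capture/cleanup job settings for this instance.
-- maxtrans: max transactions processed per scan cycle (default 500)
-- maxscans: max scan cycles per capture run (default 10)
-- pollinginterval: seconds between log scans when not running continuously (default 5)
SELECT job_type,
       maxtrans,
       maxscans,
       continuous,
       pollinginterval
FROM msdb.dbo.cdc_jobs;
```

EXEC sys.sp_cdc_help_jobs; returns the same information if you prefer the stored procedure.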

There’s an excellent whitepaper on this topic if you want more information.

Test setup:

  • All tests were done on a 64 core / 700 GB memory machine
  • Tests were done against a table named ‘Listing’ (100+ million records)
  • Default capture settings (which we currently have in prod)
  • Inserts/updates were not batched – each insert/update had its own transaction scope, and all of them were candidates to be captured by CDC
  • The capture job was caught up and then stopped prior to starting each test; sys.dm_cdc_log_scan_sessions was used to capture accurate start/end times
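As a sketch of how timings like these can be pulled, using the documented columns of sys.dm_cdc_log_scan_sessions (duration and latency are reported in seconds):

```sql
-- Most recent log scan sessions, with an approximate processing rate.
-- tran_count is the number of committed CDC transactions processed
-- by the session; trans_per_sec is a rough derived rate, not a DMV column.
SELECT TOP (10)
       session_id,
       start_time,
       end_time,
       duration,
       tran_count,
       latency,
       CASE WHEN duration > 0
            THEN tran_count * 1.0 / duration   -- rough transactions/sec
            ELSE NULL
       END AS trans_per_sec
FROM sys.dm_cdc_log_scan_sessions
ORDER BY start_time DESC;
```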

Key Takeaways:

  • CDC skips non-CDC-related transactions very efficiently (Tests 1, 2, 6).
    • (03/31/2021 Update): I did hit an issue with a user DB where high-throughput inserts that generate a massive amount of log (500 GB of log generated/day) slowed down CDC. The table that generated the log was a non-CDC table, which invalidates Test #6. The test performed here may not have pushed the log enough to uncover the delay. The fact that SQL did skip 40 GB of log (Test #1) may point to a mechanism where large single transactions / log blocks are skipped efficiently vs. thousands of single transactions that need to be processed by CDC.
  • Processing duration seems to increase linearly for the most part (10x inserts = 10x delay), which means the rate of processing seemed consistent based on these tests (Tests 3, 4, 5).
  • The CDC baseline from these tests seems to be ~800 CDC transactions processed per second.
  • This makes CDC prone to additional delays when there are large spikes / massive updates that produce transactions above this baseline.
  • CDC delay impacts all CDC tables, since a single capture job scans the log for every capture instance in the database. We can’t have one SLA for Table 1 and another for Table 2 (Test 7).

There are a bunch of knobs that can be tweaked based on workload (maxtrans / maxscans / polling interval). I haven’t tested these, as the tests are time consuming and tuning is really an iterative exercise based on the workload.
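As an illustration (not something tested here), these knobs can be adjusted with the documented sys.sp_cdc_change_job procedure; the values below are placeholders, not recommendations:

```sql
-- Raise the per-scan transaction limit and shorten the polling interval
-- for the capture job. Values are illustrative placeholders only.
EXEC sys.sp_cdc_change_job
     @job_type = N'capture',
     @maxtrans = 1000,        -- default is 500 transactions per scan cycle
     @maxscans = 10,          -- default is 10 scan cycles per run
     @pollinginterval = 2;    -- default is 5 seconds between scans

-- Restart the capture job so the new settings take effect.
EXEC sys.sp_cdc_stop_job  @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'capture';
```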

Potential outages for CDC:

  • SQL Agent issues
  • CDC bug
  • Data loss if a CDC capture instance needs to be rebuilt because of one of the issues above (the prior capture instance could be dropped and changes skipped)


© 2024 Andrew Alumkal
