Andrew Alumkal

SQL Server Engineering

Accelerated Database Recovery – Deep Dive Part 1

Accelerated Database Recovery (ADR) is a redesign of the traditional database engine recovery process, introduced in SQL Server 2019 and also available in Azure SQL Database. At a high level, ADR provides three main benefits:

  • Fast and consistent database recovery
  • Instantaneous transaction rollback
  • Aggressive log truncation
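
As a quick orientation before the deep dive: ADR is enabled per database. Here’s a minimal sketch of turning it on (MyDatabase and the PVS_FG filegroup are hypothetical names):

  -- Enable ADR (SQL Server 2019+; enabled by default in Azure SQL Database)
  ALTER DATABASE MyDatabase SET ACCELERATED_DATABASE_RECOVERY = ON;

  -- Optionally, keep the persistent version store on a dedicated filegroup
  ALTER DATABASE MyDatabase SET ACCELERATED_DATABASE_RECOVERY = ON
      (PERSISTENT_VERSION_STORE_FILEGROUP = PVS_FG);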

I completed an initial phase of testing in November 2019 with a write-up that can be found here. Part 1 takes a much deeper dive into ADR, and Part 2 will cover practical testing, useful insights, and troubleshooting.

Traditional Recovery process (ARIES)

Before jumping into the changes with ADR, let’s quickly recap the traditional recovery process. Microsoft SQL Server, IBM DB2, and other database platforms leverage the ARIES recovery algorithm to recover databases. When a user makes a modification to the database, a write-ahead logging (WAL) approach is used to log all operations to the transaction log prior to performing modifications to data pages. Log records must be flushed to disk and hardened before a transaction can commit.

When a data page is requested for a read or write, it’s pulled from disk and cached in memory. Transactions always update data in memory. Periodically, the database uses a checkpoint process to flush all modified (dirty) data from memory back to disk, after guaranteeing that the relevant log records have been flushed to disk first. In the event of a failure, the WAL approach guarantees that all committed transactions can be brought back to a consistent state even if all the modifications remained in memory, since the transaction log can be used to replay all changes.
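
To make this concrete, here’s a minimal sketch that peeks at the log records a modification generates, using the undocumented fn_dblog() function (dbo.Orders is a hypothetical table):

  BEGIN TRANSACTION;
  UPDATE dbo.Orders SET Status = 'Shipped' WHERE OrderId = 42;

  -- The modification is already logged, even though the dirty page may not
  -- have been flushed to disk yet:
  SELECT [Current LSN], Operation, AllocUnitName
  FROM sys.fn_dblog(NULL, NULL)
  WHERE Operation IN (N'LOP_BEGIN_XACT', N'LOP_MODIFY_ROW');

  COMMIT;  -- hardens the log records to disk before the commit returns

In a crash, traditional recovery then proceeds in three phases: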

  • Analysis Phase:
    • SQL Server scans the log forward, starting from the last checkpoint, to determine the state of each transaction at the time SQL Server stopped.
  • Redo Phase:
    • SQL Server scans the log starting from the oldest uncommitted transaction
      • For the portion of the log from the oldest uncommitted transaction to the last checkpoint:
        1. SQL Server acquires locks for all transactions found in the analysis phase (committed or uncommitted)
        2. These locks are required to continue redoing committed transactions and properly undo uncommitted transactions
        3. No actual redo of data modifications needs to be done in this phase, since the checkpoint has guaranteed all data was written to disk up to this point
    • For the portion of the log from the last checkpoint to the end of the log:
      1. SQL Server will redo all data modifications for transactions that were committed prior to the failure
      2. Once the Redo phase is complete, the database is technically online, as it has acquired all the locks necessary for the next phase – Undo. User queries will be blocked if they attempt to access data that’s part of the undo.
  • Undo Phase:
    • SQL Server traverses the log backwards and will undo all data modifications that were flushed to disk but not committed (checkpoints flush pages dirtied by both committed and uncommitted transactions)
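
As a rough way to watch these phases, a database that’s recovering shows up in sys.dm_exec_requests; a minimal sketch:

  -- Recovery progress for databases currently starting up
  SELECT session_id, command, percent_complete, total_elapsed_time
  FROM sys.dm_exec_requests
  WHERE command = N'DB STARTUP';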

Based on this design, the database engine can take a long time to recover, especially if there were long-running active transactions (such as a bulk insert) at the time of failure, as these need to be undone. The undo time is proportional to the length of the open transaction. This is also true when rolling back a large transaction, as it uses the same undo process described above. In addition, the transaction log grows during long-running transactions and cannot be truncated, since it’s needed for the recovery process.
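
The truncation holdup is visible directly in sys.databases; with a long-running open transaction, log_reuse_wait_desc typically reports ACTIVE_TRANSACTION:

  SELECT name, log_reuse_wait_desc
  FROM sys.databases
  WHERE name = N'MyDatabase';  -- hypothetical database name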

Multi-version concurrency control (MVCC)

ADR’s recovery algorithm is still based on ARIES, but it also leverages multi-version concurrency control (MVCC) to instantly roll back transactions and aggressively truncate the transaction log.

MVCC was introduced in SQL Server 2005 to support snapshot/RCSI transactions. At a very high level:

  • When a row is modified, the previous version of the row is stored in an append-only version store in tempdb. Further updates to the same row generate newer versions, which are chained together with the older ones.
  • This prevents readers from having to take locks when reading data pages. The last committed version of each row is available in the version store – guaranteeing readers the latest committed data.
  • When no transaction needs an old version any longer, it’s aggressively cleaned up from the version store.
  • Given that versions are only required for active open transactions, the version store doesn’t need to be preserved across restarts – which is why it’s stored in tempdb. Operations against the version store are also not logged, making it very efficient.
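
A minimal sketch for observing this tempdb-based version store (the database name is hypothetical):

  -- Enable row versioning for the database
  ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

  -- While versioned transactions are active, tempdb holds the row versions:
  SELECT database_id, reserved_page_count, reserved_space_kb
  FROM sys.dm_tran_version_store_space_usage
  WHERE database_id = DB_ID(N'MyDatabase');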

Accelerated Database Recovery (ADR)

Let’s take a look at all the different pieces of ADR and see how it compares to the traditional ARIES approach.

Persistent Version Store (PVS)

At a high level, the persistent version store works the same way as I described in the MVCC section.

When a row is updated:

  1. The previous copy of that row is moved to the PVS
  2. All activity against the PVS is logged in the transaction log, unlike the tempdb-based MVCC version store. This makes all data in the PVS recoverable and accessible even after a crash.
  3. Other queries can concurrently read the previous version of the row from the PVS if needed
  4. If the data modification commits successfully, the version can be discarded from the PVS as long as no other active transaction needs it
  5. If the data modification needs to be rolled back, the transaction is simply marked as aborted, and all queries carry on accessing the version from the PVS.

This method is especially helpful in the presence of long-running transactions, since rollbacks are now virtually instant. SQL Server no longer needs to ‘undo’ all the changes (which can take as long as the transaction itself); it simply redirects all queries to access the last committed version of the row from the PVS.
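
A minimal sketch of what that looks like in practice (dbo.BigTable is a hypothetical large table):

  BEGIN TRANSACTION;

  DELETE FROM dbo.BigTable;  -- long-running, heavily logged modification

  ROLLBACK TRANSACTION;      -- with ADR this returns almost immediately: the
                             -- transaction is marked aborted, and readers are
                             -- served the last committed versions from the PVS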

There are two types of version stores for PVS:

  • Off-row version store
  • In-row version store

Off-row version store

The off-row version store works pretty much the same way as described earlier. An entire copy of the row is stored separately in this version store.

Key takeaways:

  • This version store is a single table (with no indexes) that maintains versions for all user tables in the database
  • Generating a version is effectively an insert into this table
  • Versions are accessed by their physical locator (page id, slot id)
  • All activity against the table uses the regular WAL logging mechanism
  • When older versions are no longer needed, they are deleted from the table and the space is deallocated
  • Space is always pre-allocated to avoid having to perform allocations as part of generating a version
  • It’s highly optimized for concurrent inserts – it’s apparently partitioned per core, so threads running in parallel can insert rows into different sets of pages to eliminate contention
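
PVS usage can be monitored with a dedicated DMV; a minimal sketch:

  SELECT DB_NAME(database_id) AS database_name,
         persistent_version_store_size_kb,  -- current size of the PVS
         current_aborted_transaction_count  -- transactions tracked in the ATM
  FROM sys.dm_tran_persistent_version_store_stats
  WHERE database_id = DB_ID();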

In-row version store

The in-row version store is a new optimization introduced as part of ADR. Instead of creating a version of the row in the off-row version store, the previous version is stored within the data page being updated.

Key takeaways:

  • In the case of an update, if the modification to the row is small (how small is still a mystery…), the existing row is updated in place, and a diff is computed and stored along with the row.
  • If the previous version of the row is needed, it can be reconstructed using the diff
  • In the case of a delete, the row is simply marked as deleted – the previous version is still available on the data page if needed
    • With the traditional MVCC + tempdb approach, the previous version of the row is stored in the tempdb version store and the row on the data page is marked with a stub indicating the row was deleted

Benefits:

  • Reduces storage overhead since it can be stored along with the modified row – the diff is smaller than a whole copy of the row
  • Reduced transaction logging
    • The version is logged together with the data modification
    • The amount of log generated only increases by the size of the diff
    • This is much better than logging a complete copy of the record off-row separately in the PVS
  • Reads are faster and use less of the buffer pool, since the version exists in the data page itself and doesn’t need to be read from a separate page in the PVS

Overhead:

  • Computing and reapplying diffs require additional CPU cycles
  • It can negatively impact the performance of the system as adding the diff can increase the size of rows in the data pages and lead to page splits – which is an expensive operation, both in terms of reads and writes
  • According to the ADR whitepaper:
    • To mitigate this issue, the size of in-row versions is capped, both in terms of the size of the diff and the size of the row it can be applied to. When these exceed certain thresholds, PVS falls back to generating off-row versions on a different page

Logical Revert

Logical revert is the process of version-based undo, providing instant transaction rollback. Rollback is performed by using the PVS to retrieve the last committed version of each record. Because the engine no longer has to physically undo the entire transaction, all locks are released immediately.

Short transaction optimization: For OLTP workloads – a high volume of short transactions – tracking every rolled-back transaction in the Aborted Transaction Map (ATM) is an overhead. Because of this, ADR dynamically decides, based on the transaction size, whether to use an ADR-based logical undo or the traditional undo. This keeps overhead on the ATM low, and versions can be removed more aggressively.
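
The ATM is exposed through a DMV; a minimal sketch:

  -- Transactions currently tracked in the Aborted Transaction Map
  SELECT transaction_id, database_id, nest_aborted
  FROM sys.dm_tran_aborted_transactions;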

System Operation Optimizations with ADR

These are operations used to maintain internal data structures, such as allocating/deallocating pages and page splits – during a large data load, for example. With ADR, system transactions such as page allocations are short-lived and commit immediately. If a failure occurs, ADR does not roll back these operations, unlike the traditional method; the allocated space and updated data structures are lazily cleaned up in the background.

Slog

Logical operations such as lock acquisitions are now logged in a secondary in-memory log stream known as the Slog. In the example of a data type modification on a large table, the transaction log will contain a massive number of records to update the underlying column data, but the Slog will only store a tiny number of records covering the lock acquisitions.

Since the amount of log stored in the Slog is extremely small, this optimization helps in the new recovery process.


ADR Recovery process

  • Analysis Phase:
    • This phase remains the same as in the traditional recovery process, but it also reconstructs the Slog in memory, starting from the last checkpoint
  • Redo Phase:
    • For the portion of the log from the oldest uncommitted transaction to the last checkpoint:
      • Redo processes only the Slog to reacquire all necessary locks. Since these log records are tiny, this is done almost instantly
    • For the portion of the log from the last checkpoint to the end of the log:
      • Redo follows the regular process of redoing all operations from the transaction log
      • Since the system takes checkpoints frequently, this portion of the log will usually be bounded
  • Undo Phase:
    • Undo uses logical revert to point queries back to the versions in the PVS, and it only has to undo a small subset of logical operations (tracked in the Slog), making it almost instant.

This improved recovery process makes the redo phase much more efficient and the undo phase almost instantaneous. And with the addition of the Slog, logical revert, and the system operation optimizations, the transaction log can be aggressively truncated.
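
A minimal sketch for checking how aggressively the log is being truncated:

  SELECT total_log_size_mb,
         active_log_size_mb,           -- active portion that cannot be truncated
         log_truncation_holdup_reason  -- why truncation is currently held up, if at all
  FROM sys.dm_db_log_stats(DB_ID());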

Cleanup

Since all data modifications generate new row versions, a cleanup process runs in the background to clean up these versions. In the case of successfully committed transactions, the versions in the version store are only required for reads by RCSI/snapshot transactions. Once those transactions complete, the versions can be removed immediately, as they are no longer needed.

In the case of aborted transactions, the versions generated in the PVS are the ‘true’ committed data and need to be brought back to the data page at some point. The cleanup process differs based on which version store is used.

Off-row version cleanup

Off-row version cleanup is straightforward at a high level. The versions from the version store are brought back to the data page and cleaned up from the version store. This process is logged in the transaction log.
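
Cleanup normally runs automatically in the background, but it can also be kicked off manually; a minimal sketch (the database name is hypothetical):

  -- Manually trigger PVS cleanup for a database
  EXEC sys.sp_persistent_version_cleanup N'MyDatabase';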


In-row version cleanup

For this cleanup, SQL Server traverses data pages with in-row versions and updates each page by overwriting the aborted row with the last committed version. Once complete, the diff can be cleaned up.

Optimization to overwrite aborted transactions

For cases where a new transaction updates a data page that holds an existing aborted version, it would be too expensive to run an on-demand cleanup to revert the row to the last committed version before the update. For these cases, there is a built-in optimization that directly overwrites the aborted version with the new version while still pointing back to the last committed version.

