Wednesday, February 09, 2005

What happens during the online Normal backup?

Mike Lee of the Exchange Team recently posted a good description of how a backup really works. It is worth the read.

Note: the following applies to “streaming” backups, not to the VSS backups.

 

Exchange transaction logs track all the changes to a database. If a database crashes, no harm is done, because the changes made before the crash are always secured in the transaction logs. When you start the database again, the logs contain everything needed to bring it back up to the instant of the crash. This crash recovery mechanism is also handy after restoring from backup. When you restore from backup, the transaction logs let you "roll forward" without losing any changes made since the backup, because the logs record every change made to the database after the backup was taken.

 

Of course, since the logs record everything, they can use up a lot of disk space. Every now and then, you have to delete the old logs. The usual way this is managed is to have the online backup process take care of it. Once you have the database backed up, old logs before the backup can be removed, and they are removed automatically as part of backup. At least, that's the way it worked in Exchange 5.5. In Exchange 2000 and 2003, it's more complex.

 

In Exchange 5.5, either all databases on a server were running or all databases were down. You couldn't pick and choose which ones you wanted to mount. This made it a cinch to figure out which log files it was safe to delete. But in Exchange 2000 and 2003, you might leave a database unmounted for months without backing it up. The Exchange team had to deal with that. So the rules about removing old log files during backup had to change. Here are the new rules:

 

- If you want backup to truncate (remove) any of your old log files, then all databases in the storage group you are backing up must be running when any one of the databases in that storage group is backed up. If even one database is down, no log files will be removed, period.

 

- If you want backup to truncate (remove) old log files on disk in a reasonably prompt way, then you must back up each database in a storage group in a reasonably prompt way. Generally, the oldest log files kept for a storage group will be as old as the last backup of whichever database in that storage group has gone longest without one.

 

So, don't leave a database unmounted for weeks and weeks, back up all (not just some) of your databases frequently, and you'll be fine.
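The two rules above can be sketched in a few lines of Python. This is only an illustrative model, not Exchange code: the `Database` class, its `lowest_required_log` field, and the `first_log_to_keep` function are hypothetical names invented here, and treating a never-backed-up database as blocking all truncation is an assumption drawn from the second rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Database:
    """Hypothetical model of one database in a storage group."""
    name: str
    mounted: bool
    # Lowest log generation this database still needs after its last
    # backup; None models a database that has never been backed up.
    lowest_required_log: Optional[int]


def first_log_to_keep(databases: List[Database]) -> Optional[int]:
    """Return the lowest log generation that must stay on disk.

    Returns None when no truncation is allowed at all.
    """
    if not databases:
        return None
    # Rule 1: a single dismounted database blocks all truncation.
    if any(not db.mounted for db in databases):
        return None
    # Assumption: a never-backed-up database also blocks truncation.
    if any(db.lowest_required_log is None for db in databases):
        return None
    # Rule 2: logs are kept as far back as the most out-of-date database.
    return min(db.lowest_required_log for db in databases)


group = [Database("Mailbox Store", True, 120), Database("Public Store", True, 95)]
print(first_log_to_keep(group))  # logs below this generation could be removed
```

In this model, dismounting either database makes the function return None, which mirrors "no log files will be removed, period."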

 

If you want to know the details about how this works, then keep reading. But you don't have to. Really, this next section is more punishment than enlightenment. But, if you're still here, here's the process that happens when you do an online backup of an Exchange 2000 or 2003 database, or multiple databases in the same backup:

 

1. The backup agent establishes communication and initializes a backup session with the Information Store service on the target Exchange server. (In Exchange 5.5, the backup session was established with the System Attendant process.)

 

2. Depending on database transactions during the backup, the transaction logging checkpoint could be "frozen" during online backup. New changes will still be accepted and written into the database files, but the checkpoint might not move again until the backup session ends. The checkpoint log is the oldest log that has not yet had all its transactions flushed to the database files. Typically, the checkpoint lags three or four log files behind the current transaction log. If the database were to crash suddenly, Exchange knows that logs older than the checkpoint are not needed for recovery. The checkpoint log and all newer log files must be replayed into the database to recover it after a crash. When a backup is restored, the situation is somewhat similar to recovering after a crash - a certain number of log files must be replayed into the database after restoration, and these log files are always backed up with each database.

 

3. The first log that must be copied to tape with the backup is recorded in the database header in the "Current Full Backup" section. This may or may not be the current checkpoint log, depending on the backup status of other databases in the storage group.  However, this will always be the checkpoint log or an older log, never a log newer than the checkpoint.

 

4. Copying of the database files to tape begins. If you are using Exchange 2000 SP1 or earlier, a patch file is generated (database_name.PAT). The patch file holds database changes that occur during backup and are not captured in the log files. During normal operation, all changes to the database can be reconstructed or replayed from the transaction log files. During backup, however, certain complex changes can affect both parts of the database already backed up and parts not yet backed up, and these are not captured in the log files. These changes go in the .PAT file. Improvements made in Exchange 2000 SP2 reduced the number of such changes so much that they can be held in memory for the duration of the backup, even for a very large database backed up over a very long period of time, without ill effect. These changes are now applied to the database files after the backup finishes. If used, the .PAT file is copied to tape after the database files have finished, and then is deleted.

 

Along with capturing changes during backup that logging does not, the patch file also contained important header information used during restoration. When the patch file was dispensed with in SP2, this information was moved into the backed-up database file. Now, as a database file finishes being copied to tape, Exchange adds a single extra page to the very end of it (on tape, but not on disk). Thus, a database in an online backup is always 4 KB larger than the same database on disk. This page is a "mini header" that records the essential information that used to be in the .PAT file header.

 

If you run Eseutil /MH on a database that has been restored from online backup but on which recovery has not yet run, you will see the "mini header" information displayed as the Patch Current Full Backup section.

 

5. The current transaction log file is forced to "roll over" and close immediately after all database files have been copied to tape. This happens regardless of how full the log is. The current log is always named Enn.log (where nn is 00, 01, 02 or 03). When a log closes, it is given a five-digit hexadecimal "generation number." So Enn.log becomes EnnXXXXX.log, and the XXXXX increases by 1 as each new log is generated.

 

The reason the log is forced to roll over is that log files cannot be backed up while they are open. This log needs to be on tape, because it contains operations applicable to the database(s) that were just backed up. Therefore, the log is closed so it can be appended to the tape. You will never see a log file called Enn.log in an online backup set. Only closed, XXXXX-numbered log files are backed up.

 

6. The range of logs needed to reliably recover the backup is copied to tape. This range includes at least all the logs from the frozen checkpoint up through the log that was just forced to close.

 

Note that if all databases are mounted in the storage group and all databases have been selected for backup, then this range of logs will only be from the checkpoint log to the highest available numbered log. But if some databases are dismounted or not all databases are being backed up, then the range of logs copied to tape may reach back before the current checkpoint. In any case, Exchange ensures that all logs needed for replay into the backed up databases will be present on tape. Exchange follows a "better safe than sorry" policy, and always backs up all logs that might possibly be needed to get the database running again after restoration.

 

7. Log files that no database in the storage group needs for recovery are truncated (deleted from disk). The headers of all the databases in a storage group keep track of the last backup time for each database, and which logs were required. If any database in a storage group is dismounted, its header will not be read and Exchange will make no calculations about which log files can be safely deleted. This means that if any database in a storage group is dismounted while any other database is being backed up, no log files will be deleted as a result of the backup. If you want backup to remove log files from disk, you must have all databases in a storage group running while backup occurs, or nothing will ever get deleted. This is true even if you are only backing up a single database. All the other databases must be running, even if they are not being backed up, in order for log truncation to happen. If all databases are mounted, then Exchange will cross-reference all the headers and figure out which log files are really safe to delete.

 

8. The Previous Full Backup section of the database header is updated to reflect the time and log range of the backup that just completed.
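
As a rough illustration of steps 5 and 6, the sketch below builds closed log file names from a generation number and lists the range of logs that would go to tape. The function names `closed_log_name` and `logs_to_back_up` are hypothetical, the uppercase hexadecimal digits are an assumption, and the generation numbers in the example are made up.

```python
from typing import List


def closed_log_name(prefix: str, generation: int) -> str:
    """Build the EnnXXXXX.log name of a closed log.

    prefix is the Enn part (for example "E00"); generation is rendered
    as five hexadecimal digits, as described in step 5. Uppercase hex
    digits are an assumption of this sketch.
    """
    if not 0 <= generation <= 0xFFFFF:
        raise ValueError("generation does not fit in five hex digits")
    return f"{prefix}{generation:05X}.log"


def logs_to_back_up(prefix: str, first_required: int, last_closed: int) -> List[str]:
    """List every closed log from the first required one (the checkpoint
    log or an older log) through the log that was just forced to close."""
    if first_required > last_closed:
        raise ValueError("first required log is newer than the last closed log")
    return [closed_log_name(prefix, g) for g in range(first_required, last_closed + 1)]


# Hypothetical example: first required log is generation 0x1E and the
# log forced to roll over at the end of the backup is generation 0x21.
for name in logs_to_back_up("E00", 0x1E, 0x21):
    print(name)
```

Note that the open log (E00.log in this example) never appears in the list, which matches the point that only closed, numbered logs are backed up.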
