Programs may return a code when they exit. In C this is the value that main returns. By convention, success is 0, failure is non-zero.
Apparently, however, the developers of Robocopy, which is otherwise an exemplary and indispensable utility, didn’t get this memo, and that caused a problem that plagued me for 2.5 years.
A bit of background here:
When a database is modified, the database engine writes the changes into the database proper, and also logs the change in aptly-named log files. It’s usual to put the log files on a separate physical volume from the database proper.
This way, if the volume containing the database proper fails, you can take your last backup, merge all the changes recorded in the log files into it, and it’s like nothing ever happened.
The issue here is that the database engine has to know that a backup occurred, at which point it can purge all log files (since you have the database in a consistent state with the transactions in the logs merged in, which makes them irrelevant and unnecessary).
Let’s look under the hood a bit here. Windows has the ability to make shadow copies, which are basically snapshots of a drive. So you take a snapshot (i.e. make a shadow copy) of a drive, and then you can copy off of it while the drive is still being used.
This is a great way to back up production servers. But what if you take a snapshot while a file to be backed up is in an inconsistent state? And if you create a shadow copy and then copy it, how does a database engine know you created a backup?
For this reason, database engines have a VSS writer, which the shadow copy service can contact and notify about copies. When you create a shadow copy, the writer is notified, and can delay the snapshot until it has placed the files it controls in a consistent state. When you use a shadow copy to make a backup, the writers are notified and can purge logs.
The way you can do this simply and easily is with a command line utility present in Server 2008 and Server 2008 R2 — DiskShadow.
DiskShadow can operate unattended by running from a DSH file, which is essentially a script that runs within DiskShadow.
Here’s what one looks like:
set verbose on
#delete shadows all
set context persistent
writer verify {76fe1ac4-15f7-4bcd-987e-8e1acb462fb7}
set metadata C:\Backup_Scripts\shadowmetadata.cab
begin backup
add volume C: alias SH1
create
expose %SH1% P:
exec C:\Backup_Scripts\exchangeserverbackupscript1.cmd
end backup
delete shadows exposed P:
exit
(A detailed explanation of the above can be found here.)
The important lines are “begin backup” and “end backup”. These cause the VSS writers to be notified and, assuming success, “end backup” causes the log files to be purged automatically.
However, Robocopy returns 1 when it has copied files successfully. When it did so, the script containing it returned 1 as well, so DiskShadow thought the backup had failed. Exchange, which is what was being backed up (its writer has the identifier seen above), therefore believed that I didn’t have a copy of the database in a consistent state, and kept every log file for 2.5 years, which accumulated to 45 gigabytes.
For comparison, the database itself is only ~5 gigabytes.
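To make the mismatch concrete, here is a minimal Python sketch of how Robocopy’s exit code can be read. It assumes the bit meanings given in Microsoft’s Robocopy documentation (the code is a bitmask, and values below 8 indicate that nothing failed). The names ROBOCOPY_FLAGS, describe_robocopy_code and to_conventional_exit_code are hypothetical, made up for this illustration, and are not part of any backup script.

```python
# Assumption: bit meanings as described in Microsoft's Robocopy documentation.
ROBOCOPY_FLAGS = {
    1: "one or more files were copied",
    2: "extra files or directories exist in the destination",
    4: "mismatched files or directories were found",
    8: "some files or directories could not be copied",
    16: "fatal error, no files were copied",
}


def describe_robocopy_code(code):
    """Return the list of conditions encoded in a Robocopy exit code."""
    if code == 0:
        return ["no files were copied and nothing failed"]
    # Each set bit in the exit code contributes one condition.
    return [text for bit, text in ROBOCOPY_FLAGS.items() if code & bit]


def to_conventional_exit_code(code):
    """Map a Robocopy exit code to the usual 0 = success, 1 = failure."""
    # Bits 8 and 16 are the only ones that signal a failure.
    return 1 if code >= 8 else 0


if __name__ == "__main__":
    for code in (0, 1, 3, 8, 16):
        print(code, to_conventional_exit_code(code), describe_robocopy_code(code))
```

Under these assumptions, an exit code of 1 is the normal result of a backup that copied something, which is exactly the case DiskShadow was treating as a failure.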
The solution? Filter Robocopy’s return codes, as follows:
robocopy "P:\Program Files\Microsoft\Exchange Server\Mailbox\First Storage Group" "\\leahyfs\J$\E-Mail Backups\Day 1" /MIR /R:0 /W:0 /COPY:DT /B
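rem Robocopy exit codes are a bitmask: below 8 means nothing failed, 8 or higher means at least one failure.
IF ERRORLEVEL 8 exit /B 1
exit /B 0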
You’ve been warned…