Triple Check Your Restores

I used to work at a nuclear power plant. Not actually in the plant, but as a network administrator in the building next door. That was probably a good thing, since I struggled to get to work on time and everyone going into the plant had to pass through a metal detector like those at most airports. My tenure might have been shorter if I had been late to my desk every day.

However, there was one thing that got drilled into me by everyone I knew who did work closely with the power generation side of the business: everything was (at least) triple redundant. Not only did they want a backup for a component or a system, they wanted a backup for the backup. There was a limit to the paranoia, but in critical places, where radiation was concerned, we had multiple backups. One notable, and impressive, area was the power for the control rooms and pumps. In addition to batteries for short-term power loss, there was a large diesel generator for each of our two reactors, plus a third that could take the load if either of the first two failed. Those were impressive engines, each about the size of a very large moving truck and jump-started with a few dozen large canisters of compressed air that could spin the crankshaft in a split second.

This week there was a report that the database for the US Air Force Automated Case Tracking System had crashed. Apparently the database became corrupted, which happens. However, the surprising part of this story is that the company managing the system reported they didn't have backups and had lost some data going back to 2004. They are looking to see if there are copies in other places, which I assume might mean exports, old backups, or something else, but the reports make this seem like a completely unacceptable situation. I assume this is an RGE for a few people, perhaps all of the staff working on the system.

I was reminded of my time at the nuclear plant because we had a similar situation. We didn't lose any data, but we found a backup system hadn't been working for months. In those days we had a tape drive system that automatically rotated 7 tapes. I think this would last us about 4 or 5 days, so it was a once-a-week job for one administrator to pull the used tapes and replace them with new ones. We had a system where tapes were used 4 or 5 times before being discarded, and our rotation had a given tape being reused every 3-4 months. However, the person managing the system rarely restored anything.

One day we decided to run a test. I think this was just my boss giving us some busy work to keep us occupied but in a useful way. When we went to read a tape, it was blank. Assuming this was just a mix-up, we grabbed one of the tapes from the previous day and tried it.

Blank.

At this point, my coworker turned a bit red and started to stress. He was in his 40s, with a family and mortgage. I was in my early 20s and had no responsibility here, but I could appreciate his concern. We frantically loaded tape after tape, even looking at the oldest tapes we’d just received from our off-site provider. None were readable, and most were blank. We nervously reported this to our boss, who had us request a sample of tapes from off-site storage going back over 6 months.

Eventually we realized that we hadn’t had any backups for about 4-5 months. The tape drive had stopped working properly, hadn’t reported errors, but dutifully kept retrieving files and rotating tapes each week, unable to properly write any data. No databases, no email, no system was being backed up.

We placed a rush order with our computer supplier on the first day to get two working tape drives; we manually loaded tapes into them each day and checked them the next morning. Eventually we replaced the drive in our tape loader and instituted random weekly restores to be sure we had working backups. I'm not sure if the plant manager or upper IT management was ever told, but I'm glad we never had to deal with a hard drive crash during that period.

Backups are something we all need to perform. I consider them the #1 task a new DBA or sysadmin should take care of on their systems. However, backups are only good if you can read them and actually restore data. I've made it a point to regularly practice restores as a DBA, randomly restoring backups with diffs, logs, or to a point in time. Not only do I test the backup, but I test my skills. I've also tried to keep an automated process around that restores all production systems to another server to test the restore as well as run a DBCC CHECKDB. Corruption can live in databases for a long time. It flows through backups, at least in SQL Server, and this is something to keep in mind.
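As a rough sketch of that kind of automated check, the job boils down to a restore under a test name followed by a consistency check. The database name, backup file, and file paths below are invented for illustration, so swap in your own:

-- Restore the latest production backup under a test name (hypothetical names and paths)
RESTORE DATABASE Sales_RestoreTest
 FROM DISK = N'D:\Backups\Sales_Full.bak'
 WITH MOVE 'Sales_Data' TO N'D:\RestoreTest\Sales_RestoreTest.mdf',
      MOVE 'Sales_Log' TO N'D:\RestoreTest\Sales_RestoreTest.ldf',
      REPLACE, STATS = 10;

-- Check the restored copy, since corruption flows through backups
DBCC CHECKDB (Sales_RestoreTest) WITH NO_INFOMSGS, ALL_ERRORMSGS;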

I'd suggest you ensure your backup plan is actually working by performing a few restores. Build an automated process, but also run some manual restores periodically. You want to be sure that you can really recover data in the event of an emergency.

Steve Jones

The RGE

I first heard this little acronym from Grant Fritchey (b | t). He used it when talking about backups and restores, and I like it. However, I realize I've never actually explained what it is, so here's a short blog post to do that today.

An RGE is a Resume Generating Event. This is usually when you make a mistake so egregious that you’ll be packing up your personal effects and exiting the building. If it’s really bad, such as releasing financial or other confidential information, you might be escorted out and someone else packs up your things. I’ve seen it happen, and it will shake you. Don’t do this.

We talk about forgetting backups, writing bad code, or botching some other important task as causes of an RGE. In my experience, that doesn't happen too often. Companies usually have a fairly high tolerance for mistakes.

However, that tolerance is usually extended only once. Don't make the same mistake again. I'd also note that some managers can be very short-tempered, and a single large issue might be an RGE in their eyes.

I don’t usually worry about causing an RGE, but I keep the acronym in mind. Especially when I do something that could affect the core parts of my organization’s business.

Back Home, Despite a Scare

I've spent the last 11 days traveling in Europe, returning home late yesterday. It wasn't a holiday or even a corporate boondoggle: this was 3 cities, 2 events at which I spoke, travel by planes, trains, and automobiles, a minor sinus infection, and moves across 6 hotels. All with just this luggage:

[photo: the luggage from the trip]

Packed too full, and heavy, but I made it.

I returned in the late afternoon and, in between spending time with family, booted up my desktop, where I was surprised to find Dropbox giving me an error. As I dug in, I found that one of my SSDs, containing VMs, my local Dropbox folders, and a few other things, wasn't visible in Windows. Not what I wanted to deal with on my first night back.

I shut down the system and rechecked connections this morning, then rebooted to find things working, but it was a good reminder to double check my backups and ensure that I’ve got copies of data in case I do lose an SSD.

Losing All Traces of Data

I was reading a thriller recently in which a businessperson had their child threatened if they didn't get some data for the criminals. Once the person had retrieved the data, they were told to delete it from the system and from all backups. Of course, they could do this, all in a few paragraphs of a novel. I'm sure plenty of laypeople read this passage and accepted it as a possibility.

I certainly understand how a user might be able to delete data from a system, especially in many third-party applications that are poorly written but have sold well. However, could someone actually delete all information from backups? I'd say that in most of the companies I've worked in, this wouldn't be possible. If the information was of any age, it would be stored in multiple locations on different media, some of which would be offline.

However, I haven't worked lately in larger enterprises, where many companies have moved to disk-based backups, with systems connected together and managing versions. I suspect that in some of these environments it is possible to remove all traces of data from the organization, which isn't something I'd ever want to be possible, if for no other reason than that it is an incredible attack vector for ransomware, a malicious virus, or some other destructive process (including rm -rf). There's also the issue of mistakes made by users; should they be able to remove all traces of data?

There may be valid reasons to remove all copies of data from an organization, especially when some time has passed. However, I think this should be a difficult process with some hurdles to overcome, not the least of which is physical access, and it should require multiple people to approve the action. As we connect more and more systems, and rely on data being available, allowing anyone to permanently remove data without oversight will become a problem.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.8MB) podcast or subscribe to the feed at iTunes and LibSyn.

Multiple Backup Files

I’ve been writing a little PowerShell lately that will back up databases, move files, and restore them. I’m often testing things, and having scripts to quickly and easily move files is very handy. After one of my posts recently, a reader asked if I’d considered multiple files and how to handle them in scripts. I confessed I hadn’t, mostly because I haven’t had to deal with them.

In my career, I've tended to work with small to medium sized databases. I've had young children for a large portion of my SQL Server career and had no desire to babysit multi-hour (or multi-day) restores when things break. I know some people have been through those situations, and good for them. I just know I've had enough issues with low-GB-sized databases, and haven't been interested in supporting TB-sized systems.

However, that likely won't be the case in the future. More and more companies, even small ones, are collecting and storing data that reaches into the terabytes. Rapid advances in sensors, development tools, and cheap storage mean that many people are dealing with hundreds of GB in at least one of their databases. That means that for a reasonable RTO, quick backups, and good performance, multiple backup files are becoming a necessity.
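If you haven't tried it, striping a backup is just a matter of listing more than one file in the BACKUP command. The database name and paths here are only examples:

-- Stripe a backup across multiple files (hypothetical database and paths)
BACKUP DATABASE Sales
 TO DISK = N'D:\Backups\Sales_1.bak',
    DISK = N'E:\Backups\Sales_2.bak',
    DISK = N'F:\Backups\Sales_3.bak'
 WITH COMPRESSION, CHECKSUM, STATS = 10;

Keep in mind that every file in the set is required for a restore, which is exactly why losing one of them is such bad news.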

Is that really the case? Data volumes are exploding, but are many of you using this feature? I wanted to see how many people have implemented, or at least thought about, striped backups. The poll this week is:

Do you have any databases that benefit from backup to multiple files?

I've consulted with clients that accidentally produced striped backups and then lost one of the files. That's never a good situation, and it's bad news to have to deliver as a consultant. However, I'm sure many of you have consciously implemented striped backups because they can perform better than single-file backups and make for quicker restores. Others of you may suspect (or have tested) that striped backups will help your systems but haven't gotten around to setting things up.

Let us know whether this is a common feature you use, or still something esoteric that you have no need for. And if you have put striped backups in place, have you tested a striped restore? I certainly hope so. Let us know either way.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.7MB) podcast or subscribe to the feed at iTunes and LibSyn.

Powershell – Copy the Latest Backup

I got an email recently where someone asked me how they can refresh a dev environment with Powershell. I guess I’d written something about this in 2009, though that would have been for testing as Red Gate had already banned me from development on SQLServerCentral by that time.

I dug around and came up with a few partial scripts and cleaned them up for these posts. This post will look at getting the backup and a later one will examine the restore.

Finding the Latest Backup

I'll assume that you make backups to a known path somewhere. My philosophy is that I want the machines to stand alone as much as possible. That means I don't want the source machine (the one making the backup) to be responsible for refreshing the backup elsewhere. I want a pull system.

For a high level overview, this process looks like this:

  • Search the backup path for files matching a pattern.
  • Find the most recent one, based on date.
  • Copy that most recent file to another location.

For the sake of consistency and ease, I want to copy the file to the same destination name every time. That will simplify my restore process, which I could then easily do in T-SQL.

Let’s examine how to do this. I’ve got a folder with a few backups in it.

[screenshot: the backup folder with a few .bak files]

For my PoSh, I’ll start by setting a variable to the path.

$backuppath = "D:\SQLServer\MSSQL11.MSSQLSERVER\MSSQL\Backup" 

Once I have this, I can now look for the files in this path. To do that, I'll use Get-ChildItem.

get-childitem -path $backuppath

This returns a list of the files. That's what I want, but I want to limit the files to a pattern. In this case, I'm looking for .bak files from the EncryptionPrimer database. All of these start with the database name, so I can filter on that pattern.

There's nothing special about what I do here; it's covered in plenty of places. I don't have this running in an environment currently, as someone else manages that process, but here's the approach I've followed in the past:

  • Find the latest backup (whatever the date) in the source folder.
  • Copy this with a set name to the destination folder, overwriting previous backups with the same name.
  • Restore the known name to the development database, moving files as needed.

I’ll go through each of these steps in my PoSh script.

Find the Latest Backup

This is fairly easy. I'll use the Get-ChildItem cmdlet, an approach I found in a StackOverflow post. I'll use a variable for the path I need, and then check the path.

$backuppath = "D:\SQLServer\MSSQL11.MSSQLSERVER\MSSQL\Backup"

get-childitem -path $backuppath

That works well, but since I’m building a process for a specific backup type, I’ll add a filter.

$backuppath = "D:\SQLServer\MSSQL11.MSSQLSERVER\MSSQL\Backup"

get-childitem -path $backuppath -Filter "EncryptionPrimer*.bak"


To find the latest backup, we pipe the output through Where-Object to remove folders. Then we use Sort-Object to order things by creation date and Select-Object to grab just the most recent file.

$backuppath = "D:\SQLServer\MSSQL11.MSSQLSERVER\MSSQL\Backup"

get-childitem -path $backuppath -Filter "EncryptionPrimer*.bak" |
    where-object { -not $_.PSIsContainer } |
    sort-object -Property CreationTime |
    select-object -last 1

The last part of the script is the Copy-Item command, which again receives the piped output. We give it a standard name and path (built from another variable).

$backuppath = "D:\SQLServer\MSSQL11.MSSQLSERVER\MSSQL\Backup"
$destpath = "d:\SQLServer\Backup"

get-childitem -path $backuppath -Filter "EncryptionPrimer*.bak" |
    where-object { -not $_.PSIsContainer } |
    sort-object -Property CreationTime |
    select-object -last 1 | copy-item -Destination (join-path $destpath "EncryptionPrimer.BAK")

Once this is done we can restore things. I learned how to do this from PoSh using this post: http://stuart-moore.com/day-11-31-days-sql-server-backup-restore-using-powershell-basic-restore/

However, since I have a standard backup file name, I’d probably do this in T-SQL and set a job that I can just run anytime. It’s simpler and easier, and since most of the time I’d want to do this from SSMS, a job works well.
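As a sketch, that job step could be as simple as the statements below. They restore over the existing dev copy, so check the file layout with RESTORE FILELISTONLY first and add MOVE clauses if your paths differ:

-- See what's inside the standard backup file the PoSh script copies over
RESTORE FILELISTONLY
 FROM DISK = N'D:\SQLServer\Backup\EncryptionPrimer.BAK';

-- Refresh the dev copy from that standard file name
RESTORE DATABASE EncryptionPrimer
 FROM DISK = N'D:\SQLServer\Backup\EncryptionPrimer.BAK'
 WITH REPLACE, STATS = 10;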

Here’s the PoSh script.

Import-Module "SQLPS" -DisableNameChecking

$sqlsvr = New-Object -TypeName Microsoft.SQLServer.Management.Smo.Server("JollyGreenGiant\SQL2012")
$BackupFile = "D:\SQLServer\Backup\EncryptionPrimer.BAK"

#echo ""
#echo "Databases"
#echo "========="
#foreach ( $db in $sqlsvr.Databases) { write-host $db.name }

echo " "
echo "Begin Restore"
echo "============="

$Restore = New-Object "Microsoft.SqlServer.Management.Smo.Restore"
$Restore.NoRecovery = $false
$Restore.ReplaceDatabase = $true
$Restore.Action = "Database"
$Restore.PercentCompleteNotification = 10

$BackupDevice = New-Object ("Microsoft.SqlServer.Management.Smo.BackupDeviceItem") ($BackupFile, "File")
$Restore.Devices.Add($BackupDevice)

$RestoreDetails = $Restore.ReadBackupHeader($sqlsvr)
$logicalFileNameList = $Restore.ReadFileList($sqlsvr)
$Restore.Database = $RestoreDetails.Rows[0]["DatabaseName"]

foreach($row in $logicalFileNameList) {
    $RestoreDBFile = new-object("Microsoft.SqlServer.Management.Smo.RelocateFile")
    $RestoreDBFile.LogicalFileName = $row["LogicalName"]
    $RestoreDBFile.PhysicalFileName = $row["PhysicalName"]
    $Restore.RelocateFiles.Add($RestoreDBFile)
}

$Restore.SqlRestore($sqlsvr)
write-host ("Completed the database restore for " + $RestoreDetails.Rows[0]["DatabaseName"] + " on " + $sqlsvr.Name)

That’s it. I ran this a few times, and it worked well. A handy script to get the last backup and have it ready in a dev/test environment.

Backup Your Certificate for TDE

If you've enabled TDE, you need to be sure you have a copy of the certificate that protects the Database Encryption Key (DEK). If you followed my instructions, then you already have one.

If you didn't make a backup, or you have just discovered a TDE database, make one now, and secure the password you use with your DR materials (off-site).

How do you make a backup? That’s easy. Use the BACKUP CERTIFICATE command. Here’s the command I use in demos:

USE master;
GO
BACKUP CERTIFICATE TDEPRimer_CertSecurity
 TO FILE = 'tdeprimer_cert'
 WITH PRIVATE KEY (
    FILE = 'tdeprimer_cert.pvk',
    ENCRYPTION BY PASSWORD = 'AStr0ngB@ckUpP@ssw0rd4TDEcERT%'
 );
GO

The certificate for TDE is in master, so you must make sure you're in master for the backup. The TO FILE option lets you choose the file path. By default, this will be written to the DATA folder for your instance, but you can choose another location, and you can add an extension if you like. This file is the public portion of the certificate.

There is a private key portion of the certificate, which is backed up with the “WITH PRIVATE KEY” portion of the command. This is where you specify the password and provide the protection for your certificate.

You will need this password on restore, so keep track of it.
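If you want a quick way to see whether the private key for a certificate has ever been backed up, sys.certificates tracks that date. Using the demo certificate from above, something like this works:

-- A NULL pvt_key_last_backup_date means the private key has never been backed up
USE master;
GO
SELECT name, pvt_key_last_backup_date
 FROM sys.certificates
 WHERE name = 'TDEPRimer_CertSecurity';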

Restoring a TDE Database on a New Instance

You’ve enabled Transparent Data Encryption (TDE) on one of your databases, and the server has failed. How do you get this database working on a new instance? This short post will show you how this works.

Files Needed

There are two files you need in order to restore the database.

  1. A full backup of the TDE database
  2. A backup of the server certificate that protects the Database Encryption Key (DEK).

You might have multiple files for the backup, and potentially other backup files (diff, log), but the process for those will be the same as any other restore once you complete this process.

If you have multiple full backup files (striped backup), just include them in the restore command as you normally would.
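As a sketch, a striped restore just lists every file in the set. The database and file names here are hypothetical:

-- All files from the striped set are required (hypothetical names)
RESTORE DATABASE MyTDEDatabase
 FROM DISK = N'D:\Backups\MyTDEDatabase_1.bak',
      DISK = N'E:\Backups\MyTDEDatabase_2.bak'
 WITH RECOVERY, STATS = 10;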

Prepare the New Instance

To prepare the new instance for restore, you need to ensure that you have a database master key (DMK) in the master database. You can check for one by querying the sys.symmetric_keys catalog view in master; the DMK appears there as ##MS_DatabaseMasterKey##.

SELECT * FROM sys.symmetric_keys


If you have a key, that’s fine. If you don’t, you can create one like this:

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'sdkj3G3$sh'

Now you need to restore the certificate from your source instance. You do this with the CREATE CERTIFICATE command, and the FROM FILE option. You’ll typically find the backup from your BACKUP CERTIFICATE command in the DATA folder for SQL Server if you didn’t specify a complete path.

You do have a backup, right?

For me, I’ll run the create certificate command:

CREATE CERTIFICATE TDEPRimer_CertSecurity
 FROM FILE = 'D:\SQLServer\MSSQL11.MSSQLSERVER\MSSQL\Backup\tdeprimer_cert'
 WITH PRIVATE KEY (
    FILE = 'D:\SQLServer\MSSQL11.MSSQLSERVER\MSSQL\Backup\tdeprimer_cert.pvk',
    DECRYPTION BY PASSWORD = 'AStr0ngB@ckUpP@ssw0rd4TDEcERT%'
 );
GO

This works:

[screenshot: the CREATE CERTIFICATE command completing successfully]

Now I’m ready to restore the TDE database.

[screenshot: the Restore Database dialog]

I select my backup file, and everything proceeds as a normal restore. How do I know the certificate worked?

Because I have a “Ready” at the top of the dialog.

[screenshot: the restore dialog showing "Ready"]

If I didn’t have the certificate on the instance, I’d get this:

[screenshot: the restore dialog when the certificate is missing]

Double-clicking that would bring up the error:

[screenshot: the error message about the missing certificate]

If the certificate is not on the instance, then the server cannot decrypt the DEK and restore the database.

Hope this helps, and if you use TDE, make sure you can do this.

The DBA Team #1–Code and Slides

Our first DBA Team event, held in Richmond, VA just before SQL Saturday #187, went well. Overall I think our experiment was a success and we're already talking about where and when we might do this again.

Since this was an experiment, we didn't make a separate site for the series of events, so I'm adding this post as a placeholder for the various slide decks and code.

Core Monitoring for SQL Server (Steve Jones)

Good DBAs ensure that they are always aware of the state of their instances. All systems should have monitoring in place, not just so you know when things go wrong, but so you understand what a normal workload looks like and can plan for the future. This session will cover the basics of monitoring a SQL Server system and the various metrics you should be tracking.

Getting Started with SQL Server Backup (Grant Fritchey)

Backups are fundamental to protecting the investment your business has in its data and they’re the foundation of disaster recovery planning. We’ll go over best practices for database backups, to ensure you’re establishing that foundation correctly within your systems. This introductory level session covers full, log, and differential backups, as well as restores and restores to a point in time. Come along to be sure you’ve got the right protection in place for your systems.

Understanding Database Corruption (Grant Fritchey)

A DBA’s primary purpose is to ensure that the information in their charge is accessible by the correct people within their organization. Despite everything you do to make sure you’ve got your servers configured, monitored, and tuned, with well-tested backups in place, you can still lose data through corruption. But what is corruption in a database? This session lays out exactly where database corruption can come from, how to find out exactly where the corruption is within a database, and the methods you have available to recover from database corruption.

Indexing for SQL Server (Steve Jones)

Indexes are important for improving the performance of your queries, but they add overhead to your server and require maintenance. This session examines how indexes work and the basic maintenance that you should perform to ensure your system is running at its peak level.

The Backup Process

[photo caption: This used to be my backup process. Five copies, one for each day of the week.]

I was reading Scott Hanselman’s post on The Computer Backup Rule of Three, and I completely agree with him. If things are important, three copies are the minimum you should keep. I have had backups before, but I’ve found that when something breaks, like a tape drive or hard drive, I might not get around to replacing it right away. Usually this is procrastination, or as it’s more commonly known, laziness. During that window of opportunity, I’ve had Mr. Murphy strike. My inherent paranoia, and that second extra copy, has saved me both personally (at home) and professionally (at work).

All too often I run into DBAs who simply trust that their backups and processes are reliable. After setting up a backup process, usually to disk, and testing it with a restore on another system, they assume it will keep working in the future. It should, but there is always a chance something will fail at some point. There’s the chance that some hardware failure or software reconfiguration will cause an issue with your process. There’s also the simple chance that your I/O system might introduce corruption into your backup files.

If you aren’t regularly testing your processes, you can’t be sure they are working as they have in the past. The ideal environment would restore every backup file taken, every day, but that isn’t always possible. However, restoring a random backup once every month or two will help ensure your backup and restore processes have the best chance of succeeding when you actually need to perform them in a disaster.

This isn’t hard to script and automate, but it is something you need to do periodically. However, even if you automate restores of all your databases, make sure you still practice your manual skills regularly. Automation might not work in a disaster situation, but those quarterly practice sessions restoring a database to a point in time with a tail log might just relax you when a crowd in your cube is asking if the database will be back up soon.
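For that point-in-time practice, the sequence is a tail-log backup followed by the restore chain with STOPAT. The names, paths, and timestamp below are invented for the sketch:

-- Capture the tail of the log before touching anything else (hypothetical names)
BACKUP LOG Sales
 TO DISK = N'D:\Backups\Sales_tail.trn'
 WITH NO_TRUNCATE, NORECOVERY;

-- Restore the full backup (and any diff/log backups in the chain) WITH NORECOVERY
RESTORE DATABASE Sales
 FROM DISK = N'D:\Backups\Sales_Full.bak'
 WITH NORECOVERY;

-- Then stop the tail-log restore at the moment you need
RESTORE LOG Sales
 FROM DISK = N'D:\Backups\Sales_tail.trn'
 WITH STOPAT = '2016-05-02 14:30:00', RECOVERY;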

Steve Jones


The Voice of the DBA Podcasts

We publish three versions of the podcast each day for you to enjoy.