This editorial was originally published on Aug 15, 2007. It is being rerun as Steve is traveling.
This is an interesting list of disasters that can befall a production system. It covers more than database servers, but these things can certainly happen to them as well.
The list is pretty good, and more than a few of these have happened to me. There were a few that I’ve never seen, and it makes me worry a bit that this guy doesn’t have a good staff or set of vendors working with him. But one of these is very interesting.
The second item on the list, a controller going bad and corrupting disks, is an interesting one in today’s world. What would you do if this happened on your SAN? Actually I know what you’d do. First you’d be in denial, and I don’t mean the river in Africa.
Then you’d tell the SAN guys. They wouldn’t believe you. You’d argue and they’d check things, then they’d wonder how it could happen and why. You’d scream at them to fix it, probably using four-letter, not three-letter, words. They’d start looking for ways to recover data, and your hair would be slowly thinning as upper managers started calling down looking for answers.
It shouldn’t happen, but it could. Some marketing VP is showing off that nice piece of SAN equipment with its high-speed switches, dozens of drives, and lots of colored wires. He sloshes some coffee onto the system and freezes, but when the flashing red lights and siren from the Enterprise don’t go off (Red Alert!), he continues his tour while the controller scribbles on your disks.
Restoring from backup might be easy. Of course if you have a system like some I’ve seen that shares physical disks among multiple LUNs, it might not.
Disaster recovery is rarely the Hurricane Katrina type of issue. Usually it’s something like a disk drive, RAID controller, or cut wire that you have to deal with. So think about all of the minor disasters that you want to be sure you can handle and get some practice in on those. Make sure you have spare parts, that you can rebuild a server (QA machines are handy for this), and that you know how to perform a restore.
And most of all, be sure that you know where the backup files are stored, preferably on different disks than the production data.
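If you want a quick sanity check on that last point, something like the following rough Python sketch can help. It is only an illustration: the function names `same_volume` and `check_backup_location` are made up for this example, and it compares the volume (filesystem) that each path lives on, not the physical disks underneath. On a SAN that shares physical disks among multiple LUNs, two different volumes can still sit on the same spindles, so treat an "OK" here as a first pass rather than proof.

```python
import os


def same_volume(path_a: str, path_b: str) -> bool:
    """Return True if both paths report the same device ID (same volume)."""
    # st_dev identifies the filesystem/volume a path lives on,
    # not the physical disk behind it.
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def check_backup_location(data_path: str, backup_path: str) -> str:
    """Return a short warning or OK message for a data/backup path pair."""
    if same_volume(data_path, backup_path):
        return "WARNING: backups are on the same volume as the production data"
    return "OK: backups are on a different volume (physical disks not verified)"
```

You would call `check_backup_location` with the path to a production data file and the path to your backup folder, then ask the SAN team whether those volumes really map to separate physical disks.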