I once worked at a large company of more than 10,000 people. We had a large data center with hundreds of machines, and one day we lost power. This was not a loss of power from the electric utility, where our UPSes and generator would kick in. We lost power when some maintenance caused all of our UPSes to trip offline and cut power to all the servers.
I was in the data center, surprised by the sudden quiet. Unfortunately, one of our senior executives was also in the data center and proceeded into the raised floor area. As various technicians and sysadmins attempted to restore power and reboot systems, this senior executive watched, commenting, questioning, and often berating the employees. It was not a good situation for anyone, least of all the people trying to reconnect high-voltage wires.
Most of you will never experience a large disaster and need to recover your systems. Even fewer of you will recover from disasters with anyone other than your peers or a direct manager watching you. However, you shouldn’t count on being that lucky. Whether the disaster is small or large, your fault or a natural occurrence, I hope that you are able to successfully restore your systems with some professionalism and grace under pressure.
The key to a strong performance in a stressful situation is the same in technology as it is in sports, music, and almost any other endeavor with an audience. The key is practice.
Simulate disasters. Pretend that the refresh of a development system is really a restore after a fire. Think about the various scenarios that might require you to recover a system, and incorporate practice time into your daily routine.