This piece was originally published in Database Weekly.
Two words that no database administrator ever wants to put together are “data” and “loss.” We go to great efforts to ensure that our SQL Server data is protected, backed up, available on alternate systems, and even replicated to remote machines. Our goal is always zero data loss in every situation, even in those where we cannot prevent downtime.
This past week an article at InfoWorld caught my eye with the phrase “solid-state drives suffer data loss” in the subtitle. The article covers a study from HP Labs and Ohio State University that examined the effects of power loss on various SSDs. Using a number of “enterprise quality” SSDs from various manufacturers, the authors of the study cut power to the drives while they were in use, a scenario I’d expect to happen to systems periodically. We sometimes restart all kinds of systems without shutting them down cleanly, for a variety of reasons. I’d expect file systems and low-level software to handle these situations properly, much as SQL Server does, with some sort of recovery running when power is restored.
The study defined six potential failure types, and five of them occurred across the drives from the fifteen manufacturers tested. The drives’ susceptibility to several kinds of corruption under power faults led the authors to conclude that systems holding critical data should not use SSDs without testing them thoroughly, since they were not aware of any way to design a storage system that accounts for these issues. They did not claim that SSDs are less reliable than hard drives, only that problems can arise in one specific type of disaster: unexpected power loss.
We can partly design around these issues with redundant UPSes, battery-backed SSD cards, and more, but accidents still happen. I’ve seen two occasions where large data centers lost power because maintenance work on power distribution units resulted in improper failovers. The more complex our system designs become, and the more we use SSDs in databases, the more likely it is that some failure or corruption will occur.
I wouldn’t panic and remove SSDs from systems, but I would make sure that my disaster recovery processes and procedures were up to date and working. You never know when you will need to recover from hardware issues on a primary system and restore onto completely separate hardware. Advance preparation and practice are the only way to ensure you can recover successfully when the need arises.