This is the final lecture this week at SQL University. We are covering the topic of capacity planning this week for SQL Servers. In the first lecture we looked at an overview of capacity planning and some general considerations for DBAs. Our second lecture dove deeper into disk capacity planning, examining both space and throughput requirements.
Today we talk about a few other considerations for keeping control of your environment and planning for the future. By
Disaster Recovery Planning
One of the places where I find people skimping on capacity is with Disaster Recovery (DR) systems. Typically the hardware size for DR servers is some fraction of the regular production servers. This is often because DR servers aren’t often used, or maybe never used, by many companies. I know that I have not used them 99% of the time in my career, and as a result have typically not sized them to handle the full load of my primary systems. The logic being that in a disaster, likely some of our customers will also be affected, and most of them are likely to put up with slower performance if disaster strikes.
That may not be true in your environment, and I know a number of companies that size their DR environment exactly as the same size as their primary systems. For your situation, please ensure that you handle this appropriately and size your systems, and set expectations for business people, based on what you do.
However even if you size your DR system at the same level as your production server, you are not done. In three companies that I have worked with, they sized their DR server appropriately, but then forgot about them as they upgraded and expanded their live production servers. In all cases we were fortunate to discover this fact without a disaster. A test in one case and audits in the others exposed the fact that a disaster would have been a true disaster with us not able to even restore because of a lack of disk space.
Quite often management doesn’t want to spend more money on DR than absolutely necessary, and that often means less resources for those systems. If you are stuck in this situation, be sure that you quantify the issues for management and document differences. You should try to interpolate the possible performance issues, or test them, and come up with specific numbers where possible. Document differences like “The sales report will run in 40 minutes instead of 4”, using verbiage that your managers will understand. They might agree to spend more money on DR, or you might need to use this as justification for poor performance when you do fail over.
Test and Development Servers
For each database that you have in production, you ought to have a test instance with a copy of his database. Even for third party databases, you should still have some suite of tests run against patches or service packs.
If you write code, then you also need to have a development instance, and if you have lots of clients, you might even have a User Acceptance Test (UAT) environment as well. I have also seen Customer Service instances in places that needed to replicate an issue quickly to work with a customer.
In terms of capacity planning, there are two issues to consider with all these additional instances: disk space and performance. You also face the cost and resource issues with additional instances, as well as a more difficult argument to get appropriate funding for these environments.
Disk space is always an issue as data sizes grow. If you imagine a 1TB database in your production environment, that usually means that you need a 1TB for each of your test and development instances as well as backup space. Since you often restore databases from a local backup file, this means that 1TB of data needs at least 2TB on each server, and across 3 environments (production, test, dev) that’s 6TB of space.
Each time to manage space in your production environment and proactively grow databases, you should be aware of the impact on other environments. If you buy more disk space for your main database, be prepared to buy more disk space in other environments as well.
Virtual Machines and cheap disks can ease the burden of setting up other environments and sharing hardware among some environments, but they might not be enough for some critical databases. No one may be concerned if that company birthday tracker runs a little slow, but a drop in performance on the financial analysis database could actually cost the company money.
There are always some databases that truly need a duplicate environment in order to test changes and gauge their impact on performance. In these situations you must ensure that as your production environment requires more resources, you are also adding those same resources to the test or development environments.
This seems obvious, but it can be easy to forget. Make sure that you have clear documentation and notes that remind you to order additional hardware resources for your other environments when you are adding production resources.
I have never really liked the term “human resources” as I think humans aren’t easily replaceable, and certainly aren’t interchangeable. Each person is unique and brings something to your environment
One of the capacity issues that seems to constantly be an issue for many companies is the employee staffing to handle the workload. As environments grow, companies are often slow to hire additional resources, even when it would allow them to substantially grow their business and increase revenue. I actually think that this is a good philosophy, but it can become a problem over time.
Each database server adds something to the workload of the administrators involved. If there is code to be written, even more load is added. While a smart DBA can be efficient, write code that helps him/her manage the servers as groups, and automate many of the tasks that need to be accomplished, there is still a limit.
I have managed hundreds of instances myself, but only when everything is automated and most of them aren’t critical and overloaded. Usually I find there are a few instances that require most of my time, and I spend every week or every day working with them. If you have two or three instances with issues, you can easily be overwhelmed.
As you grow your environment, you ought to have an idea of where you spend your time, and how much time you spend working on those instances that need lots of hand holding. If you find yourself regularly falling behind, or your hours constantly exceeding 50 in a week, you need help. You need to plan for additional people to help ensure you environment continues to run smoothly and you continue to function at a high level.
Don’t burn yourself out trying to handle too large a workload. Examine the human capacity of your environment periodically and manage that just like you manage disk space or CPU cycles.
This lecture concludes the Capacity Planning week for SQL University. We’ve looked at capacity planning at a high level, dived deeper into disk issues, and talked about a few things that people forget when trying to manage their environments.
If you have additional comments or questions, or would like to add your won notes to this week, please feel free to leave a comment or link to a blog/article below.
Thanks for attending this week of SQL University and enjoy the rest of your semester.