This excerpt is from the book The Practice of Cloud System Administration: Designing and Operating Large Distributed Systems Vol 2, reprinted with permission of the authors and publisher Pearson/Addison-Wesley Professional.
Capacity planning needs to provide answers to two questions: What are you going to need to buy in the coming year? And when are you going to need to buy it?
To answer those questions, you need to know the following information:
- Current usage: Which components can influence service capacity? How much of each do you use at the moment?
- Normal growth: What is the expected growth rate of the service, without the influence of any specific business or marketing events? Sometimes this is called organic growth.
- Planned growth: Which business or marketing events are planned, when will they occur, and what is the anticipated growth due to each of these events?
- Headroom: Which kinds of short-term usage spikes does your service encounter? Are there any particular events in the coming year, such as the Olympics or an election, that are expected to cause a usage spike? How much spare capacity do you need to handle these spikes gracefully? Headroom is usually specified as a percentage of current capacity.
- Timetable: For each component, what is the lead time from ordering to delivery, and from delivery until it is in service? Are there specific constraints for bringing new capacity into service, such as change windows?
From that information, you can calculate the amount of capacity you expect to need for each resource by the end of the following year with a simple formula:
Future Resources = Current Usage × (1 + Normal Growth + Planned Growth) + Headroom
You can then calculate for each resource the additional capacity that you need to purchase:
Additional Resources = Future Resources - Current Resources
Perform this calculation for each resource, whether or not you think you will need more capacity. It is okay to reach the conclusion that you don't need any more network bandwidth in the coming year. It is not okay to be taken by surprise and run out of network bandwidth because you didn't consider it in your capacity planning. For shared resources, the data from many teams will need to be combined to determine whether more capacity is needed.
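The two formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and field names are hypothetical, growth rates are expressed as fractions per year, headroom is given as a fraction of current capacity and converted to an absolute amount, and a negative result is clamped to zero to mean "nothing to buy."

```python
from dataclasses import dataclass


@dataclass
class ResourcePlan:
    """Planning inputs for one resource (all names are illustrative)."""
    name: str
    current_capacity: float   # Current Resources: what is installed today
    current_usage: float      # Current Usage, in the same unit
    normal_growth: float      # organic growth as a fraction, e.g. 0.25 = 25%
    planned_growth: float     # growth from planned events, as a fraction
    headroom_fraction: float  # assumption: headroom as a fraction of current capacity

    def future_resources(self) -> float:
        # Future Resources = Current Usage x (1 + Normal + Planned) + Headroom
        headroom = self.headroom_fraction * self.current_capacity
        growth = 1 + self.normal_growth + self.planned_growth
        return self.current_usage * growth + headroom

    def additional_resources(self) -> float:
        # Additional Resources = Future Resources - Current Resources.
        # A negative value means no purchase is needed, so clamp to zero.
        return max(0.0, self.future_resources() - self.current_capacity)


# Made-up example figures, one entry per tracked resource.
plans = [
    ResourcePlan("bandwidth_gbps", 40, 20, 0.25, 0.0, 0.25),
    ResourcePlan("storage_tb", 500, 400, 0.25, 0.25, 0.25),
]

for plan in plans:
    print(plan.name, plan.future_resources(), plan.additional_resources())
```

With these made-up figures, bandwidth needs no purchase while storage does, which is exactly the kind of per-resource conclusion the calculation is meant to produce.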
Before you can consider buying additional equipment, you need to understand what you currently have available and how much of it you are using. Before you can assess what you have, you need a complete list of all the things that are required to provide the service. If you forget something, it won't be included in your capacity planning, and you may run out of that one thing later, and as a result be unable to grow the service as quickly as you need.
What to track
If you are providing Internet-based services, the two most obvious things needed are some machines to provide the service and a connection to the Internet. Some machines may be generic machines that are later customized to perform given tasks, whereas others may be specialized appliances.
Going deeper into these items, machines have CPUs, caches, RAM, storage and network. Connecting to the Internet requires a local network, routers, switches and a connection to at least one ISP. Going deeper still, network cards, routers, switches, cables and storage devices all have bandwidth limitations. Some appliances may have higher-end network cards that need special cabling and interfaces on the network gear. All networked devices need IP addresses. These are all resources that need to be tracked.
Taking one step back, all devices run some sort of operating system, and some run additional software. The operating systems and software may require licenses and maintenance contracts. Data and configuration information on the devices may need backing up to yet more systems. Stepping even further back, machines need to be installed in a data center that meets their power and environment needs. The number and type of racks in the data center, the power and cooling capacity and the available floor space all need to be tracked. Data centers may provide additional per-machine services, such as console service. For companies that have multiple data centers and points of presence, there may be links between those sites that also have capacity limits. These are all additional resources to track.
Outside vendors may provide some services. The contracts covering those services specify cost or capacity limits. To make sure that you have covered every possible aspect, talk to people in every department, and find out what they do and how it relates to the service. For everything that relates to the service, you need to understand what the limits are, how you can track them and how you can measure how much of the available capacity is used.
How much do you have?
There is no substitute for a good up-to-date inventory database for keeping track of your assets. The inventory database should be kept up to date by making it a core component in the ordering, provisioning and decommissioning processes. An up-to-date inventory system gives you the data you need to find out how much of each resource you have. It should also be used to track the software license and maintenance contract inventory, and the contracted amount of resources that are available from third parties.
Using a limited number of standard machine configurations and having a set of standard appliances, storage systems, routers and switches makes it easier to map the number of devices to the lower-level resources, such as CPU and RAM, that they provide.
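As a rough sketch of that mapping, the following Python turns device counts from an inventory into totals of lower-level resources. The configuration names, per-machine figures and the function name are all hypothetical; in practice the counts would come from the inventory database rather than a literal dictionary.

```python
from collections import Counter

# Hypothetical standard configurations: resources provided per machine.
STANDARD_CONFIGS = {
    "web-small": {"cpu_cores": 16, "ram_gb": 64, "disk_tb": 1},
    "db-large": {"cpu_cores": 64, "ram_gb": 512, "disk_tb": 20},
}


def total_resources(inventory_counts, configs=STANDARD_CONFIGS):
    """Sum lower-level resources over all machines in the inventory.

    inventory_counts maps a configuration name to the number of
    machines of that configuration currently in service.
    """
    totals = Counter()
    for config_name, count in inventory_counts.items():
        if config_name not in configs:
            # A nonstandard machine cannot be mapped automatically.
            raise KeyError(f"unknown configuration: {config_name}")
        for resource, per_machine in configs[config_name].items():
            totals[resource] += per_machine * count
    return dict(totals)


# Made-up inventory: 10 small web machines and 2 large database machines.
print(total_resources({"web-small": 10, "db-large": 2}))
```

Raising an error for an unknown configuration is a deliberate choice in this sketch: a machine that does not match a standard configuration is something you want to notice, not silently leave out of the totals.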