
Data Center Capacity Planning: Complete Guide

Beginner’s Corner


UMBOSS Team

Apr. 24, 2024
6 min. read

A data center is filled with highly valuable and expensive elements. Oversizing every component undoubtedly creates a reliable and carefree playground, but it also makes the business financially unsustainable. Properly designed and regularly updated data center capacity therefore ensures both business sustainability and high technical and service performance.

What is Data Center Capacity?

In our previous blog about Data Center Infrastructure Management (DCIM), we touched upon the numerous components of a data center: data rooms, power engines, power distribution systems, UPSs, HVAC, access control, racks, networking, computing, and storage systems, among others. All these elements contribute to one sole purpose – providing reliable IT and communication services to end customers.

Services constantly change in terms of functionalities and the resources required to operate properly. For example, customers frequently activate or deactivate certain service options or initiate computationally demanding processes, all of which demand more computing resources, and consequently, more power and cooling.

Services utilize various resources, and accommodating the ever-changing landscape of services and new customers being added requires spare resources that can be activated at any time to meet new service and customer requirements.

Data center capacity therefore refers to the maximum quantity of infrastructure resources available in a data center at any point in time. If the available amount of any of these resources falls below a critical value, the services provided from the data center may suffer degradation, disruption, or even outage. Thus, providing enough resources without over-provisioning is crucial for good data center performance and the corresponding business success.

How to Measure Data Center Capacity?

The only way to understand how much resource capacity a data center has is to implement a Data Center Infrastructure Management (DCIM) tool and maintain precise documentation of all elements within the data center. Measuring capacity then means comparing metered data from a monitoring system with the documented capacity, resource use, and reservations.

Some capacity elements can only be monitored through a thorough documentation discipline. This discipline usually comes with well-defined (ITIL) change management processes.

On the other hand, some capacities can only be determined by the use of a proper data center monitoring system. For example, a UPS battery’s capacity can be documented, but only by measuring the actual capacity from the battery itself can one truly understand the UPS's actual capacity when utility power goes down.

Data center experts typically define a set of Key Performance Indicators (KPIs) that are constantly updated to reflect actual capacity and its utilization. These KPIs include but are not limited to:

  • Power capacity utilization: the ratio of current average or peak power consumption to the installed power capacity.
  • Power capacity remaining time: the duration until the entire power capacity is depleted, typically necessitating accurate forecasting.
  • Power Usage Effectiveness (PUE): the total power consumed by the data center divided by the power used by the equipment that the data center accommodates.
  • Rack space capacity utilization: the ratio of occupied vs. total installed rack units in one rack, a group of racks, or in the entire data center (rack density).
  • Stranded rack space capacity: reserved but not yet used rack space capacity.
  • Stranded rack power capacity: allocated but unused power per rack or room that can be reallocated for another purpose.
  • HVAC/Cooling capacity utilization: the percentage of cooling capacity currently used to support accommodated network and IT equipment in the data center.
  • Uplink port utilization: the percentage of uplink ports already in use out of all ports available to connect the equipment in the data center.
  • Server and storage utilization: a KPI that indicates the intrinsic capacity of servers and storage systems that can be theoretically utilized to accommodate new services.
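
To make a few of these KPIs concrete, here is a minimal Python sketch of how they could be computed from documented and metered values. The function names and the sample figures are illustrative assumptions, not the API of any DCIM or monitoring product.

```python
def _ratio(numerator: float, denominator: float) -> float:
    """Divide two quantities, rejecting a non-positive denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


def power_utilization(consumed_kw: float, installed_kw: float) -> float:
    """Share of the installed power capacity currently in use (0..1)."""
    return _ratio(consumed_kw, installed_kw)


def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power over IT equipment power."""
    return _ratio(total_facility_kw, it_equipment_kw)


def rack_space_utilization(occupied_units: int, total_units: int) -> float:
    """Share of installed rack units that are occupied (0..1)."""
    return _ratio(occupied_units, total_units)


def stranded_power_kw(allocated_kw: float, peak_measured_kw: float) -> float:
    """Allocated power that the measured peak never uses."""
    return max(allocated_kw - peak_measured_kw, 0.0)


# Hypothetical readings for one data room.
print(f"Power utilization: {power_utilization(320, 500):.0%}")
print(f"PUE: {pue(480, 320):.2f}")
print(f"Rack space utilization: {rack_space_utilization(30, 42):.0%}")
print(f"Stranded rack power: {stranded_power_kw(6.0, 3.5):.1f} kW")
```

In practice the inputs on the left of each call would come from the monitoring system (measured values) and the inputs on the right from DCIM documentation (installed or allocated values).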

What is Data Center Capacity Planning?

Put simply, capacity planning is the process of forecasting future infrastructure resource needs, optimizing existing resource usage, and determining the future capacity of all elements of a data center. This is done in a manner that provides proper resource capacity for new services and customers. The planning must be executed in such a way that the minimum resources are installed to achieve these goals, ensuring the financial sustainability of the data center business.

Now, it sounds simple, but there are many challenges to it. First, forecasting future resource needs involves translating customer requirements into technical resource requirements. Second, one must determine how historic data is accounted for when projecting the future and how current or near-future trends influence such forecasts. Proper capacity planning also takes potential risks into account and builds in capacity contingencies for situations such as a sudden surge in customer demand or the need to provide redundancy for other data centers in case of a major outage.

Finally, to be able to do any forecasting, one must also have an accurate base of data. This data is available only from DCIM tools backed with measured capacity from monitoring systems.

Later in this post we will investigate basic strategies and planning methodologies. As we’ll explain, planning is not a one-off process. In fact, continuous periodic (annual or quarterly) planning is necessary to be able to respond to modern data center capacity demand trends.

Importance and Benefits of Data Center Capacity Planning

From the preceding discussion, it becomes evident that capacity planning is the cornerstone of the efficient operation of an organization's data center. It reconciles several critical aspects, including:

  • overall operational efficiency,
  • the financial health of the data center business,
  • the ability to meet customer demands, and
  • adaptability to technological changes.

Capacity planning ensures that a data center is well-equipped to meet the evolving demands of customers, both expected and unexpected, including bursts. By accurately forecasting future resource requirements and technology development, data center managers can tailor their infrastructure to deliver reliable services, preventing disruptions and ensuring customer satisfaction. Continuous, periodic planning discipline is an effective proactive approach that helps align the data center's capacities with the dynamically changing needs of its end customers.

Constant and rapid technological advancement significantly influences capacity planning, requiring the adaptation of future resource capacity to accommodate new technologies and evolving industry standards. New technology may demand more of some resources than before, but it often reduces the space or power needed to deliver the same service. This also means that the data center facility design must be capable of accommodating new technologies without the need for significant and costly interventions in its original design. Therefore, planning starts when the data center is designed in the first place.

Periodic and preemptive capacity planning enables the timely procurement and installation of necessary resources. By accurately forecasting demand, organizations can streamline the procurement process, preventing delays that could impact service delivery. This timely acquisition of equipment and resources ensures that the data center is always ready to support increasing workloads and evolving technology requirements.

All the aforementioned factors have a direct impact on the financial sustainability of a data center. By preventing over-provisioning, organizations can optimize resource usage, thus minimizing unnecessary costs associated with unused capacity and excessive financing expenses. Simultaneously, this approach helps avoid under-provisioning and the consequent risk of revenue loss or the need for urgent and expensive procurement of technology resources.

Meeting the Service Level Objectives (SLOs) of data center services is critical for the sustainability of a data center business. Capacity planning ensures that the infrastructure can meet or exceed the agreed-upon performance objectives. This is vital for organizations providing mission-critical services and builds the much-needed reputation to attract new customers.

Therefore, data center capacity planning is not merely a technical discipline. It is a shared activity involving data center capacity management, technology, marketing, and sales, aiming to achieve an agile organization that is responsive to customer needs and aligned with technological advancements. It is a practice that ensures the long-term success of the data center.


Data Center Capacity Planning Process

Starting with the not-so-great news: unfortunately, there are no predefined processes with exact steps to follow to develop your data center's capacity plan. Each data center organization and facility setup must customize its planning processes because each organization and data center is unique and specific in some sense.

Now, the good news: there are general steps involved, and a crucial common denominator, a prerequisite for successful capacity planning, is having proper DCIM in place—something we've emphasized more than once. DCIM tools provide a set of functionalities that greatly facilitate the execution of any defined capacity planning process. This typically involves forecasting based on historical data, whether manually entered or measured by a monitoring system.

Another important prerequisite is establishing a planning team, with the project owner in charge, dedicated to continually developing planning methodologies for the specific data center deployment, services, and organization.

Update inventory and assess the current capacity

Having a DCIM in place with correct and up-to-date current and historical data is the ultimate goal. Once all the data is in place, you are ready to extract all the reports you need to execute capacity planning correctly. But how can you achieve this?

There are several steps to be taken. First, it is crucial to establish a correct baseline – documenting current assets in a data center at time zero that you can trust. To achieve this, one must conduct a manual or semi-manual inventory of all passive assets and utilize discovery and reconciliation tools to provide correct and up-to-date data on the current active assets (network equipment, storage, servers, etc.).

Since the passive part of the job cannot be done constantly, once you have the baseline, you must enforce the use of change management processes. Every change in the data center, from simple patching to adding new racks and equipment, must go through well-defined processes that include a planning and as-is documentation phase in the DCIM tool. Even then, one must use discovery and reconciliation tools to detect any deviations from the documented situation.
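
The reconciliation idea can be sketched in a few lines of Python. This is a simplified illustration that assumes both the documentation and the discovery tool can be exported as a mapping from asset ID to rack location; the asset IDs and rack names are hypothetical, and it covers only active assets that a discovery tool can actually see.

```python
from __future__ import annotations


def reconcile(documented: dict[str, str], discovered: dict[str, str]) -> dict[str, list[str]]:
    """Compare documented asset locations with discovered ones.

    Both arguments map an asset ID to its location (for example a rack name).
    """
    documented_ids = set(documented)
    discovered_ids = set(discovered)
    return {
        # Documented, but not seen by discovery.
        "missing": sorted(documented_ids - discovered_ids),
        # Seen by discovery, but never documented.
        "undocumented": sorted(discovered_ids - documented_ids),
        # Present in both, but in a different location than documented.
        "moved": sorted(
            asset
            for asset in documented_ids & discovered_ids
            if documented[asset] != discovered[asset]
        ),
    }


# Hypothetical export from documentation and from a discovery run.
documented = {"srv-01": "R1", "srv-02": "R1", "sw-01": "R2"}
discovered = {"srv-01": "R1", "srv-02": "R3", "srv-09": "R2"}

for kind, assets in reconcile(documented, discovered).items():
    print(kind, assets)
```

Every non-empty list is a deviation that should be traced back to a change that bypassed the change management process.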

The third element that is needed is combining the documentation with records of live performance data (e.g., power usage, cooling efficiency, network interface traffic, CPU and memory load). This data must be provided by a robust umbrella performance management system that monitors all metrics across all technology domains. Only then do you have insight into the measured capacity utilization of your data center.

From the data you now have in DCIM, you can easily evaluate the current state, including power and HVAC usage and efficiency, floor and rack space usage and stranded capacity, network and computing resource bottlenecks, the top N components that generate the most faults and cause SLO violations, etc. This way engineers can easily detect which elements should be replaced or upgraded, and how stranded capacity can be released by “reshuffling” certain elements. These are all essential inputs to the rest of the capacity planning process.

Define future growth demands and business objectives

And now, things get messy. Your marketing and sales department, or at least the organization's management, take part in this step. A chasm of misunderstanding is always there. Sales talks in terms of RGUs, ARPUs, the number of customers, etc., and you need to translate this into racks, kilowatts, megabits per second, and other technical measures. It's an exciting feast, right?

Historical data for well-established services helps model the capacity requirements of future customers, upsells, and cross-sells. Of course, one must account for future technological improvements, if there are any.

On the other hand, new services launched by the marketing department are more challenging to translate into technical quantities. The only reliable way is to conduct extensive service testing to understand the resources needed to run the new service.

However, whatever the mathematical projections say, one must account for the impact of a heavy-tailed distribution: the few customers whose demands fall far outside any projection. Again, past experience and gut feeling provide guidance as good as any mathematical model in this case.

Once the exercise is finished, business requirements can be translated into actual space, power, cooling, network bandwidth, computing, and storage capacities needed to sustain further business growth.
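
As a rough illustration of this translation, the Python sketch below turns a sales forecast (a number of new customers) into power, rack space, and bandwidth figures. The per-customer profile, the 42U rack size, and the headroom factor are assumptions made for the example; real values would come from your own historical data and service testing.

```python
import math
from dataclasses import dataclass


@dataclass
class ServiceProfile:
    """Average per-customer footprint of a service (hypothetical figures)."""
    power_kw: float
    rack_units: float
    bandwidth_mbps: float


def required_capacity(new_customers: int, profile: ServiceProfile,
                      headroom: float = 0.2, units_per_rack: int = 42) -> dict:
    """Translate a customer forecast into technical capacity.

    `headroom` is an assumed safety margin for customers whose demand
    falls outside the projection.
    """
    factor = new_customers * (1 + headroom)
    # Round before ceil so floating-point noise cannot add a spurious unit.
    rack_units = math.ceil(round(factor * profile.rack_units, 6))
    return {
        "power_kw": factor * profile.power_kw,
        "rack_units": rack_units,
        "racks": math.ceil(rack_units / units_per_rack),
        "bandwidth_mbps": factor * profile.bandwidth_mbps,
    }


# Hypothetical input: sales expects 100 new customers of an existing service.
hosting = ServiceProfile(power_kw=0.5, rack_units=2, bandwidth_mbps=100)
print(required_capacity(100, hosting, headroom=0.25))
```

The headroom parameter is where experience and gut feeling enter the calculation; the rest is plain multiplication.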

Execute forecasting and factor in technology development impact

Forecasts of the future requirements of existing customers in your data center must also be added to the forecasts driven by business inputs. For that purpose, a DCIM tool provides advanced forecasting that combines statistical, heuristic, and machine learning algorithms to estimate future values of critical variables: space occupancy growth, network utilization, future power and cooling requirements, future computing and storage requirements, etc.

Now, to ensure that forecasting algorithms yield reliable results, one must furnish historical data. DCIM and umbrella monitoring systems stand as key sources of data. DCIM offers historical data on space/rack utilization, while monitoring provides data on power consumption, cooling power and efficiency, network utilization, and many others. Umbrella monitoring systems also furnish early warnings on future capacity exhaustion, triggering timely total or isolated resource capacity re-planning and allowing for the timely initiation of the procurement process.
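
To give a feel for the simplest possible version of such forecasting, the Python sketch below fits a straight line to historical samples and estimates how many periods remain until the trend reaches the installed capacity. It is a deliberately basic least-squares trend, not the algorithm any particular DCIM tool uses, and the sample values are hypothetical.

```python
from __future__ import annotations


def linear_trend(values: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept for equally spaced samples (x = 0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until_exhaustion(values: list[float], capacity: float) -> float | None:
    """Periods after the last sample until the trend line reaches `capacity`.

    Returns None when the trend is flat or decreasing.
    """
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    last_x = len(values) - 1
    return max((capacity - intercept) / slope - last_x, 0.0)


# Hypothetical monthly peak power readings in kW, with 500 kW installed.
monthly_peak_kw = [300, 310, 320, 330, 340, 350]
print(periods_until_exhaustion(monthly_peak_kw, capacity=500))
```

Comparing the result with the procurement lead time is one simple way to decide when an early warning should fire.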

The forecasted data, combined with business-defined requirements and the impact of technology development, forms the baseline for executing actual capacity planning. One crucial factor to consider is spare capacity to be used in case a part of the data center goes down due to any unforeseen event. This spare capacity is derived from the Service Level Objectives (SLOs) that must be achieved (e.g., expected uptime or reliability). The "new normal" also involves caring about the CO2 footprint of your data center. Investment in more energy-efficient systems therefore also contributes to the total estimate of future resources needed in the data center.

Finally, when all capacity calculations are completed, one must verify that they are accurate. One very useful tool for that purpose is the what-if analysis provided by DCIM tools. It not only helps verify whether the planned capacity expansion can accommodate future needs but also helps devise plans to follow in case customers have capacity demands so high that even the extra capacity accounted for cannot accommodate them.
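
A what-if check can be as simple as adding a scenario's extra demand to the forecast and seeing which resources run short. The Python sketch below shows that idea under stated assumptions: the three resource types, the class and function names, and all figures are hypothetical, and a real DCIM what-if analysis works on a far more detailed model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Resources:
    """A simplified capacity or demand vector."""
    power_kw: float
    cooling_kw: float
    rack_units: float


def shortfalls(planned: Resources, forecast: Resources, scenario: Resources) -> dict[str, float]:
    """Return every resource where forecast plus scenario demand exceeds the plan."""
    gaps = {}
    for name in ("power_kw", "cooling_kw", "rack_units"):
        needed = getattr(forecast, name) + getattr(scenario, name)
        gap = needed - getattr(planned, name)
        if gap > 0:
            gaps[name] = gap
    return gaps


# Hypothetical plan, forecast demand, and a "large customer arrives" scenario.
planned = Resources(power_kw=600.0, cooling_kw=650.0, rack_units=2000)
forecast = Resources(power_kw=450.0, cooling_kw=470.0, rack_units=1500)
big_customer = Resources(power_kw=200.0, cooling_kw=150.0, rack_units=300)

print(shortfalls(planned, forecast, big_customer))
```

An empty result means the plan absorbs the scenario; anything else points at the resource that needs a contingency plan.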

Refresh assessment for existing risks and anticipate new ones

One important aspect to consider when executing a capacity plan is the re-evaluation of existing risks and new risks, as well as regulatory measures. For instance:

  • new cybersecurity or physical security threats demand investment in more sophisticated protection mechanisms,
  • climate change may necessitate a higher capacity for HVAC,
  • regulatory decisions may require more redundancies to be implemented in the data center, such as redundant utility power supply, etc.

In the process of reassessing the risks, one can easily devise plans for their mitigation – an essential asset that truly helps in crisis situations.

Do the financial math

You may take pride in your brand-new capacity expansion plan, but it makes sense only if it is sustainable from a financial and business perspective. Therefore, a business plan must be updated to reflect new capacity expansion costs (both CAPEX and OPEX) as well as the revenues the expansion will facilitate.
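
Two of the usual first checks on such a business plan are the net present value (NPV) and the payback period. The Python sketch below computes both from a list of yearly cash flows; the discount rate and the cash-flow figures are made-up assumptions for illustration only.

```python
from typing import List, Optional


def npv(rate: float, cash_flows: List[float]) -> float:
    """Net present value; cash_flows[0] occurs today, the rest at yearly intervals."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))


def payback_year(cash_flows: List[float]) -> Optional[int]:
    """First year in which the cumulative (undiscounted) cash flow is non-negative."""
    cumulative = 0.0
    for year, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return year
    return None


# Hypothetical expansion: 1,000,000 CAPEX today, then four years of
# 400,000 net cash flow per year (added revenue minus added OPEX).
expansion = [-1_000_000, 400_000, 400_000, 400_000, 400_000]

print(f"NPV at 10%: {npv(0.10, expansion):,.0f}")
print(f"Payback year: {payback_year(expansion)}")
```

A negative NPV or a payback period beyond the equipment's useful life is a signal to go back and rebalance the technical plan against the business objectives.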

Unfortunately, very often the business case of the devised plan falls short of the expectations of the organization’s management. In this step, one must go through a tedious process of reconciling business objectives with technical requirements and find a fine balance of both that will provide a positive business case.

Prepare final capacity sizing and devise the implementation plan

Capacity planning does not stop at this point. One might say it only truly begins. Now one must coordinate the procurement process and ensure that the costs correspond to the projected budget. Once the timeline for equipment delivery and installation services is agreed with vendors, one must devise an operational plan that includes IT and network engineers, facility construction and maintenance staff, and other organizations. The plan must be executed by issuing concrete work orders to install and configure the equipment. One must also make an appropriate reservation of capacity in DCIM to ensure that the additional capacity is genuinely used for the purposes defined in the plan.

After implementation, audit the new situation and update documentation in DCIM

The planning process will happen again. To ensure the next cycle goes smoothly, once the installation of additional capacity and the optimizations are complete, engineers must make sure to:

  • Document all changes in DCIM.
  • Rediscover the entire data center infrastructure and check for errors in documentation.
  • Configure the umbrella monitoring system to manage changes (new devices) in the data center.
  • Configure new reporting to accommodate new procedures and regulatory measures implemented during the upgrade.
  • Reconfigure reports to account for changes in capacity measuring and planning methodology.
  • Track the alignment of assumptions made during the planning process with the actual situation and implement planning process adjustments where needed.

Best Practices for Better Data Center Capacity Planning

With many aspects of data center planning explained, it is easy to summarize some of the best practices of this critical process:

  • Use and maintain DCIM tools for documentation.
  • Implement and maintain umbrella monitoring systems and use them to calculate and visualize KPIs in real time.
  • Introduce capacity exhaustion alarms based on forecasting to understand when to execute limited or full-scale capacity planning.
  • Enforce change management and document every alteration in the data center in the DCIM tool, which is the source of capacity data that cannot be measured by the monitoring system.
  • Use DCIM’s forecasting mechanisms to determine future capacity requirements.
  • Utilize DCIM’s what-if analysis capabilities to check your calculations and devise procedures for mitigating unexpected events.
  • Listen to the business for new plans so you can anticipate the need for capacity expansion in the data center.
  • Continually learn about technology trends that help optimize existing resources and minimize future capacity expansion.
  • Revise the capacity plan strategy frequently.
  • Account for new risks and regulations.

UMBOSS & Data Center Capacity Planning

UMBOSS plays an important role in the capacity planning process of many organizations. Its native integration with FNT Software’s DCIM solution makes the entire DCIM solution a perfect fit for data center management and capacity planning needs. Read about how these two systems blend together to deliver a state-of-the-art data center management solution in this case study featuring Data Box, FNT Software, and UMBOSS.

Interested in discovering more?

You can read all you want about UMBOSS, but the best way is to experience it through a demo.
