Having a good disaster recovery plan is crucial for businesses in every industry - when systems go offline, companies can lose data and revenue, and it's hard to get those things back once they're gone. It's especially important in the colocation business because customers around the world depend on the computing hardware housed within data centers to run their day-to-day functions. A data center outage results in double the time and money lost: once for the client, once for the colocation provider itself.
So how can colocation providers prepare for the worst so that when disaster strikes, they can deal with it quickly and return to full operation with the least impact on revenue and client relationships? One important part of creating both incident response and disaster recovery (DR) plans is making sure data center managers track key metrics throughout their facilities. Temperature, humidity, power usage and server utilization are all measurements that should be taken into consideration when creating DR plans.
Let's take a look at how careful monitoring of specific metrics can shape DR strategies:
Temperature and humidity
Within a computing environment, the accepted temperature range is set by the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE). According to Energy350, the most recent guidance from ASHRAE's Technical Committee 9.9 states that this temperature can range from 59 degrees Fahrenheit to 89.6 degrees F, with a relative humidity between 20 percent and 80 percent.
Data center managers should state in their DR plans exactly what temperature they keep their facilities at. That way, employees in charge of checking the readouts from temperature sensors know what to do when, for instance, the server room gets too hot.
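As a rough illustration of how those documented limits might be turned into an automated check, here is a minimal Python sketch. The limits are the figures quoted above; the `Reading` class, the `check_reading` function and the sensor name are hypothetical and not part of any particular monitoring product. A real DR plan would likely use the facility's own, tighter setpoints.

```python
from dataclasses import dataclass
from typing import List

# Allowable envelope as quoted in the article (ASHRAE TC 9.9, via Energy350).
# Assumption: a facility would substitute its own documented setpoints here.
TEMP_F_MIN, TEMP_F_MAX = 59.0, 89.6
RH_MIN, RH_MAX = 20.0, 80.0


@dataclass
class Reading:
    """One sample from a (hypothetical) environmental sensor."""
    sensor: str
    temp_f: float       # temperature in degrees Fahrenheit
    rh_percent: float   # relative humidity in percent


def check_reading(reading: Reading) -> List[str]:
    """Return a list of alert messages; an empty list means in range."""
    alerts = []
    if reading.temp_f < TEMP_F_MIN:
        alerts.append(f"{reading.sensor}: temperature {reading.temp_f} F is below {TEMP_F_MIN} F")
    elif reading.temp_f > TEMP_F_MAX:
        alerts.append(f"{reading.sensor}: temperature {reading.temp_f} F is above {TEMP_F_MAX} F")
    if reading.rh_percent < RH_MIN:
        alerts.append(f"{reading.sensor}: humidity {reading.rh_percent}% is below {RH_MIN}%")
    elif reading.rh_percent > RH_MAX:
        alerts.append(f"{reading.sensor}: humidity {reading.rh_percent}% is above {RH_MAX}%")
    return alerts


if __name__ == "__main__":
    for message in check_reading(Reading("server-room-1", 95.0, 45.0)):
        print(message)
```

Each alert message could then be mapped to the corresponding step in the DR plan, so the employee reading the sensor output knows which procedure applies.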
Power usage
If a company hasn't installed power distribution technology that includes monitoring and alerts, power surges within the data center could go unnoticed until they cause damage. These spurts of high voltage could lead to higher electric bills - or they could take the whole facility offline. In the case of Google's Belgium data center, which was taken down after four successive lightning strikes near the building, around 0.000001 percent of data on the center's total persistent disk space was lost, according to Fortune.
Power distribution units that include sensor technology provide a comprehensive view of power usage in the data center. With this information, managers can determine the optimum levels of power utilization and include these numbers in any incident response or DR plans. In general, these kinds of monitored PDUs can help prevent circuit overloading, as well.
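To make the overload check concrete, the following Python sketch flags circuits whose measured load is close to the breaker rating. Everything in it is an assumption for illustration: the `overloaded_circuits` function, the circuit names and the 80 percent threshold are hypothetical, and the threshold a facility actually documents in its DR plan may differ.

```python
from typing import Dict


def overloaded_circuits(
    loads_amps: Dict[str, float],
    breaker_amps: Dict[str, float],
    threshold: float = 0.8,  # assumed alert level, as a fraction of breaker rating
) -> Dict[str, float]:
    """Return {circuit: utilization} for circuits at or above the threshold."""
    flagged = {}
    for circuit, load in loads_amps.items():
        utilization = load / breaker_amps[circuit]
        if utilization >= threshold:
            flagged[circuit] = round(utilization, 2)
    return flagged


if __name__ == "__main__":
    # Hypothetical readings from a monitored PDU, in amps.
    loads = {"A1": 12.0, "A2": 17.0}
    ratings = {"A1": 20.0, "A2": 20.0}
    print(overloaded_circuits(loads, ratings))
```

In this sample, circuit A2 draws 17 of its 20 amps and is flagged, while A1 at 12 amps is not.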
The human element
According to Data Center Knowledge contributor Yevgeniy Sverdlik, human error is probably the reason for more than 80 percent of data center outages. However, that figure holds only if mistakes in system design, commissioning and training are counted as well. Operator error on its own, which is what occurs when an employee does something wrong within the facility itself, could account for up to 18 percent of outages. This is generally due to a lack of documented procedures for emergency situations. Therefore, creating a DR plan that tells employees what to do when things go wrong is a crucial part of managing the computing facility.
In addition to making sure there is a procedure to follow, investing in monitoring solutions like the kinds offered by Geist can make a difference in the long run when it comes to disaster-proofing the data center. Staying on top of the aforementioned metrics will allow managers to quickly respond in case something changes within the facility, and Geist's monitoring equipment can provide the right kind of information for every scenario.