Have a data center checklist for both natural (and unnatural) disasters
According to the U.S. National Oceanic and Atmospheric Administration (NOAA), “in 2021, there were 20 U.S.-based weather/climate disaster events with losses exceeding $1 billion.” And Mother Nature’s destructive force is predicted to intensify in 2022. NOAA’s outlook for the 2022 Atlantic hurricane season predicts a 65% chance of an above-normal season.
When Mother Nature or human error strikes, mission-critical facility managers must prepare for more than the typical sources of downtime, such as temperature, humidity, and power surges. Hurricanes, for example, are different. They can cause prolonged outages that last weeks, as seen when Hurricane Sandy hit New York and New Jersey, inflicting $70 billion in damage. These prolonged outages are where typical disaster recovery plans often fail. The chaos of a catastrophic data center outage is not the time to discover the faults in an existing disaster recovery plan. Instead, it is the time to execute a pre-established disaster recovery checklist that lays out the course of action needed to restore data center services.
5 Essential Steps to Reduce the Impact of Downtime
The best disaster recovery checklist takes a holistic approach to help mitigate mistakes and oversights during downtime. The following are five essential steps data center owners and operators can take to lessen the impact of downtime before a disaster strikes:
1. Implement an Evacuation Plan - Have a detailed plan ready to evacuate any personnel at risk and communicate with staff to confirm their safety.
2. Back Up Your Data - Consider making daily backups a regular practice, and ensure the backed-up data is stored off site, away from the data center that could be affected.
3. Check the Generators - Are they full of clean fuel, and have they been adequately maintained? Test the generators regularly and ahead of any anticipated weather events. Consider lining up at least three vendors to deliver fuel in the event of an extended outage because fuel is at a premium after a disaster.
4. Communicate With Local Utilities - Communicate early with utility providers to set up contingency plans. Create a contact list and have a plan for communicating if traditional channels are compromised.
5. Contact Vendors - Establish a list of vendors and prioritize those requiring communication in an emergency. Then, reach out to them early and make the necessary arrangements so you can be free to focus on more immediate needs during the crisis.
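As one way to make step 2 checkable, the sketch below tests whether the most recent backup is less than a day old and stored somewhere other than the primary site. It is a minimal illustration, not a prescribed tool: the BackupRecord type, the backup_is_current function, and the site labels are hypothetical names introduced here, and the one-day limit simply mirrors the "daily backups" suggestion above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupRecord:
    """Hypothetical record of one completed backup."""
    finished_at: datetime
    location: str  # assumed site label, e.g. "offsite-a"


def backup_is_current(record, now, primary_site, max_age=timedelta(days=1)):
    """Return True if the backup is at most max_age old and not stored at primary_site."""
    fresh = now - record.finished_at <= max_age
    off_site = record.location != primary_site
    return fresh and off_site


if __name__ == "__main__":
    now = datetime(2022, 6, 1, 12, 0)
    last = BackupRecord(finished_at=datetime(2022, 6, 1, 1, 0), location="offsite-a")
    print(backup_is_current(last, now, primary_site="dc-main"))
```

A check like this could run ahead of any anticipated weather event, alongside the generator test in step 3.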
Unfortunately, disaster recovery is a simple concept that gets complex quickly. Having policies and procedures in writing is therefore vital to a successful data center disaster recovery strategy. A facility manager’s approach before, during, and after a disaster is paramount to reducing the impact, and the written plan should include items such as runbooks, test plans, and communications plans.
4 Questions to Prepare for a Disaster
Before a disaster strikes, it’s imperative to make sure response teams know the answers to these critical questions:
Who decides to label an event a “disaster”?
What are the most vital applications and hardware?
Where do the essential applications reside, and to what tier do they correspond?
Depending on the type of disaster, when do different response teams need to engage?
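The second and third questions amount to keeping an inventory of applications, where they run, and how critical they are. The sketch below shows one possible shape for that inventory and a function that lists applications in recovery order. The Application type, the recovery_order function, the sample entries, and the convention that tier 1 is the most critical are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    """Hypothetical inventory entry for one application."""
    name: str
    location: str  # where the application resides (assumed label)
    tier: int      # assumed convention: 1 = most critical


def recovery_order(apps):
    """Return applications sorted by tier, then by name for a stable order."""
    return sorted(apps, key=lambda app: (app.tier, app.name))


if __name__ == "__main__":
    inventory = [
        Application("reporting", "rack-12", tier=3),
        Application("billing", "rack-04", tier=1),
        Application("email", "rack-07", tier=2),
    ]
    for app in recovery_order(inventory):
        print(app.tier, app.name, app.location)
```

Keeping this list in writing, next to the runbooks, gives response teams a shared answer before an event is declared a disaster.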
In addition to answering these questions, it’s important to monitor key metrics throughout the facility. Human negligence, such as failing to respond to a threshold breach, is often the cause of downtime, although the resulting outages are usually shorter than those caused by natural disasters. Temperature, humidity, and power must all be considered when creating a thorough disaster recovery plan that accounts for the causes of both short-term and long-term downtime.
Temperature and Humidity - Data center managers need to determine what temperature to maintain in their data centers (temperatures should range from 59 to 89.6 degrees Fahrenheit, or 15 to 32 degrees Celsius). That way, employees checking the readouts from temperature sensors know when a threshold is breached.
Power Surges - These spikes of high voltage can lead to higher electric bills or take the whole facility offline. Facility managers should use sensor technology to get a comprehensive view of power usage.
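The temperature range quoted above can be turned into a simple threshold check, so that a breach is flagged rather than left for someone to spot on a readout. The sketch below is illustrative only: the sensor names and the breached_sensors function are hypothetical, and the limits are the 59 to 89.6 degrees Fahrenheit range from the text, which each facility would replace with its own targets.

```python
# Limits taken from the range quoted in the text; adjust per facility.
TEMP_MIN_F = 59.0
TEMP_MAX_F = 89.6


def breached_sensors(readings, low=TEMP_MIN_F, high=TEMP_MAX_F):
    """Return {sensor: reading} for every reading outside the low..high range."""
    return {
        sensor: value
        for sensor, value in readings.items()
        if not low <= value <= high
    }


if __name__ == "__main__":
    # Hypothetical sensor names and readings in degrees Fahrenheit.
    readings = {"aisle-1": 72.5, "aisle-2": 91.0, "aisle-3": 57.2}
    for sensor, value in breached_sensors(readings).items():
        print(f"Threshold breached at {sensor}: {value} F")
```

The same pattern could be applied to humidity or power readings once the facility has chosen limits for them.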
Remember, your organization could have the most powerful technology, but without a solid disaster recovery plan and checklist, your workload could blow away with the wind.