What Can Go Wrong With an IT Disaster Recovery Plan?

by Mic Johnson

This may sound surprising, but a study by IT resource site Spiceworks reveals that 95 percent of organizations have a disaster recovery (DR) plan in place. Why, then, do many companies still suffer extended downtime or disrupted operations after a cyberattack or some other form of disaster?

There are many reasons for this, and they are worth examining in view of the recent cyberattacks that caused undeniable damage and large-scale disruption. Disaster recovery plans provide real-world benefits, but just like cybersecurity systems, they are prone to defects and failures.

Here’s a look at the most critical issues or problems encountered with disaster recovery plans. Many organizations tend to overlook or disregard these details, as they focus more on mere compliance.

Incompatible plan

There is no one-size-fits-all plan when it comes to disaster recovery. Different organizations have different needs. Copying some other company’s plan is not only lazy on the part of an IT manager, but it can also result in an ineffective plan that does not enable business continuity in case an actual disaster or cyberattack takes place.

An effective IT disaster recovery plan is developed with input from, and in consultation with, different sections of an organization. The IT department takes the lead in drafting the plan, but it helps significantly to consult other departments that use IT resources and those that may be affected by or take part in the plan’s implementation.

The plan must be crafted after identifying and accounting for potential threats or risks to the entire computer room environment and to critical IT hardware such as network infrastructure, computers, peripherals and accessories, servers, wireless devices, and IoT devices. It is also important to anticipate possible issues with service provider connectivity, enterprise software applications, and data storage devices or apps.

Lack of flexibility

It is crucial to consider contingencies. The plan cannot be one-dimensional or limited to a single option. Take the case of the Colonial Pipeline ransomware attack in the second quarter of 2021. The company had backups of its ransomware-affected data and could have restored operations on its own, but it still eventually decided to pay the approximately $5 million ransom.

One cybersecurity expert agrees with the decision, saying that the fallout could have been worse if the company had done otherwise. For major organizations, paying the ransom can be far cheaper than incurring losses from suspended operations. The US Federal Government, however, is strictly against paying ransoms to criminal entities. The good news is that the Federal Government managed to recover nearly half of the ransom paid.

Colonial Pipeline suffered around 8 days of disruption in operations. In contrast, a similar ransomware attack on meat production company JBS only resulted in around a day’s worth of disruption. The company likely learned from Colonial’s experience and did what the latter failed or refused to do earlier. JBS has been silent about paying the ransom.

Failure to communicate the plan and ensure cooperation

As IBM defines it, “a disaster recovery plan is a formal document created by an organization that contains detailed instructions on how to respond to unplanned incidents such as natural disasters, power outages, cyber attacks and any other disruptive events.” In other words, there has to be an actual document, physical or digital, that can serve as a reference whenever problems are encountered.

There are no self-executing IT disaster recovery plans. Automation and AI have not reached a level advanced enough to autonomously carry out the actions needed to restore operations. As such, there have to be concrete plans known to everyone who will play a role in them.

A detailed and comprehensive plan does not serve its purpose if the people who will carry it out are unaware of it or do not understand it. Copies of the plan should be made available to key people, and notifications should be sent whenever the plan is updated.

Inadequate or no plan testing

Going back to the DR plan study by Spiceworks, it is also notable that while most companies say they have DR plans, a significant 23 percent of them admit that they never test their plans. Around 61 percent of the respondents said that they do not have adequate time to conduct testing while 53 percent cited insufficient resources as the reason for their failure to test. Also notably, 34 percent of the respondents believed that disaster recovery plan testing was not considered a priority.

“Unless an organization performs regular recovery tests, the effectiveness of a disaster recovery program cannot be justified,” writes Chandrasekar S of HCL Technologies. “Often there is a challenge to IT leadership that automation does not permit to perform selective DR tests for a subsystem of application or it doesn’t permit to perform an end to end integrated DR test. Automation has proven that DR tests can be performed without disrupting the production environment or the production data. Integrating cloud and on-premise DR solutions is also possible.”

The United States Federal Bureau of Investigation (FBI) issued an advisory on ransomware prevention and response that urges organizations to verify the integrity of their backups and test their restoration process to confirm that it works as designed. Organizations should also determine whether their restoration process is fast enough to ensure prompt recovery.
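As a rough illustration of what such a check could look like, here is a minimal Python sketch of a restore drill. It is not taken from the FBI advisory or from any particular backup product. The function names, the use of a plain file copy as a stand-in for the real restore procedure, and the `rto_seconds` threshold (a recovery time target the organization would set for itself) are all assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
import time
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_restore_drill(backup: Path, expected_sha256: str, rto_seconds: float) -> dict:
    """Restore a backup into a scratch directory and report the outcome.

    Hypothetical helper: the "restore" is a plain file copy, standing in for
    whatever restore procedure the organization actually uses.
    """
    with tempfile.TemporaryDirectory() as scratch:
        started = time.monotonic()
        restored = Path(scratch) / backup.name
        shutil.copyfile(backup, restored)
        elapsed = time.monotonic() - started
        # Integrity check: the restored copy must match the recorded checksum.
        intact = sha256_of(restored) == expected_sha256
    return {
        "intact": intact,
        "elapsed_seconds": elapsed,
        "within_rto": elapsed <= rto_seconds,
    }


if __name__ == "__main__":
    # Demo with a throwaway file; a real drill would point at an actual backup
    # and a checksum recorded when that backup was taken.
    with tempfile.TemporaryDirectory() as workdir:
        sample = Path(workdir) / "sample-backup.bin"
        sample.write_bytes(b"example payload" * 1000)
        recorded = sha256_of(sample)
        print(run_restore_drill(sample, recorded, rto_seconds=60.0))
```

The drill restores into a scratch location rather than over production data, so it can be run regularly without disrupting operations. The checksum comparison addresses integrity, and the elapsed time addresses whether recovery is fast enough.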

Not including the cloud in the plan

Research from Spiceworks also shows that only 28 percent of businesses include cloud services in their disaster recovery plans. This is not about using cloud storage or SaaS solutions to fortify DR plans. The problem here is not anticipating possible issues that can happen with the cloud services organizations rely on. Organizations may not be creating backups for data stored in third-party servers. They may be over-relying on cloud solutions.

A good example of how this can become a problem emerged very recently when the content delivery network Fastly suffered an outage that made several popular websites unavailable, including the New York Times, Twitch, CNN, and various UK government websites. The problem lasted only around an hour, but that was enough to send many businesses into panic mode.

Reports say it was not a cyberattack but a technical issue on the side of Fastly. Still, the experience should be enough to remind organizations to always take the cloud into account when planning for disaster recovery and business continuity.
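One way to make cloud dependencies visible in a DR plan is to keep a simple inventory and flag the entries that lack a backup or a fallback. The Python sketch below is only an illustration under stated assumptions: the `CloudDependency` fields, the `find_gaps` function, and the sample service names are all hypothetical and would need to be adapted to an organization's actual services.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CloudDependency:
    """One third-party cloud service the organization relies on (hypothetical schema)."""
    name: str
    stores_data: bool          # does the provider hold data we would need to recover?
    independent_backup: bool   # is that data also backed up outside the provider?
    fallback_documented: bool  # does the DR plan say what to do if it is unavailable?


def find_gaps(dependencies: List[CloudDependency]) -> Dict[str, List[str]]:
    """Map each dependency with a DR gap to the list of gaps found."""
    gaps: Dict[str, List[str]] = {}
    for dep in dependencies:
        issues = []
        if dep.stores_data and not dep.independent_backup:
            issues.append("no independent backup of data held by the provider")
        if not dep.fallback_documented:
            issues.append("no documented fallback if the provider is unavailable")
        if issues:
            gaps[dep.name] = issues
    return gaps


if __name__ == "__main__":
    # Sample entries are made up for illustration.
    inventory = [
        CloudDependency("cdn", stores_data=False, independent_backup=False, fallback_documented=False),
        CloudDependency("crm-saas", stores_data=True, independent_backup=False, fallback_documented=True),
        CloudDependency("object-storage", stores_data=True, independent_backup=True, fallback_documented=True),
    ]
    for name, issues in find_gaps(inventory).items():
        print(f"{name}: {'; '.join(issues)}")
```

A service that stores no data, such as a content delivery network, is flagged only when no fallback is documented, which mirrors the kind of outage described above.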

Too few or too many details

A disaster recovery plan is not a generic framework like a country’s constitution. It has to have enough details to facilitate the effective and prompt resolution of an interrupting incident. It has to provide quick guidance to employees, not hints or clues on what to do next.

At the same time, DR plans cannot be so detailed or complex that they become difficult to use. Disaster recovery should be agile, something that may not be achieved when there are too many steps or choices laid out.

Not everyone memorizes their full DR plans. Most only remember the basics and may need to refer back to their prescribed courses of action when faced with unfamiliar or novel cyberattack situations. It matters that the DR plan is detailed enough but concise and easy to scan.

Plans rarely become the reality

Expect the unexpected when dealing with incidents that call for disaster recovery. Things seldom go as planned, so it is advisable to prepare for a wide range of contingencies, including possibilities that are ordinarily deemed unlikely.

It is not enough to simply have a plan. The plan should be thoughtfully and meticulously designed so that it actually delivers disaster recovery and business continuity for the organization it serves.
