By Duncan MacRae
Downtime is not an option for IT departments, with more and more mission-critical internal business processes relying on IT. Duncan MacRae discovers it is in the CIO's best interests to prepare for the worst.
If a business is to be successful there are a number of assets that need to be well taken care of. These obviously include employees, customers, potential customers and all equipment necessary for the day-to-day running of operations.
While it is of utmost importance that these are well maintained, there is one asset that is by far the most important and can often be brushed under the carpet.
This is simply data - the very essence of any enterprise, small, medium or large. Research undertaken by Gartner suggests that two out of every five companies that experience a disaster leading to a loss of data will go out of business within five years as a result.
Further studies from the National Archives & Records Administration in the USA claim that 93% of companies that have lost their data centre for 10 days or more due to a disaster went on to file for bankruptcy within one year of the disaster.
Dhiren Harchandani, COO at Latitude Systems - a solutions provider that offers data protection and disaster recovery solutions through its partner, Sonasoft - says losing access to critical data for even an hour can cost a company millions of dollars.
"The Meta Group reports that the downtime cost for each company in the energy industry is US$2.8 million/hour, in the telecom industry US$2.0 million/hour and for financial institutions, US$1.4 million/hour," Harchandani explains.
Data loss does not just mean a loss of money though. Firms are essentially losing information. They might have the resources and manpower to re-establish the company post-disaster, but if they don't have the data, which is the basis of the company, businesses can't contact their customers. They have lost them.
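The money side of that loss can at least be estimated. The short Python sketch below scales the hourly figures Harchandani quotes to outages of different lengths. It assumes, purely for illustration, that cost grows linearly with the length of the outage; the function and dictionary names are hypothetical, and the only real inputs are the three Meta Group figures cited above.

```python
# Hourly downtime costs in US$, taken from the Meta Group figures quoted above.
HOURLY_DOWNTIME_COST_USD = {
    "energy": 2_800_000,
    "telecom": 2_000_000,
    "financial": 1_400_000,
}


def downtime_cost(industry: str, hours: float) -> float:
    """Estimate outage cost, ASSUMING cost scales linearly with duration."""
    return HOURLY_DOWNTIME_COST_USD[industry] * hours


if __name__ == "__main__":
    # Compare a 30-minute outage across the three industries.
    for industry in HOURLY_DOWNTIME_COST_USD:
        cost = downtime_cost(industry, 0.5)
        print(f"{industry}: US${cost:,.0f} for a 30-minute outage")
```

Real losses are unlikely to be this neat, since the cost of an outage depends on when it happens and what it interrupts, but the calculation gives a rough sense of the scale involved.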
It is all too easy to be complacent - to think that because nothing has gone wrong so far it probably won't go wrong in the future. The fact is all businesses experience some sort of disaster and, in turn, data loss at some point in their lifespan.
When people think of 'disasters' it is often images of earthquakes, tsunamis and hurricanes that the mind conjures up. These natural disasters, while devastating, are few and far between, particularly in the Middle East. It is the smaller, more discreet workplace disasters that are far more frequent and all too often forgotten, at the company's own peril.
Disasters within the business can take on many guises, including computer viruses, theft, power loss and hardware or systems malfunctions. In the GCC the power grid is fairly reliable, while in the Levant the grid is very sensitive.
"Most of the companies in the GCC don't realise that no utility company gives them a guarantee that their power will be available 100%, 24/7, 365 days a year," says Vipin Sharma, VP of EEMEA sales at Tripp Lite, a manufacturer of uninterruptible power supply (UPS) systems, surge suppressors and other power protection devices.
According to Sharma, power infrastructure is something that businesses need to look at. Generally, when the power goes out, data storage and data processing devices are at stake. This can include high-density servers, storage servers, voice over IP servers, networking equipment, healthcare equipment and banking terminals - essential components of most businesses.
The two most common causes of data loss, however, are human error and employee sabotage, which may come as a surprise to many. Employees may potentially be stealing data or corrupting data.
"Of course nobody is going to admit that their employees sabotage their own data but you would be amazed at how often it happens," says Avinash Advani, CIO at Latitude Systems.
"Corruption doesn't just generally happen. If you protect yourself, your server and your infrastructure well enough, your data is protected strictly within the server itself, by RAID for example," Advani explains. "You've got RAID 1, RAID 3, RAID 5 in your hard drives. Why do those exist? If one hard drive fails or gets corrupt you've got another one to fall back on. That disaster recovery still exists, so why does data still get corrupt? You've got user error coming into play."
An IT manager may put too much stress on a single server by configuring it incorrectly, or may design the infrastructure poorly, so that the load placed on it exceeds the capacity available to handle it. This is where corruption can occur.
A user may not be aware of the impact that his or her actions can have on a system. An example would be a user upgrading an application and not understanding the impact that upgrade would have on other systems. These errors can be in the form of misconfiguring the application or typing in an incorrect command.
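The distinction between hardware failure and user error can be sketched in a few lines of Python. The example below imitates the XOR parity that RAID 5 style arrays use to rebuild a failed drive, then shows that the same mechanism faithfully preserves a mistaken write. It is a simplified illustration, not a model of any particular RAID product: the block contents and the function name are hypothetical.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, one per drive (illustrative values only).
data_blocks = [b"CUST", b"PAYR", b"MAIL"]
parity = xor_blocks(data_blocks)

# Hardware failure: drive 1 is lost, so rebuild it from the survivors and parity.
rebuilt = xor_blocks([data_blocks[0], data_blocks[2], parity])

# User error: the wrong contents are written to drive 1 and parity is updated.
data_blocks[1] = b"XXXX"
parity = xor_blocks(data_blocks)

# Rebuilding now returns the mistaken data, not the original.
rebuilt_after_error = xor_blocks([data_blocks[0], data_blocks[2], parity])
```

In the first case the lost block is recovered intact. In the second the array "recovers" the erroneous data, which is why redundancy inside the server does not replace backups that can be rolled back to an earlier point in time.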
These are the common problems that CIOs in the Middle East are having to face in relation to data loss, but it may be worth sparing a thought for those original images of disaster conjured up by the imagination.
"In this region we've not really had the natural disasters that you might get in the United States, but recently we felt an earthquake in the UAE and have had major gale wind sandstorms. If there's a big enough earthquake in Iran then you're going to feel tremors or worse," Advani adds.
These particular tremors were felt in the east of the UAE in early March 2007 and, although they did not register particularly high on the Richter scale, they are not a rare occurrence in the region.
It was just over three years ago that a devastating earthquake struck Iran, killing thousands, flattening a large portion of infrastructure and completely cutting off electricity supply. Research on earthquakes prior to this had been non-existent and, to this day, there is still very little information to go on in order to gauge the likelihood of another such event. What has been discovered since is that Iran is sitting on top of a major fault line - a fault line that runs through the Middle East, down into the heart of the UAE.
"Arabia could experience another major earthquake at any time," explains Dr Ali Oncel, seismologist at the Earth Sciences Department of King Fahd University of Petroleum and Minerals.
"In order to have a better idea of when it could happen we need detailed monitoring of seismicity to detect active faults, something which has never been carried out in this region. Until this happens we have to assume that another major earthquake within the next few years is probable."
Even if the probability is low it is a game of chance and in a game of chance there is always a loser sooner or later.
Countries in other parts of the world where the frequency of earthquakes is known to be low, such as Switzerland, have implemented earthquake codes and disaster recovery plans on a wide scale to deal with disasters and protect company assets.
The only person to have written a researched report on Middle Eastern seismology, Max Wyss, warns businesses that they must be prepared for major power loss or risk folding.
Wyss, Wadati Professor for Seismology at the Geophysical Institute at the University of Alaska, says: "Larger earthquakes can wipe out all power supplies. At the very least, if I had a business that depended critically on uninterrupted power supply, I would install a generator.
"Throughout the world, a great deal of money is spent on disaster recovery. Each dollar spent in preparation for disaster significantly reduces the potential losses for a business once disaster strikes," adds Wyss.
Enterprises in the region need to be aware of all the potential hazards when it comes to power loss and disaster recovery - that is half the battle. The next step is preparing the IT defences.
"If you're at a company that relies on data as its lifeblood you need to protect yourself upfront by focusing on business continuity," Latitude's Advani explains.
"If you have a data centre in Dubai and you're smart and don't just think that earthquakes happen in California, or any natural disaster for that matter, you will take all your data and replicate it through standby systems.
"Sonasoft can provide that and data can be mirrored on a second by second basis. This standby server can be anywhere in the world. As long as you've got an active internet connection you can have the ability to take your data from one place to the other and ensure that one server will come back on-line the minute one server goes down," Advani adds.
The further apart these two servers are the better, but it usually boils down to how much money a business has. While it would be best to have a backup server in a different country, or even continent, many companies' budgets cannot stretch that far. In many cases businesses based in the centre of a city have to make do with having their backup located on the city's outskirts.
Looking at one example, the Etisalat contact centre in Ajman, UAE, deals with calls from thousands of customers every day. When a call is received it is crucial that the call centre staff have instant access to all customer data 24 hours a day, seven days a week.
While the centre is based in Ajman, it has a backup centre located in neighbouring emirate Sharjah. For general data losses this would be a wise move but, if natural disaster struck, having this failsafe so close to home may prove to be no use whatsoever.
The telecoms operator, however, is confident that it has enough physical assets in place to safeguard its data and keep the business running all day, every day.
At the back of the contact centre there are two Caterpillar SR4B backup generators, which can supply power to the entire customer care centre for three days.
Installed back in 2000, they have been used successfully during scheduled outages and when the mains power supply from Sharjah Electricity fails.
"We feel we are adequately prepared for any type of disaster. We have the generators and if they fail we can rely on the backup centre," assures Subbaraya Mahesha, technical manager at the Etisalat contact centre.
"We also have someone who specialises in data recovery, so if there is an earthquake and we can't use the backup centre hopefully he'll still be alive. If he's survived he can have us up and running again within about three days."
Something else that has to be considered in relation to power loss is the vast amount of construction going on in the region. This makes the reliability of power extremely uncertain and is certainly something enterprises should be worried about.
Tripp Lite's Sharma knows all too well about this problem. "In front of my home in Dubai, which has a DSL (digital subscriber line) and home data systems with everything on 24/7, there are always power cuts because the metro system is being dug up.
"Every business is susceptible to this. These ones are scheduled power-outs so it would mean planned downtime, and businesses need to understand whether they can afford that or not. In construction, there can be unscheduled outages - it just takes one digging mistake and power cables can be severed," adds Sharma.
According to Latitude's Advani, there is more to preparing for disaster than just having backups in place, and it is a sentiment echoed by a number of major vendors.
Operational best practices are key to the successful implementation of IT infrastructure - technology alone is not enough. Businesses need to implement a set of proven best practices for building highly available systems. Businesses also need to focus on removing the design complexity of system architecture to maintain high levels of protection and recovery.
Mohamed Alojaimi, technology marketing manager at Oracle Middle East and Africa, says: "Enterprises that have based their system architecture on best practices find they can quickly and efficiently design and deploy applications that meet their business requirements for system availability."
The best practice should encompass specific design and configuration recommendations, which have been extensively reviewed and tested to ensure optimum system availability and reliability.
"A myriad of technology solutions exist today to enable enterprises to protect their data and recover the data in a timely manner whenever needed," adds Alojaimi.
Some of the solutions currently available include backup and recovery, snapshots, RAID, remote data mirroring, data replication and automated standby databases, as Alojaimi points out.
He continues: "Data is the most critical asset of the enterprise - whether it is payroll/employee information, customer records, valuable research, financial records, historical operations information. If a company loses its data it cannot be replaced and rebuilding or regenerating that data will likely be an extremely expensive, if not an impossible task.
"A company that has lost its business data will find it very hard to remain in business. This is particularly true considering that regulations mandate companies, especially financial institutions, retain their business critical data for a minimum number of years."
Symantec, which merged with backup software specialist Veritas in 2004, insists companies should categorise and prioritise their data. Data should be categorised in terms of recovery time objective - how long can the organisation afford to be without this data - and recovery point objective - how much data can the organisation afford to lose.
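As a rough sketch of how such a categorisation might be recorded, the Python snippet below gives each system a recovery time objective and a recovery point objective, then ranks the systems so that the least tolerant of downtime is recovered first. The system names, the hour values and the four-hour threshold are hypothetical and chosen only for illustration; they do not come from Symantec.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SystemProfile:
    name: str
    rto_hours: float  # recovery time objective: tolerable time without the system
    rpo_hours: float  # recovery point objective: tolerable window of lost data


# Hypothetical objectives, for illustration only.
systems = [
    SystemProfile("HR", rto_hours=48, rpo_hours=24),
    SystemProfile("e-mail", rto_hours=1, rpo_hours=0.25),
]


def recovery_order(profiles: List[SystemProfile]) -> List[SystemProfile]:
    """Rank systems by tightest RTO first, using RPO to break ties."""
    return sorted(profiles, key=lambda p: (p.rto_hours, p.rpo_hours))


def needs_high_availability(profile: SystemProfile, threshold_hours: float = 4) -> bool:
    """Flag systems whose RTO is too short for a slow restore (assumed threshold)."""
    return profile.rto_hours <= threshold_hours


if __name__ == "__main__":
    for profile in recovery_order(systems):
        tier = "high availability" if needs_high_availability(profile) else "standard backup"
        print(f"{profile.name}: {tier}")
```

Writing the two objectives down per system makes it easier to justify spending more on some safeguards than on others.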
"A recovery programme should be built around these two important factors," says Omar Dajani, regional manager of systems engineering at Symantec, MENA.
"For example, HR systems are not as critical as e-mail systems. Therefore you would give priority to e-mail recovery by providing high availability servers and other safeguards.
"A HR system, however, may not be as business critical and, therefore, the recovery measures do not have to be as quick or as expensive."
The best way to prepare for a disaster is to avoid the disaster. Companies should look for potential problems and correct them, according to Latitude's Dhiren Harchandani.
"IT managers should address those issues that you can solve and which will provide benefit. Regardless of the cause, fast and effective recovery of your IT environment is essential," he says.
"They must be able to quickly implement your recovery plan, which must be tested and well documented before problems occur."
However, developing a disaster recovery plan for your systems in general, and databases in particular, can be an extremely tedious and time consuming process.
"If you can automate the entire process through configurable templates, then the entire process can be completed within a short period of time, saving time and resources," Harchandani explains.
"Also, one should focus not only on backup, but also on recovery. When a disaster occurs, it can take hours, if not days, depending on the complexity of the situation, to have all your systems and databases up and running.
"Users should look for applications that will help them to recover to the point of failure, or to a point in time quickly without the need to write any script or code."
Symantec's Dajani believes taking even the simplest of steps in preparation can make the world of difference. The smallest details, which may seem less important, can actually prove to be crucial.
When disaster strikes it is important that every employee knows what procedures are in place and what they each need to do. A simple example would be knowing where the backup tapes are kept, ensuring they're clearly date-stamped, and assigning the task of finding them and loading them. The plan also has to be tested.
Staff changes are a regular occurrence, so companies should take staff through regular safety drills. IT disaster recovery is just one piece of a bigger business continuity requirement, according to Dajani.
Although a large percentage of enterprises have implemented disaster recovery solutions, they still need to be educated. To maximise availability, IT managers need to work out which applications and services are required in the event of a disaster and ensure that the business does not run out of resources.
Imran Raoof, manager for the storage department at ProTechnology, an IT solutions provider, feels that governments in the Middle East need to be taking an active role and should educate businesses in IT recovery.
"I think the first initiative in educating companies about this should come from the government's side - in any case, a large section of the IT industry in the UAE, for example, is run by the government," Raoof says. "Of course, we are doing our bit by hosting seminars and trying to educate the customers about what they can get out of disaster recovery planning."
A lack of awareness is where the real problem lies and there is nowhere near enough information out there for CIOs to feed off.
Symantec's Dajani believes that companies are not fully aware of the consequences of data loss, despite the amount of information being put into the market increasing.
"Those organisations that have the best disaster recovery plans have previously experienced data loss and its consequence," he says.
"Some companies haven't experienced data loss on a large scale, so they haven't felt the pain yet. It's always going to be less of a priority if they haven't felt the pain but it's ultimately something they shouldn't have to feel."
Every data centre manager should gear up by getting trained and educated. The effort this takes, like the money involved in preparing for disaster, should not be a deterrent. In fact, the expense can vary depending on a company's individual needs and budget, and need not break the bank.
Avinash Advani at Latitude Systems puts it succinctly when he says people need to think of spending money on disaster recovery and UPS as an investment, rather than just another frivolous expense.
"Think of it as insurance," he suggests. "When you get a car, you insure it. Have you seen the driving in Dubai? It's crazy. When you're driving around you know sooner or later you're at least going to get a bump and when that happens you're going to want to be covered."
It is exactly the same when it comes to disaster recovery. It is always going to be a wise move to spend a small sum of money even if we expect, or at least hope, nothing untoward is going to happen, rather than not spend and have to deal with the consequences when things inevitably go wrong.