By Caroline Denslow
Jason Phippen of Computer Associates strongly advocates regular testing of disaster recovery tools as a vital part of data protection management.
[Photo caption: According to Phippen, many companies first attempt to recover their data only when they actually need to, not during tests.]

The business of storage is no longer just about having enough disk or tape capacity to put your data in. It is becoming increasingly about how wisely you manage your resources and information, and the ability to protect and recover data in cases of disasters. Jason Phippen, director of product marketing at Computer Associates, tells IT Weekly why data management should take centre stage when it comes to your storage strategy, and why companies should regard regular testing of their installed disaster recovery tools as an important data protection measure.
How do you define critical data?
It’s the business and the business applications that define critical data. One of the biggest challenges is that, globally, the management of storage and data in the Windows, Unix and open systems environments is still very immature. The guys from the mainframe environment have got it right [because] they’ve been doing it for several years. That mainframe-type discipline has still not made it down into the open systems environment.
The problem is, as data grows and as applications become more complicated — you have multi-tier applications, data residing in a number of places, people who have actually consolidated servers so they can have big servers and big storage arrays running multiple applications and supporting multiple areas of the business — you’ve got issues wherein customers don’t know what they have. They don’t know what pieces of data are relevant to their business. They are not managing it. It just grows and grows until it gets worse.
People can be quite lazy. You hear of companies that have 50 or 100 copies of the same database, where maybe it has been used for testing or for new applications, and the cycle continues. What actually happens is that people need to protect information; if they are backing up everything and 50% of it is duplicated or rubbish, it’s going to mean that backup takes twice as much time. So what happens when you have a disaster? Recovery takes twice as much time as well, which is, in effect, costing twice as much to manage.
What companies need to examine is how to manage their storage from the perspective of the application using that storage. The application is the closest point to the business. The business determines the value of the application.
The HR application, for example, is not business critical. The web application with a backend database that takes online orders is business critical. Why should the customer be managing the storage and protecting them both at the same level?
The logic here is that if you manage storage based on the criticality of the application to your business, you’ll manage it better. Instead of managing them all at the same level, spend less on protecting the HR application and invest more in the applications that matter most. Companies need to classify their storage. They need to align it to the business value and protect it accordingly.
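The classification Phippen describes can be sketched as a simple tier-to-policy mapping. This is a minimal illustration, not a CA product configuration; the application names, tier labels, and policy values are all hypothetical.

```python
# Hypothetical protection tiers: more critical applications get more
# frequent backups and off-site replication.
PROTECTION_TIERS = {
    "critical": {"backup_interval_hours": 1, "replicate_offsite": True},
    "standard": {"backup_interval_hours": 24, "replicate_offsite": False},
}

# Hypothetical inventory: each application tagged with its business value,
# echoing the interview's examples of an order-taking web app vs. an HR system.
APPLICATIONS = {
    "online-ordering": "critical",   # web app with a back-end order database
    "hr-system": "standard",         # important, but not revenue-facing
}

def protection_policy(app_name):
    """Look up the protection policy for an application via its tier."""
    tier = APPLICATIONS[app_name]
    return PROTECTION_TIERS[tier]
```

The point of the sketch is that once storage is classified by application, the protection level follows from the classification instead of being set uniformly.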
Does this mean having different solutions for different levels of criticality?
Yes, it could mean different solutions. It could mean a stack of solutions where there is a commonality. So at the lowest level there is backup and recovery. Everything needs backup and recovery as its last line of defence.
As you go up that stack of importance, it could mean that for the critical applications data has to be replicated, maybe to another data centre in another building.
What is an ideal set-up for such an environment?
It could mean a number of steps. First of all, management should be proactive and not reactive. The company has to restructure its data centres. It should look at putting storage management disciplines in place first; they shouldn’t be an afterthought. Use the tools that enable you to classify the data, and to automate the understanding of the value of that business data.
How much of this (data classification) involves manual effort? Are there tools that can help companies manage data better?
There are a number of products that can enable you to do this. Storage resource management is one area. It can go out there, find the data, and actually enable the IT department to align the data with business applications and put a value to it. The first step is to understand what you have and how it aligns with the business. If you don’t know what you have, how do you manage it?
The other area to consider is to stop continually throwing hardware at the problem. You can hide the problem by throwing hardware at it, but you can’t continue that cycle. Many companies do that. They just throw disk after disk after disk at the problem, instead of actually putting in management and putting in policies. But if, for instance, you walk into any company in the Middle East that has a Windows environment and put in a simple storage management tool, you can probably free up 25% to 40% of its storage by identifying old data, orphaned data and redundant copies of data. That’s quite compelling.
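The kind of scan a storage resource management tool performs can be sketched in a few lines: walk a directory tree and flag files that have not been modified within a cutoff period as candidates for archiving or deletion. This is an illustrative sketch only; the age threshold is an assumption, and real SRM products add classification, reporting, and policy layers on top.

```python
import os
import time

def find_stale_files(root, max_age_days=365):
    """Return paths of files last modified before the cutoff (stale data
    that a storage report would flag for archiving or deletion)."""
    cutoff = time.time() - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return stale
```

Even this naive version shows why the first step is discovery: you cannot reclaim capacity, or shrink backup windows, until you know which data is old, orphaned, or duplicated.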
Once you’ve done that, backup and recovery improve by the same margin, and disaster recovery improves significantly. Again, companies just can’t continue to throw hardware at the problem. The big disk spenders are the ones who let data sprawl all over the place; they just keep throwing hardware at the problem.
Aside from understanding what you actually have so you can actually align it to the business value, and avoiding throwing hardware at the problem, companies should also consider automation.
Where possible, when it comes to storage management automate mundane and complex tasks. Normally, the first thing people say is “Well, we have enough IT resources. I don’t need to automate. I have people to do that.”
It’s not just about the cost associated with these people. When someone is doing something manually and it’s mundane, humans are fallible. That’s the biggest issue. What companies can actually do is define policies and practices to actually automate those tasks.
And if you do that, you’ll find that you get more consistent management across the enterprise, and better reliability, because one of the main causes of outages and disasters is not hardware or software failure; it’s human intervention. If you can reduce that element, then as well as saving cost, you’re improving reliability and improving service levels.
And if you’re automating, you’ll have audit trails as well. So when something goes wrong you can see what went wrong. The other thing, which is related to this, is to do it securely. Make sure you have security in place and maybe implement role-based security, where you have a number of IT staff at various levels.
Ensure that people’s access to the storage management is secure because you can actually get into a situation where a junior administrator connects to a SAN switch and adjusts the wrong setting. If there is one machine driving the entire data centre and all of the storage connectivity, that human error can take down not just that one server but also the entire data centre. Everything connected to that monster of a box could be brought down.
Are there issues in migrating old data to new storage systems?
It all depends. There are a number of challenges. If someone is actually moving from one operating system to another operating system, that can be an issue, even with something like an Oracle database.
An Oracle database on a Solaris system, for example, has a different structure from an Oracle database on an HP-UX system. There are challenges there. However, there are tools in place and ways to do [migration]. Exchange upgrades and Exchange migrations can be quite a difficult task with all that Exchange data. What you do find is that there are a lot of storage management products out there that can actually help that process. The task of migration is data-dependent, because the operating system formats can be proprietary.
How can CA help with migration issues?
CA has a number of products. We started with backup and recovery products. We provide backup and recovery solutions for open systems and mainframe data so that can be their basic line of defence.
We also provide storage resource management technologies — that’s the intelligence — so products that can be put into a customer’s environment can work across Unix, Windows or mainframe environments and that can actually give them this consolidated view of what they have.
This will work across devices and it works in such a way that it understands the storage from the applications point of view.
CA is a purely storage software solutions company. Does it matter that you don’t have any hardware offerings?
It’s a benefit because we don’t have a hardware agenda, so we work with all the storage vendors.
Let’s talk about backup and recovery. Are companies in the region properly backing up data, and do they have recovery measures in place?
For the most part, yes, they have backup and recovery in place, but the question is whether people test their backup and recovery and use it properly. I would say that for the most part the answer is no. The first time they actually go to recover their data is when they need to recover it. And it’s not just a Middle East problem; it’s the same across Europe. A lot of companies are still very, very weak at testing what they have protected and whether they can actually recover it or not.
Things seem, though, to have got a lot better in the last few years. I think customers have become a lot more educated, and the importance of data is much better understood in the region. It’s the testing and the disaster recovery planning that are probably still quite weak and quite immature.
I remember, going back seven years, I did a lot of work for a bank in Kuwait. The bank had put in a storage solution, and I went in there to install the backup and recovery solution. We tested it and it worked. Four months later I went back to check if the system was working properly. The bank said they had had no problems whatsoever. So I checked this big tape library. The tapes were in there. But when I tried the backup and recovery product, it wasn’t working. A bit of investigation revealed that two months before my visit, someone had moved the tape library, unplugged it and forgotten to plug it back in. No one had noticed for two months that the bank’s systems were not being backed up. If they had had a disaster and lost data, the latest data they could have recovered would already have been two months old; that would have been two months’ worth of customer transactions lost. Imagine what that would have done to a bank.
What new trends do you see in the disaster recovery space?
There are a lot of technology changes coming to an old discipline. Disk is becoming much, much more important in backup and recovery. You’ll probably find that customers who have historically been using tape are now becoming much more recovery-orientated. To go back to identifying the value, say you’ve got a banking system that takes five hours to back up to tape. If you have a disaster and have to recover data, it’s going to take at least another five hours to recover it.
The issue there is if it takes you five hours to recover from tape that means your banking system is down for five hours. That’s not good.
Bring disk into the equation. You will still need tape so that you can take data to a remote location, but now you can back up critical applications from disk to another disk and then migrate the data to tape. The version on disk can stay there for 24 hours until the next backup, so that if there’s a disaster you don’t need to go to the tape at all. That way, you can recover in far less time, because the data is actually on disk.
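The disk-to-disk-to-tape staging flow Phippen describes can be sketched as two stages: a fast first-stage copy to a disk staging area (kept for the 24-hour retention window so restores are quick), followed by migration of a copy to tape. This is a minimal illustration under stated assumptions: the directory names are hypothetical, and the tape tier is simulated here by a second directory rather than a real tape device.

```python
import os
import shutil

RETENTION_HOURS = 24  # disk copy is kept until the next backup cycle

def backup_to_disk(source_file, staging_dir):
    """First stage: fast backup of the file to the disk staging area.
    Restores within the retention window come from here, not from tape."""
    os.makedirs(staging_dir, exist_ok=True)
    return shutil.copy2(source_file, staging_dir)

def migrate_to_tape(staged_file, tape_dir):
    """Second stage: migrate a copy of the staged file to the tape tier
    (simulated by a directory) for off-site, long-term retention."""
    os.makedirs(tape_dir, exist_ok=True)
    return shutil.copy2(staged_file, tape_dir)
```

The design point is that tape still provides the off-site last line of defence, but recovery within the retention window never has to wait on a tape restore.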