Disaster recovery costs corporations on average between $100,000 and $1 million a year for desktop-oriented disasters alone. And with the worldwide storage market expected to reach US$71 billion by 2004, there is ever more data at stake.
What is a data disaster?
To the PhD student who came in desperation to Markham, Ont.-based CBL Data Recovery, it was a corrupt floppy disk containing what had been the only copy of an all-important thesis. To the west coast defence contractor, it was an un-backed-up, unbootable RAID array. In magnitude, the two look light-years apart to an observer, yet in both cases the data was priceless.
According to ICSA Labs, an independent division of managed security supplier TruSecure Corp., the average company spends between $100,000 and $1 million per year, in hard and soft costs, for desktop-oriented disasters alone. Add to that the cost of server downtime, whose price after 72 hours is often the loss of an entire business (according to U.S. government statistics), and you have a big problem.
IDC expects the worldwide storage market to grow to US$71 billion by 2004. Even with $33 billion of that dedicated to software and services, that still means $38 billion in hardware to fill with usually irreplaceable data.
And that means larger recoveries when things do go wrong. Mike Briand, manager of business development at CBL Data Recovery, says the projects he sees are large now and getting larger, which presents new challenges to recovery specialists. "As we get into larger systems, piecing data together presents problems," he noted. "The people in the lab have to work magic to get things back the way they should be."
Of course, companies need not resort to such drastic measures to retrieve their data. All they need to do is restore a backup--if they have one.
"It's an area of best practice," said Marco Coulter, vice-president, BrightStor strategy, for Computer Associates. "We have spent a lot of time in the past few months talking to customers, and telling them 'Don't trust your backups.'" Coulter recommends that every few months, customers randomly pick a backup tape, choose a file on it, and restore it from a randomly chosen device, just to prove that everything is working properly.
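The kind of spot check Coulter describes could be sketched roughly as follows. This is only an illustration under stated assumptions, not any vendor's tool: each backup set is treated as a plain directory standing in for a tape, the "restore" is a simple file copy, and the `manifest` of checksums recorded at backup time is a hypothetical structure. The names `sha256_of` and `spot_check_restore` are invented for the example.

```python
import hashlib
import random
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def spot_check_restore(backup_sets, manifest, rng=None):
    """Pick one backup set and one file at random, restore it, verify it.

    backup_sets: directories, each standing in for one tape (assumption).
    manifest: dict mapping (backup set name, relative path) to the SHA-256
        recorded when the backup was taken (hypothetical structure).
    Returns (backup_set, relative_path, ok).
    """
    rng = rng or random.Random()
    backup_set = Path(rng.choice(list(backup_sets)))
    files = sorted(p for p in backup_set.rglob("*") if p.is_file())
    if not files:
        return backup_set, None, False  # an empty backup counts as a failed check
    chosen = rng.choice(files)
    relative = chosen.relative_to(backup_set)
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / relative.name
        shutil.copy2(chosen, restored)  # stand-in for the real restore step
        actual = sha256_of(restored)
    expected = manifest.get((backup_set.name, relative.as_posix()))
    return backup_set, relative, actual == expected
```

A file that is missing from the manifest fails the check, on the reasoning that a restore nobody can verify has not really been proven.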
Some companies are getting the message, said Fred Dimson, general manager of Veritas Canada. "We're seeing (as a result of 9/11) for the first time people are actually testing recoveries. They're seeing what they can recover, and how long it will take. It's getting a lot more attention than it did two years ago."
Jim Lee, vice-president, product marketing at data management software vendor Princeton Softech, agrees. "This past year has caused many companies to review their plans and actually test them."
On the other hand, Lee added, because backups may cut into application availability, "they may, at times, choose to keep the system up and continue to generate revenue rather than performing the backup. This is a tough decision, and can be costly either way. While analyzing the cost of downtime, every minute is money and this becomes like Russian roulette: do I pay now or pay later?"
When the cylinder spins and the trigger clicks in that roulette game, the result can be costly. Briand says CBL gets a couple of RAID systems for recovery each week. "It should never happen, but it does," he said. "Frequently the cause is user error. A disk may have failed, and no one noticed. When the final disk goes down, it gets noticed."
In the case of the defence contractor rescued by CBL, a combination of human and software errors led to the catastrophe with its RAID array. A software upgrade went awry, and vendor technicians attempting to fix things only made the situation worse.
It took CBL technicians several very long days to put Humpty Dumpty together again, and CBL company president Bill Margeson said the customer learned a lesson. When his team left for home, the newly revived system was in the middle of its first backup.
But while we hear about successes, it's important to remember that a good percentage of data recovery efforts fail. CBL places its success rate at between 75 and 85 per cent. Another Markham, Ont.-based recovery lab, ActionFront Data Recovery Labs, states in its Data Emergency Guide, "any company claiming a 90 to 95 per cent success rate is lying. Data recovery can be a complicated process with inherent physical and logistical limitations that determine what can actually be done."
That's why backups are critical to protect valuable data. Dimson noted that customers are becoming more religious about their backups. "Before it was 'Can I get away without it?' Now companies have one standard instead of 28 different products in various workgroups. Backup has more recognition in higher areas of companies."
Added Lee, "For basic data recovery, backups are used to recover data in the case of an error or corrupted data, regardless if the cause is software, hardware or human related ... Data recovery from day-to-day operations has always been an important issue. It's not the importance that has changed, but rather the challenges of performing the daily backups in conjunction with high availability for systems, increased data growth and new data retention requirements. One of the keys to successful data recovery is planning at both the application level as well as the enterprise level."
But all of the planning in the world won't ensure proper operations. Even trying to find a way to automate the testing of backups can generate problems -- if the randomly selected tape is offsite, for example, the test will fail. And operators frequently do not monitor backup logs as they should, and so fail to detect error conditions. "You always end up with the people factor," said Coulter. "The key to success is a simple tool that can automate things like backups that users don't want to be experts in. Users want to be experts in their companies' business."
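As a rough illustration of how the log monitoring described above might be automated, the sketch below scans a backup log and lists the jobs an operator should look at. It is a minimal example under assumptions, not a feature of any product named in this article: the one-line-per-run log format, the status strings, the two-day threshold and the function name `review_backup_log` are all hypothetical, and the lines are assumed to be in chronological order.

```python
from datetime import datetime, timedelta


def review_backup_log(lines, now, max_age=timedelta(days=2)):
    """Return {job: reason} for jobs that need an operator's attention.

    Each line is assumed to look like '2002-05-01T02:00:00 nightly-full OK'
    (timestamp, job name, status); this format is hypothetical.
    """
    last_ok = {}      # job -> time of most recent successful run
    last_status = {}  # job -> status of most recent run seen
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        stamp, job, status = parts
        when = datetime.fromisoformat(stamp)
        last_status[job] = status
        if status == "OK":
            last_ok[job] = when
    problems = {}
    for job, status in last_status.items():
        if status != "OK":
            problems[job] = f"last run reported {status}"
        elif now - last_ok[job] > max_age:
            problems[job] = f"no successful run since {last_ok[job].isoformat()}"
    return problems
```

Flagging a job whose last success is simply too old, as well as one that reported an error, is a deliberate choice here: a backup that has silently stopped running writes no error lines at all.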
At the same time, someone needs the expertise to ensure important data is safely backed up, and resellers are in an ideal position to offer this service. Dimson says in the enterprise space, there are more sophisticated resellers today who can help their customers with backup strategies. In fact, Veritas offers certification which, Dimson says, its resellers view as a value-add for their customers. Still, he is surprised and disturbed to hear stories of vendors who do not configure backup devices into systems.
"Backup is a necessary evil," he stated, "but if it's done properly it's not that onerous. I think the industry as a whole needs to do more in letting people know what they need to back up."
He acknowledged that setup can be tricky, but sees an opportunity for a knowledgeable VAR and noted, "as you move up the chain in terms of company size, if you have an automated piece for backup, the amount of human intervention goes down."
Coulter, too, thinks there's a huge reseller opportunity in the backup space. "The whole point of a reseller is that they know the stuff in real terms," he said. "That's the sort of thing we expect of them. They will go in with a backup product, and an SRM (Storage Resource Management) product, and also give them process, setting them up with best practices. It offers value for the customer, and opportunities for them."
Lee agrees, adding, "Resellers in the backup and DR (disaster recovery) market can be a great asset by providing knowledge and information to their customers. The enterprise today is extremely complex, and only getting more so."
And when all else fails, Briand believes VARs can be an asset in the data recovery realm as well.
"We have to educate the resellers," he said, "and they will educate their customers. When a (malfunctioning) drive comes in, the reseller should tell the customer that there's an opportunity to recover the data."
The reseller would then act as the middleman between the customer and the data recovery lab. Since CBL (and some other labs) have a "no data, no charge" policy, said Briand, "there's essentially no downside to sending distressed media to us."
And, for the reseller "it's almost money lying on the table. One of the major challenges is in getting the word out."
But, he cautioned, the media should be taken to a recovery lab as quickly as possible, without having been tampered with. One of CBL's failures was a RAID system that had already been worked on for three days by its manufacturer. Its data had been so scrambled by the rescue attempts that it was unsalvageable. Had a recovery lab gotten it earlier, said Briand, it probably would have been OK. He noted that manufacturers typically are not interested in the data when they're attempting to revive a failed system.
Regardless of the methodology, said Dimson, time to recovery is becoming more critical. People need their data recovered faster, and they may not only need to retrieve a file, they need whatever is required to make their application run.
"Realistically," said Coulter, "people don't want to take backups -- they just want magic."
RELATED ARTICLE: Top 10 disaster recovery blunders that firms make
CBL has completed an IT market analysis on misguided recovery efforts based on over 1,500 data loss projects. Here are the top ten bloopers.
1. It's the Simple Things That Matter
The client purchased a "killer" UNIX network system, and put 300+ workers in place to manage it. Backups were done daily; unfortunately, no one thought to put in place a system to restore the data.
2. In a Crisis, People Do Silly Things
The prime server in a large urban hospital's system crashed. When minor errors started occurring, system operators, instead of gathering data about the errors, tried anything and everything, including repeatedly invoking a controller function that erased the entire RAID array data.
3. When the Crisis Deepens, People Do Sillier Things
When the office of a civil engineering firm was devastated by floods, its owners sent 17 soaked disks from three RAID arrays to a recovery lab in plastic bags. For some reason, someone had frozen the bags before shipping them. As the disks thawed, even more damage was done.
4. Buy Cheap, Pay Dearly
The organization bought an IBM system -- but not from IBM. Then the system manager decided to configure the system uniquely, rather than following set procedures. When things went wrong with the system, it was next to impossible to recreate the configuration.
5. An Almost Perfect Plan
The company purchased and configured a high-end, expensive, and full-featured library for the company's system backups. Unfortunately, the backup library was placed right beside the primary system. When the primary system got fried, so too did the backup library.
6. The Truth, and Nothing But the Truth
After a data loss crisis, the company CEO and the IT staffer met with the data recovery team. No progress was made until the CEO was persuaded to leave the room. Then the IT staffer opened up, and solutions were developed.
7. Lights Are On, But No One's Home
A region-wide ambulance monitoring system suffered a serious disk failure, and its operators then discovered that the automated backup hadn't run for fourteen months. A tape had jammed in the drive, but no one had noticed.
8. When Worlds Collide
The company's high-level IT executives purchased a "Cadillac" system, without knowing much about it. System implementation was left to a young and inexperienced IT team. When the crisis came, neither group could talk to the other about the system.
9. Hit Restore and All Will Be Well
After September's WTC attacks, the company's IT staff went across town to their backup system. They invoked Restore, and proceeded to overwrite the backup from the destroyed main system. Of course, all previous backups were lost.
10. People Are the Problem, Not Technology
Disk drives today are typically reliable -- human beings aren't. A recent study found that approximately 15 per cent of all unplanned downtime occurs because of human error.
COPYRIGHT 2002 Plesman Publications
COPYRIGHT 2002 Gale Group