Pulling for 9s
Are you taking chances with your networks? You may think that having backup, redundancies and 24x7 monitoring is enough. But according to George Stiglich, Tandem director of product management and business development, regular backup systems often are not sufficient to protect mission-critical applications. Most wireless networks need additional insurance to guarantee reliability and availability 99.999% of the time. But this protection is far from straightforward. You have a variety of coverage options available such as fault tolerance and high availability.
"There is a lot of confusion about what availability is, and what is the value of fault tolerance," Stiglich said.
One thing that is not confusing, however, is the fact that system availability is essential.
That means you not only need to know what fault tolerance is, but also you need to decide how fault tolerant your network should be, whether you can afford it and which applications should be fault tolerant.
DEFINING FAULT TOLERANCE
How fault tolerant is your network? The answer depends on your definition of fault tolerance.
Fault tolerance is the capability to perform fault management and continue to operate during a hardware or software failure without adversely affecting the network. A fault-tolerant network is designed to remain in operation via alternate routing or another method in the event of unanticipated system or component failures. It tolerates failure rather than providing recovery from it. If a component fails, there is no unplanned system outage or interruption in processing, no loss of in-flight data and no loss of state information within the system.
Most fault-tolerant systems mirror all operations -- that is, every operation is performed on two or more duplicate components, so if one fails, the other continues processing.
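As a rough software analogy for the mirroring described above, the sketch below applies every operation to two duplicate components and keeps answering as long as one of them survives. The `Replica` and `MirroredPair` names are hypothetical and are not taken from any vendor's product; real fault-tolerant systems typically do this in hardware.

```python
class Replica:
    """One of the duplicate components (hypothetical, for illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = 0
        self.healthy = True

    def apply(self, amount: int) -> int:
        if not self.healthy:
            raise RuntimeError(f"replica {self.name} has failed")
        self.state += amount
        return self.state


class MirroredPair:
    """Performs every operation on both replicas so either can carry on alone."""

    def __init__(self) -> None:
        self.replicas = [Replica("A"), Replica("B")]

    def apply(self, amount: int) -> int:
        results = []
        for replica in self.replicas:
            try:
                results.append(replica.apply(amount))
            except RuntimeError:
                # A failed replica is skipped; the survivor keeps processing.
                continue
        if not results:
            raise RuntimeError("all replicas failed")
        return results[0]


pair = MirroredPair()
pair.apply(5)                      # both replicas now hold state 5
pair.replicas[0].healthy = False   # simulate a component failure
print(pair.apply(3))               # the surviving replica answers with no lost state
```

Because both replicas saw every earlier operation, the survivor already holds the current state and nothing has to be recovered or restarted.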
Fault-tolerant software allows a telecommunications system to "tolerate" hardware faults and some of the design and coding faults built into it. The problem may not be cured completely, but the system still can function effectively. Fault-tolerant software detects faults that are, for example, about to shut down a switching system that is allotting calls to individual phones, and takes corrective action. It detects failures, backs up and recovers data, eases communications among processes, copies key files for backup, and restarts and restores crashed systems automatically.
With fault tolerance, the network and the subscriber see no impact on or change in quality or reliability of service. Jon Mechling, Stratus Computer director of product marketing, said fault tolerance generally provides more than 99.999% availability, or about 5 minutes of unplanned unavailability over a year on a 7x24x365 network.
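The link between an availability percentage and minutes of downtime is simple arithmetic. The sketch below shows one way to work it out, assuming a non-leap year of round-the-clock operation; the function name is illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the unplanned downtime per year implied by an availability figure."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


five_nines = annual_downtime_minutes(99.999)
print(f"99.999% availability allows about {five_nines:.2f} minutes of downtime a year")
```

For 99.999% this comes to roughly 5.26 minutes, which is the "about 5 minutes" quoted above.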
But there is no standard that specifically defines what fault tolerance is and what it is not, according to Brian McLaughlin, Stratus product marketing manager. McLaughlin said many vendors confuse carriers by calling their systems fault tolerant when they are actually high availability. In addition, some vendors view fault tolerance as a function of hardware, while to others, it includes hardware and software, and extends to the database.
Mechling said the distinction is that fault-tolerant systems are designed to prevent failures, and high-availability systems are designed to recover rapidly.
High-availability systems use an active primary system for processing and a standby system, equipped with the same software, that starts up when a failure is detected in the primary server. High-availability configurations have lower availability than fault-tolerant systems because of delays in processes used to detect failures and the time required to start up the standby system. The total transition time can be minutes for simple applications to more than an hour for complex database applications that require restarting large transaction-processing systems, according to Mechling.
"Compared to conventional systems, which can stay broken for hours or even days, high availability recovery times are short, but compared to fault-tolerant systems with zero recovery times, high-availability recovery times can seem like an eternity," he said.
If a high-availability system crashes, there will be some downtime, and data could possibly be corrupted or lost. At 99.95% availability, these systems can produce about 4 1/2 hours of annual unplanned downtime for a 7x24x365 network.
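To see how failover delays add up to a figure like this, the sketch below converts a number of failovers and a transition time into an annual availability percentage. The inputs (three failovers a year, 90 minutes each) are hypothetical values chosen for illustration, not figures from any vendor.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def availability_percent(failovers_per_year: int, minutes_per_failover: float) -> float:
    """Availability implied by a yearly failover count and the time each one takes."""
    downtime_hours = failovers_per_year * minutes_per_failover / 60.0
    return 100.0 * (1.0 - downtime_hours / HOURS_PER_YEAR)


# Hypothetical scenario: three failovers a year, each taking 90 minutes.
print(f"{availability_percent(3, 90):.3f}%")
```

Those three failovers total four and a half hours of downtime, which works out to roughly 99.95% availability.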
Fault-tolerant systems provide continuous availability without having to write failover scripts. High-availability systems use custom-written scripts to govern what steps are taken in what order when a failure occurs. Writing these scripts is a difficult job because every possible failure scenario must be thought of in advance, worked into the script and then tested before the system goes live in the network, Mechling said. If the system configuration changes through an upgrade, the failure modes may change, and the script must be modified and retested before deployment. Fault-tolerant systems, on the other hand, provide continuous processing without failover scripts -- the fault detection and isolation is in the hardware.
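The scripting burden can be pictured with a minimal sketch. The failure names and recovery steps below are hypothetical and stand in for whatever a real deployment would define; the point is that each scenario needs its own ordered steps, and a failure nobody anticipated has no steps at all.

```python
from typing import Dict, List

# Hypothetical failure scenarios, each mapped to its ordered recovery steps.
FAILOVER_PLAN: Dict[str, List[str]] = {
    "primary_unreachable": [
        "isolate primary",
        "attach shared storage to standby",
        "start application on standby",
        "redirect traffic to standby",
    ],
    "application_crash": [
        "restart application on primary",
    ],
}


def run_failover(failure: str) -> List[str]:
    """Return the steps taken, in order, for a scripted failure scenario."""
    if failure not in FAILOVER_PLAN:
        # A scenario that was never scripted cannot be handled automatically.
        raise KeyError(f"no failover steps scripted for failure: {failure}")
    executed = []
    for step in FAILOVER_PLAN[failure]:
        executed.append(step)  # a real script would carry out the step here
    return executed


print(run_failover("primary_unreachable"))
```

Any configuration change that introduces a new failure mode means adding an entry to the plan and retesting it, which is the maintenance cost Mechling describes.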
WHICH IS BEST FOR YOU?
Both fault tolerance and high availability offer network solutions, but are they complementary or competitive systems? Should you choose one system over the other, or use fault tolerance for some applications and high availability for others? According to Jim Odom, Harris vice president of the network products group, they are two separate considerations.
"It is one or the other," he said. "You are either willing to pay for true fault tolerance or not."
Harris, which bundles fault-tolerant or high-availability software with manufacturers' hardware in a working system, mainly installs high-availability systems for its wireless customers. Usually carriers run high availability, he said, but they should consider fault tolerance.
But some vendors say fault tolerance is not the only way to go. Rick Rotondo, Excel Communications wireless market manager, said there are many ways to get high availability, and fault tolerance is just one. He said availability goes beyond fault tolerance of any one component.
"It is not fault tolerance vs. high availability," Rotondo explained. "Everything is about availability, and fault tolerance is a method of getting high availability of critical network elements, as opposed to other methods such as load sharing, hot standby or hot swapability."
Mechling said high availability and fault tolerance complement each other.
"There are times you want to use one and times you want to use the other," he said. Fault-tolerant systems should be used for critical applications or those affecting carriers' revenue and customer satisfaction, he explained.
Bryan Sweeley, Tandem vice president & general manager of the non-stop systems division, described fault tolerance as preventive maintenance in case of failure. He said it is reserved for the most demanding applications and is not in mainstream use. Where fault tolerance should have a strong following, Sweeley said, is with financial operations and telecommunications companies, which have a strong need for on-line reliability.
Fault-tolerant systems are most appropriate for critical applications such as call processing, E-911 services, home location register (HLR), short-message services, prepaid calling, number portability, over-the-air activation and voice-activated dialing.
Sweeley said carriers must ask themselves what is their risk profile or ability to absorb an outage. Mission-critical applications, or applications that drive customer satisfaction, must be foremost in the minds of information technology managers. As the wireless market becomes more competitive, customer satisfaction is affected by the quality of customer care, the accuracy and timeliness of bills, and other circumstances in which subscribers interact with carriers. Fault tolerance is becoming appropriate for these business-support and back-office applications as well.
Eric Doggett, Tandem telecommunications division senior vice president & general manager, said more service providers are switching to more robust solutions, and fault-tolerant or high-availability systems most likely will become more important throughout the industry.
Stratus' Mechling said fault-tolerant systems will be applied in areas they never have been before and where high-availability systems are not practical. He said that high-availability systems also will be applied to applications that require better-than-conventional availability. As these applications become more important over time, they will be deployed on fault-tolerant systems as well.
"One of the differentiators wireless carriers need to bring to subscribers goes beyond quality to reliability," he said. "If carriers want to bring high-quality service, those applications need fault tolerance."
Rotondo cautioned against relying on one system to do it all.
"Fault tolerance should not be considered the panacea for everything," he said. "You cannot stop with fault-tolerant components; you must look at the whole system."
What is best for your network is situational and depends on the applications you use, Doggett said.
"It is a matter of applying the appropriate technology to meet the needs of the particular service and service provider," he explained. "For example, when AT&T Wireless deployed its HLR, it absolutely put that on the most fault-tolerant, highly available platform it could find because of the critical nature of that. But when we talk to them about being able to offer wireless intelligent network services to 10,000 subscribers, they are looking at alternatives in terms of high-availability platforms because the critical nature just is not there for those particular services."
Ultimately, the IT manager's choice of a fault-tolerant or high-availability solution is application driven. Some applications demand, above all, fault tolerance and 99.999% availability, especially electronic commerce and E-911 services.
"As we get into more and more competitive price pressures, particularly in wireless, you really have to take a hard look at what are the critical components that need to be fault tolerant, what can just be redundant and what you can solve through diversity routing in your network," Rotondo said.
PAYING THE PRICE
For many carriers, running fault-tolerant or high-availability systems depends on the budget. Fault-tolerant systems are either more or less costly than high-availability systems, depending on whom you ask and what you use them for.
Aaron Glass, AirTouch Cellular's director of systems engineering, said that carriers should use both high-availability and fault-tolerant systems to get the best availability in the most cost-effective manner.
"Companies pay a premium to use fault-tolerant systems," he said. "If you are using both, you are saving capital."
AirTouch uses fault-tolerant systems for mission-critical applications such as call processing and billing. The high-availability applications include support systems (OSS platforms) and short-message systems.
The pro of fault tolerance, according to Glass, is that it provides 100% availability. But the biggest downside for carriers is the premium. As for high availability, Glass said though it tolerates minimal downtime and costs less, it is not 100% guaranteed.
But the total cost of ownership for such solutions depends on a number of factors including hardware and implementation costs, service, and support contracts. The simplicity and greater availability of fault-tolerant solutions can allow them to provide superior value and a lower total cost of ownership over the system's life cycle, despite their typically higher initial hardware cost.
Mechling said the cost of high availability includes acquisition costs; cost of ownership, such as maintenance and administration; complexity of installation; more complicated configuration because of two separate systems; data recovery; application restart; and the writing of failover scripts.
Design and installation also affect the cost of fault-tolerant and high-availability systems. Most major vendors provide high availability by having one system fail over to another, using conventional systems that were not designed for continuous operations, and adding software to detect failures and perform failovers to backup systems.
Stratus' McLaughlin said that while hardware costs are declining, personnel costs are rising, which may make high availability more expensive because two systems must be monitored, administered and maintained.
Sweeley said fault tolerance may cost 10% more than high availability.
"There is a perception that fault tolerance is much more expensive," he said. But, he added, it may be worth the higher price tag to protect your network because downtime can cost thousands of dollars in lost airtime revenue.
Ultimately, the best insurance for a service provider is availability, no matter how it gets it.
"Subscribers don't care if a system is fault tolerant," Rotondo said. "But they do care if their service is available."
Tandem's Stiglich agreed. "It all comes down to need. The difference between fault tolerance and high availability is not important, but the bigger issue is what price you put on failure," he said.
No matter how you do it, the important thing is making your network available at all times, even when something goes wrong. How much are you willing to bet on your network's availability?
© 2012 Penton Media Inc.