A matter of survival
Until recently, network survival was a high-stakes but relatively straightforward game for carriers. Manage the lines and you'll be fine. Hire top engineers and repair crews, bury the cable and restore service as quickly as possible when a backhoe cuts through the lines.
Not so today. The stakes are considerably higher. Within the past few months, carriers have suffered embarrassing and costly outages that had nothing to do with cables, including the AT&T frame relay lapse in May and the Galaxy IV satellite outage that silenced pagers across North America shortly thereafter.
Both outages represent the new challenges facing network planners and operations staff in today's era of increasingly complex, and increasingly vital, telecommunications networks. These challenges include complex network hardware and software, an explosion of new telecom services, a growing number of network participants and greater reliance on the network by customers, business partners and even competitors.
Indeed, competition is placing a premium on network survivability, says Rob Rich, senior vice president for telecommunications research and consulting at The Yankee Group.
"It's bad enough that if your network is down, you can't host any traffic and you can't bill," Rich says. "But with increased telecommunications service choices, carriers with outages risk losing key customers. People don't forget, and a major outage is a major driver for change" (Table 1).
Carriers themselves are partly to blame for placing such a premium on survivability, Rich says. Carrier pitches to offer services such as electronic commerce and virtual private networks naturally result in higher demands being placed on the telecom network.
"Telecommunications has simply become a mission-critical component of business," Rich says. "Constant contact with customers, suppliers and business partners is what people want, and those needs are driving the revenue streams of the carriers. When you go to things like [dense wave division multiplexing], you are carrying more and more traffic on the network. In the Age of Great Big Bandwidth, network survival is tremendously important."
Ron Worley, vice president of field operations for Kansas City, Mo.-based Sprint, defines the Age of Great Big Bandwidth as such: "We now have a tremendous load on each fiber pair. We are up to 40 channels at 100 gigabits and can carry 3.1 million calls on a single fiber pair. The significant increase in traffic has caused us to really be aggressive in the area of survivability."
Sprint's strategy hinges on its commitment to a nationwide fiber network, including four-fiber, bidirectional, line-switched Sonet rings in most major metropolitan areas. At the same time, Worley recognizes that the survivability stakes are not always within one carrier's control. Partnering with other carriers, such as Bell companies and competitive local exchange carriers, presents new challenges no matter how survivable Sprint's cables are.
"We have to work with all the different vendors and configurations," Worley says. "A big key force over time is the development of standards."
Worley also acknowledges that carriers raise the survivability stakes by rolling out new and improved services, such as Sprint's new Integrated On-Demand Network.
"It gives the customer more control," he says. "Customers can now apportion bandwidth as they need it and can aggregate many services on one platform."
The more the messier
Not only are customers asking for more control of their networks, but business partners and other carriers (in some cases, direct competitors) also are sticking their fingers in the network pie through widespread interconnection agreements.
Ray Albers, vice president for network architecture at Bell Atlantic, keeps a watchful eye on the Bell company's network from his Arlington, Va., office.
The top priority for carriers guarding against outages is to engineer and deploy networks intelligently and with adequate safeguards such as redundancy, automatic failovers and duplication to very low levels, Albers says. Bell Atlantic, for example, insists that every central office have at least two signaling links to two different signal transfer points, and that these two links be at least 10 miles apart. But Albers recognizes this is only half the battle in terms of truly securing a network today.
"There are all the [CLEC] and interconnection concerns," Albers says. "For things like local number portability, we are loading a lot of software into the switches. There has to be a lot of testing, and the only way to do it is walk slowly into it and don't try to cut over everything at once."
Because of interconnection complications, Bell Atlantic now includes wording in supply contracts stipulating that contractors disclose equipment failures in other networks. The importance of this stipulation has only grown over time, as Bell Atlantic has steadily increased the number of companies with which it interoperates. "No one wants to say their stuff has failed," Albers says. "But if we buy an [asynchronous transfer mode] or [Internet protocol] router switch and it has already failed somewhere else, we need to know about it."
As for Bell Atlantic's plans to promote residential asymmetrical digital subscriber line service, Albers says, "It's easy to put up a viewgraph with one house and one switch and one connection, but how are you going to do hundreds and thousands? What if something happens and many, many people buy all this stuff?" Many of the non-wire aspects of the network might become even more critical.
As the service delivery infrastructure becomes more computer-intensive and intelligent, application platforms cannot help but become more complex, says The Yankee Group's Rich.
"Ultimately, the telcos are going to need to learn to manage more than just the networks themselves," Rich says. "They will be heavy into the game of application management."
Albers already sees increased pressures on Bell Atlantic's operations support system (OSS), the software and hardware that handles things such as billing, turning on or off service and tracking calls.
"It used to be if one of your OSSs was down for a couple of hours, you might have some upset customers, and some clerks could not do their work," Albers says. "But now, you cannot tolerate your system being down."
Carriers are seeking higher availability of their network monitoring and control systems, says Matt Izzo, director of solutions architecture for Objective Systems Integrators, a Folsom, Calif.-based producer and distributor of network monitoring systems. "When the network was simple, you could manage the service by managing the wire," he says. "As the networks get more complex and the services become more complex, so do the systems that manage the networks" (Figure 1).
The key thing that carriers need in these "networks that manage the networks," Izzo says, is the ability to "correlate network alarms across diverse networks, relate that to performance monitoring data and correlate that to the impact of services on the customers who use those services."
In other words, different users have different demands and different levels of tolerance for failure. Data services are different from voice services, for instance. The good news is that data services in general have much more built in to help monitor error rates, Izzo says. The quality of a data service can be characterized by error rates, throughput and speed of transmission, while the quality of a voice service would usually be measured by number of calls completed. Therefore, the proliferation of data services will only serve to boost the emphasis on network monitoring systems that can "proactively relate the health of the network to the service provider and then to the customer," Izzo says.
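The correlation Izzo describes, from network alarms to services to the customers who use them, can be sketched roughly as follows. This is only an illustration: the element, service and customer names and the two lookup tables are hypothetical, and a real monitoring system would also fold in performance data.

```python
# Hypothetical topology data: which services ride on which network
# elements, and which customers subscribe to which services.
SERVICES_BY_ELEMENT = {
    "atm-switch-1": {"frame-relay-east", "ip-transit"},
    "sonet-ring-3": {"voice-metro", "frame-relay-east"},
}
CUSTOMERS_BY_SERVICE = {
    "frame-relay-east": {"bank-a", "retailer-b"},
    "ip-transit": {"isp-c"},
    "voice-metro": {"bank-a"},
}


def impacted_customers(alarming_elements):
    """Map alarming elements to {customer: set of affected services}."""
    impact = {}
    for element in alarming_elements:
        # Unknown elements simply contribute no impact.
        for service in SERVICES_BY_ELEMENT.get(element, ()):
            for customer in CUSTOMERS_BY_SERVICE.get(service, ()):
                impact.setdefault(customer, set()).add(service)
    return impact


# One alarming ring touches two services and two customers.
ring_impact = impacted_customers(["sonet-ring-3"])
```

Grouping the result by customer rather than by element reflects the point in the text: the same alarm matters differently depending on who rides the affected service.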
Sound the alarms
Network monitoring systems can be almost too proactive, however.
For instance, one of the major causes of the recent AT&T frame relay outage was a "message storm" that hit when faulty software started setting off alarms on a switch controller card.
"It's sort of ironic that the network management system itself was bringing the network down," Rich says. "What really caused the AT&T network to fail was [that] it sent out so many alarms."
The solution is at least twofold. First, make sure any network monitoring equipment has some type of alarm filtration system. Systems should be designed to "rationalize and filter" error messages, Rich says, so that monitoring itself does not get in the way of network traffic. Second, carriers must deal with the increased demands on their workers to keep pace with the new monitoring and reporting technologies and systems.
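One simple form of alarm filtration is to cap how many copies of the same alarm are forwarded within a time window. The sketch below assumes alarms are identified by a source and a code; the class name, the thresholds and the element names are illustrative, not taken from any carrier's or vendor's system.

```python
from collections import defaultdict, deque


class AlarmFilter:
    """Forward at most max_per_window copies of an alarm per sliding window."""

    def __init__(self, max_per_window=3, window_seconds=60.0):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)   # (source, code) -> forwarded timestamps
        self.suppressed = defaultdict(int)  # (source, code) -> dropped count

    def offer(self, source, code, timestamp):
        """Return True if the alarm should be forwarded, False if suppressed."""
        key = (source, code)
        recent = self._recent[key]
        # Forget forwarded alarms that have aged out of the window.
        while recent and timestamp - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) < self.max_per_window:
            recent.append(timestamp)
            return True
        self.suppressed[key] += 1
        return False


# Simulated "message storm": 1,000 identical alarms in 10 seconds.
alarm_filter = AlarmFilter(max_per_window=3, window_seconds=60.0)
forwarded = sum(
    alarm_filter.offer("switch-7/card-2", "CTRL_FAULT", i * 0.01)
    for i in range(1000)
)
```

Keeping a count of suppressed alarms, rather than discarding them silently, lets operators still see that a storm occurred and how large it was.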
"You are managing so many different types of elements," Rich says. "Being up to speed on all this stuff is a big challenge. People are figuring out, though, that it is really about rationalizing your processes to be customer focused. Even with AT&T, they should have had better testing processes to make sure it never got there."
Even with the most advanced network technology, dealing with network survival issues often comes down to human fallibility or infallibility, says Jim Alsman, senior analyst at the Dataquest Telecom Group. "When a fiber network such as Sprint has gets sick, it gets sick badly," he says. "It gives you a lot of information, and the challenge is to find out what is going wrong."
Carriers are trying to meet the challenge by creating network reliability centers to monitor and troubleshoot networks.
Centralization of network control is critical to keeping up to speed on rapid technological changes, says Ron Horton, director of BellSouth's Network Reliability Centers.
"We are going from 42 locations to two centers," Horton says. Those two, one in Nashville and the other in Charlotte, N.C., will watch over what Horton refers to as BellSouth's "Hurricane Alley" territory of Florida, Georgia and North and South Carolina. "We have more specialized people now and a greater focus on analysis on how we can keep the alarms from going off in the first place."
The centralization of reliability and network survivability staff is happening at most of the other RHCs, as well as at Lucent Technologies and AT&T.
SBC Communications is moving its core network monitoring and reliability functions to two centers, one covering the western portion of its territory and a new state-of-the-art monitoring center in Dallas that will handle SBC's Arkansas, Kansas, Missouri, Oklahoma and Texas region.
Network reliability centers offer other benefits, such as closer ties with the testing divisions of major manufacturers and a more comprehensive view of the network and the threats that exist throughout a carrier's region, Horton says.
"One day, we had a cable cut by a barge drifting in the river, one clipped by a crop duster and one that got snapped when a dump truck drove under it with its bed up," he says. "So we got it by land, by sea and by air. That's why building a redundant network is so important. Because if it's out there, we are going to get hit by it."
AT&T's approach to survivability consists of eight layers bound together by a philosophy grounded in the three P's: prediction, prevention and a proactive approach.
The eight-layer pyramid consists of self-regulating and auto-switching equipment, automatic rerouting of calls within 50 msec., flexible egress routing paths, alternative signaling transport networks and at the top, full disaster recovery capability should a critical AT&T network site be hit with a full-blown disaster such as an earthquake, fire or flood (see figure).
The pyramid did not, of course, prevent the IXC's frame relay debacle in May. But the reason the event occurred illustrates why the bottom layer is so important.
A combination of factors, including inadequate procedures, a switch bug, a bad switch controller card and an error message storm, contributed to the system failure, says Hossein Eslambolci, vice president of network operations for AT&T.
The carrier now must continue to focus on survivability processes and procedures to ensure that such an outage does not happen again. "Demand on the network is growing," Eslambolci says. "The demand for data is growing, the Internet growth is significant and frame relay growth is significant. As we put critical applications on the networks, we have to execute so much more flawlessly. The more capacity you crunch into these pipes on a fiber level, the more cautious you have to be in terms of people, protection and processes."
Thousands of things can go wrong at any moment with any carrier's network. But the one that keeps most field operations people up at night is nothing other than an old standby: cable cuts.
An AT&T spokesman jokes that carriers call backhoes "unauthorized cable finders," and claims that nothing seems to find a buried cable faster than a digging crew at a construction site.
Any type of digging still ranks as the biggest cause of network outages, says Ron Worley, vice president of field operations for Sprint.
That means cable restoration crews are at the heart of network survivability. So a few years ago, Worley came up with an idea to test and sharpen the skills of Sprint's field crews, as well as allow them to showcase new skills and share knowledge. The result is Sprint's Fiber Restoration Rodeo, an annual competition featuring two dozen of Sprint's top fiber repair technicians in a test of skill and speed at finding and splicing damaged cable.
Participants must pass a written test to qualify, and the rodeo pits eight teams of three technicians each in events such as cable locating, cable fault locating, cable excavating/backhoe directing, cable preparation, cable splicing and splice case testing. Sprint has a field staff of more than 250 fiber technicians, charged with patrolling and maintaining more than 30,000 miles of fiber optic cable nationwide.
To promote the rodeo theme, Sprint brings in horses, sets up a grandstand and even hires rodeo clowns. Each event is judged on speed, accuracy and safety, and scores are immediately flashed on a large overhead television screen to an audience that consists mostly of other Sprint employees.
At this past May's Rodeo in Kansas City, Mo., Sprint network control center workers saw the difficulty and skill involved in fast, accurate field repair. To simulate realistic field conditions, for instance, the backhoe directing competition allows only hand signals, approximating times when a backhoe must be directed in inclement weather, over frozen ground or in the dark. Buried soda cans are laid next to the downed cables, representing electric and gas lines; any backhoe that hits one is docked for a utility outage.
In the splice case testing event, audience members are invited to come down and slam restored splice cases to the ground to see how the splices hold up.
"The Rodeo helps us find new, innovative ways to do our jobs faster and more efficiently, and in the end, our customers benefit," Worley says.
Network systems are among the most critical to the telecommunications infrastructure, yet only 17 months before the new millennium, two industry experts remain concerned about how well they will survive into the next century.
That's partly because some network providers have yet to complete their testing, says Tim Zebo, a principal consultant for Bellcore.
Although Y2K-compliant software is generally available, many systems may not be at the correct release level for the compliant upgrade, he says. Additionally, just because a PBX system is considered compliant doesn't mean the entire system is compliant. For example, administrative and maintenance software bundled with the PBX may not be compliant.
Although most network management systems will correctly change dates between Dec. 31, 1999, and Jan. 1, 2000, testing shows that some data switch management systems have reset to 1969 when the date was set to Feb. 28, 2000, the day before Leap Day and another date producing technological headaches.
After Jan. 1, 2000, some data multiplexers will lose configuration data after a diagnostic request, and some simple network management protocol-managed devices will fail, Zebo predicts. He recommends testing up to 31 dates to ensure networks will work properly in the next millennium.
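Leap Day is a trouble spot because 2000 is a leap year only under the 400-year exception in the Gregorian calendar, which a shortcut implementation can miss. The sketch below contrasts the full rule with one such shortcut and checks the rollovers mentioned above. The function names are illustrative, and the date list covers only the dates named in this article, not the full set of up to 31 that Zebo recommends.

```python
from datetime import date, timedelta


def is_leap_year(year):
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def naive_is_leap_year(year):
    """Illustrative bug: drops the 400-year exception, so 2000 is misjudged."""
    return year % 4 == 0 and year % 100 != 0


def next_day(d):
    """Advance one calendar day using the standard library's date arithmetic."""
    return d + timedelta(days=1)


# (date, expected following date) pairs for the rollovers named in the article.
ROLLOVER_CHECKS = [
    (date(1999, 12, 31), date(2000, 1, 1)),
    (date(2000, 2, 28), date(2000, 2, 29)),
    (date(2000, 2, 29), date(2000, 3, 1)),
]

rollovers_ok = all(next_day(before) == after for before, after in ROLLOVER_CHECKS)
```

A device under test would replace `next_day` with its own clock behavior; the table of expected pairs stays the same.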
Networks are particularly susceptible to the millennium bug. Says Zebo:
* 75% of network devices are date-sensitive.
* 25% to 35% of data networks are date-sensitive.
* All network management systems are date-sensitive.
* Because different vendors may choose different standards to represent dates, interoperability between remediated devices is key. Some Y2K solutions assume that certain two-digit years represent 2000 and later. Others use full four-digit year entry. Some analysts say it will take five to 10 years before these different solutions fully communicate.
Even if most of the network successfully handles the date change, a failure in one small area could mean a network failure. Interoperability testing is the key to network reliability, Zebo says.
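The interoperability risk between differently remediated devices can be illustrated with the two-digit approach, often called windowing. In the sketch below, the function name and the pivot values are assumptions chosen for illustration; actual vendors' cutoffs vary.

```python
def expand_windowed(two_digit_year, pivot=50):
    """Windowing: two-digit years below the pivot become 20xx, the rest 19xx."""
    if not 0 <= two_digit_year <= 99:
        raise ValueError("expected a two-digit year")
    century = 2000 if two_digit_year < pivot else 1900
    return century + two_digit_year


# Two hypothetical devices, each "compliant" on its own, with different pivots.
device_a_year = expand_windowed(40, pivot=50)  # reads "40" as 2040
device_b_year = expand_windowed(40, pivot=30)  # reads "40" as 1940
devices_agree = device_a_year == device_b_year
```

Both devices handle the year 2000 correctly in isolation, yet they disagree on the same two-digit value, which is why testing devices together matters as much as testing each one alone.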
Bellcore's own Y2K network strategic plan includes elements covering goals and scope, the project itself, compliance process, vendor management, major account management, carrier management and deployment.
Zebo also recommends this checklist to determine network readiness:
* Assess strategic plans, tactical plans, inventory and Y2K compliance.
* Evaluate areas that are of greatest risk to business from date-sensitive failures.
* Assess plans and allocation of resources.
* Verify status of contingency plans.
Zebo spoke at a recent banking conference covering Y2K issues, including those affecting the telecom infrastructure.
Washington, D.C. attorney Colleen Boothby, who also spoke at the conference, says some carriers are being coy about their Y2K readiness.
In early May, the FCC asked an estimated 1,300 carriers to explain their progress in ensuring that their systems could handle the date change.
Commissioner Michael Powell's letter asked telcos: "Explain in detail how you will assure customers of your telecommunications voice and data services that those services will function properly before and after the change in date to the Y2K, including the leap year date of Feb. 29, 2000."
Response was to be within 30 days. Sixty days after the notice was sent out, some 300 carriers had responded, many with vague answers, Boothby says.
That led to a late June meeting among Powell, telecom companies, major users and manufacturers. At the meeting, several companies said they were unaware that customers wanted this information. However, several banking entities, including the Federal Reserve Bank of New York, were adamant about getting better information, Boothby says. She credits the presence of another regulator with convincing the FCC to take a stronger stance on the issue.
Powell reportedly told the companies that by providing additional information, they would build confidence among their customers, Boothby says. Powell also urged telcos to include key customer groups as testing partners.
However, several carriers are communicating with their customers about their Y2K readiness.
AT&T, for example, has a page on its Web site that details its progress. Twenty percent of its systems are date-sensitive, says an AT&T spokesman. All of those are scheduled to be Y2K-compliant by the end of 1998, leaving 1999 for testing and minor tweaking, he says.
Ameritech also uses its Web site to update users. The carrier is part of the Telco Forum, along with Cincinnati Bell, GTE and Southern New England Telecommunications, which conducted Y2K interoperability testing in July.
The FCC is also reviving the Network Reliability and Interoperability Council, which will work with telecom companies to facilitate Y2K compliance efforts. Powell will oversee the effort.
© 2012 Penton Media Inc.