Visibility Into Telco Outages
You can build all sorts of checks and balances into your network to make sure that it remains on-line no matter what. That still isn't enough, because your network doesn't stand alone: you rely on other carriers to provide your transmission. When those telco carriers go down, so do you in the eyes of your customers. To improve the reliability of your network, you almost have to improve the reliability of your vendors' networks.
"Telco outages characteristically are the single greatest source of network outages," said Gary Ottosi, Nextel's vice president of network operations.
He explained the importance of wireless carriers having visibility into those kinds of outages and reliable reporting mechanisms to reduce that vulnerability.
"We can move from being reactive -- recognizing that a problem has occurred -- to being pro-active -- recognizing when a problem is coming," he said.
This visibility allows you to take up the issue of reliability and availability with the telco carriers and ensure that they are looking at the right things as well. A number of issues can arise on the telco carrier's side of the business that the customer must raise with that vendor.
"Outages from the telcos are some of their worst nightmares," agreed Gary Mayerick, Systems and Software Division president for TTC, the company that provides the monitoring equipment for Nextel's centralized network operations center (NOC) in McLean, VA.
These outages, according to Mayerick, can drive away customers because of unavailability or non-productive call capacity, as well as through delays in dispatching service tickets. All of these can hinder Nextel's ability to meet customers' high expectations and its own goal of 100% customer satisfaction.
So what is a wireless carrier to do? In the case of Nextel, it planned, designed and constructed its NOC in a brief eight months. It armed the NOC with TTC's CENTEST 650 remote test units; NetAnalyst client/server-based, test-management software; Clear Communications Early Warning intelligent surveillance software; and TTC's professional service support. When fully deployed next month, the system will provide centralized test and monitoring functions from the McLean location. The software at the NOC will perform DS0, DS1 and DS3 testing and intelligent-network surveillance at Nextel's switch sites.
"Our view of Nextel's objectives was to furnish NOC-level eyesight for every site in their network, basically remote access to any T1," Mayerick said.
Nextel wanted to ascertain the quality of service it received from its carriers and providers, obtain 24x7 monitoring of T1s and CSUs, and make it possible to identify and map every circuit with an end-to-end configuration-management database.
By using the Early Warning feature, Nextel can monitor supplied network segments and analyze predictive trends. It can use the Circuit View to isolate problems to supplier network segments and then employ the Report Card feature to compare the supplier's performance, identify a given supplier's worst circuit segments and track continuous improvement initiatives with suppliers.
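Early Warning's predictive-trend analysis is proprietary, and the article does not describe how it works. As a rough illustration of the general idea only, the Python sketch below fits a least-squares line to a circuit's daily errored-second counts and extrapolates when that trend would cross a threshold. The function names, the daily errored-second metric and the threshold value are hypothetical, not Clear Communications' API.

```python
from typing import Optional, Sequence, Tuple


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of ys against x = 0, 1, 2, ...; returns (slope, intercept)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(daily_errored_seconds: Sequence[float], threshold: float) -> Optional[float]:
    """Hypothetical early-warning check: days from the last sample until the fitted
    trend reaches threshold, or None if the trend is flat or improving."""
    slope, intercept = fit_line(daily_errored_seconds)
    if slope <= 0:
        return None
    last_x = len(daily_errored_seconds) - 1
    crossing_x = (threshold - intercept) / slope
    return max(0.0, crossing_x - last_x)


# A circuit whose errored seconds climb by 2 per day.
print(days_until_threshold([10, 12, 14, 16, 18], threshold=30))  # prints 6.0
```

A straight-line fit is the simplest possible trend model; a real surveillance product may weight recent data, use seasonal baselines or apply entirely different statistics.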
Ottosi said the company already has been able to measure how the NOC has provided visibility and driven improvement from the telco carriers. For example, it has collated mean-time-to-repair data to determine better repair strategies within its organization. Likewise, it has been able to set uniform threshold requirements for performance monitoring, so it can ensure that uniform service levels are available to its customers wherever they travel in the Nextel environment. The NOC knows when service has fallen, or may be about to fall, below that prescribed level.
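Collating mean time to repair is simple arithmetic once trouble tickets are logged centrally. A minimal sketch, assuming each hypothetical ticket record carries the carrier name, the time the trouble was opened and the time service was restored; the carriers and timestamps are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def mttr_by_carrier(tickets):
    """tickets: iterable of (carrier, opened, restored) tuples of (str, datetime, datetime).
    Returns a dict mapping each carrier to its mean time to repair as a timedelta."""
    totals = defaultdict(timedelta)  # timedelta() is a zero duration
    counts = defaultdict(int)
    for carrier, opened, restored in tickets:
        totals[carrier] += restored - opened
        counts[carrier] += 1
    return {carrier: totals[carrier] / counts[carrier] for carrier in totals}


tickets = [
    ("Carrier A", datetime(2000, 5, 1, 8, 0), datetime(2000, 5, 1, 10, 0)),   # 2 h
    ("Carrier A", datetime(2000, 5, 9, 13, 0), datetime(2000, 5, 9, 17, 0)),  # 4 h
    ("Carrier B", datetime(2000, 5, 3, 9, 0), datetime(2000, 5, 3, 9, 30)),   # 30 min
]
for carrier, mttr in mttr_by_carrier(tickets).items():
    print(carrier, mttr)
```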
"The customer benefits from having uniform thresholds of performance," said Bill Rose, Nextel director of network operations. "One of the initiatives we have set in the NOC is to establish that customer experience."
Payback
To Ottosi, if Nextel can reduce telco outages, it keeps the subscriber system available to customers. Because customers can make calls when they need to, Nextel avoids churn caused by outages.
"From the perspective of availability, it obviously has satisfaction and revenue impacts because the system is going to be there more often. From the perspective of efficiency, we can go back to the telco carriers and say these circuits are your best circuits, and these are your worst circuits."
Because it can identify which circuits show degradation, Nextel not only can recover credits it may be owed for circuits that fall short of the agreed service level, but also can drive the telco to better performance through monthly reporting.
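Ranking circuits from best to worst and spotting the ones that miss their service level can be sketched as follows. This assumes a 30-day month, a 99.99% availability commitment and a hypothetical input of total unavailable seconds per circuit; the circuit IDs are made up, and the Report Card feature's actual calculations are not described in the article.

```python
MONTH_SECONDS = 30 * 24 * 3600  # assumes a 30-day reporting month


def report_card(outage_seconds, sla=0.9999, month_seconds=MONTH_SECONDS):
    """outage_seconds: dict of circuit id -> seconds unavailable during the month.
    Returns (circuit, availability, meets_sla) tuples ordered worst circuit first."""
    rows = []
    for circuit, down in outage_seconds.items():
        availability = 1 - down / month_seconds
        rows.append((circuit, availability, availability >= sla))
    rows.sort(key=lambda row: row[1])  # lowest availability first
    return rows


month = {"T1-001": 0, "T1-002": 3600, "T1-003": 200}
for circuit, availability, meets_sla in report_card(month):
    status = "meets SLA" if meets_sla else "short of SLA: candidate for credit"
    print(f"{circuit}: {availability:.5%} {status}")
```

Circuits returned with `meets_sla` set to `False` are the ones to raise with the carrier for credits, and the two ends of the sorted list are the "best circuits" and "worst circuits" Ottosi describes.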
"On this particular product, we have service-level agreements with our telco carriers which require them to have a certain level of availability," Ottosi said.
Previously, Nextel couldn't drive telcos to better performance because it couldn't report back to them on actual performance.
"Obviously, their reporting back to us was, let's say, more from their perspective," Ottosi said. "We want them to have the same 100% customer satisfaction goal that we do. It is easy to take some of the outages that we have had, do root-cause analysis on them and understand that cooperatively working with the carrier can create a better environment."
Pro-active vs. Reactive
According to Ottosi, monitoring T1s on a 24x7 basis for minor slips and dips was almost impossible in the older maintenance environment. There was no practical way to watch thousands of T1 circuits for slips, hits, faults or anything else that would indicate that a carrier's equipment was beginning to fail or that the facility itself was degrading.
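Continuous monitoring of that kind reduces to scanning each circuit's measurements for sustained degradation. The Python sketch below assumes a hypothetical feed of one bit-error-rate sample per second per T1 and flags every run of at least 10 consecutive seconds with an error rate worse than (greater than) 10^-6, the kind of event Ottosi says goes unseen without central monitoring. The function name and sample format are illustrative, not part of the TTC or Clear Communications products.

```python
def sustained_degradations(ber_per_second, threshold=1e-6, min_seconds=10):
    """Return (start, end) second indices (end exclusive) of every run in which the
    bit error rate exceeded threshold for at least min_seconds consecutive samples."""
    events = []
    run_start = None
    for i, ber in enumerate(ber_per_second):
        if ber > threshold:
            if run_start is None:
                run_start = i  # a degraded run begins
        else:
            if run_start is not None and i - run_start >= min_seconds:
                events.append((run_start, i))
            run_start = None  # shorter runs are treated as momentary hits
    # Close a degraded run that lasts to the end of the sample window.
    if run_start is not None and len(ber_per_second) - run_start >= min_seconds:
        events.append((run_start, len(ber_per_second)))
    return events


samples = [1e-9] * 5 + [1e-5] * 12 + [1e-9] * 3 + [1e-5] * 4 + [1e-9] * 2
print(sustained_degradations(samples))  # prints [(5, 17)]
```

The 4-second burst near the end of the sample is ignored; only the 12-second run crosses the 10-second rule.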
"In the older environment, it was predominantly reactive," Ottosi said. "Something happened, you'd get it fixed and then try to figure out why it happened."
Nextel and TTC's efforts represent a more pro-active approach. Nextel is able to see things coming, view problems occurring in real time and monitor the blips on the T1s that the customers may experience as a lower level of service quality.
"If you are not watching it and you slip below a 10^-6 bit error rate for 10 seconds, your chances of seeing that outside an environment similar to Nextel's are almost nil," Ottosi said.
The infrastructure that Nextel is building into its NOC will support this pro-active approach. (Although the system was not fully deployed at the time of this writing, Ottosi said it would be before the end of the year.) The NOC gives the company the ability to look at cause and effect. It can watch service degrade as capacity approaches its upper threshold, or as the bit error rate on T1s slips and starts to interfere with traffic in a given area.
"With a product like the monitoring in this environment, you can almost start to react before it gets to boiling point," Rose said. "You can see degradation much earlier when it is rolled up centrally. You can look at the cause and effect through this window that you couldn't see before."
Parsing Off Alarms
With a NOC like this, you can see as much as you want to. However, not every anomaly needs to set off an alarm. In establishing the monitoring environment, Nextel parsed off some of the alarms to improve its response time. According to Rose, the NOC initially could see more than 100,000 alarms hitting its operators in a given month, and that many could arrive within a single week if there was significant network activity. Potentially, a single event could send hundreds of alarms to the NOC.
Rose said he thinks that number should be cut to about 30,000 correlated events a month. Correlation means that if you have a T1 outage, the system doesn't have to tell you that every element hanging off that T1 just went down; it just needs to say you lost the T1. The carrier can then fix the T1, and the other network elements will clear in their own due course. If one of them doesn't come back up, the NOC will raise an alarm for that individual market.
"We could not manage every frame slip on the T1, every momentary hit," Rose said. "You just can't react. It goes back to flooding your NOC with alarms."
According to Rose, you have to go through the growing pains of seeing everything in order to understand how a single T1 event relates to the network elements it degrades, and therefore what should drive the alarm state.
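The correlation rule Rose describes (report the T1, suppress the elements hanging off it) can be sketched with a simple map from each network element to the facility that feeds it. The topology, element names and functions below are hypothetical; a production NOC would draw such relationships from a configuration-management database rather than a hard-coded dict.

```python
def has_down_ancestor(element, parent_of, down):
    """Walk upstream from element; True if any upstream facility is also alarming."""
    seen = set()
    parent = parent_of.get(element)
    while parent is not None and parent not in seen:
        if parent in down:
            return True
        seen.add(parent)  # guards against a mis-entered loop in the topology
        parent = parent_of.get(parent)
    return False


def correlate(alarms, parent_of):
    """alarms: ids currently reporting down; parent_of: element id -> upstream facility id.
    Returns only root-cause alarms, in arrival order, with duplicates removed."""
    unique = list(dict.fromkeys(alarms))
    down = set(unique)
    return [a for a in unique if not has_down_ancestor(a, parent_of, down)]


topology = {"cell-1": "T1-A", "cell-2": "T1-A", "cell-3": "T1-B", "T1-A": "DS3-1"}
print(correlate(["T1-A", "cell-1", "cell-2", "cell-3", "cell-1"], topology))  # ['T1-A', 'cell-3']
# After T1-A is repaired, a cell that stays down now alarms on its own.
print(correlate(["cell-2"], topology))  # ['cell-2']
```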
"You live with that for a while, and then you can go and peel back this thing to understand exactly what you really want to see at a NOC level," Rose said.
Report Card Measures
Just because alarm information is parsed away from immediate visibility doesn't mean it is gone forever. Nextel can still go back into the log files, research the events and tell its provider about its service levels. In the Nextel NOC, an overlay product monitors in real time behind the scenes. It doesn't distract operators from picking up key alarms, but on request it can review performance over a month's time, and it might tally all of those momentary blips into several hours of service interruptions. Nextel can then notify the telco about its performance and availability. If you are paying for 99.99% availability and the telco delivers substantially less, you can use the Report Card mechanism to show the telco exactly when and where the faults were.
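The arithmetic behind that tally is straightforward. A 99.99% commitment over a 30-day month (2,592,000 seconds) leaves a downtime budget of about 259 seconds, roughly 4.3 minutes, so a few hundred 30-second hits add up to hours and blow well past it. The sketch below merges overlapping outage intervals from a hypothetical log (start and end offsets in seconds) so nothing is double-counted, then compares the total with the budget; the interval format and function names are assumptions.

```python
MONTH_SECONDS = 30 * 24 * 3600  # assumes a 30-day month: 2,592,000 seconds


def merge_intervals(intervals):
    """Merge overlapping (start, end) offsets so overlapping log entries are counted once."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(span) for span in merged]


def downtime_seconds(intervals):
    """Total seconds of service interruption after merging overlaps."""
    return sum(end - start for start, end in merge_intervals(intervals))


def allowed_downtime(sla, month_seconds=MONTH_SECONDS):
    """Downtime budget implied by an availability commitment such as 0.9999."""
    return (1 - sla) * month_seconds


# 400 hypothetical 30-second hits, one per hour, over the month.
blips = [(hour * 3600, hour * 3600 + 30) for hour in range(400)]
total = downtime_seconds(blips)
print(total / 3600, "hours down vs. a budget of", allowed_downtime(0.9999) / 60, "minutes")
print("availability:", 1 - total / MONTH_SECONDS)
```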
"It gives you the ability to compare and contrast the different providers and even the service provider within the same service area," Ottosi said.
Root-Cause Analysis
Ottosi said the NOC is part and parcel of having uniform monitoring or uniform thresholding "above the noise level of normal field operations." From one location, Nextel can set agreed-upon thresholds of performance and monitor those. Then it can evaluate which ones it exceeded, which ones it didn't hit, and then do root-cause analysis as a centralized group. The centralized group can assess how to improve the response, the architecture and vendor support, and drive the telco to better performance.
"By taking these elements centrally, you have a dedicated group of people whose job is quality and reliability," he said.
Rose agreed, noting that when you have a NOC with a national view, some of those lessons can be propagated across your other markets, giving you a direct quality boost. It also allows you to see potential trends or anomalies sooner because the NOC is looking across multiple markets.
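One way to picture that national view: count the same kind of event (for example, frame slips on one carrier's facilities) in each market, and flag patterns that no single market would notice but that add up nationally. The event signatures, market names and both thresholds in this Python sketch are hypothetical.

```python
def national_signals(counts, local_threshold, national_threshold):
    """counts: dict of event signature -> dict of market -> event count for the period.
    Returns signatures that no single market would flag on its own but whose combined
    total across markets reaches the national threshold."""
    flagged = []
    for signature, per_market in counts.items():
        visible_locally = any(n >= local_threshold for n in per_market.values())
        if not visible_locally and sum(per_market.values()) >= national_threshold:
            flagged.append(signature)
    return flagged


weekly = {
    "Carrier X frame slips": {"Dallas": 3, "Denver": 4, "Atlanta": 2},
    "Carrier Y line hits": {"Boston": 1, "Chicago": 2},
}
print(national_signals(weekly, local_threshold=5, national_threshold=8))  # ['Carrier X frame slips']
```

Signatures that do cross a local threshold are deliberately excluded here on the assumption that the local market already sees them; a central group might of course track those too.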
"A little something here and a little something there can mean something to a central group that may not be as meaningfully apparent to one of those markets individually," Ottosi said.
* CENTEST 650: a remotely deployable, centralized test unit that allows Nextel to monitor and troubleshoot WANs from one location.
* NetAnalyst: test-management software that incorporates CENTEST 650 and portable test devices into one testing software platform. It enables intrusive and non-intrusive testing through point-and-click graphical user interfaces. The software is designed with an open, scalable architecture and is TMN-based.
* Early Warning: surveillance software from Clear Communications that collects and analyzes historical network performance data to identify problems before the service fails, allowing Nextel to repair problems before they affect customers' service.
Thanks to Galaxy IV and other wireless-system outages, many carriers have their fingers crossed that it will never happen to them. With its NOC-level eyesight, does Nextel feel it is better prepared for catastrophic failure should it occur?
Gary Ottosi, Nextel vice president of network operations, explained that sometimes you don't need to experience a network-wide failure to consider the catastrophic possibilities.
"We have a number of ongoing initiatives to provide multiple transmission facilities; we didn't need to suffer a catastrophic outage to understand that you have to build diversity into the network," he said. "Of course, as you peel that onion back, it is a very involved engineering exercise to build true diversity and redundancy throughout all of your network."
According to Ottosi, carriers need to take up these issues with the telcos themselves. Again, it comes down to how the telcos are driven on their performance.
Bill Rose, Nextel director of network operations, took a more hypothetical point of view and assumed that Nextel did have a catastrophic event.
"We are much better suited from a central operations point of view with a NOC than if we were diverse across the country," he said.
He explained that there are many actions a carrier can take from a central point of view to control the impact of a major telco outage at the local-market level. For example, a carrier can reroute traffic. Also, a centralized location can prevent an individual market from doing something to overdrive traffic that hurts the national network.
"We are staged very effectively so that if we have a major catastrophic event that requires a concert of repairs across the country," Rose said, "it can be orchestrated out of here as opposed to independent grass-roots types of repairs."
© 2012 Penton Media Inc.