Sound the Alarm
An effective network-maintenance solution consists of many elements. Some of them you can buy as part of your network-maintenance software. Others, such as the processes and procedures used to maintain your network, can't be bought. For instance, Paul Steckbeck, senior manager of the BellSouth Mobility network management center, said fault trending, the process of analyzing network trouble tickets to find failure trends the carrier can proactively address, is the most important element of network maintenance. Richard Johnson, manager of network engineering at Omnipoint, said preventive maintenance is the most important part.
"What do we have in place that can help us prevent problems or see trouble-causing events before they impact our customers?" he said. "What are the tools and the training, or what can my operations staff do, or what can I get the vendor to put in place when we buy a piece of equipment so that it gives us, in a timely manner, errors and problems?"
When you are shopping for the right network-maintenance solution, what should it include? An effective solution offers you the ability to optimize the three Ts: time to service, time in service and time back to service. It also supports multivendor networks and network equipment, automates business processes, provides customers with a certain degree of control over their services and uses state-of-the-art technology.
Your solution must offer a wide range of management capabilities that enable you to automate operational tasks and efficiently provide advanced services. To accomplish this goal, it should let you introduce new services quickly by supporting integrated and automated management processes. By tracking network status and performance, and by supporting staff with on-line help and syntactic and semantic input checks, you can offer subscribers a higher quality-of-service (QoS) standard. Cost-cutting features also must be part of the system's daily functionality: a common, easy-to-use user interface, intelligent end-to-end applications, centralized handling of complex management tasks and correlation of fault messages originating in the network.
FAULT MANAGEMENT Fault management covers error reporting from the network element (NE) to the element manager (EM) or network-management system (NMS), as well as problem detection and problem solving. Errors occur in the network for various reasons, including damaged functional units and operator mistakes. They also indicate general network-maintenance requirements. In critical cases, an error affects network services. The goal of fault management is to keep the network running smoothly behind the scenes, minimizing the impact on customer service.
Carriers generally detect errors through customer complaints, alarm reports or both. The network units generate alarm reports to indicate faults. Fault management must recognize these reports, find their cause and inform the customer-care center by means of trouble tickets. Most importantly, it is responsible for initiating repairs.
The EM is the system's element-specific management unit. Besides alarm information, it provides a detailed internal view of the NEs. Because it is element-specific, it can offer diverse additional information, such as links to the operation manual, graphical views of the NEs and descriptions of the hardware boards. At this level, you have flexible options for testing the NEs' functional units and can retrieve the current hardware and software versions.
EMs not only support different types of NEs but also differ in the facilities they provide and in the interfaces they offer toward the NMS. These differences confuse users. Functionality naturally varies with the element type, but usability varies as well, and incompatible interface philosophies make integration into a network-management environment difficult. Meeting this challenge requires considerable flexibility. When carriers are not properly equipped, the result can be higher operating costs and slower reactions to urgent alarms. Superior maintenance systems simplify the complex procedures required to resolve events. For example, the transmission of alarm information toward the NMS should be supported by an EM mediation function, which maps the detailed information model at the EM level onto the more abstract information model at the network-management level.
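The mediation function can be pictured as a translation step between the two information models. The minimal Python sketch below assumes a hypothetical vendor-specific EM alarm record (board, port, vendor fault code) and a hypothetical lookup table that maps vendor codes onto the generic severity and probable-cause vocabulary an NMS might use; all field names and codes are illustrative, not taken from any product.

```python
from dataclasses import dataclass


@dataclass
class EmAlarm:
    """Detailed, element-specific alarm as seen at the EM level (hypothetical fields)."""
    ne_id: str
    board: str
    port: int
    vendor_code: str
    text: str


@dataclass
class NmsAlarm:
    """Abstract, vendor-neutral alarm as presented to the NMS (hypothetical fields)."""
    ne_id: str
    severity: str
    probable_cause: str


# Hypothetical mapping from vendor fault codes to (severity, probable cause).
VENDOR_CODE_MAP = {
    "LOS-01": ("critical", "loss of signal"),
    "FAN-03": ("minor", "cooling fan failure"),
    "TMP-07": ("warning", "temperature threshold crossed"),
}


def mediate(alarm: EmAlarm) -> NmsAlarm:
    """Map a detailed EM alarm onto the abstract NMS information model.

    Unknown vendor codes are forwarded as 'indeterminate' instead of being
    dropped, so the NMS operator still sees that something happened.
    """
    severity, cause = VENDOR_CODE_MAP.get(
        alarm.vendor_code, ("indeterminate", "unknown")
    )
    # Board and port details stay at the EM level; the NMS only gets the NE.
    return NmsAlarm(ne_id=alarm.ne_id, severity=severity, probable_cause=cause)
```

Keeping the board and port details out of the NMS record is the point of the abstraction: the NMS sees one consistent model regardless of vendor, and operators drill down into the EM when they need the detail.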
TROUBLE TICKETING The trouble-ticketing element is a reporting and dispatching system that links the fault-detection, fault-diagnosis and fault-correction phases. Its functions include reporting outages, recording faults, recording customer complaints and distributing information to other offices. Outages, that is, QoS degradation or service interruption, must consistently be reported to customer care. When you restore service, you must close the original trouble ticket.
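The two rules in this paragraph (every outage is reported to customer care, and restoring service closes the original ticket rather than a new one) can be sketched as a small ticket register. This is a hypothetical Python illustration; the class and method names are assumptions, and the "notifications" list stands in for whatever channel actually reaches customer care.

```python
from dataclasses import dataclass, field
from enum import Enum


class TicketState(Enum):
    OPEN = "open"
    CLOSED = "closed"


@dataclass
class TroubleTicket:
    ticket_id: int
    description: str
    state: TicketState = TicketState.OPEN
    # Hypothetical stand-in for messages sent to customer care.
    notifications: list[str] = field(default_factory=list)


class TicketDesk:
    """Minimal register: every outage opens a ticket, restoration closes it."""

    def __init__(self) -> None:
        self._next_id = 1
        self.tickets: dict[int, TroubleTicket] = {}

    def report_outage(self, description: str) -> TroubleTicket:
        ticket = TroubleTicket(self._next_id, description)
        self._next_id += 1
        self.tickets[ticket.ticket_id] = ticket
        # Outages must always be reported to customer care.
        ticket.notifications.append(f"customer care: outage #{ticket.ticket_id} opened")
        return ticket

    def restore_service(self, ticket_id: int) -> None:
        ticket = self.tickets[ticket_id]
        if ticket.state is TicketState.CLOSED:
            raise ValueError(f"ticket {ticket_id} is already closed")
        # Restoring service closes the original ticket, not a new one.
        ticket.state = TicketState.CLOSED
        ticket.notifications.append(f"customer care: service restored for #{ticket_id}")

    def open_tickets(self) -> list[TroubleTicket]:
        return [t for t in self.tickets.values() if t.state is TicketState.OPEN]
```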
Faults also include events affecting operations, such as a billing-system failure or an equipment failure. Recording customer complaints and solving the reported problems is important both for customer satisfaction and for improving network quality. Complaints are grouped into two categories on a national and an international basis.
Information distribution, which is the network-management center's responsibility, entails collecting and correlating all trouble tickets in order to determine common causes. You must store this information so that completed maintenance procedures are not unnecessarily repeated.
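Correlating trouble tickets to find a common cause can take many forms. As one simple, assumed strategy, the Python sketch below groups tickets by the site of the affected equipment and flags any site with several tickets as a candidate common cause (a shared power feed or transmission link, for instance). The ticket fields and the choice of "site" as the correlation key are hypothetical; real systems may correlate on network topology, time windows or probable cause instead.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    ticket_id: int
    site: str      # hypothetical: location of the affected equipment
    symptom: str   # hypothetical: reported symptom


def correlate(tickets: list[Ticket], min_count: int = 2) -> dict[str, list[int]]:
    """Group tickets by site and keep only sites with at least min_count tickets.

    A cluster of tickets at one site is treated as a candidate common cause,
    to be investigated once instead of ticket by ticket.
    """
    by_site: dict[str, list[int]] = defaultdict(list)
    for t in tickets:
        by_site[t.site].append(t.ticket_id)
    return {site: ids for site, ids in by_site.items() if len(ids) >= min_count}
```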
The trouble-ticketing system also may generate work orders that direct repair activities. This step should occur only once the diagnosis phase is complete and repair activities begin. The fault-management system should fill in the trouble ticket automatically, with your staff supplementing it with comments and additional information on the event.
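The ordering rule above (no work order before diagnosis is complete) and the split between automatically supplied data and staff comments can be expressed as a guard. This Python sketch is hypothetical: the `Ticket` and `WorkOrder` structures and the `create_work_order` function are illustrative names, not part of any particular trouble-ticketing product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Ticket:
    ticket_id: int
    auto_info: dict                  # hypothetical: filled in by the fault-management system
    staff_comments: list[str] = field(default_factory=list)
    diagnosis: Optional[str] = None  # set when the diagnosis phase is complete


@dataclass
class WorkOrder:
    ticket_id: int
    instructions: str


def create_work_order(ticket: Ticket) -> WorkOrder:
    """Issue a work order only once the diagnosis phase has finished."""
    if ticket.diagnosis is None:
        raise RuntimeError(f"ticket {ticket.ticket_id}: diagnosis not complete")
    # Staff comments supplement the automatically collected information.
    notes = "; ".join(ticket.staff_comments)
    return WorkOrder(ticket.ticket_id, f"Repair: {ticket.diagnosis}. Notes: {notes}")
```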
ALARMS Alarms should be displayed graphically in a list, color-coded by severity: critical, major, minor, warning, indeterminate and cleared. Both the fault-management interface and the NMS should provide flexible alarm-filtering options, which let you choose how much of the information reported by the NEs is displayed.
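A severity-ordered alarm list with a simple filter might look like the Python sketch below. The six severity names come from the text; their numeric ordering (in particular placing "indeterminate" just above "cleared") and the color assigned to each level are assumptions made for illustration, since the article only says the list should be color-coded.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from the text; the numeric order is an assumption."""
    CLEARED = 0
    INDETERMINATE = 1
    WARNING = 2
    MINOR = 3
    MAJOR = 4
    CRITICAL = 5


# Assumed color scheme for the alarm list.
SEVERITY_COLOR = {
    Severity.CRITICAL: "red",
    Severity.MAJOR: "orange",
    Severity.MINOR: "yellow",
    Severity.WARNING: "cyan",
    Severity.INDETERMINATE: "white",
    Severity.CLEARED: "green",
}


@dataclass
class Alarm:
    ne_id: str
    severity: Severity
    text: str


def filter_alarms(alarms, min_severity=Severity.WARNING, ne_ids=None):
    """Return alarms at or above min_severity, optionally limited to the NEs
    in ne_ids, sorted most severe first (ties keep their original order)."""
    selected = [
        a for a in alarms
        if a.severity >= min_severity and (ne_ids is None or a.ne_id in ne_ids)
    ]
    return sorted(selected, key=lambda a: a.severity, reverse=True)
```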
RESPONDING TO ALARMS The alarm-response workflow is divided into three broad phases: detection, troubleshooting and correction. During the troubleshooting phase, you can retrieve diagnosis information and describe the cause of the fault. This phase is closely tied to actions on the alarms themselves: clicking on an alarm should expand the information shown for it. Information and communication during the alarm-response phases are vital; there should never be blind alleys or one-way streets during potential emergency-maintenance situations.
Fault management also distinguishes between pending alarms and acknowledged alarms. Acknowledged alarms are those you have already recognized and for which you have implemented a repair procedure. In the alarm list, pending alarms usually blink, whereas acknowledged alarms are static. More sophisticated systems let you link additional information to an alarm, including the company name and diagnosis information. The alarm remains in the alarm list until you clear it. Combining diagnosis information with alarms and storing it centrally is useful, especially when more than one employee is involved in the diagnosis process.
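The distinction between pending and acknowledged alarms, and the rule that only clearing removes an alarm, can be captured in a small central alarm list. This Python sketch is a hypothetical illustration; the class and method names are assumptions, and the "blinking"/"static" strings simply stand for the display behavior described above.

```python
from dataclasses import dataclass, field


@dataclass
class Alarm:
    alarm_id: int
    text: str
    acknowledged: bool = False
    diagnosis_notes: list[str] = field(default_factory=list)


class AlarmList:
    """Hypothetical central alarm list shared by all operators."""

    def __init__(self) -> None:
        self._alarms: dict[int, Alarm] = {}

    def raise_alarm(self, alarm_id: int, text: str) -> None:
        self._alarms[alarm_id] = Alarm(alarm_id, text)

    def acknowledge(self, alarm_id: int) -> None:
        # Acknowledging changes the display state only; the alarm stays listed.
        self._alarms[alarm_id].acknowledged = True

    def add_note(self, alarm_id: int, author: str, note: str) -> None:
        # Notes are stored centrally, so everyone working the fault sees them.
        self._alarms[alarm_id].diagnosis_notes.append(f"{author}: {note}")

    def clear(self, alarm_id: int) -> Alarm:
        # Only clearing removes the alarm from the list.
        return self._alarms.pop(alarm_id)

    def display_style(self, alarm_id: int) -> str:
        # Pending alarms blink; acknowledged alarms are shown static.
        return "static" if self._alarms[alarm_id].acknowledged else "blinking"

    def active_ids(self) -> list[int]:
        return sorted(self._alarms)
```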
Printouts provide a detailed view of a functional unit, such as a trunk group, showing, for instance, its states and related counters. A printout can cover a single unit or a set of units. This functionality belongs to the EM level.
Some solutions include an on-line spare-parts checking tool: when faulty hardware is detected, the tool locates available spare parts. If no spare parts are available in local supply stores, an ordering process can begin.
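The spare-parts check reduces to "look in the local stores first, otherwise order." In this hypothetical Python sketch, the stock structure, store names and part numbers are illustrative, and `place_order` is a placeholder for whatever procurement process a carrier actually uses.

```python
def locate_spare(part_number: str, local_stock: dict[str, dict[str, int]]) -> str:
    """Return the local store holding part_number, or start an order if none has it.

    local_stock maps store name -> {part number: quantity}; both the structure
    and the store names are hypothetical. Stores are checked in name order so
    the result is deterministic.
    """
    for store, parts in sorted(local_stock.items()):
        if parts.get(part_number, 0) > 0:
            return f"reserve {part_number} at {store}"
    return place_order(part_number)


def place_order(part_number: str) -> str:
    # Hypothetical placeholder for the ordering process; a real system would
    # call a procurement interface here.
    return f"order placed for {part_number}"
```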
As a result of the fault-detection process, you can open a trouble ticket. The trouble ticket contains information about the NE; the time and date; the fault codes; the result; the hardware and software versions; and any unusual conditions prior to the alarm.
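The ticket contents listed above map naturally onto a record type. The Python sketch below uses one field per item in the list; the field names, types and the one-line `summary` format are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TroubleTicket:
    """One field per item listed in the text; names and types are assumptions."""
    ne_id: str                 # the network element
    raised_at: datetime        # time and date
    fault_codes: list[str]     # fault codes
    result: str                # result of the fault-detection process
    hw_version: str            # hardware version
    sw_version: str            # software version
    # Any unusual conditions observed before the alarm.
    unusual_conditions: list[str] = field(default_factory=list)

    def summary(self) -> str:
        """Hypothetical one-line summary for display in a ticket list."""
        codes = ",".join(self.fault_codes)
        return (f"{self.raised_at:%Y-%m-%d %H:%M} {self.ne_id} [{codes}] "
                f"HW {self.hw_version} / SW {self.sw_version}: {self.result}")
```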
If the fault occurred in the local network, the trouble ticket goes to the local service personnel. If they are unable to resolve the problem, the ticket can be escalated to the original equipment vendor. If the fault relates to services provided by external suppliers, for example when leased lines are affected or there is a power outage, trouble tickets should alert the supplier concerned, such as the electric-power supplier or the leased-line vendor. These tickets need to be monitored with time-limit flags that prompt fault-management personnel to recontact the external agency and request a status update.
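The routing rules and time-limit flags described above can be sketched as two small functions. This Python illustration assumes hypothetical cause codes ("leased_line", "power"), and it assumes that faults in externally supplied services go straight to the supplier while all other faults follow the local-then-vendor escalation; the article does not specify this precedence.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Ticket:
    ticket_id: int
    cause: str                        # hypothetical coded cause, e.g. "leased_line"
    local_attempt_failed: bool = False
    sent_at: Optional[datetime] = None  # when the ticket went to an external supplier


# Hypothetical mapping from external causes to the supplier to alert.
EXTERNAL_SUPPLIERS = {"leased_line": "leased-line vendor", "power": "electric-power supplier"}


def route(ticket: Ticket) -> str:
    """Decide who receives the ticket (assumed precedence: external first)."""
    if ticket.cause in EXTERNAL_SUPPLIERS:
        return EXTERNAL_SUPPLIERS[ticket.cause]
    if ticket.local_attempt_failed:
        return "equipment vendor"
    return "local service personnel"


def overdue(tickets: list[Ticket], now: datetime, limit: timedelta) -> list[int]:
    """Time-limit flag: external tickets with no resolution within limit,
    so personnel know whom to recontact for a status update."""
    return [
        t.ticket_id for t in tickets
        if t.cause in EXTERNAL_SUPPLIERS
        and t.sent_at is not None
        and now - t.sent_at > limit
    ]
```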
The correction phase can be further subdivided into two phases: the call-out phase, in which you correct the fault, and the alarm-cleared phase, in which you report the completed correction and close the trouble ticket. You should record the entire restoration process until the fault is completely resolved and service is restored. This record is also useful if the alarm is still active at a shift change, because the incoming shift then has all pertinent information about the fault, including its current status.
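Combining the three response phases with the split of correction into call-out and alarm-cleared gives a four-step sequence. The hypothetical Python sketch below logs every step with a timestamp and produces a handover summary for the incoming shift; it assumes the phases always run strictly in order, which real operations may not.

```python
from datetime import datetime
from enum import Enum


class Phase(Enum):
    DETECTION = 1
    TROUBLESHOOTING = 2
    CALL_OUT = 3        # first part of correction: the fault is being corrected
    ALARM_CLEARED = 4   # second part: correction reported, ticket closed


class FaultRecord:
    """Hypothetical record of every step from detection to restoration."""

    def __init__(self, fault_id: str, opened_at: datetime) -> None:
        self.fault_id = fault_id
        self.phase = Phase.DETECTION
        self.log: list[tuple[datetime, Phase, str]] = [
            (opened_at, Phase.DETECTION, "alarm detected")
        ]

    def advance(self, when: datetime, note: str) -> None:
        # Phases are assumed to run strictly in order; skipping is not allowed.
        if self.phase is Phase.ALARM_CLEARED:
            raise ValueError("fault already closed")
        self.phase = Phase(self.phase.value + 1)
        self.log.append((when, self.phase, note))

    def add_note(self, when: datetime, note: str) -> None:
        # Progress note within the current phase.
        self.log.append((when, self.phase, note))

    def handover(self) -> str:
        """Summary for the incoming shift: current phase plus full history."""
        lines = [f"{self.fault_id}: current phase {self.phase.name}"]
        lines += [f"  {t:%H:%M} {p.name}: {n}" for t, p, n in self.log]
        return "\n".join(lines)
```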
For alarms that indicate a failure, the NEs usually send a clearing notification when the event ends. The EMs and the NMS receive these notifications and remove the corresponding alarms from the active alarm list. When no clearing notification is emitted, you must clear the alarms manually. Finally, the report should inform customer care that service has been restored. Some carriers use a knowledge database that stores information about typical fault situations and problem-solving activities. In this case, the information about the service restoration also should go to the fault-management knowledge base so that future problems can be resolved quickly. The report to the knowledge database should include the area, the location affected, the start time, the fault-clearance activities and the duration of those activities. To give customer care professional support, this information should be accessible through interactive graphical point-and-click maps.
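Clearing notifications, manual clearing and the knowledge-base report can be sketched together. In this hypothetical Python illustration, alarms are matched to clearing notifications by an assumed string key (such as "BSC-3/LINK-2"), and the knowledge-base entry carries exactly the fields the text lists; the class and method names are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class KnowledgeEntry:
    """Fields the text lists for the knowledge-base report."""
    area: str
    location: str
    start_time: datetime
    activities: list[str]
    duration: timedelta


class ActiveAlarms:
    """Hypothetical active alarm list plus knowledge base."""

    def __init__(self) -> None:
        # Assumed key format, e.g. "BSC-3/LINK-2" -> time the alarm was raised.
        self.active: dict[str, datetime] = {}
        self.knowledge_base: list[KnowledgeEntry] = []

    def on_alarm(self, key: str, when: datetime) -> None:
        self.active[key] = when

    def on_clear(self, key: str) -> bool:
        """Handle a clearing notification from the NE.

        Returns False if no matching alarm was active.
        """
        return self.active.pop(key, None) is not None

    def clear_manually(self, key: str) -> None:
        # Used when the NE never sends a clearing notification.
        del self.active[key]

    def record_restoration(self, area: str, location: str, start: datetime,
                           activities: list[str],
                           clearance_duration: timedelta) -> KnowledgeEntry:
        entry = KnowledgeEntry(area, location, start, activities, clearance_duration)
        self.knowledge_base.append(entry)
        return entry
```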
Together, these options and applications are the key cost-saving measures: they simplify fault resolution, provide clear fault indication and information, and enable efficient fault correlation. These measures and the efficiency they foster account for 60% to 80% of fault-management improvement, which translates into a host of advantages for you.
Reducing behind-the-scenes complexity is getting easier. State-of-the-art technology, such as 3D hardware displays, scenario management via graphical operation flow charts, and management applications that hide complexity and reduce the number of manual interactions, shows that virtual reality has graduated from a novelty to an indispensable business tool.
© 2012 Penton Media Inc.