SLAs: The "satisfaction guaranteed" warranty

Service level agreements are the telecom industry's "satisfaction guaranteed" warranty. In a marketplace teeming with competitors, SLAs are a competitive differentiator, an operator's way of saying, "Pick me because I offer SLAs."

In the past few years, SLAs have grown increasingly prevalent among service providers for a number of reasons. First, service providers are using SLAs to prove to their customers that they can deliver advanced, value-added services. By instilling faith in customers, service providers find it easier to move these customers to higher-priced services in the future.

Second, deregulation forces and the accelerated competition in the communications arena have empowered communications customers, who now have a better understanding of what they can demand from service providers. As customers become more technologically savvy, they are calling for higher service levels and want to see quality of service (QoS) guarantees in writing. 

Third, as customers grow ever more dependent on networked services, their willingness to tolerate service downtime diminishes considerably. By offering SLAs, service providers can set realistic goals and viable objectives from the get-go, thus bridging the gap between customer expectations and operators' capabilities. Helping customers understand that 100% service availability is unfeasible increases the chances that customers will be more tolerant of service disruptions.

Lastly, not all customers are created equal. Customers who want optimal bandwidth will be charged at a higher rate than customers who are satisfied with slower dial-up speeds.

To that end, when a network fault occurs and affects a service, an operator needs to prioritize repair activities according to parameters such as customer class or type of service that was affected.

Facets of SLA management

To give SLAs teeth, an operator needs to monitor services end-to-end from the customer's viewpoint. This requires an operational support system (OSS) that can define, store and correlate data on customers, services and network elements. The chosen OSS should be able to collect and process service level indicators and compare them to pre-defined SLAs to help operators ascertain that guaranteed service levels are being met.

The process whereby service providers define, monitor and report on SLAs is an ongoing one, encompassing a number of phases: 

SLA definition

In this initial phase, service providers need to populate the OSS's configuration database with information on all service-related entities such as network elements, services, customers and SLAs. By integrating and correlating network data with customer and service data, an operator can create customer-aware network information on services. This helps the operator pinpoint the services and customers that are affected by failures and identify subscribers who consistently hit bandwidth limits and are therefore candidates for service upgrades.

During this phase, operators also need to define service level indicators and the methods that will be used to measure them. For instance, service performance can be measured according to indicators such as availability, latency and throughput; service provisioning can be measured as the time that it takes to provision a new service; and customer care responsiveness can be measured as the average time that a customer waits in the call center queue.

Finally, operators need to define the SLA contract between them and their customers based on a set of rules--for example, average time-to-repair = 2 hours; maximum time-to-provision = 2 days; minimum availability = 98%. In addition to specifying the availability and performance of networked services, the SLA also sets penalties in case of SLA violations.
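As a rough illustration, the example rules above could be held as data and checked against measured indicators. This is a minimal sketch, not a description of any particular OSS: the `SlaRule` class, the indicator names and the units are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaRule:
    indicator: str  # hypothetical name of a measured service level indicator
    limit: float    # committed value from the contract
    kind: str       # "max": measured value must not exceed the limit
                    # "min": measured value must not fall below the limit


# The three example rules from the text; names and units are assumed.
RULES = [
    SlaRule("avg_time_to_repair_hours", 2.0, "max"),
    SlaRule("max_time_to_provision_days", 2.0, "max"),
    SlaRule("availability_percent", 98.0, "min"),
]


def violations(measured, rules=RULES):
    """Return the names of the indicators that break their SLA rule."""
    broken = []
    for rule in rules:
        value = measured[rule.indicator]
        if rule.kind == "max":
            ok = value <= rule.limit
        else:
            ok = value >= rule.limit
        if not ok:
            broken.append(rule.indicator)
    return broken


month = {
    "avg_time_to_repair_hours": 1.5,
    "max_time_to_provision_days": 3.0,
    "availability_percent": 98.0,
}
print(violations(month))
```

Keeping the rules as data rather than hard-coding them means a renegotiated SLA only changes the rule list, not the checking logic.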

SLA monitoring

Today, customers want service levels to be understood from their perspective, not from server statistics. They want services to be measured as a whole, not in parts. From the end user's perspective, the availability of individual network components along the service path does not necessarily reflect the QoS being delivered. For instance, while a 99% network availability guarantee might apply to the overall average availability of the network to all subscribers, it might not apply to a particular user. For a customer who cannot download data or access crucial information using a laptop, 99% availability means little if that customer is suffering from poor performance or delayed service.

Using an OSS that leverages multiple data sources such as performance measurements, fault metrics, provisioning indicators and call detail records, an operator can monitor service levels as the end user experiences them.

Performance measurements include error seconds, severe error seconds, unavailable seconds and bit error ratio. To extract meaningful performance indicators from these measurements, the OSS must process the collected data. In cases where a service is supported by just one network element, the data is collected and processed from that one element (for example, in order to measure web availability, the OSS needs to collect and process measurements from the web server only). In cases where a service rides on a number of elements, the measurements need to be collected and processed from the entire service path.
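One way to picture path-level processing is to treat the service as unavailable during any second in which at least one element on its path reports an unavailable second. The sketch below assumes exactly that (all elements in series, per-second records available); the function name and the input layout are hypothetical.

```python
def path_availability(unavailable_by_element, period_seconds):
    """Availability (percent) of a service over a measurement period.

    unavailable_by_element maps an element name to the second offsets
    (within the period) that the element reported as unavailable.
    Assumption: the service is down whenever any element on the path is down.
    """
    down = set()
    for seconds in unavailable_by_element.values():
        down |= set(seconds)  # overlapping outages are counted once
    return 100.0 * (period_seconds - len(down)) / period_seconds


measurements = {
    "edge_router": range(0, 10),  # 10 unavailable seconds
    "web_server": range(5, 25),   # 20 unavailable seconds, 5 of them overlapping
}
print(path_availability(measurements, period_seconds=1000))
```

Taking the union of outage seconds, rather than adding them up per element, avoids penalizing the service twice for outages that overlap in time.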

When it comes to fault metrics, indicators such as time down and time back to service can be derived from alarms or trouble ticketing applications. For example, from the time stamps of service faults, the repair time can be processed and at the end of each month the mean time to repair and the mean time between faults can be calculated. 
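A minimal sketch of that monthly calculation follows, assuming each fault is recorded as a (time down, time back to service) pair. Here the time between faults is measured from one restoration to the next failure, which is one common convention and an assumption of this example; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def mttr_and_mtbf(faults):
    """Return (mean time to repair, mean time between faults).

    faults: list of (time_down, time_back_to_service) datetime pairs,
    sorted by time_down. MTBF is None when there are fewer than two faults.
    """
    if not faults:
        raise ValueError("at least one fault is required")
    repair_times = [up - down for down, up in faults]
    mttr = sum(repair_times, timedelta()) / len(repair_times)
    # Gap between a restoration and the start of the next fault.
    gaps = [faults[i + 1][0] - faults[i][1] for i in range(len(faults) - 1)]
    mtbf = sum(gaps, timedelta()) / len(gaps) if gaps else None
    return mttr, mtbf


march_faults = [
    (datetime(2024, 3, 1, 10, 0), datetime(2024, 3, 1, 11, 0)),
    (datetime(2024, 3, 11, 11, 0), datetime(2024, 3, 11, 14, 0)),
    (datetime(2024, 3, 21, 14, 0), datetime(2024, 3, 21, 16, 0)),
]
mttr, mtbf = mttr_and_mtbf(march_faults)
print(mttr, mtbf)
```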

Call detail records are another data source that provides invaluable SLA information on a per-user basis and which can be used to assess service health. Call detail records are collected from network elements such as switches, web servers and gatekeepers. Using call detail records, service providers can assess call completion rates in specific areas, perform call failure analysis to ascertain which subscribers in which zones are affected by network failures and measure traffic volumes.
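As an illustration, call completion rates per zone could be derived from call detail records along these lines. Real record layouts vary by equipment vendor, so the `zone` and `completed` fields used here are assumptions made for the sketch.

```python
from collections import defaultdict


def completion_rate_by_zone(cdrs):
    """Return {zone: completed calls / call attempts} for a batch of records."""
    attempts = defaultdict(int)
    completed = defaultdict(int)
    for record in cdrs:
        attempts[record["zone"]] += 1
        if record["completed"]:
            completed[record["zone"]] += 1
    return {zone: completed[zone] / attempts[zone] for zone in attempts}


records = [
    {"zone": "north", "completed": True},
    {"zone": "north", "completed": True},
    {"zone": "north", "completed": False},
    {"zone": "north", "completed": True},
    {"zone": "south", "completed": False},
    {"zone": "south", "completed": True},
]
print(completion_rate_by_zone(records))
```

A zone whose rate drops sharply compared with its neighbors is a natural starting point for the call failure analysis described above.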

In addition to monitoring services end to end using various service indicators, operators can also set thresholds based on service indicators. Thresholds can be defined for types of services, such as "Gold" services, as well as for raw indicators such as error seconds and calculated indicators such as service availability. For example, an operator can set a threshold requiring a given Web hosting service to maintain 90% availability during scheduled business hours. If availability is about to fall below the pre-defined threshold, the OSS notifies the operator of the impending problem. This enables the operator to take action before the SLA is breached and the customer experience is adversely affected.
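An early-warning check of this kind might be sketched as a simple comparison with a warning band placed just above the committed level. The one-point margin and the status labels below are assumptions chosen for illustration.

```python
def threshold_status(availability_percent, committed=90.0, warning_margin=1.0):
    """Classify measured availability against a committed minimum.

    "breached": below the committed level
    "warning":  still compliant, but within warning_margin of the limit
    "ok":       comfortably above the limit
    """
    if availability_percent < committed:
        return "breached"
    if availability_percent < committed + warning_margin:
        return "warning"
    return "ok"


for measured in (95.0, 90.5, 89.9):
    print(measured, threshold_status(measured))
```

The width of the warning band is a trade-off: too narrow and the operator has no time to react, too wide and the notifications become noise.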

By proactively monitoring services, not only can operators treat problems before they escalate into crises, but they can also take preemptive measures to respond to changing network conditions and maintain high-quality service levels. These measures include prioritizing some traffic over other traffic (real-time transactions over low-priority tasks, for example), allocating more bandwidth during peak hours to prevent overbooking and otherwise fine-tuning network elements.

Report generation

An integral part of SLA management is the ability to report on service quality. SLA reports are important in that they enable operators to quickly grasp the status of networked services by presenting information in an easy-to-understand, graphical format. SLA reports enable a customer to receive up-to-date information on the QoS against SLAs and assess whether or not the service provider is delivering agreed-upon service levels.

An OSS should produce both predefined reports and customized reports tailored to the unique needs of a service provider. In addition, an ideal OSS should enable customers to access reports via the web at their own convenience.

Real-time reports enable operators to track the status of SLAs, graphically see when a service has degraded, and pinpoint potential failures or poorly performing network elements before an actual breakdown occurs.

Historical reports enable operators to identify long-term trends and investigate recurrent problem areas in the network. Historical reports are also useful for capacity planning purposes, enabling service providers to plan for future expansions and network growth more effectively.

SLA modifications

Service providers and their customers should meet at regular intervals to assess whether any changes have been made in the communications market or in the customer's organization that would necessitate modifying the SLA. For example, if the customer's organization has hired a significant number of new employees, it is reasonable to assume that the traffic on the network will greatly increase, which might lead to slower response times or increased downtime. That is why it is important that the two sides periodically convene to discuss recent changes and how they will affect the SLA. By having an OSS in place, operators can modify the SLA accordingly.

SLA management in practice--VoIP

VoIP services are conveyed across the IP domain, which spans IP routers and switches, and the voice domain, which encompasses the traditional public network, media gateways, signaling gateways and gatekeepers.

Service indicators can be defined for each domain based on various sources of information, including:

  • Alarms from various network equipment--which can be used as the basis for calculating the availability of both the IP and voice domains

  • The MIB of the routers--which is the main source of data on the performance of the IP domain. For example, data link utilization can be calculated based on the bandwidth of the link and on the number of packets in the ingress and egress of the link. Link utilization can serve as a service indicator to gauge network performance, where high utilization spells poor performance and low utilization indicates high performance

  • Call detail records generated by gatekeepers and media gateways--which can be used to define the Answer Seizure Ratio of the link between hubs

  • In the case of pre-paid VoIP services, the authentication, authorization, accounting (AAA) server and the interactive voice response can both be used as sources for service indicators. For example, the authentication success rate can be calculated based on the logs that the AAA server stores for every call attempt that was made.
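To make the link utilization indicator concrete, the sketch below derives it from two readings of an interface traffic counter. It assumes byte (octet) counters, such as the standard IF-MIB `ifInOctets` and `ifOutOctets` objects, rather than packet counts, because bytes convert directly into bits on the wire. It also assumes a 32-bit counter that wraps at most once between readings. The function name is hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # 32-bit SNMP counters wrap back to zero


def link_utilization(octets_start, octets_end, interval_seconds, link_bps):
    """Utilization (percent) of one direction of a link over an interval.

    octets_start / octets_end: counter readings at the start and end.
    link_bps: link bandwidth in bits per second.
    """
    # Modulo arithmetic gives the right delta even if the counter wrapped once.
    delta_octets = (octets_end - octets_start) % COUNTER32_MODULUS
    return 100.0 * delta_octets * 8 / (interval_seconds * link_bps)


# 187,500,000 bytes in 5 minutes on a 10 Mbit/s link.
print(link_utilization(1_000, 187_501_000, interval_seconds=300, link_bps=10_000_000))
```

Ingress and egress would be computed separately with the same function, and the higher of the two is often the more useful figure on a full-duplex link.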

For instance, a VoIP operator can sign an SLA with a customer and define service level thresholds and QoS as follows:

  • Committed availability of the whole network will be greater than threshold1

  • Average roundtrip latency in the IP domain will be less than threshold2

  • Number of high utilization events in a month will be less than threshold3 (an event can be classified as a high utilization event when the link utilization crosses the 80% mark for more than 10 minutes)

  • The answer seizure ratio between two destinations will be greater than the committed rate

  • The authentication success rate will be higher than threshold4.
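The high utilization event defined above can be counted from periodic utilization samples. The following sketch assumes samples arrive as (minute offset, utilization percent) pairs sorted by time, and it measures the length of a run from its first to its latest above-limit sample; both are assumptions of the example rather than part of the SLA.

```python
def count_high_utilization_events(samples, limit=80.0, min_minutes=10.0):
    """Count runs where utilization stays above limit for more than min_minutes.

    samples: list of (minute_offset, utilization_percent), sorted by time.
    Each continuous run above the limit is counted at most once.
    """
    events = 0
    run_start = None  # minute at which the current run above the limit began
    counted = False   # whether the current run has already been counted
    for minute, utilization in samples:
        if utilization > limit:
            if run_start is None:
                run_start = minute
                counted = False
            if not counted and minute - run_start > min_minutes:
                events += 1
                counted = True
        else:
            run_start = None
    return events


samples = [
    (0, 85), (5, 90), (10, 85), (15, 50),              # 10 minutes: not an event
    (20, 85), (25, 85), (30, 90), (35, 95), (40, 70),  # 15 minutes: one event
]
print(count_high_utilization_events(samples))
```

The monthly count returned here is the figure that would be compared with threshold3.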

The above agreed-upon SLA provides a complete and detailed picture of the QoS being delivered to the customer. This helps customers understand what it means to have high quality service and assists them in deciding how much they are willing to pay for it.

SLAs--A competitive advantage

In today's crowded market space, where customers can switch to a rival organization with one call or the click of a mouse, an operator can set itself apart from the pack by offering SLAs and demonstrating that it can deliver agreed-upon QoS levels. Using an intelligent OSS, service providers can:

  • Minimize revenue loss by monitoring services in real time and taking corrective measures before SLA violations

  • Increase profits by offering verifiable SLAs that foster trust and customer satisfaction and help make premium-priced services easier to sell

  • Attract and retain customers by ensuring the performance and availability of business-critical services that they rely on

  • Meet and manage user expectations of services by agreeing on service parameters that are both measurable and meaningful to the customer.

Avichai Levy is Vice President of Marketing for TTI Telecom.
