
A whole new ballgame

Test and measurement in the telecommunications industry has been refined for nearly a century on the public switched telephone network. However, with the rush to deploy packet voice networks to exploit advantages of voice/data convergence, many traditional T&M techniques are no longer sufficient or even applicable. New techniques and strategies are needed, beginning with the definition of what parameters actually need to be measured.


Defining new measurement parameters and techniques begins with examining the nature of packet voice networks. Although these networks can vary in their implementations, they share the same basic processes. For example, in a voice over IP (VoIP) network, voice is digitally encoded with either an uncompressed or a compressed coding scheme.

Voice activity detectors are often used in conjunction with encoding to suppress transmission during silence intervals. The voice bit stream is then packetized and transported over an IP network, which allows packets to take different routes. At the IP destination, the packets are resequenced and passed through a jitter buffer to absorb variations in delay. The voice bit stream is then reassembled, decoded and transmitted out on a DS-0 channel.
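The receive-side steps described above (sequencing and jitter buffering) can be illustrated with a minimal sketch. The `Packet` class, the `playout` function, the 20 ms frame size and the 40 ms buffer depth are all assumptions chosen for illustration, not values taken from any particular VoIP device. The sketch assumes a fixed-depth buffer: a packet is played only if it arrives before its scheduled playout time, and anything later counts as lost.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int           # sequence number assigned by the sender
    arrival_ms: float  # time the packet reached the receiver


def playout(packets, frame_ms=20.0, buffer_ms=40.0):
    """Return (played, lost) sequence numbers for a fixed-depth jitter buffer."""
    if not packets:
        return [], []
    # Anchor the playout clock on the first packet to arrive.
    first = min(packets, key=lambda p: p.arrival_ms)
    base = first.arrival_ms - first.seq * frame_ms
    on_time = set()
    for p in packets:
        deadline = base + buffer_ms + p.seq * frame_ms
        if p.arrival_ms <= deadline:
            on_time.add(p.seq)
    last_seq = max(p.seq for p in packets)
    # Walking the sequence numbers in order restores the original ordering.
    played = [s for s in range(last_seq + 1) if s in on_time]
    lost = [s for s in range(last_seq + 1) if s not in on_time]
    return played, lost


# Hypothetical arrivals: packet 2 arrives after packet 3, packet 4 is late,
# and packet 5 never arrives.
arrivals = [
    Packet(0, 50.0), Packet(1, 75.0), Packet(3, 112.0),
    Packet(2, 118.0), Packet(4, 190.0), Packet(6, 200.0),
]
played, lost = playout(arrivals)
print("played:", played)
print("lost or late:", lost)
```

A deeper buffer would rescue the late packet at the cost of added delay, which is the trade-off that makes delay and speech quality interdependent on packet networks.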

This scheme differs from the traditional PSTN in several ways, as outlined below:

PSTN: Transport is deterministic; an 8-bit sample of voice will reach its destination, and will do so synchronized with the preceding and following samples of voice.
Packet voice: Transport is non-deterministic; especially with IP and its “best effort” delivery, voice packets may or may not arrive at the destination, and can arrive unsynchronized and even out of sequence.

PSTN: Guaranteed bandwidth, with 64 kb/s channels dedicated to a call.
Packet voice: No guaranteed bandwidth; voice packets for a call compete with other packets for the same bandwidth.

PSTN: Network is linear and time-invariant (LTI).
Packet voice: Network is non-LTI (see sidebar story).

The most important impact of these differences between packet voice networks and the PSTN is on call quality, and in particular, conversational quality. Today’s packet voice networks tend to deliver degraded conversational quality when compared to the PSTN. Addressing this issue requires the ability to quantify the parameters of conversational quality, using effective T&M techniques that differ from those used on the PSTN.

Conversational quality comprises many characteristics. The key parameters to consider are speech quality, delay, echo and loudness. Of these, speech quality is the parameter most in need of new T&M techniques.

Speech quality refers to the clearness and fidelity of speech reproduction. Speech quality on packet voice networks is affected by many processes, including:

  • PCM codecs and low bit-rate vocoders.

  • Front-end clipping as introduced by Voice Activity Detectors.

  • Temporal signal loss and dropouts as introduced by packet or cell loss.

  • Delay variance (packet jitter), which can result in ‘audio warping’ such that speech is not delivered at a constant rate. Though VoIP devices have jitter buffers to cancel jitter, perfect reproduction of the audio rate does not always occur.

  • Packet delay. While packet delay does not directly impact speech quality, increased packet delay can increase loss and jitter.

  • Environmental noise.

  • Signal attenuation and gain/attenuation variances.

  • Transmission channel errors.

The acceptable standards for speech quality have been defined for over a century by the PSTN, where levels of quality are quite predictable and reliable. Traditional metrics to measure signal quality on the PSTN are most valuable when applied to linear time-invariant (LTI) systems. Some metrics could be adapted to measure non-LTI systems by estimating those systems as LTI in short time segments; however, these metrics are still limited to measuring impairments due to analog transmission and waveform encoding.

Two common techniques are signal-to-noise ratio (SNR) and total harmonic distortion (THD). SNR is used to measure relative noise levels on analog signals, and quantization distortion introduced by PCM encoders. SNR can also provide an accurate measurement of the effect that bit errors have on a reproduced signal. Speech pauses can produce misleadingly poor SNR readings; therefore, a segmental SNR is used, in which SNR is obtained only for intervals of speech and not for the silent periods between utterances.
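The difference between overall SNR and segmental SNR can be seen in a small sketch. The function names, the 160-sample frame (20 ms at 8 kHz) and the silence threshold below are assumptions for illustration, and the "speech" is simply a 400 Hz tone followed by silence with a small fixed error added throughout.

```python
import math


def snr_db(reference, degraded):
    """SNR in dB, treating (reference - degraded) as the noise."""
    signal = sum(r * r for r in reference)
    noise = sum((r - d) ** 2 for r, d in zip(reference, degraded))
    if noise == 0:
        return math.inf
    return 10 * math.log10(signal / noise)


def segmental_snr_db(reference, degraded, frame_len=160, silence_threshold=1e-4):
    """Average per-frame SNR over frames whose reference energy exceeds the threshold."""
    values = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        deg = degraded[start:start + frame_len]
        if sum(r * r for r in ref) / frame_len < silence_threshold:
            continue  # skip silent frames so pauses do not drag the result down
        values.append(snr_db(ref, deg))
    return sum(values) / len(values) if values else None


# 800 samples of a 400 Hz tone at 8 kHz, then 800 samples of silence.
tone = [0.5 * math.sin(2 * math.pi * 400 * n / 8000) for n in range(800)]
reference = tone + [0.0] * 800
# The same small error is present during speech and during silence.
degraded = [r + (0.01 if n % 2 else -0.01) for n, r in enumerate(reference)]

overall = snr_db(reference, degraded)
segmental = segmental_snr_db(reference, degraded)
print(f"overall SNR:   {overall:.2f} dB")
print(f"segmental SNR: {segmental:.2f} dB")
```

Because half of the record is silence, the overall figure comes out roughly 3 dB lower than the segmental figure even though the noise level never changes.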

SNR is useful only when the coding process generally maintains input waveforms at the output. In cases where low bit rate codecs and compression are used, however, it has been found that SNR and segmental SNR measurement results show little correlation to perceived speech quality. This is one of the reasons why new perceptual quality measurements are needed.

Total harmonic distortion (THD) and intermodulation distortion (IMD) measurements are techniques used to evaluate non-linear distortion introduced by signal processors such as amplifiers. THD is determined from a single-tone input, whereas IMD is determined from a dual-tone input.

While these metrics are useful for measuring non-linear distortion for tone inputs, they do not adequately reflect the quality of voice that has been processed by non-waveform codecs and packet networks.
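As a rough sketch of the single-tone THD measurement described above, the example below passes a test tone through a hypothetical cubic nonlinearity and compares the harmonic amplitudes with the fundamental. The function names and the distortion model are assumptions for illustration; the sketch also assumes the record holds a whole number of tone cycles, so that each harmonic falls exactly on one DFT bin.

```python
import cmath
import math


def bin_amplitude(samples, cycles):
    """Amplitude of the sinusoid that completes `cycles` periods in the record."""
    n = len(samples)
    acc = sum(s * cmath.exp(-2j * math.pi * cycles * i / n)
              for i, s in enumerate(samples))
    return 2 * abs(acc) / n


def thd(samples, fundamental_cycles, harmonics=5):
    """Ratio of RMS harmonic amplitude (2nd..Nth) to the fundamental amplitude."""
    if harmonics * fundamental_cycles >= len(samples) / 2:
        raise ValueError("highest harmonic must stay below the Nyquist bin")
    fundamental = bin_amplitude(samples, fundamental_cycles)
    harmonic_power = sum(bin_amplitude(samples, k * fundamental_cycles) ** 2
                         for k in range(2, harmonics + 1))
    return math.sqrt(harmonic_power) / fundamental


# 40 cycles in 800 samples: a 400 Hz tone sampled at 8 kHz.
n_samples, cycles = 800, 40
tone = [math.sin(2 * math.pi * cycles * i / n_samples) for i in range(n_samples)]
# Hypothetical amplifier with a mild cubic nonlinearity.
distorted = [x + 0.1 * x ** 3 for x in tone]

thd_ratio = thd(distorted, cycles)
print(f"THD: {100 * thd_ratio:.2f} %")
```

A measurement of this kind characterizes how a device treats a steady tone; it says nothing about how a listener will judge speech that has been through a vocoder or has lost packets, which is the limitation the text points out.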

Packet voice networks use many technologies such as low bit-rate codecs and packet transport that are non-LTI. These technologies render many traditional metrics for signal quality insufficient. When the transmission path is non-LTI, simple objective measurements (such as those specified in Recommendation G.712 for performance characteristics of PCM systems) are not adequate. 

In addition, even those traditional metrics that can be applied to non-LTI systems via LTI estimations do not adequately predict a person’s perception of speech quality. For example, segmental SNR and THD will not account for a person’s ability to adapt to missing time-frequency energy components as a result of voice encoding or small packet loss.

In recent years, as new technologies were deployed, the industry recognized the need for new measurement techniques that accurately represent speech quality the same way humans perceive it.

The most obvious method to measure speech quality, and conversational quality in general, is mean opinion scoring (MOS) tests. MOS tests use large numbers of human subjects to produce statistically valid quality scores. The techniques for performing MOS testing on networks and codecs are described in the International Telecommunication Union (ITU) recommendations P.800 and P.830.

There are several types of MOS tests for both conversational quality and listening quality. The most widely recognized MOS test is an Absolute Category Rating (ACR) for listening quality. This test asks subjects to rate the quality of speech using the following scale:

Score   Quality of Speech
5       Excellent
4       Good
3       Fair
2       Poor
1       Bad
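To show how individual ratings on this scale become a single score, here is a minimal sketch. The ratings are invented for illustration, the function name is an assumption, and the 95% interval uses a simple normal approximation that is crude for a panel this small.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ACR ratings must be integers from 1 to 5")
    mos = statistics.mean(ratings)
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from ten listeners for one test condition.
ratings = [4, 4, 3, 5, 4, 3, 4, 2, 4, 4]
mos, half_width = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

The width of the interval shrinks only with the square root of the panel size, which is one reason statistically valid scores require large numbers of subjects.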

MOS testing has several drawbacks. It is inherently subjective, so its results are not readily repeatable or consistent. It is also expensive, inefficient and impractical to use on a frequent basis for network testing.

Objective, automated and repeatable testing methods are needed for measuring speech quality. However, correlation with MOS test results is the benchmark for determining the accuracy and value of these objective methods.
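One common way to express this kind of correlation is the Pearson coefficient between the objective scores and the subjective MOS results for the same test conditions. The sketch below is illustrative only: the function name is an assumption and both sets of scores are invented, not measured data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five test conditions (not real measurements).
subjective_mos = [4.2, 3.8, 3.1, 2.4, 1.6]
objective_scores = [4.0, 3.9, 3.3, 2.2, 1.9]

r = pearson(subjective_mos, objective_scores)
print(f"correlation with MOS: {r:.3f}")
```

A coefficient close to 1 would indicate that the objective method ranks and spaces the test conditions much as the listening panel did.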

The three most widely used techniques for measuring speech quality on packet voice networks are PSQM, PAMS and PESQ. These measurements are similar in four ways:

  • Each measures perceptual speech quality for narrowband (300-3400 Hz) telephone signals

  • Each requires active testing, in which a reference voice signal is transmitted across a network, and the received voice signal is compared with the reference signal

  • Each comprises a mathematical process that measures the differences (distortion) between the received signal and the reference signal, based on factors of human perception

  • Each produces a speech quality score, with the objective of accurately correlating with the results of subjective or MOS tests

The Perceptual Speech Quality Measurement (PSQM) was developed to provide objective measurements of perceptual speech quality for low-bit rate codecs. It was approved by the ITU-T as Recommendation P.861 in 1996, and has since gained wide acceptance as a consistent and accurate measurement of speech quality based on human perception factors.

The objective of PSQM is to produce scores that reliably predict the results of subjective tests, particularly those methods in P.830 (MOS). PSQM scores, however, are on a different scale, and reflect a perceptual distance measure. That is, PSQM scores reflect the amount of divergence from a clean signal that a distorted signal exhibits once it has been processed by some telephony system.

PSQM scores range from 0 to infinity, representing the perceptual distance between the input and output signals. For example, a 0 score indicates a perfect match between the input and output signals, or perfect quality. Higher PSQM scores indicate increasing levels of distortion, or lower quality. In practice, upper limits of PSQM scores range from 6 to 12.

PSQM became popular as a way to measure speech quality not only across vocoders but also across entire packet voice networks. One PSQM drawback, however, is that it does not accurately report the effect of distortion when that distortion is caused by packet loss or other types of time clipping. In other words, PSQM would report better quality under these conditions than a human would.

In response to this drawback, an improvement to the PSQM model was developed and submitted as a contribution to ITU P.861. The improved model is known as PSQM+, and is preferred to PSQM for measuring speech quality in network environments. PSQM+ improves the way the PSQM technique is applied to a system that experiences severe distortions due to time clipping and packet loss. For systems comprising speech encoding only, PSQM and PSQM+ give identical scores.

The Perceptual Analysis Measurement System (PAMS) is another technique for measuring perceptual speech quality. It offers a different model than PSQM+ but with the same goal: to objectively predict results of subjective speech quality tests for networks on which coding distortions as well as time-clipping and packet loss are potentially problems. PAMS has gained wide acceptance worldwide as an effective and robust measurement of speech quality in packet voice networks.

PAMS uses a model based on factors of human perception to measure the perceived speech quality of an output signal as compared with the input signal. Although similar to PSQM in many aspects, PAMS uses different signal processing techniques, and a different perceptual model. PAMS has proven to be more robust and accurate on networks that exhibit severe delay and delay variation problems.

PAMS test results are scores that range from 0-5, and that correlate on the same scale as MOS testing. In particular, PAMS produces a Listening Quality Score and a Listening Effort Score that correspond with the opinion scales in P.830.  Several other measured distortion parameters are produced by PAMS, including the calculation of an Error Surface that shows audible errors in the time-frequency domain of the received signal.


From 1998 to 2000, the ITU reviewed submissions for new perceptual speech quality measurements. Included in this review were 1999 versions of PSQM+ and PAMS, both of which proved to best match subjective testing. It was determined that each had significant merits and that it would be beneficial to the industry to combine their merits into a new measurement technique. This new technique is called the Perceptual Evaluation of Speech Quality (PESQ). PESQ was recently approved by the ITU as Recommendation P.862, which replaces P.861.

PESQ leverages the best features of PAMS and PSQM, and adds some new features. PESQ is an effective technique for measuring speech quality on networks with low bit-rate vocoders, variable delay, filtering, packet or cell loss, time-clipping and channel errors. PESQ scores correlate well with ACR listening quality scores. It has proven to be more accurate in this correlation than PAMS or PSQM.

A second important aspect (in addition to speech quality) of conversational quality is delay. Delay represents the time needed for a voice signal to be transmitted from a speaker’s mouth to a listener’s ear. Unlike speech quality, the definition of a delay measurement is the same on both packet voice networks and the PSTN. However, the techniques for measuring delay on the PSTN are not as effective on packet voice networks. 

One common technique used on the PSTN is to use an acoustic ping or tone. A short audio signal is transmitted across a network, and the time difference from the transmission source to destination is measured. This technique is susceptible to packet loss and time-clipping.

One technique that has been proven effective in measuring delay on packet voice networks is called normalized signal cross-correlation. This technique measures the impulse response of a network by transmitting a burst of pseudo-random, pattern-repeating noise referred to as a Maximum Length Sequence (MLS). The MLS signal provides a highly predictable signal pattern to enable very accurate signal correlation.

In delay measurements using this technique, an MLS signal is transmitted from source to destination. The received signal is normalized to, and correlated in time with, the transmitted signal. The measurement determines where in time the distribution of received signal energy correlates the best with the transmitted signal energy. The result of this correlation is a measured time-shift equal to the delay of the received signal. Measuring delay by cross-correlating waveforms is not feasible in non-LTI systems such as VoIP networks. Therefore, a cross-correlation of signal energy via a piecemeal impulse response measurement is performed.
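The core idea of finding the time shift at which the received signal best matches the transmitted MLS can be sketched as follows. This is a deliberately simplified illustration: it correlates waveforms over an idealized channel that only delays, attenuates and front-end clips the burst, whereas the technique described above correlates signal energy to cope with non-LTI networks. The function names, the degree-7 shift register, the 8 kHz sample rate and the channel model are all assumptions.

```python
import math


def mls(degree=7, taps=(7, 6)):
    """Generate one period of a +/-1 sequence from a linear feedback shift register."""
    state = [1] * degree
    out = []
    for _ in range(2 ** degree - 1):
        out.append(1.0 if state[-1] else -1.0)
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return out


def estimate_delay(tx, rx):
    """Return (lag, score) where the normalized cross-correlation peaks."""
    best_lag, best_score = 0, -math.inf
    e_tx = sum(x * x for x in tx)
    for lag in range(len(rx) - len(tx) + 1):
        window = rx[lag:lag + len(tx)]
        e_rx = sum(x * x for x in window)
        if e_rx == 0:
            continue  # nothing received in this window
        score = sum(a * b for a, b in zip(tx, window)) / math.sqrt(e_tx * e_rx)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


tx = mls()
# Hypothetical channel: 37 samples of delay, 6 dB of attenuation, and the
# first 10 samples of the burst removed by front-end clipping.
received_burst = [0.0] * 10 + [0.5 * x for x in tx[10:]]
rx = [0.0] * 37 + received_burst + [0.0] * 50

lag, score = estimate_delay(tx, rx)
delay_ms = 1000 * lag / 8000  # assuming 8 kHz sampling
print(f"estimated delay: {lag} samples ({delay_ms} ms), peak score {score:.2f}")
```

Because the match is computed over the whole burst, losing the first few samples lowers the peak score slightly but does not move the peak, which is why this approach tolerates clipping better than detecting the leading edge of a ping.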

This method offers several advantages over other methods that measure delay via an acoustic ping or tone:

  • It provides a more robust time synchronization between the transmitted and received signals.

  • It is less susceptible to front-end clipping, noise, attenuation and loss, any of which can mask detection of a ping or tone.

  • It measures delay for a signal comprising multiple frequencies.

  • It provides visibility into dynamic delay.

A third important aspect of conversational quality is echo. Echo is an impairment that also arises within the PSTN, where it is effectively addressed by echo cancelers, but it resurfaces as a new issue within packet voice networks.

Echo affects conversational quality through two parameters: the echo signal’s delay and its loudness. Delay in the PSTN is primarily related to transmission distance. Therefore, in the PSTN, echo cancelers are deployed on network links that exceed a minimum transmission distance.

Packet voice networks introduce process-related delay. The result is that an echo signal may be delayed enough to impact quality, even on a short transmission path where echo cancelers would not be deployed. Essentially, new echo canceler policies are needed for packet voice networks. This also means that measuring echo on packet voice networks is required, but the same techniques may be used. ITU G.131 describes the effects of echo delay and loudness on call quality. Acceptable levels of echo loudness as a function of echo delay are given. Measurements of echo loudness and delay are thus needed to determine if a network is effectively handling echo.

Packet voice networks require new T&M techniques for two reasons: new parameters such as speech quality have become more variable and thus require measurement, and traditional measurement techniques such as SNR do not apply well to measuring these parameters on packet voice networks. Fortunately, the industry has responded, and new T&M techniques have emerged and proven effective. 
John J. Anderson is IP Telephony Product Manager for Agilent Technologies' Network Systems Test Division, Palo Alto, CA. He can be reached at john_j_anderson@agilent.com or by phone at 719/531-4526. 



Linear time-invariant systems

Linear time-invariance (LTI) is a characteristic often ascribed to audio circuits and channels. It describes, in a general way, how the audio circuit or channel is likely to behave when it processes an input signal.

A system or network is linear if it is both additive and homogeneous.

Additive:

The output response resulting from the input x(t) + y(t) is equal to the output response resulting from x(t) plus the output response resulting from y(t). That is, the output function [F(x+y)](t) = (Fx)(t) + (Fy)(t). In other words, the output signal from a combined input of signal x and signal y is the same as adding the output signals from individual uncombined inputs of signal x and signal y.

Homogeneous:

The output response resulting from the input a[x(t)] is equal to a times the output response resulting from the input x(t), where a is a scalar value. That is, the output function [F(ax)](t) = a(Fx)(t). In other words, multiplying a signal by a before it enters the system produces the same output as passing the unscaled signal through the system and then multiplying the output by a.

 

Linearity can thus be summarized by: 

[F(ax+by)](t) = a(Fx)(t) + b(Fy)(t)
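The superposition property above lends itself to a quick numerical spot check. In the sketch below, the function names and the two example systems (a fixed gain and a hard clipper) are assumptions for illustration. Passing the check for one pair of inputs does not prove a system is linear, but failing it is enough to show that it is not.

```python
def passes_superposition_check(system, x, y, a=2.0, b=-3.0, tol=1e-9):
    """Compare F(ax + by) with a*F(x) + b*F(y) for one pair of test inputs."""
    combined = system([a * xi + b * yi for xi, yi in zip(x, y)])
    separate = [a * fx + b * fy for fx, fy in zip(system(x), system(y))]
    return all(abs(c - s) <= tol for c, s in zip(combined, separate))


def gain(signal):
    """A fixed attenuation: linear."""
    return [0.5 * s for s in signal]


def clipper(signal):
    """Hard limiting at +/-1: non-linear once the input exceeds the limit."""
    return [max(-1.0, min(1.0, s)) for s in signal]


x = [0.2, -0.4, 0.9]
y = [0.1, 0.3, -0.5]
print("gain passes:   ", passes_superposition_check(gain, x, y))
print("clipper passes:", passes_superposition_check(clipper, x, y))
```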

 

A system or network is time-invariant if, for any delayed input x(t – t0), the resulting output response is y(t – t0), where y(t) is the output response to x(t) [1]. That is, the shape of the output response waveform is independent of delay. A variation in the delay applied to an audio signal can cause a time-invariant system to become time-variant. Thus, even when using linear encoding techniques, delay jitter introduced by a non-deterministic network (such as a packet network) can produce a time-variant system.
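The effect of varying delay on time invariance can be illustrated with a small sketch. The function names and the toy "jittery" system, whose delay alternates between 0 and 1 sample every four samples, are assumptions for illustration and are not a model of any real network.

```python
def shift(signal, k):
    """Delay a signal by k samples, padding with zeros and keeping its length."""
    return [0.0] * k + signal[:len(signal) - k]


def fixed_delay(signal, d=2):
    """A constant delay: time-invariant."""
    return shift(signal, d)


def jittery_delay(signal):
    """A delay that depends on absolute time: 0 or 1 sample per 4-sample block."""
    out = []
    for n in range(len(signal)):
        d = 1 if (n // 4) % 2 else 0
        out.append(signal[n - d] if n - d >= 0 else 0.0)
    return out


def time_invariant_for(system, x, k):
    """Check whether delaying the input by k simply delays the output by k."""
    return system(shift(x, k)) == shift(system(x), k)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 0.0, 0.0, 0.0, 0.0]
print("fixed delay:  ", time_invariant_for(fixed_delay, x, 1))
print("jittery delay:", time_invariant_for(jittery_delay, x, 1))
```

With the jittery system, the same waveform is reshaped differently depending on when it enters, so the output for a delayed input is no longer just a delayed copy of the original output.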
--John Anderson


REFERENCE
1. Couch, Leon W., “Digital and Analog Communications Systems”, Macmillan Publishing, 1987

© 2012 Penton Media Inc.
