Killing the Software Bug
Think about the last time you had the flu. You probably felt great one minute. The next, you were down for the count. When a bug hits, it attacks with a vengeance, sometimes putting you out of commission for days.
Your wireless network is not much different. A bug can cause your system to plummet into a critical condition within a matter of minutes. Although there is no known cure for a virus of this nature, you can ward off devastating problems by building up your network's immune system with monitoring, testing and a detailed recovery plan.
Software defects can arise in several ways. Sometimes your vendor's software has a bug in it before you install it. Other times, it might malfunction when you update your database. For instance, Ken Woo, AT&T Wireless corporate spokesperson, said whenever a carrier loads new software or customer lists into the HLR, there is always a possibility that something could go wrong with the software.
"We experienced that in the early days when we were converting over from analog to digital networks, and we were running both at the same time," he said.
Likewise, in late June, the San Jose Mercury News reported that corrupt data in Pacific Bell Mobile's customer database cut off wireless service to 420,000 customers during a reload process.
NETWORK VITAMINS: MONITORING
The best way to combat a software bug is to never let it happen, said Jack Barnett, Lucent wireless solutions manager in the communications software organization. All too often a carrier discovers a software glitch because customers are calling in and complaining. In this case, either your equipment did not report an alarm, or your employees missed the warning signals.
"The key is being aware of the problem early and knowing exactly what the fault is, because a lot of the issues come down to how quickly you can determine what the fault is and repair the problem at that point," he said.
A combination of powerful network fault-management tools and perceptive network operators is the first step toward preventing a software glitch. Monitoring and analysis technology, which usually runs in your network operation center (NOC) 24 hours a day, gives your operators the diagnosis capability to understand where the configurations may have been incorrectly installed.
But monitoring is only half the battle. Your operators have to be aware of what the network is telling them. Many times the switch will indicate an error in the data, but if it is a subtle failure, the technician may let warning messages go by because he is flooded with information. Some later event could intensify the problem until a significant outage occurs, Barnett explained. Operators must be aware early if something appears to be in trouble. They must not dismiss warnings; that is where analysis and correlation tools are critical for the NOC as well.
Greg Selig, Airadigm director of network operations, said his operators watch alarms closely so problems do not escalate to the point of becoming customer-affecting issues. At the same time, they do not have time to chase false alarms.
"You have to be able to clean up your system," he said. "If you don't do that, an alarm that started this morning may be passed by if you get used to seeing it on a routine basis. Keeping things clean allows you to see the real software-related problems."
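The alarm handling described above can be illustrated with a minimal Python sketch. It is not taken from any vendor's fault-management product: the Alarm fields, the element and alarm code names, and the repeat threshold of three are all assumptions made for illustration. The idea is simply to collapse repeated alarms into one line each and to escalate a warning that keeps recurring, so a subtle failure is not lost in the flood.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    element: str   # hypothetical network element name, e.g. "MSC-1"
    code: str      # hypothetical alarm code
    severity: str  # "warning" or "critical"


def correlate(alarms, repeat_threshold=3):
    """Collapse repeated alarms and escalate warnings that keep recurring."""
    counts = Counter((a.element, a.code) for a in alarms)

    # Remember the worst severity seen for each (element, code) pair.
    worst = {}
    for a in alarms:
        key = (a.element, a.code)
        if worst.get(key) != "critical":
            worst[key] = a.severity

    summary = []
    for (element, code), n in counts.items():
        escalate = worst[(element, code)] == "critical" or n >= repeat_threshold
        summary.append(
            {"element": element, "code": code, "count": n, "escalate": escalate}
        )

    # Escalated entries first, then the most frequent ones.
    summary.sort(key=lambda row: (not row["escalate"], -row["count"]))
    return summary


if __name__ == "__main__":
    feed = [Alarm("MSC-1", "DATA_ERR", "warning")] * 3
    feed.append(Alarm("HLR-1", "LINK_FLAP", "warning"))
    for row in correlate(feed):
        print(row)
```

A real NOC tool would also correlate across time windows and network topology; this sketch only counts repeats.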
Because so much of your reliability hinges on your employees, Selig advises carriers to hire switching people who think about the network, not just about managing the switch. They should understand how the call works in the wireless network and have a view of full functionality. That global view, in turn, helps them to test processes on the switch better.
Frank Salm, AG Communication Systems product marketing manager for the IN business unit, said that software integrity generally is built in at several different layers, such as application, middleware and the operating system. The applications tend to be developed to account for a multitude of different error situations related to their immediate domain but cannot protect against situations such as writing data into an incorrect memory location. Usually, a middleware component takes control of that and prevents errors of this type.
"The middleware provides core functionality for all your applications to run on," he said. "Using that component should capture those error conditions and be able to handle them as elegantly as possible."
GET YOUR CHECK-UP
Salm said a number of software problems arise simply because today's products are highly complex. Frequently, carriers do not go with a single software vendor, so several layers of software are built upon each other. Therefore, quality assurance and testing are paramount when it comes to software reliability.
Carriers must choose a vendor that builds safeguards and feedback cycles into their products. But even if you do have quality software, it doesn't take much to cause a problem, Salm said. He recalled an SS7 network outage a few years ago that resulted from a single byte of information that was coded incorrectly by a manufacturer. That software then was distributed to several nodes. The only way to avoid something so small causing such a large problem is to test each load at several levels. Your software vendor should test for defects at the unit, integration, system and network levels before you install the product to make sure that multiple network elements are able to work together.
Salm said most vendors test their products in extreme conditions in a realistic lab environment before they hit the field. In fact, one of the reasons software is more expensive in the telecommunications industry than in the computer environment is that the vendor may spend 40% to 60% of its time on extensive testing. However, there is no way the vendor can possibly test every single software path. As an extra precaution, added Selig, you should test the product yourself as well.
"What the vendor delivers from the lab may be universally solid and error-free, but until you test it in a live actual network, every one of which is different from the other, you can't be certain that you can release it to your customers," he said.
In the labs, vendors make their best guess about the configuration of the software. They can never know for sure how one load will behave when it is integrated into several different carriers' networks, Selig continued. Sometimes the vendor will work with you in your testing.
Patrik Ringqvist, Ericsson director for research and development, said Ericsson delivers new software to one customer at a time and implements it in one part of the carrier's network. After letting it soak in that environment with no problems, Ericsson will place it into the other parts of the network. Once it works flawlessly in the entire network, the company will deliver it to other carriers.
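The staged delivery Ringqvist describes can be sketched in a few lines of Python. This is only an illustration of the general idea, not Ericsson's actual process: the stage names and the soak_ok check are hypothetical stand-ins for whatever acceptance criteria a carrier and vendor agree on.

```python
def staged_rollout(stages, soak_ok):
    """Deploy to each stage in order; stop at the first stage that fails its soak.

    stages  -- ordered list of stage names (hypothetical labels)
    soak_ok -- callable returning True if the load ran cleanly in that stage
    Returns (stages that received the load, failed stage or None).
    """
    deployed = []
    for stage in stages:
        deployed.append(stage)
        if not soak_ok(stage):
            # Halt here so later stages never receive the faulty load.
            return deployed, stage
    return deployed, None


if __name__ == "__main__":
    plan = ["one part of first carrier", "rest of first carrier", "other carriers"]
    # Pretend the load misbehaves once it reaches the rest of the first network.
    done, failed = staged_rollout(plan, lambda s: s != "rest of first carrier")
    print("deployed to:", done)
    print("stopped at:", failed)
```

The point of the ordering is that a defect found during an early soak affects only a small part of one network.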
Although you may get a new load from your vendor only once a year, Selig recommends that you test your network every time you apply a software correction, which may be every few weeks. If you are diligent about testing processes around an area where you are making a correction, you can make sure it won't break something else.
A QUICK RECOVERY
Even people who take vitamins get sick every once in a while. Similarly, your proactive measures to prevent software glitches may not be enough every time. In the event of a crash, a quick recovery plan is essential. Some systems are designed to recover automatically from a software bug, but that does not mean the crash will be transparent to your customer, Ringqvist said. Recovery takes time. In most cases it is not a big impact, but at times it can be significant. Mechanical recovery has to be built into the entire network, and not just in a particular system. That way, if one particular network element of a switch has a software problem that causes signal links to go down, other switches should be able to recover. Your customers won't feel the effects as much as they would otherwise.
AT&T Wireless has built redundancies into its system so that if there is a bug in a new piece of software, the backup system immediately comes up. The company has redundancy capability in all of its markets so that if one switch in the mobile switching center goes down, the other will kick in automatically.
"In the event that the entire Seattle system goes down entirely, we can switch things over to the Portland system or something else because we have enough switches around the country," Woo said.
But Tom Holmen, Ericsson vice president of customer support, said if you build in physical redundancy, you don't necessarily protect yourself against software errors. If you have two identical nodes, one as a standby to the other for load sharing, and you have a software fault in one node because of a load-related problem, the other node will have exactly the same software fault when you transfer over to it.
"There are certain software glitches you will be protected against, but you have to be very careful when you talk about redundancy," he said. "If you run the same software in the nodes, you don't protect yourself against software faults."
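Holmen's warning can be made concrete with a toy Python sketch. Everything in it is hypothetical: the version labels "R1" and "R0", the load limit of 1000, and the fault itself are invented purely to show why a standby running the same software fails on the same input.

```python
def handle(load, software_version):
    """Process traffic; hypothetical version "R1" has a load-related defect."""
    if software_version == "R1" and load > 1000:
        raise RuntimeError("load-related software fault")
    return "ok"


def serve_with_failover(load, primary_version, standby_version):
    """Try the primary node, then the standby; return which one served the load."""
    for role, version in (("primary", primary_version), ("standby", standby_version)):
        try:
            handle(load, version)
            return role
        except RuntimeError:
            continue  # this node failed; fall through to the next one
    return None  # both nodes failed


if __name__ == "__main__":
    # Identical software on both nodes: the standby hits the same fault.
    print(serve_with_failover(2000, "R1", "R1"))
    # A standby on a different (hypothetical) load survives the same traffic.
    print(serve_with_failover(2000, "R1", "R0"))
```

Physical redundancy still helps with hardware failures; the sketch only shows that it does not cover a defect both nodes share.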
Lucent's Barnett agreed, adding that the cost of developing totally separate software is prohibitive. For that reason, almost every carrier should have a backup of previous configurations so it can restore the network. He suggests that you back up your network by storing and archiving the configuration information of any component. However, you must be sure to back up the system every time you change a configuration.
"If you have old configurations, then your backup copy is useless," he said.
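Barnett's advice about keeping backups current can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the ConfigArchive class, its method names and the element name "MSC-1" are hypothetical, and a real archive would write to durable storage. It uses a SHA-256 fingerprint from Python's standard hashlib module to tell whether the latest backup still matches the live configuration.

```python
import hashlib


def fingerprint(config_text):
    """Return a SHA-256 digest identifying this exact configuration text."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


class ConfigArchive:
    """Hypothetical in-memory archive of configuration versions per element."""

    def __init__(self):
        self._versions = {}  # element name -> list of (fingerprint, config text)

    def backup(self, element, config_text):
        """Store a new version unless it is identical to the latest one."""
        versions = self._versions.setdefault(element, [])
        fp = fingerprint(config_text)
        if not versions or versions[-1][0] != fp:
            versions.append((fp, config_text))
        return fp

    def is_current(self, element, live_config_text):
        """True only if the newest backup matches the live configuration."""
        versions = self._versions.get(element)
        return bool(versions) and versions[-1][0] == fingerprint(live_config_text)

    def restore_previous(self, element):
        """Return the configuration saved before the latest one, if any."""
        versions = self._versions.get(element, [])
        return versions[-2][1] if len(versions) >= 2 else None


if __name__ == "__main__":
    archive = ConfigArchive()
    archive.backup("MSC-1", "routes=old")
    # The configuration changed but nobody backed it up: the backup is stale.
    print(archive.is_current("MSC-1", "routes=new"))
    archive.backup("MSC-1", "routes=new")
    print(archive.is_current("MSC-1", "routes=new"))
    print(archive.restore_previous("MSC-1"))
```

Running is_current as part of every change procedure is one way to catch a stale backup before it is needed.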
There is no way to guarantee your network never will fail. Anything from network bugs in vendor software to natural disasters can bring it down in minutes. Fires, earthquakes and storms are expected problems for which carriers generally prepare; but a software bug causes smaller unexpected problems, and recovery can be more difficult. The simple prescription to prevent ailing software is to back up your system, test your network frequently, and have a strategic recovery plan ready in case of an emergency.
© 2012 Penton Media Inc.