Performance similarities of next-generation routers and the largest supercomputers
Changing dynamics in the telecommunications industry are compelling service providers to carefully reassess their network architectures. The rapid declines in traditional voice revenues and market acceptance of flat-rate data services are motivating service providers to deliver new, innovative, and profitable Internet Protocol-based service offerings.
On the operations side, the high cost of maintaining separate networks, such as Frame Relay and Asynchronous Transfer Mode (ATM), is prompting leading service providers to envision a next-generation network in which these service-specific networks are consolidated. In this new environment, Frame Relay and ATM circuits will be carried over a unified, multiservice infrastructure built on an IP/Multiprotocol Label Switching (MPLS) backbone.
The continued growth of consumer broadband and business subscribers, the expansion of mobility markets, and the greater availability of higher access speeds will also require a much more robust, highly scalable core to handle the expected annual doubling of bandwidth in the United States and Europe and the expected tripling and possibly quadrupling of bandwidth in Asia. To meet these demands, service providers will need to deploy a new class of carrier core routers that have taken a major leap forward in design. While the bandwidth of external connections on core routers has increased in recent years from OC-3 to OC-48 and OC-192, tomorrow’s core routers will need to support OC-768 connections operating at 40 gigabits per second (Gbps). In addition, the number of line cards that the core router must support will grow dramatically to handle the aggregate subscriber and backbone bandwidth growth. Lowering operating costs while eliminating separate networks will require a carrier-class core router that also enables point-of-presence (POP) consolidation for network simplification.
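The progression from OC-3 to OC-768 follows a simple rule: a SONET OC-n link runs at n times the 51.84 Mbps base rate. The sketch below works through that arithmetic for the interface levels named above; the function and constant names are illustrative and do not come from the article.

```python
# SONET optical carrier rates: OC-n runs at n times the 51.84 Mbps base rate.
OC_BASE_MBPS = 51.84

def oc_rate_gbps(n: int) -> float:
    """Return the nominal line rate of an OC-n link in Gbps."""
    return n * OC_BASE_MBPS / 1000.0

# Print the line rate for each interface level mentioned in the text.
for level in (3, 48, 192, 768):
    print(f"OC-{level}: {oc_rate_gbps(level):.2f} Gbps")
```

The OC-768 result is just under 40 Gbps, which is why these interfaces are commonly described as 40-Gbps links.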
To meet these new demands, tomorrow’s router architectures will have to function very differently from those of today. They will require distributed memories and multi-stage fabrics that replace single-stage crossbars, allowing extraordinary scalability as each stage of the fabric increases scale by the square of the individual switch element’s size. To handle the increased computational demands for determining routing paths, performing lookups, shaping traffic, and making queuing decisions across hundreds or thousands of line cards, the number of route processors will need to scale in a distributed, parallel design. The scale and processing power of these next-generation core routers will make them comparable to the designs of supercomputers. In fact, in time they may well surpass the performance and capabilities of the largest supercomputers.
How will a next-generation carrier-class core router compare with a high-performance, parallel processing supercomputer? A good basis for comparison is to set the requirements for a next-generation router with 40-Gbps line cards against those of ASCI White, the supercomputer built for the U.S. Department of Energy’s Accelerated Strategic Computing Initiative (ASCI). ASCI White was built by IBM and now performs nuclear simulations at the Lawrence Livermore National Laboratory in California. When ASCI White was deployed in 2000, it was the fastest supercomputer in the world.

Figure 1: Next-Generation Core Router
The next-generation core router will be massively parallel and distributed, with huge computational power. The 40 Gbps-capable line cards, or forwarding data planes, will need to provide packet processing for 100 million packets per second (pps), parsing packets as they enter the card and examining Layer 2 through Layer 7 information. Prioritizing traffic to quality of service levels, assigning VPN IDs, classifying and filtering packets, and multicast functions will all be handled at line-speed rates. Using general-purpose processors, this packet processing will require at least 500 instructions per packet. Once they are processed on the ingress side of these cards, the packets will be shuttled across a multistage fabric in fixed-size cells. When the cells reach an egress line card, they must be reassembled into packets. A second level of packet processing will then occur to apply egress features and determine output queuing priority. The packets are then sent to the appropriate interface, such as one of multiple 10 Gbps (OC-192 or 10GE) interfaces or a 40 Gbps (OC-768) interface.
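The step of shuttling packets across the fabric in fixed cells and reassembling them at the egress card can be illustrated with a minimal sketch. The 64-byte cell size, the zero padding, and the function names are assumptions made for illustration; the article does not specify a cell format.

```python
from typing import List

# Assumed cell payload size; the article does not state one.
CELL_PAYLOAD_BYTES = 64

def segment(packet: bytes, cell_size: int = CELL_PAYLOAD_BYTES) -> List[bytes]:
    """Split a variable-length packet into fixed-size cells, zero-padding the last cell."""
    cells = []
    for offset in range(0, len(packet), cell_size):
        chunk = packet[offset:offset + cell_size]
        cells.append(chunk.ljust(cell_size, b"\x00"))
    return cells

def reassemble(cells: List[bytes], original_length: int) -> bytes:
    """Concatenate cells in order and strip padding using the known packet length."""
    return b"".join(cells)[:original_length]

# A 150-byte packet crosses the fabric as fixed-size cells, then is restored on egress.
packet = bytes(range(150))
cells = segment(packet)
restored = reassemble(cells, len(packet))
```

A real fabric would also carry a per-cell header (for example, a destination card and sequence number) so the egress card can order cells and find packet boundaries; that detail is omitted here.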
Most of the processing power in this next-generation router will reside on the line cards. With 100 million pps crossing a 40-Gbps interface, each line card will need a total of 100,000 million instructions per second (MIPS) of processing power (500 instructions per packet x 100 million pps x 2 for ingress plus egress), or 100 billion instructions per second (BIPS). A router with 256 40-Gbps line cards will therefore require 25,600 BIPS. Each line card will also have a control processor that provides 1 BIPS of throughput, an extra 256 BIPS on a 256-line-card system.
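The line-card arithmetic above can be checked with a few lines of code. This is only a restatement of the article's figures; the constant and function names are illustrative.

```python
# Figures taken from the text; names are illustrative.
INSTRUCTIONS_PER_PACKET = 500
PACKETS_PER_SECOND = 100_000_000
PROCESSING_PASSES = 2          # one pass on ingress, one on egress
LINE_CARDS = 256
CONTROL_BIPS_PER_CARD = 1      # one control processor per line card

def line_card_bips() -> int:
    """Packet-processing load of one line card, in billions of instructions per second."""
    instructions_per_second = (
        INSTRUCTIONS_PER_PACKET * PACKETS_PER_SECOND * PROCESSING_PASSES
    )
    return instructions_per_second // 10**9

forwarding_bips = line_card_bips() * LINE_CARDS      # all forwarding planes combined
control_bips = CONTROL_BIPS_PER_CARD * LINE_CARDS    # all line-card control processors

print(line_card_bips(), forwarding_bips, control_bips)
```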
Additional processing demands on the next-generation router will come from the route processor subsystem, which will also be arrayed in a distributed design––unlike today’s routers that rely on a centralized control plane consisting of one route processor (and possibly one standby). This will be driven by the increased demands placed on the route processors. First, as the number of network-connected devices continues to grow, the levels of route computations and communication will increase. In addition, the route processors will need to download this information to the forwarding information bases (FIBs) located on hundreds of line cards. And in the event of a “route flap”––where a link or a router goes down––the route processors will need to broadcast massive amounts of FIB updates to each of the hundreds of line cards.
It’s estimated that a router with 256 40-Gbps line cards will need 32 route processors, and each route processor will generate 1 BIPS of throughput, resulting in an additional 32 BIPS. Total throughput for the core router will equal 25,888 BIPS (25,600 + 256 + 32).
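The system total is the sum of three subsystems: line-card packet processing, line-card control processors, and route processors. A short sketch of that sum, using the article's figures and illustrative names:

```python
# Subsystem throughput in BIPS, as stated in the text; the dictionary keys are illustrative.
router_subsystems_bips = {
    "line_card_packet_processing": 256 * 100,  # 256 cards x 100 BIPS each
    "line_card_control_processors": 256 * 1,   # 256 cards x 1 BIPS each
    "route_processors": 32 * 1,                # 32 route processors x 1 BIPS each
}

router_total_bips = sum(router_subsystems_bips.values())

# Share of the total contributed by packet processing on the line cards.
forwarding_share = (
    router_subsystems_bips["line_card_packet_processing"] / router_total_bips
)

print(router_total_bips, f"{forwarding_share:.1%}")
```

Nearly all of the estimated throughput comes from packet processing on the line cards, consistent with the statement that most of the router's processing power sits there.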
How do these numbers compare with the ASCI White supercomputer at Lawrence Livermore Labs in California?

Figure 2: ASCI White at LLNL
As shown in Figure 2, the ASCI White supercomputer contains 512 processing nodes––460 compute nodes, 32 visual processing nodes (used to transform data computations into a visual display), 16 General Purpose File System (GPFS) server nodes, and 4 I/O nodes.
The compute node subsystem generates the majority of processing power on a supercomputer such as ASCI White. It is essentially responsible for all of the “heavy lifting,” which amounts to performing calculations on billions of variables. Each compute node is assigned a particular task. In a simulation of an earthquake, for example, one node will be assigned a segment of the earth near the fault line, and another node will be assigned the algorithms and data for the earth adjacent to the first node’s segment. Therefore, not only must each node perform the needed calculations and determine the results (that the earth moved 10 millimeters north, for example), it must also communicate this information to the adjacent compute nodes, because these results will affect the adjacent segment of earth. All of the compute nodes must do their work and communicate with each other in order to faithfully represent the physical simulation of the event.
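The pattern described here, where each node computes on its own segment and then shares boundary results with its neighbors, can be sketched in a few lines. The averaging rule below is a toy stand-in chosen for illustration, not an earthquake model, and the function name and data layout are assumptions.

```python
from typing import List

def step(segments: List[List[float]]) -> List[List[float]]:
    """Advance every segment one time step with a toy neighbor-averaging rule."""
    # Communication phase: each node collects one boundary value from each adjacent node.
    # Nodes at the ends of the domain reuse their own edge value.
    halos = []
    for i, seg in enumerate(segments):
        left = segments[i - 1][-1] if i > 0 else seg[0]
        right = segments[i + 1][0] if i < len(segments) - 1 else seg[-1]
        halos.append((left, right))

    # Computation phase: each node updates its own cells using only local data plus halos.
    updated = []
    for seg, (left, right) in zip(segments, halos):
        padded = [left] + seg + [right]
        updated.append(
            [(padded[k - 1] + padded[k] + padded[k + 1]) / 3.0
             for k in range(1, len(padded) - 1)]
        )
    return updated

# Two "nodes", each owning three cells; a disturbance starts at the edge of node 1.
after = step([[0.0, 0.0, 0.0], [9.0, 0.0, 0.0]])
print(after)
```

After one step, the disturbance that started in the second segment has affected the last cell of the first segment, which is only possible because the boundary value was exchanged.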
Each processor in ASCI White operates at 1.5 GFLOPS, which this comparison treats as 1.5 billion instructions per second (BIPS). With 7,360 processors (16 processors per node x 460 nodes), the compute node subsystem generates 11,040 BIPS. Factoring in the remaining 52 nodes, ASCI White’s 8,192 processors generate 12,288 BIPS. In terms of pure aggregate computational power, therefore, a large next-generation core router will be roughly twice as powerful as ASCI White: 25,888 BIPS versus 12,288 BIPS.
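The ASCI White figures and the resulting ratio follow directly from the node and processor counts. The sketch below restates that arithmetic with illustrative names; it treats 1.5 GFLOPS as 1.5 BIPS, as the article does.

```python
# Figures from the text; names are illustrative.
PROCESSORS_PER_NODE = 16
COMPUTE_NODES = 460
TOTAL_NODES = 512
BIPS_PER_PROCESSOR = 1.5    # the article treats 1.5 GFLOPS as 1.5 BIPS
ROUTER_TOTAL_BIPS = 25_888  # router estimate from the text

compute_processors = PROCESSORS_PER_NODE * COMPUTE_NODES
total_processors = PROCESSORS_PER_NODE * TOTAL_NODES

compute_bips = compute_processors * BIPS_PER_PROCESSOR
asci_white_bips = total_processors * BIPS_PER_PROCESSOR

# Ratio of estimated router throughput to ASCI White throughput.
ratio = ROUTER_TOTAL_BIPS / asci_white_bips

print(compute_processors, total_processors, compute_bips, asci_white_bips, round(ratio, 2))
```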
High memory bandwidth is another primary metric of system performance. This is particularly important on a router because the platform’s major function is to move data in and out of queues quickly. The next-generation router will use a distributed memory line card with input queues on the card’s ingress and output queues at the card’s egress. This doubles the amount of memory bandwidth required. Also, because data is written into a queue and eventually read out at least once, memory bandwidth must once again double. Therefore, a 40-Gbps line card will require memory bandwidth of four times its line rate: 4 x 40 Gbps = 160 Gbps per line card. The card must also have extra memory bandwidth to fragment the arbitrary-sized packets into fixed-size memory buffers, requiring another doubling of the memory bandwidth to 320 Gbps.
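The per-card memory bandwidth is the line rate multiplied by three successive factors of two. A short sketch of that chain, with illustrative names:

```python
# Each factor of two comes from the reasoning in the text; names are illustrative.
LINE_RATE_GBPS = 40
QUEUE_STAGES = 2            # input queues on ingress, output queues on egress
ACCESSES_PER_QUEUE = 2      # every packet is written once and read at least once
FRAGMENTATION_FACTOR = 2    # overhead of packing packets into fixed-size buffers

def card_memory_bandwidth_gbps(line_rate_gbps: int = LINE_RATE_GBPS) -> int:
    """Memory bandwidth one line card needs, in Gbps."""
    queued = line_rate_gbps * QUEUE_STAGES * ACCESSES_PER_QUEUE
    return queued * FRAGMENTATION_FACTOR

def system_memory_bandwidth_tbps(line_cards: int) -> float:
    """Aggregate memory bandwidth across all line cards, in Tbps."""
    return card_memory_bandwidth_gbps() * line_cards / 1000.0

print(card_memory_bandwidth_gbps(), system_memory_bandwidth_tbps(256))
```

For a 256-card system this gives 81.92 Tbps, which the article rounds to 81.9 Tbps.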
On the sample router with 256 40-Gbps line cards discussed in this article, this will require a total memory bandwidth of 81.9 terabits per second (Tbps), calculated as follows: 320 Gbps per line card x 256 line cards = 81.9 Tbps. In comparison, ASCI White requires memory bandwidth for each of the 512 nodes on the system––both for computational memory in the compute nodes and for IO buffer memory on the external interface nodes. IBM’s ASCI White provides 2 gigabytes per second (GBps) of memory bandwidth per processor. With 8192 processors in ASCI White, total memory bandwidth can be calculated as follows:
8192 processors x 2 GBps per processor = 16384 GBps; and, 16384 GB/8 GB = 20480 Gbps, or 20.4 Tbps.
As shown in this article, the power and throughput of high-end core routers have increased dramatically, to the point where a next-generation core router will be at parity with the highest-performing parallel supercomputers. In fact, in terms of aggregate system performance, the next-generation router will be twice as fast as ASCI White primarily because the router can scale to 256 or more 40-Gbps line cards––a phenomenal increase over today’s routers. The router will also hold a four-fold advantage in memory bandwidth: 81.9 Tbps versus 20.4 Tbps.
| Feature | Router with 256 40-Gbps Line Cards | 512-Node (8,192-Processor) ASCI White Supercomputer |
| --- | --- | --- |
| System throughput | 25,888 BIPS | 12,288 BIPS |
| Memory bandwidth | 81.9 Tbps | 20.4 Tbps |
Table 1: Overall System Comparison, Core Router versus ASCI White
In addition to these performance parameters, next-generation high-end routers have high-availability requirements beyond those of supercomputers. As carriers converge their backbones on IP router-based infrastructures, multiple services, including voice over IP (VoIP), demand that routers provide the same resilience as current telephony equipment. The redundancy of multiple compute nodes, or line cards and route processors, in the distributed architecture of both supercomputers and routers forms a good base for high availability.
While this simple redundancy is sufficient for the supercomputer that uses very coarse-grained check-pointing and job restart, the router must do much more to meet its real-time requirements. The router hardware must be designed with robust error detection and correction logic to quickly detect hardware failures and either mask them or trigger failover to redundant hardware. Likewise, the router software must support hot and warm standby with process restart on application crashes, and failover to redundant route processors or line cards on hardware and kernel software problems.
While supercomputer performance will increase in the coming years, so, too, will that of next-generation core routers. In fact, some in the industry estimate that routers will increase at a faster rate given the strong market demands for increased network bandwidth. Within the next five to seven years, next-generation carrier-class core routers will likely grow beyond 40 Gbps per line card while also further increasing their capacity. Today, routers already surpass supercomputers in their high-availability support, and in the future, as core router performance grows, they are likely to surpass supercomputers as the world’s most powerful computational devices.
Dan Lenoski is the vice president of engineering at Cisco Systems. He can be reached at lenoski@cisco.com.
© 2012 Penton Media Inc.