Review: Is white-box switching the future of networking?
This differs from virtually all conventional data center switches and routers, which store configuration data in a single startup file. Some approaches – such as HP Intelligent Resilient Framework (IRF) and Juniper QFabric – even span multiple physical switches or routers using one configuration file.
With Linux, configuration data lives in lots of places. Cumulus Linux uses /etc/network/interfaces to store layer-2 information such as data about interfaces and VLANs, but also puts interface setup information in /etc/cumulus/ports.conf. If routing is enabled, the system will consult the /etc/quagga/daemons file to see which routing protocols to load and the /etc/quagga/Quagga.conf file to see how routing is configured.
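To make the scattering concrete, here is a minimal Python sketch that gathers the four files named above into a single listing, roughly the way a monolithic startup file would read. The helper name collect_config and its root parameter are our own illustration, not a Cumulus tool, and a real system may consult files beyond these four.

```python
from pathlib import Path

# Files named in this review; a real system may consult others as well.
CONFIG_FILES = [
    "/etc/network/interfaces",
    "/etc/cumulus/ports.conf",
    "/etc/quagga/daemons",
    "/etc/quagga/Quagga.conf",
]


def collect_config(paths=CONFIG_FILES, root="/"):
    """Return one text blob with each file's contents under a header line.

    `root` (a hypothetical convenience) lets the function be pointed at a
    copy of the filesystem, such as a backup directory, instead of the
    live switch.
    """
    sections = []
    for name in paths:
        path = Path(root) / name.lstrip("/")
        try:
            body = path.read_text()
        except OSError:
            # Missing or unreadable files are flagged, not treated as fatal.
            body = "(not present or not readable)\n"
        sections.append(f"### {name}\n{body}")
    return "\n".join(sections)


if __name__ == "__main__":
    print(collect_config())
```

Run on a machine without these files, the script simply reports each one as not present, which makes it safe to try anywhere.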
And that’s just the beginning. Setting protocol aging timers – something we do to avoid contention with the test traffic we generate – can involve changes to a half-dozen or more variables stored in Linux’s /proc filesystem. Granted, that’s a use case mainly relevant in test labs, but there are similar examples involving everyday networking tasks.
For example, it’s anything but simple to extend Address Resolution Protocol/Neighbor Discovery (ARP/ND) timers beyond Linux’s default 30-second aging time. This short timeout is fine for hosts, but in networking devices it can create a lot of unnecessary traffic. For that reason, many switches and routers use much higher default ARP timeout values (Cisco’s default is 4 hours).
To complicate matters, the Linux state machine uses a variable timeout value for consistency with IPv6 behavior, even with IPv4 traffic. The timeout is tunable, as with virtually all parameters in Linux, but it involves setting at least four parameters apiece for IPv4 and IPv6 under the /proc filesystem. Even then, we also had to stop a Cumulus Linux watchdog script called arp_refresh for the longer timer values to take hold. Cumulus has an open bug report about there being too many variables involved to set ARP/ND aging.
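As a rough illustration of what that tuning involves, the Python sketch below builds the /proc writes for two standard Linux neighbor-table tunables, base_reachable_time_ms and gc_stale_time, for both address families. Which parameters and values were actually changed in testing is not listed in this review, so the selection here is an assumption, and the function names are hypothetical. The second helper shows why the timeout is "variable": Linux randomizes the reachable time between 0.5 and 1.5 times the base value.

```python
# Two standard Linux neighbor-table tunables that influence entry aging.
# Which parameters (and values) were changed for this review is not
# stated in the article; this selection is an assumption.
def neigh_settings(timeout_s, iface="default"):
    """Map /proc paths to the string values to write for IPv4 and IPv6."""
    settings = {}
    for family in ("ipv4", "ipv6"):
        base = f"/proc/sys/net/{family}/neigh/{iface}"
        settings[f"{base}/base_reachable_time_ms"] = str(timeout_s * 1000)
        settings[f"{base}/gc_stale_time"] = str(timeout_s)
    return settings


def reachable_time_range(base_s):
    """Bounds of the randomized reachable time: 0.5x to 1.5x the base."""
    return 0.5 * base_s, 1.5 * base_s


def apply(settings, dry_run=True):
    """Print the writes, or perform them when dry_run is False (needs root)."""
    for path, value in settings.items():
        if dry_run:
            print(f"echo {value} > {path}")
        else:
            with open(path, "w") as handle:
                handle.write(value)


if __name__ == "__main__":
    print(reachable_time_range(30))      # Linux default base of 30 seconds
    apply(neigh_settings(4 * 60 * 60))   # 4 hours, matching Cisco's default
```

The dry-run default only prints the equivalent shell commands. Even with real writes, the arp_refresh caveat described above would still apply on Cumulus Linux.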
Cumulus Linux also differs in terms of which commands accomplish common networking tasks. For example, Linux access control lists filter routing table entries rather than packets, the opposite of Cisco’s ACLs (but similar to the route-filter command in Juniper’s Junos OS).
Cumulus Linux uses the included Netfilter firewall for packet-filtering functions. But instead of using iptables and ip6tables to configure Netfilter, Cumulus recommends its own cl-acltool to configure ACLs.
Linux also has extensive quality-of-service and route selection features, available through the traffic control (tc) and policy routing capabilities respectively. These are conceptually similar to the service/policy mapping model and policy-based routing in Cisco devices.
8. CUMULUS LINUX COMBATS CONFIG BLOAT, UP TO A POINT
Cumulus says it’s fully aware that network engineers are used to seeing configuration and management information in one place. To that end, it’s adding some commands that nicely aggregate information from multiple places. There’s still lots to do here, but the current commands begin to address the issue of configuration bloat.
A good example is netshow, which Cumulus uses to collect interface and troubleshooting data from multiple places. To get a quick overview of which interfaces are physically up, the netshow interface command does the trick:
cumulus@cumulus$ netshow interface
    Name    Speed    Mtu    Mode      Summary
--  ------  -------  -----  --------  ------------------------
UP  lo      N/A      16436  Loopback  IP: 127.0.0.1/8, ::1/128
UP  eth0    1G       1500   Mgmt      IP: 172.31.128.11/24
The netshow command also has options to report traffic statistics, including errors, as well as neighbor information using Link Layer Discovery Protocol (LLDP). For automation, output from many netshow commands can be formatted as JavaScript Object Notation (JSON).
These kinds of interface status displays are very similar to those in conventional data center devices, and can save network engineers the trouble of pawing through multiple configuration files, as described above.
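For scripting, the JSON output is the better starting point, but its schema isn't reproduced in this review. Rather than guess at field names, the Python sketch below parses the text output of netshow interface shown above. The Interface type and function names are our own, and the parser assumes every row has the same whitespace-separated columns as the two sample rows.

```python
from typing import NamedTuple


class Interface(NamedTuple):
    status: str
    name: str
    speed: str
    mtu: int
    mode: str
    summary: str


# The sample output from this review, with columns aligned.
SAMPLE = """\
    Name    Speed    Mtu    Mode      Summary
--  ------  -------  -----  --------  ------------------------
UP  lo      N/A      16436  Loopback  IP: 127.0.0.1/8, ::1/128
UP  eth0    1G       1500   Mgmt      IP: 172.31.128.11/24
"""


def parse_netshow(text):
    """Parse whitespace-separated interface rows, skipping header lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        # Skip blanks, the column header, and the dashed separator.
        if len(fields) < 5 or fields[0] == "Name" or fields[0].startswith("-"):
            continue
        status, name, speed, mtu, mode = fields[:5]
        summary = fields[5] if len(fields) > 5 else ""
        rows.append(Interface(status, name, speed, int(mtu), mode, summary))
    return rows


def interfaces_up(text):
    """Names of interfaces whose status column reads UP."""
    return [row.name for row in parse_netshow(text) if row.status == "UP"]


if __name__ == "__main__":
    print(interfaces_up(SAMPLE))
```

Screen-scraping like this is brittle if column layouts change between releases, which is exactly why a structured format is preferable where it's available.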
Another useful tool for troubleshooting is Cumulus’ cl-resource-query command. It displays system limits, such as current and maximum levels of layer-2 and layer-3 forwarding tables, all in one screen. That’s very handy when investigating resource exhaustion issues. In contrast, discovering system limits in other switches and routers may require multiple commands, if they’re even available, or consulting product data sheets.
9. PERFORMANCE IS A NONISSUE
Performance was excellent across the board. Tests with all variations of unicast, multicast, IPv4, and IPv6, switching, and routing produced uniformly strong results.
Both with 64 10G Ethernet interfaces and a combination of four 40G Ethernet and 48 10G Ethernet interfaces, the switch moved traffic at virtual line rate in every single test case. We say “virtual” line rate because we saw trivial packet loss at nominal line rate – but no loss when we offered traffic at 99.999 percent of line rate. That’s a difference of 10 parts per million, and likely attributable to clock speed differences between the switch and the Spirent test tool. We don’t think the difference is significant.
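The arithmetic behind these figures is easy to check. The short Python sketch below confirms that 99.999 percent of line rate is 10 parts per million below nominal, and that both port configurations add up to the same aggregate capacity. The 64-byte frame size in the last line is an assumption chosen for illustration, not a statement of the frame sizes used in testing, and the function names are our own.

```python
from fractions import Fraction

ETHERNET_OVERHEAD_BYTES = 20  # 8-byte preamble + 12-byte interframe gap


def ppm_below_line_rate(percent_of_line_rate):
    """Shortfall from 100 percent, in parts per million (exact arithmetic)."""
    return (1 - Fraction(percent_of_line_rate) / 100) * 1_000_000


def max_frames_per_second(link_bps, frame_bytes):
    """Theoretical maximum frame rate for one Ethernet port."""
    return link_bps / ((frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8)


def aggregate_gbps(ports):
    """Total capacity for a {speed_in_gbps: port_count} mapping."""
    return sum(speed * count for speed, count in ports.items())


if __name__ == "__main__":
    print(ppm_below_line_rate("99.999"))           # 10
    print(aggregate_gbps({10: 64}))                # 640
    print(aggregate_gbps({40: 4, 10: 48}))         # 640
    print(round(max_frames_per_second(10e9, 64)))  # 14880952
```

Passing the percentage as a string keeps the calculation exact. With ordinary floating point, the result would come out very slightly off from 10.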
Latency was low and predictable in all cases. Delay results were very much in line with other 10G Ethernet top-of-rack switches we’ve tested.
We also ran functional tests of VLAN trunking and link aggregation between the Edge-Core/Cumulus Linux combo and an Arista data center switch; in both cases, the white-box system behaved exactly as expected.
10. THERE MAY BE A WHITE BOX IN YOUR FUTURE
In the end, the decision to embrace white-box switching will depend on multiple factors, including economics, familiarity with Linux, sunk investment in training and certifications, and dependence on proprietary features. Any one of these might be a good reason to stick with proprietary switches, at least for now.
But we think that first one, economics, ultimately will matter most, for one simple reason: We’ve seen this movie before.
Fifteen to 20 years ago, Linux-on-commodity-hardware was an upstart going against HP, IBM, and Sun, the entrenched vendors of proprietary servers. That didn’t end well for the incumbents: Linux and white-box servers won, and the turnkey server market imploded, unable to compete with off-the-shelf hardware and free or low-cost software.
Now, as then, Linux doesn’t offer all the features the incumbents do. It doesn’t have as large an army of well-trained and well-paid wizards looking after its care and feeding. It’s not yet polished or simple enough to deploy everywhere. But none of that will matter if, as has happened before, enterprises decide that white boxes running Linux are good enough.
Cumulus Linux on white-box hardware offers a glimpse into what the future of enterprise networking might look like: Lower cost, higher programmability, and greater flexibility and control.
THANKS
Network World gratefully acknowledges the support of Spirent Communications, which supplied its Spirent TestCenter traffic generator/analyzer equipped with HyperMetrics dX2 8-port 40G Ethernet and HyperMetrics dX 32-port 10G Ethernet test modules, along with engineering support, for this project. We conducted all performance and functional tests using Spirent TestCenter.
David Newman, a Network World Test Alliance partner, is president of Network Test in Westlake Village, Calif. He can be reached at dnewman@networktest.com.
This story, "Review: Is white-box switching the future of networking?" was originally published by Network World.
Copyright © 2015 IDG Communications, Inc.