August 25, 2004
(Computerworld)
Session Initiation Protocol, or SIP, has been hailed as the key to the convergence kingdom, the global signaling standard that will enable all switches, gateways and phones -- hard and soft -- to talk to one another. However, after years of prominent play in the press and hyperbolic vendor claims, SIP remains virtually invisible in today's VoIP products and IP networks. If SIP is such a thirst-quenching concoction, why isn't anyone drinking?
First and foremost, the standard isn't anywhere near completion yet. Henning Schulzrinne, one of the fathers of SIP, says the job will take another four to six years. More than 100 active drafts are in committee, and the pending features far outnumber the finished ones. If you're looking for enterprise-level voice functionality, today's SIP-based products can't give you much more than, well, a sip.
Ironically, SIP was supposed to accelerate convergence by freeing telephony from the grips of the century-old International Telecommunication Union establishment. The ITU standards process has a reputation for moving at glacial speed, creating endless compromises and excluding individual innovators. SIP was to recast SS7 signaling in an Internet mold and transfer power from the telephony behemoths and back-office switches to users and their desktops. This power-to-the-PC philosophy envisioned PCs subverting the backroom gods and setting up IP-based voice calls with one another directly. In this scenario, the old monolithic standards process would be replaced by a modular one open to virtually anyone with a new idea. Instead of following a comprehensive standard that had to be implemented in its entirety, modules could be developed independently, and vendors could pick and choose among them. That was the theory. In practice, each new module increases SIP's complexity geometrically, and its intended simplicity is evaporating.
The initial iteration of SIP was touted for its simplicity, but that simplicity was deceptive. People soon realized that many essential features had not been addressed yet, and functionality is now being added feature by feature in committees. It might have made more sense for the core group of SIP "fathers" to go back to the drawing board and come up with a more complete standard. Doing things by committee is always a challenge, but a small group can generally get things done much faster than a much bigger and very fragmented one. The open, democratic approach has a lot of emotional appeal, but it isn't very efficient.
Once Burned, Twice Leery
The slow progress and all the hype have made for a bad combination. Vendors jump eagerly on early drafts and implement them, while the committees go on to evolve the standards into something significantly different. The long, painful road to the transfer feature is a case in point. Vendors got burned and are now very leery. It's easy to say they shouldn't have jumped the gun, but SIP won't do anything for anyone if it doesn't get put into products.
When an open standard is evolving rapidly, it's hard to know when product implementations start to make sense, or how to make them at least somewhat futureproof. A particular SIP product has to recognize and respond appropriately to requests from other SIP products and must be able to tolerate requests stemming from enhancements that didn't exist when it was released. SIP implementations must be crafted carefully to prevent unexpected behavior when novel parameters hit them. They must function as encapsulated subsets, within environments their developers could only guess at.
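To make the tolerance requirement concrete, here is a minimal sketch in Python of how an endpoint might treat requests it does not fully understand. It is an illustration under stated assumptions, not any vendor's implementation: the function names, the method list and the empty extension set are hypothetical, and the status codes follow the general rules in RFC 3261 (ignore unknown headers, answer an unrecognized method with 501, answer an unsupported required extension with 420).

```python
# Hypothetical endpoint profile: which methods and optional extensions it implements.
KNOWN_METHODS = {"INVITE", "ACK", "BYE", "CANCEL", "OPTIONS", "REGISTER"}
SUPPORTED_EXTENSIONS = set()  # assumption: no optional extensions implemented


def parse_request(raw: str):
    """Split a SIP request into method, request URI and a dict of headers."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, uri, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if ":" not in line:
            continue  # skip malformed lines rather than failing outright
        name, value = line.split(":", 1)
        headers[name.strip().lower()] = value.strip()
    return method, uri, headers


def choose_response(raw: str) -> str:
    """Pick a status line for a request, tolerating features this endpoint predates."""
    method, _uri, headers = parse_request(raw)
    if method not in KNOWN_METHODS:
        return "501 Not Implemented"  # method invented after this product shipped
    required = {t.strip() for t in headers.get("require", "").split(",") if t.strip()}
    if required - SUPPORTED_EXTENSIONS:
        return "420 Bad Extension"  # sender insists on an extension we lack
    # Unknown headers were parsed but are simply never consulted, so they are ignored.
    return "200 OK"  # stand-in for normal call processing


plain = "INVITE sip:bob@example.com SIP/2.0\r\nX-Future-Feature: yes\r\n\r\n"
print(choose_response(plain))
```

The design choice worth noting is that the code rejects only what it is explicitly told it must understand (the method and anything named in `Require`) and lets everything else pass through untouched.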
To get a good idea of the challenge SIP is presenting, consider the relatively simple and controlled world of IP Centrex. A current draft standard addressing just 17 basic IP Centrex features is 169 pages long. Why is something that's supposed to be so simple still in a draft stage and yet so cumbersome? The fact that the committee has to go into such punishing detail even at this level shows how difficult and complex SIP is turning out to be.
But the old guard and the new have their work cut out for them. Many common telephony features that business users expect are still missing from SIP products, including intercom capabilities, call parking, music on hold and six-way conferencing. Initial SIP products deliver only a limited subset of the functionality available from time division multiplexing (TDM) platforms while creating management and security problems. Because of this, 99% of the IP phones in use today are not SIP-based.
Feature-rich, PBX-specific phones currently dominate the enterprise, and today's SIP phones can't begin to fill their shoes. Trying to make SIP phones mimic some of the traditional functionality involves nonstandard technology and a very nonintuitive interface. Users would have to memorize and employ flash hooks and star codes instead of simply pushing the button they're accustomed to.
And the ingredients of the SIP endpoints aren't all that's lacking. If you know enough or have the right tools, it's easy enough to place a call from your Windows XP system, which does contain a SIP stack. However, receiving a call from another SIP client is a very different matter. In today's world, telephony addresses are phone numbers, and that SIP stack doesn't include one. Phone numbers are provided by the traditional, regulated telephony world.
Phone numbers are just one of the roadblocks SIP hits when it moves between the enterprise and service providers. For example, carriers still use SS7 signaling to exchange voice traffic. There are also security issues that prevent direct communication between IP-based carrier services and enterprise IP PBX platforms. Instead, equipment at the edge converts VoIP into some other protocol, moves it to its destination and then reconstructs it. Phone service will eventually be available as configurable software, but right now enterprises can't buy SIP trunks. Underneath it all, we have the same TDM network with its old-fashioned protocols, and SIP doesn't solve that problem. And even if it did, it would open up gaping security holes.
Another security issue slowing SIP deployment by service providers is firewall and Network Address Translation (NAT) traversal. SIP-based products that wow people in the laboratory are rendered useless in the interconnected real world because service provider customers have been "NAT'ed." In theory, SIP devices are IP devices, and each should have a unique IP address. In practice, multiple devices share a single IP address, courtesy of NATs. This has created an opportunity for a whole host of session border controller vendors that are scrambling to come up with solutions. Meanwhile, NATs have SIP effectively shackled.
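The NAT problem is easy to see in miniature. A SIP phone typically advertises its own address in the signaling (for example, in a Contact header), and behind a NAT that address is a private one the far side can't reach. The Python sketch below, using only the standard library, flags that mismatch. The function names and the sample addresses are hypothetical, and real session border controllers do considerably more than this check.

```python
import ipaddress
import re


def contact_host(contact: str) -> str:
    """Extract the host part from a SIP Contact value such as <sip:alice@10.0.0.5:5060>."""
    m = re.search(r"sips?:(?:[^@>\s]+@)?([^:;>\s]+)", contact)
    if m is None:
        raise ValueError("no SIP URI found in Contact value")
    return m.group(1)


def looks_natted(contact: str, observed_source_ip: str) -> bool:
    """Return True if the advertised address is private and differs from
    the address the packet actually arrived from."""
    host = contact_host(contact)
    try:
        advertised = ipaddress.ip_address(host)
    except ValueError:
        return False  # a host name rather than an IP literal; this check can't judge it
    observed = ipaddress.ip_address(observed_source_ip)
    return advertised.is_private and advertised != observed


# The phone says "call me at 192.168.1.10", but the packet came from a public address.
print(looks_natted("<sip:alice@192.168.1.10:5060>", "203.0.113.7"))
```

A reply sent to the advertised address in that case would never leave the caller's own network, which is why lab demonstrations on a flat network say little about behavior across a service provider's customers.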
Current realities also prevent SIP from lowering the cost of voice services. Today, everything is basically a telephone line connected to an infrastructure built 40 years ago. Voice may get turned into bits, but those bits still run across a TDM network, and the cost of any SIP-based service has to reflect this fact. If SIP is to live up to all the hype, it needs to be operating in a native IP network -- not something running on top of TDM.
Upside-Down Architecture
SIP's architecture pretty much turns enterprise telephony philosophy on its head. In the hierarchical PBX world, the smarts are in the servers, and the endpoint devices are dumb. While this arrangement has its limitations, it helps keep users out of trouble. In the peer-to-peer SIP world, all SIP entities are smart, and the bulk of the features must be implemented in the SIP endpoints. This gives users plenty of rope with which to hang themselves -- and perhaps the entire enterprise. What enterprises need is an orderly telecommunications community, not a cadre of all-powerful users. Until SIP acquires the necessary discipline, Media Gateway Control Protocol (MGCP) is much better suited to enterprise VoIP.
SIP architecture also creates problems for computer telephony integration and its essential element of third-party call control. In CTI applications, you use your desktop or handheld device to direct calls. When this process is based on SIP, the computer must be inserted in the path of the SIP endpoint. This computer runs a SIP server that watches the SIP phone constantly and always knows its state. These functions must be performed with at least 99.999% reliability -- which precludes today's handhelds and smart phones, due to their limited battery life. If the computer chokes, your phone is broken.
In general, SIP's large footprint makes implementation in resource-constrained smart phones and handhelds very difficult. The leading manufacturers of 802.11 phones can't even do SIP for show right now. Again, MGCP is a better fit because it's much less computationally intensive.
SIP is no panacea, no matter how good it eventually becomes. Much of the important functionality a phone or other endpoint requires is outside the scope of SIP. When you plug in a SIP phone and it just starts working, it's not because of what SIP does. Other essential ingredients include self-configuration capabilities, a good display, the right phone function keys and dialing plans. You can't expect the typical user to point a browser at the phone and set up the proper parameters. If SIP-based telephony is really to take off, it can't make phone usage harder for people. Interacting with SIP phones has to be very easy -- virtually mindless. Phones can't be little computers that pass a lot of the complexity on to the end users.
SIP does have enormous potential and is doubtless the way of the future. The vendor community is laying the foundation, and there is a SIP stack in every Windows XP machine. SIP is a sleeping giant, but its current state is more comparable to an infant than to a mature adult. At the moment, SIP is much like one of those celebrities who is basically famous for being famous.
Ed Basart is chief technology officer at IP telephony company ShoreTel in Sunnyvale, Calif.