"Riding the VoIP Wave" ACM Queue Magazine, September 2004

Background and Summary By: Patty Jablonski

Background, Characteristics and Advantages of VoIP:

The traditional telephone network, commonly known as the Public Switched Telephone Network (PSTN), is circuit-switched and uses Time Division Multiplexing (TDM). TDM places multiple calls on a shared line, such as a T1, by giving each call its own recurring time slot. Because the network is circuit-switched, a dedicated circuit is created between the endpoints when the connection is made. This can be wasteful because the same path and the same bandwidth are used regardless of the content of the phone call. For example, a pause in conversation still consumes 64 Kbps on the line.
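
The 64 Kbps figure and the capacity of a T1 can be checked with a little arithmetic. The sketch below assumes the standard telephony parameters that the text does not spell out: 8,000 samples per second at 8 bits per sample for one voice channel, and a T1 carrying 24 such channels plus 8 Kbps of framing.

```python
# Assumed standard values (not stated in the text above).
SAMPLES_PER_SECOND = 8_000   # one voice channel is sampled 8,000 times a second
BITS_PER_SAMPLE = 8          # each sample is encoded in 8 bits
T1_CHANNELS = 24             # time slots multiplexed onto one T1
T1_FRAMING_BPS = 8_000       # one framing bit per frame, 8,000 frames per second

channel_bps = SAMPLES_PER_SECOND * BITS_PER_SAMPLE   # bits per second for one call
t1_payload_bps = T1_CHANNELS * channel_bps           # all 24 time slots together
t1_line_bps = t1_payload_bps + T1_FRAMING_BPS        # payload plus framing

print(f"One channel: {channel_bps / 1000:.0f} Kbps")
print(f"T1 line rate: {t1_line_bps / 1_000_000:.3f} Mbps")
```

Each time slot is reserved for the life of the call, so the 64 Kbps is consumed whether or not anyone is speaking.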

Our voices as we speak to one another are analog signals, which can be represented as continuously varying waves (a pure tone is a sinusoid). The phone lines that carry voices from telephone to telephone are traditionally analog. The problem with an analog signal is noise interference. Noise accumulates with distance traveled, which is why amplifiers (for analog signals) and repeaters (for digital signals) are used to restore signal strength over longer distances.

The trend, however, has been from analog to digital signals. We see this more and more in communications, with the introduction of computers, digital cable, digital satellite, and digital subscriber lines (DSL). Unlike analog signals, digital signals are typically represented as a 0 or a 1, meaning that there is either no voltage or a high voltage level. A receiver can therefore separate the signal from noise interference more easily, because it knows the signal should be either 1 or 0 and no value in between.
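
The noise advantage described above can be illustrated with a simple threshold test. This is only a sketch: the `regenerate` function, the 0.5 threshold, and the noise range of +/-0.3 are illustrative assumptions, not values from the article.

```python
import random

def regenerate(levels, threshold=0.5):
    """Map each received voltage level back to a clean 0 or 1."""
    return [1 if level > threshold else 0 for level in levels]

random.seed(1)  # fixed seed so the run is repeatable
sent_bits = [1, 0, 1, 1, 0, 0, 1, 0]
# Hypothetical channel: each level is disturbed by up to +/-0.3 of noise.
received = [bit + random.uniform(-0.3, 0.3) for bit in sent_bits]
recovered = regenerate(received)

print(recovered == sent_bits)  # True while the noise stays below the threshold margin
```

As long as the noise is smaller than the gap between a level and the threshold, the original bits are recovered exactly, which is what a digital repeater relies on.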

Digital signals are now the reality, and it is predicted that the next trend will be from circuit-switched digital telephony to voice over IP (VoIP). VoIP uses digital signal processing (DSP) technology and an inexpensive network interface chip set, providing an overall lower cost. More specifically, VoIP takes advantage of the Internet Protocol (IP) to make the communication transfer possible. IP is known for its ability to fragment large amounts of data and then reassemble it at the receiver end. Notice that the word 'data' is used here and not just 'voice'. This is important because it is becoming increasingly popular to send text, voice, and video over phones, and a data network can carry all of these types of traffic.
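
The fragment-and-reassemble idea mentioned above can be sketched in a few lines. This is a simplified model, not the actual IP header logic: real IP fragments also carry an identification field and a more-fragments flag, which are omitted here, and the function names and the 8-byte fragment size are assumptions chosen for illustration.

```python
import random

def fragment(data: bytes, max_size: int):
    """Split data into (offset, chunk) pairs no larger than max_size."""
    return [(i, data[i:i + max_size]) for i in range(0, len(data), max_size)]

def reassemble(fragments):
    """Sort fragments by offset and join them back together."""
    return b"".join(chunk for _, chunk in sorted(fragments))

message = b"text, voice, and video all travel as data"
pieces = fragment(message, max_size=8)
random.shuffle(pieces)          # fragments may arrive out of order
restored = reassemble(pieces)

print(restored == message)
```

Because each piece carries its own offset, the receiver can rebuild the original data regardless of arrival order.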

The VoIP network is packet-switched, as opposed to circuit-switched like the PSTN. As the name implies, a packet-switched network transfers 'packets', or chunks of data, from the sender to the receiver. A packet-switched network can be more advantageous than a circuit-switched network because different routes can be used each time a chunk of data is sent, so data can still reach its destination when part of the network fails. In a circuit-switched network, once the circuit connection has failed, the destination becomes unreachable until a new circuit is set up. Additionally, with packet-switched networks only useful data is sent and received and only the necessary bandwidth is used. When silence suppression is in use, a pause in the conversation produces no packets and consumes no bandwidth.
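
The claim that packets can still reach their destination after a failure can be sketched with a small graph search. The four-node topology and the `find_path` helper are hypothetical; real routers use routing protocols rather than a search per packet, so this only illustrates that an alternate route exists and can be found.

```python
from collections import deque

def find_path(links, src, dst):
    """Breadth-first search for any path from src to dst."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for neighbor in sorted(links.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # destination unreachable

# Hypothetical four-node network with two routes from A to D.
links = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
before = find_path(links, "A", "D")

# Fail the B-D link by removing it in both directions.
links["B"].discard("D")
links["D"].discard("B")
after = find_path(links, "A", "D")

print(before, after)
```

A circuit pinned to the A-B-D path would be cut by the same failure, whereas later packets simply take A-C-D.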

It is clear that Voice over IP has numerous advantages over the existing analog and digital networks. With VoIP, there may one day be a single communication system to tie together people from all over the world with lower costs to consumers.

Definitions and Details of the VoIP Architecture:

A typical telephone call consists of two types of traffic: bearer and signaling. Bearer is the actual voice being sent over the network. Signaling is the information necessary for successful setup and teardown of the call, which includes things like dual-tone multi-frequency (DTMF) tones (the touch-tone sounds that tell what number you are dialing), fax, modem sounds, and even voice, in the form of voice mail and interactive voice response (IVR) (when you are asked to press or say your account number in an automated response system, for example). It is important that signaling and bearer traffic are kept separate so that there is no interference between the two.

As already seen, VoIP uses the Internet Protocol (IP), but it relies on several more protocols in order to transfer data successfully. First, voice quality is extremely sensitive to the amount of delay, packet loss and bandwidth available in the system. Because of this, VoIP uses the Real-time Transport Protocol (RTP) for voice transmission, since voice must be delivered with as little delay as possible (that is, in real time). RTP "rides" on top of the User Datagram Protocol (UDP) and provides the sequence numbers and timestamps necessary for ordering packets at the receiver. To limit "jitter" (variability in delay) on the receiver end, the packets are buffered and played back at a constant rate. Furthermore, to ensure voice quality and manage the number of simultaneous calls in progress, the Resource Reservation Protocol (RSVP) can be used to reserve bandwidth through the network.
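
The sequence numbers and timestamps described above live in the 12-byte fixed RTP header. The sketch below packs and unpacks that header and then reorders packets the way a jitter buffer would. The helper names, the SSRC value, and the payloads are illustrative assumptions; the code also ignores sequence-number wraparound and playout timing, which a real jitter buffer has to handle.

```python
import struct

RTP_HEADER = struct.Struct("!BBHII")  # 12-byte fixed header, network byte order

def pack_rtp(seq, timestamp, ssrc, payload, payload_type=0):
    """Build an RTP packet: version 2, no padding, extension, CSRCs, or marker."""
    first_byte = 2 << 6                  # version field occupies the top two bits
    second_byte = payload_type & 0x7F    # marker bit left clear
    return RTP_HEADER.pack(first_byte, second_byte, seq, timestamp, ssrc) + payload

def unpack_rtp(packet):
    """Return (sequence number, timestamp, payload) from an RTP packet."""
    _, _, seq, timestamp, _ = RTP_HEADER.unpack_from(packet)
    return seq, timestamp, packet[RTP_HEADER.size:]

# Three 20 ms frames; at 8,000 samples/s the timestamp advances by 160 per frame.
packets = [pack_rtp(seq, seq * 160, 0x1234, b"frame%d" % seq) for seq in (1, 2, 3)]
arrived = [packets[1], packets[2], packets[0]]   # simulated out-of-order arrival

# A jitter buffer holds packets briefly, then releases them in sequence order.
playout = sorted((unpack_rtp(p) for p in arrived), key=lambda item: item[0])
print([payload for _, _, payload in playout])
```

Because UDP itself gives no ordering guarantee, the receiver depends entirely on these two header fields to restore the original order and spacing.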

The VoIP architecture consists of a Media Gateway Controller (MGC) that supervises calls and services from end to end. The MGC controls Media Gateways (MGs), which connect the PSTN to the IP network (IP network - Media Gateways - PSTN). These gateways create, modify and destroy connections as instructed by the MGC. The controllers and gateways interact over a control plane via the MEGACO protocol (RFC 3525), the successor to the Media Gateway Control Protocol (MGCP). The media gateway controllers interact with their peers using the Session Initiation Protocol (SIP), a text-based messaging protocol whose roots are in the Hypertext Transfer Protocol (HTTP). (Another protocol, H.323, can be used instead of SIP, but it is not discussed further in this paper.) SIP is used to initiate communication sessions between users (its messages are session-specific). The media gateways first receive the initial DTMF signals from the user and convert them to SIP messages that the IP-based application servers can understand. They then convert the voice payload that follows into RTP packets to be used by the media processors.
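
Because SIP is text-based and modeled on HTTP, a request can be assembled and read as plain lines of text. The sketch below builds a minimal INVITE and parses its first line. The addresses, tag, Call-ID, and branch values are made-up placeholders, the helper names are hypothetical, and a real INVITE would normally also carry a session description in its body.

```python
def build_invite(caller, callee, call_id, branch):
    """Assemble a minimal SIP INVITE request with no message body."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP client.example.com;branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag=1928301774",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; a blank line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"

def parse_start_line(message):
    """Split the request line into method, request URI, and protocol version."""
    method, uri, version = message.split("\r\n", 1)[0].split(" ")
    return method, uri, version

invite = build_invite("alice@example.com", "bob@example.com",
                      call_id="a84b4c76e66710", branch="z9hG4bK776asdhds")
print(invite)
print(parse_start_line(invite))
```

The layout (a start line, header fields, a blank line, then an optional body) mirrors HTTP, which is why SIP is comparatively easy to inspect and debug.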

Private Branch Exchanges (PBXs):

A Private Branch Exchange (PBX) is similar in concept to a network Autonomous System (AS). PBXs are generally purchased and maintained by large organizations. An organization's PBX is a system that provides communications to the employees of that business. The business has its own phone number prefix, and with the PBX its employees can be reached simply by an extension within the organization. Calls within the system are kept internal, and external calls are routed through gateway interfaces. The PBX provides basic features, including the ability to transfer a call and the ability for the operator to assign extensions to employees, and can also support more advanced features like conferencing, music on hold, announcements, voice mail and interactive voice response (IVR).
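
The internal-versus-external routing decision described above can be sketched as a lookup. Everything here is hypothetical: the extension table, the convention of dialing 9 for an outside line, and the `route_call` function are illustrations, not features of any particular PBX.

```python
# Hypothetical extension table for one organization.
EXTENSIONS = {"101": "Reception", "102": "Engineering", "103": "Sales"}
OUTSIDE_LINE_PREFIX = "9"   # assumed convention: dial 9 for an outside line

def route_call(dialed):
    """Decide whether a dialed string stays inside the PBX or leaves via a gateway."""
    if dialed in EXTENSIONS:
        return ("internal", EXTENSIONS[dialed])
    if dialed.startswith(OUTSIDE_LINE_PREFIX) and len(dialed) > 1:
        return ("gateway", dialed[1:])   # strip the prefix before handing off
    return ("rejected", dialed)

print(route_call("102"))
print(route_call("95551234"))
```

Internal calls never touch the public network, which is the main cost and privacy benefit of running a PBX.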

Traditional PBXs contain the following components:

  1. Control Processor - to run the software that operates the system features

  2. Communication Software - runs on the control processor; drives all system components and determines the function that it provides

  3. Endpoints - phones, fax machines, modems, PDAs, etc.; used to access the features and functions of the system

  4. Modules - shelves; house the interface cards that provide endpoints or gateway interfaces

  5. Inter-Module Switching - allows interconnection of ports in different modules (traditionally via circuit-switching)

  6. Voice Network - creates a voice path between devices

  7. Control Network - so that components can communicate with each other to implement system operations

New PBXs, called IP-PBXs, are different from traditional PBXs in that (1) they distribute their components farther apart, (2) they incorporate more off-the-shelf components, and (3) they use the IP network to transmit both control information and voice.

New IP-PBXs contain the following components:

  1. Control Processor - off-the-shelf server that runs application software on Microsoft Windows, UNIX, Linux, etc.

  2. Endpoints - IP phones; each obtains its own IP address via DHCP, uses FTP to update firmware, and may be placed on a special VLAN; the phone obtains its IP address and the address of the application server and then "registers" itself

  3. Gateway Interfaces - needed to access the PSTN

  4. Switching Functions - needed to limit the number of simultaneous calls, because bandwidth is a limited resource and voice quality degrades when it is oversubscribed

  5. Media Processing - may be provided

  6. System Architecture - client-server architecture, where the application server is the server and the gateways and endpoints (phones, fax machines, modems, etc.) are the clients

  7. IP Network - used to transmit both control information and voice; provides reliability by providing multiple paths from device to device
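
The call-limiting role of the switching functions in item 4 amounts to call admission control. The sketch below is one simple way to model it; the class name, the 512 Kbps link, and the 80 Kbps per-call budget are assumptions chosen for illustration.

```python
class CallAdmissionControl:
    """Admit calls only while the link has bandwidth left for them."""

    def __init__(self, link_kbps, per_call_kbps):
        self.max_calls = link_kbps // per_call_kbps
        self.active_calls = 0

    def try_admit(self):
        """Return True and count the call if capacity remains, else False."""
        if self.active_calls >= self.max_calls:
            return False
        self.active_calls += 1
        return True

    def release(self):
        """Free the capacity held by a finished call."""
        if self.active_calls > 0:
            self.active_calls -= 1

# Hypothetical numbers: a 512 Kbps link and 80 Kbps budgeted per call.
cac = CallAdmissionControl(link_kbps=512, per_call_kbps=80)
results = [cac.try_admit() for _ in range(8)]
print(results.count(True), "admitted,", results.count(False), "refused")
```

Refusing the seventh call outright protects the quality of the six already in progress, rather than letting all seven degrade together.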

VoIP Security:

Voice over IP is not easy to secure. Its flexible, open and distributed design with no central entity makes security difficult, much like securing the Internet itself. Interconnecting the VoIP network with the PSTN adds further complexity. End users who misconfigure their services can introduce security vulnerabilities as well. Attacks on VoIP can originate anywhere: in the underlying network, the transport protocols, the VoIP devices, the VoIP application, other applications (DHCP, etc.), the underlying operating systems, and more.

Security concerns include:

  • Confidentiality - much harder to "tap" an Internet conversation by physically gaining access to the line, because with VoIP, voice can flow over many different physical networks

    • Implement end-to-end security (strong encryption)

  • Integrity - harder to spoof or modify voice because of voice recognition

    • Use public/private key encryption, but it is difficult to be sure that the person holding a certificate is who they claim to be

  • Availability - denial of service (DoS) and distributed denial of service (DDoS) attacks are a significant threat

    • It is often not possible to identify the instigator of a DoS attack on the Internet

  • Authentication/Authorization - authentication (who you are), authorization (capabilities/access control)

    • Need to authenticate the 'person', not the 'device'; authenticate using username/password, biometrics, smart cards, keys (not IP or MAC address, etc.); helps address VoIP SPAM, spyware

  • Secure Gateways - the gateways between VoIP and PSTN need to be secure or else the closed and isolated PSTN may be threatened
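
Username/password authentication, as suggested in the list above, does not require sending the password itself across the network. One mechanism available to SIP is HTTP Digest Authentication, in which the client returns a hash of the credentials combined with a server-supplied nonce. The sketch below implements the basic form from RFC 2617 without the optional quality-of-protection fields; the user name, realm, password, and nonce are made-up values, and MD5 is used only because that scheme specifies it.

```python
import hashlib

def md5_hex(text):
    """Hex-encoded MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username, realm, password, method, uri, nonce):
    """Compute the basic (no-qop) HTTP Digest response from RFC 2617."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # who is asking
    ha2 = md5_hex(f"{method}:{uri}")                  # what is being asked for
    return md5_hex(f"{ha1}:{nonce}:{ha2}")            # bound to the server's nonce

# Hypothetical credentials and challenge values.
response = digest_response("alice", "example.com", "secret",
                           "REGISTER", "sip:example.com", nonce="abc123")
print(response)
```

The server performs the same computation with its stored credentials and compares the results. Because the nonce changes from challenge to challenge, a captured response cannot simply be replayed against a fresh one.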

Design and implementation concerns include:

  • Software stability, robustness, interoperability (operate between vendors)

  • Codecs - voice encoding algorithms are used in VoIP

    • G.711 (64 Kbps encoding), G.729 (8 Kbps encoding)

  • VoIP implementations separate voice transport (RTP), signaling (SIP) and service creation from one another

  • SIP relies on other protocols for security:

    1. IPSec (IP Security)
    2. TLS (Transport Layer Security)
    3. S/MIME (Secure/Multipurpose Internet Mail Extensions)
    4. HTTP Digest Authentication

  • SIP does not provide encryption for the media (RTP) stream

  • Users have more control over their voice service with VoIP, unlike with the PSTN
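
The codec rates listed above (64 Kbps for G.711, 8 Kbps for G.729) count only the encoded voice. Each packet also carries IP, UDP, and RTP headers, so the bandwidth a call actually uses is higher. The sketch below estimates the one-way rate at the IP layer, assuming 20 ms of voice per packet and IPv4 headers without options; both are common choices but are assumptions here, and link-layer framing is not included.

```python
HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP headers, no options or extensions

def per_call_kbps(codec_kbps, packet_ms=20):
    """One-way IP-layer bandwidth for a voice stream, excluding link-layer framing."""
    payload_bytes = codec_kbps * packet_ms / 8    # Kbps x ms = bits; /8 gives bytes
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_second / 1000

for name, rate in (("G.711", 64), ("G.729", 8)):
    print(f"{name}: {rate} Kbps codec -> {per_call_kbps(rate):.0f} Kbps at the IP layer")
```

Under these assumptions G.711 needs about 80 Kbps and G.729 about 24 Kbps, which shows that header overhead dominates for low-bit-rate codecs.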

The content of this paper was inspired by the articles in the September 2004 issue of ACM Queue Magazine entitled "Riding the VoIP Wave".