Voice over Internet Protocol (VoIP)


What is VoIP?


VoIP stands for 'V'oice 'o'ver 'I'nternet 'P'rotocol. As the name says, VoIP
carries voice (mainly human) in IP packets and, ultimately, across the
Internet. VoIP can use accelerating hardware for this purpose, and it can
also be used in a PC environment.

How does it work?

Many years ago we discovered that a signal could also be sent to a remote
destination in digital form: before sending it we digitize it with an ADC
(analog-to-digital converter), transmit it, and at the end transform it back
into analog format with a DAC (digital-to-analog converter) to use it.

VoIP works like that: it digitizes voice into data packets, sends them, and
converts them back into voice at the destination.

Digital format can be controlled better: we can compress it, route it,
convert it to a new, better format, and so on. A digital signal is also more
noise tolerant than an analog one (see GSM vs. TACS).

TCP/IP networks are made of IP packets containing a header (to control
communication) and a payload to transport data: VoIP uses them to go across
the network and reach the destination.

Voice (source) - - ADC - - - - Internet - - - DAC - - Voice (dest)
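The chain above can be tried out in a few lines of Python. This is only a sketch under assumptions of ours: a 440 Hz tone stands in for the voice, samples are quantized to 8 bits at 8000 Hz, and each "packet" is just 160 samples (20 ms) with no network in between. The function names are made up for the example.

```python
import math

SAMPLE_RATE = 8000      # samples per second (assumed, telephone quality)
PACKET_SAMPLES = 160    # 20 ms of audio per packet (assumed)

def adc(signal, bits=8):
    """Quantize floats in [-1.0, 1.0] to unsigned integers of `bits` bits."""
    levels = (1 << bits) - 1
    return [round((s + 1.0) / 2.0 * levels) for s in signal]

def dac(samples, bits=8):
    """Map quantized integers back to floats in [-1.0, 1.0]."""
    levels = (1 << bits) - 1
    return [q / levels * 2.0 - 1.0 for q in samples]

def packetize(samples, size=PACKET_SAMPLES):
    """Split the sample stream into fixed-size payloads."""
    return [bytes(samples[i:i + size]) for i in range(0, len(samples), size)]

def depacketize(packets):
    """Concatenate the payloads back into one sample stream."""
    return [b for p in packets for b in p]

# 100 ms of a 440 Hz tone stands in for the voice source.
voice = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(800)]
packets = packetize(adc(voice))
received = dac(depacketize(packets))
max_error = max(abs(a - b) for a, b in zip(voice, received))
print(len(packets), "packets, max error", round(max_error, 4))
```

The reconstructed signal differs from the original by at most half a quantization step, which is the price paid for going digital.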

What are the advantages of using VoIP rather than PSTN?

When you use a PSTN line, you typically pay the company managing the line
for the time used: the longer you stay on the phone, the more you pay. In
addition, you can't talk with more than one person at a time.

In contrast, with VoIP you can talk all the time with anyone you want (the
only requirement is that the other person is also connected to the Internet
at the same time), for as long as you want (regardless of cost) and, in
addition, you can talk with many people at the same time.

If you're still not persuaded, consider that at the same time you can
exchange data with the people you are talking with, sending images, graphs
and videos.

Then, why doesn't everybody use it yet?

Unfortunately we have to report some problems with the integration between
the VoIP architecture and the Internet. As you can easily imagine, voice
communication must be a real-time stream (you couldn't speak, wait for many
seconds, then hear the other side answering): this is in contrast with the
heterogeneous architecture of the Internet, where a path can cross many
routers (machines that route packets), about 20-30 or more, and can have a
very high round trip time (RTT), so we need to modify something to get it
working properly.

In the next sections we'll try to understand how to solve this great problem.
In general, it is very difficult to guarantee bandwidth on the Internet for a
VoIP application.

Technical info about VoIP

Here we see some important info about VoIP, needed to understand it.


Overview of a VoIP connection

To set up a VoIP communication we need:

1. First, an ADC to convert analog voice to digital signals (bits).

2. Now the bits have to be compressed into a good format for transmission:
there are a number of protocols we'll see later.

3. Here we have to insert our voice packets into data packets using a
real-time protocol (typically RTP over UDP over IP).

4. We need a signaling protocol to call users: ITU-T H323 does that.

5. At the receiver (RX) we have to disassemble the packets, extract the
data, then convert it to analog voice signals and send them to the sound
card (or phone).

6. All of that must be done in real time, because we cannot wait too long
for a vocal answer!

Base architecture

Voice )) ADC - Compression Algorithm - Assembling RTP in TCP/IP -






Voice (( DAC - Decompress. Algorithm - Disass. RTP from TCP/IP -


Analog to Digital Conversion

This is done by hardware, typically by an ADC integrated on the sound card.

Today every sound card can convert a 22050 Hz band with 16 bits (to sample
it you need a sampling frequency of 44100 Hz, by the Nyquist principle),
giving a throughput of 2 bytes * 44100 (samples per second) = 88200 bytes/s,
or 176.4 kbytes/s for a stereo stream.
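The arithmetic above is easy to check. The helper below is our own (the name pcm_throughput is made up for this sketch); it simply multiplies sampling rate, sample size and channel count.

```python
def pcm_throughput(sample_rate_hz, bits_per_sample, channels=1):
    """Return the raw PCM throughput in bytes per second."""
    return sample_rate_hz * bits_per_sample // 8 * channels

bandwidth_hz = 22050
sample_rate = 2 * bandwidth_hz   # Nyquist: sample at twice the bandwidth

print("mono  :", pcm_throughput(sample_rate, 16), "bytes/s")
print("stereo:", pcm_throughput(sample_rate, 16, channels=2), "bytes/s")
```

The same helper gives 8000 bytes/s (64 kbit/s) for 8-bit samples at 8000 Hz, which is the telephone-quality case.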

For VoIP we don't need a 22 kHz bandwidth (nor do we need 16 bits!): next
we'll see other codings used for it.

Compression Algorithms

Now that we have digital data, we may convert it to a standard format that
can be transmitted quickly.

PCM, Pulse Code Modulation, Standard ITU-T G.711

Voice bandwidth is 4 kHz, so the sampling frequency has to be 8 kHz (by the
Nyquist principle).

We represent each sample with 8 bits (giving 256 possible values).

The throughput is 8000 Hz * 8 bits = 64 kbit/s, the same as a typical digital
phone line.

In real applications the mu-law (North America) and A-law (Europe) variants
are used. They code the analog signal on a logarithmic scale, so that 8 bits
cover the dynamic range that would take 14 (mu-law) or 13 (A-law) bits with
linear coding (see Standard ITU-T G.711).
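To see why a logarithmic scale helps, here is a small sketch of mu-law companding. It uses the continuous textbook formula with mu = 255, not the bit-exact segmented tables of G.711, and all function names are ours. It compares the error on a quiet sample when 8 bits are used linearly and when they are used after companding.

```python
import math

MU = 255  # companding constant of the mu-law variant

def mulaw_compress(x):
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def mulaw_encode(x):
    """Compress, then quantize to one of 256 codes (8 bits)."""
    return round((mulaw_compress(x) + 1) / 2 * 255)

def mulaw_decode(code):
    """Turn an 8-bit code back into a linear sample."""
    return mulaw_expand(code / 255 * 2 - 1)

def linear_roundtrip(x):
    """Quantize to 8 bits on a linear scale and convert back."""
    return round((x + 1) / 2 * 255) / 255 * 2 - 1

quiet = 0.01  # a low-level sample, where linear 8-bit coding is coarse
print("linear error:", abs(linear_roundtrip(quiet) - quiet))
print("mu-law error:", abs(mulaw_decode(mulaw_encode(quiet)) - quiet))
```

The logarithmic scale spends more of the 256 codes on small amplitudes, where most speech energy lies, so quiet samples are reproduced more accurately.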

ADPCM, Adaptive Differential PCM, Standard ITU-T G.726

It encodes only the difference between the current and the previous voice
sample, requiring 32 kbps (see Standard ITU-T G.726).
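The idea of coding differences can be sketched as follows. This is plain differential PCM with a fixed step size, a simplification of ours: the real G.726 also adapts the step size and the predictor. Each difference is stored as a 4-bit code, and 4 bits * 8000 samples/s gives the 32 kbps figure.

```python
def dpcm_encode(samples, step=4):
    """Encode each sample as a 4-bit difference from the decoder's prediction."""
    codes, predicted = [], 0
    for s in samples:
        # Quantize the difference and clamp it to the 4-bit range -8..7.
        code = max(-8, min(7, round((s - predicted) / step)))
        codes.append(code)
        predicted += code * step   # mirror what the decoder will compute
    return codes

def dpcm_decode(codes, step=4):
    """Rebuild the samples by accumulating the decoded differences."""
    out, predicted = [], 0
    for code in codes:
        predicted += code * step
        out.append(predicted)
    return out

samples = [0, 3, 9, 18, 30, 41, 47, 50, 48, 40]
codes = dpcm_encode(samples)
decoded = dpcm_decode(codes)
print(codes)
print(decoded)
```

Note that the encoder tracks the value the decoder will reconstruct, not the true previous sample, so quantization errors don't accumulate.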

LD-CELP, Standard ITU-T G.728

CS-ACELP, Standard ITU-T G.729 and G.729a

MP-MLQ, Standard ITU-T G.723.1, 6.3kbps, True speech

ACELP, Standard ITU-T G.723.1, 5.3kbps, True speech

LPC-10, able to reach 2.5 kbps!!



These last protocols are the most important because they can guarantee a very
low minimum bandwidth using source coding; the G.723.1 codecs also have a
very high MOS (Mean Opinion Score, used to measure voice fidelity), but pay
attention to the processing performance they require, up to 26 MIPS!

RTP Real Time Transport Protocol

Now we have the raw data and we want to encapsulate it into the TCP/IP stack.

We follow this structure:

VoIP data packets

RTP

UDP

IP

I, II layers

VoIP data packets live in RTP (Real-Time Transport Protocol) packets, which
are inside UDP/IP packets.

First, VoIP doesn't use TCP because it is too heavy for real-time
applications, so UDP (datagram) is used instead.

With UDP alone we cannot put packets back in order on arrival (which is a
must in VoIP), because there is no notion of a connection: each packet is
independent of the others (datagram concept). So we have to introduce a new
protocol, such as RTP, able to manage this.
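What "managing this" means on the receiving side can be sketched with a small reordering buffer. The class below is our own illustration, not code from any RTP library: it keeps early packets aside and releases payloads strictly in sequence-number order. For brevity it ignores the wrap-around of the 16-bit RTP sequence number and never gives up on a lost packet.

```python
import heapq

class ReorderBuffer:
    """Hold out-of-order packets and release them in sequence-number order."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq
        self.heap = []

    def push(self, seq, payload):
        """Store a packet; return the payloads that are now deliverable."""
        if seq >= self.next_seq:          # ignore packets already played out
            heapq.heappush(self.heap, (seq, payload))
        ready = []
        while self.heap and self.heap[0][0] <= self.next_seq:
            s, p = heapq.heappop(self.heap)
            if s == self.next_seq:        # anything older is a duplicate
                ready.append(p)
                self.next_seq += 1
        return ready

buf = ReorderBuffer()
for seq, payload in [(0, "a"), (2, "c"), (1, "b"), (3, "d")]:
    print(seq, "->", buf.push(seq, payload))
```

Packet 2 arrives early and is held back until packet 1 shows up; then both are delivered together.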

The following figure shows the structure of the RTP header used in VoIP.

Real Time Transport Protocol

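 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+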



V indicates the version of RTP used

P indicates padding: extra unused bytes at the end of the packet, added to
reach the required packet size

X indicates the presence of a header extension

CC field is the number of CSRC identifiers following the fixed header.

CSRC fields are used, for example, in a conference.

M is a marker bit

PT is the payload type
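The fixed header described above is 12 bytes long and can be built and parsed with Python's standard struct module. The sketch below is ours (the function names are made up); it follows the field layout of the figure and uses payload type 0, which is the G.711 mu-law type in the standard RTP audio profile.

```python
import struct

RTP_VERSION = 2

def pack_rtp_header(seq, timestamp, ssrc, payload_type,
                    marker=0, padding=0, extension=0, csrcs=()):
    """Build an RTP header: 12 fixed bytes plus 4 bytes per CSRC."""
    byte0 = (RTP_VERSION << 6) | (padding << 5) | (extension << 4) | len(csrcs)
    byte1 = (marker << 7) | payload_type
    # "!" selects network byte order: two bytes, a 16-bit and two 32-bit fields.
    header = struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)

def unpack_rtp_header(data):
    """Parse the header fields out of the first bytes of an RTP packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    cc = byte0 & 0x0F
    csrcs = struct.unpack("!%dI" % cc, data[12:12 + 4 * cc])
    return {
        "version": byte0 >> 6,
        "padding": (byte0 >> 5) & 1,
        "extension": (byte0 >> 4) & 1,
        "cc": cc,
        "marker": byte1 >> 7,
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": list(csrcs),
    }

header = pack_rtp_header(seq=1, timestamp=160, ssrc=0x1234, payload_type=0)
print(len(header), unpack_rtp_header(header))
```

A sender increments seq by one for every packet and advances timestamp by the number of samples carried, which is what lets the receiver reorder packets and play them at the right time.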


There are also other protocols used in VoIP, like RSVP, that can manage
Quality of Service (QoS).

RSVP is a signaling protocol that requests a certain amount of bandwidth and
latency at every network hop that supports it.


Quality of Service (QoS)


We have said many times that VoIP applications require real-time data
streaming, because we expect an interactive voice exchange.

Unfortunately, TCP/IP cannot guarantee this: it just makes a 'best effort'
to do it. So we need to introduce tricks and policies that can manage the
packet flow in EVERY router we cross.

So here they are:

1. TOS field in the IP header, to describe the type of service: its
precedence bits rank packets by urgency, with higher precedence values
marking more urgent (real-time) traffic.

2. Packet queuing methods:

    1. FIFO (First In First Out), the most naive method, which passes
    packets in arrival order.

    2. WFQ (Weighted Fair Queuing), which passes packets fairly (for
    example, FTP cannot consume all the available bandwidth) depending on
    the kind of data flow, typically one packet for UDP and one for TCP in
    a fair fashion.

    3. CQ (Custom Queuing), where users can decide the priority.

    4. PQ (Priority Queuing), where there are a number of queues (typically
    4), each with its own priority level: first, packets in the first queue
    are sent; then (when the first queue is empty) sending starts from the
    second one, and so on.

    5. CB-WFQ (Class Based Weighted Fair Queuing), like WFQ but, in
    addition, we have the concept of classes (up to 64) and a bandwidth
    value associated with each one.

3. Shaping capability, which allows limiting the source to a fixed bandwidth.

4. Congestion avoidance, such as RED (Random Early Detection).
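Of the queuing methods listed above, Priority Queuing is the easiest to picture in code. The class below is a toy model of ours with 4 queues, not a router implementation: a packet is always taken from the highest-priority queue that isn't empty.

```python
from collections import deque

class PriorityQueuing:
    """Serve queue 0 first; move to the next queue only when it is empty."""

    def __init__(self, levels=4):
        self.queues = [deque() for _ in range(levels)]

    def enqueue(self, packet, level):
        """Put a packet in the queue for its priority level (0 = most urgent)."""
        self.queues[level].append(packet)

    def dequeue(self):
        """Return the next packet to send, or None if all queues are empty."""
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None

pq = PriorityQueuing()
pq.enqueue("ftp-1", level=3)
pq.enqueue("voice-1", level=0)
pq.enqueue("ftp-2", level=3)
pq.enqueue("voice-2", level=0)
print([pq.dequeue() for _ in range(4)])
```

The voice packets leave first even though an FTP packet arrived before them. The drawback is also visible: as long as voice keeps arriving, the FTP queue is never served, which is what the fair queuing variants try to avoid.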

H323 Signaling Protocol

H323 protocol is used, for example, by Microsoft Netmeeting to make VoIP
calls.

This protocol allows a variety of elements to talk to each other:

1. Terminals, clients that initiate a VoIP connection. Although terminals
could talk to each other without anything else, we need some additional
elements for a scalable setup.

2. Gatekeepers, which essentially provide:

    1. Address translation service, to use names instead of IP addresses

    2. Admission control, to allow or deny some hosts or some users

    3. Bandwidth management

3. Gateways, points of reference for TCP/IP - PSTN conversion.

4. Multipoint Control Units (MCUs), to provide conferences.

5. Proxy servers are also used.

H323 allows not only VoIP but also video and data communications.

Concerning VoIP, H323 can carry the audio codecs G.711, G.722, G.723, G.728
and G.729, while for video it supports H261 and H263.

You can find it implemented in various application software such as
Microsoft Netmeeting, Net2Phone, DialPad, and also in freeware products you
can find at the OpenH323 web site.



Hardware requirement

To create a little VoIP system you need the following hardware:

1. A 386 PC or better

2. A sound card, full duplex capable

3. A network card, an Internet connection, or another kind of interface to
allow communication between 2 PCs

All of that has to be present twice to simulate a standard communication.

The tools above are the minimal requirements for a VoIP connection: next
we'll see that we should (and on the Internet we must) use more hardware to
do the same in a real situation.

The sound card has to be full duplex, otherwise we couldn't hear anything
while speaking.


Hardware accelerating cards

We can use special cards with hardware acceleration capability. Two of them
(and also the only ones directly supported by the Linux kernel at this
moment) are the

1. Quicknet PhoneJack

2. Quicknet LineJack

Quicknet PhoneJack is a sound card that can use standard algorithms such as
G.723.1 to compress the audio stream.

It can be connected directly to a phone (POTS port) or to a microphone and
speaker pair.

It has an ISA or PCI bus connector.

Quicknet LineJack works like PhoneJack with some additional features (see
below).

For more information see the Quicknet web site.

Hardware gateway cards

Quicknet LineJack can be connected to a PSTN line, allowing VoIP gateway
features.

Then you'll need software to manage it (see later).

Software requirement

We can choose which O.S. to use:

1. Win9x

2. Linux



Under Win9x we have Microsoft Netmeeting, Internet Phone, DialPad or others,
or Internet Switchboard (from the Quicknet web site) for Quicknet cards.

You can also use free software that you download from OpenH323.

Under Linux we only have free software from the OpenH323 web site: simph323
or ohphone, which can also work with Quicknet accelerating hardware.

Attention: all OpenH323 source code has to be compiled in a user directory
(otherwise it is necessary to change some environment variables). Be warned
that compile time can be very long and you may need a lot of RAM to build it
in a decent time.

Gateway software

To manage gateway features (joining TCP/IP VoIP to PSTN lines) you need some
kind of software like this:

Internet SwitchBoard for Windows systems, also acting as an H323 terminal;

PSTNGw for Linux and Windows systems, which you download from OpenH323.

Gatekeeper software

You can choose as gatekeeper:

1. Opengatekeeper, which you can download from the opengatekeeper web site,
for Linux and Win9x.

Other software

In addition, I report some useful H323-compliant software:

Phonepatch, able to solve problems behind a NAT firewall. It simply allows
users (external or internal) to call from a web page (which is reachable by
both external and internal users): when the web application understands that
the remote host is ready, it calls (via H323) the source, telling it that
all is OK and communication can be established.


In this section we try to set up a VoIP system, simple at first, then more
and more complex.

Simple communication: IP to IP

A (Win9x+Sound card) - - - B (Win9x+Sound card)

A calls B.

A and B should:

1. have Microsoft Netmeeting (or other software) installed and properly
configured;

2. have a network card or another kind of TCP/IP interface to talk to each
other.

In this setup A can make an H323 call to B (if B has Netmeeting active)
using B's IP address. Then B can answer it if it wants. After the call is
accepted, VoIP data packets start to flow.

Using names

If you use Microsoft Windows in a LAN you can call the other side using its
NetBIOS name. NetBIOS is a protocol that can work on top of the NetBEUI
low-level protocol and also on top of TCP/IP. You only need to call the
'computer name' of the other side to make a connection.

A - - - B


John - - - Alice

John calls Alice.


This is possible because John's call request to Alice is converted into an
IP call by the NetBIOS protocol.

The above 2 examples are very easy to implement but aren't scalable.

In a bigger setting such as the Internet it is impossible to use direct
calling because, usually, the callers don't know the destination IP address.
Furthermore, the NetBIOS naming feature cannot work because it uses
broadcast messages, which typically don't pass ISP routers.

Internet calling using a WINS server

The NetBIOS name calling idea can also be implemented in an Internet
environment, using a WINS server: NetBIOS clients can be configured to use a
WINS server to resolve names.

PCs using the same WINS server will be able to make direct calls between
them.


A (WINS Server is S) - - - -            - - - - B (WINS Server is S)

                              Internet  - - - - S (WINS Server)

C (WINS Server is S) - - - -            - - - - D (WINS Server is S)


Internet communication

A, B, C and D are in different subnets, but they can call each other in a
NetBIOS name calling fashion. The requirement is that all of them use S as
their WINS server.

Note: a WINS server doesn't have very high performance because it uses
NetBIOS features, and should only be used for joining a few subnets.

A big problem: masquerading.

The problem of having few IP addresses is commonly solved using so-called
masquerading (also NAT, network address translation): there is only 1 public
IP address (that the Internet can directly 'see'), and the other machines
are 'masqueraded', all using this IP.

A - - -

B - - - Router with NAT - - - Internet

C - - -

This doesn't work

In the example A, B and C can browse, ping, and use mail and news services
with people on the Internet, but they CANNOT make a VoIP call. This is
because the H323 protocol sends the IP address at the application level, so
the answer will never arrive at the source (which is using a private IP
address).
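The failure can be reproduced with a toy model. Everything here is made up for illustration: packets are plain dictionaries, the 'reply_to' field stands for the address that H323 carries at the application level, and the addresses are example values. Plain masquerading rewrites only the IP header, while a translator that understands the protocol also fixes the address inside the payload.

```python
PUBLIC_IP = "203.0.113.5"     # the router's public address (example value)
PRIVATE_IP = "192.168.1.10"   # host A behind the NAT (example value)

def h323_setup(src_ip):
    """A simplified call setup: the caller's address also travels in the payload."""
    return {"src": src_ip, "payload": {"reply_to": src_ip}}

def nat_outbound(packet, public_ip=PUBLIC_IP):
    """Plain masquerading: rewrite the IP header only, leave the payload alone."""
    return {**packet, "src": public_ip}

def nat_outbound_h323_aware(packet, public_ip=PUBLIC_IP):
    """A protocol-aware translator: rewrite the embedded address as well."""
    payload = {**packet["payload"], "reply_to": public_ip}
    return {"src": public_ip, "payload": payload}

def reply_destination(packet):
    """The callee answers to the address found in the payload, not the header."""
    return packet["payload"]["reply_to"]

setup = h323_setup(PRIVATE_IP)
print("plain NAT :", reply_destination(nat_outbound(setup)))
print("H323-aware:", reply_destination(nat_outbound_h323_aware(setup)))
```

With plain NAT the callee tries to answer to the private address, which is not routable from the Internet, so the call never completes.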




There is a Linux module that modifies H323 packets, avoiding this problem.
You can download the module here. To install it you have to copy it into the
specified source directory, modify the Makefile, then compile and install
the module with 'modprobe ip_masq_h323'. Unfortunately this module cannot
work with the ohphone software at this moment (I don't know why).

A - - - Router with NAT

B - - - + - - - Internet

C - - - ip_masq_h323 module

This works

A - - -

B - - - PhonePatch - - - Internet

C - - -

This works

Using Linux

With Linux (as an H323 terminal) you can experiment with everything done
above.


Ohphone Syntax

Syntax is:

'ohphone -l|--listen [options]'

'ohphone [options]... address'

'-l', listen on the standard port (1720)

'address', means that we don't wait for a call, but connect to the
'address' host

'-n', '--no-gatekeeper', this is OK if we don't have a gatekeeper

'-q num', '--quicknet num', uses Quicknet card device 'num'

'-s device', '--sound device', uses the /dev/device sound device.

'-j delay', '--jitter delay', changes the jitter buffer delay to 'delay'.

Also, when you start ohphone, you can give commands to the interpreter
directly (like decrease AEC, Automatic Echo Cancellation).

Setting up a gatekeeper

You can also experiment with the gatekeeper feature.



(Terminal H323) A - - -

(Terminal H323) B - - - D (Gatekeeper)


(Terminal H323) C - - -

Gatekeeper configuration

1. Hosts A, B and C have their gatekeeper setting pointing to D.

2. At start time each host tells D its own address and its own name (also
with aliases), which can be used by a caller to reach it.

3. When a terminal asks D for a host, D answers with the right IP address,
so communication can be established.
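The three steps above boil down to a name table kept by D. The sketch below models only that table; it is not the real H323 registration protocol, and the class, the names and the addresses are all made up for the example.

```python
class Gatekeeper:
    """A toy name-to-address table, filled by terminals at start time."""

    def __init__(self):
        self.table = {}

    def register(self, address, name, aliases=()):
        """Step 2: a terminal announces its address, name and aliases."""
        for n in (name, *aliases):
            self.table[n] = address

    def resolve(self, name):
        """Step 3: answer a lookup with the IP address, or None if unknown."""
        return self.table.get(name)

d = Gatekeeper()
d.register("192.168.1.1", "A", aliases=("john",))
d.register("192.168.1.2", "B", aliases=("alice",))
print(d.resolve("alice"))
print(d.resolve("nobody"))
```

Note that the answer is just an address: the caller still has to reach that address on its own.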

We have to note that the gatekeeper can only resolve names to IP addresses;
it cannot join hosts that can't reach each other (at the IP level). In other
words, it cannot act as a NAT router.

The program only has to be launched with -d (as daemon) or -x (execute).


Setting up a gateway

As we said, a gateway is an entity that can join VoIP to PSTN lines, allowing
us to make calls from the Internet to a classic telephone. So, in addition,
we need a card that can manage PSTN lines: Quicknet LineJack does it.

From the OpenH323 web site we download:

1. the driver for LineJack

2. the PSTNGw application to create our gateway.

If the executable doesn't work you need to download the source code and the
openh323 library, then install it all in a user home directory.

After that you only need to launch PSTNGw to start your H323 gateway.

