Data Over Voice Over GSM

Background Information

I would like to build an out-of-band management (OOBM) module that does not require an Internet connection. This is because I have a remote server deployed in a hostile network environment which is hard to access. The server contains sensitive information. It also requires specialist knowledge to operate. It is not feasible to train a local operator to fix problems when they occur.

It is possible to use 3G mobile broadband as a backup connection. However, I believe that the ISP will not assign a public IPv4 address to the mobile client. (The facility itself does not actually get a public IPv4 address for its broadband connection!) This means that if I did decide to use 3G mobile broadband as a backup connection, I would have to make sure that the link is always on, and I would also need to run some kind of VPN service over it. This makes the whole solution not cost-effective.

A dial-in server [1] is one of the feasible solutions. However, the remote site does not have an exclusive phone line. The existing phone line in the facility is sometimes used for actual phone calls. Therefore, I thought about creating some sort of mobile-based dial-in server.

It is not possible to simply connect a dial-up modem to a mobile phone and expect the modem to work over the phone's voice channel. This is because dial-up modems expect to work on an analogue line, whereas the voice channel on the mobile network is actually a digital channel [2][3].

This is where DTMF modulation comes in. It is a well-known fact that DTMF modulation is preserved in the GSM voice channel. Sometimes a company may ask you to type in numbers when you call them over the phone, e.g. when you have to type in a top-up voucher. The numbers you type in are sent as DTMF tones. You can certainly do that over a mobile phone.

My overall idea is to build a modem that modulates data over the GSM voice channel using DTMF. The baud rate might not be very high, but it should be enough to support terminal operations.

In contrast to a normal modem, our modem is not exactly compatible with the Hayes Command Set [4]. In a standard Hayes modem, the modem can switch between command mode and data mode. Our modem runs in both command mode and data mode simultaneously. I envisage that we open up two serial terminal links to the modem, one for command and control, one for data.

The overall block diagram is shown below:

Transferring data over DTMF

Protocol Design

This modem handles layers 1 and 2 of the OSI model. It is expected that two modems are capable of forming a serial link, and that a PPP connection can be established on top of that serial link.

| Layer       | PDU name | Description |
|-------------|----------|-------------|
| 1           | Symbol   | The sound of DTMF, the sound represents one of the sixteen digits. |
| 2           | Frame    | A frame can represent a byte. |
| 3 and above | N/A      | Hopefully we can establish some kind of PPP connection over the serial link |

DTMF Modulation

DTMF stands for dual-tone multi-frequency. It was actually designed for in-band telecommunication signalling [5]. More specifically, it is used to modulate the numbers you type on a phone. Each number is represented as a combination of two sine waves, one picked from four low frequencies and one picked from four high frequencies. On the keypad, each row corresponds to a low frequency and each column corresponds to a high frequency. This means that each sound segment in a DTMF-modulated signal can represent one of 16 symbols, transmitting 4 bits of information.

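The reference modulation scheme for the telephone keypad is summarised in the table below:

|        | 1209 Hz | 1336 Hz | 1477 Hz | 1633 Hz |
|--------|---------|---------|---------|---------|
| 697 Hz | 1       | 2       | 3       | A       |
| 770 Hz | 4       | 5       | 6       | B       |
| 852 Hz | 7       | 8       | 9       | C       |
| 941 Hz | *       | 0       | #       | D       |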

Since we are not really worried about sending ' * ' and ' # ', we should change the symbol that each dual-tone represents, so that each tone pair maps directly to one hexadecimal digit. The modulation scheme used for this project is the following:

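|        | 1209 Hz | 1336 Hz | 1477 Hz | 1633 Hz |
|--------|---------|---------|---------|---------|
| 697 Hz | 0       | 1       | 2       | 3       |
| 770 Hz | 4       | 5       | 6       | 7       |
| 852 Hz | 8       | 9       | A       | B       |
| 941 Hz | C       | D       | E       | F       |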
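To make the table above concrete, here is a minimal Python sketch of how one nibble could be turned into a dual-tone sound and recognised again. The frequencies come from the table; everything else is my assumption for illustration: the 8 kHz sample rate, the 50 ms symbol length, the function names, and the use of the Goertzel algorithm on the receiving side. It also ignores noise, silence gaps between symbols, and whatever the GSM voice codec does to the signal.

```python
import math

LOW_HZ = (697, 770, 852, 941)        # row frequencies from the table
HIGH_HZ = (1209, 1336, 1477, 1633)   # column frequencies from the table
SAMPLE_RATE = 8000                   # assumption: telephony-style 8 kHz sampling
SYMBOL_SECONDS = 0.05                # assumption: 50 ms per symbol


def nibble_to_freqs(nibble):
    """Map a value 0..15 to its (low, high) frequency pair."""
    if not 0 <= nibble <= 15:
        raise ValueError("nibble must be in 0..15")
    return LOW_HZ[nibble // 4], HIGH_HZ[nibble % 4]


def synthesize(nibble):
    """Return float samples: the sum of the two sine waves for one symbol."""
    low, high = nibble_to_freqs(nibble)
    count = int(SAMPLE_RATE * SYMBOL_SECONDS)
    return [
        0.5 * math.sin(2 * math.pi * low * t / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * high * t / SAMPLE_RATE)
        for t in range(count)
    ]


def goertzel_power(samples, freq):
    """Signal power at a single frequency, using the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect(samples):
    """Pick the strongest row tone and column tone, then rebuild the nibble."""
    row = max(range(4), key=lambda i: goertzel_power(samples, LOW_HZ[i]))
    col = max(range(4), key=lambda i: goertzel_power(samples, HIGH_HZ[i]))
    return row * 4 + col
```

For example, `nibble_to_freqs(0xA)` gives the 852 Hz and 1477 Hz pair, and `detect(synthesize(0xA))` recovers the value 10 from the generated samples. A real receiver would also need a threshold to tell a tone apart from silence or speech, which this sketch leaves out.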

Transferring a byte over DTMF

DTMF can send a nibble (4 bits) of information in each symbol. Therefore, in order to send out a single byte, we need at least two symbols. However, how do we know which is the first symbol and which is the second? For example, suppose a partial sound stream contains the string ABC. How do we know whether AB is a byte, or BC is a byte?

The solution is to put one byte into three symbols, and use a distinct set of symbols for the start of the frame. In my current proposed design, each frame has the following format:

An alternative way of looking at this approach is to look at each symbol individually, and break each symbol into the bits that it is meant to represent:

| Bit   | Description |
|-------|-------------|
| 0     | Start of frame indicator |
| 1 - 3 | Data |

This means that each three-symbol frame can transfer 9 bits, so we can use the 9th bit for parity purposes. Alternatively, we can transfer 9 bytes in one go (eight frames carry 72 bits); however, that might lead to interesting byte-padding scenarios.
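The framing idea can be sketched in a few lines of Python. This is only one possible reading of the design, and several details are my assumptions rather than decisions already made above: I treat "bit 0" as the most significant bit of each symbol, set the start-of-frame flag only on the first symbol of a frame, send the most significant data bits first, and use the 9th bit as an even parity bit. The function names are made up for the example.

```python
def parity_bit(byte):
    """Even parity: 1 when the byte has an odd number of set bits."""
    return bin(byte).count("1") % 2


def encode_byte(byte):
    """Pack one byte plus a parity bit into three 4-bit symbols."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in 0..255")
    word = (byte << 1) | parity_bit(byte)      # 9 payload bits
    chunks = [(word >> 6) & 0x7, (word >> 3) & 0x7, word & 0x7]
    # Assumption: the top bit of a symbol is the start-of-frame flag,
    # and only the first symbol of a frame carries it.
    return [0x8 | chunks[0], chunks[1], chunks[2]]


def decode_stream(symbols):
    """Recover bytes from a symbol stream, resynchronising on start flags."""
    out = []
    i = 0
    while i + 2 < len(symbols):
        first, second, third = symbols[i:i + 3]
        aligned = (first & 0x8) and not (second & 0x8) and not (third & 0x8)
        if not aligned:
            i += 1                 # not at a frame start: slide by one symbol
            continue
        word = ((first & 0x7) << 6) | ((second & 0x7) << 3) | (third & 0x7)
        byte, parity = word >> 1, word & 1
        if parity == parity_bit(byte):
            out.append(byte)       # frames that fail the parity check are dropped
        i += 3
    return bytes(out)
```

With this layout, the two bytes of "Hi" become the symbols A 2 0 B 2 2. If the receiver joins one symbol late and only hears 2 0 B 2 2, it skips the first two symbols because they do not carry the start flag, and still decodes "i" correctly. This answers the AB versus BC question at the cost of three symbols per byte instead of two.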

Hardware building plan

To be continued...