Just a few hours ago a good friend of mine and I took CELT out for a test drive. CELT is a new audio codec
created by the Xiph foundation. Well, actually we were discussing continuations over an XMPP audio chat
and the quality was a bit bad. I mentioned CELT just as a joke, but a few minutes later both of us had somehow
downloaded and compiled the CELT codec.
Basically the idea was the following:
- Get a sound stream from the microphone
- Pipe it through the CELT encoder
- Use netcat to transfer the encoded sound stream over the internet
- Pipe the received sound stream through the CELT decoder
- Play the decoded sound on the other computer
Well, it was just for fun, but it worked extraordinarily well. We grabbed the mic output with the parec
command, and pacat plays back such a recorded stream. So this command basically plays back whatever your mic receives:
parec --latency-msec=1 | pacat --latency-msec=1
The --latency-msec option makes sure the latency of PulseAudio is not too high. If you omit it, PulseAudio will
choose a very high latency (about 2 seconds in our case). Therefore we just set it to a very small value.
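To get a feeling for what these latency values mean in terms of data, here is a small back-of-the-envelope sketch. It assumes the raw stream is 44100 Hz, 16-bit, 2 channels (which I believe are the defaults of parec and pacat); latency_bytes is just a helper name made up for this post.

```shell
# Bytes of raw PCM audio that correspond to a given latency in milliseconds.
# Assumed stream format: 44100 Hz, 16-bit (2 byte) samples, 2 channels.
latency_bytes() {
    ms=$1
    rate=44100
    bytes_per_sample=2
    channels=2
    # Integer arithmetic, so the result is rounded down.
    echo $(( rate * bytes_per_sample * channels * ms / 1000 ))
}

latency_bytes 1      # prints 176
latency_bytes 2000   # prints 352800
```

So under these assumptions a 2 second latency means roughly 350 KB of audio waiting somewhere, while 1 ms is less than 200 bytes.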
Now we put CELT into the game. Fortunately the CELT library from the Xiph foundation already contains a fully functional
encoder and decoder program. celtenc in the tools directory of the library is the encoder, celtdec the decoder. We
both just compiled the library (with ./configure and make) and used the encoder and decoder directly from the tools
directory. With CELT in the game, our local test command looked like this:
parec --latency-msec=1 | ./celtenc --stereo --bitrate 50 - - | \
./celtdec --stereo - - | pacat --latency-msec=1
Now we encode and directly decode the sound stream with CELT. The only thing to watch out for here is the --stereo
parameter of the CELT tools. The PulseAudio tools use 2 channels by default, so the --stereo option is needed to match
that. Alternatively you can use the --channels=1 parameter of parec and pacat to make PulseAudio use just one
channel and then omit the --stereo option.
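The mono variant would look roughly like the sketch below. It only combines the options mentioned above, wrapped in a shell function (mono_loopback is a made-up name) so nothing starts recording until you call it. It assumes you are in the tools directory of the CELT build.

```shell
# Mono variant of the local loopback test.
# Assumes celtenc and celtdec are in the current directory (the tools
# directory of the CELT build), as in the other commands of this post.
mono_loopback() {
    parec --latency-msec=1 --channels=1 \
        | ./celtenc --bitrate 50 - - \
        | ./celtdec - - \
        | pacat --latency-msec=1 --channels=1
}
```

Run mono_loopback to start it and stop it with Ctrl+C.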
Anyway, more important is the bitrate. We played around a bit and 50 seems to be quite OK. Lower bitrates still sounded good
but increased the latency in our case, probably not because of the codec but because of the audio frame size. We tried to
reduce the frame size, but that did not work. Anyway, 50 kbit/s was a value we were happy with.
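For comparison, here is a rough calculation of what the raw stream would cost without CELT. It is only a sketch under assumptions: the raw audio is taken to be 44100 Hz, 16-bit stereo, and 1 kbit is counted as 1000 bits. The function names are made up for this post.

```shell
# Bitrate of the raw PCM stream in kbit/s.
# Assumed format: 44100 Hz, 16 bits per sample, 2 channels.
raw_kbit() {
    echo $(( 44100 * 16 * 2 / 1000 ))
}

# Bytes per second that a given bitrate (in kbit/s) puts on the wire.
bytes_per_second() {
    echo $(( $1 * 1000 / 8 ))
}

# Rough compression ratio for a given bitrate in kbit/s (rounded down).
compression_ratio() {
    echo $(( $(raw_kbit) / $1 ))
}

raw_kbit               # prints 1411
bytes_per_second 50    # prints 6250
compression_ratio 50   # prints 28
```

So at 50 kbit/s the encoded stream is only a few kilobytes per second, which even a modest upstream connection can handle.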
Now the most important part of all: move the data over the internet. Well, thanks to netcat that is a piece of cake. We first
did it in one direction and later on extended it into a real bidirectional voice chat.
For the one who wants to listen:
netcat -l 50123 | \
./celtdec --stereo - - | pacat --latency-msec=1
And for the one who wants to send their mic sound:
parec --latency-msec=1 | ./celtenc --stereo --bitrate 50 - - | \
netcat [target-ip] 50123
Basically we just put a netcat pair between the celtenc and celtdec programs. [target-ip] is the IP address of the
listener's computer. 50123 is just a random port number; you can use whatever port you like. The idea here is that
the listener starts the netcat server and receives the encoded sound stream.
Now, let's make it two-way, so it will actually be useful for something. An easy approach would be to run the same two commands
again, just in the other direction. That is also the best solution we could come up with.
However, it seems a bit wasteful to use two connections when we just send one data stream and receive another one. We
can use one netcat connection for that:
Computer A:
parec --latency-msec=1 | ./celtenc --stereo - - | \
netcat -l 127.0.0.1 50123 | \
./celtdec --stereo - - | pacat --latency-msec=1
Computer B:
parec --latency-msec=1 | ./celtenc --stereo - - | \
netcat 127.0.0.1 50123 | \
./celtdec --stereo - - | pacat --latency-msec=1
Now the commands get a bit more complex. We send one encoded sound stream into netcat and decode and play back
the received sound stream. I couldn't resist drawing a nice little diagram of this setup:
The dashed lines represent CELT-encoded data, while normal lines are raw uncompressed audio. The problem with this setup
is the latency. As soon as computer A runs the command, parec and celtenc will start to record and encode data, even
if no connection has been established yet. The pipe to netcat will buffer this data as long as possible, and when computer B
connects it will first receive all buffered data. For example, if computer B runs the command 2 seconds after computer A, there
will be a 2 second delay in the sound stream from computer A to B (but only in that direction).
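The buffering effect is easy to reproduce without any audio at all. In the toy sketch below, produce stands in for parec and celtenc, and late_consumer stands in for a netcat connection that only shows up a second later. Both names are made up for this demonstration.

```shell
# Stand-in for parec | celtenc: starts writing data immediately.
produce() {
    for i in 1 2 3; do
        echo "chunk $i"
    done
}

# Stand-in for the late connection: starts reading one second later.
late_consumer() {
    sleep 1
    cat
}

# The pipe holds on to the early chunks, so the late reader still
# receives everything from the very beginning, starting with "chunk 1".
produce | late_consumer
```

Nothing gets dropped while the reader is away, which is exactly what you do not want for live audio. The pipe buffer is limited (typically 64 KiB on Linux), but at 50 kbit/s that is still several seconds of sound.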
We stopped at that point. It was working quite well and the sound quality was absolutely marvellous. For something hacked
together in about 4 hours it's quite good. Most of our time went into working around the NAT problems of our routers anyway. Maybe
I'll play around a bit with the library and write some basic UDP transport system. The API looks quite easy to work with. :)