Just a few hours ago a good friend of mine and I took CELT out for a test drive. CELT is a new audio codec created by the Xiph foundation. Well, actually we were discussing continuations over an XMPP audio chat and the quality was a bit bad. I mentioned CELT just as a joke, but a few minutes later both of us had somehow downloaded and compiled the CELT codec.
Basically the idea was the following:
- Get a sound stream from the microphone
- Pipe it through the CELT encoder
- Use netcat to transfer the encoded sound stream over the internet
- Pipe the received sound stream through the CELT decoder
- Play the decoded sound on the other computer
Well, it was just for fun but it worked extraordinarily well. We grabbed the mic output with the parec command, and pacat plays back such a recorded stream. So this command basically plays back whatever your mic receives:
parec --latency-msec=1 | pacat --latency-msec=1
The --latency-msec option makes sure the latency of PulseAudio is not too high. If you omit it, PulseAudio will choose a very high latency (about 2 seconds in our case). Therefore we just set it to a very small value.
Now we put CELT into the game. Fortunately the CELT library from the Xiph foundation already contains a fully functional encoder and decoder program. celtenc in the tools directory of the library is the encoder, celtdec the decoder. We both just compiled the library (with make) and used the encoder and decoder directly from the tools directory. With CELT in the game our local test command looked like this:
parec --latency-msec=1 | ./celtenc --stereo --bitrate 50 - - | \
  ./celtdec --stereo - - | pacat --latency-msec=1
Now we encode and directly decode the sound stream with CELT. The only thing to watch out for here are the parameters of the CELT tools. The PulseAudio tools use 2 channels by default, so the --stereo option is needed to match that. Alternatively you can use the --channels=1 parameter of pacat to make PulseAudio use just one channel and then omit the --stereo option.
Anyway, more important is the bitrate. We played around a bit and 50 seems to be quite OK. Lower bitrates still sounded good but increased the latency in our case, probably not because of the codec but because of the audio frame size. We tried to reduce the frame size but that did not work. Anyway, 50 kbit/s was a value we were happy with.
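To get a feeling for what 50 kbit/s means, here is a small back-of-the-envelope calculation. It is only a sketch: the sample format (44100 Hz, 16 bit, 2 channels) is what I assume parec uses by default, so adjust the numbers if your setup differs.

```shell
#!/bin/sh
# Assumed defaults of parec (not stated in the text above):
# 44100 Hz sample rate, 16-bit samples, 2 channels.
rate_hz=44100
bits_per_sample=16
channels=2

# The bitrate we passed to celtenc, in kbit/s.
celt_kbps=50

# Raw PCM bitrate in kbit/s (integer arithmetic, rounded down).
raw_kbps=$(( rate_hz * bits_per_sample * channels / 1000 ))

# How many times less data goes over the wire (rounded down).
ratio=$(( raw_kbps / celt_kbps ))

echo "raw PCM: ${raw_kbps} kbit/s"
echo "CELT:    ${celt_kbps} kbit/s"
echo "roughly ${ratio}x less data"
```

Under these assumptions the raw stream is about 1411 kbit/s, so the encoded stream is roughly 28 times smaller.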
Now the most important part of all: moving the data over the internet. Well, thanks to netcat that is a piece of cake. We first did it in one direction and later extended it into a real bidirectional voice chat.
For the one who wants to listen:
netcat -l 50123 | \
  ./celtdec --stereo - - | pacat --latency-msec=1
And for the one who likes to send their mic sound:
parec --latency-msec=1 | ./celtenc --stereo --bitrate 50 - - | \
  netcat [target-ip] 50123
Basically we just put a netcat pair between the encoder and the decoder. [target-ip] is the IP address of the listening computer. 50123 is just a random port number; you can use whatever port you like. Another thing to note here is that the listener starts the netcat server and receives the encoded sound stream.
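If you don't want to retype the pipelines, the two one-way commands can be wrapped in small shell functions. This is just a sketch: the function names celt_listen and celt_send and the variables PORT, BITRATE and CELT_TOOLS are made up for this example, and the netcat syntax is the one we used (some netcat variants want -l -p instead of -l).

```shell
#!/bin/sh
# Hypothetical settings, overridable from the environment.
PORT="${PORT:-50123}"            # any free port will do
BITRATE="${BITRATE:-50}"         # value passed to celtenc --bitrate
CELT_TOOLS="${CELT_TOOLS:-.}"    # directory containing celtenc and celtdec

# Listener: wait for a connection, decode the stream and play it.
celt_listen() {
    netcat -l "$PORT" \
        | "$CELT_TOOLS/celtdec" --stereo - - \
        | pacat --latency-msec=1
}

# Sender: record the mic, encode it and send it to the listener.
# $1 is the IP address of the listening computer.
celt_send() {
    parec --latency-msec=1 \
        | "$CELT_TOOLS/celtenc" --stereo --bitrate "$BITRATE" - - \
        | netcat "$1" "$PORT"
}
```

After sourcing the file, the listener runs celt_listen and the sender runs celt_send followed by the listener's IP address.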
Now, let's make it two-way, so it will actually be useful for something. An easy approach would be to run the same two commands again, just in the other direction. That is also the best solution we could come up with.
However, it seems a bit wasteful to use two connections when we just send one data stream and receive another one. We can use one netcat connection for that:
parec --latency-msec=1 | ./celtenc --stereo - - | \
  netcat -l 127.0.0.1 50123 | \
  ./celtdec --stereo - - | pacat --latency-msec=1

parec --latency-msec=1 | ./celtenc --stereo - - | \
  netcat 127.0.0.1 50123 | \
  ./celtdec --stereo - - | pacat --latency-msec=1
Now the commands get a bit more complex. We send one encoded sound stream into netcat and decode and play back the received sound stream. I couldn't resist drawing a nice little diagram of this setup:
The dashed lines represent CELT encoded data, while normal lines are raw uncompressed audio. The problem with this setup is the latency. As soon as computer A runs the command, celtenc will start to record and encode data, even if no connection has been established yet. The pipe to netcat will buffer this data as long as possible, and when computer B connects it will first receive all buffered data. For example, if computer B runs the command 2 seconds after computer A, there will be a 2 second delay in the sound stream from computer A to B (but only in that direction).
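The buffering effect is easy to reproduce without any audio tools. The following is only a toy model: produce stands in for the recording and encoding side, late_reader stands in for a netcat whose peer connects one second late. Both names are made up for this example.

```shell
#!/bin/sh
# Stand-in for "parec | celtenc": writes three "frames" immediately.
produce() { printf 'frame1\nframe2\nframe3\n'; }

# Stand-in for netcat waiting for its peer: idles for one second,
# then forwards everything that has piled up in the pipe.
late_reader() { sleep 1; cat; }

# The writer finishes long before the reader starts reading,
# yet nothing is dropped: the pipe holds the data until it is read.
received=$(produce | late_reader)
count=$(printf '%s\n' "$received" | wc -l | tr -d ' ')

echo "frames delivered after the delay: $count"
```

All frames written before the reader was ready still arrive, oldest first. With a live audio stream that means the other side hears everything shifted by the time it took to connect.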
We stopped at that point. It was working quite well and the sound quality was absolutely marvellous. For something hacked together in about 4 hours it's quite good. Most of our time went into working around the NAT problems of our routers anyway. Maybe I'll play around a bit with the library and write some basic UDP transport system. The API looks quite easy to work with. :)