Arkanis A blog about random stuff, but mostly programming.

Crazy experiments with CELT

celt, voice, linux

Just a few hours ago a good friend of mine and I took CELT out for a test drive. CELT is a new audio codec created by the Xiph.Org Foundation. Actually, we were discussing continuations over an XMPP audio chat and the quality was a bit bad. I mentioned CELT just as a joke, but a few minutes later both of us had somehow downloaded and compiled the CELT codec.

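Basically the idea was the following: grab the mic output, encode it with CELT, send it over the network, and decode and play it back on the other side.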

Well, it was just for fun, but it worked extraordinarily well. We grabbed the mic output with the parec command, and pacat plays back such a recorded stream. So this command basically plays back whatever your mic receives:

parec --latency-msec=1 | pacat --latency-msec=1

The --latency-msec option makes sure the latency of PulseAudio is not too high. If you omit it, PulseAudio will choose a very high latency (about 2 seconds in our case). Therefore we just set it to a very small value.

Now we put CELT into the game. Fortunately the CELT library from the Xiph.Org Foundation already contains a fully functional encoder and decoder program. celtenc in the tools directory of the library is the encoder, celtdec the decoder. We both just compiled the library (with ./configure and make) and used the encoder and decoder directly from the tools directory. With CELT in the game our local test command looked like this:

parec --latency-msec=1 | ./celtenc --stereo --bitrate 50 - - | \
./celtdec --stereo - - | pacat --latency-msec=1

Now we encode and directly decode the sound stream with CELT. The only thing to watch out for here is the --stereo parameter of the CELT tools. The PulseAudio tools use 2 channels by default, so the --stereo option is needed to match that. Alternatively you can use the --channels=1 parameter of parec and pacat to make PulseAudio use just one channel and then omit the --stereo option.
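The mono alternative could look roughly like the sketch below. It only combines the options already mentioned above; the function name mono_loopback and the RUN_AUDIO switch are made up for illustration, and we did not test this exact variant.

```shell
#!/bin/sh
# Mono variant of the local loopback test (a sketch, not a tested command).
# parec and pacat are told to use one channel, so --stereo is dropped
# from celtenc and celtdec.
mono_loopback() {
    parec --latency-msec=1 --channels=1 \
        | ./celtenc --bitrate 50 - - \
        | ./celtdec - - \
        | pacat --latency-msec=1 --channels=1
}

# RUN_AUDIO is a hypothetical switch so that the script does not grab
# the mic unless explicitly asked to.
if [ "${RUN_AUDIO:-0}" = "1" ]; then
    mono_loopback
fi
```

Started with RUN_AUDIO=1 from the tools directory, this should behave like the stereo command above, just with a single channel.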

Anyway, more important is the bitrate. We played around a bit and 50 seems to be quite OK. Lower bitrates still sounded good but increased the latency in our case, probably not because of the codec but because of the audio frame size. We tried to reduce the frame size but that did not work. Anyway, 50 kbit/s was a value we were happy with.
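To get a feeling for what 50 kbit/s means, here is a rough back-of-the-envelope calculation. It assumes parec runs with its documented defaults (44100 Hz, 16-bit samples, 2 channels) and that 1 kbit is 1000 bits; raw_bitrate is just a helper name for this sketch.

```shell
#!/bin/sh
# Raw PCM bitrate in bit/s: sample rate * bits per sample * channels.
raw_bitrate() {
    echo $(( $1 * $2 * $3 ))
}

# Assumed parec defaults: 44100 Hz, 16 bit, 2 channels.
raw=$(raw_bitrate 44100 16 2)
# --bitrate 50, taken as 50 kbit/s with 1 kbit = 1000 bit.
celt=$(( 50 * 1000 ))

echo "raw PCM: $raw bit/s"                  # 1411200 bit/s
echo "CELT:    $celt bit/s"                 # 50000 bit/s
echo "ratio:   about $(( raw / celt )):1"   # 28:1 (integer division)
```

So under these assumptions the encoded stream is roughly 28 times smaller than the raw audio coming out of parec.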

Now the most important part of all: moving the data over the internet. Well, thanks to netcat that is a piece of cake. We first did it in one direction and later on extended it into a real bidirectional voice chat.

For the one who wants to listen:

netcat -l 50123 | \
./celtdec --stereo - - | pacat --latency-msec=1

And for the one who wants to send their mic sound:

parec --latency-msec=1 | ./celtenc --stereo --bitrate 50 - - | \
netcat [target-ip] 50123

Basically we just put a netcat pair between the celtenc and celtdec programs. [target-ip] is the IP address of the listener's computer. 50123 is just a random port number; you can use whatever port you like. Note that in this setup the listener starts the netcat server and receives the encoded sound stream.
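If you change the port or bitrate often, the two commands can be generated from a few variables. The following is only a sketch: PORT, BITRATE, listener_cmd and sender_cmd are names invented here, and 192.0.2.10 is a documentation address standing in for [target-ip]. The script only prints the pipelines, it does not start them.

```shell
#!/bin/sh
# Hypothetical settings, overridable from the environment.
PORT=${PORT:-50123}
BITRATE=${BITRATE:-50}

# Print the pipeline for the side that listens and plays back.
listener_cmd() {
    printf '%s\n' "netcat -l $PORT | ./celtdec --stereo - - | pacat --latency-msec=1"
}

# Print the pipeline for the side that records and sends.
# $1 is the IP address of the listener.
sender_cmd() {
    printf '%s\n' "parec --latency-msec=1 | ./celtenc --stereo --bitrate $BITRATE - - | netcat $1 $PORT"
}

listener_cmd
sender_cmd 192.0.2.10
```

Printing instead of running keeps the sketch harmless. To actually start one side you could pipe the printed line into sh from the tools directory.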

Now, let's make it two-way, so it will actually be useful for something. An easy approach would be to run the same two commands again, just in the other direction. That is also the best solution we could come up with.

However, it seems a bit wasteful to use two connections when we just send one data stream and receive another one. We can use one netcat connection for that:

Computer A:

parec --latency-msec=1 | ./celtenc --stereo - - | \
netcat -l 127.0.0.1 50123 | \
./celtdec --stereo - - | pacat --latency-msec=1

Computer B:

parec --latency-msec=1 | ./celtenc --stereo - - | \
netcat 127.0.0.1 50123 | \
./celtdec --stereo - - | pacat --latency-msec=1

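Now the commands get a bit more complex. We send one encoded sound stream into netcat and decode and play back the received sound stream. The commands above use 127.0.0.1 as in a local test; over the internet, computer B would presumably connect to computer A's address instead (like [target-ip] in the one-way example), and computer A would listen on an address that B can reach. I couldn't resist drawing a nice little diagram of this setup: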

The data flow of the sound streams between the programs and over the internet

The dashed lines represent CELT encoded data, while normal lines are raw uncompressed audio. The problem with this setup is the latency. As soon as computer A runs the command, parec and celtenc will start to record and encode data, even if no connection has been established yet. The pipe to netcat will buffer this data as long as possible, and when computer B connects it will first receive all the buffered data. For example, if computer B runs the command 2 seconds after computer A, there will be a 2 second delay in the sound stream from computer A to B (but only in that direction).
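A quick estimate shows how much encoded data piles up while computer A waits. This is only a sketch with some assumptions: the stream really runs at about 50 kbit/s (1 kbit = 1000 bit, container overhead ignored), and the pipe holds 65536 bytes, which is the usual default on Linux. The function names are made up for this example.

```shell
#!/bin/sh
# Bytes of encoded audio produced in a given time.
# $1 = bitrate in kbit/s, $2 = waiting time in seconds
backlog_bytes() {
    echo $(( $1 * 1000 / 8 * $2 ))
}

# Whole seconds until a pipe of the given size is full.
# $1 = bitrate in kbit/s, $2 = pipe capacity in bytes
seconds_to_fill() {
    echo $(( $2 / ($1 * 1000 / 8) ))
}

echo "backlog after 2 s: $(backlog_bytes 50 2) bytes"               # 12500 bytes
echo "64 KiB pipe full after about $(seconds_to_fill 50 65536) s"   # 10 s
```

So under these assumptions a 2 second head start fits easily into the pipe, which matches the 2 second delay we heard. After roughly 10 seconds the pipe would be full and celtenc would presumably block until computer B connects.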

We stopped at that point. It was working quite well and the sound quality was absolutely marvellous. For something hacked together in about 4 hours it's quite good. Most of our time went into working around NAT problems of our routers anyway. Maybe I'll play around a bit with the library and write some basic UDP transport system. The API looks quite easy to work with. 🙂
