Crazy experiments with CELT

August 2nd, 2011 celt, voice, linux

Just a few hours ago a good friend of mine and I took CELT out for a test drive. CELT is a new audio codec created by the Xiph foundation. Well, actually we were discussing continuations over an XMPP audio chat and the quality was a bit bad. I mentioned CELT just as a joke but a few minutes later both of us somehow downloaded and compiled the CELT codec.

Basically the idea was the following:

1. Get a sound stream from the microphone
2. Pipe it through the CELT encoder
3. Use netcat to transfer the encoded sound stream over the internet
4. Pipe the received sound stream through the CELT decoder
5. Play the decoded sound on the other computer

Well, it was just for fun but it worked extraordinarily well. We grabbed the mic output with the parec command, and pacat plays back such a recorded stream. So this command basically plays back whatever your mic receives:

parec --latency-msec=1 | pacat --latency-msec=1

The --latency-msec option makes sure the latency of pulse audio is not too high. If you omit it, pulse audio will choose a very high latency (about 2 seconds in our case). Therefore we just set it to a very small value.

Now we put CELT into the game. Fortunately the CELT library from the Xiph foundation already contains a fully functional encoder and decoder program. celtenc in the tools directory of the library is the encoder, celtdec the decoder. We both just compiled the library (with ./configure and make) and used the encoder and decoder directly from the tools directory. With CELT in the game our local test command looked like this:

parec --latency-msec=1 | ./celtenc --stereo --bitrate 50 - - | \
./celtdec --stereo - - | pacat --latency-msec=1

Now we encode and directly decode the sound stream with CELT. The only thing to watch out for here is the --stereo parameter of the CELT tools. The pulse audio tools use 2 channels by default, so the --stereo option is needed to match that. Alternatively you can use the --channels=1 parameter of parec and pacat to make pulse audio use just one channel and then omit the --stereo option.
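Both variants can be wrapped into one small sketch. The function names celt_channel_flag and local_loopback are made up for illustration; the commands and options are the ones described above, and the sketch assumes it is run from the tools directory of the CELT source tree.

```shell
#!/bin/sh
# Hypothetical helper: map a channel count to the matching CELT tool option.
# 2 channels need --stereo; for 1 channel the option is simply omitted.
celt_channel_flag() {
    case "$1" in
        1) echo "" ;;
        2) echo "--stereo" ;;
        *) echo "unsupported channel count: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: local encode/decode loop with 1 or 2 channels.
local_loopback() {
    channels=${1:-2}
    flag=$(celt_channel_flag "$channels") || return 1
    # $flag is left unquoted on purpose so that an empty value disappears.
    parec --latency-msec=1 --channels="$channels" \
        | ./celtenc $flag --bitrate 50 - - \
        | ./celtdec $flag - - \
        | pacat --latency-msec=1 --channels="$channels"
}
```

Calling local_loopback 1 would then run the mono variant, local_loopback 2 (or no argument) the stereo one.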

Anyway, more important is the bitrate. We played around a bit and 50 seems to be quite ok. Lower bitrates still sounded good but increased the latency in our case. Probably not because of the codec but because of the audio frame size. We tried to reduce this but it did not work. Anyway, 50 KBit/s was a value we were happy with.
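To put the 50 KBit/s into perspective, it helps to compare it with the raw stream. A rough calculation, assuming parec runs with its documented defaults (16 bit samples, 44100 Hz, 2 channels); the variable names are just made up for this sketch:

```shell
#!/bin/sh
# Assumed parec defaults: 16 bit samples, 44100 Hz sample rate, 2 channels.
bits_per_sample=16
rate=44100
channels=2
celt_kbit=50   # bitrate passed to celtenc in the commands above

# Raw PCM bitrate in kbit/s (integer arithmetic, rounded down).
raw_kbit=$(( bits_per_sample * rate * channels / 1000 ))
# How many times smaller the encoded stream is (rounded down).
ratio=$(( raw_kbit / celt_kbit ))

echo "raw: ${raw_kbit} kbit/s, encoded: ${celt_kbit} kbit/s, about ${ratio}x smaller"
```

Under these assumptions the raw stream needs roughly 1411 kbit/s, so the encoder shrinks it by a factor of about 28.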

Now the most important part of all: move the data over the internet. Well, thanks to netcat that is a piece of cake. We first did it in one direction and later on extended it into a real bidirectional voice chat.

For the one who wants to listen:

netcat -l 50123 | \
./celtdec --stereo - - | pacat --latency-msec=1

And the one who wants to send their mic sound:

parec --latency-msec=1 | ./celtenc --stereo --bitrate 50 - - | \
netcat [target-ip] 50123

Basically we just put a netcat pair between the celtenc and celtdec programs. [target-ip] is the IP address of the listener computer. 50123 is just a random port number, you can use whatever port you like. Note that in this setup the listener starts the netcat server and receives the encoded sound stream.

Now, let's make it two way, so it will actually be useful for something. An easy approach would be to run the same two commands again, just in the other direction. That is also the best solution we could come up with.
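As a sketch, the two one-way commands could be wrapped like this and then run on both computers. The function names chat_listen and chat_send are made up for illustration, and the sketch assumes it is run from the tools directory of the CELT source tree.

```shell
#!/bin/sh
# Hypothetical wrappers around the two one-way commands shown above.

# Receive, decode and play the peer's stream.
chat_listen() {
    port=${1:-50123}
    netcat -l "$port" | ./celtdec --stereo - - | pacat --latency-msec=1
}

# Record, encode and send our own microphone sound to the peer.
chat_send() {
    if [ -z "$1" ]; then
        echo "usage: chat_send <peer-ip> [port]" >&2
        return 2
    fi
    parec --latency-msec=1 | ./celtenc --stereo --bitrate 50 - - \
        | netcat "$1" "${2:-50123}"
}
```

Each side would run chat_listen in one terminal and chat_send [target-ip] in another. The listener on the other side has to be running before chat_send is started, otherwise netcat has nothing to connect to.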

However, it seems a bit wasteful to use two connections when we just send one data stream and receive another one. We can use one netcat connection for that:

Computer A:

parec --latency-msec=1 | ./celtenc --stereo - - | \
netcat -l 127.0.0.1 50123 | \
./celtdec --stereo - - | pacat --latency-msec=1

Computer B:

parec --latency-msec=1 | ./celtenc --stereo - - | \
netcat 127.0.0.1 50123 | \
./celtdec --stereo - - | pacat --latency-msec=1

Now the commands get a bit more complex. We send one encoded sound stream into netcat and decode and play back the received sound stream. I couldn't resist drawing a nice little diagram of this setup:

[Diagram: the data flow of the sound streams between the programs and over the internet]

The dashed lines represent CELT encoded data, while normal lines are raw uncompressed audio. The problem with this setup is the latency. As soon as computer A runs the command, parec and celtenc will start to record and encode data, even if no connection has been established yet. The pipe to netcat will buffer this data as long as possible, and when computer B connects it will first receive all buffered data. For example, if computer B runs the command 2 seconds after computer A, there will be a 2 second delay in the sound stream from computer A to B (but only in that direction).
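The effect can be reproduced without any audio tools. The following is only a toy model (the function pipe_delay_demo is made up for this purpose): a writer that emits one timestamp per second stands in for parec and celtenc, and a reader that starts late stands in for computer B.

```shell
#!/bin/sh
# Toy model of the delay described above. The writer starts immediately,
# the reader only starts reading after $1 seconds.
pipe_delay_demo() {
    late=$1   # seconds the reader waits before it starts reading
    i=0
    while [ "$i" -le "$late" ]; do
        date +%s          # one "sample": the time it was produced
        sleep 1
        i=$((i + 1))
    done | {
        sleep "$late"
        read -r first             # oldest timestamp buffered in the pipe
        now=$(date +%s)
        echo $((now - first))     # age of the first "sample" in seconds
        cat > /dev/null           # drain the rest so the writer can finish
    }
}
```

Running pipe_delay_demo 2 should print a value of about 2: the first thing the late reader sees is data that is already 2 seconds old, and a continuous stream never catches up on that.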

We stopped at that point. It was working quite well and the sound quality was absolutely marvellous. For something hacked together in about 4 hours it's quite good. Most of our time went into working around NAT problems of our routers anyway. Maybe I'll play around a bit with the library and write some basic UDP transport system. The API looks quite easy to work with. 🙂

Comments

#1 by Boris August 2nd, 2011 14:30

50kbit/s is a really high bitrate for a voice-only codec. It should be almost lossless at this rate. For example, GSM operates around 13kbit/s.
- #3 by Garf reply to #1 August 2nd, 2011 15:02
  
  Depends on which GSM standard, audio bandwidth allowed, and latency you can live with.
  
  CELT is meant to carry music as well as voice. The new versions are now called Opus, which combines CELT (music) with SILK (voice).
  
  I believe CELT itself works fine for voice with <50kbps anyway, but as described in the post they ran into an issue and didn't bother looking further at that.
- #4 by Anonymus reply to #1 August 2nd, 2011 15:50
  
  50kb/sec is fine, if you are using it for these kinds of applications (http://www.celt-codec.org/):
  
  "Using CELT application developers can build software that allows musicians to perform together across the Internet, or simply build great sounding telephony systems. Why shouldn't your telephone sound as good as your stereo?"
- #5 by Stephan reply to #1 August 2nd, 2011 16:22
  
  50 KBit/s sounded lossless to me. We also tried 25 KBit/s and it sounded the same. At least I could not tell the difference. The problem with that, however, was that lower bitrates added a very noticeable delay to the stream, even without any network. I suppose the reason for this is that we used normal pipes to move the data from the encoder to the decoder. With a low bitrate it takes much longer for the pipe buffer to be filled and handed on to the next process. 50 KBit/s was just a bitrate where this effect was not that disturbing.
  
  Just for the sake of bitrate I did a small test. I used the following command from above and tested some different bitrates:
```
parec --latency-msec=1 | ./celtenc --stereo --bitrate 50 - - | \
./celtdec --stereo - - | pacat --latency-msec=1
```
  - 50, 25 and 15 KBit/s sounded the same to me. But well, my ears are not the best.
  - With 10 KBit/s it started to sound a bit dull. Still better than the XMPP voice chat we used before (Empathy with SPEEX8000 I think).
  - 5 KBit/s was really distorted. Still understandable but not really human-like.
  Please keep in mind that this was still stereo, not mono. With 5 KBit/s I got a latency of about 14 seconds. With 50 KBit/s it was about 2 seconds. So using pipes is not a very good idea. 😉
  
  Anyway, please keep in mind that we hacked it together in a few hours without really knowing Pulse Audio or CELT. We just did it for fun and not to replace any real voice chat.
#2 by Anonymus August 2nd, 2011 14:50

Just use Mumble http://mumble.sourceforge.net/
#6 by Anonymus August 2nd, 2011 17:06

Bravo for putting together an amazing example of why small unixy tools are totally awesome.
#7 by jojojo August 2nd, 2011 18:24

Pulseaudio is crap, just use OSS' vmix device.
The demo encoders/decoders are not trimmed for low latency, the libraries work properly with low bitrates. Use another implementation.
Don't try to be too clever, your bidirectional netcat stuff is not actually worth it. Listen on both receiving sides and you won't have that long delay you've been talking about.
And this shouldn't have taken 4 hours!
- #8 by Stephan reply to #7 August 2nd, 2011 19:36
  
  Sorry, I don't know OSS. I know that audio systems and especially Pulse Audio are discussed quite often. It simply was installed on our computers (running Ubuntu) and so we used it. There's nothing more to it than that.
  
  Regarding the encoder latency: From the encoder output I got the impression that it emits 20ms frames. However I haven't looked into the I/O code of the encoder.
  
  I just fired up strace and watched the encoder a bit, and it looks like it only writes the data in 4 KByte chunks. Seems like libogg buffers the data until a page is filled. I replaced the ogg_stream_pageout() call in the encoder with ogg_stream_flush() and it does the trick. Still not perfect but much better. Of course this makes absolutely no sense for a normal encoder. I definitely agree with you here, the encoder is not made for low latency stuff. Anyway, if we really wanted to build a serious tool here we would have written a small C program and used libcelt together with some UDP transport layer. But it was a just-for-fun experiment, nothing more. So we just glued together some tools that were already available. Thanks to the command line this works quite well for the effort we invested. 😉
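  A rough back-of-the-envelope sketch shows why such 4 KByte chunks hurt more at low bitrates. It assumes a constant bitrate, 1 kbit = 1000 bits, and ignores the Ogg container overhead; fill_time_ms is just a name made up for this calculation.

```shell
#!/bin/sh
# Time in milliseconds until a buffer of the given size is full
# when it is fed at a constant bitrate (integer arithmetic, rounded down).
fill_time_ms() {
    bytes=$1
    kbit_per_s=$2
    # bits / (kbit/s) = ms, because 1 kbit/s is 1 bit per millisecond
    echo $(( bytes * 8 / kbit_per_s ))
}

for kbit in 50 25 15 10 5; do
    echo "${kbit} kbit/s: $(fill_time_ms 4096 "$kbit") ms per 4 KByte chunk"
done
```

  At 50 kbit/s one chunk takes about 0.66 seconds to fill, at 5 kbit/s about 6.5 seconds. This covers only a single buffering stage, so it is not expected to match the measured delays exactly.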
  
  Using the one way version twice, once for each direction, will get rid of the uncontrollable delay. I've written above that this is the best setup we came up with. Probably didn't emphasize it enough. I only mentioned the two way version with one socket because that is the way I think it should have been. Using two sockets looks like a waste to me. Again, something that is trivial to do right if we had written a small C program. I hope that was what you meant with "listen on both receiving sides".
  
  Most of the time went into working around NAT. Neither of us had proper access to our routers so we could not establish a direct connection. In the end we used a socat bridge on another server to exchange data. An SSH tunnel might have been easier from the get go. All in all about 3 hours went into that I guess. The pulse audio tools, the CELT encoder and decoder as well as netcat were all a matter of minutes. We spent some more minutes on trying some parameters of the encoder and decoder (rate, framesize, etc.) but without much effect.
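  The exact socat invocation is not given here. Purely as an assumption, a bridge of this kind could look like the following sketch: socat listens on two TCP ports on a server both peers can reach and copies data between the two incoming connections. The function name relay_bridge and the port numbers in the usage note are made up.

```shell
#!/bin/sh
# Hypothetical relay for a server that both peers can reach. It waits for one
# TCP connection on each port and then copies data between the two connections.
relay_bridge() {
    if [ "$#" -ne 2 ]; then
        echo "usage: relay_bridge <port-a> <port-b>" >&2
        return 2
    fi
    socat "TCP-LISTEN:$1,reuseaddr" "TCP-LISTEN:$2,reuseaddr"
}
```

  With relay_bridge 50123 50124 running on the server, the listening side would replace netcat -l 50123 with netcat [server-ip] 50123, and the sending side would connect to port 50124 of the server instead of the listener's own address. As far as I know socat accepts the connection on the first port before it opens the second one, so the listening side should connect first.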
#9 by Kreso April 17th, 2012 8:20

Guys, is there some simple solution for those of us who want to use CELT but don't know compiling etc.? Something as simple as Lucy (http://www.luci.eu/?page_id=15)? With Lucy Live and the AAC HE codec you get a 300ms delay.