Friday, January 11, 2008

What is the bandwidth of one voice talking?

A side question to the previous post: About how much information does a voice convey per unit time? Let's neglect tone of voice here; it's clearly important in real speech, but my guess is that if you could quantify it, it would come out to not very many bits (Happy? yes/no. Angry? yes/no. Sarcastic? yes/no. etc., each changing fairly seldom). It's also not something that computers are terribly good at picking up, so it's not relevant to the particular case of speech recognition.

At the upper limit, how fast can people talk? Appearances can be deceiving here. That auctioneer rattling on a mile a minute is really just continually refreshing two 2- or 3-digit numbers (possibly with some zeros attached) that change every few seconds. That person zipping along in a foreign language isn't really talking significantly faster or slower than you would, but it sounds like a lot since you don't understand it. And of course, some people can say more with a word than others can say with a paragraph.

The record for speaking English appears to be around 10 words per second (yikes!). Mind, this is most likely someone spewing out a prepared spiel that they've practiced over and over again. Assuming about 10 bits per English word (estimates vary a bit), that's about 100 bits per second. My guess is that most of us, particularly those of us actually coming up with the words as we go along, would do well to hit half that. On the other hand, most of us type considerably more slowly than that.
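The arithmetic here is just words per second times bits per word. Here's a minimal sketch of it, assuming a flat average of about 10 bits per word (the rough estimate above; real figures vary). The function name speech_bitrate is made up for illustration.

```python
def speech_bitrate(words_per_second: float, bits_per_word: float = 10.0) -> float:
    """Rough information rate of speech in bits/second.

    Assumes a fixed average information content per word (about 10 bits
    by default); this is an estimate, not a measured constant.
    """
    return words_per_second * bits_per_word

# Record-setting pace: about 10 words/second.
record = speech_bitrate(10)
# Someone composing as they go, at roughly half the record pace.
typical = speech_bitrate(5)
print(f"record: {record:.0f} bits/s, typical: {typical:.0f} bits/s")
```

With these assumptions it prints "record: 100 bits/s, typical: 50 bits/s". Changing bits_per_word shifts both numbers proportionally, which is why the estimate is only good to within a factor of two or so.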

So let's figure 50 bits/second for running speech, say, when you're dictating a letter. What if you're just barking out commands from a set list? Interestingly, bandwidth drops considerably. For example, if it takes half a second to bark out one of 16 commands, that's 8 bits/second. Not exactly broadband.
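The 8 bits/second figure comes from the fact that picking one of 16 commands carries log2(16) = 4 bits, delivered every half second. A small sketch of that calculation, assuming all commands are equally likely (if some commands are used far more often than others, each one carries fewer bits). The name command_bitrate is hypothetical.

```python
import math

def command_bitrate(num_commands: int, seconds_per_command: float) -> float:
    """Bits/second when choosing one of num_commands commands.

    Assumes every command is equally likely, so each carries
    log2(num_commands) bits; skewed usage would lower this.
    """
    bits_per_command = math.log2(num_commands)
    return bits_per_command / seconds_per_command

print(command_bitrate(16, 0.5))  # 4 bits every half second -> 8.0
```

Note how slowly this grows: doubling the vocabulary to 32 commands adds only one bit per command, so even a much bigger command set stays well below the rate of free dictation.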

