Tuesday, January 8, 2008

Some day you will be able to talk to your computer

Actually, you can do that now. The trick is getting it to listen.

I heard someone on the radio talking about how before too long, keyboards were going to be practically obsolete. Typing, they said, is just one way of communicating with a computer. Why should you type in, say, a search phrase when you could just say "Where's a good Thai restaurant in the area?" and have the computer google it up?

The "just another way of communicating" angle rang a bell, and now I remember which bell. It's the same reason text is supposed to be dying. I've already argued that, um, there must be something amiss with that position, but killing typing is not the same as killing text itself. One could imagine a world in which we still read text, in all its visually-tuned, random-access glory, but create it using voice recognition and a mouse (or a pen, or touchscreen, since mice are about to be obsolete as well, but let's stick to one UI device at a time).

We shall see. Speech recognition (technically, voice recognition means figuring out who is speaking as opposed to what they're saying) has been around for a while now, getting steadily better. There now appear to be dictation systems that can capture most people's normal, continuous speech with high accuracy. This is important, because having to speak ... slowly ... and ... distinctly ... for ... long ... periods ... of ... time or constantly correct monsieur's misheard words rapidly negates any advantage in ease-of-use.

Nonetheless, I'm not sure speech is going to take over as completely as one might think. Why do people send text with cell phones? One would be hard-pressed to imagine a more tortuous way of producing text than to thumb it in on a tiny numeric pad, especially before word recognition, but people did, and do, even when they could call, or leave a voice mail. Or for that matter, why do people IRC while on the phone?

Conversely, how often do you get an email with a voice attachment? I doubt bandwidth or storage is a significant problem for that any more. Speech requires around 3 kHz of bandwidth and compresses quite a bit better than, say, classical music.
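For a rough sense of the numbers, here is a back-of-the-envelope sketch. It assumes telephone-style digitizing (8,000 samples per second, comfortably more than twice a roughly 3 kHz band, at 8 bits per sample) and, for the compressed case, an illustrative 8 kbit/s speech codec. Those figures and the function name are my assumptions for the sake of the arithmetic, not measurements of any particular system.

```python
# Assumed figures for illustration only.
SAMPLE_RATE_HZ = 8000        # at least 2 x ~3 kHz, per the sampling theorem
BITS_PER_SAMPLE = 8          # telephone-style quantization
UNCOMPRESSED_BPS = SAMPLE_RATE_HZ * BITS_PER_SAMPLE   # 64,000 bits per second
COMPRESSED_BPS = 8000        # hypothetical low-bitrate speech codec


def voice_message_bytes(seconds, bitrate_bps=UNCOMPRESSED_BPS):
    """Return the approximate size in bytes of a voice message."""
    return seconds * bitrate_bps // 8


one_minute_raw = voice_message_bytes(60)                     # 480000 bytes
one_minute_packed = voice_message_bytes(60, COMPRESSED_BPS)  # 60000 bytes
print(one_minute_raw, one_minute_packed)
```

Under those assumptions a one-minute message is under half a megabyte even uncompressed, which is small next to a typical photo attachment.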

Personally I would prefer not to talk to my computer much of the time, either because the environment is noisy (playing havoc with accuracy), or because it's quiet (and I don't want to disturb anyone), or because there's someone else in the room I might like to talk to without confusing the UI. Occasionally I might even want to mutter some choice words at it without them showing up in the latest blog post for all the world to see.

In short, speech recognition is useful now (particularly if typing is difficult or impossible for a person) and will become more useful as the technology improves, but I don't see it taking over the world.

Postscript: Long ago I decided I was going to get rich by selling a computer that would run faster if you yelled at it. I could have done it, too. Just hook up a volume meter to the system clock. When nothing's coming in, the system runs at, say, half speed. As the volume increases, the clock comes back up to normal. Of course, you'd have to get people used to a computer that runs artificially slowly. But that's a solved problem ...
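For the curious, the yell-to-accelerate scheme can be sketched in a few lines. This is only a toy model: it assumes audio samples normalized to the range -1.0 to 1.0, uses RMS as the "volume meter", and maps the result linearly onto a clock multiplier between half speed and normal. The function names and thresholds are made up for illustration, and nothing here touches a real system clock.

```python
import math


def rms_volume(samples):
    """Root-mean-square level of a block of samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def clock_scale(volume, quiet=0.0, loud=1.0, idle_scale=0.5):
    """Map a volume reading to a clock-speed multiplier.

    Silence (volume <= quiet) gives idle_scale (half speed by default);
    a full-throated yell (volume >= loud) gives 1.0 (normal speed).
    """
    if loud <= quiet:
        raise ValueError("loud must be greater than quiet")
    fraction = (volume - quiet) / (loud - quiet)
    fraction = max(0.0, min(1.0, fraction))  # clamp to [0, 1]
    return idle_scale + (1.0 - idle_scale) * fraction


# A moderately annoyed user: samples alternating between +0.5 and -0.5.
level = rms_volume([0.5, -0.5, 0.5, -0.5])
print(clock_scale(level))
```

Note that the multiplier never exceeds 1.0, which is the whole trick: yelling only ever restores the speed you paid for.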
