Firstly, there are two categories of speech recognition: speech-to-text processing and direct voice input. The former is what you get from apps such as MacSpeech Dictate. The latter is what you get out of the box with a Mac - in Apple parlance, it's known as "Speakable Items". The advantage of Speakable Items is that the system only needs to listen out for a short list of possible commands, rather than attempting to interpret infinite possible combinations of words from the entire dictionary. Speech-to-text is processor intensive, and may be tricky to get working on an iPhone. Direct voice input is pretty straightforward by comparison, and is already offered on numerous competitor handsets.
So what kinds of services would direct voice input enable? I'd argue that it's all to do with headphones. It's not interesting to be able to bark voice commands when you're already interacting with the touch interface. It only becomes interesting when you're on the move and can't get the phone out of your bag: say you're driving, or walking along listening to music. Imagine being able to press the headphone button once (which would pause the music and put Speakable Items into listening mode). Then you could give one of the following commands:
- time: the phone's voice tells you the time
- call [name]: this one's obvious!
- last call: the phone's voice tells you the last caller
- last text: the phone's voice tells you who just sent you a text
- last e-mail: the phone's voice tells you who just sent you an e-mail
- open [app name]: the app is launched (ideal for apps with voice interface)
- play [song name]: plays the song in the iPod app
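To make the "short list of possible commands" idea concrete, here is a minimal sketch in Python of how such a command list could be matched once the recogniser has produced a phrase. Everything in it is an assumption for illustration: the `COMMANDS` table, the action names and the `parse_command` function are hypothetical, and this is not how Speakable Items is actually implemented.

```python
import re

# Hypothetical command grammar: a handful of fixed phrases, plus three
# commands that take a single argument (a contact, an app or a song).
COMMANDS = [
    (re.compile(r"^time$"), "time"),
    (re.compile(r"^call (?P<arg>.+)$"), "call"),
    (re.compile(r"^last call$"), "last_call"),
    (re.compile(r"^last text$"), "last_text"),
    (re.compile(r"^last e-?mail$"), "last_email"),
    (re.compile(r"^open (?P<arg>.+)$"), "open"),
    (re.compile(r"^play (?P<arg>.+)$"), "play"),
]


def parse_command(transcript):
    """Return (action, argument) for a recognised phrase, or None."""
    # Normalise case and collapse whitespace before matching.
    text = " ".join(transcript.lower().split())
    for pattern, action in COMMANDS:
        match = pattern.match(text)
        if match:
            # Fixed phrases have no "arg" group, so the argument is None.
            return action, match.groupdict().get("arg")
    return None
```

Anything outside the list simply returns `None`, which is the point: the system never has to make sense of arbitrary sentences, only decide which of a few patterns it heard.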
Whoops, I think that Play [Song name] would add quite a lot to the set of words the speech module has to listen for, wouldn't it?
Apart from that, as in almost all of the time, I think you're on a 'righter' track than Apple itself.
Hi Pedro, thanks for the feedback.
You may be right about Play [Song name]. I based the assumption on the fact that my previous phone (a Nokia) allowed you to speak names from the contact list without the need for voice tags. Matching against a predefined list of a few hundred contacts (or songs/albums/artists) is a simpler task to get accurate than full speech-to-text functionality.
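As a rough illustration of why a closed list is the easier problem, here is a small Python sketch that picks the closest entry from a known list using the standard library's `difflib`. It is only an analogy: a real recogniser would compare sounds rather than spelled-out text, and the `match_name` helper and its `cutoff` value are my own assumptions.

```python
import difflib


def match_name(heard, names, cutoff=0.6):
    """Return the entry in names closest to what was heard, or None."""
    # Compare in lower case, but hand back the original spelling.
    lookup = {name.lower(): name for name in names}
    hits = difflib.get_close_matches(
        heard.lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None
```

With a few hundred candidates, a slightly misheard name such as "Jon Smith" can still land on "John Smith", because the only question is which entry is nearest, not what the words could possibly be.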