Friday, 14 November 2008

Delays to new headphones may point the way to voice memos and speech recognition on iPhone and touch

Recently, this blog argued that optimizations in Snow Leopard may result in Speech Recognition and Text to Speech making their way onto the iPhone. Of course, the iPhone already offers some speech recognition support via third-party apps, such as Say Who. And today, Google made a big splash with its announcement of speech recognition for its iPhone app. But the true potential of speech recognition on the iPhone will only be realized when it is introduced system-wide. And here's some unadulterated speculation about just how that might work...

Firstly, there are two categories of speech recognition: speech-to-text processing and direct voice input. The former is what you get from apps such as MacSpeech Dictate. The latter is what you get out of the box with a Mac - in Apple parlance, it's known as "Speakable Items". The advantage of Speakable Items is that the system only needs to listen for a short list of possible commands, rather than attempting to interpret a near-infinite number of word combinations drawn from the entire dictionary. Speech-to-text is processor-intensive, and may be tricky to get working on an iPhone. Direct voice input is pretty straightforward by comparison, and is already offered on numerous competitors' handsets.
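To illustrate why a short command list is so much more tractable than open dictation, here's a minimal sketch (purely hypothetical - not Apple's implementation) that matches a recognized utterance against a fixed set of speakable items using Python's standard-library `difflib`:

```python
from difflib import get_close_matches

# Hypothetical command list: a direct-voice-input system only has to
# distinguish among these few phrases, not the whole dictionary.
SPEAKABLE_ITEMS = ["what time is it", "open mail", "last call", "last text"]

def recognize_command(heard: str):
    """Return the best-matching command, or None if nothing is close enough."""
    matches = get_close_matches(heard.lower(), SPEAKABLE_ITEMS, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(recognize_command("what time is t"))  # → "what time is it"
```

Because every candidate is known in advance, even a noisy or partial transcription can be snapped to the nearest valid command - which is exactly why a constrained grammar is feasible on modest hardware while full dictation isn't.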

So what kinds of services would direct voice input enable? Firstly, I'd argue that it's all to do with headphones. It's not interesting to be able to bark voice commands when you're already interacting with the touch interface. It's only really interesting when you're on the move and can't get the phone out of your bag - say you're driving, or walking while listening to music. Imagine being able to press the headphone button once (to pause the music and activate the Speakable Items listen command). Then you could give one of the following commands:
  • time: the phone's voice tells you the time
  • call [name]: this one's obvious!
  • last call: the phone's voice tells you the last caller
  • last text: the phone's voice tells you who just sent you a text
  • last e-mail: the phone's voice tells you who just sent you an e-mail
  • open [app name]: the app is launched (ideal for apps with voice interface)
  • play [song name]: plays the song in the iPod app
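A command set like this could sit on top of a simple dispatch table: fixed keywords map to handlers, and commands like "call" and "play" take the rest of the utterance as a parameter. A hypothetical sketch (the handler names are my own invention, not any real Apple API):

```python
# Hypothetical handlers, invented for illustration only.
def tell_time(arg=""):
    return "The time is 10:15"

def call_contact(name):
    return f"Calling {name}"

def play_song(title):
    return f"Playing {title}"

# First word of the utterance selects the handler; the remainder
# (contact name, song title) is passed through as the argument.
COMMANDS = {"time": tell_time, "call": call_contact, "play": play_song}

def dispatch(utterance: str):
    keyword, _, rest = utterance.partition(" ")
    handler = COMMANDS.get(keyword)
    return handler(rest) if handler else None

print(dispatch("call Pedro"))  # → "Calling Pedro"
```

Note that only the leading keyword needs to be recognized against a closed list; the parameter can then be matched against another finite list (contacts, song titles), keeping the whole pipeline within direct-voice-input territory.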
This feature may even be one of the reasons why Apple's new headphones with mic have been delayed. We've already had software updates for the 4GB iPod nano and the 120GB iPod classic, but notably no update (yet) for the iPod touch. Rumor has it that the iPhone 2.2 update is due imminently - this will presumably add support for the new headphones, although Apple claims the mic will only be supported on the 2nd-generation touch. Apple will also presumably add an iPod touch (and iPhone) app for voice memos, since it would be strange to offer this on the iPod nano but not on the more powerful touch. Since the latest builds of the 2.2 software don't appear to contain a built-in Voice Memo app, Apple probably plans to distribute it as a free download from the store - in a similar fashion to Remote.


  1. Whoops, I think that Play [Song name] would add a fair number of possibilities to the words the speech module has to monitor, wouldn't it?

    Apart from that, as almost always, I think you're on a 'righter' track than Apple itself.

  2. Hi Pedro, thanks for the feedback.

    You may be right about Play [Song name]. I based the assumption on the fact that my previous phone (a Nokia) allowed you to speak names from the contact list without the need for voice tags. Matching against a predefined list of a few hundred contacts (or songs/albums/artists) is a simpler task to get accurate than full speech-to-text functionality.