I'm keen to develop an intuitive and usable front end for the excellent speech recognition engine that ships with Vista, with the following key considerations:
- Targeted primarily towards an able-bodied user (but usable in an alternate configuration by a disabled user)
- Striking an optimal balance between speech and manual input
e.g. keys to turn the microphone on and off, hold down a key while dictating a name, hold down a key while dictating a command, use keys to navigate text by phrase, select alternatives, etc.
- Dictate seamlessly into any application, by dictating into an on-screen 'buffer' which then (on demand) dumps its content into the active window. This buffer would be something like NotePad, but designed specifically for synthesising keyboard and speech input.
- Provide an integrated and context sensitive GUI
e.g. the list of active commands would display/toggle in a semi-transparent overlay
- A powerful macro scripting language (such as AutoHotKey.com)
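To make the dictation-buffer idea concrete, here is a minimal sketch of how such a staging buffer might behave. All names here are hypothetical; a real front end would wire this to the speech engine and to keyboard hooks, and would dump the text into the active window via the clipboard or synthesised keystrokes:

```python
class DictationBuffer:
    """Hypothetical in-memory staging buffer for dictated text.

    Phrases are kept as a list so the user can navigate and correct
    by phrase (via keys) before the whole buffer is dumped, on demand,
    into the active window.
    """

    def __init__(self):
        self.phrases = []   # recognised phrases, in order
        self.cursor = -1    # index of the currently selected phrase

    def dictate(self, phrase):
        """Append a recognised phrase and select it."""
        self.phrases.append(phrase)
        self.cursor = len(self.phrases) - 1

    def previous_phrase(self):
        """Move the selection one phrase back (bounded at the start)."""
        self.cursor = max(0, self.cursor - 1)
        return self.phrases[self.cursor]

    def select_alternative(self, replacement):
        """Replace the selected phrase, e.g. with a recognition alternative."""
        self.phrases[self.cursor] = replacement

    def dump(self):
        """Return the buffer contents for pasting, then clear the buffer.

        A real implementation would send this to the active window,
        e.g. via the clipboard and a synthesised Ctrl+V.
        """
        text = " ".join(self.phrases)
        self.phrases.clear()
        self.cursor = -1
        return text
```

The point of the sketch is the separation of concerns: recognition fills the buffer, manual keys navigate and correct it, and only an explicit "dump" touches the target application.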
I can see a genuine need, and an opportunity to create a premium product to address this need.
Much as I would love to build this myself, it is not practical for me. RSI may be no barrier to long forum posts, but it is a show-stopper for coding.
I would like to throw this open for discussion, and get a temperature reading from the speech recognition community (developers and users).
If you are working with speech technology and fancy getting involved in this effort, please post a reply. If we hit critical mass, we can do something.
If you fancy taking on the core development, please involve me - post a reply or send me an e-mail. I can see a clear path to an effective solution.
Sam
PS The existing Vista SR front end is, for me, little short of a nightmare. Please don't get me wrong when I say this; I mean to be objective and constructive. This is an honest reflection of my experience with this product (and every other speech recognition product I have ever used).
- It allows practically no customisation.
- It intersperses commands with dictation. E.g. every time I say 'speech recognition', it detects a match on the webpage with 'WSR windows speech recognition help' and changes the webpage. After losing my post three times, I'm now dictating into Notepad and using copy/paste.
- It is specifically targeted at people with disabilities, and hence tries to accomplish everything through speech rather than focusing on an optimal balance between speech and manual input.
- It is unresponsive in a noisy environment. Clearly the code locks up as it tries to decipher a noisy signal, which means the user interface (i.e. the single on/off button for the microphone) does not respond, sometimes for several minutes.
- It does not work (or only half-works) in certain contexts/apps.
- etc
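The unresponsiveness complaint above is essentially an architectural one: if decoding runs on the same thread as the UI, a noisy signal freezes the microphone button. A sketch of the fix, under the assumption that the front end controls its own threading (the recogniser below is a stand-in for a real speech engine):

```python
import queue
import threading


class MicController:
    """Sketch of a mic toggle that never blocks on recognition.

    The UI thread only flips a flag; decoding happens on a worker
    thread, so a noisy signal cannot freeze the on/off button.
    """

    def __init__(self, recognise):
        self.recognise = recognise          # slow, possibly blocking call
        self.mic_on = threading.Event()     # starts off
        self.audio = queue.Queue()          # captured audio chunks
        self.results = queue.Queue()        # recognised text
        self.worker = threading.Thread(target=self._decode_loop, daemon=True)
        self.worker.start()

    def toggle(self):
        """Instantaneous, regardless of what the decoder is doing."""
        if self.mic_on.is_set():
            self.mic_on.clear()
        else:
            self.mic_on.set()

    def feed(self, chunk):
        """Called by the audio capture layer; drops input while the mic is off."""
        if self.mic_on.is_set():
            self.audio.put(chunk)

    def _decode_loop(self):
        """Worker thread: decode chunks as they arrive."""
        while True:
            chunk = self.audio.get()
            self.results.put(self.recognise(chunk))
```

Whether Vista's engine can actually be driven this way from a third-party front end is an open question, but the principle - never let decoding block the control surface - holds for any design we adopt.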
I'm starting to realise that the team at Microsoft developing this technology is opening channels of communication with its user base.
If it is of use to someone, I would be happy to record a diary of every frustrating incident over several days and categorise the incidents into key areas.