How to improve SDK accuracy?

carolfly (Member; joined 21/Sep/2009)
Posted: 22/Sep/2009 at 10:45am
Hi, I'm not sure this is the right place to ask a question about the MS Speech Engine SDK. Hopefully someone can help me with my problem.
I've been stuck on this problem for a very long time. I'm developing an application that takes audio files as input and generates a transcript from them using SAPI 5.1. However, the accuracy is very disappointing: it is mostly below 30%, and most of the time the engine just guesses at what is uttered in the audio file, even with a good-quality file that has no background noise and standard pronunciation. I use the dictation grammar, and the WAV file format is 16-bit, 44,100 Hz, mono. Could anyone tell me what I should do to improve the accuracy, or is it the nature of MS SAPI that it can only recognize a voice correctly after being trained? Is there any way to train the speech engine with an audio file that might include multiple speakers?
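Before looking at the engine, it may be worth confirming that the file really has the format described in the post. The following is a minimal sketch using Python's standard `wave` module. The function names `describe_wav` and `write_silence` and the file name `example.wav` are made up for illustration, and the sketch only reads the header; it says nothing about which formats SAPI itself prefers.

```python
import wave

def describe_wav(path):
    """Return basic PCM header fields for a WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "sample_rate_hz": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }

def write_silence(path, seconds=1, rate=44100):
    """Write a silent 16-bit mono WAV so the sketch runs without real audio."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))

if __name__ == "__main__":
    write_silence("example.wav")
    print(describe_wav("example.wav"))
```

If the reported channel count, bit depth, or sample rate differs from what the application assumes, that mismatch is worth ruling out before drawing conclusions about the engine.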

mmarkoe_admin (Admin Group; joined 16/Jul/2008)
Posted: 22/Sep/2009 at 12:27pm
Originally posted by carolfly:

I'm developing an application that takes audio files as input and generates a transcript from them using SAPI 5.1. However, the accuracy is very disappointing: it is mostly below 30%, and most of the time the engine just guesses at what is uttered in the audio file, even with a good-quality file that has no background noise and standard pronunciation. I use the dictation grammar, and the WAV file format is 16-bit, 44,100 Hz, mono. Could anyone tell me what I should do to improve the accuracy, or is it the nature of MS SAPI that it can only recognize a voice correctly after being trained? Is there any way to train the speech engine with an audio file that might include multiple speakers?

You face several serious hurdles that range from difficult to impossible to overcome.
 
First of all, you need to understand that large-vocabulary speech recognition is speaker-dependent and works best when it is trained for an individual's unique voice.
 
Next, large-vocabulary speech recognition software not only looks for the sounds of each word, but also compares each word to the words around it for context clues. For example, I dictate the following correctly every time: "Two boys went to see a doctor because they ate too much food." Context is how it knows which of "to", "two", or "too" to use.
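To make the idea of context clues concrete, here is a toy sketch that chooses between "to", "two", and "too" by counting which neighbouring word pairs appeared in a tiny training corpus. It is an illustration only and is not how the Microsoft engine (or any particular product) is implemented; the corpus, the function names `train_bigrams` and `pick_homophone`, and the scoring rule are all assumptions made for this example.

```python
from collections import Counter

# The three spellings that sound alike in the example sentence.
HOMOPHONES = {"to", "two", "too"}

def train_bigrams(sentences):
    """Count adjacent word pairs in a tiny, made-up training corpus."""
    counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts

def pick_homophone(prev_word, next_word, counts):
    """Pick the spelling whose left and right word pairs were seen most often."""
    def score(candidate):
        return counts[(prev_word, candidate)] + counts[(candidate, next_word)]
    # sorted() only makes tie-breaking deterministic.
    return max(sorted(HOMOPHONES), key=score)

# Hypothetical corpus; a real system would learn from far more text.
corpus = [
    "two boys went to see a doctor",
    "they ate too much food",
    "two girls went to see a film",
    "he ate too much cake",
]
counts = train_bigrams(corpus)

print(pick_homophone("<s>", "boys", counts))   # sentence start, followed by "boys"
print(pick_homophone("went", "see", counts))   # between "went" and "see"
print(pick_homophone("ate", "much", counts))   # between "ate" and "much"
```

With this corpus the three calls select "two", "to", and "too" respectively, because those are the only spellings that were seen next to the given neighbours. Isolated or disfluent words give such a model much less to work with, which is the point being made about conversational speech.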
 
In other words, you cannot input conversational speech (directly through a microphone or indirectly through a digital recording) and expect high accuracy. High accuracy is attained when each word is enunciated clearly, when words are spoken in phrases that allow context comparisons, and when the speaker uses spoken punctuation.
 
I hope this helps you understand why you have not been successful.
 
Marty

carolfly (Member)
Posted: 23/Sep/2009 at 10:22am
Thanks a lot for your reply. I used to think it was impossible to improve the accuracy, since you cannot train the engine with audio files. However, my friend showed me a program named Docsoft; with the same audio file it produced a much better transcript than the MS engine did. I also noticed that Dragon NaturallySpeaking has promising accuracy. I am starting to wonder whether the fault lies in how I am using the MS engine, or whether the MS engine itself is unable to produce an accurate transcript.

mmarkoe (Moderator Group; joined 24/Jul/2008)
Posted: 23/Sep/2009 at 4:36pm
Originally posted by carolfly:

Thanks a lot for your reply. I used to think it was impossible to improve the accuracy, since you cannot train the engine with audio files. However, my friend showed me a program named Docsoft; with the same audio file it produced a much better transcript than the MS engine did. I also noticed that Dragon NaturallySpeaking has promising accuracy. I am starting to wonder whether the fault lies in how I am using the MS engine, or whether the MS engine itself is unable to produce an accurate transcript.

The best way to tell is to run a test with a single digital recording file in each program. Compare the results and you will know immediately.
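One way to make such a comparison less subjective is to score each program's transcript against a hand-typed reference using a word-level edit distance. The sketch below is one possible approach, not a standard tool from either product; the names `word_errors` and `word_accuracy` and the sample transcripts are invented for illustration, and punctuation is not stripped, so it should be removed or normalized beforehand.

```python
def word_errors(reference, hypothesis):
    """Return (edit distance in words, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # previous[j]: edits to turn the reference words seen so far
    # into the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1], len(ref)

def word_accuracy(reference, hypothesis):
    """1 minus the word error rate; can go negative with many insertions."""
    errors, total = word_errors(reference, hypothesis)
    return 1.0 - errors / total if total else 0.0

# Hypothetical transcripts of the same recording.
reference = "two boys went to see a doctor because they ate too much food"
engine_a = "two boys went to see a doctor because they ate too much food"
engine_b = "to boys went too see a doctor because they eight too much food"

print(word_accuracy(reference, engine_a))
print(word_accuracy(reference, engine_b))
```

Here the second transcript has three wrong words out of thirteen, so it scores lower than the first. Running the same measurement over the output of each program on the same file gives figures that can be compared directly.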
 
Marty
Marty Markoe, MVP
Microsoft Valued Partner
See us at: http://www.mymsspeech.com

carolfly (Member)
Posted: 24/Sep/2009 at 10:34am
I've tested the same audio file in Docsoft and in my application. Docsoft's accuracy is much better, above 80%, while my application captures only some of the words and gets most of them wrong. I have checked my code again and again, and I'm sure I followed every step, as follows:
1) create the SpInprocRecognizer,
2) create the context,
3) create the grammar (I've tried grammar IDs from 0 to 10, but none of them gave good accuracy),
4) load the dictation grammar,
5) load the audio file.
The SpeechRecognized event fires as well, and I use PhraseInfo.GetText to get the recognized results. However, the accuracy is still very bad, and I can't believe the engine's accuracy is really this low. Could you tell me what I did wrong?

mmarkoe_admin (Admin Group)
Posted: 24/Sep/2009 at 3:05pm
Originally posted by carolfly:

The SpeechRecognized event fires as well, and I use PhraseInfo.GetText to get the recognized results. However, the accuracy is still very bad, and I can't believe the engine's accuracy is really this low. Could you tell me what I did wrong?

See my response above, posted 22/Sep/2009 at 12:27pm. This is why you cannot get good accuracy.
 
Marty

srinwantudey (Member; joined 16/Nov/2009; location: USA)
Posted: 16/Nov/2009 at 5:06am
Dear carolfly,
I would suggest posting your code here so that we can give you tips to improve it.