MSSpeech-Forum Homepage
Speech Recognition for a wave file

gkhanna (Member, joined 07/Apr/2009)
Posted: 07/Apr/2009 at 2:55am

I am working on a project that recognizes the speech in a wave file, using the SpeechRecognitionEngine class in .NET Framework 3.5. I want to display each word that is spoken in the wave file, like movie subtitles where every spoken sentence is displayed as text at the bottom of the screen. The wave file could contain a person speaking or a song. When I use the Recognize function, it displays only the first 4-5 words from the wave file, they are completely wrong, and the confidence level is 0.004. When I use the asynchronous RecognizeAsync function, no speech is detected at all. I have also tried SAPI 5.3, but I get wrong results.

Below is my code:

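Imports System.IO
Imports System.Speech.Recognition

Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim rec As New SpeechRecognitionEngine()

        ' Open the wave file and use it as the recognizer's audio input.
        Dim fs As New FileStream("C:\Test.wav", FileMode.Open, FileAccess.Read)
        rec.SetInputToWaveStream(fs)

        ' Free-text dictation grammar (no fixed list of commands).
        rec.LoadGrammar(New DictationGrammar())

        AddHandler rec.SpeechRecognized, AddressOf SpeechRecognized_Click
        AddHandler rec.RecognizeCompleted, AddressOf RecognizedComplete_Click

        ' Keep recognizing phrase after phrase instead of stopping after the first one.
        rec.RecognizeAsync(RecognizeMode.Multiple)
End Sub

' Raised once for each phrase the engine recognizes.
Private Sub SpeechRecognized_Click(ByVal sender As Object, ByVal e As SpeechRecognizedEventArgs)
        Dim result As RecognitionResult = e.Result
        ' Append a space so that consecutive phrases do not run together.
        RichTextBox1.AppendText(result.Text & " ")
End Sub

' Raised when the asynchronous recognition operation finishes.
Private Sub RecognizedComplete_Click(ByVal sender As Object, ByVal e As RecognizeCompletedEventArgs)
        If e.Error IsNot Nothing Then
                MsgBox("Recognition failed: " & e.Error.Message)
        ElseIf e.Result IsNot Nothing Then
                ' e.Result can be Nothing, so check it before reading Text.
                MsgBox(e.Result.Text)
        Else
                MsgBox("Recognition completed.")
        End If
End Sub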
The link below points to the wave file that I am using as input.
 
Thanks,
Gaurav Khanna
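
For the subtitle-style display described in this post, the following is a minimal console sketch, not a definitive implementation. It assumes the same C:\Test.wav path as above and a recognizer installed for the system language; the module and handler names (WaveFileSubtitles, OnRecognized, OnRejected, OnCompleted) are made up for illustration. It prints each recognized phrase with its position in the audio and its confidence, reports rejected phrases instead of silently dropping them, and waits for RecognizeCompleted before exiting.

```vbnet
Imports System
Imports System.Speech.Recognition
Imports System.Threading

Module WaveFileSubtitles
    ' Signalled when the engine reports that recognition has finished.
    Private ReadOnly done As New ManualResetEvent(False)

    Sub Main()
        Using rec As New SpeechRecognitionEngine()
            ' Path taken from the post above; replace with your own file.
            rec.SetInputToWaveFile("C:\Test.wav")
            rec.LoadGrammar(New DictationGrammar())

            AddHandler rec.SpeechRecognized, AddressOf OnRecognized
            AddHandler rec.SpeechRecognitionRejected, AddressOf OnRejected
            AddHandler rec.RecognizeCompleted, AddressOf OnCompleted

            ' Multiple mode continues past the first phrase.
            rec.RecognizeAsync(RecognizeMode.Multiple)
            done.WaitOne()
        End Using
    End Sub

    Private Sub OnRecognized(ByVal sender As Object, ByVal e As SpeechRecognizedEventArgs)
        ' AudioPosition is the offset of the phrase within the input audio.
        Dim position As TimeSpan = TimeSpan.Zero
        If e.Result.Audio IsNot Nothing Then position = e.Result.Audio.AudioPosition
        Console.WriteLine("[{0}] {1} ({2:F3})", position, e.Result.Text, e.Result.Confidence)
    End Sub

    Private Sub OnRejected(ByVal sender As Object, ByVal e As SpeechRecognitionRejectedEventArgs)
        Console.WriteLine("[rejected] audio was detected but no phrase matched with enough confidence")
    End Sub

    Private Sub OnCompleted(ByVal sender As Object, ByVal e As RecognizeCompletedEventArgs)
        If e.Error IsNot Nothing Then Console.WriteLine("Error: " & e.Error.Message)
        Console.WriteLine("Finished; end of input reached: " & e.InputStreamEnded.ToString())
        done.Set()
    End Sub
End Module
```

Logging the rejected phrases and the completion status makes it easier to tell "nothing was detected" apart from "something was detected but rejected", which the Windows Forms code above cannot distinguish.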
mmarkoe (Moderator Group, joined 24/Jul/2008)
Posted: 07/Apr/2009 at 9:25am
Originally posted by gkhanna:

I am working on a project that recognizes the speech in a wave file, using the SpeechRecognitionEngine class in .NET Framework 3.5. I want to display each word that is spoken in the wave file, like movie subtitles where every spoken sentence is displayed as text at the bottom of the screen. The wave file could contain a person speaking or a song. When I use the Recognize function, it displays only the first 4-5 words from the wave file, they are completely wrong, and the confidence level is 0.004. When I use the asynchronous RecognizeAsync function, no speech is detected at all. I have also tried SAPI 5.3, but I get wrong results.
 
It sounds to me as if you have been watching too many Star Wars movies and have forgotten that this is the 21st century, not the 25th. There is not enough computing power in the world to do what you are asking for. Large-vocabulary speech recognition today is capable of very high accuracy. However, it is speaker dependent. To work optimally (I would say usefully, to the point where you can understand what the speakers said), it needs to be trained to an individual's voice. Speech recognition software does not just listen for the sounds of words; it also compares each word to the words around it for context clues.
 
Therefore, large-vocabulary speech recognition works best when:
  1. An individual speaker creates a user training profile that is unique to the audio quality of their voice and includes samples of their typical syntax.
  2. The speaker enunciates each word clearly.
  3. The speaker speaks in phrases.
  4. The speaker says the punctuation marks aloud to indicate pauses and the beginnings of sentences and paragraphs.

It is highly unlikely that a public speaker would be willing to do the above. Also, when you try to decipher more than one voice, the task becomes geometrically more difficult.
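
One way to see how far an untrained recognizer gets on a given recording is to look at the confidence the engine reports for each word. The sketch below is an illustration under stated assumptions rather than a recommended design: the 0.5 cut-off, the module name ConfidenceFilter and the helper MaskUncertainWords are hypothetical, and the file path is the one from the original post. Because Recognize() performs a single recognition operation, this handles only the first phrase in the file.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Speech.Recognition

Module ConfidenceFilter
    ' Hypothetical cut-off; tune it against your own recordings.
    Private Const MinWordConfidence As Single = 0.5F

    ' Returns the phrase with low-confidence words replaced by "[?]".
    Function MaskUncertainWords(ByVal phrase As RecognizedPhrase) As String
        Dim parts As New List(Of String)
        For Each word As RecognizedWordUnit In phrase.Words
            If word.Confidence >= MinWordConfidence Then
                parts.Add(word.Text)
            Else
                parts.Add("[?]")
            End If
        Next
        Return String.Join(" ", parts.ToArray())
    End Function

    Sub Main()
        Using rec As New SpeechRecognitionEngine()
            rec.SetInputToWaveFile("C:\Test.wav")
            rec.LoadGrammar(New DictationGrammar())

            ' A single recognition operation: only the first phrase is handled.
            Dim result As RecognitionResult = rec.Recognize()
            If result Is Nothing Then
                Console.WriteLine("No phrase was recognized.")
            Else
                Console.WriteLine("{0} (phrase confidence {1:F3})", MaskUncertainWords(result), result.Confidence)
            End If
        End Using
    End Sub
End Module
```

With a phrase confidence as low as the 0.004 reported in the question, most or all words would likely fall below any reasonable cut-off, which is consistent with the point above about untrained, speaker-independent input.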

Marty Markoe, MVP
Microsoft Valued Partner
See us at: http://www.mymsspeech.com