
Speech Recognition for a wave file

Printed From: MSSpeech-Forum
Category: Windows™ Speech Recognition Forums
Forum Name: New Users & General Questions
Forum Description: Ask questions, give and get answers.
URL: https://www.msspeech-forum.com/forum_posts.asp?TID=72
Printed Date: 28/Apr/2024 at 9:41pm


Topic: Speech Recognition for a wave file
Posted By: gkhanna
Subject: Speech Recognition for a wave file
Date Posted: 07/Apr/2009 at 2:55am

I am currently working on a project that recognizes the speech in a wave file, using the SpeechRecognitionEngine class in .NET Framework 3.5. I want to display each word that is said in the wave file, like the subtitles in a movie, where every spoken sentence is displayed as text below the picture. The wave file could contain a person speaking or a song. When I use the Recognize function, it displays only the first 4-5 words from the wave file; they are completely wrong and have a confidence level of 0.004. When I try it asynchronously (the RecognizeAsync function), no speech is detected at all. I have also tried SAPI 5.3, but I get wrong results.

Below is my code:

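' These Imports go at the top of the file; the project also needs a reference to System.Speech.
Imports System.IO
Imports System.Speech.Recognition

' Fields rather than locals, so the engine and stream remain referenced while RecognizeAsync runs.
Private rec As SpeechRecognitionEngine
Private fs As FileStream

Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        rec = New SpeechRecognitionEngine()
        fs = New FileStream("C:\Test.wav", FileMode.Open, FileAccess.Read)
        rec.SetInputToWaveStream(fs)

        rec.LoadGrammar(New DictationGrammar())

        AddHandler rec.SpeechRecognized, AddressOf SpeechRecognized_Click
        AddHandler rec.RecognizeCompleted, AddressOf RecognizedComplete_Click

        ' Multiple mode keeps recognizing phrase after phrase instead of stopping after the first.
        rec.RecognizeAsync(RecognizeMode.Multiple)
End Sub

Private Sub SpeechRecognized_Click(ByVal sender As Object, ByVal e As SpeechRecognizedEventArgs)
        Dim result As RecognitionResult = e.Result
        ' Append a space so consecutive phrases do not run together.
        RichTextBox1.AppendText(result.Text & " ")
End Sub

Private Sub RecognizedComplete_Click(ByVal sender As Object, ByVal e As RecognizeCompletedEventArgs)
        ' e.Result can be Nothing here, so check it before reading Text.
        Dim result As RecognitionResult = e.Result
        If result IsNot Nothing Then MsgBox(result.Text)
End Sub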

The link below contains the wave file that I am giving as input:

http://gauravkhanna.blog.co.in/files/2009/04/voicedemo.zip

Thanks,
Gaurav Khanna
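
On the point that Recognize displays only the first few words: each call to Recognize performs a single recognition operation, so one call returns one phrase. A minimal console sketch, assuming the same C:\Test.wav path and a project reference to System.Speech, calls it in a loop until it returns Nothing. The module name, the Describe helper, and the 0.5 confidence cutoff are hypothetical, chosen only for illustration:

```vbnet
Imports System
Imports System.Speech.Recognition

Module WaveTranscriber
    ' Hypothetical cutoff for illustration; tune it against real recordings.
    Private Const MinConfidence As Single = 0.5F

    ' Formats one phrase and marks results that fall below the cutoff.
    Function Describe(ByVal text As String, ByVal confidence As Single) As String
        Dim flag As String = If(confidence >= MinConfidence, "", " (low confidence)")
        Return String.Format("{0:0.000}  {1}{2}", confidence, text, flag)
    End Function

    Sub Main()
        Using rec As New SpeechRecognitionEngine()
            rec.LoadGrammar(New DictationGrammar())
            rec.SetInputToWaveFile("C:\Test.wav")

            ' One Recognize() call yields one phrase, so keep calling
            ' until the engine has nothing more to return.
            Dim result As RecognitionResult = rec.Recognize()
            While result IsNot Nothing
                Console.WriteLine(Describe(result.Text, result.Confidence))
                result = rec.Recognize()
            End While
        End Using
    End Sub
End Module
```

This only changes how much of the file is processed; it does not by itself make the recognized words more accurate. Recognize may also return Nothing before the end of the file, for example if a recognition operation times out, so treat the end of the loop as a hint rather than proof that the whole file was read.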



Replies:
Posted By: mmarkoe
Date Posted: 07/Apr/2009 at 9:25am
Originally posted by gkhanna:

I am currently working on a project that recognizes the speech in a wave file, using the SpeechRecognitionEngine class in .NET Framework 3.5. I want to display each word that is said in the wave file, like the subtitles in a movie, where every spoken sentence is displayed as text below the picture. The wave file could contain a person speaking or a song. When I use the Recognize function, it displays only the first 4-5 words from the wave file; they are completely wrong and have a confidence level of 0.004. When I try it asynchronously (the RecognizeAsync function), no speech is detected at all. I have also tried SAPI 5.3, but I get wrong results.
 
It sounds to me as if you have been watching too many Star Wars movies and have forgotten that this is the 21st century, not the 25th. There is not enough computing power in the world to do what you are asking. Large-vocabulary speech recognition today is capable of very high accuracy; however, it is speaker-dependent. This means that to work optimally (I would say usefully, to the point where it understands what speakers say), it needs to be trained to an individual's voice. Speech recognition software does not just listen for the sounds of words; it also compares each word with the words around it for context clues.
 
Therefore, large-vocabulary speech recognition works best when:
  1. An individual speaker creates a user training profile that is unique to their voice's audio quality and includes samples of their typical syntax.
  2. The speaker enunciates each word clearly while speaking.
  3. The speaker speaks in phrases.
  4. The speaker uses punctuation marks when speaking to indicate pauses and the beginnings of sentences and paragraphs.

It is highly unlikely that a public speaker would be willing to do the above. Also, when you try to decipher more than one voice, the task becomes geometrically more difficult.
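
The remark about context clues can be illustrated with a toy sketch: given the previous word, pick whichever sound-alike candidate most often follows it. This is only an illustration of the idea, not how any particular recognizer is implemented; the module, the function, and every count below are invented for the example.

```vbnet
Imports System
Imports System.Collections.Generic

Module ContextDemo
    ' Returns the candidate seen most often after the previous word.
    ' bigramCounts maps "previous candidate" pairs to how often they occurred.
    Function PickByContext(ByVal previous As String, ByVal candidates As String(), ByVal bigramCounts As Dictionary(Of String, Integer)) As String
        Dim best As String = Nothing
        Dim bestCount As Integer = -1
        For Each candidate As String In candidates
            Dim count As Integer = 0
            ' A pair that was never seen leaves count at 0.
            bigramCounts.TryGetValue(previous & " " & candidate, count)
            If count > bestCount Then
                best = candidate
                bestCount = count
            End If
        Next
        Return best
    End Function

    Sub Main()
        ' Hypothetical counts, invented purely for illustration.
        Dim counts As New Dictionary(Of String, Integer)
        counts.Add("went to", 40)
        counts.Add("went two", 1)
        counts.Add("went too", 3)
        counts.Add("the two", 25)

        Dim homophones As String() = {"to", "two", "too"}
        Console.WriteLine(PickByContext("went", homophones, counts)) ' prints "to"
        Console.WriteLine(PickByContext("the", homophones, counts))  ' prints "two"
    End Sub
End Module
```

The three candidates sound identical, so only the neighboring word separates them. This is also why a training profile with samples of the speaker's typical syntax helps.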

Marty Markoe



-------------
Marty Markoe, MVP
Microsoft Valued Partner
See us at: http://www.mymsspeech.com



