Simple pulse counter and emitter program

Started by ImmortalPr1nce, September 07, 2013, 01:53:49 AM


Paulo

Hi georg

I'm not sure what you mean by CV.

As regards speech/voice recognition, you have a few choices:

1) Use a library or add-on such as:
(Note that I have never used them)

http://voce.sourceforge.net/

http://code.google.com/p/dragonfly/

2) Try WSR (Windows Speech Recognition).
It was first released with Vista but is also included with Windows 7; I just tried it on my Windows 7 box and it's there.
See here:

http://windows.microsoft.com/en-us/windows-vista/common-commands-in-speech-recognition

3) If you just want to detect speech and are not interested in the actual words, you may want to try a bit of FFT. You may even be able to code something on a micro such as the ATmega range.

4) Human speech is mainly concentrated in the range 300 Hz to around 3 kHz, so perhaps experiment with several tone decoders.
The more you use, the fewer false triggers you will get.
I don't know how accurate and reliable this method will be.
Perhaps something like this:



Datasheet for LM567 tone decoders here:
http://www.ti.com/lit/ds/symlink/lm567.pdf

To make a more "intelligent" circuit, replace the AND gate with a micro, then you could program the micro to give a valid output
if certain combinations of tones/frequencies are present.

You could also replace the tone decoders with active bandpass filters (using op-amps) and then rectify each frequency output.
See here:
http://www.seattlerobotics.org/encoder/200003/AudioDetector.gif

Although in my circuit above, I chose random frequencies (between 300 and 3000Hz), you may want to be more scientific
about it and choose them according to this:
http://www.newscientist.com/article.ns?id=dn4031

5) Use a DSP chip like those available from TI.
http://www.ti.com/lsds/ti/dsp/overview.page
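For anyone who wants to try the tone-decoder idea from option 4 in software before building it, here is a minimal sketch. It uses the Goertzel algorithm as a stand-in for each LM567 (my substitution, not part of the circuit above), and `all(...)` plays the role of the AND gate. The function names, the tone frequencies and the threshold are all hypothetical, chosen only for illustration.

```python
import math

def goertzel_power(samples, sample_rate, target_hz):
    """Return |X(k)|^2 for the DFT bin nearest target_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)       # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2         # second-order resonator
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def tones_present(samples, sample_rate, tones_hz, threshold=0.01):
    """One True/False per tone, like the outputs of a bank of tone decoders.

    The power is divided by n^2 so the threshold does not depend on the
    block length (a unit-amplitude sine on an exact bin gives about 0.25).
    The default threshold is an assumption, not a measured value.
    """
    n = len(samples)
    return [goertzel_power(samples, sample_rate, f) / n ** 2 > threshold
            for f in tones_hz]

def and_gate(flags):
    """Hardware AND gate equivalent: valid only if every tone is detected."""
    return all(flags)
```

Replacing `and_gate` with any other rule over the flags (for example "at least three of five") is the software version of swapping the AND gate for a micro.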

dedndave

i think they use FFT in software, rather than hardware, to do that

Paulo

Hi dedndave
Agree 100% that proper voice recognition is done in software, but if all one wants is to detect whether there is speech, rather than to detect actual words, then either software or hardware can be used.
The hardware option may be simpler to implement but will certainly not be as accurate or flexible as using a pure software based FFT.
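As a rough illustration of the software route, the sketch below measures how much of a block's spectral energy falls inside a chosen band, such as the 300 Hz to 3 kHz range mentioned earlier. It uses a plain (slow) DFT rather than a real FFT to stay dependency-free, and the function name and any decision threshold you apply to the result are assumptions on my part.

```python
import cmath
import math

def band_energy_fraction(samples, sample_rate, low_hz, high_hz):
    """Fraction of spectral energy (DC excluded) between low_hz and high_hz.

    Uses a direct O(n^2) DFT, so keep the block short; a real FFT would
    give the same numbers much faster.
    """
    n = len(samples)
    total = in_band = 0.0
    for k in range(1, n // 2):                   # positive frequencies, skip DC
        x_k = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        energy = abs(x_k) ** 2
        total += energy
        if low_hz <= k * sample_rate / n <= high_hz:
            in_band += energy
    return in_band / total if total else 0.0
```

A detector could then flag "speech probably present" when the fraction stays above some level for several consecutive blocks; what that level should be would have to be found by experiment.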

georg


georg

As always, it is not a bad idea to do it in hardware. The FFT is fine, especially if you can trigger an alarm and record a few minutes of sample, as you did in your motion detector.

Paulo

Quote from: georg on September 13, 2013, 01:06:14 AM
Hi Paulo CV, stands for 'computer vision'

Aaahh OK got it now.

Quote from: georg
As always, it is not a bad idea to do it in hardware. The FFT is fine, especially if you can trigger an alarm and record a few minutes of sample, as you did in your motion detector.

Did a quick frequency analysis and below is the result showing frequencies and closest note on musical scale.


So if you decide on the hardware option, set up say 12 (or more) "monitoring" frequencies ranging from A4 (440 Hz) to around F7 (2793 Hz), as most speech will fall between these,
then get a micro to give an output if more than 75% of them are present.
The chances of other "noises" containing such a high percentage of these frequencies will be rather low.
Of course it will be a compromise: a high percentage rejects more of the other "noises" but may miss some speech,
while a lower percentage misses less speech but increases the chances of a false trigger.
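To make the numbers concrete, here is a small sketch of the voting logic a micro might run. It picks 12 equal-tempered notes spread between A4 and F7 (F7 is 32 semitones above A4) and applies the "more than 75%" rule to a list of detector outputs. The function names and the even spacing of the notes are assumptions of mine; any set of monitoring frequencies in that range would do.

```python
def note_frequencies(count=12, low_semitone=0, high_semitone=32, a4_hz=440.0):
    """Return `count` equal-tempered note frequencies from A4 up to F7.

    Semitone offsets are measured from A4 (offset 0 = 440 Hz) and are
    spread as evenly as whole semitones allow.
    """
    step = (high_semitone - low_semitone) / (count - 1)
    semitones = [round(low_semitone + i * step) for i in range(count)]
    return [a4_hz * 2 ** (s / 12) for s in semitones]

def speech_likely(present_flags, min_fraction=0.75):
    """True if more than min_fraction of the monitored notes were detected."""
    return sum(present_flags) / len(present_flags) > min_fraction
```

Raising `min_fraction` rejects more non-speech noise at the cost of missing some speech, and lowering it does the opposite, which is the compromise described above.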

Frequencies of musical scale here:
http://www.phy.mtu.edu/~suits/notefreqs.html

It all depends on what you are trying to achieve with your project.
If you want to build something just to tinker and for the educational value, then go the hardware route.
However, if you want to make a commercial product, then I would say rather go the computer + software FFT route.
These days computers are very cheap for the processing power you get for your buck (even an entry-level PC will have more than enough grunt).
Keep in mind that when I did the LDR motion sensor, computers did not have the capability they have now, and stand-alone DSP chips, PICs and ATmegas did not exist,
so I had to make do with what was available.

georg

I'm very aware of the limitations at that time, Paulo, but your idea is very good because it is easy to implement and very, very cheap. I'll keep your suggestions in mind, very good approach  :eusa_boohoo:

dedndave

some amateur radio equipment uses 300 to 2400 Hz
that's probably a little tight for someone with a high voice, like a female

but, for speech recognition, you could probably get all you need from 350 to 2450 Hz
likely you don't even need that much

Paulo

Hi dedndave

Interesting point, so I ran another frequency plot, but this time with a woman saying hello.
Below is the unfiltered response:



And with filtering.
As can be seen, most of the speech is still within the pass band so unless the person has just inhaled a lot of helium gas,
350 to 2400Hz should be fine.  :biggrin:


Paulo

georg

Glad you liked the idea.
Sometimes doing things in an unconventional manner pays off   :t