FIR Audio Spectrum Analyzer

Started by Siekmanski, September 03, 2014, 10:14:36 PM

Siekmanski



Digital Signal Processing Example,

How you can use FIR filters to analyze sound.
How to calculate the FIR coefficients for low-pass, high-pass, band-pass and band-reject filters with a Hamming window.
How you can screen-synchronize the audio with DirectSound and Direct3D9.
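
A minimal sketch of the coefficient calculation, assuming the common windowed-sinc method (the attached source may do it differently). The function names, the odd tap count and the unity-gain normalisation are illustrative choices, not taken from the attachment. High-pass, band-reject and band-pass are derived from the low-pass by spectral inversion and addition.

```python
import cmath
import math

def hamming(num_taps):
    """Hamming window of length num_taps."""
    m = num_taps - 1
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / m) for n in range(num_taps)]

def lowpass(cutoff_hz, sample_rate, num_taps):
    """Windowed-sinc low-pass coefficients, normalised to unity gain at DC."""
    if num_taps < 3 or num_taps % 2 == 0:
        raise ValueError("num_taps must be odd and at least 3")
    fc = cutoff_hz / sample_rate              # cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    window = hamming(num_taps)
    h = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2.0 * fc if k == 0 else math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        h.append(ideal * window[n])
    total = sum(h)
    return [c / total for c in h]

def highpass(cutoff_hz, sample_rate, num_taps):
    """Spectral inversion: unit impulse minus the low-pass."""
    h = [-c for c in lowpass(cutoff_hz, sample_rate, num_taps)]
    h[(num_taps - 1) // 2] += 1.0
    return h

def bandreject(low_hz, high_hz, sample_rate, num_taps):
    """Low-pass below low_hz plus high-pass above high_hz."""
    lp = lowpass(low_hz, sample_rate, num_taps)
    hp = highpass(high_hz, sample_rate, num_taps)
    return [a + b for a, b in zip(lp, hp)]

def bandpass(low_hz, high_hz, sample_rate, num_taps):
    """Spectral inversion of the band-reject filter."""
    h = [-c for c in bandreject(low_hz, high_hz, sample_rate, num_taps)]
    h[(num_taps - 1) // 2] += 1.0
    return h

def gain(h, freq_hz, sample_rate):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    return abs(sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h)))
```

More taps give a narrower transition band at the cost of more multiplications per output sample; gain() is only there to check a design.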

I have rewritten everything to make things more flexible.
This example demonstrates a 1 octave 10 band analyzer.
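
The post does not list the band frequencies. As an illustration only, assuming ten octave bands with centers doubling from 31.25 Hz up to 16 kHz and band edges at center/sqrt(2) and center*sqrt(2), the band-pass limits could be tabulated like this (octave_bands is a hypothetical helper):

```python
import math

def octave_bands(lowest_center_hz=31.25, count=10):
    """Return (low_edge, center, high_edge) in Hz for `count` one-octave bands."""
    root2 = math.sqrt(2.0)
    bands = []
    for i in range(count):
        center = lowest_center_hz * 2.0 ** i   # each center is one octave above the previous
        bands.append((center / root2, center, center * root2))
    return bands

for low, center, high in octave_bands():
    print("%9.2f Hz  (%9.2f .. %9.2f)" % (center, low, high))
```

With these edges the upper limit of one band is the lower limit of the next, so the ten bands tile the spectrum without gaps.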

The source code has some text to explain things, and where to get additional information on this subject.

edit:

Line 1474 had an error: each bar has 4 vertices, and I was sending 10 vertices instead of 40 to the graphics card. ( see post of nidud )

    coinvoke    g_pD3DDevice,IDirect3DDevice9,DrawIndexedPrimitive,D3DPT_TRIANGLELIST,0,0,AudioBarCount,0,AudioBarCount*2
must be:
    coinvoke    g_pD3DDevice,IDirect3DDevice9,DrawIndexedPrimitive,D3DPT_TRIANGLELIST,0,0,AudioBarCount*4,0,AudioBarCount*2
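
Where the two counts in the corrected call come from, as a small sketch. It assumes each bar is a quad of 4 vertices drawn as 2 indexed triangles; the index order inside a quad is an assumption and may differ from the attachment.

```python
def quad_indices(bar_count):
    """Index list for bar_count quads, two triangles (6 indices) per quad."""
    indices = []
    for bar in range(bar_count):
        base = bar * 4                       # first vertex of this bar
        indices += [base, base + 1, base + 2,
                    base + 2, base + 1, base + 3]
    return indices

AUDIO_BAR_COUNT = 10
indices = quad_indices(AUDIO_BAR_COUNT)

num_vertices = AUDIO_BAR_COUNT * 4           # NumVertices argument: 40
primitive_count = len(indices) // 3          # PrimitiveCount argument: 20
```

NumVertices has to cover every vertex the indices can reach (0 to 39 here), which is why AudioBarCount alone was too small.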

Added a routine ( Reset3DEnvironment ) in case the graphics device has been lost.
It will restore the vertex and index buffers and reset the graphics device.

Corrected 2 bugs:
There was a buffer overrun when calculating the frequency bands. ( corrected this )
And something very stupid: I was only calculating the left audio channel. ( corrected this )
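
Why the second bug is easy to make: stereo PCM is interleaved, so reading every other sample silently gives one channel only. A minimal sketch, assuming 16-bit little-endian stereo data (the post does not state the sample format) and a hypothetical helper name:

```python
import struct

def split_stereo_16(pcm_bytes):
    """Split interleaved 16-bit little-endian stereo PCM into (left, right) lists."""
    frames = len(pcm_bytes) // 4             # 2 channels * 2 bytes per frame
    samples = struct.unpack("<%dh" % (frames * 2), pcm_bytes[:frames * 4])
    left = list(samples[0::2])               # even positions: left channel
    right = list(samples[1::2])              # odd positions: right channel
    return left, right
```

Both lists can then be fed through the same band filters, or mixed to mono first if one set of bars is enough.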


New attachment,

Marinus
Creative coders use backward thinking techniques as a strategy.

dedndave

very nice Marinus - and small, too   :t

avcaballero

Thank you, Marinus. I'll have a look  :t

guga

Excellent work, Marinus.

Only one small issue: is it just me, or is the app leaking memory?
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

Gunther

You have to know the facts before you can distort them.

nidud

#5
deleted

Siekmanski

Thanks guys,

@ guga, are you sure about memory leakage? This proggy uses filemapping, could it be that?
But if you found a bug please let me know.

@ nidud, thanks for this bug report. I think I have found it...
Can you test this example?
If it works well I will correct the source code and post it in the first post of this thread.

Also learned today that the window width and height of a dialog program are not the same on all PCs  :biggrin:

nidud

#7
deleted

Siekmanski

Thanks, I'll make a new attachment.  :t

guga

Hi Marinus.

I'm not sure what caused the leakage. I had the antivirus open when I first ran your app, which made it "freeze" the directory the app was in. Most probably it was the antivirus scanning the app's allocated memory.

After a reboot, I saw that it has no problems.

But, since you mentioned it... why use filemapping instead of simply using heap-allocated memory?

Also, can you make it editable (and save the results)? I mean, it would be nice to control the different frequencies, for example if I want to hear only the frequencies between 400 Hz and 800 Hz (human voice, I presume). Pitch identification would be nice too (not pitch control, but pitch identification, to make it easier to isolate the human voice, for example).

Siekmanski

@ guga, why use filemapping?

I just map the Wav file into the process space, which gives me a pointer to the beginning of the Wav file data in the process's own memory space.
So I have easy access to the Wav data.
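
The same idea sketched with Python's mmap module in place of the Win32 file-mapping calls the program presumably uses. find_wav_data and map_wav are hypothetical helpers; the chunk walk assumes a plain RIFF/WAVE layout.

```python
import mmap
import struct

def find_wav_data(view):
    """Return (offset, size) of the 'data' chunk inside a RIFF/WAVE byte view."""
    if view[0:4] != b"RIFF" or view[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(view):
        chunk_id = view[pos:pos + 4]
        (size,) = struct.unpack("<I", view[pos + 4:pos + 8])
        if chunk_id == b"data":
            return pos + 8, size
        pos += 8 + size + (size & 1)         # chunks are padded to an even length
    raise ValueError("no data chunk found")

def map_wav(path):
    """Map the file read-only and return (mapping, data_offset, data_size)."""
    with open(path, "rb") as f:
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        offset, size = find_wav_data(mapping)
    except ValueError:
        mapping.close()
        raise
    return mapping, offset, size
```

The sample data is then just mapping[offset:offset + size]; nothing is copied until it is touched, and the caller closes the mapping when done.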

By pitch identification, do you mean to have a variable frequency band width with a controllable center frequency?
This is possible with FIR filters, but then you have to calculate the coefficients on the fly.
( we can speed up this routine because the filter coefficients are symmetric, so we only have to calculate the first half of them )
Then save the filtered data to disk or send it to the sound card.
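
A rough sketch of that idea, assuming a Hamming-windowed band-pass defined by a center frequency and a band width. Only the first half of the taps (plus the middle one) is computed and the rest is mirrored. Both function names are hypothetical and the plain convolution loop is for illustration, not speed.

```python
import math

def bandpass_on_the_fly(center_hz, width_hz, sample_rate, num_taps):
    """Band-pass taps; computes the first half and mirrors it into the second."""
    if num_taps < 3 or num_taps % 2 == 0:
        raise ValueError("num_taps must be odd and at least 3")
    f1 = (center_hz - width_hz / 2.0) / sample_rate   # lower edge, cycles per sample
    f2 = (center_hz + width_hz / 2.0) / sample_rate   # upper edge, cycles per sample
    if not 0.0 < f1 < f2 < 0.5:
        raise ValueError("band must lie between 0 and half the sample rate")
    mid = (num_taps - 1) // 2
    h = [0.0] * num_taps
    for n in range(mid + 1):
        k = n - mid
        if k == 0:
            ideal = 2.0 * (f2 - f1)
        else:
            ideal = (math.sin(2.0 * math.pi * f2 * k)
                     - math.sin(2.0 * math.pi * f1 * k)) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        h[n] = ideal * window
        h[num_taps - 1 - n] = h[n]           # mirror: the taps are symmetric
    return h

def fir_filter(samples, h):
    """Direct convolution; the output has the same length as the input."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, c in enumerate(h):
            if i - j >= 0:
                acc += c * samples[i - j]
        out.append(acc)
    return out
```

For example, bandpass_on_the_fly(600.0, 400.0, 8000, 201) keeps roughly 400 Hz to 800 Hz; changing the center or width only means recomputing the taps.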

guga

Hi Marinus

Quote: By pitch identification, do you mean to have a variable frequency band width with a controllable center frequency?

Yes, but with a bit more control than simply adjusting the frequency or speed. I mean a way to identify the human voice by isolating it from the background music. (The center channel is not always the best way to do it, since the audio can have background noise or music mixed into the center too.)

There is a tool that does this, which better illustrates what I'm trying to say:
http://www.celemony.com/en/melodyne/what-is-melodyne

A similar tool is here: http://aubio.org

Another thing pitch identification makes possible is changing one person's voice into another's by matching formants. Below is a tool that "mimics" other people's voices almost perfectly.
http://www.voiceconverter.net
See the example "voice matching - spanish dubbing example". This app in fact uses an open-source program called Praat (source code included): http://www.fon.hum.uva.nl/praat
For Praat there is a tutorial on acoustic analysis, i.e. what the waveform, the spectrogram and the pitch curve tell you about durations, formants, pitches etc.: http://www.fon.hum.uva.nl/paul/papers/AcousticAnalysis8.pdf
http://www.fon.hum.uva.nl/paul/praat.html

One way to separate the human voice from the rest of the audio is to remove all frequencies that are outside the human voice band and then try to work out, through noise patterns, what is voice and what is noise. Or to do it without patterns, as Celemony or Voice Converter do.

Human voice frequency bands are described here:
http://en.wikipedia.org/wiki/Voice_frequency
http://en.wikipedia.org/wiki/Vocal_range
http://www.bnoack.com/index.html?http&&&www.bnoack.com/audio/speech-level.html
http://www.cs.cf.ac.uk/Dave/Multimedia/node271.html
http://www.axiomaudio.com/blog/audio-oddities-frequency-ranges-of-male-female-and-children%E2%80%99s-voices/


The identification of human voice can also be achieved by the crest factor.
http://www.bnoack.com/index.html?http&&&www.bnoack.com/audio/crestfactor.html
http://en.wikipedia.org/wiki/Crest_factor
http://www.spectrum-soft.com/news/spring2011/crest.shtm

An algorithm that isolates the frequencies and also computes the crest factor (above 15 dB is always human voice) can lead to better isolation. Since the peak used to compute the crest factor is an average of all peaks, and since RMS is used to calculate it, the isolation can be fine-tuned to remove frequencies that result in a crest factor lower than 15 dB.
The formula for the crest factor is
C = |a|/RMS
where a is the peak amplitude. For a sine wave
y = a*sin(2*Pi*Frequency*Time)
RMS = a/sqrt(2)

Since it is possible to compute the amplitude, it is also possible to calculate its average and therefore the RMS. The amplitude depends on frequency and time, and so does the resulting RMS, so it is possible to isolate frequencies by keeping only the parts where the crest factor is above 15 dB. This way, the frequencies and amplitudes not related to the human voice will be removed to fit crest factor > 15 dB.
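
For reference, a small sketch of the crest factor calculation in dB, using the absolute maximum as the peak (an assumption; the text above speaks of an average of peaks). The function names are illustrative.

```python
import math

def rms(samples):
    """Root mean square of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def crest_factor_db(samples):
    """Crest factor 20*log10(peak / RMS) of a block of samples."""
    level = rms(samples)
    if level == 0.0:
        raise ValueError("crest factor is undefined for silence")
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak / level)

# One second of a 1000 Hz sine at 8000 Hz: peak 1, RMS 1/sqrt(2)
sample_rate = 8000
sine = [math.sin(2.0 * math.pi * 1000.0 * n / sample_rate) for n in range(sample_rate)]
print(round(crest_factor_db(sine), 2))       # about 3.01 dB
```

A pure sine comes out at about 3 dB, in line with RMS = a/sqrt(2); whether a fixed 15 dB threshold separates voice from everything else would have to be checked on real material.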

basicsynth.com/uploads/AddendumCh6a.pdf
http://www.indiana.edu/~emusic/acoustics/amplitude.htm
http://www.dspguide.com/ch2/2.htm
http://recordingology.com/in-the-studio/distortion/square-wave-calculations/

For peak detection
http://www.wavemetrics.com/products/igorpro/dataanalysis/peakanalysis/peakfinding.htm
http://www.calculatefactory.com/calculate/formula/2611
http://www.ni.com/white-paper/4278/en/

Also, if using patterns, once you know which frequencies are not related to the human voice, you can try to find their patterns. For example, if a voice waveform is mixed with a non-voice one, the two superimpose and interfere with each other, producing a specific frequency. The idea is that at some point in time before the interference we may have non-voice frequencies that follow a specific pattern.

So, if we find this pattern we can cancel the non-voice frequency, including the part that is embedded in the human voice. This is done by cancellation of the waveform as described here:
http://holykaw.alltop.com/this-mathematical-formula-can-cancel-out-all
http://www-personal.umich.edu/~gowtham/bellala_EECS452report.pdf
Noise cancellation is also done by Audacity, iZotope RX and other apps, but I'm not sure if they use the same technique.

Siekmanski

Hi guga,

This is all very interesting stuff but not on my todo-list yet. ( maybe it would be if there were 64 hours in a day  :biggrin:)
Although it might be interesting to get the formants out of the speech audio and let a 3D head speak.
Have you ever done this kind of coding?
Are you a musician?

guga

Hi Marinus

Quote: This is all very interesting stuff but not on my todo-list yet. ( maybe it would be if there were 64 hours in a day )
I know what you mean.  :greensml:

Quote: Have you ever done this kind of coding?
Are you a musician?

I'm not a musician. My interest in audio editing is mainly because I'm a collector of old movies (I have around 10000 films/series/cartoons etc.) and I often edit/restore the audio of the videos I have (and the video itself too).
For the audio, I use tools like Audacity, Magix Audio Cleaner, iZotope RX, Melodyne, Sony Vegas, etc., and the voice recognition apps I mentioned, to try to recover the voice of the film's old narrators or dubbers.

The main problem with old movies (especially those I got from 16mm films) is the bad quality of the audio. Although the restoration process is a bit of fun, sometimes it takes too long to edit them in the "conventional" apps.

About coding for it: I haven't tried to code something like that yet. I have focused my little free time on video, but audio is something I'm really interested in trying to code eventually.

Zen

SIEKMANSKI,
I made a quick scan of your source code... it looks REALLY interesting.
However, I am running Windows Seven Professional and have only DirectX Version 11 installed on my system, so the initialization failed.
I get the "Unable to create DirectSound Object" MessageBox.
Could you provide us with some more information on which DirectSound version your program is built against? The source includes define the IDirectSound8 interfaces. So where can I get the correct DirectSound DLLs to get the application to run correctly?
Zen