Author Topic: Benchmark with minimum overhead  (Read 13843 times)

guga

  • Member
  • *****
  • Posts: 1282
  • Assembly is a state of art.
    • RosAsm
Re: Benchmark with minimum overhead
« Reply #15 on: December 19, 2015, 04:55:31 AM »
New version. Fixed a problem on Windows 2000, removed the manifest file (which was causing problems on Win8.1 x64), and tried to fix the slowdown problems and the bad identification of the CPU frequency.

Please let me know if the errors were fixed. Many thanks.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

TWell

  • Member
  • ****
  • Posts: 748
Re: Benchmark with minimum overhead
« Reply #16 on: December 19, 2015, 06:28:19 AM »
AMD again

guga

  • Member
  • *****
  • Posts: 1282
  • Assembly is a state of art.
    • RosAsm
Re: Benchmark with minimum overhead
« Reply #17 on: December 19, 2015, 06:54:39 AM »
Great, TWell. Many thanks.

It seems to work properly on an AMD. Did it take as long to finish this time? On my machine it takes around 2 seconds. (I'll consider adding a function to report the total running time only after I reach the stability I want, because if I have to use GetTickCount or some other way to measure it, that may interfere (again) with the computation.)

On my i7 870 at 2.93 GHz it averages:

162.32416152339977 nanoseconds with Algo1 (Cpuid/rdtsc)
161.72941143718396 nanoseconds with Algo2 (Lfence/rdtsc)
163.20196901594997 nanoseconds with Algo3 (Lfence/rdtsc/lfence)

But I'm still seeing a fluctuation of about 1 nanosecond after restarting the count. These variations are due to some minor interference in the new GetCpuFrequency function and the instability generated by the use of cpuid/rdtsc. On my machine, the 2nd, 5th and 7th algorithms seem the most stable.

I managed to reduce it a bit with GetCpuFrequency, but there still seems to be a margin of error of 1 or 2 nanoseconds here and there. Although this is not a problem for timing real functions, it causes errors when analysing very short code, such as only "mov eax 0" or "xor eax eax". The timings I was getting before implementing that were around 0.6 to 0.8 nanoseconds (which is closer to the figures from Intel and Agner Fog).
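
To put those figures on one scale, here is a small worked conversion between nanoseconds and clock cycles. It is only a sketch and assumes the counter ticks at the nominal 2.93 GHz quoted above; `CPU_HZ`, `cycles_to_ns` and `ns_to_cycles` are names made up for the illustration, not part of the app.

```python
# Assumption: the time-stamp counter ticks at the nominal 2.93 GHz of the i7 870.
CPU_HZ = 2.93e9


def cycles_to_ns(cycles, hz=CPU_HZ):
    """Convert a cycle count to nanoseconds at the given frequency."""
    return cycles * 1e9 / hz


def ns_to_cycles(ns, hz=CPU_HZ):
    """Convert nanoseconds to a cycle count at the given frequency."""
    return ns * hz / 1e9


one_cycle_ns = cycles_to_ns(1)
algo1_cycles = ns_to_cycles(162.32416152339977)  # Algo1 average from the post
jitter_cycles = ns_to_cycles(1.0)                # the ~1 ns fluctuation

print(f"1 cycle   = {one_cycle_ns:.3f} ns")       # 0.341 ns
print(f"162.32 ns = {algo1_cycles:.1f} cycles")   # 475.6 cycles
print(f"1 ns      = {jitter_cycles:.2f} cycles")  # 2.93 cycles
```

Under that assumption, a 1 to 2 ns margin of error is roughly 3 to 6 cycles, which is larger than the 0.6 to 0.8 ns (about 2 cycles) being measured for a single short instruction.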

Anyway, if the averages you are finding with the different algos don't vary much, then the app is working as planned :) I'll continue the analysis here to see if I can get more stability and accuracy.

mabdelouahab

  • Member
  • ***
  • Posts: 454
Re: Benchmark with minimum overhead
« Reply #18 on: December 19, 2015, 06:57:11 AM »
win8.1 x64

guga

  • Member
  • *****
  • Posts: 1282
  • Assembly is a state of art.
    • RosAsm
Re: Benchmark with minimum overhead
« Reply #19 on: December 19, 2015, 07:04:11 AM »
Great timings, mabdelouahab. Since you have rdtscp on your machine, try Algo7 or Algo5 and check the results.
I'm surprised it found a variance of 43! That wasn't supposed to happen. I'll review the code that collects the "good samples". A variance that high means some sample got through the function that was supposed to block it.

Although the minimum STD you found was 279 (which may represent the actual time your processor takes to compute it), it is weird to find such a high variance. It is not uncommon for CPUID to cause high variance, but at this level it means I need to review the function used to collect the samples.
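
For readers wondering what a "good samples" barrier could look like, here is one common way to do it. It is a sketch only and is not the filter the app actually uses (the thread does not show that code): it keeps samples within `k` median absolute deviations of the median. The function name, the parameters `k` and `min_spread`, and the sample values are all made up for the illustration.

```python
import statistics


def keep_good_samples(samples, k=3.0, min_spread=1):
    """Keep samples within k median-absolute-deviations of the median.

    Hypothetical filter: min_spread is a floor on the allowed distance so
    that a run of identical readings does not reject a neighbour that is
    only one tick away.
    """
    med = statistics.median(samples)
    mad = statistics.median([abs(s - med) for s in samples])
    limit = max(k * mad, min_spread)
    return [s for s in samples if abs(s - med) <= limit]


# Made-up cycle counts: mostly ~280, plus two readings hit by interference.
raw = [279, 280, 279, 281, 279, 280, 1450, 279, 280, 322]
good = keep_good_samples(raw)

print(good)  # [279, 280, 279, 281, 279, 280, 279, 280]
print("variance before:", statistics.pvariance(raw))
print("variance after: ", statistics.pvariance(good))
```

If a filter like this is in place and the variance is still high, then either the threshold is too loose or the interference is spread over many samples rather than a few spikes.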

mabdelouahab

  • Member
  • ***
  • Posts: 454
Re: Benchmark with minimum overhead
« Reply #20 on: December 19, 2015, 07:19:46 AM »



guga

  • Member
  • *****
  • Posts: 1282
  • Assembly is a state of art.
    • RosAsm
Re: Benchmark with minimum overhead
« Reply #21 on: December 19, 2015, 07:56:05 AM »
Many thanks. OK, you got more stability with Algo7. The variance is still high, but I guess the results can help me find out what is going on.  :t

The good news is that the difference between the population and sample standard deviations is almost zero, meaning that I'm on the right path in collecting the samples to be analysed :) :) :) The main problem here seems to be a matter of fine tuning.
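
A quick sketch of how the two standard deviations relate, using Python's `statistics` module on made-up cycle counts. One thing worth keeping in mind: the sample value is always the population value multiplied by sqrt(n / (n - 1)), so a near-zero gap mostly says that the number of samples n is large; it does not by itself say the samples are clean.

```python
import math
import statistics


def std_gap(samples):
    """Return (population std, sample std, their difference)."""
    pop = statistics.pstdev(samples)   # divides by n
    smp = statistics.stdev(samples)    # divides by n - 1
    return pop, smp, smp - pop


# Made-up cycle counts; the large set repeats the same values 2000 times.
small = [279, 281, 280, 283, 278]
large = small * 2000

for name, data in (("n=5", small), ("n=10000", large)):
    pop, smp, gap = std_gap(data)
    n = len(data)
    print(f"{name}: pop={pop:.6f} sample={smp:.6f} gap={gap:.6f} "
          f"ratio={smp / pop:.6f} expected={math.sqrt(n / (n - 1)):.6f}")
```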

guga

  • Member
  • *****
  • Posts: 1282
  • Assembly is a state of art.
    • RosAsm
Re: Benchmark with minimum overhead
« Reply #22 on: December 19, 2015, 08:05:53 AM »
But... wait, there's something weird. Why does your CPU frequency have two different values? The value at the bottom, retrieved from the CPUID string, says 1.70 GHz, but the value collected by Dave's function says 2.39 GHz?

I'll ask Dave which value is the correct one.

mabdelouahab

  • Member
  • ***
  • Posts: 454
Re: Benchmark with minimum overhead
« Reply #23 on: December 19, 2015, 09:22:12 AM »
Quote from: guga on December 19, 2015, 08:05:53 AM
But... wait, there's something weird. Why does your CPU frequency have two different values? The value at the bottom, retrieved from the CPUID string, says 1.70 GHz, but the value collected by Dave's function says 2.39 GHz?

I'll ask Dave which value is the correct one.

Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz 2.40GHz  :redface:



FORTRANS

  • Member
  • *****
  • Posts: 1077
Re: Benchmark with minimum overhead
« Reply #24 on: December 19, 2015, 09:37:38 AM »
Hi,

   Windows 2000 results.  Ran in less than a minute.  Much, much
quicker than before.

Cheers,

Steve N.

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 7537
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: Benchmark with minimum overhead
« Reply #25 on: December 19, 2015, 09:50:53 AM »
Hi Guga, this one works fine on my Win7 64 box.
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

guga

  • Member
  • *****
  • Posts: 1282
  • Assembly is a state of art.
    • RosAsm
Re: Benchmark with minimum overhead
« Reply #26 on: December 19, 2015, 10:42:26 AM »
Hi Guys

Many thanks. I'm working on it, trying to improve the accuracy. I'm glad it is now working on other OSes and processors.

Siekmanski

  • Member
  • *****
  • Posts: 2326
Re: Benchmark with minimum overhead
« Reply #27 on: December 19, 2015, 04:18:11 PM »
Windows 8.1 64bit

Algo 5 & 7

Creative coders use backward thinking techniques as a strategy.

Grincheux

  • Member
  • ***
  • Posts: 330
  • Never be pleased, Always improve
    • Asm for fun
Re: Benchmark with minimum overhead
« Reply #28 on: December 19, 2015, 09:04:07 PM »
I try to answer when someone asks for test results. Now I would like to know the goal of this program. Will it be useful for us?
Kenavo (Bye)
----------------------
Asm for Fun
My Links
"La garde meurt mais ne rend pas"
Cambronne à Waterloo

guga

  • Member
  • *****
  • Posts: 1282
  • Assembly is a state of art.
    • RosAsm
Re: Benchmark with minimum overhead
« Reply #29 on: December 19, 2015, 11:24:54 PM »
Hi Grincheux

Thanks. The info I'm looking for is how fast the different algorithms run. The purpose is to make a more accurate benchmark app. Current benchmark apps give only a rough approximation of the total time your code will take to run. The one I'm developing tries to be more stable and accurate by using different algorithms to measure time, trying to avoid all the overheads that may interfere with measuring the real time.
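
The general idea of removing measurement overhead can be sketched as follows. This is not the app's code: it uses Python's `time.perf_counter_ns` as a stand-in for rdtsc, and the function names are made up. It first measures the cost of two back-to-back clock reads, then subtracts that from the best time seen for the code under test.

```python
import time


def timer_overhead_ns(runs=10000):
    """Smallest observed cost of two back-to-back clock reads."""
    best = None
    for _ in range(runs):
        t0 = time.perf_counter_ns()
        t1 = time.perf_counter_ns()
        delta = t1 - t0
        if best is None or delta < best:
            best = delta
    return best


def measure_ns(fn, runs=10000):
    """Best observed time for fn(), with the timer overhead subtracted.

    Taking the minimum keeps the run least disturbed by interrupts or
    other processes; the result is clamped at zero in case the code is
    cheaper than the clock's resolution.
    """
    overhead = timer_overhead_ns(runs)
    best = None
    for _ in range(runs):
        t0 = time.perf_counter_ns()
        fn()
        t1 = time.perf_counter_ns()
        delta = t1 - t0
        if best is None or delta < best:
            best = delta
    return max(0, best - overhead)


if __name__ == "__main__":
    print("timer overhead:", timer_overhead_ns(), "ns")
    print("sum(range(1000)):", measure_ns(lambda: sum(range(1000))), "ns")
```

In Python the call overhead dwarfs a single instruction, so the numbers are not comparable to rdtsc timings; the sketch only shows the calibrate-then-subtract structure.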

Concerning usage: developers can use it to measure the time and performance of their apps. For example, in my free time I'm building a plugin for Sony Vegas. The algorithm that processes the image needs to be fast, otherwise the plugin can take hours to finish. So, to know exactly how fast the plugin will run, a better way to measure timings is needed, and the best way to know how much time it will take is to measure the functions you are writing, so you can try to improve your app.

In image or video processing, for example, the app you create needs to be fast and accurate, depending on the technique you are using. If you are trying to create a color transfer algorithm, for example, the app demands heavy computation, and optimizing it may or may not be needed. So, to get a more reliable estimate of how much time the app will take, you need a better benchmark tool.

And this is the purpose of the app I'm building: to give a better estimate of how long your code will take to run and how "stable" it will be. By stability I mean whether your code is easily influenced by other parts of the app or not.

To analyse the speed of the code you are benchmarking, you look at the resulting mean (the minimum values can also be used to analyse the timings).
To analyse the stability, you look at the STD and variance from different tests to see how much they differ between executions, and also whether too much "overhead" showed up during the tests.
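
As a rough illustration of that reading of the numbers, the sketch below summarises each test run and then checks how far a statistic drifts between runs. The cycle counts are made up, and `summarize` and `spread_across_runs` are hypothetical helper names, not functions from the app.

```python
import statistics


def summarize(samples):
    """Speed indicators (min, mean) and stability indicators (variance, std)."""
    return {
        "min": min(samples),
        "mean": statistics.mean(samples),
        "variance": statistics.pvariance(samples),
        "std": statistics.pstdev(samples),
    }


def spread_across_runs(runs, key="mean"):
    """How far a statistic drifts between separate test runs."""
    values = [summarize(run)[key] for run in runs]
    return max(values) - min(values)


# Made-up cycle counts from three executions of the same test.
runs = [
    [128, 128, 129, 128],
    [128, 130, 128, 128],
    [128, 128, 128, 131],
]

for i, run in enumerate(runs, start=1):
    print(f"run {i}: {summarize(run)}")

print("drift of the mean:", spread_across_runs(runs, "mean"))  # 0.5
print("drift of the min: ", spread_across_runs(runs, "min"))   # 0
```

In this made-up data the minimum is identical in every run while the mean drifts slightly, which is one reason the minimum is often used alongside the mean when judging speed.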

The speed of the code you are analysing (the mean) must be as fixed as possible, no matter which algo method you are using. After all, the time taken by the mnemonics in your function is fixed by the processor; the variation is minimal. So, if your function takes 128 clock cycles to run, the result should always be those 128 clock cycles, no matter how many times you test (benchmark) it.

Once I finish the analysis of the different methods, I'll try to choose the ones best suited to measuring timings and create an alternate DLL version for general use, so you can benchmark your own apps using the DLL. If you are coding with RosAsm, the executable version will be enough.