Averaging the Count/Timings -- a test

Started by zedd151, September 10, 2015, 03:49:57 AM

zedd151

I have been experimenting with timers and counters lately. I was wondering
if anyone else had thought about using averaging to smooth out the variances
that occur with the current timer and counter macros.

I made this little piece to test my theories. It runs for 128 repetitions,
and also runs through 10 loops. I did this to test for consistency. There is
still some variance, but overall I think that even up to 2% variance is
acceptable. Unless of course one has the means to turn off every other process
in the computer. lol. Hey what about testing in safe-mode. But then again I
guess it is better to test under 'normal' operating conditions - whatever
that means.

The display where it says 'average cycles' should be taken as a relative value,
and by no means do I purport that they are actual cycle count values. I have
doubts about that myself - I have set the 'Sleep' value to 10, where I saw
fairly good consistency. When set at 100, on my machine I saw wide variations.
At zero, it cut the alleged counts in half. I knew THAT wasn't right.

Here are some results with the 'algo' as in the attachment (using GlobalAlloc to
allocate two 1000-byte memory blocks, then GlobalFree to free them).
I ran 4 concurrent instances of the program.


2833 average cycles - 128 reps
2865 average cycles - 128 reps
2890 average cycles - 128 reps
2889 average cycles - 128 reps
2890 average cycles - 128 reps
2889 average cycles - 128 reps
2888 average cycles - 128 reps
2869 average cycles - 128 reps
2872 average cycles - 128 reps
2885 average cycles - 128 reps
Press any key to continue ...

2870 average cycles - 128 reps
2904 average cycles - 128 reps
2894 average cycles - 128 reps
2918 average cycles - 128 reps
2868 average cycles - 128 reps
2874 average cycles - 128 reps
2871 average cycles - 128 reps
2872 average cycles - 128 reps
2874 average cycles - 128 reps
2870 average cycles - 128 reps
Press any key to continue ...

2496 average cycles - 128 reps ; <--- one odd one (about 13% low, well outside 2%)
2853 average cycles - 128 reps
2876 average cycles - 128 reps
2892 average cycles - 128 reps
2900 average cycles - 128 reps
2866 average cycles - 128 reps
2862 average cycles - 128 reps
2854 average cycles - 128 reps
2862 average cycles - 128 reps
2856 average cycles - 128 reps
Press any key to continue ...

2900 average cycles - 128 reps
2503 average cycles - 128 reps ; <--- and another (also about 13% low)
2872 average cycles - 128 reps
2889 average cycles - 128 reps
2856 average cycles - 128 reps
2895 average cycles - 128 reps
2871 average cycles - 128 reps
2872 average cycles - 128 reps
2881 average cycles - 128 reps
2874 average cycles - 128 reps
Press any key to continue ...
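
A quick way to check how far each reading sits from the rest is to compare it with the run's median. The sketch below is Python rather than MASM (just for the arithmetic), it uses the ten readings of the third run above, and `percent_from_median` is a name made up for this example. The 2% threshold is simply the figure mentioned in the post.

```python
from statistics import median

# Readings of the third run listed above (average cycles per 128 reps).
run3 = [2496, 2853, 2876, 2892, 2900, 2866, 2862, 2854, 2862, 2856]

def percent_from_median(values):
    """Return each value's deviation from the median, in percent."""
    mid = median(values)
    return [100.0 * (v - mid) / mid for v in values]

for value, pct in zip(run3, percent_from_median(run3)):
    # Flag anything further than 2% from the median.
    flag = "  <-- outside 2%" if abs(pct) > 2.0 else ""
    print(f"{value}  {pct:+6.2f}%{flag}")
```

With these numbers the 2496 reading comes out roughly 13% below the median, while the other nine stay within about 1.5% of it.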



Here is my little gizmo...


ctr128reps
Regards, zedd...

jj2007

Averaging to smooth out is not sufficient. You typically have 5-10% outliers, which should be eliminated rather than included in the average. One rule of thumb is: the fastest run is the correct one, because it is unlikely that the CPU runs an algo faster than average. Think about it 8)

See http://www.masmforum.com/board/index.php?topic=18737.msg158573#msg158573 for more detail.
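
To see why the average and the fastest run can disagree, here is a small Python sketch (not MASM, only to show the arithmetic). The cycle counts are invented for illustration, and `summarize` is a hypothetical helper, not part of any timing macro.

```python
from statistics import mean

# Hypothetical cycle counts for ten passes over the same code (made-up numbers).
# Most passes agree; two were interrupted and came out slow.
samples = [152, 150, 150, 187, 150, 151, 150, 214, 150, 150]

def summarize(values):
    """Return (average, fastest) for a list of timings."""
    return mean(values), min(values)

average, fastest = summarize(samples)
print(f"average: {average:.1f}")   # pulled upward by the two slow passes
print(f"fastest: {fastest}")       # unaffected by them
```

The two slow passes drag the average up, but they cannot touch the minimum, which is the point of the rule of thumb.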

zedd151

Thanks jj.

I gotta lot to learn.  :redface:
Regards, zedd...

dedndave

Jochen's right - of course
we've spent many hours playing with it - perhaps no one more than him   :P

every so often, the OS demands some CPU time for housekeeping, no matter how much you hog it
these little "bumps" cause us to lose a lot of hair, as we are trying to get a repeatable reading - lol

there are a couple tips that can help, though....

first, select a single core to run on - many CPUs have multiple cores
some of these cores are "logical" cores (Hyper-Threading Technology)
some are physical cores
at the beginning of a test program, i generally do something like this...
        INVOKE  GetCurrentProcess             ;returns hProcess in EAX
        INVOKE  SetProcessAffinityMask,eax,1  ;no matter how many cores, bit 0 is for the first one
        INVOKE  Sleep,700                     ;bind to single core and settle time


the other tip is to select a loop count that yields about 0.5 seconds in each test pass
it can be longer, but that seems to work well

now - do as Jochen mentioned, and take the lowest result of, say, 5 test passes   :t

zedd151

Thanks dave. Yeah I think Jochen is related to Agner Fog in some way.   :lol:

Any tips for using the timer macros, or is it basically the same approach?

I'm just now starting to experiment with both.

edit = typos
Regards, zedd...

jj2007

Quote from: zedd151 on September 10, 2015, 06:38:14 AM
Any tips for using the timer macros, or is it basically the same approach?

The standard timer macros in the Lab were written by MichaelW, and work very well in many cases. The Pentium 4 is a notable exception.

My own tests often use the NanoTimer() macro, which is reliable, precise and simple to use but yields milliseconds, not cycles. For benchmarking, that's ok, and it's closer to real life situations. Check for yourself: test your own algos with the various macros and see which one delivers the most consistent results.

dedndave

this is what i use as a "timing template"
adjust the loop count for 0.5 second passes and take the lower reading
;###############################################################################################

        .XCREF
        .NoList
        INCLUDE    \Masm32\Include\Masm32rt.inc
        .686p
        .MMX
        .XMM
        INCLUDE    \Masm32\Macros\Timers.asm
        .List

;###############################################################################################

Loop_Count = 10000

;###############################################################################################

        .DATA

;***********************************************************************************************

        .DATA?

;###############################################################################################

        .CODE

;***********************************************************************************************

main    PROC

        INVOKE  GetCurrentProcess
        INVOKE  SetProcessAffinityMask,eax,1
        INVOKE  Sleep,750

        mov     ecx,5

Loop00: push    ecx

        counter_begin Loop_Count,HIGH_PRIORITY_CLASS

;code to time goes here

        counter_end

        print   str$(eax),32
        pop     ecx
        dec     ecx
        jnz     Loop00

        print   chr$(13,10)
        inkey
        INVOKE  ExitProcess,0

main    ENDP

;###############################################################################################

        END     main

zedd151

Quote from: dedndave on September 10, 2015, 06:54:52 AM
adjust the loop count for 0.5 second passes...

I see. I was manipulating the 'Sleep' timer.

I have seen several different values for the 'Sleep' timer, does that really make a big difference? i.e., 1000 vs 750 vs 500?

Regards, zedd...

dedndave

that only executes once, at initialization
i use 750, as much as i would like to reduce it to 500 - lol
it will help, if you run the test on a wide range of processors
on one specific machine, you might not notice much

so - it's 750ms to start
then, 500ms for the first result
the total time for 5 test passes and the init code = 3.25 seconds
that's the time i use to adjust loop count   :P

(3 to 4 seconds - close enough)
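
The arithmetic behind that budget, as a rough Python sketch (not MASM). Both function names and the trial numbers are made up for illustration; the idea is to time one short trial pass, scale the loop count so a pass lands near 0.5 seconds, then add up the settle time and the passes.

```python
def scale_loop_count(trial_count, trial_seconds, target_seconds=0.5):
    """Scale a trial loop count so that one pass takes about target_seconds."""
    return max(1, round(trial_count * target_seconds / trial_seconds))

def total_run_seconds(settle_seconds=0.75, pass_seconds=0.5, passes=5):
    """Settle time (the initial Sleep) plus the time for all test passes."""
    return settle_seconds + passes * pass_seconds

# Hypothetical trial: 1,000,000 loops took 0.25 s, so double the count.
print(scale_loop_count(1_000_000, 0.25))
print(total_run_seconds())
```

With the defaults, `total_run_seconds()` gives the 3.25 seconds quoted above (0.75 + 5 x 0.5).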

zedd151

Cool, dave. I'll check it out.  :icon14:


I'd actually prefer my sleep timer to be at least 6 hours myself  :lol:
Regards, zedd...

zedd151

Quote from: dedndave on September 10, 2015, 06:54:52 AM
this is what i use as a "timing template"
adjust the loop count for 0.5 second passes and take the lower reading

        ~~~

        print   str$(eax),32
        pop     ecx
        dec     ecx
        jnz     Loop00

        print   chr$(13,10)
        ~~~


Shouldn't it be ...


        print   str$(eax),32
        print   chr$(13,10)
        pop     ecx
        dec     ecx
        jnz     Loop00


... to have the timings listed vertically?
Regards, zedd...

zedd151

I did some further testing and investigation.
dedndave gave me his counter template, and I ran a shoot-out test
between his template and mine. Here are the results...

dedndaves...


10  - dedndaves counter template
15  - dedndaves counter template
15  - dedndaves counter template
15  - dedndaves counter template
15  - dedndaves counter template
15  - dedndaves counter template
15  - dedndaves counter template
15  - dedndaves counter template
15  - dedndaves counter template
15  - dedndaves counter template


And the exact template used in the comparison:


        .XCREF
        .NoList
        INCLUDE    \Masm32\Include\Masm32rt.inc
    .686p
        .MMX
        .XMM
        INCLUDE    \Masm32\Macros\Timers.asm
        .List
        Loop_Count = 50000000 ; adjusted for ~~ .5 seconds on my machine
    .DATA
    .DATA?
    .CODE

    main    PROC
        INVOKE  GetCurrentProcess
        INVOKE  SetProcessAffinityMask, eax, 1
        INVOKE  Sleep, 750
        mov     ecx, 10
    Loop00: push    ecx
        counter_begin Loop_Count, HIGH_PRIORITY_CLASS
; ------------------------------------------------------------------------------
        xor ebx, ebx
        @@:
        inc ebx
        cmp ebx, 10
        jnz @b
        @@:
; ------------------------------------------------------------------------------
        counter_end
        print   str$(eax), 32, " - dedndaves counter template"
        print   chr$(13, 10)
        pop     ecx
        dec     ecx
        jnz     Loop00
        inkey
        INVOKE  ExitProcess, 0
    main    ENDP

    heavyload proc
        ret
    heavyload endp

        END     main



mine... I modified mine to repeat only 16 times.


17 average cycles - 16 reps - zedds counter
15 average cycles - 16 reps - zedds counter
15 average cycles - 16 reps - zedds counter
15 average cycles - 16 reps - zedds counter
15 average cycles - 16 reps - zedds counter
15 average cycles - 16 reps - zedds counter
15 average cycles - 16 reps - zedds counter
15 average cycles - 16 reps - zedds counter
15 average cycles - 16 reps - zedds counter
15 average cycles - 16 reps - zedds counter


And the exact template used for the comparison:



        include \masm32\include\masm32rt.inc
        .686
        include \masm32\macros\timers.asm
       
    .data
        ctx         dd 0
        loopctr     dd 0
    .code
    start:

        mov loopctr, 10 ; how many individual tests to be printed
       
        top:

        mov ctx, 0
        repeat  16      ; repetitions in test
        invoke Sleep, 1
        counter_begin 2800000, HIGH_PRIORITY_CLASS ; value adjusted for stability
; ------------------------------------------------------------------------------
        xor ebx, ebx
        @@:
        inc ebx
        cmp ebx, 10
        jnz @b
        @@:
; ------------------------------------------------------------------------------
        counter_end
        add ctx, eax
        endm
        mov eax, ctx
        shr eax, 4
        print str$(eax), 20h, "average cycles - 16 reps - zedds counter", 13, 10
        dec loopctr
        cmp loopctr, 0
        jg top
        inkey
        exit

    end start



I still think that majority rules. Even in some of the other 'cycle counting' threads,
there are some anomalies on either side of the majority count value.

I would be interested if a few other members would run the same two tests as posted on their
machines and post their results. Would be appreciated.

I know I'm just a new guy, but I could be onto something.
Regards, zedd...

rrr314159

Quote from: zedd151
I still think that majority rules.

- Maybe, depends why you're timing. If you're comparing two algos the minimum is right, as jj2007 explained. But if you want to estimate how long it takes under real-world conditions - which includes the thread being interrupted by the scheduler, and whatever - then (I think) majority rules is right

Quote from: zedd151
I would be interested if a few other members would run the same two tests as posted on their machines and post their results. Would be appreciated.

- I'd like to help, but unfortunately ...

Quote from: zedd151
I know I'm just a new guy ...

- Forum rules are, we can't make test runs for new guys. (If you think that's unfair, complain to hutch - he makes the rules). As soon as you become an old guy, let me know
I am NaN ;)

zedd151

Quote from: rrr314159 on September 11, 2015, 02:20:17 AM

- Maybe, depends why you're timing. If you're comparing two algos the minimum is right, as jj2007 explained....

With advice from several members including rrr314159, Jochen, and Dave (dedndave), I have reconsidered the approach first presented in this thread.
If testing with real-world conditions in mind, then taking the average is ok. But when trying to achieve the minimum cycle count, or time, then the lowest
possible result is the correct way to go.
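
The two rules can give different answers on the same data. The Python sketch below (not MASM, only to show the comparison) uses the readings posted earlier in this thread for both templates; `lowest_and_majority` is a name invented for this example.

```python
from statistics import mode

# Readings posted earlier in this thread.
dave_readings = [10, 15, 15, 15, 15, 15, 15, 15, 15, 15]
zedd_readings = [17, 15, 15, 15, 15, 15, 15, 15, 15, 15]

def lowest_and_majority(values):
    """Return (lowest reading, most frequent reading)."""
    return min(values), mode(values)

print("dedndave template:", lowest_and_majority(dave_readings))
print("zedd counter:     ", lowest_and_majority(zedd_readings))
```

For the first list the lowest reading is 10 while the majority value is 15; for the second list both rules give 15. Which one to report depends on what the timing is for.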

Now in light of all I have learned regarding timing, counting and MichaelW's counter/timer algorithms - I have created two brand new testbeds.

Each of them takes 50 samples (yes they loop 50 times), and only the lowest value is stored and displayed when the program is finished running.

Why 50? you might ask - (originally I had it at 100) - since the accuracy increases with sample size I decided to go with a higher number than the
3, 4, 5 I have seen in some of the threads here in the lab. I have tested, tested and tested. I am quite pleased with the results, but would like
some input from other members. Attached are both testbeds. I used 'windows.inc' as yet another guinea pig for my testing. The algo in the testbeds
is a rather slow line counter. You can replace it with an algo of known timing/cycle count. This will help in determining the accuracy of not only my
testing methods, but perhaps others as well.
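
One property of the keep-the-lowest approach is easy to show: adding samples can only lower the stored value or leave it where it is. A short Python sketch (not MASM); the readings are made up and `running_minimum` is a hypothetical helper.

```python
from itertools import accumulate

def running_minimum(samples):
    """Lowest value seen so far after each sample."""
    return list(accumulate(samples, min))

# Hypothetical readings from successive samples (invented numbers).
readings = [2890, 2872, 2904, 2866, 2868, 2853, 2870, 2856]

for count, low in enumerate(running_minimum(readings), start=1):
    print(f"after {count:2d} samples: lowest = {low}")
```

So a larger sample count costs run time, but it never makes the reported minimum worse.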

Thanks in advance, zedd151.

Regards, zedd...

hutch--

> - Forum rules are, we can't make test runs for new guys. (If you think that's unfair, complain to hutch - he makes the rules). As soon as you become an old guy, let me know

What is this crock of crap ?
http://www.masm32.com    :biggrin:  :skrewy: