The MASM Forum

General => The Laboratory => Topic started by: 0000 on September 10, 2015, 03:49:57 AM

Title: Averaging the Count/Timings -- a test
Post by: 0000 on September 10, 2015, 03:49:57 AM
I have been experimenting with timers and counters lately. I was wondering
if anyone else had thought about using averaging to smooth out the variances
that occur with the current timer and counter macros.

I made this little piece, to test my theories. It runs for 128 repititions,
and also runs through 10 loops. I did this to test for consistency. There is
still some variance, but overall I think that even up to 2% variance is
acceptable. Unless of course one has the means to turn off every other process
in the computer. lol. Hey what about testing in safe-mode. But then again I
guess it is better to test under 'normal' operating conditions - whatever
that means.

The display where it says 'average cycles' should be taken as a relative value,
and by no means do I purport that they are actual cycle count values. I have
doubts about that myself - I have set the 'Sleep' value to 10, where I saw
fairly good consistency. When set at 100, on my machine I saw wide variations.
At zero, it cut the alledged counts in half. I knew THAT wasn't right,

Here are some results with the 'algo' as in the attachment (using GlobalAlloc to
allocate 2 1000 byte memory blocks, then GlobalFree to free them)
I ran 4 concurrent instances of the program.

Code: [Select]
2833 average cycles - 128 reps
2865 average cycles - 128 reps
2890 average cycles - 128 reps
2889 average cycles - 128 reps
2890 average cycles - 128 reps
2889 average cycles - 128 reps
2888 average cycles - 128 reps
2869 average cycles - 128 reps
2872 average cycles - 128 reps
2885 average cycles - 128 reps
Press any key to continue ...

2870 average cycles - 128 reps
2904 average cycles - 128 reps
2894 average cycles - 128 reps
2918 average cycles - 128 reps
2868 average cycles - 128 reps
2874 average cycles - 128 reps
2871 average cycles - 128 reps
2872 average cycles - 128 reps
2874 average cycles - 128 reps
2870 average cycles - 128 reps
Press any key to continue ...

2496 average cycles - 128 reps ; <--- one odd one (falls within 2%)
2853 average cycles - 128 reps
2876 average cycles - 128 reps
2892 average cycles - 128 reps
2900 average cycles - 128 reps
2866 average cycles - 128 reps
2862 average cycles - 128 reps
2854 average cycles - 128 reps
2862 average cycles - 128 reps
2856 average cycles - 128 reps
Press any key to continue ...

2900 average cycles - 128 reps
2503 average cycles - 128 reps ; <--- and another (also within 2%)
2872 average cycles - 128 reps
2889 average cycles - 128 reps
2856 average cycles - 128 reps
2895 average cycles - 128 reps
2871 average cycles - 128 reps
2872 average cycles - 128 reps
2881 average cycles - 128 reps
2874 average cycles - 128 reps
Press any key to continue ...


Here is my little gizmo...


 ctr128reps  (https://www.dropbox.com/s/ax5339l80elewds/ctr128reps.zip?dl=1)
Title: Re: Averaging the Count/Timings -- a test
Post by: jj2007 on September 10, 2015, 05:39:13 AM
Averaging to smooth out is not sufficient. You typically have 5-10% outliers, which should be eliminated rather than included in the average. One rule of thumb is: the fastest run is the correct one, because it is unlikely that the CPU runs an algo faster than average. Think about it 8)

See http://www.masmforum.com/board/index.php?topic=18737.msg158573#msg158573 for more detail.
Title: Re: Averaging the Count/Timings -- a test
Post by: 0000 on September 10, 2015, 05:52:04 AM
Thanks jj.

I gotta lot to learn.  :redface:
Title: Re: Averaging the Count/Timings -- a test
Post by: dedndave on September 10, 2015, 06:19:24 AM
Jochen's right - of course
we've spent many hours playing with it - perhaps noone more than him   :P

every so often, the OS demands some CPU time for housekeeping, no matter how much you hog it
these little "bumps" cause us to lose a lot of hair, as we are trying to get a repeatable reading - lol

there are a couple tips that can help, though....

first, select a single core to run on - many CPU's have multiple cores
some of these cores are "logical" cores (Hyper-Threading Technology)
some are physical cores
at the beginning of a test program, i generally do something like this...
Code: [Select]
        INVOKE  GetCurrentProcess             ;returns hProcess in EAX
        INVOKE  SetProcessAffinityMask,eax,1  ;no matter how many cores, bit 0 is for the first one
        INVOKE  Sleep,700                     ;bind to single core and settle time

the other tip is to select a loop count that yields about 0.5 seconds in each test pass
it can be longer, but that seems to work well

now - do as Jochen mentioned, and take the lowest result of, say, 5 test passes   :t
Title: Re: Averaging the Count/Timings -- a test
Post by: 0000 on September 10, 2015, 06:38:14 AM
Thanks dave. Yeah I think jochen is related to Agner Fog in some way.   :lol:

Any tips for using the timer macros, or is it basically the same approach?

I'm just now starting to experiment with both.

edit = typos
Title: Re: Averaging the Count/Timings -- a test
Post by: jj2007 on September 10, 2015, 06:44:10 AM
Any tips for using the timer macros, or is it basically the same approach?

The standard timer macros in the Lab were written by MichaelW, and work very fine in many cases. The Pentium 4 is a notable exception.

My own tests use often the NanoTimer() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1171) macro, which is reliable, precise and simple to use but yields milliseconds, not cycles. For benchmarking, that's ok, and it's closer to real life situations. Check yourself, test your own algos with various and see which one delivers the most consistent results.
Title: Re: Averaging the Count/Timings -- a test
Post by: dedndave on September 10, 2015, 06:54:52 AM
this is what i use as a "timing template"
adjust the loop count for 0.5 second passes and take the lower reading
Code: [Select]
;###############################################################################################

        .XCREF
        .NoList
        INCLUDE    \Masm32\Include\Masm32rt.inc
        .686p
        .MMX
        .XMM
        INCLUDE    \Masm32\Macros\Timers.asm
        .List

;###############################################################################################

Loop_Count = 10000

;###############################################################################################

        .DATA

;***********************************************************************************************

        .DATA?

;###############################################################################################

        .CODE

;***********************************************************************************************

main    PROC

        INVOKE  GetCurrentProcess
        INVOKE  SetProcessAffinityMask,eax,1
        INVOKE  Sleep,750

        mov     ecx,5

Loop00: push    ecx

        counter_begin Loop_Count,HIGH_PRIORITY_CLASS

;code to time goes here

        counter_end

        print   str$(eax),32
        pop     ecx
        dec     ecx
        jnz     Loop00

        print   chr$(13,10)
        inkey
        INVOKE  ExitProcess,0

main    ENDP

;###############################################################################################

        END     main
Title: Re: Averaging the Count/Timings -- a test
Post by: 0000 on September 10, 2015, 07:01:50 AM
adjust the loop count for 0.5 second passes...

I see. I was manipulating the 'Sleep' timer.

I have seen several different values for the 'Sleep' timer, does that really make a big difference? i.e., 1000 vs 750 vs 500?

Title: Re: Averaging the Count/Timings -- a test
Post by: dedndave on September 10, 2015, 07:24:31 AM
that only executes once, at initialization
i use 750, as much as i would like to reduce it to 500 - lol
it will help, if you run the test on a wide range of processors
on one specific machine, you might not notice much

so - it's 750ms to start
then, 500ms for the first result
the total time for 5 test passes and the init code = 3.25 seconds
that's the time i use to adjust loop count   :P

(3 to 4 seconds - close enough)
Title: Re: Averaging the Count/Timings -- a test
Post by: 0000 on September 10, 2015, 08:12:36 AM
Cool, dave. I'll check it out.  :icon14:


I'd actually prefer my sleep timer to be at least 6 hours myself  :lol:
Title: Re: Averaging the Count/Timings -- a test
Post by: 0000 on September 10, 2015, 04:16:24 PM
this is what i use as a "timing template"
adjust the loop count for 0.5 second passes and take the lower reading
Code: [Select]
        ~~~

        print   str$(eax),32
        pop     ecx
        dec     ecx
        jnz     Loop00

        print   chr$(13,10)
        ~~~

Shouldn't it be ...

Code: [Select]
        print   str$(eax),32
        print   chr$(13,10)
        pop     ecx
        dec     ecx
        jnz     Loop00

... to have the timings listed vertically?
Title: Re: Averaging the Count/Timings -- a test
Post by: 0000 on September 10, 2015, 05:29:46 PM
I did some further testing and investigation.
dedndave gave me his counter template, and I ran a shoot-out test
between his template and mine. Here are the results...

dedndaves...

Code: [Select]
10  - dedndaves counter template
15  - dedndaves counter template
15  - dedndaves counter template
15  - dedndaves counter template
15  - dedndaves counter template
15  - dedndaves counter template
15  - dedndaves counter template
15  - dedndaves counter template
15  - dedndaves counter template
15  - dedndaves counter template

And the exact template used in the comparison:

Code: [Select]
        .XCREF
        .NoList
        INCLUDE    \Masm32\Include\Masm32rt.inc
    .686p
        .MMX
        .XMM
        INCLUDE    \Masm32\Macros\Timers.asm
        .List
        Loop_Count = 50000000 ; adjusted for ~~ .5 seconds on my machine
    .DATA
    .DATA?
    .CODE

    main    PROC
        INVOKE  GetCurrentProcess
        INVOKE  SetProcessAffinityMask, eax, 1
        INVOKE  Sleep, 750
        mov     ecx, 10
    Loop00: push    ecx
        counter_begin Loop_Count, HIGH_PRIORITY_CLASS
; ------------------------------------------------------------------------------
        xor ebx, ebx
        @@:
        inc ebx
        cmp ebx, 10
        jnz @b
        @@:
; ------------------------------------------------------------------------------
        counter_end
        print   str$(eax), 32, " - dedndaves counter template"
        print   chr$(13, 10)
        pop     ecx
        dec     ecx
        jnz     Loop00
        inkey
        INVOKE  ExitProcess, 0
    main    ENDP

    heavyload proc
        ret
    heavyload endp
 
        END     main


mine... I modified mine to repeat only 16 times.

Code: [Select]
17 average cycles - 16 reps - zedds counter
15 average cycles - 16 reps - zedds counter
15 average cycles - 16 reps - zedds counter
15 average cycles - 16 reps - zedds counter
15 average cycles - 16 reps - zedds counter
15 average cycles - 16 reps - zedds counter
15 average cycles - 16 reps - zedds counter
15 average cycles - 16 reps - zedds counter
15 average cycles - 16 reps - zedds counter
15 average cycles - 16 reps - zedds counter

And the exact template used for the comparison:

Code: [Select]

        include \masm32\include\masm32rt.inc
        .686
        include \masm32\macros\timers.asm
       
    .data
        ctx         dd 0
        loopctr     dd 0
    .code
    start:

        mov loopctr, 10 ; how many indidual test to be printed
       
        top:

        mov ctx, 0
        repeat  16      ; repititions in test
        invoke Sleep, 1
        counter_begin 2800000, HIGH_PRIORITY_CLASS ; value adjusted for stability
; ------------------------------------------------------------------------------
        xor ebx, ebx
        @@:
        inc ebx
        cmp ebx, 10
        jnz @b
        @@:
; ------------------------------------------------------------------------------
        counter_end
        add ctx, eax
        endm
        mov eax, ctx
        shr eax, 4
        print str$(eax), 20h, "average cycles - 16 reps - zedds counter", 10, 13
        dec loopctr
        cmp loopctr, 0
        jg top
        inkey
        exit

    end start


I still think that majority rules. Even in some of the other 'cycle counting' threads,
there are some anomalies on either side of the majority count value.

I would be interested if a few other members would run the same two tests as posted on their
machines and post their results. Would be appreciated.

I know I'm just a new guy, but I could be onto something.
Title: Re: Averaging the Count/Timings -- a test
Post by: rrr314159 on September 11, 2015, 02:20:17 AM
Quote from: zedd151
I still think that majority rules.

- Maybe, depends why you're timing. If you're comparing two algos the minimum is right, as jj2007 explained. But if you want to estimate how long it takes under real-world conditions - which includes the thread being interrupted by the scheduler, and whatever - then (I think) majority rules is right

Quote from: zedd151
I would be interested if a few other members would run the same two tests as posted on their machines and post their results. Would be appreciated.

- I'd like to help, but unfortunately ...

Quote from: zedd151
I know I'm just a new guy ...

- Forum rules are, we can't make test runs for new guys. (If you think that's unfair, complain to hutch - he makes the rules). As soon as you become an old guy, let me know
Title: A New Approach
Post by: 0000 on September 11, 2015, 04:25:10 PM

- Maybe, depends why you're timing. If you're comparing two algos the minimum is right, as jj2007 explained....

With advice from several members including rrr314159, Jochen, and Dave (dedndave), I have reconsidered the approach first presented in this thread.
If testing with real-world conditions in mind, then taking the average is ok. But when trying to achieve the minimum cycle count, or time, then the lowest
possible result is the correct way to go.

Now in light of all I have learned regarding timing, counting and MichaelW's counter/timer algorithms - I have created two brand new testbeds.

Each of them take 50 samples (yes they loop 50 times), and only the lowest value is stored and displayed when the program is finished running.

Why 50? you might ask - (originally I had it at 100) - since the accuracy increases with sample size I decided to go with a higher number than the
3, 4, 5 I have seen in some of the threads here in the lab. I have tested, tested and tested. I am quite pleased with the results, but would like
some input from other members. Attached are both testbeds. I used 'windows.inc' as yet another guinea pig for my testing. The algo in the testbeds
is a rather slow line counter. You can replace it with an algo of known timing/cycle count. This will help in determining the accuracy of not only my
testing methods, but perhaps others as well.

Thanks in advance, zedd151.


 zedds testbeds  (https://www.dropbox.com/s/f1hmhnban48tjdq/zedds_testbeds.zip?dl=1)
Title: Re: Averaging the Count/Timings -- a test
Post by: hutch-- on September 11, 2015, 04:43:26 PM
> - Forum rules are, we can't make test runs for new guys. (If you think that's unfair, complain to hutch - he makes the rules). As soon as you become an old guy, let me know

What is this crock of crap ?
Title: Re: Averaging the Count/Timings -- a test
Post by: 0000 on September 11, 2015, 06:02:03 PM
quote from post #12 rrr314159:
Quote
> - Forum rules are, we can't make test runs for new guys.
What is this crock of crap ?

Thats alright hutch, I'm an old guy -  just turned 53, I just hide it better than rrr  8)

So, I didn't take offense.
Title: Re: Averaging the Count/Timings -- a test
Post by: jj2007 on September 11, 2015, 07:20:30 PM
I tested your code and got - 0 ms 8)

What is maybe a bit surprising is that the low values tend to be at the beginning for this kind of exercise:

include \masm32\MasmBasic\MasmBasic.inc      ; download (http://masm32.com/board/index.php?topic=94.0)
  Init
  mov ebx, 999999      ; set high value for getting the minimum
  xor ecx, ecx
  xor edi, edi
  .Repeat
      NanoTimer()
      Let esi=FileRead$("\Masm32\include\Windows.inc")
      mov ebx, Min(ebx, NanoTimer(us))
      add edi, eax      ; sum up the timings
      Print Str$("%i ", eax)
      inc ecx
  .Until ecx>200
  Inkey Str$("\nMinimum=%i µs", ebx), Str$(", average=%i µs", edi/ecx)
  Exit
EndOfCode


Output:
585 613 1444 1476 4341 1505 2202 766 763 675 1001 1376 1447 849 953 1559 1793 1671 1576 1590 1740 1964 1803 1122 1051 888 867 980 779 82
9 892 1665 890 848 1521 1547 1528 1567 987 996 890 946 920 919 883 918 598 673 1259 1605 1217 1182 1443 1526 1399 1186 663 537 623 600 5
33 1305 1207 2526 1327 1226 1261 1359 1403 1617 1198 1441 1615 1682 889 1608 837 1023 782 1438 1675 864 1592 1735 902 840 1456 1788 895
814 880 839 783 887 902 823 887 880 1563 1576 1530 1569 988 1011 1099 971 1001 918 991 911 992 7211 853 901 895 906 1009 949 956 923 914
 1815 984 999 881 1309 987 918 950 974 970 1490 724 973 771 639 1450 1507 989 905 996 1092 871 1938 1573 1498 812 821 887 802 1457 1521
899 2852 818 810 904 819 881 1883 1691 1864 1483 1489 1523 914 904 1251 893 1452 1656 969 999 955 854 1442 1457 1527 1626 817 894 1665 1
453 1845 994 919 987 928 1532 1577 899 843 1700 895 775 785 898 1190 816 816 885
Minimum=533 µs, average=1206 µs
Title: Re: Averaging the Count/Timings -- a test
Post by: 0000 on September 11, 2015, 07:35:43 PM
Heres the results for my 'puter with your test.
Code: [Select]
2638 1823 2463 1865 2001 1988 2061 2516 2110 1977 1999 2069 1926 2004 2271 2260
2359 1982 2205 2127 2077 2430 1850 2675 1824 2432 1773 1792 2558 1763 2469 1791
1768 2466 1807 1772 1741 2452 1832 2193 2055 1960 1784 2469 1765 1771 2448 2163
2619 2145 1857 2033 1775 1750 1977 1773 1999 1751 1763 2007 1781 2338 1777 1766
2253 1752 1942 1743 2449 2083 1719 1735 2245 2236 2065 1958 2224 2179 2026 2239
1799 2289 2271 1778 1730 1983 1727 1765 1728 1994 1991 1719 1693 1818 2016 1864
1736 1964 1733 1757 1737 1774 1715 1942 1743 1726 1712 2010 1732 1704 1735 1752
1973 1737 1739 1992 1724 1742 1980 1732 2001 1740 1748 2221 1714 2130 1978 2196
2229 1760 1967 1737 2013 1986 1736 2247 1771 2039 2231 2059 2207 2210 2166 1986
2035 1974 1977 2035 1997 1818 1823 1848 1819 1837 1906 1816 1862 1479 1457 1508
1538 1490 1490 1673 1501 1497 1701 1771 1494 1660 1503 1426 1396 1607 1471 1454
1484 1490 1454 1492 1444 2772 1677 1297 1318 1550 1873 1778 1309 1966 1707 1788
1459 1618 1964 1818 1941 2475 2038 1673 2811
Minimum=1297 µs, average=1909 µs

Thats fine Jochen, thanks.   Did you also try the counter?

The timer I wasn't expecting anything grand from , I get

Code: [Select]
2319 ms lowest of 50 samples
 and my computer is slow 1.6 GHz.
figure a faster computer would show 1, 2, or even 0 like yours.

And for the counter I get
Code: [Select]
3700464 cycles lowest of 50 samples
oops, i posted the wrong numbers the first time = edited

with the test that was attached above
Title: Re: Averaging the Count/Timings -- a test
Post by: jj2007 on September 11, 2015, 07:41:16 PM
Did you also try the counter?

3x14, 3*52... ::)
Title: Re: Averaging the Count/Timings -- a test
Post by: 0000 on September 11, 2015, 07:56:10 PM
::)


you don't have to roll your eyes, a no would have been sufficient.  :lol:

I only asked because the figures you gave were in ms, meaning they were from the timer

I edited the numbers in my post above, they were from a different test. oops!
Title: Re: Averaging the Count/Timings -- a test
Post by: jj2007 on September 11, 2015, 08:11:16 PM
you don't have to roll your eyes, a no would have been sufficient.  :lol:

The  ::) was not meant as a critique to you. The difference 14/52 just shows how tricky this business is...
Title: Re: Averaging the Count/Timings -- a test
Post by: 0000 on September 11, 2015, 08:14:16 PM
Okay  :lol:

Oh, your test appears to be only opening windows.inc...
Mine was opening, counting the lines then closing the file and freeing the memory.