News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

Michael Webster's code timing macros problem

Started by RuiLoureiro, March 18, 2013, 11:29:07 PM

Previous topic - Next topic

RuiLoureiro

Hi
In the laboratory, i found timers.zip amd counter2.zip

Using the file «c2test2.asm» i got this:

Why (all) NEGATIVE values ? What's wrong ?

HIGH_PRIORITY_CLASS
-392 cycles, empty
-153 cycles, mov eax,1
-144 cycles, mov eax,1 mov eax,2
-188 cycles, nops 4
-179 cycles, mul ecx
-145 cycles, rol ecx,32
-433 cycles, rcr ecx,31
-85 cycles, div ecx
-67 cycles, StrLen

REALTIME_PRIORITY_CLASS
-748 cycles, empty
-153 cycles, mov eax,1
-204 cycles, mov eax,1 mov eax,2
-94 cycles, nops 4
-102 cycles, mul ecx
-94 cycles, rol ecx,32
-76 cycles, rcr ecx,31
-17 cycles, div ecx
-67 cycles, StrLen

Press any key to exit...

dedndave

we can only guess, without seeing the code   :P

from what i understand, there are 2 primary differences between the timing macro sets

1) timers.asm macros do not preserve EBX
counters.asm macros do preserve EBX
(EBX gets trashed by CPUID, which is used to serialize instruction execution)

2) timers.asm macros use DIV to calculate results
counters.asm macros use the FPU to make the calculation
while the FPU is better for the calculation (of course),
it may cause problems if you want to time code that relies on values being left in the FPU

negative values are sometimes the result of trying to time very brief code snippets
however, i am not used to seeing them that large
there are ways to fix this - mainly, repeat the code inside the test and divide the result by the iterations

as i have mentioned in the past, it helps if you bind to a single core to make the tests
and, it helps if each pass of the test loop takes at least 0.5 seconds (adjust loop count)
also, most of us use HIGH_PRIORITY_CLASS as the priority level

MichaelW

Hi Rui,

The counter_begin/counter_end macros in timers.asm effectively average the cycle count over loopcount loops, and the result is corrected for the loop overhead. This method works well over a broad range of processors, but for meaningful results you must avoid timing code that executes in any small number of clock cycles, and you must insert a reasonable delay after the app starts and before you enter the  first timing loop. I usually delay for about 5 seconds, but other people here seem to have good esults with much shorter delays. For repeatable cycle counts, code that runs somewhere around the range 100-200 cycles seems to work well. For code that executes in fewer cycles you can add a secondary loop inside the macro loop to get the cycle count up to a workable level. Looping your tests multiple times and including an empty loop, which in theory should return zero cycles, makes it easy to verify repeatability and spot problems with an inadequate delay or too-low priority (which will likely cause the cycles for the empty loop to increase).

The counter2 macros return the lowest count that occurs in any loop, working on the assumption that the lowest count is the correct count. This method worked well for processors other than the P4, but very badly for the P4. Because of this, and because even running on a non-P4 the counts returned by the two macro sets are not comparable, I essentially abandoned the counter2 macros.


Well Microsoft, here's another nice mess you've gotten us into.

RuiLoureiro

 :biggrin:
Hi dave,
            Thank you !  ;)
           
Hi MichaelW,
            Thank you for your answer !  ;)
            Ok now i understood

I am writing some code to work with arrays or matrices
Here are some results i got for 2 dimensions XY,
3 XYZ and 4 XYZW (here each element is 10 bytes).
One day i will post all the code.

Thank you all  :t

-----------------------  XY ---------------------------
       
***** cycles HIGH_PRIORITY_CLASS by MichaelW *****
1060 cycles, CopyArray10XY1
611 cycles, Array10XYCopy1
940 cycles, CopyValue10XY1
1457 cycles, CopyMatrix10XY1
657 cycles, Matrix10XYCopy1
***** cycles REALTIME_PRIORITY_CLASS by MichaelW *****
1005 cycles, CopyArray10XY1
607 cycles, Array10XYCopy1
955 cycles, CopyValue10XY1
1445 cycles, CopyMatrix10XY1
661 cycles, Matrix10XYCopy1

***** cycles 2 HIGH_PRIORITY_CLASS by Jochen *****
971     cycles, CopyArray10XY1
584     cycles, Array10XYCopy1
918     cycles, CopyValue10XY1
1423    cycles, CopyMatrix10XY1
635     cycles, Matrix10XYCopy1
***** cycles 2 REALTIME_PRIORITY_CLASS by Jochen *****
970     cycles, CopyArray10XY1
585     cycles, Array10XYCopy1
918     cycles, CopyValue10XY1
1416    cycles, CopyMatrix10XY1
625     cycles, Matrix10XYCopy1

***** time HIGH_PRIORITY_CLASS ***** by MichaelW
297 milliseconds, CopyArray10XY1
179 milliseconds, Array10XYCopy1
281 milliseconds, CopyValue10XY1
428 milliseconds, CopyMatrix10XY1
199 milliseconds, Matrix10XYCopy1
***** time REALTIME_PRIORITY_CLASS by MichaelW *****
295 milliseconds, CopyArray10XY1
183 milliseconds, Array10XYCopy1
283 milliseconds, CopyValue10XY1
467 milliseconds, CopyMatrix10XY1
264 milliseconds, Matrix10XYCopy1
Press any key to continue ...

-----------------------  XYZ ---------------------------

***** cycles HIGH_PRIORITY_CLASS by MichaelW *****
2251 cycles, CopyArray10XYZ1
1470 cycles, Array10XYZCopy1
2002 cycles, CopyValue10XYZ1
2168 cycles, CopyMatrix10XYZ1
1516 cycles, Matrix10XYZCopy1
***** cycles REALTIME_PRIORITY_CLASS by MichaelW *****
2215 cycles, CopyArray10XYZ1
1484 cycles, Array10XYZCopy1
2008 cycles, CopyValue10XYZ1
2868 cycles, CopyMatrix10XYZ1
2227 cycles, Matrix10XYZCopy1
***** cycles 2 HIGH_PRIORITY_CLASS by Jochen *****
2120    cycles, CopyArray10XYZ1
1422    cycles, Array10XYZCopy1
1941    cycles, CopyValue10XYZ1
2095    cycles, CopyMatrix10XYZ1
1471    cycles, Matrix10XYZCopy1
***** cycles 2 REALTIME_PRIORITY_CLASS by Jochen *****
2107    cycles, CopyArray10XYZ1
1413    cycles, Array10XYZCopy1
1931    cycles, CopyValue10XYZ1
2083    cycles, CopyMatrix10XYZ1
1473    cycles, Matrix10XYZCopy1
***** time HIGH_PRIORITY_CLASS by MichaelW *****
639 milliseconds, CopyArray10XYZ1
427 milliseconds, Array10XYZCopy1
590 milliseconds, CopyValue10XYZ1
649 milliseconds, CopyMatrix10XYZ1
446 milliseconds, Matrix10XYZCopy1
***** time REALTIME_PRIORITY_CLASS by MichaelW *****
639 milliseconds, CopyArray10XYZ1
430 milliseconds, Array10XYZCopy1
589 milliseconds, CopyValue10XYZ1
652 milliseconds, CopyMatrix10XYZ1
439 milliseconds, Matrix10XYZCopy1
Press any key to continue ...

-----------------------  XYZW ---------------------------

***** cycles HIGH_PRIORITY_CLASS by MichaelW *****
5337 cycles, CopyArray10XYZW1
3658 cycles, Array10XYZWCopy1
3578 cycles, CopyValue10XYZW1
5361 cycles, CopyMatrix10XYZW1
3588 cycles, Matrix10XYZWCopy1
***** cycles REALTIME_PRIORITY_CLASS by MichaelW *****
5246 cycles, CopyArray10XYZW1
3595 cycles, Array10XYZWCopy1
3488 cycles, CopyValue10XYZW1
5320 cycles, CopyMatrix10XYZW1
3538 cycles, Matrix10XYZWCopy1
***** cycles 2 HIGH_PRIORITY_CLASS by Jochen *****
5094    cycles, CopyArray10XYZW1
3545    cycles, Array10XYZWCopy1
3386    cycles, CopyValue10XYZW1
5205    cycles, CopyMatrix10XYZW1
3421    cycles, Matrix10XYZWCopy1
***** cycles 2 REALTIME_PRIORITY_CLASS by Jochen *****
5102    cycles, CopyArray10XYZW1
3552    cycles, Array10XYZWCopy1
3383    cycles, CopyValue10XYZW1
5145    cycles, CopyMatrix10XYZW1
3421    cycles, Matrix10XYZWCopy1
***** time HIGH_PRIORITY_CLASS by MichaelW *****
1547 milliseconds, CopyArray10XYZW1
1062 milliseconds, Array10XYZWCopy1
1042 milliseconds, CopyValue10XYZW1
2118 milliseconds, CopyMatrix10XYZW1
1252 milliseconds, Matrix10XYZWCopy1
***** time REALTIME_PRIORITY_CLASS by MichaelW *****
1538 milliseconds, CopyArray10XYZW1
1056 milliseconds, Array10XYZWCopy1
1021 milliseconds, CopyValue10XYZW1
1609 milliseconds, CopyMatrix10XYZW1
1028 milliseconds, Matrix10XYZWCopy1
Press any key to continue ...