New timing macros

Started by jj2007, May 16, 2022, 09:42:21 AM

HSE

It's a lot more stable. What is stack?
Equations in Assembly: SmplMath

jj2007

stack equ dword ptr [esp]
stack[4] = first argument
etc
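
To picture what the equate resolves to, here is a toy Python model (not MASM) of a 32-bit stack at procedure entry. It assumes the usual stdcall/cdecl layout: arguments are pushed right to left, the call then pushes the return address, and nothing else has been pushed yet. The helper names, the procedure call and all addresses are made up for illustration.

```python
# Toy model of a 32-bit x86 stack at procedure entry (illustration only).
# Assumption: arguments are pushed right to left, then CALL pushes the
# return address, so [esp] holds the return address on entry.

memory = {}          # address -> dword value
esp = 0x0012FF00     # hypothetical starting stack pointer


def push(value):
    """Mimic PUSH: decrement esp by 4, then store the dword."""
    global esp
    esp -= 4
    memory[esp] = value


def stack(offset=0):
    """Mimic 'stack equ dword ptr [esp]': stack(4) reads [esp+4]."""
    return memory[esp + offset]


# Caller side, as in a hypothetical "invoke SomeProc, 111, 222"
push(222)            # second argument
push(111)            # first argument
push(0x00401000)     # return address pushed by CALL (made-up value)

print(hex(stack(0)))  # return address
print(stack(4))       # first argument
print(stack(8))       # second argument
```

Once the procedure pushes anything itself, every offset shifts by 4 per push, so the equate is only this simple right at entry.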

HSE


jj2007

Correct :thumbsup:

(wow, somebody reads and understands my code :rolleyes:)

HSE

I can read my own code! (most of the time)

jj2007

Hi Hector,

Here is a console program using mostly CyCt* macros plus one of the MichaelW counters:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
23       Cycles for some short instructions

19       Cycles for fldpi+1*fdiv+fstp st
19       Cycles for fldpi+1*fdiv+fstp st
186     kCycles for finding 'Duplicate' with Instr_
2483    kCycles for finding 'Duplicate' with InString
2551    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore
2508    kCycles for InString using MichaelW's macro


18       Cycles for fldpi+1*fdiv+fstp st
19       Cycles for fldpi+1*fdiv+fstp st
186     kCycles for finding 'Duplicate' with Instr_
2480    kCycles for finding 'Duplicate' with InString
2548    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore
2534    kCycles for InString using MichaelW's macro


19       Cycles for fldpi+1*fdiv+fstp st
18       Cycles for fldpi+1*fdiv+fstp st
186     kCycles for finding 'Duplicate' with Instr_
2480    kCycles for finding 'Duplicate' with InString
2545    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore
2506    kCycles for InString using MichaelW's macro


19       Cycles for fldpi+1*fdiv+fstp st
18       Cycles for fldpi+1*fdiv+fstp st
186     kCycles for finding 'Duplicate' with Instr_
2487    kCycles for finding 'Duplicate' with InString
2551    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore
2521    kCycles for InString using MichaelW's macro


19       Cycles for fldpi+1*fdiv+fstp st
18       Cycles for fldpi+1*fdiv+fstp st
186     kCycles for finding 'Duplicate' with Instr_
2483    kCycles for finding 'Duplicate' with InString
2549    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore
2510    kCycles for InString using MichaelW's macro

MichaelW, Instring: 2508 2534 2506 2521 2510 (max-min: 2534-2506=28)
CyCt*, Instring:    2483 2480 2480 2487 2483 (max-min: 2487-2480=7)
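
The two summary lines compare the largest and smallest InString result over the five runs. A minimal Python sketch of that arithmetic, using the kCycles values copied from the output above (the names `runs` and `spread` are just for this illustration):

```python
# kCycles for InString over the five runs on the i5-2450M, copied from above.
runs = {
    "MichaelW": [2508, 2534, 2506, 2521, 2510],
    "CyCt*": [2483, 2480, 2480, 2487, 2483],
}


def spread(samples):
    """Difference between the slowest and the fastest run."""
    return max(samples) - min(samples)


for name, samples in runs.items():
    print(f"{name}, Instring: max-min = {spread(samples)} kCycles")
```

A smaller spread means the readings are more repeatable from run to run; it says nothing about which reading is closer to the true cost.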


Run the attachment, and you will see that the new macros have another advantage :cool:

HSE

 :biggrin:

Intel(R) Core(TM) i3-10100 CPU @ 3.60GHz
42       Cycles for some short instructions

31       Cycles for fldpi+1*fdiv+fstp st
25       Cycles for fldpi+1*fdiv+fstp st
211     kCycles for finding 'Duplicate' with Instr_
2093    kCycles for finding 'Duplicate' with InString
1886    kCycles for finding 'Duplicate' with crt strstr
712     kCycles for finding 'Duplicate' with Boyer-Moore
2118    kCycles for InString using MichaelW's macro


34       Cycles for fldpi+1*fdiv+fstp st
26       Cycles for fldpi+1*fdiv+fstp st
209     kCycles for finding 'Duplicate' with Instr_
2064    kCycles for finding 'Duplicate' with InString
1880    kCycles for finding 'Duplicate' with crt strstr
696     kCycles for finding 'Duplicate' with Boyer-Moore
2114    kCycles for InString using MichaelW's macro


34       Cycles for fldpi+1*fdiv+fstp st
25       Cycles for fldpi+1*fdiv+fstp st
210     kCycles for finding 'Duplicate' with Instr_
2081    kCycles for finding 'Duplicate' with InString
1883    kCycles for finding 'Duplicate' with crt strstr
696     kCycles for finding 'Duplicate' with Boyer-Moore
2117    kCycles for InString using MichaelW's macro


37       Cycles for fldpi+1*fdiv+fstp st
24       Cycles for fldpi+1*fdiv+fstp st
210     kCycles for finding 'Duplicate' with Instr_
2088    kCycles for finding 'Duplicate' with InString
1926    kCycles for finding 'Duplicate' with crt strstr
713     kCycles for finding 'Duplicate' with Boyer-Moore
2119    kCycles for InString using MichaelW's macro


38       Cycles for fldpi+1*fdiv+fstp st
21       Cycles for fldpi+1*fdiv+fstp st
209     kCycles for finding 'Duplicate' with Instr_
2088    kCycles for finding 'Duplicate' with InString
1926    kCycles for finding 'Duplicate' with crt strstr
695     kCycles for finding 'Duplicate' with Boyer-Moore
2117    kCycles for InString using MichaelW's macro


MichaelW, Instring:    2119-2114= 5
CyCt*, Instring:       2093-2064=29

jj2007

YMMV :cool:

But in general, I get fairly consistent output, and at much, much greater speed. That matters when you are trying to optimise an algo and it takes many small changes, each of which needs a new timing run.

jj2007

More refining done... it starts to look convincing :tongue:

Can I have some timings, please?

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
+0       Cycles for PI*100
+16      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+13      Cycles for 10 * inc & dec eax
+13      Cycles for 10 * add eax,1 & sub eax,1
+13      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+189     Cycles for finding 'test' with InString

+0       Cycles for PI*100
+16      Cycles for PI*100/10
+4       Cycles for push & pop eax
+0       Cycles for empty loop
+13      Cycles for 10 * inc & dec eax
+16      Cycles for 10 * add eax,1 & sub eax,1
+13      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+187     Cycles for finding 'test' with InString

+0       Cycles for PI*100
+16      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+13      Cycles for 10 * inc & dec eax
+13      Cycles for 10 * add eax,1 & sub eax,1
+13      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+187     Cycles for finding 'test' with InString

+0       Cycles for PI*100
+16      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+13      Cycles for 10 * inc & dec eax
+13      Cycles for 10 * add eax,1 & sub eax,1
+13      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+187     Cycles for finding 'test' with InString

hutch--

Could you civilize this a bit? It would not run where I downloaded it and had to be installed on my dev drive.

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
+1       Cycles for PI*100
+12      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+14      Cycles for 10 * inc & dec eax
+14      Cycles for 10 * add eax,1 & sub eax,1
+14      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+163     Cycles for finding 'test' with InString

+1       Cycles for PI*100
+12      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+14      Cycles for 10 * inc & dec eax
+14      Cycles for 10 * add eax,1 & sub eax,1
+14      Cycles for 10 * inc+dec with lea
189     kCycles for finding 'Duplicate' in Window.inc
+145     Cycles for finding 'test' with InString

+1       Cycles for PI*100
+12      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+14      Cycles for 10 * inc & dec eax
+14      Cycles for 10 * add eax,1 & sub eax,1
+14      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+145     Cycles for finding 'test' with InString

+1       Cycles for PI*100
+12      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+14      Cycles for 10 * inc & dec eax
+14      Cycles for 10 * add eax,1 & sub eax,1
+14      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+147     Cycles for finding 'test' with InString

jj2007

Here is the civilised version, with an Inkey at the end. The other one runs fine from a DOS prompt, of course  :biggrin:

hutch--

This one at least stops on the inkey, but to run it, it must be copied to my dev drive. It would be a lot easier to run if I could just copy windows.inc to where it gets unzipped. I test on a ramdrive, not my dev drive.

jj2007

Quote from: hutch-- on May 19, 2022, 09:13:15 PM
if I could just copy windows.inc to where it gets unzipped

Now I understand the problem :biggrin:

I know one guy who is very proud about all paths in include files being hardcoded to the root as in \thatfolder\whatever :mrgreen:

Let esi=FileRead$("\Masm32\include\Windows.inc")

Attached a version with embedded Windows.inc, and some more timings. Don't complain that it's a fat attachment :cool:

  ; Let esi=FileRead$("\Masm32\include\Windows.inc")
  Let esi=FileRead$(99)   ; resource ID 99
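
The path in the first version starts with a backslash but has no drive letter, so Windows resolves it against the root of whatever drive is current. That is presumably why the program only found Windows.inc when started from the drive that holds \Masm32. A small Python sketch of that resolution rule, using the standard `ntpath` module; the drive letters and folder names are hypothetical:

```python
import ntpath  # Windows path rules, usable on any OS

ROOTED = r"\Masm32\include\Windows.inc"   # path from the post above


def resolve(current_dir, path=ROOTED):
    """Return the absolute path that 'path' refers to when the program
    runs with 'current_dir' as its working directory."""
    return ntpath.normpath(ntpath.join(current_dir, path))


# Hypothetical working directories: a dev drive and a ramdrive.
print(resolve(r"D:\projects\timings"))   # D:\Masm32\include\Windows.inc
print(resolve(r"R:\unzipped"))           # R:\Masm32\include\Windows.inc
```

Embedding the file as a resource sidesteps the lookup entirely, at the cost of a larger executable.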

FORTRANS

Hi,

   Two laptops.

Intel(R) Pentium(R) M processor 1.70GHz
+3 Cycles for PI*100
+31 Cycles for PI*100/10
+-1 Cycles for push & pop eax
+0 Cycles for empty loop
+16 Cycles for 10 * inc & dec eax
+16 Cycles for 10 * add eax,1 & sub eax,1
+14 Cycles for 10 * inc+dec with lea
+33 Cycles for finding 'Duplicate' in Window.inc
+458 Cycles for finding 'test' with InString

+3 Cycles for PI*100
+31 Cycles for PI*100/10
+-1 Cycles for push & pop eax
+0 Cycles for empty loop
+16 Cycles for 10 * inc & dec eax
+16 Cycles for 10 * add eax,1 & sub eax,1
+14 Cycles for 10 * inc+dec with lea
+33 Cycles for finding 'Duplicate' in Window.inc
+467 Cycles for finding 'test' with InString

+3 Cycles for PI*100
+31 Cycles for PI*100/10
+-1 Cycles for push & pop eax
+0 Cycles for empty loop
+16 Cycles for 10 * inc & dec eax
+16 Cycles for 10 * add eax,1 & sub eax,1
+14 Cycles for 10 * inc+dec with lea
+33 Cycles for finding 'Duplicate' in Window.inc
+423 Cycles for finding 'test' with InString

+3 Cycles for PI*100
+31 Cycles for PI*100/10
+-1 Cycles for push & pop eax
+0 Cycles for empty loop
+16 Cycles for 10 * inc & dec eax
+16 Cycles for 10 * add eax,1 & sub eax,1
+14 Cycles for 10 * inc+dec with lea
+33 Cycles for finding 'Duplicate' in Window.inc
+422 Cycles for finding 'test' with InString

more (y)?

+32 Cycles for fldpi+1*fdiv+fstp st
+3581 Cycles for finding 'test' with Instr_
+33 Cycles for finding 'Duplicate' with Instr_
3497 kCycles for finding 'Duplicate' with InString
3315 kCycles for finding 'Duplicate' with crt strstr
1190 kCycles for finding 'Duplicate' with Boyer-Moore

+32 Cycles for fldpi+1*fdiv+fstp st
+3582 Cycles for finding 'test' with Instr_
+33 Cycles for finding 'Duplicate' with Instr_
3497 kCycles for finding 'Duplicate' with InString
3315 kCycles for finding 'Duplicate' with crt strstr
1190 kCycles for finding 'Duplicate' with Boyer-Moore

+32 Cycles for fldpi+1*fdiv+fstp st
+3582 Cycles for finding 'test' with Instr_
+33 Cycles for finding 'Duplicate' with Instr_
3497 kCycles for finding 'Duplicate' with InString
3315 kCycles for finding 'Duplicate' with crt strstr
1190 kCycles for finding 'Duplicate' with Boyer-Moore

+32 Cycles for fldpi+1*fdiv+fstp st
+3581 Cycles for finding 'test' with Instr_
+33 Cycles for finding 'Duplicate' with Instr_
3497 kCycles for finding 'Duplicate' with InString
3315 kCycles for finding 'Duplicate' with crt strstr
1190 kCycles for finding 'Duplicate' with Boyer-Moore

+32 Cycles for fldpi+1*fdiv+fstp st
+3582 Cycles for finding 'test' with Instr_
+33 Cycles for finding 'Duplicate' with Instr_
3497 kCycles for finding 'Duplicate' with InString
3315 kCycles for finding 'Duplicate' with crt strstr
1190 kCycles for finding 'Duplicate' with Boyer-Moore

hit any key - bye


Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz
+1 Cycles for PI*100
+15 Cycles for PI*100/10
+3 Cycles for push & pop eax
+0 Cycles for empty loop
+17 Cycles for 10 * inc & dec eax
+17 Cycles for 10 * add eax,1 & sub eax,1
+17 Cycles for 10 * inc+dec with lea
227 kCycles for finding 'Duplicate' in Window.inc
+175 Cycles for finding 'test' with InString

+1 Cycles for PI*100
+15 Cycles for PI*100/10
+3 Cycles for push & pop eax
+0 Cycles for empty loop
+17 Cycles for 10 * inc & dec eax
+17 Cycles for 10 * add eax,1 & sub eax,1
+17 Cycles for 10 * inc+dec with lea
227 kCycles for finding 'Duplicate' in Window.inc
+174 Cycles for finding 'test' with InString

+1 Cycles for PI*100
+15 Cycles for PI*100/10
+3 Cycles for push & pop eax
+0 Cycles for empty loop
+17 Cycles for 10 * inc & dec eax
+17 Cycles for 10 * add eax,1 & sub eax,1
+17 Cycles for 10 * inc+dec with lea
227 kCycles for finding 'Duplicate' in Window.inc
+174 Cycles for finding 'test' with InString

+1 Cycles for PI*100
+15 Cycles for PI*100/10
+3 Cycles for push & pop eax
+0 Cycles for empty loop
+17 Cycles for 10 * inc & dec eax
+17 Cycles for 10 * add eax,1 & sub eax,1
+17 Cycles for 10 * inc+dec with lea
227 kCycles for finding 'Duplicate' in Window.inc
+175 Cycles for finding 'test' with InString

more (y)?

+33 Cycles for fldpi+1*fdiv+fstp st
+3522 Cycles for finding 'test' with Instr_
227 kCycles for finding 'Duplicate' with Instr_
2418 kCycles for finding 'Duplicate' with InString
2265 kCycles for finding 'Duplicate' with crt strstr
826 kCycles for finding 'Duplicate' with Boyer-Moore

+15 Cycles for fldpi+1*fdiv+fstp st
+3108 Cycles for finding 'test' with Instr_
227 kCycles for finding 'Duplicate' with Instr_
2419 kCycles for finding 'Duplicate' with InString
2264 kCycles for finding 'Duplicate' with crt strstr
825 kCycles for finding 'Duplicate' with Boyer-Moore

+15 Cycles for fldpi+1*fdiv+fstp st
+3108 Cycles for finding 'test' with Instr_
227 kCycles for finding 'Duplicate' with Instr_
2418 kCycles for finding 'Duplicate' with InString
2264 kCycles for finding 'Duplicate' with crt strstr
825 kCycles for finding 'Duplicate' with Boyer-Moore

+15 Cycles for fldpi+1*fdiv+fstp st
+3108 Cycles for finding 'test' with Instr_
227 kCycles for finding 'Duplicate' with Instr_
2419 kCycles for finding 'Duplicate' with InString
2264 kCycles for finding 'Duplicate' with crt strstr
825 kCycles for finding 'Duplicate' with Boyer-Moore

+15 Cycles for fldpi+1*fdiv+fstp st
+3109 Cycles for finding 'test' with Instr_
227 kCycles for finding 'Duplicate' with Instr_
2419 kCycles for finding 'Duplicate' with InString
2264 kCycles for finding 'Duplicate' with crt strstr
825 kCycles for finding 'Duplicate' with Boyer-Moore

hit any key - bye


Cheers,

Steve N.

jj2007

Thanks, Steve, now it looks reasonably stable :thumbsup:

A typical usage example:

; PI*100, stored as an integer
CyCtStart
  fldpi
  fmul FP4(100.0)
  fistp testDD
CyCtEnd PI*100

; same, with an extra division by 10
CyCtStart
  fldpi
  fmul FP4(100.0)
  fdiv FP4(10.0)
  fistp testDD
CyCtEnd PI*100/10

; a push/pop pair
CyCtStart
  push eax
  pop eax
CyCtEnd push & pop eax
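
For readers without MASM at hand, the start/end bracket pattern can be imitated roughly in Python. This is only an analogue, not how the CyCt* macros work internally: it assumes that a signed "+n" reading is the measured time minus the cost of an empty bracket, which the "+0 Cycles for empty loop" lines suggest but the thread does not confirm. It reports nanoseconds rather than cycles, and `measure`, `empty` and `pi_times_100` are invented names.

```python
import math
import time


def measure(fn, repeats=1000):
    """Smallest elapsed time in nanoseconds over 'repeats' calls of fn."""
    best = None
    for _ in range(repeats):
        t0 = time.perf_counter_ns()
        fn()
        elapsed = time.perf_counter_ns() - t0
        if best is None or elapsed < best:
            best = elapsed
    return best


def empty():
    """Nothing to time: measures the cost of the bracket itself."""
    pass


def pi_times_100():
    """Counterpart of fldpi / fmul 100.0 / fistp: pi*100 as an integer."""
    return round(math.pi * 100)


overhead = measure(empty)
net = measure(pi_times_100) - overhead   # may come out slightly negative
print(f"{net:+d} ns for PI*100")
```

Taking the minimum over many repeats filters out runs disturbed by interrupts or task switches, and subtracting the empty bracket is what allows a reading of zero or even a small negative number.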