Topic: New timing macros

HSE

  • Member
  • *****
  • Posts: 2465
  • AMD 7-32 / i3 10-64
Re: New timing macros
« Reply #15 on: May 18, 2022, 02:09:37 AM »
It's a lot more stable. What is stack?
Equations in Assembly: SmplMath

jj2007

  • Member
  • *****
  • Posts: 13872
  • Assembly is fun ;-)
    • MasmBasic
Re: New timing macros
« Reply #16 on: May 18, 2022, 02:11:03 AM »
stack equ dword ptr [esp]
stack[4] = first argument
etc
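A note on stack[4] for readers coming from C. MASM treats the number in brackets as a byte offset, not an element index. With stack equ dword ptr [esp], stack[4] therefore assembles as dword ptr [esp+4], which is the second dword on the stack. Right after a call, the return address is at [esp]. Arguments are pushed right to left, so the first argument was pushed last and sits at [esp+4].

The C sketch below only models that addressing rule. The name masm_stack and the frame values used with it are made up for illustration and are not part of MasmBasic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative model of "stack equ dword ptr [esp]": in MASM, stack[n]
   reads the dword at byte address esp+n, so n is a byte offset, not a
   dword index as it would be in C. */
uint32_t masm_stack(const uint32_t *esp, size_t byte_offset)
{
    uint32_t value;
    assert(esp != NULL);
    /* memcpy avoids alignment and aliasing problems for odd offsets */
    memcpy(&value, (const unsigned char *)esp + byte_offset, sizeof value);
    return value;
}
```

For a frame laid out as {return address, 111, 222}, masm_stack(frame, 4) gives 111. That is frame[1] in C terms, the first argument.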

HSE

Re: New timing macros
« Reply #17 on: May 18, 2022, 02:16:40 AM »
But must be edi-10

jj2007

Re: New timing macros
« Reply #18 on: May 18, 2022, 02:19:01 AM »
Correct :thumbsup:

(wow, somebody reads and understands my code :rolleyes:)

HSE

Re: New timing macros
« Reply #19 on: May 18, 2022, 02:35:21 AM »
I can read my own code! (most of the time)

jj2007

Re: New timing macros
« Reply #20 on: May 18, 2022, 06:57:51 AM »
Hi Hector,

Here is a console program using mostly CyCt* macros plus one of the MichaelW counters:

Code:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
23       Cycles for some short instructions

19       Cycles for fldpi+1*fdiv+fstp st
19       Cycles for fldpi+1*fdiv+fstp st
186     kCycles for finding 'Duplicate' with Instr_
2483    kCycles for finding 'Duplicate' with InString
2551    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore
2508    kCycles for InString using MichaelW's macro


18       Cycles for fldpi+1*fdiv+fstp st
19       Cycles for fldpi+1*fdiv+fstp st
186     kCycles for finding 'Duplicate' with Instr_
2480    kCycles for finding 'Duplicate' with InString
2548    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore
2534    kCycles for InString using MichaelW's macro


19       Cycles for fldpi+1*fdiv+fstp st
18       Cycles for fldpi+1*fdiv+fstp st
186     kCycles for finding 'Duplicate' with Instr_
2480    kCycles for finding 'Duplicate' with InString
2545    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore
2506    kCycles for InString using MichaelW's macro


19       Cycles for fldpi+1*fdiv+fstp st
18       Cycles for fldpi+1*fdiv+fstp st
186     kCycles for finding 'Duplicate' with Instr_
2487    kCycles for finding 'Duplicate' with InString
2551    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore
2521    kCycles for InString using MichaelW's macro


19       Cycles for fldpi+1*fdiv+fstp st
18       Cycles for fldpi+1*fdiv+fstp st
186     kCycles for finding 'Duplicate' with Instr_
2483    kCycles for finding 'Duplicate' with InString
2549    kCycles for finding 'Duplicate' with crt strstr
710     kCycles for finding 'Duplicate' with Boyer-Moore
2510    kCycles for InString using MichaelW's macro

MichaelW, Instring: 2508 2534 2506 2521 2510 2534-2506=28
CyCt*, Instring: 2483 2480 2480 2487 2483 2487-2480=7

Run the attachment, and you will see that the new macros have another advantage :cool:
« Last Edit: May 18, 2022, 08:30:30 AM by jj2007 »
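The two summary lines are simply the spread of the five InString results for each method (largest minus smallest, in kCycles). A smaller spread means the runs repeat more closely. Below is a minimal C sketch of that calculation. The name timing_spread is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Spread of repeated timings: largest value minus smallest value.
   The smaller the spread, the more repeatable the measurement. */
long timing_spread(const long *t, size_t n)
{
    assert(t != NULL && n > 0);
    long lo = t[0], hi = t[0];
    for (size_t i = 1; i < n; i++) {
        if (t[i] < lo) lo = t[i];
        if (t[i] > hi) hi = t[i];
    }
    return hi - lo;
}
```

The MichaelW column gives 28 and the CyCt* column gives 7. HSE's i3 figures further down give 5 and 29 by the same calculation.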

HSE

Re: New timing macros
« Reply #21 on: May 18, 2022, 08:42:04 AM »
 :biggrin:
Code:
Intel(R) Core(TM) i3-10100 CPU @ 3.60GHz
42       Cycles for some short instructions

31       Cycles for fldpi+1*fdiv+fstp st
25       Cycles for fldpi+1*fdiv+fstp st
211     kCycles for finding 'Duplicate' with Instr_
2093    kCycles for finding 'Duplicate' with InString
1886    kCycles for finding 'Duplicate' with crt strstr
712     kCycles for finding 'Duplicate' with Boyer-Moore
2118    kCycles for InString using MichaelW's macro


34       Cycles for fldpi+1*fdiv+fstp st
26       Cycles for fldpi+1*fdiv+fstp st
209     kCycles for finding 'Duplicate' with Instr_
2064    kCycles for finding 'Duplicate' with InString
1880    kCycles for finding 'Duplicate' with crt strstr
696     kCycles for finding 'Duplicate' with Boyer-Moore
2114    kCycles for InString using MichaelW's macro


34       Cycles for fldpi+1*fdiv+fstp st
25       Cycles for fldpi+1*fdiv+fstp st
210     kCycles for finding 'Duplicate' with Instr_
2081    kCycles for finding 'Duplicate' with InString
1883    kCycles for finding 'Duplicate' with crt strstr
696     kCycles for finding 'Duplicate' with Boyer-Moore
2117    kCycles for InString using MichaelW's macro


37       Cycles for fldpi+1*fdiv+fstp st
24       Cycles for fldpi+1*fdiv+fstp st
210     kCycles for finding 'Duplicate' with Instr_
2088    kCycles for finding 'Duplicate' with InString
1926    kCycles for finding 'Duplicate' with crt strstr
713     kCycles for finding 'Duplicate' with Boyer-Moore
2119    kCycles for InString using MichaelW's macro


38       Cycles for fldpi+1*fdiv+fstp st
21       Cycles for fldpi+1*fdiv+fstp st
209     kCycles for finding 'Duplicate' with Instr_
2088    kCycles for finding 'Duplicate' with InString
1926    kCycles for finding 'Duplicate' with crt strstr
695     kCycles for finding 'Duplicate' with Boyer-Moore
2117    kCycles for InString using MichaelW's macro


MichaelW, Instring:    2119-2114= 5
CyCt*, Instring:       2093-2064=29

jj2007

Re: New timing macros
« Reply #22 on: May 18, 2022, 08:47:28 AM »
YMMV :cool:

But in general, I get fairly consistent output, and at much, much greater speed. That matters when you are trying to optimise an algo and it takes many small changes to get there.

jj2007

Re: New timing macros
« Reply #23 on: May 19, 2022, 07:29:53 PM »
More refining done... it starts to look convincing :tongue:

Can I have some timings, please?

Code:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
+0       Cycles for PI*100
+16      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+13      Cycles for 10 * inc & dec eax
+13      Cycles for 10 * add eax,1 & sub eax,1
+13      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+189     Cycles for finding 'test' with InString

+0       Cycles for PI*100
+16      Cycles for PI*100/10
+4       Cycles for push & pop eax
+0       Cycles for empty loop
+13      Cycles for 10 * inc & dec eax
+16      Cycles for 10 * add eax,1 & sub eax,1
+13      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+187     Cycles for finding 'test' with InString

+0       Cycles for PI*100
+16      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+13      Cycles for 10 * inc & dec eax
+13      Cycles for 10 * add eax,1 & sub eax,1
+13      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+187     Cycles for finding 'test' with InString

+0       Cycles for PI*100
+16      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+13      Cycles for 10 * inc & dec eax
+13      Cycles for 10 * add eax,1 & sub eax,1
+13      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+187     Cycles for finding 'test' with InString

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 10572
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: New timing macros
« Reply #24 on: May 19, 2022, 08:26:05 PM »
Could you civilize this a bit? It would not run where I downloaded it and had to be installed on my dev drive.

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
+1       Cycles for PI*100
+12      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+14      Cycles for 10 * inc & dec eax
+14      Cycles for 10 * add eax,1 & sub eax,1
+14      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+163     Cycles for finding 'test' with InString

+1       Cycles for PI*100
+12      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+14      Cycles for 10 * inc & dec eax
+14      Cycles for 10 * add eax,1 & sub eax,1
+14      Cycles for 10 * inc+dec with lea
189     kCycles for finding 'Duplicate' in Window.inc
+145     Cycles for finding 'test' with InString

+1       Cycles for PI*100
+12      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+14      Cycles for 10 * inc & dec eax
+14      Cycles for 10 * add eax,1 & sub eax,1
+14      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+145     Cycles for finding 'test' with InString

+1       Cycles for PI*100
+12      Cycles for PI*100/10
+2       Cycles for push & pop eax
+0       Cycles for empty loop
+14      Cycles for 10 * inc & dec eax
+14      Cycles for 10 * add eax,1 & sub eax,1
+14      Cycles for 10 * inc+dec with lea
190     kCycles for finding 'Duplicate' in Window.inc
+147     Cycles for finding 'test' with InString
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

jj2007

Re: New timing macros
« Reply #25 on: May 19, 2022, 08:45:24 PM »
Here is the civilised version, with an Inkey at the end. The other one runs fine from a DOS prompt, of course  :biggrin:

hutch--

Re: New timing macros
« Reply #26 on: May 19, 2022, 09:13:15 PM »
This one at least stops on the inkey, but to run it, it must be copied to my dev drive. It would be a lot easier to run if I could just copy windows.inc to where it gets unzipped. I test on a ramdrive, not my dev drive.

jj2007

Re: New timing macros
« Reply #27 on: May 19, 2022, 09:17:34 PM »
Quote from: hutch-- on May 19, 2022, 09:13:15 PM
if I could just copy windows.inc to where it gets unzipped

Now I understand the problem :biggrin:

I know one guy who is very proud of having all paths in include files hardcoded to the root, as in \thatfolder\whatever :mrgreen:

Code:
Let esi=FileRead$("\Masm32\include\Windows.inc")

Attached a version with embedded Windows.inc, and some more timings. Don't complain that it's a fat attachment :cool:

Code:
  ; Let esi=FileRead$("\Masm32\include\Windows.inc")
  Let esi=FileRead$(99)   ; resource ID 99
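FileRead$(99) avoids the path problem by loading Windows.inc from a resource embedded in the exe. A different and portable approach is to try a short list of locations in order. For example, the program could look in the current folder first, so that a copy on a ramdrive works, and fall back to the hardcoded \Masm32\include path second. This is not what MasmBasic does.

The C sketch below shows that idea. read_first_file is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: try each candidate path in order and return the
   contents of the first file that opens (caller must free), or NULL if
   none can be read. *len receives the number of bytes read. */
char *read_first_file(const char *const *paths, size_t count, size_t *len)
{
    assert(paths != NULL && len != NULL);
    for (size_t i = 0; i < count; i++) {
        FILE *f = fopen(paths[i], "rb");
        if (f == NULL)
            continue;                        /* not here, try the next path */
        if (fseek(f, 0, SEEK_END) != 0) { fclose(f); continue; }
        long size = ftell(f);
        if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); continue; }
        char *buf = malloc((size_t)size + 1);
        if (buf == NULL) { fclose(f); return NULL; }
        *len = fread(buf, 1, (size_t)size, f);
        buf[*len] = '\0';                    /* terminate, handy for text files */
        fclose(f);
        return buf;
    }
    return NULL;
}
```

Consider a candidate list such as {"Windows.inc", "\\Masm32\\include\\Windows.inc"}. It picks up a copy in the folder the program is started from, and falls back to the installed file otherwise.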

FORTRANS

  • Member
  • *****
  • Posts: 1230
Re: New timing macros
« Reply #28 on: May 20, 2022, 05:01:29 AM »
Hi,

   Two laptops.

Code:
Intel(R) Pentium(R) M processor 1.70GHz
+3 Cycles for PI*100
+31 Cycles for PI*100/10
+-1 Cycles for push & pop eax
+0 Cycles for empty loop
+16 Cycles for 10 * inc & dec eax
+16 Cycles for 10 * add eax,1 & sub eax,1
+14 Cycles for 10 * inc+dec with lea
+33 Cycles for finding 'Duplicate' in Window.inc
+458 Cycles for finding 'test' with InString

+3 Cycles for PI*100
+31 Cycles for PI*100/10
+-1 Cycles for push & pop eax
+0 Cycles for empty loop
+16 Cycles for 10 * inc & dec eax
+16 Cycles for 10 * add eax,1 & sub eax,1
+14 Cycles for 10 * inc+dec with lea
+33 Cycles for finding 'Duplicate' in Window.inc
+467 Cycles for finding 'test' with InString

+3 Cycles for PI*100
+31 Cycles for PI*100/10
+-1 Cycles for push & pop eax
+0 Cycles for empty loop
+16 Cycles for 10 * inc & dec eax
+16 Cycles for 10 * add eax,1 & sub eax,1
+14 Cycles for 10 * inc+dec with lea
+33 Cycles for finding 'Duplicate' in Window.inc
+423 Cycles for finding 'test' with InString

+3 Cycles for PI*100
+31 Cycles for PI*100/10
+-1 Cycles for push & pop eax
+0 Cycles for empty loop
+16 Cycles for 10 * inc & dec eax
+16 Cycles for 10 * add eax,1 & sub eax,1
+14 Cycles for 10 * inc+dec with lea
+33 Cycles for finding 'Duplicate' in Window.inc
+422 Cycles for finding 'test' with InString

more (y)?

+32 Cycles for fldpi+1*fdiv+fstp st
+3581 Cycles for finding 'test' with Instr_
+33 Cycles for finding 'Duplicate' with Instr_
3497 kCycles for finding 'Duplicate' with InString
3315 kCycles for finding 'Duplicate' with crt strstr
1190 kCycles for finding 'Duplicate' with Boyer-Moore

+32 Cycles for fldpi+1*fdiv+fstp st
+3582 Cycles for finding 'test' with Instr_
+33 Cycles for finding 'Duplicate' with Instr_
3497 kCycles for finding 'Duplicate' with InString
3315 kCycles for finding 'Duplicate' with crt strstr
1190 kCycles for finding 'Duplicate' with Boyer-Moore

+32 Cycles for fldpi+1*fdiv+fstp st
+3582 Cycles for finding 'test' with Instr_
+33 Cycles for finding 'Duplicate' with Instr_
3497 kCycles for finding 'Duplicate' with InString
3315 kCycles for finding 'Duplicate' with crt strstr
1190 kCycles for finding 'Duplicate' with Boyer-Moore

+32 Cycles for fldpi+1*fdiv+fstp st
+3581 Cycles for finding 'test' with Instr_
+33 Cycles for finding 'Duplicate' with Instr_
3497 kCycles for finding 'Duplicate' with InString
3315 kCycles for finding 'Duplicate' with crt strstr
1190 kCycles for finding 'Duplicate' with Boyer-Moore

+32 Cycles for fldpi+1*fdiv+fstp st
+3582 Cycles for finding 'test' with Instr_
+33 Cycles for finding 'Duplicate' with Instr_
3497 kCycles for finding 'Duplicate' with InString
3315 kCycles for finding 'Duplicate' with crt strstr
1190 kCycles for finding 'Duplicate' with Boyer-Moore

hit any key - bye


Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz
+1 Cycles for PI*100
+15 Cycles for PI*100/10
+3 Cycles for push & pop eax
+0 Cycles for empty loop
+17 Cycles for 10 * inc & dec eax
+17 Cycles for 10 * add eax,1 & sub eax,1
+17 Cycles for 10 * inc+dec with lea
227 kCycles for finding 'Duplicate' in Window.inc
+175 Cycles for finding 'test' with InString

+1 Cycles for PI*100
+15 Cycles for PI*100/10
+3 Cycles for push & pop eax
+0 Cycles for empty loop
+17 Cycles for 10 * inc & dec eax
+17 Cycles for 10 * add eax,1 & sub eax,1
+17 Cycles for 10 * inc+dec with lea
227 kCycles for finding 'Duplicate' in Window.inc
+174 Cycles for finding 'test' with InString

+1 Cycles for PI*100
+15 Cycles for PI*100/10
+3 Cycles for push & pop eax
+0 Cycles for empty loop
+17 Cycles for 10 * inc & dec eax
+17 Cycles for 10 * add eax,1 & sub eax,1
+17 Cycles for 10 * inc+dec with lea
227 kCycles for finding 'Duplicate' in Window.inc
+174 Cycles for finding 'test' with InString

+1 Cycles for PI*100
+15 Cycles for PI*100/10
+3 Cycles for push & pop eax
+0 Cycles for empty loop
+17 Cycles for 10 * inc & dec eax
+17 Cycles for 10 * add eax,1 & sub eax,1
+17 Cycles for 10 * inc+dec with lea
227 kCycles for finding 'Duplicate' in Window.inc
+175 Cycles for finding 'test' with InString

more (y)?

+33 Cycles for fldpi+1*fdiv+fstp st
+3522 Cycles for finding 'test' with Instr_
227 kCycles for finding 'Duplicate' with Instr_
2418 kCycles for finding 'Duplicate' with InString
2265 kCycles for finding 'Duplicate' with crt strstr
826 kCycles for finding 'Duplicate' with Boyer-Moore

+15 Cycles for fldpi+1*fdiv+fstp st
+3108 Cycles for finding 'test' with Instr_
227 kCycles for finding 'Duplicate' with Instr_
2419 kCycles for finding 'Duplicate' with InString
2264 kCycles for finding 'Duplicate' with crt strstr
825 kCycles for finding 'Duplicate' with Boyer-Moore

+15 Cycles for fldpi+1*fdiv+fstp st
+3108 Cycles for finding 'test' with Instr_
227 kCycles for finding 'Duplicate' with Instr_
2418 kCycles for finding 'Duplicate' with InString
2264 kCycles for finding 'Duplicate' with crt strstr
825 kCycles for finding 'Duplicate' with Boyer-Moore

+15 Cycles for fldpi+1*fdiv+fstp st
+3108 Cycles for finding 'test' with Instr_
227 kCycles for finding 'Duplicate' with Instr_
2419 kCycles for finding 'Duplicate' with InString
2264 kCycles for finding 'Duplicate' with crt strstr
825 kCycles for finding 'Duplicate' with Boyer-Moore

+15 Cycles for fldpi+1*fdiv+fstp st
+3109 Cycles for finding 'test' with Instr_
227 kCycles for finding 'Duplicate' with Instr_
2419 kCycles for finding 'Duplicate' with InString
2264 kCycles for finding 'Duplicate' with crt strstr
825 kCycles for finding 'Duplicate' with Boyer-Moore

hit any key - bye

Cheers,

Steve N.

jj2007

Re: New timing macros
« Reply #29 on: May 20, 2022, 05:24:06 AM »
Thanks, Steve, now it looks reasonably stable :thumbsup:

A typical usage example:

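Code:
CyCtStart               ; start counting cycles
fldpi                   ; ST(0) = pi
fmul FP4(100.0)         ; ST(0) = pi*100 (FP4: MasmBasic REAL4 constant)
fistp testDD            ; store as integer in testDD and pop the FPU stack
CyCtEnd PI*100          ; stop counting; the argument is the label in the output
CyCtStart
fldpi
fmul FP4(100.0)
fdiv FP4(10.0)          ; ST(0) = pi*100/10
fistp testDD
CyCtEnd PI*100/10
CyCtStart
push eax                ; round trip through the stack, eax unchanged
pop eax
CyCtEnd push & pop eax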
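The thread doesn't show how CyCtStart/CyCtEnd work internally. The C sketch below illustrates the general idea behind such a start/end pair; it is not necessarily what CyCt* does. It reads a tick counter before and after a block, keeps the smallest result over several runs, and subtracts the cost of timing an empty block.

The names tick_fn, net_ticks, fake_now and fake_work are assumptions made for illustration. With GCC or Clang on x86, a real counter could be a small wrapper returning __rdtsc() from <x86intrin.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Any monotonic tick source; on x86 this could wrap __rdtsc(). */
typedef uint64_t (*tick_fn)(void);
/* The code being timed. */
typedef void (*code_fn)(void);

/* Smallest tick count seen for one run of body, over reps runs. */
static uint64_t min_ticks(tick_fn now, code_fn body, int reps)
{
    uint64_t best = UINT64_MAX;
    assert(reps > 0);
    for (int i = 0; i < reps; i++) {
        uint64_t t0 = now();
        body();
        uint64_t t1 = now();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

static void empty_body(void) {}

/* Best time of body minus best time of an empty block, which removes the
   cost of reading the counter itself. Signed, because with a real counter
   noise can push a very short body slightly below zero. */
int64_t net_ticks(tick_fn now, code_fn body, int reps)
{
    return (int64_t)min_ticks(now, body, reps)
         - (int64_t)min_ticks(now, empty_body, reps);
}

/* Deterministic fake counter for checking the arithmetic: every read
   "costs" 2 ticks and fake_work "costs" 10 ticks. */
static uint64_t fake_clock = 0;

uint64_t fake_now(void)
{
    uint64_t t = fake_clock;
    fake_clock += 2;
    return t;
}

void fake_work(void)
{
    fake_clock += 10;
}
```

With the fake counter, net_ticks(fake_now, fake_work, 5) returns 10, because the 2 ticks of counter overhead cancel out. Keeping the minimum rather than the average makes such numbers less sensitive to interrupts and other noise, at the price of hiding occasional slow runs. A real rdtsc-based version would also need to serialise around the counter reads (e.g. with cpuid or lfence).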