Invoke, call, jump. Simple benchmark

Started by LordAdef, June 24, 2025, 06:57:25 AM


NoCforMe

Quote from: daydreamer on July 03, 2025, 05:11:13 PM
@NoCforMe
It's best to transfer through registers in your own code, if you prefer using FPU or XMM registers for your REAL4/REAL8 variables as the coding style for your own PROCs.

I restrict my use of registers for passing parameters to the regular general-purpose ones (EAX/EBX/ECX/EDX), not FPU or XMM registers.
32-bit code and Windows 7 foreva!

jj2007

Quote from: NoCforMe on July 03, 2025, 05:19:16 PM
I restrict my use of registers for passing parameters to the regular general-purpose ones (EAX/EBX/ECX/EDX), not FPU or XMM registers.

Why so restrictive?

    mov eax, 31416        ; you can mix xmm registers with FPU and ordinary
    movd xmm0, eax        ; registers and directly print the result
    fldpi                 ; load 3.14159 onto the FPU
    mov ecx, 123          ; \n is CrLf, \t is tab in Str$()
    Print Str$("\nresult=\t%f", xmm0/ST(0)*ecx)     ; output: [newline] result=    1230003.0

TimoVJL

Vintage AMD
AMD Athlon(tm) II X2 220 Processor (SSE3)

505     cycles for 100 * proc aligned 16
402     cycles for 100 * proc aligned 16+3
502     cycles for 100 * aligned push+pop
403     cycles for 100 * aligned reg32

502     cycles for 100 * proc aligned 16
403     cycles for 100 * proc aligned 16+3
502     cycles for 100 * aligned push+pop
403     cycles for 100 * aligned reg32

503     cycles for 100 * proc aligned 16
402     cycles for 100 * proc aligned 16+3
502     cycles for 100 * aligned push+pop
403     cycles for 100 * aligned reg32

502     cycles for 100 * proc aligned 16
402     cycles for 100 * proc aligned 16+3
502     cycles for 100 * aligned push+pop
408     cycles for 100 * aligned reg32

502     cycles for 100 * proc aligned 16
402     cycles for 100 * proc aligned 16+3
503     cycles for 100 * aligned push+pop
403     cycles for 100 * aligned reg32

15      bytes for proc aligned 16
19      bytes for proc aligned 16+3
24      bytes for aligned push+pop
20      bytes for aligned reg32
May the source be with you

zedd

Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (SSE4)

344    cycles for 100 * proc aligned 16
256    cycles for 100 * proc aligned 16+3
391    cycles for 100 * aligned push+pop
387    cycles for 100 * aligned reg32

345    cycles for 100 * proc aligned 16
261    cycles for 100 * proc aligned 16+3
392    cycles for 100 * aligned push+pop
380    cycles for 100 * aligned reg32

345    cycles for 100 * proc aligned 16
265    cycles for 100 * proc aligned 16+3
403    cycles for 100 * aligned push+pop
381    cycles for 100 * aligned reg32

341    cycles for 100 * proc aligned 16
260    cycles for 100 * proc aligned 16+3
382    cycles for 100 * aligned push+pop
381    cycles for 100 * aligned reg32

382    cycles for 100 * proc aligned 16
260    cycles for 100 * proc aligned 16+3
374    cycles for 100 * aligned push+pop
389    cycles for 100 * aligned reg32

15      bytes for proc aligned 16
19      bytes for proc aligned 16+3
24      bytes for aligned push+pop
20      bytes for aligned reg32


--- ok ---
:biggrin:

zedd

From the laptop
Intel(R) Celeron(R) N5105 @ 2.00GHz (SSE4)

549     cycles for 100 * proc aligned 16
484     cycles for 100 * proc aligned 16+3
550     cycles for 100 * aligned push+pop
482     cycles for 100 * aligned reg32

551     cycles for 100 * proc aligned 16
484     cycles for 100 * proc aligned 16+3
551     cycles for 100 * aligned push+pop
482     cycles for 100 * aligned reg32

550     cycles for 100 * proc aligned 16
485     cycles for 100 * proc aligned 16+3
552     cycles for 100 * aligned push+pop
482     cycles for 100 * aligned reg32

551     cycles for 100 * proc aligned 16
493     cycles for 100 * proc aligned 16+3
562     cycles for 100 * aligned push+pop
493     cycles for 100 * aligned reg32

564     cycles for 100 * proc aligned 16
496     cycles for 100 * proc aligned 16+3
561     cycles for 100 * aligned push+pop
485     cycles for 100 * aligned reg32

15      bytes for proc aligned 16
19      bytes for proc aligned 16+3
24      bytes for aligned push+pop
20      bytes for aligned reg32


--- ok ---
:biggrin:

NoCforMe

Quote from: jj2007 on July 03, 2025, 08:11:59 PM
Quote from: NoCforMe on July 03, 2025, 05:19:16 PM
I restrict my use of registers for passing parameters to the regular general-purpose ones (EAX/EBX/ECX/EDX), not FPU or XMM registers.

Why so restrictive?

I hardly ever use the FPU in my programs, and have never messed around w/XMM. Most of my code is in integer-land.

Nothing wrong with using either one of those register sets to pass parameters, of course.
32-bit code and Windows 7 foreva!

daydreamer

Quote from: NoCforMe on July 04, 2025, 03:12:16 AM
Quote from: jj2007 on July 03, 2025, 08:11:59 PM
Quote from: NoCforMe on July 03, 2025, 05:19:16 PM
I restrict my use of registers for passing parameters to the regular general-purpose ones (EAX/EBX/ECX/EDX), not FPU or XMM registers.

Why so restrictive?

I hardly ever use the FPU in my programs, and have never messed around w/XMM. Most of my code is in integer-land.

Nothing wrong with using either one of those register sets to pass parameters, of course.

Still, performance-wise, registers make sense when calling a proc that performs integer or float math on an array of integers or floats: use ECX to pass the length of the array(s) and one or more other registers to point to the start of the arrays, and maybe return a failure value in EAX from the PROC, for example if it detects a divide by zero or anything else that stopped it from completing the math operation on the array(s).
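
A rough sketch of that register-passing idea, for illustration only. The proc name DivArray, the test data and the choice of ESI/ECX/EBX are all hypothetical, not taken from anyone's code in this thread. It assumes the MASM32 SDK is installed in the default \masm32 location.

```asm
; Sketch only: DivArray, testArray and the register roles are hypothetical.
include \masm32\include\masm32rt.inc    ; MASM32 SDK runtime include (model, kernel32 etc.)

.data
testArray dd 100, 200, 300, 400

.code

; In:   esi -> array of DWORDs
;       ecx =  number of elements
;       ebx =  unsigned divisor
; Out:  eax =  0 on success, 1 if the divisor is zero (array left untouched)
; Uses: edx; esi and ecx are advanced/consumed by the loop
DivArray proc
    test ebx, ebx           ; refuse to divide by zero
    jz   da_fail
    test ecx, ecx           ; empty array: nothing to do
    jz   da_ok
da_loop:
    mov  eax, [esi]         ; load element
    xor  edx, edx           ; clear high half of the edx:eax dividend
    div  ebx                ; unsigned divide, quotient in eax
    mov  [esi], eax         ; store result in place
    add  esi, 4             ; next DWORD
    dec  ecx
    jnz  da_loop
da_ok:
    xor  eax, eax           ; 0 = success
    ret
da_fail:
    mov  eax, 1             ; 1 = failed, divisor was zero
    ret
DivArray endp

start:
    mov  esi, offset testArray
    mov  ecx, lengthof testArray
    mov  ebx, 10
    call DivArray           ; no stack arguments, everything is in registers
    invoke ExitProcess, eax ; exit code carries the success/fail value
end start
```

No arguments are pushed and no stack frame is built, so the call costs little more than the call/ret pair itself. Note that a proc like this changes ESI, so a caller that must preserve ESI (a window procedure or other callback, for instance) would have to save and restore it around the call.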

my non-asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

TimoVJL

And hopefully people learn something about CPUs.
They usually just don't all work in similar ways.
An Intel CPU from the 7 family is very interesting.
The AMD Ryzen that I have is just too similar to what Jochen has.
Earlier Jochen had an Intel i5 and later got a similar laptop with the same CPU.


May the source be with you