Invoke, call, jump. Simple benchmark

Started by LordAdef, June 24, 2025, 06:57:25 AM

daydreamer

Quote from: NoCforMe on June 25, 2025, 08:54:24 AM
Quote from: daydreamer on June 24, 2025, 11:14:15 PM
Quote from: NoCforMe on June 24, 2025, 09:00:25 PM
So it's better not to align a proc???
I thought that's how it should be with an aligned vs. unaligned proc start, because the inner loop some opcodes later ends up aligned.

Excellent point; hadn't thought of that.

It's not necessarily the proc's entry point that you want aligned:
it's whatever instruction that marks the start of a time-sensitive part of the proc, a loop or whatever.
So the alignment should probably be done inside the proc, not outside.
Might wanna start a thread comparing an inner loop with Align 64 vs. an unaligned loop, where the CPU constantly needs to reload cache lines.
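
To make the point about the loop position concrete, here is a small sketch of the address arithmetic involved. It is only an illustration: the addresses, the 7-byte prologue and the 60-byte loop size are made-up values, and the 64-byte cache line is an assumption that holds for most current x86 CPUs.

```python
def align_padding(offset: int, boundary: int) -> int:
    """Bytes of padding an ALIGN-style directive would insert at `offset`."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a power of two")
    return -offset % boundary


def crosses_line(start: int, size: int, line: int = 64) -> bool:
    """True if the byte range [start, start + size) spans more than one cache line."""
    return start // line != (start + size - 1) // line


# Hypothetical layout: proc entry aligned to 16, inner loop begins 7 bytes in.
proc_entry = 0x1000
loop_start = proc_entry + 7
loop_size = 60  # hypothetical size of the loop body in bytes

print(align_padding(proc_entry, 16))        # entry is already aligned
print(align_padding(loop_start, 16))        # padding needed to align the loop itself
print(crosses_line(loop_start, loop_size))  # unaligned loop straddles two lines

aligned_loop = loop_start + align_padding(loop_start, 64)
print(hex(aligned_loop), crosses_line(aligned_loop, loop_size))
```

With these numbers the aligned entry point leaves the loop at 0x1007, straddling two cache lines, while padding the loop itself to 64 puts it at 0x1040 inside a single line.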
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

jj2007

Quote from: jj2007 on June 25, 2025, 01:14:58 AM
Quote from: daydreamer on June 25, 2025, 12:54:02 AM
mysterious error, alignedinvoke.exe won't work
Just run it through Olly and tell me where it chokes. Btw no MasmBasic there, it's purest Masm32 :cool:

Daydreamer, I am still waiting...

LordAdef

Quote from: NoCforMe on June 25, 2025, 08:54:24 AM
Quote from: daydreamer on June 24, 2025, 11:14:15 PM
Quote from: NoCforMe on June 24, 2025, 09:00:25 PM
So it's better not to align a proc???
I thought that's how it should be with an aligned vs. unaligned proc start, because the inner loop some opcodes later ends up aligned.

Excellent point; hadn't thought of that.

It's not necessarily the proc's entry point that you want aligned:
it's whatever instruction that marks the start of a time-sensitive part of the proc, a loop or whatever.
So the alignment should probably be done inside the proc, not outside.

Yep, interesting how this apparently obvious benchmark brings interesting results.

My _jmp macro aligns both the calling jump and the returning one.

The _jmp uses align 4, because that gave my best "average" results. Aligning to 16 got me better results, but not always.

The other thing is how differently it behaves when we move the target address within the code, or make any other change to the prog. It seems there's not much to be done other than benchmarking the code all the time or checking the binary.

While writing the test prog, I was moving routines around and sometimes I got "invoke" faster than "call"

Quick edit: I assumed NEAR jumps should give me faster results. On my machine it didn't make any difference. But... it should.

NoCforMe

Quote from: LordAdef on June 27, 2025, 05:04:23 AM
While writing the test prog, I was moving routines around and sometimes I got "invoke" faster than "call"

You don't seem to understand what invoke actually does.
It's a macro which behaves differently depending on whether the thing being invoked has parameters or not:
  • If the subroutine has parameters (and if there is a PROTOtype defined for that routine), invoke pushes the parameters from the "outside in" (right to left for STDCALL), then does a CALL.
  • If the subroutine has no parameters, then invoke simply does a CALL.

So in the latter case there's no difference between invoke and CALL.

invoke is not a processor opcode.
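
The two cases above can be modelled with a few lines of Python that print the instruction sequence. This is a simplified sketch, assuming STDCALL and plain arguments: `expand_invoke` and the argument names are hypothetical, and it ignores ADDR, argument widening, name decoration and the stack cleanup the C convention needs.

```python
def expand_invoke(proc, *args):
    """Return the instructions a STDCALL invoke boils down to (simplified model)."""
    # Arguments are pushed last-to-first, so the first one ends up on top of the stack.
    lines = ["push " + arg for arg in reversed(args)]
    lines.append("call " + proc)
    return lines


# No parameters: nothing but the CALL itself.
print(expand_invoke("MyProc"))

# Two hypothetical parameters: two PUSHes, then the same CALL.
for line in expand_invoke("MyProc", "arg1", "arg2"):
    print(line)
```

In this model any timing difference between invoke and call for a parameterless routine has to come from somewhere else, such as where the instruction ends up in memory.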
32-bit code and Windows 7 foreva!

TimoVJL

Does the CPU cache suffer from long jumps too?
Does a call also flush the cache?
May the source be with you

jj2007

Quote from: NoCforMe on June 27, 2025, 06:06:57 AM
You don't seem to understand what invoke actually does.

I'm sure he does.

LordAdef

Quote from: TimoVJL on June 27, 2025, 07:11:52 PM
Does the CPU cache suffer from long jumps too?
Does a call also flush the cache?
Good questions.

NoCforMe

Quote from: jj2007 on June 27, 2025, 10:42:23 PM
Quote from: NoCforMe on June 27, 2025, 06:06:57 AM
You don't seem to understand what invoke actually does.

I'm sure he does.

OK, but then why would he suppose there's any difference between INVOKE and CALL?

It's not as if the macro does any alignment or anything like that.

LordAdef

Quote from: NoCforMe on June 28, 2025, 05:05:18 AM
Quote from: jj2007 on June 27, 2025, 10:42:23 PM
Quote from: NoCforMe on June 27, 2025, 06:06:57 AM
You don't seem to understand what invoke actually does.

I'm sure he does.

OK, but then why would he suppose there's any difference between INVOKE and CALL?

It's not as if the macro does any alignment or anything like that.

Because it doesn't matter.
Because I quickly tested pushing arguments, but left it out.
Because, if you read my first post, a newcomer may have the misconception that call is always faster than invoke.
Because you can take the test example and compare, if you like, an aligned invoke passing one arg with an unaligned call. Whatever.

The point of this thread is alignment.

NoCforMe

Quote from: LordAdef on June 28, 2025, 08:13:25 AM
Because, if you read my first post, a newcomer may have the misconception that call is always faster than invoke.

I read your first post again; it says nothing at all about call vs. invoke.
However, it's a very good point to get across.

daydreamer

Today, old 32-bit stack arguments vs. 64-bit fastcall, which puts data in registers, would be more interesting.
32-bit fastcall transferring data in registers vs. 32-bit invoke pushing data would be the fairest test.

jj2007

Quote from: daydreamer on July 02, 2025, 06:58:12 PM
32-bit fastcall transferring data in registers vs. 32-bit invoke pushing data would be the fairest test.

Yep, you can save a cycle :thumbsup:

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

400    cycles for 100 * proc aligned 16
400    cycles for 100 * proc aligned 16+3
417    cycles for 100 * aligned push+pop
273    cycles for 100 * aligned reg32

405    cycles for 100 * proc aligned 16
409    cycles for 100 * proc aligned 16+3
426    cycles for 100 * aligned push+pop
276    cycles for 100 * aligned reg32

409    cycles for 100 * proc aligned 16
402    cycles for 100 * proc aligned 16+3
422    cycles for 100 * aligned push+pop
290    cycles for 100 * aligned reg32

403    cycles for 100 * proc aligned 16
406    cycles for 100 * proc aligned 16+3
426    cycles for 100 * aligned push+pop
278    cycles for 100 * aligned reg32

406    cycles for 100 * proc aligned 16
416    cycles for 100 * proc aligned 16+3
421    cycles for 100 * aligned push+pop
281    cycles for 100 * aligned reg32

15      bytes for proc aligned 16
19      bytes for proc aligned 16+3
24      bytes for aligned push+pop
20      bytes for aligned reg32
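
For a quick read of the five runs above, the figures can be averaged and scaled down to a single call. A minimal sketch: the cycle counts are copied from the output above, while reading "100 *" as 100 calls per measurement is an assumption.

```python
from statistics import mean

# Cycle counts copied from the five AMD Athlon Gold 3150U runs above.
amd_cycles = {
    "proc aligned 16":   [400, 405, 409, 403, 406],
    "proc aligned 16+3": [400, 409, 402, 406, 416],
    "aligned push+pop":  [417, 426, 422, 426, 421],
    "aligned reg32":     [273, 276, 290, 278, 281],
}

CALLS_PER_RUN = 100  # assumption: "100 *" means 100 calls per measurement

per_call = {label: mean(runs) / CALLS_PER_RUN for label, runs in amd_cycles.items()}

for label, cycles in per_call.items():
    print(f"{label:18s} {cycles:.2f} cycles per call")

saving = per_call["aligned push+pop"] - per_call["aligned reg32"]
print(f"reg32 saves about {saving:.2f} cycles per call vs. push+pop")
```

On these numbers the register version comes out roughly 1.4 cycles per call ahead of push+pop, which fits the "save a cycle" remark.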

NoCforMe

Quote from: daydreamer on July 02, 2025, 06:58:12 PM
32-bit fastcall transferring data in registers vs. 32-bit invoke pushing data would be the fairest test.

I transfer arguments to (and from) subroutines in registers all the time in my own code.
No need to follow someone else's ABI when it's your own personal code that you can do whatever the hell you like with.

Of course, I do follow the part of the Win32 ABI that requires you to respect the "sacred" registers (EBX, ESI, EDI).

daydreamer

@NoCforMe
Transferring through registers in your own code works best if you prefer using FPU regs or XMM regs for your REAL4/REAL8 variables as the coding style for your own PROCs.




Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (SSE4)

322     cycles for 100 * proc aligned 16
267     cycles for 100 * proc aligned 16+3
392     cycles for 100 * aligned push+pop
392     cycles for 100 * aligned reg32

310     cycles for 100 * proc aligned 16
266     cycles for 100 * proc aligned 16+3
397     cycles for 100 * aligned push+pop
394     cycles for 100 * aligned reg32

308     cycles for 100 * proc aligned 16
269     cycles for 100 * proc aligned 16+3
408     cycles for 100 * aligned push+pop
392     cycles for 100 * aligned reg32

314     cycles for 100 * proc aligned 16
263     cycles for 100 * proc aligned 16+3
404     cycles for 100 * aligned push+pop
399     cycles for 100 * aligned reg32

308     cycles for 100 * proc aligned 16
267     cycles for 100 * proc aligned 16+3
395     cycles for 100 * aligned push+pop
391     cycles for 100 * aligned reg32

15      bytes for proc aligned 16
19      bytes for proc aligned 16+3
24      bytes for aligned push+pop
20      bytes for aligned reg32
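
The picture on this machine is different, and it is easier to see with one number per variant. Below is a rough sketch that parses the timing lines as printed above and takes the median of each variant. The regular expression assumes exactly the line format shown here, and the median is just one reasonable choice for damping outliers.

```python
import re
from collections import defaultdict
from statistics import median

# Timing lines copied from the Intel Core i5-7200U output above.
intel_output = """\
322     cycles for 100 * proc aligned 16
267     cycles for 100 * proc aligned 16+3
392     cycles for 100 * aligned push+pop
392     cycles for 100 * aligned reg32
310     cycles for 100 * proc aligned 16
266     cycles for 100 * proc aligned 16+3
397     cycles for 100 * aligned push+pop
394     cycles for 100 * aligned reg32
308     cycles for 100 * proc aligned 16
269     cycles for 100 * proc aligned 16+3
408     cycles for 100 * aligned push+pop
392     cycles for 100 * aligned reg32
314     cycles for 100 * proc aligned 16
263     cycles for 100 * proc aligned 16+3
404     cycles for 100 * aligned push+pop
399     cycles for 100 * aligned reg32
308     cycles for 100 * proc aligned 16
267     cycles for 100 * proc aligned 16+3
395     cycles for 100 * aligned push+pop
391     cycles for 100 * aligned reg32
"""

# One timing line: "<cycles>  cycles for 100 * <label>"
LINE = re.compile(r"^(\d+)\s+cycles for 100 \* (.+)$")

runs = defaultdict(list)
for line in intel_output.splitlines():
    match = LINE.match(line.strip())
    if match:
        runs[match.group(2)].append(int(match.group(1)))

medians = {label: median(values) for label, values in runs.items()}

for label, cycles in medians.items():
    print(f"{label:18s} {cycles} cycles (median of {len(runs[label])} runs)")
```

Here the proc aligned at 16+3 has the lowest median, and reg32 is only a few cycles per 100 ahead of push+pop, so the two machines in this thread do not agree on which variant wins.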

