The MASM Forum

General => The Laboratory => Topic started by: jj2007 on December 09, 2015, 05:43:42 AM

Title: Passing args on the stack: what is fastest?
Post by: jj2007 on December 09, 2015, 05:43:42 AM
Testing various ways to pass one arg on the stack, and to preserve regs:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
467     cycles for 100 * pop retadd, pop arg, push retadd
483     cycles for 100 * pop retadd, pop arg, jmp retadd
565     cycles for 100 * mov eax, arg/ret
873     cycles for 100 * push esi edi ebx ecx
2183    cycles for 100 * pushad

466     cycles for 100 * pop retadd, pop arg, push retadd
484     cycles for 100 * pop retadd, pop arg, jmp retadd
566     cycles for 100 * mov eax, arg/ret
874     cycles for 100 * push esi edi ebx ecx
2178    cycles for 100 * pushad
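The benchmark source is not posted in this thread. As a hedged sketch, the five test labels most likely correspond to procedures like the ones below. All proc names are hypothetical, the bodies are minimal, and the byte counts will not match the ones reported by the test program. It assumes the MASM32 SDK is installed at \masm32, which provides masm32rt.inc and its print/str$ macros. OPTION PROLOGUE/EPILOGUE:NONE stops MASM from rewriting the hand-written ret instructions.

```asm
; Hypothetical reconstruction of the five variants; not the actual test code.
include \masm32\include\masm32rt.inc

.code
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

; 1) pop retadd, pop arg, push retadd
ArgPopPush proc
    pop edx                 ; edx = return address
    pop eax                 ; eax = argument (undoes the caller's push)
    push edx                ; return address back on top
    ret                     ; plain ret: the arg is already off the stack
ArgPopPush endp

; 2) pop retadd, pop arg, jmp retadd
ArgPopJmp proc
    pop edx                 ; edx = return address
    pop eax                 ; eax = argument
    jmp edx                 ; return without ret; edx must survive until here
ArgPopJmp endp

; 3) mov eax, arg / ret
ArgMovRet proc
    mov eax, [esp+4]        ; read the argument in place
    ret 4                   ; return and drop the 4-byte argument
ArgMovRet endp

; 4) preserve four regs with explicit push/pop
SaveFour proc
    push esi
    push edi
    push ebx
    push ecx
    ; ... body that may use esi, edi, ebx, ecx ...
    pop ecx                 ; restore in reverse order
    pop ebx
    pop edi
    pop esi
    ret
SaveFour endp

; 5) preserve all eight general-purpose regs
SaveAll proc
    pushad
    ; ... body ...
    popad
    ret
SaveAll endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

start:
    push 111
    call ArgPopPush
    print str$(eax), 13, 10     ; prints 111
    push 222
    call ArgPopJmp
    print str$(eax), 13, 10     ; prints 222
    push 333
    call ArgMovRet
    print str$(eax), 13, 10     ; prints 333
    call SaveFour
    call SaveAll
    invoke ExitProcess, 0
end start
```

Variants 1 to 3 leave the stack in the same state on return; they differ only in how the return address is handled. One caveat for variant 2: replacing ret with jmp leaves the CPU's return-stack predictor out of step with the real call chain, which can cost mispredicted returns in the callers. A tight benchmark loop may not show that cost.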
Title: Re: Passing args on the stack: what is fastest?
Post by: Siekmanski on December 09, 2015, 07:10:15 AM
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

463     cycles for 100 * pop retadd, pop arg, push retadd
478     cycles for 100 * pop retadd, pop arg, jmp retadd
518     cycles for 100 * mov eax, arg/ret
777     cycles for 100 * push esi edi ebx ecx
2188    cycles for 100 * pushad

464     cycles for 100 * pop retadd, pop arg, push retadd
480     cycles for 100 * pop retadd, pop arg, jmp retadd
552     cycles for 100 * mov eax, arg/ret
776     cycles for 100 * push esi edi ebx ecx
2185    cycles for 100 * pushad

464     cycles for 100 * pop retadd, pop arg, push retadd
478     cycles for 100 * pop retadd, pop arg, jmp retadd
536     cycles for 100 * mov eax, arg/ret
776     cycles for 100 * push esi edi ebx ecx
2186    cycles for 100 * pushad

463     cycles for 100 * pop retadd, pop arg, push retadd
479     cycles for 100 * pop retadd, pop arg, jmp retadd
555     cycles for 100 * mov eax, arg/ret
776     cycles for 100 * push esi edi ebx ecx
2185    cycles for 100 * pushad

464     cycles for 100 * pop retadd, pop arg, push retadd
479     cycles for 100 * pop retadd, pop arg, jmp retadd
548     cycles for 100 * mov eax, arg/ret
777     cycles for 100 * push esi edi ebx ecx
2185    cycles for 100 * pushad

11      bytes for pop retadd, pop arg, push retadd
11      bytes for pop retadd, pop arg, jmp retadd
15      bytes for mov eax, arg/ret
31      bytes for push esi edi ebx ecx
27      bytes for pushad

Title: Re: Passing args on the stack: what is fastest?
Post by: Grincheux on December 09, 2015, 07:38:32 AM
AMD Athlon(tm) II X2 250 Processor (SSE3)

631   cycles for 100 * pop retadd, pop arg, push retadd
426   cycles for 100 * pop retadd, pop arg, jmp retadd
433   cycles for 100 * mov eax, arg/ret
970   cycles for 100 * push esi edi ebx ecx
1489   cycles for 100 * pushad

703   cycles for 100 * pop retadd, pop arg, push retadd
426   cycles for 100 * pop retadd, pop arg, jmp retadd
428   cycles for 100 * mov eax, arg/ret
976   cycles for 100 * push esi edi ebx ecx
1476   cycles for 100 * pushad

668   cycles for 100 * pop retadd, pop arg, push retadd
433   cycles for 100 * pop retadd, pop arg, jmp retadd
426   cycles for 100 * mov eax, arg/ret
971   cycles for 100 * push esi edi ebx ecx
1486   cycles for 100 * pushad

699   cycles for 100 * pop retadd, pop arg, push retadd
425   cycles for 100 * pop retadd, pop arg, jmp retadd
425   cycles for 100 * mov eax, arg/ret
967   cycles for 100 * push esi edi ebx ecx
1487   cycles for 100 * pushad

659   cycles for 100 * pop retadd, pop arg, push retadd
434   cycles for 100 * pop retadd, pop arg, jmp retadd
426   cycles for 100 * mov eax, arg/ret
971   cycles for 100 * push esi edi ebx ecx
1518   cycles for 100 * pushad

11   bytes for pop retadd, pop arg, push retadd
11   bytes for pop retadd, pop arg, jmp retadd
15   bytes for mov eax, arg/ret
31   bytes for push esi edi ebx ecx
27   bytes for pushad


--- ok ---
Title: Re: Passing args on the stack: what is fastest?
Post by: Grincheux on December 09, 2015, 07:39:49 AM
Quote: "426   cycles for 100 * pop retadd, pop arg, jmp retadd"

My Athlon is the fastest!
Title: Re: Passing args on the stack: what is fastest?
Post by: Grincheux on December 09, 2015, 07:56:06 AM
Columns: (1) Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4), (2) Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4), (3) AMD Athlon(tm) II X2 250 Processor (SSE3). "-" means no result for that run.

Cycles for 100 calls                   (1)     (2)     (3)

Run 1
pop retadd, pop arg, push retadd       467     463     631
pop retadd, pop arg, jmp retadd        483     478     426
mov eax, arg/ret                       565     518     433
push esi edi ebx ecx                   873     777     970
pushad                                2183    2188    1489

Run 2
pop retadd, pop arg, push retadd       466     464     703
pop retadd, pop arg, jmp retadd        484     480     426
mov eax, arg/ret                       566     552     428
push esi edi ebx ecx                   874     776     976
pushad                                2178    2185    1476

Run 3
pop retadd, pop arg, push retadd         -     464     668
pop retadd, pop arg, jmp retadd          -     478     433
mov eax, arg/ret                         -     536     426
push esi edi ebx ecx                     -     776     971
pushad                                   -    2186    1486

Run 4
pop retadd, pop arg, push retadd         -     463     699
pop retadd, pop arg, jmp retadd          -     479     425
mov eax, arg/ret                         -     555     425
push esi edi ebx ecx                     -     776     967
pushad                                   -    2185    1487

Run 5
pop retadd, pop arg, push retadd         -     464     659
pop retadd, pop arg, jmp retadd          -     479     434
mov eax, arg/ret                         -     548     426
push esi edi ebx ecx                     -     777     971
pushad                                   -    2185    1518

Code size (bytes)
pop retadd, pop arg, push retadd        11      11      11
pop retadd, pop arg, jmp retadd         11      11      11
mov eax, arg/ret                        15      15      15
push esi edi ebx ecx                    31      31      31
pushad                                  27      27      27

Title: Re: Passing args on the stack: what is fastest?
Post by: jj2007 on December 09, 2015, 01:53:14 PM
Quote from: Grincheux on December 09, 2015, 07:39:49 AM
My Athlon is the fastest!

It seems so :t

However, Intel is faster with this sequence:
pop edx  ; ret addr
pop eax  ; arg
push edx ; ret addr
Title: Re: Passing args on the stack: what is fastest?
Post by: TWell on December 09, 2015, 08:11:54 PM
Older AMD: AMD Athlon(tm) II X2 220 Processor (SSE3) 2.8 GHz

643     cycles for 100 * pop retadd, pop arg, push retadd
432     cycles for 100 * pop retadd, pop arg, jmp retadd
429     cycles for 100 * mov eax, arg/ret
969     cycles for 100 * push esi edi ebx ecx
1478    cycles for 100 * pushad

633     cycles for 100 * pop retadd, pop arg, push retadd
426     cycles for 100 * pop retadd, pop arg, jmp retadd
428     cycles for 100 * mov eax, arg/ret
969     cycles for 100 * push esi edi ebx ecx
1477    cycles for 100 * pushad

653     cycles for 100 * pop retadd, pop arg, push retadd
428     cycles for 100 * pop retadd, pop arg, jmp retadd
434     cycles for 100 * mov eax, arg/ret
978     cycles for 100 * push esi edi ebx ecx
1500    cycles for 100 * pushad

632     cycles for 100 * pop retadd, pop arg, push retadd
426     cycles for 100 * pop retadd, pop arg, jmp retadd
429     cycles for 100 * mov eax, arg/ret
968     cycles for 100 * push esi edi ebx ecx
1493    cycles for 100 * pushad

632     cycles for 100 * pop retadd, pop arg, push retadd
427     cycles for 100 * pop retadd, pop arg, jmp retadd
428     cycles for 100 * mov eax, arg/ret
975     cycles for 100 * push esi edi ebx ecx
1497    cycles for 100 * pushad

11      bytes for pop retadd, pop arg, push retadd
11      bytes for pop retadd, pop arg, jmp retadd
15      bytes for mov eax, arg/ret
31      bytes for push esi edi ebx ecx
27      bytes for pushad
Title: Re: Passing args on the stack: what is fastest?
Post by: dedndave on December 09, 2015, 10:00:21 PM
Prescott with Hyper-Threading:
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)

1071    cycles for 100 * pop retadd, pop arg, push retadd
699     cycles for 100 * pop retadd, pop arg, jmp retadd
876     cycles for 100 * mov eax, arg/ret
1575    cycles for 100 * push esi edi ebx ecx
3614    cycles for 100 * pushad

1083    cycles for 100 * pop retadd, pop arg, push retadd
699     cycles for 100 * pop retadd, pop arg, jmp retadd
878     cycles for 100 * mov eax, arg/ret
1539    cycles for 100 * push esi edi ebx ecx
3578    cycles for 100 * pushad

1062    cycles for 100 * pop retadd, pop arg, push retadd
701     cycles for 100 * pop retadd, pop arg, jmp retadd
919     cycles for 100 * mov eax, arg/ret
1559    cycles for 100 * push esi edi ebx ecx
3593    cycles for 100 * pushad

1059    cycles for 100 * pop retadd, pop arg, push retadd
741     cycles for 100 * pop retadd, pop arg, jmp retadd
865     cycles for 100 * mov eax, arg/ret
1547    cycles for 100 * push esi edi ebx ecx
3515    cycles for 100 * pushad

1082    cycles for 100 * pop retadd, pop arg, push retadd
701     cycles for 100 * pop retadd, pop arg, jmp retadd
866     cycles for 100 * mov eax, arg/ret
1715    cycles for 100 * push esi edi ebx ecx
3526    cycles for 100 * pushad
Title: Re: Passing args on the stack: what is fastest?
Post by: jj2007 on December 09, 2015, 11:26:16 PM
Thanks :icon14:

So it seems
pop edx  ; ret addr
pop eax  ; arg
push edx ; ret addr

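is good on the Core i-series but not so good on anything else (Athlon II, Pentium 4). The Lingo-style jmp edx is rarely an option, because edx has to survive until the final jump, and edx is overwritten by many instructions (mul, div, cdq) and by most API calls.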
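To illustrate the edx problem, here is a hedged sketch (hypothetical proc names, MASM32 SDK assumed at \masm32). A body that squares its argument with mul writes edx:eax, so a return address kept in edx would be destroyed. Holding it in ecx instead works for this body, but an invoke of a Win32 API function would still be a problem, since those may overwrite eax, ecx and edx. Pushing the return address back and using ret avoids the issue entirely.

```asm
; Hypothetical example: keeping the return address in a register vs. on the stack.
include \masm32\include\masm32rt.inc

.code
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

; jmp style: the return address must live in a register the body never touches.
; edx would not work here, because mul writes its high dword to edx.
SquareJmp proc
    pop ecx                 ; ecx = return address
    pop eax                 ; eax = argument
    mul eax                 ; edx:eax = eax * eax (clobbers edx only)
    jmp ecx                 ; safe only because nothing above wrote ecx
SquareJmp endp

; push style: the return address goes back on the stack, any reg is free to use.
SquarePush proc
    pop ecx                 ; ecx = return address
    pop eax                 ; eax = argument
    push ecx                ; return address back on top
    mul eax                 ; edx:eax = eax * eax
    ret
SquarePush endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

start:
    push 12
    call SquareJmp
    print str$(eax), 13, 10     ; prints 144
    push 13
    call SquarePush
    print str$(eax), 13, 10     ; prints 169
    invoke ExitProcess, 0
end start
```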
Title: Re: Passing args on the stack: what is fastest?
Post by: Grincheux on December 30, 2015, 04:52:17 PM
When we have 3 or 4 parameters or more, is it quicker to pass a structure?
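In assembly, "passing a structure" normally means passing one pointer to a parameter block. The hedged sketch below contrasts four stack args with one pointer arg; the struct and proc names are hypothetical, and it assumes the MASM32 SDK at \masm32. No timings are implied: the pointer version saves three pushes, but the callee then does four memory loads through the pointer, and the caller must have filled the struct beforehand. Whether that wins depends mostly on whether the values already sit in memory.

```asm
; Hypothetical comparison: four stack args vs. one pointer to a struct.
include \masm32\include\masm32rt.inc

ARGS4 STRUCT                ; hypothetical parameter block
    a1 DWORD ?
    a2 DWORD ?
    a3 DWORD ?
    a4 DWORD ?
ARGS4 ENDS

.data
params ARGS4 <1, 2, 3, 4>

.code
; Four separate stdcall args (MASM generates the frame and ret 16).
Sum4 proc v1:DWORD, v2:DWORD, v3:DWORD, v4:DWORD
    mov eax, v1
    add eax, v2
    add eax, v3
    add eax, v4
    ret
Sum4 endp

; One stdcall arg: a pointer to the parameter block (ret 4).
Sum4Ptr proc pArgs:DWORD
    mov edx, pArgs
    mov eax, [edx].ARGS4.a1
    add eax, [edx].ARGS4.a2
    add eax, [edx].ARGS4.a3
    add eax, [edx].ARGS4.a4
    ret
Sum4Ptr endp

start:
    invoke Sum4, 1, 2, 3, 4
    print str$(eax), 13, 10     ; prints 10
    invoke Sum4Ptr, addr params
    print str$(eax), 13, 10     ; prints 10
    invoke ExitProcess, 0
end start
```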
Title: Re: Passing args on the stack: what is fastest?
Post by: TouEnMasm on December 30, 2015, 05:34:21 PM

Intel(R) Core(TM) i3-4150 CPU @ 3.50GHz (SSE4)

424     cycles for 100 * pop retadd, pop arg, push retadd
451     cycles for 100 * pop retadd, pop arg, jmp retadd
520     cycles for 100 * mov eax, arg/ret
804     cycles for 100 * push esi edi ebx ecx
2234    cycles for 100 * pushad

424     cycles for 100 * pop retadd, pop arg, push retadd
442     cycles for 100 * pop retadd, pop arg, jmp retadd
512     cycles for 100 * mov eax, arg/ret
797     cycles for 100 * push esi edi ebx ecx
2198    cycles for 100 * pushad

422     cycles for 100 * pop retadd, pop arg, push retadd
442     cycles for 100 * pop retadd, pop arg, jmp retadd
526     cycles for 100 * mov eax, arg/ret
791     cycles for 100 * push esi edi ebx ecx
2184    cycles for 100 * pushad

420     cycles for 100 * pop retadd, pop arg, push retadd
439     cycles for 100 * pop retadd, pop arg, jmp retadd
519     cycles for 100 * mov eax, arg/ret
792     cycles for 100 * push esi edi ebx ecx
2188    cycles for 100 * pushad

422     cycles for 100 * pop retadd, pop arg, push retadd
439     cycles for 100 * pop retadd, pop arg, jmp retadd
520     cycles for 100 * mov eax, arg/ret
794     cycles for 100 * push esi edi ebx ecx
2185    cycles for 100 * pushad

11      bytes for pop retadd, pop arg, push retadd
11      bytes for pop retadd, pop arg, jmp retadd
15      bytes for mov eax, arg/ret
31      bytes for push esi edi ebx ecx
27      bytes for pushad
Title: Re: Passing args on the stack: what is fastest?
Post by: hutch-- on December 30, 2015, 08:02:15 PM
With up to 3 arguments, register passing usually is a lot faster, as it has no stack overhead at all. Basically it's a roll-your-own version of fastcall.
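A minimal sketch of such a home-made register convention, assuming args in eax, ecx and edx and the result in eax (a hypothetical choice; Microsoft's 32-bit __fastcall passes only the first two DWORD args in ecx and edx and the rest on the stack). The proc name is hypothetical and the MASM32 SDK is assumed at \masm32. With no stack args, the callee has nothing to clean up and ends with a plain ret.

```asm
; Hypothetical roll-your-own register convention: eax, ecx, edx in; eax out.
include \masm32\include\masm32rt.inc

.code
Add3Reg proc                ; no params, so MASM generates no frame
    add eax, ecx            ; eax = arg1 + arg2
    add eax, edx            ; eax = arg1 + arg2 + arg3
    ret                     ; nothing on the stack to drop
Add3Reg endp

start:
    mov eax, 100            ; arg1
    mov ecx, 20             ; arg2
    mov edx, 3              ; arg3
    call Add3Reg
    print str$(eax), 13, 10     ; prints 123
    invoke ExitProcess, 0
end start
```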
Title: Re: Passing args on the stack: what is fastest?
Post by: ragdog on December 30, 2015, 10:22:56 PM

AMD Athlon(tm) II P360 Dual-Core Processor (SSE3)

496     cycles for 100 * pop retadd, pop arg, push retadd
424     cycles for 100 * pop retadd, pop arg, jmp retadd
425     cycles for 100 * mov eax, arg/ret
733     cycles for 100 * push esi edi ebx ecx
1231    cycles for 100 * pushad

470     cycles for 100 * pop retadd, pop arg, push retadd
426     cycles for 100 * pop retadd, pop arg, jmp retadd
428     cycles for 100 * mov eax, arg/ret
733     cycles for 100 * push esi edi ebx ecx
1231    cycles for 100 * pushad

642     cycles for 100 * pop retadd, pop arg, push retadd
424     cycles for 100 * pop retadd, pop arg, jmp retadd
425     cycles for 100 * mov eax, arg/ret
733     cycles for 100 * push esi edi ebx ecx
1236    cycles for 100 * pushad

475     cycles for 100 * pop retadd, pop arg, push retadd
425     cycles for 100 * pop retadd, pop arg, jmp retadd
425     cycles for 100 * mov eax, arg/ret
733     cycles for 100 * push esi edi ebx ecx
1231    cycles for 100 * pushad

471     cycles for 100 * pop retadd, pop arg, push retadd
426     cycles for 100 * pop retadd, pop arg, jmp retadd
428     cycles for 100 * mov eax, arg/ret
738     cycles for 100 * push esi edi ebx ecx
1231    cycles for 100 * pushad

11      bytes for pop retadd, pop arg, push retadd
11      bytes for pop retadd, pop arg, jmp retadd
15      bytes for mov eax, arg/ret
31      bytes for push esi edi ebx ecx
27      bytes for pushad


--- ok ---
Title: Re: Passing args on the stack: what is fastest?
Post by: jj2007 on December 31, 2015, 03:25:39 AM
Quote from: hutch-- on December 30, 2015, 08:02:15 PM
With up to 3 arguments, register passing usually is a lot faster, as it has no stack overhead at all. Basically it's a roll-your-own version of fastcall.

But you must move your args into the regs, unless they are already there. In practice there is not much difference; see the last two entries below.

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
470     cycles for 100 * pop retadd, pop arg, push retadd
490     cycles for 100 * pop retadd, pop arg, jmp retadd
569     cycles for 100 * mov eax, arg/ret
879     cycles for 100 * push esi edi ebx ecx
2199    cycles for 100 * pushad
473     cycles for 100 * popretadd, 2 args
470     cycles for 100 * 2 args via reg

470     cycles for 100 * pop retadd, pop arg, push retadd
486     cycles for 100 * pop retadd, pop arg, jmp retadd
526     cycles for 100 * mov eax, arg/ret
875     cycles for 100 * push esi edi ebx ecx
2189    cycles for 100 * pushad
472     cycles for 100 * popretadd, 2 args
472     cycles for 100 * 2 args via reg
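The last two entries above are "popretadd, 2 args" and "2 args via reg". Their code is not shown, so this is only a hedged guess at their shape, with hypothetical names and the MASM32 SDK assumed at \masm32. It shows why the gap is small: in the register version, the caller's two pushes simply become two movs, while the stack version pays for the pops in the callee.

```asm
; Hypothetical sketch of two-arg passing: stack with pop/push vs. registers.
include \masm32\include\masm32rt.inc

.data
num1 dd 7
num2 dd 5

.code
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

; Two stack args: pop the return address, pop both args, push it back.
Sub2Stack proc
    pop edx                 ; edx = return address
    pop eax                 ; eax = 1st arg (pushed last)
    pop ecx                 ; ecx = 2nd arg
    push edx                ; return address back on top
    sub eax, ecx            ; eax = arg1 - arg2
    ret
Sub2Stack endp

; Two register args: the caller has already loaded eax and ecx.
Sub2Reg proc
    sub eax, ecx            ; eax = arg1 - arg2
    ret
Sub2Reg endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

start:
    push num2               ; 2nd arg
    push num1               ; 1st arg
    call Sub2Stack
    print str$(eax), 13, 10     ; prints 2
    mov eax, num1           ; the caller-side movs replace the pushes
    mov ecx, num2
    call Sub2Reg
    print str$(eax), 13, 10     ; prints 2
    invoke ExitProcess, 0
end start
```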
Title: Re: Passing args on the stack: what is fastest?
Post by: Siekmanski on December 31, 2015, 07:00:27 AM
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

464     cycles for 100 * pop retadd, pop arg, push retadd
477     cycles for 100 * pop retadd, pop arg, jmp retadd
555     cycles for 100 * mov eax, arg/ret
778     cycles for 100 * push esi edi ebx ecx
2186    cycles for 100 * pushad
465     cycles for 100 * popretadd, 2 args
465     cycles for 100 * 2 args via reg

464     cycles for 100 * pop retadd, pop arg, push retadd
478     cycles for 100 * pop retadd, pop arg, jmp retadd
544     cycles for 100 * mov eax, arg/ret
778     cycles for 100 * push esi edi ebx ecx
2185    cycles for 100 * pushad
465     cycles for 100 * popretadd, 2 args
466     cycles for 100 * 2 args via reg

464     cycles for 100 * pop retadd, pop arg, push retadd
478     cycles for 100 * pop retadd, pop arg, jmp retadd
541     cycles for 100 * mov eax, arg/ret
777     cycles for 100 * push esi edi ebx ecx
2183    cycles for 100 * pushad
463     cycles for 100 * popretadd, 2 args
464     cycles for 100 * 2 args via reg

464     cycles for 100 * pop retadd, pop arg, push retadd
478     cycles for 100 * pop retadd, pop arg, jmp retadd
541     cycles for 100 * mov eax, arg/ret
778     cycles for 100 * push esi edi ebx ecx
2184    cycles for 100 * pushad
465     cycles for 100 * popretadd, 2 args
464     cycles for 100 * 2 args via reg

464     cycles for 100 * pop retadd, pop arg, push retadd
479     cycles for 100 * pop retadd, pop arg, jmp retadd
551     cycles for 100 * mov eax, arg/ret
778     cycles for 100 * push esi edi ebx ecx
2185    cycles for 100 * pushad
464     cycles for 100 * popretadd, 2 args
465     cycles for 100 * 2 args via reg

11      bytes for pop retadd, pop arg, push retadd
11      bytes for pop retadd, pop arg, jmp retadd
15      bytes for mov eax, arg/ret
31      bytes for push esi edi ebx ecx
27      bytes for pushad
13      bytes for popretadd, 2 args
15      bytes for 2 args via reg
Title: Re: Passing args on the stack: what is fastest?
Post by: FORTRANS on December 31, 2015, 09:43:32 AM
pre-P4 (SSE1)

608 cycles for 100 * pop retadd, pop arg, push retadd
517 cycles for 100 * pop retadd, pop arg, jmp retadd
711 cycles for 100 * mov eax, arg/ret
1519 cycles for 100 * push esi edi ebx ecx
2150 cycles for 100 * pushad
809 cycles for 100 * popretadd, 2 args
504 cycles for 100 * 2 args via reg

610 cycles for 100 * pop retadd, pop arg, push retadd
518 cycles for 100 * pop retadd, pop arg, jmp retadd
711 cycles for 100 * mov eax, arg/ret
1519 cycles for 100 * push esi edi ebx ecx
2145 cycles for 100 * pushad
810 cycles for 100 * popretadd, 2 args
504 cycles for 100 * 2 args via reg

613 cycles for 100 * pop retadd, pop arg, push retadd
517 cycles for 100 * pop retadd, pop arg, jmp retadd
711 cycles for 100 * mov eax, arg/ret
1521 cycles for 100 * push esi edi ebx ecx
2146 cycles for 100 * pushad
811 cycles for 100 * popretadd, 2 args
504 cycles for 100 * 2 args via reg

608 cycles for 100 * pop retadd, pop arg, push retadd
517 cycles for 100 * pop retadd, pop arg, jmp retadd
711 cycles for 100 * mov eax, arg/ret
1538 cycles for 100 * push esi edi ebx ecx
2148 cycles for 100 * pushad
814 cycles for 100 * popretadd, 2 args
504 cycles for 100 * 2 args via reg

608 cycles for 100 * pop retadd, pop arg, push retadd
530 cycles for 100 * pop retadd, pop arg, jmp retadd
711 cycles for 100 * mov eax, arg/ret
1518 cycles for 100 * push esi edi ebx ecx
2148 cycles for 100 * pushad
810 cycles for 100 * popretadd, 2 args
516 cycles for 100 * 2 args via reg

11 bytes for pop retadd, pop arg, push retadd
11 bytes for pop retadd, pop arg, jmp retadd
15 bytes for mov eax, arg/ret
31 bytes for push esi edi ebx ecx
27 bytes for pushad
13 bytes for popretadd, 2 args
15 bytes for 2 args via reg


--- ok ---
Title: Re: Passing args on the stack: what is fastest?
Post by: sinsi on December 31, 2015, 10:46:59 AM
As usual, AMD says "up yours"

AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G (SSE4)

539     cycles for 100 * pop retadd, pop arg, push retadd
752     cycles for 100 * pop retadd, pop arg, jmp retadd
610     cycles for 100 * mov eax, arg/ret
1070    cycles for 100 * push esi edi ebx ecx
1998    cycles for 100 * pushad
718     cycles for 100 * popretadd, 2 args
330     cycles for 100 * 2 args via reg

548     cycles for 100 * pop retadd, pop arg, push retadd
610     cycles for 100 * pop retadd, pop arg, jmp retadd
615     cycles for 100 * mov eax, arg/ret
1091    cycles for 100 * push esi edi ebx ecx
1996    cycles for 100 * pushad
735     cycles for 100 * popretadd, 2 args
338     cycles for 100 * 2 args via reg

557     cycles for 100 * pop retadd, pop arg, push retadd
557     cycles for 100 * pop retadd, pop arg, jmp retadd
631     cycles for 100 * mov eax, arg/ret
1078    cycles for 100 * push esi edi ebx ecx
2010    cycles for 100 * pushad
739     cycles for 100 * popretadd, 2 args
339     cycles for 100 * 2 args via reg

546     cycles for 100 * pop retadd, pop arg, push retadd
722     cycles for 100 * pop retadd, pop arg, jmp retadd
649     cycles for 100 * mov eax, arg/ret
1085    cycles for 100 * push esi edi ebx ecx
1982    cycles for 100 * pushad
760     cycles for 100 * popretadd, 2 args
339     cycles for 100 * 2 args via reg

568     cycles for 100 * pop retadd, pop arg, push retadd
746     cycles for 100 * pop retadd, pop arg, jmp retadd
614     cycles for 100 * mov eax, arg/ret
1078    cycles for 100 * push esi edi ebx ecx
1998    cycles for 100 * pushad
735     cycles for 100 * popretadd, 2 args
356     cycles for 100 * 2 args via reg

11      bytes for pop retadd, pop arg, push retadd
11      bytes for pop retadd, pop arg, jmp retadd
15      bytes for mov eax, arg/ret
31      bytes for push esi edi ebx ecx
27      bytes for pushad
13      bytes for popretadd, 2 args
15      bytes for 2 args via reg

Title: Re: Passing args on the stack: what is fastest?
Post by: jj2007 on December 31, 2015, 12:55:55 PM
Quote from: sinsi on December 31, 2015, 10:46:59 AM
As usual, AMD says "up yours"

Minority Report?
;)
Title: Re: Passing args on the stack: what is fastest?
Post by: TWell on December 31, 2015, 06:56:31 PM
AMD Athlon(tm) II X2 220 Processor (SSE3)

636     cycles for 100 * pop retadd, pop arg, push retadd
783     cycles for 100 * pop retadd, pop arg, jmp retadd
430     cycles for 100 * mov eax, arg/ret
968     cycles for 100 * push esi edi ebx ecx
1478    cycles for 100 * pushad
856     cycles for 100 * popretadd, 2 args
428     cycles for 100 * 2 args via reg

632     cycles for 100 * pop retadd, pop arg, push retadd
426     cycles for 100 * pop retadd, pop arg, jmp retadd
429     cycles for 100 * mov eax, arg/ret
969     cycles for 100 * push esi edi ebx ecx
1477    cycles for 100 * pushad
856     cycles for 100 * popretadd, 2 args
429     cycles for 100 * 2 args via reg

632     cycles for 100 * pop retadd, pop arg, push retadd
981     cycles for 100 * pop retadd, pop arg, jmp retadd
431     cycles for 100 * mov eax, arg/ret
968     cycles for 100 * push esi edi ebx ecx
1480    cycles for 100 * pushad
856     cycles for 100 * popretadd, 2 args
431     cycles for 100 * 2 args via reg

632     cycles for 100 * pop retadd, pop arg, push retadd
427     cycles for 100 * pop retadd, pop arg, jmp retadd
428     cycles for 100 * mov eax, arg/ret
969     cycles for 100 * push esi edi ebx ecx
1477    cycles for 100 * pushad
857     cycles for 100 * popretadd, 2 args
472     cycles for 100 * 2 args via reg

632     cycles for 100 * pop retadd, pop arg, push retadd
426     cycles for 100 * pop retadd, pop arg, jmp retadd
428     cycles for 100 * mov eax, arg/ret
968     cycles for 100 * push esi edi ebx ecx
1477    cycles for 100 * pushad
857     cycles for 100 * popretadd, 2 args
428     cycles for 100 * 2 args via reg

11      bytes for pop retadd, pop arg, push retadd
11      bytes for pop retadd, pop arg, jmp retadd
15      bytes for mov eax, arg/ret
31      bytes for push esi edi ebx ecx
27      bytes for pushad
13      bytes for popretadd, 2 args
15      bytes for 2 args via reg
Title: Re: Passing args on the stack: what is fastest?
Post by: Grincheux on January 02, 2016, 08:59:23 PM
AMD Athlon(tm) II X2 250 Processor (SSE3)

1463    cycles for 100 * pop retadd, pop arg, push retadd
428     cycles for 100 * pop retadd, pop arg, jmp retadd
438     cycles for 100 * mov eax, arg/ret
979     cycles for 100 * push esi edi ebx ecx
1489    cycles for 100 * pushad
857     cycles for 100 * popretadd, 2 args
496     cycles for 100 * 2 args via reg

1438    cycles for 100 * pop retadd, pop arg, push retadd
426     cycles for 100 * pop retadd, pop arg, jmp retadd
428     cycles for 100 * mov eax, arg/ret
970     cycles for 100 * push esi edi ebx ecx
1490    cycles for 100 * pushad
869     cycles for 100 * popretadd, 2 args
428     cycles for 100 * 2 args via reg

1331    cycles for 100 * pop retadd, pop arg, push retadd
1218    cycles for 100 * pop retadd, pop arg, jmp retadd
428     cycles for 100 * mov eax, arg/ret
979     cycles for 100 * push esi edi ebx ecx
1488    cycles for 100 * pushad
866     cycles for 100 * popretadd, 2 args
439     cycles for 100 * 2 args via reg

642     cycles for 100 * pop retadd, pop arg, push retadd
427     cycles for 100 * pop retadd, pop arg, jmp retadd
440     cycles for 100 * mov eax, arg/ret
984     cycles for 100 * push esi edi ebx ecx
1557    cycles for 100 * pushad
867     cycles for 100 * popretadd, 2 args
439     cycles for 100 * 2 args via reg

769     cycles for 100 * pop retadd, pop arg, push retadd
427     cycles for 100 * pop retadd, pop arg, jmp retadd
429     cycles for 100 * mov eax, arg/ret
979     cycles for 100 * push esi edi ebx ecx
1491    cycles for 100 * pushad
866     cycles for 100 * popretadd, 2 args
430     cycles for 100 * 2 args via reg

11      bytes for pop retadd, pop arg, push retadd
11      bytes for pop retadd, pop arg, jmp retadd
15      bytes for mov eax, arg/ret
31      bytes for push esi edi ebx ecx
27      bytes for pushad
13      bytes for popretadd, 2 args
15      bytes for 2 args via reg


--- ok ---