Masm64 SDK ignores "uses"

Started by jj2007, August 28, 2023, 08:36:11 AM

zedd151

Quote from: jj2007 on August 29, 2023, 05:19:50 AM
That should be tested

pushing took 1887 ms
moving  took 1888 ms
pushing took 1622 ms ; <----
moving  took 2153 ms ; <----
pushing took 1607 ms
moving  took 1872 ms
pushing took 1607 ms
moving  took 1887 ms

second run
pushing took 1872 ms
moving  took 1887 ms
pushing took 1607 ms ; <----
moving  took 2137 ms ; <----
pushing took 1623 ms
moving  took 1872 ms
pushing took 1607 ms
moving  took 1872 ms

third run
pushing took 1888 ms
moving  took 1903 ms
pushing took 1623 ms ; <----
moving  took 2152 ms ; <----
pushing took 1623 ms
moving  took 1887 ms
pushing took 1607 ms
moving  took 1888 ms
Odd, always the second iteration...

HSE

Hi Vortex,

Quote from: Vortex on August 29, 2023, 04:12:01 AM
Here is a known method to preserve the volatile registers without specifying USES :

I remember we helped Hutch do the same thing with the macros. :thumbsup:

I think the idea behind this method is that you can trash the register in the first part of a procedure and later use the original value without having to store it twice. Registers stored by "uses" are a little hard to find from inside the procedure.
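
That idea can be sketched as follows. This is only an illustration, not Vortex's actual code: the procedure name SaveInHomeSlot is hypothetical, the sketch is plain ml64 without Masm64 SDK macros, and it assumes the caller has reserved the usual 32 bytes of shadow space, so that [rbp+16] is the first home slot once a frame is set up.

```asm
; Hypothetical sketch for ml64; all names are made up for illustration.
.code
SaveInHomeSlot proc
  push rbp
  mov rbp, rsp          ; [rbp] = old rbp, [rbp+8] = return address
  mov [rbp+16], rsi     ; save rsi in the first home slot (caller's shadow space)
  mov rsi, rcx          ; first part of the proc: rsi may be trashed freely
  add rsi, rdx
  mov rax, [rbp+16]     ; later: the original rsi is still readable at a known address
  add rax, rsi          ; use both the original and the new value
  mov rsi, [rbp+16]     ; restore rsi before returning
  pop rbp
  ret
SaveInHomeSlot endp
end
```

With "uses", the saved copies land wherever the generated pushes put them, which is presumably what makes them harder to address from inside the procedure.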
Equations in Assembly: SmplMath

Vortex

Hi HSE,

You are right, registers stored by "uses" are not easy to find.

By the way, it looks like the push \ pop pair is faster than the mov instruction. I tested Jochen's code.

HSE

Quote from: Vortex on August 29, 2023, 06:45:58 AM
it looks like the push \ pop pair is faster than the mov instruction,

:biggrin:  :biggrin:

pushing took 1063 ms
moving  took 953 ms
pushing took 906 ms
moving  took 828 ms
pushing took 938 ms
moving  took 844 ms
pushing took 906 ms
moving  took 953 ms

Vortex

Here are my results :

pushing took 1825 ms
moving  took 1810 ms
pushing took 1544 ms
moving  took 2059 ms
pushing took 1529 ms
moving  took 1857 ms
pushing took 1638 ms
moving  took 1809 ms

zedd151

Different processors, different results.

jj2007

Quote from: zedd151 on August 29, 2023, 05:38:57 AM
Odd, always the second iteration...

I added an align 16, now it looks stable on my machine:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
pushing took 1592 ms
moving  took 1841 ms
pushing took 1575 ms
moving  took 1825 ms
pushing took 1591 ms
moving  took 1825 ms
pushing took 1592 ms
moving  took 1856 ms

I also added a Masm64 SDK-compatible PrintCpu macro for Héctor ;-)

zedd151

Intel(R) Core(TM)2 Duo CPU    E8400  @ 3.00GHz
pushing took 1716 ms
moving  took 2091 ms
pushing took 1794 ms
moving  took 2090 ms
pushing took 1779 ms
moving  took 2090 ms
pushing took 1763 ms
moving  took 1934 ms
Press any key to continue . . .
Looks better.
We need some AMDs.

jj2007

Quote from: zedd151 on August 29, 2023, 07:22:29 AM
We need some AMDs

Your wish is my command:
AMD Athlon Gold 3150U with Radeon Graphics
pushing took 1828 ms
moving  took 1844 ms
pushing took 1875 ms
moving  took 1984 ms
pushing took 1906 ms
moving  took 1875 ms
pushing took 1875 ms
moving  took 1906 ms

zedd151

Quote from: jj2007 on August 29, 2023, 07:29:28 AM
AMD Athlon Gold 3150U with Radeon Graphics
Not a big variance. A little flip-flopping, though. I would call it about even for your AMD.

TimoVJL

AMD Ryzen 5 3400G
pushing took 1375 ms
moving  took 1390 ms
pushing took 1375 ms
moving  took 1407 ms
pushing took 1453 ms
moving  took 1437 ms
pushing took 1469 ms
moving  took 1641 ms
pushing took 1453 ms
moving  took 1390 ms
pushing took 1391 ms
moving  took 1390 ms
pushing took 1391 ms
moving  took 1390 ms
pushing took 1391 ms
moving  took 1406 ms
pushing took 1625 ms
moving  took 1359 ms
pushing took 1359 ms
moving  took 1375 ms
pushing took 1375 ms
moving  took 1375 ms
pushing took 1360 ms
moving  took 1359 ms
May the source be with you

jj2007

So it seems that AMD CPUs take exactly the same number of cycles, while Intel CPUs are slightly faster with push & pop.

This is remarkable, since the hype around the x64 ABI is based on the idea that moving stuff is faster than pushing :cool:

x64 Architecture is an interesting read. Did you know that you can align the stack to 16 bytes with a simple, short and spl, 0F0h?

48:83E4 F0                 | and rsp,FFFFFFFFFFFFFFF0        | OK
83E4 F0                    | and esp,FFFFFFF0                | not recommended, clears upper dword
66:83E4 F0                 | and sp,FFF0                     | OK
40:32E4                    | xor spl,spl                     | align stack 256
40:80E4 F0                 | and spl,F0                      | OK
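
As a sketch of how the short form might be used in a prologue (plain ml64, hypothetical procedure name AlignedFrame, not taken from the Masm64 SDK): clearing the low four bits of spl can only lower rsp by 0 to 15 bytes, so the old value has to be kept somewhere, here in rbp.

```asm
; Hypothetical sketch for ml64; AlignedFrame is a made-up name.
.code
AlignedFrame proc
  push rbp
  mov rbp, rsp          ; remember the incoming stack pointer
  and spl, 0F0h         ; clear the low 4 bits of rsp: 16-byte aligned
  sub rsp, 20h          ; 32 bytes of shadow space, alignment preserved
  ; ... calls to other procedures would go here ...
  mov rsp, rbp          ; undo both adjustments at once
  pop rbp
  ret
AlignedFrame endp
end
```

Because only the low byte of rsp is modified, the upper bits stay intact, unlike the and esp form in the listing above.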

Another interesting bit:
Quote
The caller reserves space on the stack for arguments passed in registers

It doesn't say anything about our dear habit of putting a sub rsp, 80h somewhere at the top of the proc. It just says for arguments passed in registers, i.e. rcx, rdx, r8 and r9. At least that's what I read into this phrase - xmm0 is a register, right?
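
For reference, the documented Windows x64 convention is that the caller always reserves 32 bytes of home space for the four register parameters, whether they travel in rcx, rdx, r8, r9 or in xmm0 to xmm3, and that rsp is 16-byte aligned at the point of the call. A minimal caller could look like the sketch below. It uses plain ml64 and the kernel32 functions GetTickCount and ExitProcess; the entry name main is an assumption (link with /entry:main).

```asm
; Minimal ml64 sketch: ml64 /c, then link /subsystem:console /entry:main with kernel32.lib
extern GetTickCount:proc
extern ExitProcess:proc
.code
main proc
  sub rsp, 28h          ; 20h home space + 8 to realign rsp (it ends in 8 on entry)
  call GetTickCount     ; takes no arguments, but the home space must still exist
  xor ecx, ecx          ; first argument: exit code 0
  call ExitProcess      ; does not return
main endp
end
```

A larger reservation such as sub rsp, 80h is allowed as long as the alignment holds; anything beyond the 32 bytes is only needed for locals or for calls with more than four arguments.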

HSE

Quote from: jj2007 on August 29, 2023, 07:09:07 PM
while Intel CPUs are slightly faster with push & pop.

Not exactly. Here the results are the same number: 5.59 cycles, and the variance is so big (179 and 160 cycles^2) that it's not possible to say very much.

The picture is from mov, but pushpop is the same.
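
To put those numbers in perspective, taking the two quoted variances at face value, the corresponding standard deviations are:

```latex
\sigma_1 = \sqrt{179\ \text{cycles}^2} \approx 13.4\ \text{cycles}, \qquad
\sigma_2 = \sqrt{160\ \text{cycles}^2} \approx 12.6\ \text{cycles}
```

Both are more than twice the 5.59-cycle mean, so a difference of a fraction of a cycle between the two methods would be lost in the noise of a single measurement.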



jj2007

What's your actual code? Here is mine:
method1:
  push rsi
  push rdi
  push rbx
  nop
  pop rbx
  pop rdi
  pop rsi
  ret
method2:
  mov [rbp+16], rsi
  mov [rbp+24], rdi
  mov [rbp+32], rbx
  nop
  mov rbx, [rbp+32]
  mov rdi, [rbp+24]   ; restore from the slot rdi was saved to
  mov rsi, [rbp+16]   ; restore from the slot rsi was saved to
  ret
...
  REPEAT 4
  mov ticks, rv(GetTickCount)
  mov ecx, iterations
  align 16
@@:
  call method1
  dec ecx
  jns @B
  sub rv(GetTickCount), ticks
  invoke __imp__cprintf, cfm$("pushing took %i ms\n"), rax

  mov ticks, rv(GetTickCount)
  mov ecx, iterations
  align 16
@@:
  call method2
  dec ecx
  jns @B
  sub rv(GetTickCount), ticks
  invoke __imp__cprintf, cfm$("moving  took %i ms\n"), rax
  ENDM

HSE

function_under_glass5 macro
  push rsi
  push rdi
  push rbx
  nop
  pop rbx
  pop rdi
  pop rsi
endm

function_under_glass6 macro
  mov _rsi, rsi
  mov _rdi, rdi
  mov _rbx, rbx
  nop
  mov rbx, _rbx
  mov rdi, _rdi
  mov rsi, _rsi
endm

There is no call in this test.

      .while !ZERO?
        function_under_glass6
        dec ebx
      .endw