Masm64 SDK ignores "uses"

zedd151 · August 29, 2023, 05:38:57 AM

Quote from: jj2007 on August 29, 2023, 05:19:50 AM: "That should be tested"

pushing took 1887 ms
moving  took 1888 ms
pushing took 1622 ms ; <----
moving  took 2153 ms ; <----
pushing took 1607 ms
moving  took 1872 ms
pushing took 1607 ms
moving  took 1887 ms

second run

Code Select

pushing took 1872 ms
moving  took 1887 ms
pushing took 1607 ms ; <----
moving  took 2137 ms ; <----
pushing took 1623 ms
moving  took 1872 ms
pushing took 1607 ms
moving  took 1872 ms

third run

Code Select

pushing took 1888 ms
moving  took 1903 ms
pushing took 1623 ms ; <----
moving  took 2152 ms ; <----
pushing took 1623 ms
moving  took 1887 ms
pushing took 1607 ms
moving  took 1888 ms

Odd, always the second iteration...

HSE · August 29, 2023, 06:34:34 AM

Hi Vortex,

Quote from: Vortex on August 29, 2023, 04:12:01 AM: "Here is a known method to preserve the volatile registers without specifying USES :"

I remember we helped Hutch do the same thing with the macros.

I think the idea behind this method is that you can trash the register in the first part of a procedure and later get the original value back without having to store it twice. Registers stored by "uses" are a little hard to find from inside the procedure.

Vortex · August 29, 2023, 06:45:58 AM

Hi HSE,

You are right, registers stored by "uses" are not easy to find.

By the way, it looks like the push / pop pair is faster than the mov instruction; I tested Jochen's code.

HSE · August 29, 2023, 06:53:17 AM

Quote from: Vortex on August 29, 2023, 06:45:58 AM: "it looks like the push / pop pair is faster than the mov instruction,"

Code Select

pushing took 1063 ms
moving  took 953 ms
pushing took 906 ms
moving  took 828 ms
pushing took 938 ms
moving  took 844 ms
pushing took 906 ms
moving  took 953 ms

Vortex · August 29, 2023, 06:55:17 AM

Here are my results :

Code Select

pushing took 1825 ms
moving  took 1810 ms
pushing took 1544 ms
moving  took 2059 ms
pushing took 1529 ms
moving  took 1857 ms
pushing took 1638 ms
moving  took 1809 ms

zedd151 · August 29, 2023, 07:04:00 AM

Different processors, different results.

jj2007 · August 29, 2023, 07:19:14 AM

Quote from: zedd151 on August 29, 2023, 05:38:57 AM: "Odd, always the second iteration..."

I added an align 16, now it looks stable on my machine:

Code Select

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
pushing took 1592 ms
moving  took 1841 ms
pushing took 1575 ms
moving  took 1825 ms
pushing took 1591 ms
moving  took 1825 ms
pushing took 1592 ms
moving  took 1856 ms

I also added a Masm64 SDK-compatible PrintCpu macro for Héctor ;-)

zedd151 · August 29, 2023, 07:22:29 AM

Code Select

Intel(R) Core(TM)2 Duo CPU    E8400  @ 3.00GHz
pushing took 1716 ms
moving  took 2091 ms
pushing took 1794 ms
moving  took 2090 ms
pushing took 1779 ms
moving  took 2090 ms
pushing took 1763 ms
moving  took 1934 ms
Press any key to continue . . .

Looks better.
We need some AMDs.

jj2007 · August 29, 2023, 07:29:28 AM

Quote from: zedd151 on August 29, 2023, 07:22:29 AM: "We need some AMDs"

Your wish is my command:

Code Select

AMD Athlon Gold 3150U with Radeon Graphics
pushing took 1828 ms
moving  took 1844 ms
pushing took 1875 ms
moving  took 1984 ms
pushing took 1906 ms
moving  took 1875 ms
pushing took 1875 ms
moving  took 1906 ms

zedd151 · August 29, 2023, 07:30:49 AM

Quote from: jj2007 on August 29, 2023, 07:29:28 AM: "AMD Athlon Gold 3150U with Radeon Graphics"

Not a big variance. A little flip-flopping, though. I would call it about even for your AMD.

TimoVJL · August 29, 2023, 06:29:27 PM

AMD Ryzen 5 3400G

Code Select

pushing took 1375 ms
moving  took 1390 ms
pushing took 1375 ms
moving  took 1407 ms
pushing took 1453 ms
moving  took 1437 ms
pushing took 1469 ms
moving  took 1641 ms

Code Select

pushing took 1453 ms
moving  took 1390 ms
pushing took 1391 ms
moving  took 1390 ms
pushing took 1391 ms
moving  took 1390 ms
pushing took 1391 ms
moving  took 1406 ms

Code Select

pushing took 1625 ms
moving  took 1359 ms
pushing took 1359 ms
moving  took 1375 ms
pushing took 1375 ms
moving  took 1375 ms
pushing took 1360 ms
moving  took 1359 ms

jj2007 · August 29, 2023, 07:09:07 PM

So it seems that AMD CPUs take about the same number of cycles either way, while Intel CPUs are slightly faster with push & pop.

This is remarkable, since the hype around the x64 ABI is based on the idea that moving stuff is faster than pushing.

x64 Architecture is an interesting read. Did you know that you can align the stack to 16 bytes with a simple, short and spl, 0F0h?

Code Select

48:83E4 F0                 | and rsp,FFFFFFFFFFFFFFF0        | OK
83E4 F0                    | and esp,FFFFFFF0                | not recommended, clears upper dword
66:83E4 F0                 | and sp,FFF0                     | OK
40:32E4                    | xor spl,spl                     | align stack 256
40:80E4 F0                 | and spl,F0                      | OK

Another interesting bit:

Quote: "The caller reserves space on the stack for arguments passed in registers"

It doesn't say anything about our dear habit of putting a sub rsp, 80h somewhere at the top of the proc. It just says for arguments passed in registers, i.e. rcx, rdx, r8 and r9. At least that's what I read into this phrase - xmm0 is a register, right?

HSE · August 29, 2023, 09:44:40 PM

Quote from: jj2007 on August 29, 2023, 07:09:07 PM: "while Intel CPUs are slightly faster with push & pop."

Not exactly. Here both results come out at the same number, 5.59 cycles, and the variance is so big (179 and 160 cycles^2) that it's not possible to say very much.

The picture is from the mov test, but the push/pop one looks the same.

jj2007 · August 29, 2023, 10:57:57 PM

What's your actual code? Here is mine:

Code Select

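method1:
  push rsi          ; save the three non-volatile registers
  push rdi
  push rbx
  nop               ; placeholder for the procedure body
  pop rbx           ; restore in reverse order
  pop rdi
  pop rsi
  ret
method2:
  mov [rbp+16], rsi ; save to stack slots relative to rbp
  mov [rbp+24], rdi
  mov [rbp+32], rbx
  nop               ; placeholder for the procedure body
  mov rbx, [rbp+32] ; restore each register from its own slot
  mov rdi, [rbp+24]
  mov rsi, [rbp+16]
  ret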

...

Code Select

  REPEAT 4
  mov ticks, rv(GetTickCount)
  mov ecx, iterations
  align 16
@@:
  call method1
  dec ecx
  jns @B
  sub rv(GetTickCount), ticks
  invoke __imp__cprintf, cfm$("pushing took %i ms\n"), rax

  mov ticks, rv(GetTickCount)
  mov ecx, iterations
  align 16
@@:
  call method2
  dec ecx
  jns @B
  sub rv(GetTickCount), ticks
  invoke __imp__cprintf, cfm$("moving  took %i ms\n"), rax
  ENDM

HSE · August 29, 2023, 11:08:32 PM

Code Select

function_under_glass5 macro
  push rsi
  push rdi
  push rbx
  nop
  pop rbx
  pop rdi
  pop rsi
endm

function_under_glass6 macro
  mov _rsi, rsi
  mov _rdi, rdi
  mov _rbx, rbx
  nop
  mov rbx, _rbx
  mov rdi, _rdi
  mov rsi, _rsi
endm

There is no call in this test.

Code Select

      .while !ZERO?
        function_under_glass6
        dec ebx
      .endw

The MASM Forum
