Author Topic: freg: Pseudo push/pop registers in 64 bits  (Read 4641 times)

HSE

  • Member
  • *****
  • Posts: 2193
  • AMD 7-32 / i3 10-64
freg: Pseudo push/pop registers in 64 bits
« on: March 30, 2021, 06:18:01 AM »
Hi All!

This is a little macro system to simulate pushes/pops. It's adapted from SmplMath's result-storing system. It works in 32/64 bits and can be thread safe or unsafe. Registers can be 1, 2, 4 or 8 bytes in size.

The idea is to work with SmplMath, but only one file from SmplMath is required, and I included it here to make the system independent (it works with ML or JWASM).

Code:
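other proc
    local fregTLS()             ; set up freg storage for this procedure

    conout "    eax", tab

    mov eax, 1350
    freg_push eax               ; save eax on the pseudo stack

    mov eax, 2264               ; overwrite eax

    freg_pop eax                ; restore the saved value (1350)
    conout str$(eax),lf,lf

    ret
other endp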

Code:
Howdy, your new console template here.

    eax 1350

Press any key to continue...

Of course, more testing is needed :biggrin:

Regards, HSE.
« Last Edit: April 14, 2021, 02:23:17 AM by HSE »
Equations in Assembly: SmplMath

Biterider

  • Member
  • ****
  • Posts: 975
  • ObjAsm Developer
    • ObjAsm
Re: freg: Pseudo push/pop registers in 64 bits
« Reply #1 on: March 30, 2021, 06:37:39 AM »
Hi HSE
Good idea to use TLS for this purpose  :thumbsup:
This can solve the problems on 64-bit with rsp-based frames. I think this solution may have some timing drawbacks compared to regular push/pop instructions. Did you do a benchmark?

Biterider

HSE

« Reply #2 on: March 30, 2021, 08:13:05 AM »
Hi Biterider!

This can solve the problems on 64-bit with rsp-based frames.
A lot better than some rsp-based frameworks I have seen. If qWord solved storage this way, it cannot be so bad  :biggrin:

I think this solution may have some timing drawbacks compared to regular push/pop instructions. Did you do a benchmark?
No benchmark, but in theory moving to/from calculated addresses is always slower than push/pop. The idea is to translate 32-bit programs to 64-bit easily, so it is useful that the system also works in 32-bit, just to be sure that the 32-bit build works well before building in 64-bit. The dual application has to pay some price in 64-bit (after testing, in 32-bit you can just define  freg_xxx macro reg / xxx &reg / endm ).
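
A minimal sketch of that 32-bit fallback, assuming the only goal is to map the pseudo instructions back onto the real stack once testing is done. The macro names follow the freg_push/freg_pop usage in the first post; how fregTLS() would be stubbed out in such a build is not covered in this thread and is left open here.

```asm
; 32-bit build only: freg_push/freg_pop become plain push/pop.
; Note: a real push/pop accepts only 2- or 4-byte registers in 32-bit code,
; so 1-byte registers (al, bl, ...) cannot go through this fallback.
freg_push macro reg
    push reg
endm

freg_pop macro reg
    pop reg
endm
```

With these definitions, freg_push eax assembles to push eax, so the 32-bit build pays no extra cost.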

Regards, HSE.
« Last Edit: March 31, 2021, 03:51:27 AM by HSE »

Biterider

« Reply #3 on: March 30, 2021, 06:01:16 PM »
Hi HSE
It's a brilliant idea. :icon_idea:
I think the penalties are all related to cache misses. Once the TLS is loaded it should perform the same way.

I think there is an alternative that I haven't tested or implemented yet. You can count the number of pseudo-pushes and pseudo-pops, reserve some place on the stack (e.g. a local area) and save the content there. It has the benefit of not getting hit by the cache misses, and it's thread safe too.
I think it's worth exploring.
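
A rough sketch of that counting idea, not freg's actual implementation. All names (slot_reset, slot_push, slot_pop, slots, demo) are hypothetical, and it assumes a MASM-style 64-bit assembler that builds the frame for local variables automatically. The counter lives only at assembly time, so each pseudo-push gets a fixed slot in a local array and rsp never moves.

```asm
; Hypothetical names throughout; slot_idx is an assembly-time counter.
slot_reset macro
    slot_idx = 0
endm

slot_push macro reg
    mov qword ptr slots[slot_idx*8], reg    ; store into the next free slot
    slot_idx = slot_idx + 1
endm

slot_pop macro reg
    slot_idx = slot_idx - 1
    mov reg, qword ptr slots[slot_idx*8]    ; reload from the last used slot
endm

demo proc
    local slots[4]:qword        ; room for up to 4 nested pseudo-pushes
    slot_reset
    mov rax, 1350
    slot_push rax               ; saved in slots[0]
    mov rax, 2264
    slot_pop rax                ; rax holds 1350 again, rsp untouched
    ret
demo endp
```

Because the counter follows source order rather than control flow, pushes and pops have to be balanced lexically; a pseudo-push inside a loop always reuses the same slot.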

Regards, Biterider
« Last Edit: March 30, 2021, 08:53:11 PM by Biterider »

jj2007

  • Member
  • *****
  • Posts: 12949
  • Assembler is fun ;-)
    • MasmBasic
« Reply #4 on: March 30, 2021, 07:31:35 PM »
A lot better than some rsp-based frame I saw.

Is there any evidence that rsp-based frames are faster and/or shorter?

HSE

« Reply #5 on: March 30, 2021, 11:07:29 PM »
Hi Biterider!

It's a brilliant idea. :icon_idea:
I think the penalties are all related to cache misses. Once the TLS is loaded it should perform the same way.
Perhaps you are thinking of something different, and that could be brilliant  :biggrin:

You can count the number of pseudo-pushes and pseudo-pops, reserve some place on the stack (e.g. a local area) and save the content there.
That's what this system does  :thumbsup:

Regards, HSE.

daydreamer

  • Member
  • *****
  • Posts: 2091
  • beer glass
« Reply #6 on: March 30, 2021, 11:21:50 PM »
Nice :thumbsup:
Pseudo stack would also be nice for SIMD registers

In 32-bit, I wonder about pushes/pops to make more registers usable: when there are API calls taking milliseconds inside the loop, how do 3 pushes + 3 pops compare with using local variables with inc/dec as loop counters?


SIMD fan and macro fan
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."

HSE

« Reply #7 on: March 30, 2021, 11:33:11 PM »
A lot better than some rsp-based frame I saw.

Is there any evidence that rsp-based frames are faster and/or shorter?
What I saw is the contrary. Rsp-based frameworks are larger and, going by Agner Fog's counts, require more cycles. I have no motivation to run a test (but you can  :biggrin: )
« Last Edit: March 31, 2021, 03:52:18 AM by HSE »

HSE

« Reply #8 on: March 31, 2021, 12:01:46 AM »
Pseudo stack would also be nice for SIMD registers
In theory you have enough xmm registers  :biggrin: I did not include xmm registers, but perhaps I will (just in case some SIMD fanatic wants to try that).

In 32-bit, I wonder about pushes/pops to make more registers usable
That is interesting because you can have 2 different stacks. Rsp-based frameworks cannot do that.
« Last Edit: March 31, 2021, 03:52:55 AM by HSE »

jj2007

« Reply #9 on: March 31, 2021, 01:07:47 AM »
Rsp-based frame is larger and, from Agner Fog count, require more cycles. No motivation to

Even in 64-bit code, all rsp-based moves are one byte longer. So what is the motivation to use rsp-based stack frames? I am curious because I quite often see discussions about them.

Code:
48 8B 45 04                   | mov rax,qword ptr ss:[rbp+4]    |
48 8B 85 90 01 00 00          | mov rax,qword ptr ss:[rbp+190]  |
48 8B 44 24 04                | mov rax,qword ptr ss:[rsp+4]    |
48 8B 84 24 90 01 00 00       | mov rax,qword ptr ss:[rsp+190]  |
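
For reference, a byte-by-byte reading of the listing above. The extra byte in the rsp forms is the SIB byte: an r/m field of 100 in ModRM does not select rsp directly but means "SIB follows". The displacements in the listing (4 and 190) are hexadecimal, as shown by the debugger.

```asm
; 48 = REX.W prefix, 8B = mov r64, r/m64
mov rax, qword ptr [rbp+4]      ; 48 8B 45 04
                                ;   ModRM 45h: mod=01 (disp8),  reg=000 (rax), r/m=101 (rbp)
mov rax, qword ptr [rbp+190h]   ; 48 8B 85 90 01 00 00
                                ;   ModRM 85h: mod=10 (disp32), reg=000 (rax), r/m=101 (rbp)
mov rax, qword ptr [rsp+4]      ; 48 8B 44 24 04
                                ;   ModRM 44h: mod=01 (disp8),  r/m=100 (SIB follows)
                                ;   SIB   24h: scale=00, index=100 (none), base=100 (rsp)
mov rax, qword ptr [rsp+190h]   ; 48 8B 84 24 90 01 00 00
                                ;   ModRM 84h: mod=10 (disp32), r/m=100 (SIB follows)
                                ;   SIB   24h: same as above
```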

Biterider

« Reply #10 on: March 31, 2021, 03:15:55 AM »
Hi
A good reason to stick to rsp frames is x64 exception handling.
To be able to unwind the code, the operating system needs to find the procedure frames. For this purpose it uses the rsp register and expects that it will not change within a procedure.

If you don't need exception handling, you can go e.g. with rbp frames or no frames at all.

Regards, Biterider

HSE

« Reply #11 on: March 31, 2021, 03:50:11 AM »
:biggrin: Sorry for my English, I was thinking of frameworks, not frames.

 Rsp-based frameworks use push and pop, but recalculate rsp. 

nidud

  • Member
  • *****
  • Posts: 2388
    • https://github.com/nidud/asmc
« Reply #12 on: March 31, 2021, 03:53:52 AM »
deleted
« Last Edit: February 26, 2022, 04:38:09 AM by nidud »

nidud

« Reply #13 on: March 31, 2021, 05:24:01 AM »
deleted
« Last Edit: February 26, 2022, 04:38:18 AM by nidud »

Biterider

« Reply #14 on: March 31, 2021, 07:22:24 AM »
Hi
As I read, when you write your own prologue/epilogue, the following applies:

If you fail to register unwind codes, then the system will assume that you are a lightweight leaf function, which means that it will assume that all nonvolatile registers are unmodified from the calling function, the stack pointer has not been changed from its value at function entry, and that the return address is in the default location. For x64, this means that the return address is at the top of the stack; for RISC, it means that the return address is in the standard return address register.

...and that will lead to unpredictable behavior.
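
As an illustration of registering unwind codes in a hand-written prologue, here is a small sketch using the ml64 (64-bit MASM) FRAME directives. The procedure name is hypothetical and the body is left empty; it only shows how each prologue instruction is paired with its unwind directive.

```asm
; Hypothetical procedure name; ml64 syntax.
saves_rbx proc frame
    push rbx                ; save a nonvolatile register
    .pushreg rbx            ; unwind code: rbx was pushed
    sub rsp, 20h            ; 32 bytes of shadow space for callees
    .allocstack 20h         ; unwind code: 20h bytes were allocated
    .endprolog              ; end of the prologue

    ; ... body: rsp stays fixed here, calls are allowed ...

    add rsp, 20h            ; epilogue mirrors the prologue
    pop rbx
    ret
saves_rbx endp
```

Without the frame attribute and the directives, the procedure would be treated as a leaf function, which is exactly the case the quoted text warns about once it changes rsp or a nonvolatile register.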

I guess they end up in the department where Raymond Chen works :biggrin:

Biterider