freg: Pseudo push/pop registers in 64 bits

Started by HSE, March 30, 2021, 06:18:01 AM

HSE

Hi All!

This is a small macro system that simulates pushes/pops. It is adapted from SmplMath's result-storage system. It works in 32 and 64 bits and can be thread-safe or thread-unsafe. The register size can be 1, 2, 4 or 8 bytes.

The idea is to use it with SmplMath, but only one file from SmplMath is required, and I included it here to make the system independent (and usable with ML or JWASM).

other proc
    local fregTLS()

    conout "    eax", tab

    mov eax, 1350
    freg_push eax           ; save 1350 on the pseudo stack

    mov eax, 2264           ; overwrite eax

    freg_pop eax            ; restore the saved value
    conout str$(eax),lf,lf  ; prints 1350

    ret
other endp


Howdy, your new console template here.

    eax 1350

Press any key to continue...


Of course, more testing is needed :biggrin:

Regards, HSE.
Equations in Assembly: SmplMath

Biterider

Hi HSE
Good idea to use TLS for this purpose  :thumbsup:
This can solve the problems on 64-bit with rsp-based frames. I think this solution may have some timing drawbacks compared to regular push/pop instructions. Did you do a benchmark?

Biterider

HSE

#2
Hi Biterider!

Quote from: Biterider on March 30, 2021, 06:37:39 AM
This can solve the problems on 64-bit with rsp-based frames.
A lot better than some rsp-based frameworks I have seen. If qWord solved storage this way, it cannot be so bad  :biggrin:

Quote from: Biterider on March 30, 2021, 06:37:39 AM
I think this solution may have some timing drawbacks compared to regular push/pop instructions. Did you do a benchmark?
No benchmark, but in theory moving to/from calculated addresses is always slower than push/pop. The idea is to translate 32-bit programs to 64 bits easily, so it is useful that the system also works in 32 bits: you can make sure the 32-bit build works well before building in 64 bits. The dual application has to pay some price in 64 bits (after testing, in 32 bits you can just define  freg_xxx macro reg / xxx &reg / endm ).
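
Written out in full, the 32-bit shorthand at the end of this reply could look roughly like the following. This is only a sketch of the pass-through idea for the push and pop cases, not the macros shipped in the attachment:

```masm
; 32-bit build only: forward the pseudo operations to the real instructions.
freg_push macro reg
    push reg
endm

freg_pop macro reg
    pop reg
endm
```

With definitions like these, the same source should assemble to plain push/pop in 32 bits, while the 64-bit build keeps using the pseudo stack.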

Regards, HSE.

Biterider

#3
Hi HSE
It's a brilliant idea. :icon_idea:
I think the penalties are all related to cache misses. Once the TLS is loaded it should perform the same way.

I think there is an alternative that I haven't tested or implemented yet. You can count the number of pseudo-pushes and pseudo-pops, reserve some space on the stack (e.g. a local area) and save the content there. It has the benefit of not getting hit by cache misses, and it's thread-safe too.
I think it's worth exploring.
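
A minimal sketch of that counting idea, for 64-bit code and 8-byte registers only. The names my_push, my_pop, fregArea, FREG_SLOTS and freg_depth are hypothetical and do not come from the attached package; the fixed slot count is an assumption made to keep the example short:

```masm
; Hypothetical sketch: pseudo push/pop into a local save area.
FREG_SLOTS equ 4            ; assumed upper bound on nested pseudo-pushes
freg_depth = 0              ; assembly-time counter of open pseudo-pushes

my_push macro reg
    mov qword ptr fregArea[freg_depth*8], reg   ; store in the next free slot
    freg_depth = freg_depth + 1
endm

my_pop macro reg
    freg_depth = freg_depth - 1
    mov reg, qword ptr fregArea[freg_depth*8]   ; reload from the last used slot
endm

demo proc
    local fregArea[FREG_SLOTS]:qword    ; save area inside the procedure's frame

    mov rax, 1350
    my_push rax             ; stored in slot 0, rsp is not touched
    mov rax, 2264
    my_pop rax              ; rax is 1350 again
    ret
demo endp
```

Because the counter lives at assembly time, it follows source order rather than execution flow, so pseudo-pushes and pseudo-pops have to be balanced in the order they appear in the source.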

Regards, Biterider

jj2007

Quote from: HSE on March 30, 2021, 08:13:05 AM
A lot better than some rsp-based frame I saw.

Is there any evidence that rsp-based frames are faster and/or shorter?

HSE

Hi Biterider!

Quote from: Biterider on March 30, 2021, 06:01:16 PM
It's a brilliant idea. :icon_idea:
I think the penalties are all related to cache misses. Once the TLS is loaded it should perform the same way.
Perhaps you are thinking of something different, and that could be brilliant  :biggrin:

Quote from: Biterider on March 30, 2021, 06:01:16 PM
You can count the number of pseudo-pushes and pseudo-pops, reserve some place on the stack (e.g. a local area) and save the content there.
That is what this system does  :thumbsup:

Regards, HSE.

daydreamer

Nice :thumbsup:
Pseudo stack would also be nice for SIMD registers

In 32-bit code, I wonder how pushes/pops used to free up more registers compare: with API calls that take milliseconds inside the loop, is it better to add 3 pushes + 3 pops, or to use local variables with inc/dec as loop counters?


my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

HSE

#7
Quote from: jj2007 on March 30, 2021, 07:31:35 PM
Quote from: HSE on March 30, 2021, 08:13:05 AM
A lot better than some rsp-based frame I saw.

Is there any evidence that rsp-based frames are faster and/or shorter?
What I saw is the contrary. Rsp-based frameworks are larger and, by Agner Fog's counts, require more cycles. No motivation to run a test (but you can  :biggrin: )

HSE

#8
Quote from: daydreamer on March 30, 2021, 11:21:50 PM
Pseudo stack would also be nice for SIMD registers
In theory you have enough xmm registers  :biggrin: I did not include xmm registers, but perhaps I will (just in case some SIMD fanatic wants to try that).

Quote from: daydreamer on March 30, 2021, 11:21:50 PM
In 32 bit,wonder pushes/ pops to get more registers usable
That is interesting because you can have 2 different stacks. Rsp-based frameworks cannot do that.

jj2007

Quote from: HSE on March 30, 2021, 11:33:11 PM
Rsp-based frame is larger and, from Agner Fog count, require more cycles. No motivation to

Even in 64-bit code, all rsp-based moves are one byte longer. So what is the motivation to use rsp-based stack frames? I am curious because I quite often see discussions about them.

48 8B 45 04                   | mov rax,qword ptr ss:[rbp+4]    |
48 8B 85 90 01 00 00          | mov rax,qword ptr ss:[rbp+190]  |
48 8B 44 24 04                | mov rax,qword ptr ss:[rsp+4]    |
48 8B 84 24 90 01 00 00       | mov rax,qword ptr ss:[rsp+190]  |

Biterider

Hi
A good reason to stick to rsp frames is x64 exception handling.
In order to be able to unwind the code, the operating system needs to find the procedure frames. For this purpose it uses the rsp register and expects that it will not change within a procedure.

If you don't need exception handling, you can go e.g. with rbp frames or no frames at all.

Regards, Biterider

HSE

:biggrin: Sorry for my English, I was thinking of frameworks, not frames.

Rsp-based frameworks use push and pop, but recalculate rsp. 

nidud

#12
deleted

nidud

#13
deleted

Biterider

Hi
As I read, when you write your own prologue/epilogue, the following applies:

If you fail to register unwind codes, then the system will assume that you are a lightweight leaf function, which means that it will assume that all nonvolatile registers are unmodified from the calling function, the stack pointer has not been changed from its value at function entry, and that the return address is in the default location. For x64, this means that the return address is at the top of the stack; for RISC, it means that the return address is in the standard return address register.

...and that will lead to unpredictable behavior.
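
For comparison, a hand-written prologue that does register its unwind codes could look like this in ml64. It is a sketch that relies on the assembler's PROC FRAME, .pushreg, .allocstack and .endprolog directives; the procedure name and the 20h-byte allocation are assumptions chosen for illustration:

```masm
; Sketch for ml64; "sample" is a hypothetical procedure name.
sample proc frame
    push rbp
    .pushreg rbp            ; unwind code: rbp was pushed
    sub rsp, 20h
    .allocstack 20h         ; unwind code: 20h bytes were allocated
    .endprolog              ; end of prologue, rsp stays fixed in the body

    ; ... procedure body ...

    add rsp, 20h            ; epilogue mirrors the prologue
    pop rbp
    ret
sample endp
```

Each directive records what the instruction before it did, so the unwinder can reverse those steps and find the caller's frame.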

Quote from: nidud on March 31, 2021, 05:24:01 AM
I guess they end up in the department where Raymond Chen works :biggrin:
:biggrin:

Biterider