The MASM Forum

Miscellaneous => Miscellaneous Projects => Topic started by: HSE on March 30, 2021, 06:18:01 AM

Title: freg: Pseudo push/pop registers in 64 bits
Post by: HSE on March 30, 2021, 06:18:01 AM
Hi All!

This is a small macro system to simulate pushes/pops. It's adapted from SmplMath's result-storing system. It works in 32 and 64 bits and can be thread-safe or not. Registers can be 1, 2, 4 or 8 bytes in size.

The idea is to work with SmplMath, but only one file is required from SmplMath, and I included it here to make the system independent (and usable with ML or JWASM).

other proc
    local fregTLS()
   
    conout "    eax", tab

    mov eax, 1350
    freg_push eax

    mov eax, 2264

    freg_pop eax
    conout str$(eax),lf,lf

    ret
other endp


Howdy, your new console template here.

    eax 1350

Press any key to continue...


Of course, more testing is needed :biggrin:

Regards, HSE.
Title: Re: freg: Pseudo push/pop registers in 64 bits
Post by: Biterider on March 30, 2021, 06:37:39 AM
Hi HSE
Good idea to use TLS for this purpose  :thumbsup:
This can solve the problems on 64-bit with rsp-based frames. I think this solution may have some timing drawbacks compared to regular push/pop instructions. Did you do a benchmark?

Biterider
Title: Re: freg: Pseudo push/pop registers in 64 bits
Post by: HSE on March 30, 2021, 08:13:05 AM
Hi Biterider!

Quote from: Biterider on March 30, 2021, 06:37:39 AM
This can solve the problems on 64-bit with rsp-based frames.
A lot better than some rsp-based frameworks I have seen. If qWord solved storage this way, it cannot be so bad  :biggrin:

Quote from: Biterider on March 30, 2021, 06:37:39 AM
I think this solution may have some timing drawbacks compared to regular push/pop instructions. Did you do a benchmark?
No benchmark, but in theory moving to/from calculated addresses is always slower than push/pop. The idea is to translate 32-bit programs to 64 bits easily, so it is useful that the system also works in 32 bits, just to be sure that the 32-bit build works well before building in 64 bits. The dual-target application has to pay some price in 64 bits (after testing, in 32 bits just define  freg_xxx macro reg / xxx &reg / endm ).
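
A minimal sketch of that 32-bit fallback, assuming only the two basic operations are needed. The macro bodies are my reading of the one-line description above, not code taken from the attachment:

```asm
; Assumed 32-bit passthrough definitions: once the code is tested,
; each pseudo operation collapses to the native instruction.
freg_push macro reg
    push reg
endm

freg_pop macro reg
    pop reg
endm
```

With these in place, freg_push eax / freg_pop eax assemble to plain push eax / pop eax, so the 32-bit build pays no extra cost.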

Regards, HSE.
Title: Re: freg: Pseudo push/pop registers in 64 bits
Post by: Biterider on March 30, 2021, 06:01:16 PM
Hi HSE
It's a brilliant idea. :icon_idea:
I think the penalties are all related to cache misses. Once the TLS is loaded it should perform the same way.

I think there is an alternative that I haven't tested or implemented yet. You can count the number of pseudo-pushes and pseudo-pops, reserve some space on the stack (e.g. a local area) and save the content there. It has the benefit of not getting hit by the cache misses and it's thread-safe too.
I think it's worth exploring.
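
A rough sketch of this stack-local alternative for ml64, not tested against freg itself. All names here (SAVE_SLOTS, sv_area, sv_depth, sv_push, sv_pop) are hypothetical, and the slot count is fixed by hand instead of being counted automatically:

```asm
; Sketch only: none of these names belong to freg.
SAVE_SLOTS equ 4                      ; assumed maximum nesting depth
sv_depth = 0                          ; assembly-time slot index

sv_push macro reg
    if sv_depth ge SAVE_SLOTS
        .err <sv_push: save area is full>
    endif
    mov sv_area[sv_depth*8], reg      ; store into the next free slot
    sv_depth = sv_depth + 1
endm

sv_pop macro reg
    sv_depth = sv_depth - 1
    mov reg, sv_area[sv_depth*8]      ; reload from the last used slot
endm

.code
demo proc
    local sv_area[SAVE_SLOTS]:qword   ; save area lives in this procedure's frame

    mov rax, 1350
    sv_push rax                       ; slot 0 <- 1350
    mov rax, 2264
    sv_pop rax                        ; rax <- slot 0
    ret
demo endp
end
```

Each thread has its own stack, so the save area is thread-safe without TLS. The slot index is tracked at assembly time in source order, so pops placed in different branches of an .if would need a manual adjustment of sv_depth.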

Regards, Biterider
Title: Re: freg: Pseudo push/pop registers in 64 bits
Post by: jj2007 on March 30, 2021, 07:31:35 PM
Quote from: HSE on March 30, 2021, 08:13:05 AM
A lot better that some rsp-based frame I saw.

Is there any evidence that rsp-based frames are faster and/or shorter?
Title: Re: freg: Pseudo push/pop registers in 64 bits
Post by: HSE on March 30, 2021, 11:07:29 PM
Hi Biterider!

Quote from: Biterider on March 30, 2021, 06:01:16 PM
It's a brilliant idea. :icon_idea:
I think the penalties are all related to cache misses. Once the TLS is loaded it should perform the same way.
Perhaps you are thinking of something different, and that could be brilliant  :biggrin:

Quote from: Biterider on March 30, 2021, 06:01:16 PM
You can count the number of pseudo-pushes and pseudo-pops, reserve some place on the stack (e.g. a local area) and save the content there.
That's what this system does  :thumbsup:

Regards, HSE.
Title: Re: freg: Pseudo push/pop registers in 64 bits
Post by: daydreamer on March 30, 2021, 11:21:50 PM
Nice :thumbsup:
Pseudo stack would also be nice for SIMD registers

In 32-bit, I wonder about pushes/pops to make more registers usable, vs. millisecond API calls inside a loop plus 3 pushes and 3 pops, vs. using local variables with inc/dec as loop counters?


Title: Re: freg: Pseudo push/pop registers in 64 bits
Post by: HSE on March 30, 2021, 11:33:11 PM
Quote from: jj2007 on March 30, 2021, 07:31:35 PM
Quote from: HSE on March 30, 2021, 08:13:05 AM
A lot better that some rsp-based frame I saw.

Is there any evidence that rsp-based frames are faster and/or shorter?
What I saw is the contrary. Rsp-based frameworks are larger and, by Agner Fog's counts, require more cycles. I have no motivation to run a test (but you can  :biggrin: )
Title: Re: freg: Pseudo push/pop registers in 64 bits
Post by: HSE on March 31, 2021, 12:01:46 AM
Quote from: daydreamer on March 30, 2021, 11:21:50 PM
Pseudo stack would also be nice for SIMD registers
In theory you have enough xmm registers  :biggrin: I have not included xmm registers, but perhaps I will (just in case some SIMD fanatic wants to try that).

Quote from: daydreamer on March 30, 2021, 11:21:50 PM
In 32-bit, I wonder about pushes/pops to make more registers usable
That is interesting because you can have two different stacks. Rsp-based frameworks cannot do that.
Title: Re: freg: Pseudo push/pop registers in 64 bits
Post by: jj2007 on March 31, 2021, 01:07:47 AM
Quote from: HSE on March 30, 2021, 11:33:11 PM
Rsp-based frame is larger and, from Agner Fog count, require more cycles. No motivation to

Even in 64-bit code, all rsp-based moves are one byte longer. So what is the motivation to use rsp-based stack frames? I am curious because I quite often see discussions about them.

48 8B 45 04                   | mov rax,qword ptr ss:[rbp+4]    |
48 8B 85 90 01 00 00          | mov rax,qword ptr ss:[rbp+190]  |
48 8B 44 24 04                | mov rax,qword ptr ss:[rsp+4]    |
48 8B 84 24 90 01 00 00       | mov rax,qword ptr ss:[rsp+190]  |
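
The extra byte comes from the instruction encoding: in the ModRM byte, r/m = 100 does not select rsp directly but announces a SIB byte, so an rsp-based address with a displacement needs one byte more than the rbp form. A byte-by-byte reading of two of the lines above (the bytes are the ones listed, the field breakdown is added as annotation):

```asm
    mov rax, [rbp+4]    ; 48 8B 45 04
                        ;   48  REX.W (64-bit operand size)
                        ;   8B  opcode: mov r64, r/m64
                        ;   45  ModRM = 01 000 101b: mod=01 (disp8), reg=rax, r/m=101 (rbp)
                        ;   04  disp8

    mov rax, [rsp+4]    ; 48 8B 44 24 04
                        ;   48  REX.W
                        ;   8B  opcode: mov r64, r/m64
                        ;   44  ModRM = 01 000 100b: mod=01 (disp8), reg=rax, r/m=100 (SIB follows)
                        ;   24  SIB   = 00 100 100b: scale=1, no index, base=100 (rsp)
                        ;   04  disp8
```

The gap closes only at zero displacement: [rbp] cannot be encoded without a displacement, so it still carries a disp8 of 0, while [rsp] needs the SIB byte but no displacement, and both come out at 4 bytes.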
Title: Re: freg: Pseudo push/pop registers in 64 bits
Post by: Biterider on March 31, 2021, 03:15:55 AM
Hi
A good reason to stick to rsp frames is x64 exception handling.
In order to be able to unwind the code, the operating system needs to find the procedure frames. For this purpose it uses the rsp register and  expects that it will not change within a procedure.

If you don't need exception handling, you can go e.g. with rbp frames or no frames at all.

Regards, Biterider
Title: Re: freg: Pseudo push/pop registers in 64 bits
Post by: HSE on March 31, 2021, 03:50:11 AM
 :biggrin: Sorry for my English, I was thinking of frameworks, not frames.

Rsp-based frameworks use push and pop, but recalculate rsp. 
Title: Re: freg: Pseudo push/pop registers in 64 bits
Post by: nidud on March 31, 2021, 03:53:52 AM
deleted
Title: Re: freg: Pseudo push/pop registers in 64 bits
Post by: nidud on March 31, 2021, 05:24:01 AM
deleted
Title: Re: freg: Pseudo push/pop registers in 64 bits
Post by: Biterider on March 31, 2021, 07:22:24 AM
Hi
As I read, when you write your own prologue/epilogue, the following applies:

If you fail to register unwind codes, then the system will assume that you are a lightweight leaf function, which means that it will assume that all nonvolatile registers are unmodified from the calling function, the stack pointer has not been changed from its value at function entry, and that the return address is in the default location. For x64, this means that the return address is at the top of the stack; for RISC, it means that the return address is in the standard return address register.

...and that will lead to unpredictable behavior.

Quote from: nidud on March 31, 2021, 05:24:01 AM
I guess they end up in the department where Raymond Chen works :biggrin:
:biggrin:

Biterider
Title: Re: freg: Pseudo push/pop registers in 64 bits
Post by: jj2007 on April 03, 2021, 12:11:13 AM
Quote from: Biterider on March 31, 2021, 03:15:55 AM
A good reason to stick to rsp frames is x64 exception handling.

https://docs.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-160
Quote
Frame register

If nonzero, then the function uses a frame pointer (FP), and this field is the number of the nonvolatile register used as the frame pointer, using the same encoding for the operation info field of UNWIND_CODE nodes.

Frame register offset (scaled)

If the frame register field is nonzero, this field is the scaled offset from RSP that is applied to the FP register when it's established. The actual FP register is set to RSP + 16 * this number, allowing offsets from 0 to 240. This offset permits pointing the FP register into the middle of the local stack allocation for dynamic stack frames, allowing better code density through shorter instructions. (That is, more instructions can use the 8-bit signed offset form.)

For timings, see Shadow space in 64-bit programming (http://masm32.com/board/index.php?topic=9227.msg101815#msg101815)

P.S.: If anybody knows what exactly they mean by "middle", please tell me, I am curious. It sounds good to have the full range, and a compiler can surely do it, but I can't see how to do it with current assemblers. Like this maybe?
someproc
Local v1
  mov rax, v1[Myoffset]
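
One possible reading of "middle", as a sketch for ml64 using its unwind directives. The procedure name and the 100h allocation are made up for illustration: the locals are allocated first, then the frame register is pointed 80h bytes above rsp, so the whole 256-byte area is reachable with 8-bit signed displacements.

```asm
.code
midframe proc frame
    push rbp
    .pushreg rbp
    sub rsp, 100h                 ; 256 bytes of locals
    .allocstack 100h
    lea rbp, [rsp+80h]            ; frame pointer in the middle of the area
    .setframe rbp, 80h            ; offset: multiple of 16, at most 240
    .endprolog

    ; leaf procedure (no calls), so no shadow space is reserved here
    mov qword ptr [rbp-80h], 1    ; lowest local, disp8 = -128
    mov qword ptr [rbp+78h], 2    ; highest qword local, disp8 = +120

    lea rsp, [rbp+80h]            ; epilogue: release the locals
    pop rbp
    ret
midframe endp
end
```

With rbp equal to the value of rsp before the allocation, the same locals would sit at rbp-100h to rbp-8, and every access below rbp-80h would need a 32-bit displacement.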
Title: Re: freg: Pseudo push/pop registers in 64 bits
Post by: HSE on April 07, 2021, 03:17:31 AM
 Hi!

There was a small problem when a register is "assumed".

When "assumed", the register is saved as a dword in 32 bits, or as a qword in 64 bits (JWasm family).

Updated in first post.

Regards.
Title: Re: freg: Pseudo push/pop registers in 64 bits
Post by: HSE on April 10, 2021, 01:43:29 AM
Hi All!

Added pseudo push/pop of variables (there are not many in my 32-bit code, but there are some). It's a little trickier because it needs a GPR to move the value (by default eax and R10, but you can use another):
  freg_pushv [xax].SDLL_ITEM.pNextItem, R11
  ····
  freg_pop xax


Also added a not-so-automatic correction for an unbalanced number of pushes/pops. That happens in conditional flow:
  freg_push xax
  .if [xsi].BibBigMaster.options.TextEdition
    invoke CheckMenuItem, xax, IDM_TEXT_ED, MF_UNCHECKED
    freg_pop xax
    invoke CheckMenuItem, xax, IDM_BLOCK_ED, MF_CHECKED
  .else
    invoke CheckMenuItem, xax, IDM_TEXT_ED, MF_CHECKED
    freg_correction +1
    freg_pop xax
    invoke CheckMenuItem, xax, IDM_BLOCK_ED, MF_UNCHECKED
  .endif
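
Why the correction is needed can be shown with a toy model. It assumes the macros track the pseudo stack depth with an assembly-time counter that follows source order; that is a guess about the mechanism, not the freg source, and the toy_* names are invented:

```asm
; Toy model: only the assembly-time bookkeeping, no code is generated.
toy_depth = 0

toy_push macro
    toy_depth = toy_depth + 1
endm

toy_pop macro
    toy_depth = toy_depth - 1
endm

toy_correction macro n
    toy_depth = toy_depth + (n)
endm

toy_push              ; toy_depth = 1   (before the .if)
                      ; --- .if branch ---
toy_pop               ; toy_depth = 0
                      ; --- .else branch ---
toy_correction +1     ; toy_depth = 1   (at run time the value is still stored)
toy_pop               ; toy_depth = 0
```

The assembler sees both pops one after the other, while at run time only one of them executes. Without the correction, the second pop would address the wrong slot.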


Uploaded in first post.

Regards, HSE.
Title: Re: freg: Pseudo push/pop registers in 64 bits
Post by: HSE on April 14, 2021, 02:26:45 AM
Added pseudo peek and a more complete example in the first post.