I sneaked in a new macro when no one was looking: a technique for preserving registers in procedures with no stack frame. It can be used in either stackframe or nostackframe procedures, but a normal stackframe proc can allocate its own locals so it's not really needed there. With testing so far it seems to nest OK, but I would like to get more testing done. Importantly, the two macros must be used in pairs, and the second macro tests if the first is there, but I don't want the method to get too clunky trying to make it idiot proof, as the quality of idiot exceeds any safety measure.
The latest macro file is attached to the last update of the new help file.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
msgloop proc
preserve_regs r14,r15
xor r14, r14
mov r15, ptr$(msg)
jmp gmsg
mloop:
rcall TranslateMessage,r15
rcall DispatchMessage, r15
gmsg:
test rax, rvcall(GetMessage,r15,r14,r14,r14)
jnz mloop
restore_regs r14,r15
ret
msgloop endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
sub_1400012a2 proc
.text:00000001400012a2 C8800000 enter 0x80, 0x0
.text:00000001400012a6 4883EC60 sub rsp, 0x60
.text:00000001400012aa 4C8935B7200000 mov qword ptr [0x140003368], r14
.text:00000001400012b1 4C893DB8200000 mov qword ptr [0x140003370], r15
.text:00000001400012b8 4D33F6 xor r14, r14
.text:00000001400012bb 488D0576200000 lea rax, [0x140003338]
.text:00000001400012c2 4C8BF8 mov r15, rax
.text:00000001400012c5 EB12 jmp 0x1400012d9
.text:00000001400012c5
.text:00000001400012c7
.text:00000001400012c7 0x1400012c7:
.text:00000001400012c7 498BCF mov rcx, r15
.text:00000001400012ca FF15A80D0000 call qword ptr [TranslateMessage]
.text:00000001400012ca
.text:00000001400012d0 498BCF mov rcx, r15
.text:00000001400012d3 FF15B70D0000 call qword ptr [DispatchMessageA]
.text:00000001400012d3
.text:00000001400012d9
.text:00000001400012d9 0x1400012d9:
.text:00000001400012d9 4D8BCE mov r9, r14
.text:00000001400012dc 4D8BC6 mov r8, r14
.text:00000001400012df 498BD6 mov rdx, r14
.text:00000001400012e2 498BCF mov rcx, r15
.text:00000001400012e5 FF159D0D0000 call qword ptr [GetMessageA]
.text:00000001400012e5
.text:00000001400012eb 4885C0 test rax, rax
.text:00000001400012ee 75D7 jne 0x1400012c7
.text:00000001400012ee
.text:00000001400012f0 4C8B3571200000 mov r14, qword ptr [0x140003368]
.text:00000001400012f7 4C8B3D72200000 mov r15, qword ptr [0x140003370]
.text:00000001400012fe C9 leave
.text:00000001400012ff C3 ret
sub_1400012a2 endp
So not thread safe then?
While I would like to get more testing done with it, the counter is incremented each time the macro pair is used, so I think it is safe from duplication. It is a macro that writes data to the uninitialised data section, so if there are no duplicates there should not be any problems.
But the macro isn't called by the actual code, so two threads calling the proc will clash when the proc saves the register to the same address twice?
The catch is that the preserved registers are not stored dynamically: the register contents are written to the uninitialised data section, and each instance writes to unique addresses that are fixed at build time, not run time. It means that for every macro pair there are locations already reserved in the uninitialised data section for them to write to.
Now if I understand what you have said, the risk is when a single procedure is used to start multiple threads that run in parallel, and there is probably a problem here in that multiple threads would be using the same set of data locations. The solution is to manually allocate local addresses for the registers, which would require a stack frame. Have I understood what you have pointed out?
Thread1 calls your proc, which executes this code
.text:00000001400012aa 4C8935B7200000 mov qword ptr [0x140003368], r14
.text:00000001400012b1 4C893DB8200000 mov qword ptr [0x140003370], r15
Thread2 calls your proc which also executes this code
.text:00000001400012aa 4C8935B7200000 mov qword ptr [0x140003368], r14
.text:00000001400012b1 4C893DB8200000 mov qword ptr [0x140003370], r15
Assuming the code takes the same time
- thread1 enters the proc and has its r14/r15 saved
- thread2 enters the proc and has its r14/r15 saved to the same addresses, clobbering thread1's registers
- thread1 exits the proc, but with thread2's r14/r15
Can't you tweak your prologue/epilogue to just push the registers?
You really need to use local (stack) memory for multithreaded variables.
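For what it is worth, here is a minimal sketch of the stack-memory approach in a procedure with no stack frame. It is plain ml64 with no SDK macros; the name savedemo and the slot layout are assumptions for illustration, not part of the macro file. Each thread has its own stack, so each call gets its own save slots.

```asm
; Sketch only: savedemo is a hypothetical name, not from the macro file.
; Assumes the Win64 ABI: rsp is 8 off a 16-byte boundary on entry.
.code
savedemo proc
    sub rsp, 38h            ; 20h shadow space + 10h save slots + 8 padding
    mov [rsp+20h], r14      ; save slots live on this thread's own stack
    mov [rsp+28h], r15

    xor r14, r14            ; r14/r15 are free to use here
    xor r15, r15

    mov r14, [rsp+20h]      ; restore from the same slots
    mov r15, [rsp+28h]
    add rsp, 38h
    ret
savedemo endp
end
```

The 38h adjustment leaves rsp 16-byte aligned for any API calls made in the body, with their shadow space at [rsp] to [rsp+1Fh], below the two save slots.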
The problem is that using PUSH/POP will mess up the procedure alignment, as the stackframe macro can be aligned to greater than QWORD for larger data types.
The alternative for multiple threads is to create LOCAL variables written in descending order of size, starting from the biggest that matches the procedure alignment. This does require a stack frame.
LOCAL var :XMMWORD
LOCAL reg1 :QWORD
LOCAL reg2 :QWORD
; etc ....
sinsi,
Have a look at this one. The problem is that it will only work with a stack frame, but for thread safe procedures it should do the job.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
MLOCAL equ LOCAL ; the word LOCAL is ambiguous in a MACRO
; -----------------------------------------------------
; acnt (arg count) must match the number of 64 bit regs
; -----------------------------------------------------
REGSPACE MACRO acnt
MLOCAL r64[acnt] :QWORD
ENDM
; -----------------------------------------------------------
; arglist for both save and restore must be in the same order
; -----------------------------------------------------------
saveregs MACRO arglist:VARARG
cntr = 0
FOR var, <arglist>
mov r64[cntr], var
cntr = cntr + 8
ENDM
ENDM
restregs MACRO arglist:VARARG
cntr = 0
FOR var, <arglist>
mov var, r64[cntr]
cntr = cntr + 8
ENDM
ENDM
; -----------------------------------------------------------
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
rcall regtest
waitkey
.exit
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
regtest proc
REGSPACE 8 ; allocate stack space for 64 bit registers
saveregs r12,r13,r14,r15,rsi,rdi,rbp,rbx
; do it all here !
restregs r12,r13,r14,r15,rsi,rdi,rbp,rbx
ret
regtest endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
comment #
sub_140001037 proc
.text:0000000140001037 C8800000 enter 0x80, 0x0
.text:000000014000103b 4881ECA0000000 sub rsp, 0xa0
.text:0000000140001042 4C89A560FFFFFF mov qword ptr [rbp-0xa0], r12
.text:0000000140001049 4C89AD68FFFFFF mov qword ptr [rbp-0x98], r13
.text:0000000140001050 4C89B570FFFFFF mov qword ptr [rbp-0x90], r14
.text:0000000140001057 4C89BD78FFFFFF mov qword ptr [rbp-0x88], r15
.text:000000014000105e 48897580 mov qword ptr [rbp-0x80], rsi
.text:0000000140001062 48897D88 mov qword ptr [rbp-0x78], rdi
.text:0000000140001066 48896D90 mov qword ptr [rbp-0x70], rbp
.text:000000014000106a 48895D98 mov qword ptr [rbp-0x68], rbx
.text:000000014000106e 4C8BA560FFFFFF mov r12, qword ptr [rbp-0xa0]
.text:0000000140001075 4C8BAD68FFFFFF mov r13, qword ptr [rbp-0x98]
.text:000000014000107c 4C8BB570FFFFFF mov r14, qword ptr [rbp-0x90]
.text:0000000140001083 4C8BBD78FFFFFF mov r15, qword ptr [rbp-0x88]
.text:000000014000108a 488B7580 mov rsi, qword ptr [rbp-0x80]
.text:000000014000108e 488B7D88 mov rdi, qword ptr [rbp-0x78]
.text:0000000140001092 488B6D90 mov rbp, qword ptr [rbp-0x70]
.text:0000000140001096 488B5D98 mov rbx, qword ptr [rbp-0x68]
.text:000000014000109a C9 leave
.text:000000014000109b C3 ret
sub_140001037 endp
#
Rather than having two places to list (and change) registers:
saveregs MACRO arglist:VARARG
reglist TEXTEQU <>
cntr = 0
FOR var, <arglist>
mov r64[cntr], var
cntr = cntr + 8
IFNB reglist
reglist CATSTR reglist,<,>,<var>
ELSE
reglist TEXTEQU <var>
ENDIF
ENDM
ENDM
restregs MACRO
cntr = 0
%FOR var, <reglist>
mov var, r64[cntr]
cntr = cntr + 8
ENDM
ENDM
I can't test on this 32-bit machine, but wouldn't this work?
reglist TEXTEQU <arglist>
I have simplified the save macro and the design works well. What I am not sure about is the restore not having the same arglist; I tend to prefer the identical arglist for clearer code.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
MLOCAL equ LOCAL ; the word LOCAL is ambiguous in a MACRO
; -----------------------------------------------------
; acnt (arg count) must match the number of 64 bit regs
; -----------------------------------------------------
REGSPACE MACRO acnt
MLOCAL r64@@_@@[acnt] :QWORD
ENDM
SaveRegs MACRO arglist:VARARG
cntr = 0
reg@___list___@ equ arglist
FOR arg,<arglist>
;; %echo arg
mov r64@@_@@[cntr], arg
cntr = cntr + 8
ENDM
ENDM
RestoreRegs MACRO
cntr = 0
%FOR arg,<reg@___list___@>
;; %echo arg
mov arg, r64@@_@@[cntr]
cntr = cntr + 8
ENDM
ENDM
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
REGSPACE 4 ; allocate local space for registers
SaveRegs r12,r13,r14,r15 ; save register list
call tstproc
waitkey
RestoreRegs ; restore register list
.exit
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
tstproc proc
REGSPACE 4
SaveRegs rax,rbx,rcx,rdx
nop
nop
nop
nop
RestoreRegs
ret
tstproc endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
A troll?:
REGSPACE MACRO arglist:VARARG
cntr = 0
reg@___list___@ equ arglist
FOR arg,<arglist>
cntr = cntr +1
ENDM
MLOCAL r64@@_@@[cntr] :QWORD
ENDM
SaveRegs MACRO
cntr = 0
%FOR arg,<reg@___list___@>
mov r64@@_@@[cntr], arg
cntr = cntr +8
ENDM
ENDM
Combining the two seems to work OK, and the result in the user code section is clear and easy enough to understand. The only problem I can see is that it will always have to be the last LOCAL, as the code that follows it prevents any further LOCAL variables. I might play with it a little longer, as the previous suggestion does not have this problem. What I am trying for is a clean and simple to understand technique.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
MLOCAL equ LOCAL ; the word LOCAL is ambiguous in a MACRO
SaveRegs MACRO arglist:VARARG
MLOCAL r64@@_@@[argcount(arglist)] :QWORD
cntr = 0
reg@___list___@ equ arglist
FOR arg,<arglist>
;; %echo arg
mov r64@@_@@[cntr], arg
cntr = cntr + 8
ENDM
ENDM
RestoreRegs MACRO
cntr = 0
%FOR arg,<reg@___list___@>
;; %echo arg
mov arg, r64@@_@@[cntr]
cntr = cntr + 8
ENDM
ENDM
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
SaveRegs r12,r13,r14,r15 ; save register list
call tstproc
waitkey
RestoreRegs ; restore register list
.exit
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
tstproc proc
SaveRegs rax,rbx,rcx,rdx
nop
nop
nop
nop
RestoreRegs
ret
tstproc endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
Do you use a custom prologue/epilogue macro?
Yep, it's in the macro file. It's been very reliable and can be adjusted to handle different data sizes as the default alignment. This means that if someone wants to use AVX or larger, they can align the procedure so that the stack on entry is aligned and can accept locals from the largest down, with each data size correctly aligned.
LOCAL avxvar :YMMWORD
LOCAL ssevar :XMMWORD
LOCAL qwdvar :QWORD
LOCAL wrdvar :WORD
LOCAL bytvar :BYTE
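As a hand-written illustration of that ordering (not the prologue macro from the macro file, which is not shown here), the sketch below forces 32-byte alignment and places the largest item first. The name avxdemo and the offsets are assumptions, and it needs a CPU and an ml64 version with AVX support.

```asm
; Sketch only: avxdemo and its offsets are hypothetical.
.code
avxdemo proc
    push rbp
    mov rbp, rsp
    sub rsp, 60h                         ; shadow space + locals + alignment slack
    and rsp, -32                         ; round rsp down to a 32-byte boundary

    vmovdqa ymmword ptr [rsp+20h], ymm0  ; YMMWORD local, 32-byte aligned
    mov qword ptr [rsp+40h], rax         ; QWORD local placed after it

    mov rsp, rbp                         ; undo the alignment and the allocation
    pop rbp
    ret
avxdemo endp
end
```

Because each smaller size divides the larger one, laying the locals out from largest to smallest keeps every item on its natural boundary without padding between them.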
Can't you slot "reglist" in there somewhere whilst you are adjusting the stack?
The good thing about reglist is that in the epilogue the registers are reversed i.e. in the correct order for popping.
:biggrin:
> The good thing about reglist is that in the epilogue the registers are reversed i.e. in the correct order for popping.
The bad thing about reglist is that in the epilogue the registers are reversed i.e. in the correct order for popping.
Stack manipulation with PUSH POP messes up the stack alignment which in turn wrecks aligned data larger than QWORD.
Some time ago I remember Sinsi writing that push+pop are a no-no in 64-bit code, not because of alignment issues (just make sure you use an even number of pushes...) but because of the shadow space. Unfortunately I can't find that post, and I can't find a crisp example showing the spill/shadow space problem. Maybe one of you can help out.
But I found this instead (a very long but interesting post by Peter Cordes): (https://stackoverflow.com/questions/49485395/what-c-c-compiler-can-use-push-pop-instructions-for-creating-local-variables)
Quote: Modern code generators avoid using PUSH. It is inefficient on today's processors because it modifies the stack pointer, that gums-up a super-scalar core. (Hans Passant)
This was true 15 years ago, but compilers are once again using push when optimizing for speed, not just code-size. Compilers already use push/pop for saving/restoring call-preserved registers they want to use, like rbx, and for pushing stack args
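On the alignment point, a minimal sketch of the even-number-of-pushes idea in a frameless procedure, in plain ml64. The name pushdemo is an assumption, and for brevity it leaves out the unwind directives (PROC FRAME, .pushreg, .allocstack, .endprolog) that exception-safe code would want.

```asm
; Sketch only: pushdemo is a hypothetical name.
; Assumes the Win64 ABI: rsp is 8 off a 16-byte boundary on entry.
.code
pushdemo proc
    push r14                ; rsp is now 16-byte aligned
    push r15                ; rsp is 8 off again, as on entry
    sub rsp, 28h            ; 20h shadow space + 8 to realign to 16

    xor r14, r14            ; r14/r15 are free to use here
    xor r15, r15

    add rsp, 28h
    pop r15                 ; pop in reverse order of the pushes
    pop r14
    ret
pushdemo endp
end
```

This only sidesteps the problem when the pushes come before the shadow space is allocated. A push in the body of a procedure that has already reserved shadow space puts the pushed value where a callee expects its shadow space, so a callee that spills its register arguments can overwrite it.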
This seems to work OK as well. I have always been keen to make use of the MMX registers when they are just about useless for much else.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
NOSTACKFRAME
testme proc
movq mm0, rbx
movq mm1, r12
movq mm2, r13
movq mm3, r14
movq mm4, r15
movq mm5, rsi
movq mm6, rdi
movq mm7, rbp
mov rbx, 1
mov r12, 2
mov r13, 3
mov r14, 4
mov r15, 5
mov rsi, 6
mov rdi, 7
mov rbp, 8
movq rbx, mm0
movq r12, mm1
movq r13, mm2
movq r14, mm3
movq r15, mm4
movq rsi, mm5
movq rdi, mm6
movq rbp, mm7
ret
testme endp
STACKFRAME
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Quote from: hutch-- on July 17, 2018, 11:05:27 PM
I have always been keen to make use of the MMX registers when they are just about useless for much else.
Obviously You never use FPU!!!!!
You would be surprised. :P
If you are going to use the same registers for floating point, you use the right instruction for clearing them but in win64 the FP/MMX registers are not defined and you can do what you like with them.
Quote from: hutch-- on July 18, 2018, 12:12:19 AM
... but in win64 the FP/MMX registers are not defined and you can do what you like with them.
Even if they were defined... I would not put much trust in foreign functions in the middle of a calculation.
Just in case, try not to develop something amazing trashing those registers :biggrin:
The simple answer is to use "emms" (Empty MMX Technology State). MMX is old stuff and the regs are generally not used any longer, as there are better SSE and later instructions, but if you want the performance of floating point you just clear the MMX state with "emms". What I have been looking for is a way to use more 64 bit registers than you can get with volatile registers, and with the MMX registers you have 8 that can be used to preserve normal 64 bit integer registers.
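Put together, a minimal sketch of the idea with "emms" at the end, in plain ml64. The name mmxdemo is an assumption, and it assumes nothing called between the save and the restore touches the x87/MMX registers.

```asm
; Sketch only: mmxdemo is a hypothetical name.
.code
mmxdemo proc
    movq mm0, r14           ; park the registers in MMX
    movq mm1, r15

    xor r14, r14            ; r14/r15 are free to use here
    xor r15, r15

    movq r14, mm0           ; bring them back
    movq r15, mm1
    emms                    ; mark the x87 tags empty so later FPU code works
    ret
mmxdemo endp
end
```

Since the register state is saved per thread by the OS, this form does not have the shared-address problem of the data section version, but any x87 or MMX use in between will trash the parked values.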