Hi everyone!
Just to inform you that HJWasm 2.15 has been uploaded to Terraspace (http://www.terraspace.co.uk/hjwasm.html) 8)
Johnsa and I lost several kilos (not that we regret it) to make it happen :biggrin:
VECTORCALL was Johnsa's idea and I'll never forgive him for pushing me so hard to w**k on it, however he has done a big part of it and proved to me that he is a brilliant programmer :t
What we have done is:
1.) Allowed xmm, ymm and zmm registers to be saved at the same time with USES.
2.) Fixed problem with the reserved stack size
3.) Implemented VECTORCALL (R-VECTORCALL)
4.) Implemented SSE compatibility with ML64 (automatic xmmword type promotion with switch -Zg)
5.) Fixed the bug with MOVSS
6.) Fixed EIP/RIP encoding bug
The attached files contain all necessary structures and unions for VECTORCALL, thanks to Johnsa.
If anyone is interested in VECTORCALL, we have also provided some code examples.
We believe that this version is the best yet.
Enjoy :biggrin:
Quote from: habran on September 05, 2016, 09:14:38 AM
We believe that this version is the best yet.
Looks good :t
The 32-bit version is 10% faster than HJWasm64 8)
Thanks jj2007 :biggrin:
Just to let you know that the VECTORCALL is available only in 64 bit.
I hope no one will demand it in 32 bit, because I don't want to lose another 5-6 kilos, it would require a change of my clothing size :P
It was indeed a lot .. of.. "fun" to implement vectorcall, it's a typically arse-backward standard that could have been much simpler!
But nonetheless it's in, and it does have a very good reason to exist.
Looking forward to 2.16 and 2.17 we have a long list of ideas for new features (many I'm sure the purists will not agree on) :)
These are things that I personally would find very helpful, especially when maintaining larger code bases:
1) Direct literal string support on invoke.. "" and L"" .. so we don't need a text macro anymore, the other advantage is that we can optimise the string table produced down to replace duplicate strings.
2) Overloaded procedures, we identify the relevant PROC by name AND parameter types.. this one is super helpful to me especially when combined with 3.
3) Support namespaces.
NAMESPACE VectorMath
Normalize PROC VECTORCALL FRAME vec:__m256d
ret
Normalize ENDP
Normalize PROC VECTORCALL FRAME vec:__m128f
ret
Normalize ENDP
Normalize PROC VECTORCALL FRAME vec:hfa3
ret
Normalize ENDP
ENDS
Now I can just invoke with:
invoke VectorMath.Normalize, myVector ; myVector could be hfa3/4-element float simd type or 4-element double simd type.
As with all of these things, they're not mandatory so you can ignore namespaces and all existing code works as-is by being in the default global namespace.
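To make the overload idea a bit more concrete, here is a minimal Python sketch of resolving a procedure by namespace, name AND parameter types. It only models the lookup; it says nothing about how HJWasm does or will implement it. The table layout, the declare/resolve helpers and the mangled labels (Normalize_hfa3 etc.) are all hypothetical.

```python
# Hypothetical model: each overload is keyed by (namespace, name, parameter types).
procs = {}

def declare(namespace, name, param_types, label):
    """Register one overload under its full signature."""
    procs[(namespace, name, tuple(param_types))] = label

def resolve(namespace, name, arg_types):
    """Pick the overload whose parameter types match the invoke arguments exactly."""
    key = (namespace, name, tuple(arg_types))
    if key not in procs:
        raise LookupError(f"no overload of {namespace}.{name} for {list(arg_types)}")
    return procs[key]

# The three Normalize variants from the NAMESPACE example (labels are made up).
declare("VectorMath", "Normalize", ["__m256d"], "Normalize_m256d")
declare("VectorMath", "Normalize", ["__m128f"], "Normalize_m128f")
declare("VectorMath", "Normalize", ["hfa3"], "Normalize_hfa3")

print(resolve("VectorMath", "Normalize", ["hfa3"]))
```

Exact type matching keeps the sketch simple; a real assembler would also have to decide what to do about promotions and ambiguous matches.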
These are just some ideas and of course we're always welcome to other suggestions!
Quote from: johnsa on September 05, 2016, 06:50:35 PM
1) Direct literal string support on invoke.. "" and L"" .. so we don't need a text macro anymore, the other advantage is that we can optimise the string table produced down to replace duplicate strings.
It's worth a try. Duplication can be avoided with macros, too; this one uses the same memory location:
PrintLine "This is a test"
PrintLine "This is a test"
Still, it would make life easier to let the compiler organise that. Re direct literal string support, it's already implemented in the rv() macro that some of us use. Again, it would do no harm to add it.
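For anyone wondering what "replace duplicate strings" amounts to, here is a rough Python sketch of a pooled string table: each distinct literal is stored once and every later use gets the same offset. This is only an illustration of the idea, not the HJWasm or MasmBasic implementation; build_string_table is a made-up helper.

```python
def build_string_table(literals):
    """Return (table bytes, offsets) with each distinct literal stored once."""
    table = bytearray()
    offsets = {}
    for text in literals:
        if text not in offsets:
            # First occurrence: append the zero-terminated string to the table.
            offsets[text] = len(table)
            table += text.encode("ascii") + b"\0"
    return bytes(table), offsets

literals = ["This is a test", "Hello", "This is a test"]
table, offsets = build_string_table(literals)

# Both uses of "This is a test" share offset 0; the table holds two strings, not three.
print(len(table), offsets)
```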
These days I don't agree much with Hutch :(
However, on one point I am perfectly in line with him: Your compiler, pardon: assembler, is fine, what's missing is the codebase - macros and libraries.
Quote from: johnsa on September 05, 2016, 06:50:35 PM
1) Direct literal string support on invoke.. "" and L"" .. so we don't need a text macro anymore, the other advantage is that we can optimise the string table produced down to replace duplicate strings.
:t :t :t
After that, will this be possible too?
.data
msgA db "ANSI",13,10,0
msgW dw L"UNICODE",13,10,0
That is the intention yes, for data declaration as well as directly in invoke.
Possibly also direct use with opcodes like:
lea rax,"This is an ASCII string"
lea rdx,L"This is a unicode string"
Is it possible to avoid using those sub rsp/add rsp in every invoke with some option switch?
TestProc10:
000000013FC1181A 48 83 EC 28 sub rsp,28h ;HERE
000000013FC1181E 48 8D 04 24 lea rax,[rsp]
000000013FC11822 48 89 44 24 30 mov qword ptr [a],rax
000000013FC11827 C5 F8 10 44 24 30 vmovups xmm0,xmmword ptr [a]
000000013FC1182D 48 83 C4 28 add rsp,28h ;HERE
000000013FC11831 C3 ret
Are you referring to the above as part of the prologue/epilogue ?
HJWASM won't add any stack modification if you don't have/use any locals or arguments.
Removing it when there are locals/arguments wouldn't add any value as the stack references would need to use negative indices and may/probably be less performant than the single add/sub.. in addition it would make it very difficult to handle nested or recursive calls.
No, just a simple invoke of printf.
I am comparing HJWasm and PoAsm.
OPTION WIN64:2 is good for my tests.
Can you send an example of your printf invoke that's generating add/sub rsps?
thanks
Congratulations, sirs habran and johnsa.
Downloading and trying.
option casemap :none
option epilogue:none
option prologue:none
OPTION WIN64:2
exit proto :dword
printf proto args:vararg
includelib msvcrt64.lib
.data
msg db "Hello msvcrt.dll",13,10,0
.code
mainCRTStartup proc
invoke printf,offset msg
invoke exit, 0
mainCRTStartup endp
end
That all seems very odd.. with some modification:
--- b.asm ----------------------------------------------------------------------
mainCRTStartup:
000000013F161010 48 83 EC 20 sub rsp,20h
000000013F161014 48 8D 0D E5 3F 00 00 lea rcx,[msg (013F165000h)]
000000013F16101B E8 14 10 00 00 call printf (013F162034h)
000000013F161020 33 C9 xor ecx,ecx
000000013F161022 E8 07 10 00 00 call exit (013F16202Eh)
the sub rsp,20h is the prologue for mainCRTStartup and has nothing to do with the invoke of printf.
You also used OFFSET instead of ADDR which wasn't generating the right code.
I also changed my source to msvcrt.lib (don't know if yours is actually called msvcrt64.lib)
I've set the HJWASM options to the optimal settings: automatic frame, RSP for stackbase and win64:11:
.x64
option casemap:none
option win64:11
option frame:auto
option STACKBASE:RSP
exit proto :dword
printf proto args:vararg
includelib msvcrt.lib
.data
msg db "Hello msvcrt.dll",13,10,0
.code
mainCRTStartup proc
invoke printf,ADDR msg
invoke exit, 0
mainCRTStartup endp
end
This is the code I mean:
00000000 sub rsp, 20h ; 4883ec20
00000004 lea rcx, [rip+0h] ; 488d0d00000000
0000000b call printf ; e800000000
00000010 add rsp, 20h ; 4883c420
00000014 sub rsp, 20h ; 4883ec20
00000018 xor ecx, ecx ; 33c9
0000001a call exit ; e800000000
0000001f add rsp, 20h ; 4883c420
I was testing a naked function to see when it crashes, so assembler intervention wasn't an option.
I have already found the right options.
Quote from: johnsa on September 06, 2016, 01:35:08 AM
mainCRTStartup:
000000013F161010 48 83 EC 20 sub rsp,20h
000000013F161014 48 8D 0D E5 3F 00 00 lea rcx,[msg (013F165000h)]
000000013F16101B E8 14 10 00 00 call printf (013F162034h)
the sub rsp,20h is the prologue for mainCRTStartup and has nothing to do with the invoke of printf.
Are you sure? I thought the ABI requires that you grant 4 QWORDS of stack to printf ::)
include \Masm32\MasmBasic\Res\JBasic.inc ; OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
Init
int 3
jinvoke crt_printf, Chr$("Hello msvcrt.dll, 13, 10")
EndOfCode
translates to:
CC | int3 |
53 | push rbx |
53 | push rbx |
53 | push rbx |
48 8D 0D EF 1F 00 00 | lea rcx, qword ptr ds:[140003004] | 140003004:"Hello rt.dll, , "
51 | push rcx |
FF 15 E4 20 00 00 | call qword ptr ds:[<&printf>] |
48 83 C4 20 | add rsp, 20 |
53 | push rbx |
53 | push rbx |
53 | push rbx |
6A 00 | push 0 |
48 8B 0C 24 | mov rcx, qword ptr ss:[rsp] | correct the stack
FF 15 D9 20 00 00 | call qword ptr ds:[<&RtlExitUserProcess |
A different way to allocate the 4 QWORDs.
Yes, the minimum reservation for any non-leaf function should be 32 bytes (which is 20h).
There is no need to allocate stack space per call, instead it should be done once per PROC by examining all the invokes which occur inside that proc and determining the maximum reservation for all of them.
so for example if you had:
Main PROC
invoke a
invoke b,1,2,3,4
invoke c,1,2
Main ENDP
There would be a single sub rsp,20h (because no PROC used inside Main requires more than 4 arguments).
However, if you had:
Main PROC
invoke a
invoke b,1,2,3,4
invoke c,1,2,3,4,5
Main ENDP
you would need 28h to account for the 5 arguments of function c, but to keep the stack xmmword aligned we round that up to 30h, so there is still a single reservation and the first local allocated in any proc is 16-byte aligned too.
This is better for cache access to the stack, and also saves a load of unnecessary add/sub (or in your example pushes).
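The arithmetic in the two Main examples can be sketched in a few lines of Python. This only models the argument area (largest argument count, minimum of 4 slots, rounded up to 16 bytes); it deliberately ignores locals, saved registers and the return address, which a real prologue also has to account for. The function name reserved_stack is made up for the illustration.

```python
SLOT = 8        # one QWORD per argument slot
MIN_SLOTS = 4   # home (shadow) space for RCX, RDX, R8, R9

def reserved_stack(arg_counts, align=16):
    """Bytes to reserve once per PROC for all invokes inside it."""
    slots = max([MIN_SLOTS] + list(arg_counts))
    size = slots * SLOT
    # Round up to the next multiple of the alignment.
    return (size + align - 1) // align * align

print(hex(reserved_stack([0, 4, 2])))  # invoke a / b,1,2,3,4 / c,1,2     -> 0x20
print(hex(reserved_stack([0, 4, 5])))  # invoke a / b,1,2,3,4 / c,1,..,5 -> 0x30
```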
On a slightly unrelated note, we do a lot of these behind-the-scenes optimisations in hjwasm to set up the stack prologue/epilogue generation, manage alignment etc., which is another reason that invoke is so important (I believe there was a comment on another thread about it). It allows the assembler to take care of things that would otherwise be extremely difficult or painful to manage manually (especially in 64-bit), and would more than likely end up with either hard-to-find bugs or sub-optimal code.. and if writing assembly by hand ends up generating less optimal code than a C compiler .... then it really has lost any reason to exist!
This was one of the main drivers for having vectorcall support (apart from interop) .. we can't have a tool that doesn't provide every opportunity to write the most performant code possible with the minimum of trouble when some silly old HLL compiler can do it! :)
Quote
Are you sure? I thought the ABI requires that you grant 4 QWORDS of stack to printf ::)
You are correct JJ :t
Because TWell is not using FRAME and PROLOGUE, and is using only OPTION WIN64:2, the 20h is the allocation of home space
(shadow space for the 4 register parameters) for each invoke:
W64F_SAVEREGPARAMS = 0x01, /* 1=save register params in shadow space on proc entry */
W64F_AUTOSTACKSP = 0x02, /* 1=calculate required stack space for arguments of INVOKE */
W64F_STACKALIGN16 = 0x04, /* 1=stack variables are 16-byte aligned; added in v2.12 */
W64F_SMART = 0x08, /* 1=takes care of everything */
A long time ago I wrote the first STACKBASE:RSP and forced Japheth to implement it in the official version, because I was pissed off with the stupid adding and subtracting of 20h on every call of a subroutine.
Now hutch is having the same problem with ML64.
So, my answer to your original question:
Is it possible to avoid using those sub rsp/add rsp in every invoke with some option switch?
Yes, use this in the beginning of your source:
option casemap:none
option win64:11 ;W64F_SAVEREGPARAMS+W64F_AUTOSTACKSP + W64F_SMART
option frame:auto ;this writes PROLOGUE and EPILOGUE for you automatically
option STACKBASE:RSP ;this allocates home space only once for all invokes you use in a PROC
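If the number after win64: looks like magic, it is just the sum of the flag bits listed above. A small Python sketch that decodes a value into flag names (the flag values are copied from the list above; decode_win64 is a made-up helper, not part of any tool):

```python
# Flag values as listed in the W64F_ enum above.
WIN64_FLAGS = {
    0x01: "W64F_SAVEREGPARAMS",
    0x02: "W64F_AUTOSTACKSP",
    0x04: "W64F_STACKALIGN16",
    0x08: "W64F_SMART",
}

def decode_win64(value):
    """Return the names of all flag bits set in an OPTION WIN64 value."""
    return [name for bit, name in WIN64_FLAGS.items() if value & bit]

print(decode_win64(11))  # ['W64F_SAVEREGPARAMS', 'W64F_AUTOSTACKSP', 'W64F_SMART']
print(decode_win64(2))   # ['W64F_AUTOSTACKSP']
```

So win64:11 is 1+2+8, while TWell's win64:2 only asks for the INVOKE stack space calculation.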
I posted a bit further back the modified version to make use of these options, which generated the shorter/optimal proc without all the add/sub'ishness.
Quote
.x64
option casemap:none
option win64:11
option frame:auto
option STACKBASE:RSP
exit proto :dword
printf proto args:vararg
includelib msvcrt.lib
.data
msg db "Hello msvcrt.dll",13,10,0
.code
mainCRTStartup proc
invoke printf,ADDR msg
invoke exit, 0
mainCRTStartup endp
end
Exactly as per my post :) Habran and I both insisted on this back while Japheth was still in charge of JWasm. It's exactly the same way a compiler would generate it. Implementing this same sort of logic/optimisation from a macro would take a tremendous amount of trouble (I'm not sure it is possible at all), as it would require scoped knowledge of the PROC it's being used in and every other invoke within that same proc.. or you can go "low level" and roll this yourself in ML64 and manually update your stack allocation prologue every time you add an invoke anywhere.. fun times .. not ;)
> Now, hutch is having the same problem with ML64
Only in the experimental stages, I have written a prologue/epilogue that is problem free and adjustable. Either high level code or it can be turned off for pure mnemonic code.
This empty proc,
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
ret
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
gives me this disassembly.
.text:0000000140001000 C8800000 enter 0x80, 0x0
.text:0000000140001004 4883EC40 sub rsp, 0x40
.text:0000000140001008 C9 leave
.text:0000000140001009 C3 ret
Both the ENTER 1st arg and the stack adjustment are adjustable and it has been super reliable.
How are you planning on handling the calculation of the correct amount to adjust RSP by ? using a forward reference to a total which is only determined once the epilogue is reached and back-filling it into the prologue ?
(I'm not sure if that can be done with macros even without trying, depending on the provision of it allowing the forward reference and order of macro expansion :) ) I guess you would have to completely replace ML64's invoke/prologue/epilogue generation and assuming the former idea worked you could possibly achieve the same sort of result as hjwasm invoke.
This high level code generates the following disassembly.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
WndProc proc hWin:QWORD,uMsg:QWORD,wParam:QWORD,lParam:QWORD
LOCAL rct :RECT
LOCAL buffer[128]:BYTE
LOCAL pbuf :QWORD
ret
WndProc endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
empty WndProc
sub_14000102b proc
.text:000000014000102b C8800000 enter 0x80, 0x0
.text:000000014000102f 4881ECE0000000 sub rsp, 0xe0
.text:0000000140001036 48894D10 mov qword ptr [rbp+0x10], rcx
.text:000000014000103a 48895518 mov qword ptr [rbp+0x18], rdx
.text:000000014000103e 4C894520 mov qword ptr [rbp+0x20], r8
.text:0000000140001042 4C894D28 mov qword ptr [rbp+0x28], r9
.text:0000000140001046 C9 leave
.text:0000000140001047 C3 ret
sub_14000102b endp
An empty procedure with no stack frame generates the following.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
NOSTACKFRAME
empty proc
ret
empty endp
STACKFRAME
empty NOSTACKFRAME proc
.text:000000014000104d
.text:000000014000104d 0x14000104d:
.text:000000014000104d C3 ret
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Quote
I guess you would have to completely replace ML64's invoke/prologue/epilogue generation and assuming the former idea worked you could possibly achieve the same sort of result as hjwasm invoke.
ML64 comes unconfigured so you have no choice other than to write a prologue/epilogue and an automated call notation, "invoke" being the most common. It calculates the byte count for the locals, then subtracts it from RSP while maintaining the correct alignment to provide the LOCAL space. Passed arguments are addressed above RSP.
One more.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
stackframe_dbgdata equ <1> ; turn on stackframe output
stackframe_default equ <256> ; increase default stack size
stackframe_dynamic equ <512> ; increase ENTER dynamic size
STACKFRAME
testproc proc a1:QWORD,a2:QWORD,a3:QWORD,a4:QWORD,a5:QWORD
LOCAL var1 :QWORD
LOCAL var2 :QWORD
LOCAL var3 :QWORD
LOCAL var4 :QWORD
LOCAL var5 :QWORD
LOCAL var6 :QWORD
LOCAL var7 :QWORD
LOCAL var8 :QWORD
nop
ret
testproc endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
. ****************************
. PROLOGUE testproc
. arg count = 5
. local bytes = 64
. ****************************
sub_14000104d proc
.text:000000014000104d C3 ret
.text:000000014000104e C8000200 enter 0x200, 0x0
.text:0000000140001052 4881EC40010000 sub rsp, 0x140
.text:0000000140001059 48894D10 mov qword ptr [rbp+0x10], rcx
.text:000000014000105d 48895518 mov qword ptr [rbp+0x18], rdx
.text:0000000140001061 4C894520 mov qword ptr [rbp+0x20], r8
.text:0000000140001065 4C894D28 mov qword ptr [rbp+0x28], r9
.text:0000000140001069 90 nop
.text:000000014000106a C9 leave
.text:000000014000106b C3 ret
sub_14000104d endp
Quote from: johnsa on September 06, 2016, 06:10:52 AM
without a tremendous amount of trouble (I'm not sure if at all) to implement this same sort of logic/optimisation from a macro would be impossible as it would require scoped knowledge of the PROC it's being used in and every other invoke within that same proc..
Quote from: johnsa on September 06, 2016, 06:32:06 AM
How are you planning on handling the calculation of the correct amount to adjust RSP by ? using a forward reference to a total which is only determined once the epilogue is reached and back-filling it into the prologue ?
It is not that difficult, actually: You start with a reasonable default (e.g. 12 args as in CreateWindowEx), and if that's not enough, let the EPILOGUE macro tell the user that he must manually increase the reserved stack. Plus, if the user is scared of running out of stack, e.g. in a recursive proc, he can start with a very low value, and let the epilogue macro inform him how much is really needed. Manual intervention will be a very rare case.
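As a rough model of that scheme (not the actual macro code, which isn't shown here), the check done at the epilogue boils down to comparing the reserved slot count with the largest argument count seen in the proc. The name epilogue_check and the message text are invented for this Python sketch.

```python
def epilogue_check(reserved_slots, invoke_arg_counts):
    """Return a warning if any invoke in the proc needs more slots than reserved."""
    # Every call needs at least the 4 home-space slots.
    needed = max([4] + list(invoke_arg_counts))
    if needed > reserved_slots:
        return f"increase reserved stack to {needed} QWORD slots"
    return None  # the default was enough, nothing to report

print(epilogue_check(12, [1, 12, 4]))  # default of 12 covers a CreateWindowEx-sized call
print(epilogue_check(4, [1, 5]))       # low starting value, epilogue reports what is needed
```

The point of the design is that no forward reference is needed: the prologue uses a fixed value, and the total is only checked once all invokes have been seen.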
I had it running already, but, see here (http://masm32.com/board/index.php?topic=5528.msg60079#msg60079), there is a really weird behaviour of the PROLOGUE macro that I am fighting with right now. Maybe one of the masters of the Watcom universe has a clue what happens there. Plain Masm32 test case attached - the Watcom assemblers give also a wrong line count, see @Line in the attached source.
Hi,
Just to let you all know .. I spotted a bug in the new invoke code when using parameter types like [rsi], [reg+ofs] and [reg].struct.member on fastcall procedures.. I've fixed this and updated both the repository and packages on the site so there's an update for you dated 6/9/2016.
John
Perhaps you should post a more recent version here (http://www.terraspace.co.uk/hjwasm.html#p2):
HJWasm 2.15 (32bit) 6/08/2016 hjwasm215_x86.zip 32bit Binary Package (Windows)
HJWasm 2.15 (64bit) 6/08/2016 hjwasm215_x64.zip 64bit Binary Package (Windows)
6 August is one month ago 8)
arghh... dumb typo, fixing now :) thanks for spotting!
No problem, I saw the correct timestamp in the zip archive :P
There is still the prolog issue. What I found out is that
- the prolog macro kicks in only when the assembler hits the first real instruction after the locals (ML and Watcom)
- within the prolog macro, the @Line macro yields the sometest proc line in ML but the line after the locals plus 5 in the Watcom family.
Attached new plain Masm32 testbed. I assume it would be the same for 64-bit code.
I've found the bits of code responsible for it.. I'm just not sure 100% yet what to do about it.. ::)
Technically the macro is being run in the same place by both ML and WATC family and our line number is more "correct" than ML's (not to say there isn't a reason for a custom prologue to do this)
so can you help me understand why you need the line number at time of prologue execution == the proc line ? (maybe it will help decide how best to change it)
As a fix.. I've implemented a new built-in equate @ProcLine
which gives us:
SomeProlog MACRO procname, flags, argbytes, localbytes, reglist, userparms
Local tmp$, up$, alignOK, alignBad, is, alignedUses
pLine=@Line
% echo ## prologue of procname ##
hello$ equ <The prologue macro has changed the string>
push ebp
mov ebp, esp ; create frame
up$ CATSTR <userparms>, < >
tmp$ CATSTR <line >, %@Line, < (should be line 24, ok with ML but not with Watcom family)>
% echo tmp$
tmp$ CATSTR <## PROLOG, line >, %@Line, <: &procname>, <: args+locals=>, <argbytes>, <+>, <localbytes>, <=>, %(argbytes+localbytes), <, _flags=>, <flags>, <, _userparms=>, <userparms>
% echo tmp$
% echo ## end of procname prologue ##
IFDEF @ProcLine
tmp$ CATSTR <line >, %@Line, < - proc line >, %@ProcLine
ELSE
tmp$ CATSTR <line >, %@Line, < - proc line >, %@Line
ENDIF
% echo tmp$
EXITM %localbytes
ENDM
Which gives the correct results then for both ML and HJWASM.
If this works for you I will update source/packages?
I've uploaded this change, give it a try.
You can now use @ProcLine or @Line (just use it inside the IFDEF) and it should work out fully compatible with both MASM and HJWASM.
Cheers
John
Quote from: johnsa on September 07, 2016, 07:29:00 AM
so can you help me understand why you need the line number at time of prologue execution == the proc line ? (maybe it will help decide how best to change it)
Thanks for providing a ML compatible solution. The reason why I am using this is a bit complicated: The jinvoke macro can produce 64-bit code in two ways,
- by pushing 4 times args and dummy args
- by moving args into their stack locations
The switch is the cs in the prolog macro's userparms, i.e.
WndProc proc <cb cs> uses rsi rdi rbx hWnd:HWND, uMsg:UINT, wParam:WPARAM, lParam:LPARAM
LOCAL ps:PAINTSTRUCT
cmp uMsg, WM_CREATE
jne not_create
cs stands for "compiler style". If the prolog macro sees cs, it allocates stack for all jinvokes that follow and sets a flag; if not, the jinvoke macro must create the 4*QWORD stack every time.
To achieve this, prolog contains
jbCompStyle INSTR <userparms>, <cs>
Now, the only problem is that this variable, meant for jinvoke, is not yet set if, and only if, jinvoke is the first command after the LOCALs. Therefore I wanted to check, in jinvoke, if it is the first command, and force an error ("put a nop before jinvoke") if that is the case.
Now this is the reason behind the wish to have ML compatibility. What strikes me, though, is that this check regards the last Local line, not the proc line - which is closer to the Watcom behaviour than to ML ::)
I know this is a very exotic request, so don't take it too seriously :P