News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

Crashes in HJWASM but works well in JWASM

Started by aw27, March 01, 2017, 04:37:40 AM

Previous topic - Next topic

aw27

Hello developers,

I am trying to compile with HJWASM a large program previously compiled with JWASM with success.
I found an issue, which I hope will be clear with the following:

Source code:
"
option frame:auto
OPTION WIN64:6

.code

sub1 proc public arg1:ptr, arg2:ptr

   ret
sub1 endp

sub2 proc public arg1:ptr, arg2:ptr

   ret
sub2 endp

proc1 proc public FRAME uses xmm8 xmm9 arg1:qword, arg2:qword, arg3 :qword
   mov r9, rcx
   mov r10, rdx
   mov r11, r8
   
   INVOKE sub1, r10, r8
   INVOKE sub2, r9, r11
   mov rax, r9
   ret
proc1 endp

end

"
Command line:
hjwasm64 -c -win64 -Zp8 test.asm

The above proc1 compiles with HJWASM to:
proc1:
000000013FBF16A4  push        rbp 
000000013FBF16A5  mov         rbp,rsp 
000000013FBF16A8  sub         rsp,20h 
000000013FBF16AC  sub         rsp,40h 
000000013FBF16B0  vmovdqu     xmmword ptr [rsp+40h],xmm8 
000000013FBF16B6  vmovdqu     xmmword ptr [rsp+50h],xmm9 
000000013FBF16BC  mov         r9,rcx 
000000013FBF16BF  mov         r10,rdx 
000000013FBF16C2  mov         r11,r8 
000000013FBF16C5  mov         rcx,r10 
000000013FBF16C8  mov         rdx,r8 
000000013FBF16CB  call        sub1 (13FBF1690h) 
000000013FBF16D0  mov         rcx,r9 
000000013FBF16D3  mov         rdx,r11 
000000013FBF16D6  call        sub2 (13FBF169Ah) 
000000013FBF16DB  mov         rax,r9 
000000013FBF16DE  vmovdqu     xmm8,xmmword ptr [rsp+40h] 
000000013FBF16E4  vmovdqu     xmm9,xmmword ptr [rsp+50h] 
000000013FBF16EA  add         rsp,40h 
000000013FBF16EE  pop         rbp 
000000013FBF16EF  ret 

and with JWASM to:
000000013F3016AC  push        rbp 
000000013F3016AD  mov         rbp,rsp 
000000013F3016B0  sub         rsp,40h 
000000013F3016B4  movdqa      xmmword ptr [rsp+20h],xmm8 
000000013F3016BB  movdqa      xmmword ptr [rsp+30h],xmm9 
000000013F3016C2  mov         r9,rcx 
000000013F3016C5  mov         r10,rdx 
000000013F3016C8  mov         r11,r8 
000000013F3016CB  mov         rcx,r10 
000000013F3016CE  mov         rdx,r8 
000000013F3016D1  call        sub1 (13F301690h) 
000000013F3016D6  mov         rcx,r9 
000000013F3016D9  mov         rdx,r11 
000000013F3016DC  call        sub2 (13F30169Eh) 
000000013F3016E1  mov         rax,r9 
000000013F3016E4  movdqa      xmm8,xmmword ptr [rsp+20h] 
000000013F3016EB  movdqa      xmm9,xmmword ptr [rsp+30h] 
000000013F3016F2  add         rsp,40h 
000000013F3016F6  pop         rbp 
000000013F3016F7  ret

So the stack becomes corrupted with HJWASM.

AW27






coder

Not much into HJWASM but I'm wondering about this;

000000013FBF16A8  sub         rsp,20h

not being properly restored at the epilogue? Maybe I am wrong.

habran

Hi aw27,
It is good find of the error which will be fixed in next release :biggrin:
However, it was overseen because we would never use such a combination of win64 flags
In the case below I have used :
option frame:auto
OPTION WIN64:3

Unless you know exactly what are you doing, I would suggest you to use:
option casemap:none      ; causes internal symbol recognition to be case sensitive
option frame:auto           ; generate SEH-compatible prologues and epilogues
option win64:11              ; reserve stack space once per procedure, save registers, calculate required stack space
option STACKBASE:RSP   ; use rsp as a stack base instead of rbp

values for the option win64 are as follow:
enum win64_flag_values {
    W64F_SAVEREGPARAMS = 0x01, /* 1=save register params in shadow space on proc entry */
    W64F_AUTOSTACKSP     = 0x02, /* 1=calculate required stack space for arguments of INVOKE */
    W64F_STACKALIGN16    = 0x04, /* 1=stack variables are 16-byte aligned; added in v2.12 */
    W64F_SMART                = 0x08, /* 1=takes care of everything */
    .....
    .....
};
So, option win64:11 is equivalent of W64F_SAVEREGPARAMS + W64F_AUTOSTACKSP + W64F_SMART




    16: proc1 proc public FRAME uses xmm8 xmm9 arg1:qword, arg2:qword, arg3 :qword
00007FF70377103A 48 83 EC 48          sub         rsp,48h 
00007FF70377103E C5 7A 7F 44 24 20    vmovdqu     xmmword ptr [rsp+20h],xmm8 
00007FF703771044 C5 7A 7F 4C 24 30    vmovdqu     xmmword ptr [rsp+30h],xmm9 
    17:    mov r9, rcx
00007FF70377104A 4C 8B C9             mov         r9,rcx 
    18:    mov r10, rdx
00007FF70377104D 4C 8B D2             mov         r10,rdx 
    19:    mov r11, r8
00007FF703771050 4D 8B D8             mov         r11,r8 
    20:   
    21:    INVOKE sub1, r10, r8
00007FF703771053 49 8B CA             mov         rcx,r10 
00007FF703771056 49 8B D0             mov         rdx,r8 
00007FF703771059 E8 D2 FF FF FF       call        sub1 (07FF703771030h) 
    22:    INVOKE sub2, r9, r11
00007FF70377105E 49 8B C9             mov         rcx,r9 
00007FF703771061 49 8B D3             mov         rdx,r11 
00007FF703771064 E8 CC FF FF FF       call        sub2 (07FF703771035h) 
    23:    mov rax, r9
00007FF703771069 49 8B C1             mov         rax,r9 
    24:    ret
00007FF70377106C C5 7A 6F 44 24 20    vmovdqu     xmm8,xmmword ptr [rsp+20h] 
00007FF703771072 C5 7A 6F 4C 24 30    vmovdqu     xmm9,xmmword ptr [rsp+30h] 
00007FF703771078 48 83 C4 48          add         rsp,48h 
00007FF70377107C C3                   ret


Cod-Father

habran

The bug is exterminated now, thanks aw27 :icon14:
Until next release make sure that the flag W64F_SAVEREGPARAMS is set.
Cod-Father

aw27

Hi habran,

It is good to know you are working on this, I will test again when you are done.

I have used OPTION WIN64:6 because
1) I need stack variables to be 16-byte stack aligned because there are many SSE instructions in the procedures. I know the 1st stack variable is always 16-byte stack aligned but if the second variable is for example a REAL4, the 3rd variable will not be anymore 16-byte stack aligned. I know, I can move around the variables by hand, but I found this is safer. This is how it worked in JWASM, not sure if is different in HJWASM. BTW, why are you using "vmovdqu" to save xmm registers if the memory is guaranteed to be 16-byte aligned at that point?
2) I am not using W64F_SAVEREGPARAMS because in many small procedures it is a waste of time to save the parameters. And my program is large and has procedures of all kinds.
3) Thank you for the hint about STACKBASE:RSP, I will check that out.

AW27

habran

Hi aw27,
If you need 16 byte alignment you can use OPTION WIN64:15.
and of course, option STACKBASE:RSP
if you use OPTION WIN64:11 or in your case OPTION WIN64:15 you don't have to worry about anything because HJWasm will not create stack frame if it is not needed and will store register in a home space only if parameter is used in the function.
I would suggest you to read HJWasm Extended Guide about its features.
Cod-Father

aw27

Thank you, I will check all again on the next release. :t

aw27

I tested the latest release 2.20, if you remove the INVOKEs of my previous example, i.e:

proc1 proc public FRAME uses xmm8 xmm9 arg1:qword, arg2:qword, arg3 :qword
   mov r9, rcx
   mov r10, rdx
   mov r11, r8
   
   ; INVOKE sub1, r10, r8
   ; INVOKE sub2, r9, r11
   mov rax, r9
   ret
proc1 endp

It will compile to:
roc1:
000000013F0516A4  push        rbp 
000000013F0516A5  mov         rbp,rsp 
000000013F0516A8  mov         r9,rcx 
000000013F0516AB  mov         r10,rdx 
000000013F0516AE  mov         r11,r8 
000000013F0516B1  mov         rax,r9 
000000013F0516B4  vmovdqu     xmm8,xmmword ptr [rsp] 
000000013F0516B9  vmovdqu     xmm9,xmmword ptr [rsp+10h] 
000000013F0516BF  add         rsp,28h 
000000013F0516C3  pop         rbp 
000000013F0516C4  ret 

And you know the end result.
I would like to use HJWASM, but I have no much hope.

jj2007

Quote from: aw27 on March 02, 2017, 05:23:43 PMI know the 1st stack variable is always 16-byte stack aligned but if the second variable is for example a REAL4, the 3rd variable will not be anymore 16-byte stack aligned. I know, I can move around the variables by hand, ...

I hope Johnsa & Habran will fix the bug, but in the meantime, if that is a serious issue, why not use a local structure? If its beginning is aligned 16, you have full control over the rest.

aw27

"if that is a serious issue, why not use a local structure"

The point is not that, it compiles and works with JWasm without a glitch. It is a big library of more than 540 functions I put recently in Codeproject.com. I am sure that I could find a way for it to compile and work with HJWasm, but is it worthwhile? Not convinced, although I found some good points in the specification. Wait and see.

jj2007

Quote from: aw27 on March 14, 2017, 09:28:55 PMIt is a big library of more than 540 functions I put recently in Codeproject.com.

Compliments, José, that looks like a big project :t


johnsa

We're busy putting together a fix for this.

Alignment to 16 will always be just the first local, (at least for the near future) as allocating 16bytes per local is wasteful.

vmovdqu is used instead of the aligned equivalent as the same prolog/epilog code is generated to handle not only xmm but ymm and zmm too, and we're not going to align stack to 32/64 byte.
We can special case xmm and align it specifically, but it shouldn't in theory be required as most newer processors automatically implement the faster aligned path micro-architecturally for unaligned instructions if possible.

Given that you're building a math lib (and thusly using vectors/matrices etc) I would strongly suggest looking at vectorcall. We added it in a while back specifically for this type of work to avoid the massive overhead of passing simd types by ref/memory all the time. It's all in the extended manual.

Will have the update ready shortly.

Cheers
John

jj2007

Quote from: johnsa on March 15, 2017, 09:00:19 PMallocating 16bytes per local is wasteful

Correct. One could limit the align 16 to OWORD variables, though. Or insert an align 16 in the right place:

include \Masm32\MasmBasic\Res\JBasic.inc      ; part of MasmBasic

.code
MyTest proc <cb> hwnd
LOCAL MyDword:DWORD
align 16
LOCAL MyOword:XMMWORD

  lea rax, MyDword
  Print Str$("MyDword:\t%x\n", rax)

  lea rax, MyOword
  Print Str$("MyOword:\t%x\n", rax)

  ret
MyTest endp

Init            ; OPT_64 1      ; put 0 for 32 bit, 1 for 64 bit assembly
  PrintLine Chr$("This code was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format")
  jinvoke MyTest, 123

EndOfCode


Surprisingly enough, the assemblers tested allow this syntax 8)

Problem is that it's misleading - here are the results:
This code was assembled with AsmC in 64-bit format
MyDword:        12fedc
MyOword:        12fec8

This code was assembled with ml64 in 64-bit format
MyDword:        12fedc
MyOword:        12fecc

This code was assembled with HJWasm32 in 64-bit format
MyDword:        12fedc
MyOword:        12fec8

johnsa

I did think about that, automatically re-arranging the simd aligned items to occur first, which is totally do-able, but I'm not sure I like the idea of the assembler doing it's own thing that much.
I like to know it does what I tell it, if I want things aligned i'll keep them at the front of the list.. sometimes i like to keep variables initially grouped by usage, and then when optimising i'll re-arrange them for alignment / locality.

the other option would be to have something which aligns just the specified local, but we'd need to think syntax wise how that should look :

Specifically to allow vectorcall to work properly hjwasm has built-in types for __m128, __m256 etc.. which allow it to know if you're talking about a vector rather than an HFA(Homogenous float aggregate) as they're handled differently in the calling convention.

So those types are what are used to define proc arguments, locals etc instead of the old fashion oword/xmmword etc. (Plus they have the union include file .. which will be even more useful with HJwasm 2.21 when we extend the union to allow it to be initialised with any of it's component structs not just the first one.. )

We could make it the case where __m128 is specifically aligned when used

so

LOCAL myVector:OWORD
LOCAL myVector:XMMWORD and so on wouldn't be aligned unless by virtue of the first item

but

LOCAL myVector:__m128

would be aligned no matter where in the list it's used.