Masm32 SDK description, downloads and other helpful links
Started by aw27, March 01, 2017, 04:37:40 AM
Quote from: jj2007 on March 16, 2017, 05:14:59 AM
  Quote from: aw27 on March 16, 2017, 04:58:04 AM
  it should handle better the INVOKE parameters in x64 code than JWasm. The first time I compiled with HJWasm I obtained smaller code
The x64 ABI is not exactly user-friendly; it took me some time to understand it. There are some tricks to get smaller code, and there are also some people who bark at you if you dare to favour size over speed. Perhaps it would help if you posted one or two examples where different coding styles make a difference for your project, size- or speed-wise. A lot can be done inside the PROLOGUE macro btw.
Quote from: aw27 on March 16, 2017, 05:25:44 AM
I would like to be able to do something like INVOKE Func, xmm0, xmm1, xmm2, r9
MyFunc proc arg1:???, arg2:???, arg3:???, arg4
000000014000102C | 48 8B 45 64    | mov rax, qword ptr ss:[rbp+64]
0000000140001030 | 48 8B 44 24 64 | mov rax, qword ptr ss:[rsp+64]
Quote
For example, I would like to be able to do something like INVOKE Func, xmm0, xmm1, xmm2, r9, but it is not possible with JWasm. I would have to call something like INVOKE Func, rcx, rdx, r8, r9, even though rcx, rdx and r8 are not used in that call.
Quote from: coder on March 16, 2017, 01:34:38 PM
In 64-bit assembly, the best calling convention is MOV + CALL. Can't go wrong with it.
option frame:auto

TXMMATRIX struct
    r0 XMMWORD ?
    r1 XMMWORD ?
    r2 XMMWORD ?
    r3 XMMWORD ?
TXMMATRIX ends

_XMVECTORSET MACRO r, float1, float2, float3, float4
    movss xmm0, float1
    movss xmm1, float2
    movss xmm2, float3
    movss xmm3, float4
    unpcklps xmm1, xmm3
    unpcklps xmm2, xmm0
    unpcklps xmm1, xmm2
    lea r10, [rcx].r
    movups XMMWORD ptr [r10], xmm1
ENDM

.code

XMMatrixSet proc public retVal:QWORD, dumbpar1:QWORD, dumbpar2:QWORD, dumbpar3:QWORD, mm03:REAL4, mm10:REAL4, mm11:REAL4, mm12:REAL4, mm13:REAL4, mm20:REAL4, mm21:REAL4, mm22:REAL4, mm23:REAL4, mm30:REAL4, mm31:REAL4, mm32:REAL4, mm33:REAL4
    movss xmm0, mm03
    unpcklps xmm2, xmm0
    unpcklps xmm3, xmm1
    unpcklps xmm2, xmm3
    ASSUME rcx : ptr TXMMATRIX
    lea r10, [rcx].r0
    movups XMMWORD ptr [r10], xmm2
    _XMVECTORSET r1, mm10, mm11, mm12, mm13
    _XMVECTORSET r2, mm20, mm21, mm22, mm23
    _XMVECTORSET r3, mm30, mm31, mm32, mm33
    ASSUME rcx : NOTHING
    mov rax, rcx
    ret
XMMatrixSet endp

proc1 proc public
    LOCAL M : TXMMATRIX
    mov eax, 0.1
    movd xmm1, eax
    movd xmm2, eax
    movd xmm3, eax
    INVOKE XMMatrixSet, addr M, rdx, r8, r9, 0.1, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.3, 0.3, 0.4, 0.4, 0.4, 0.4
    ; do other stuff
    ; ......
    ; end other stuff
    ret
proc1 endp
proc1:
000000013F201726 push rbp
000000013F201727 mov rbp,rsp
000000013F20172A sub rsp,40h
000000013F20172E mov eax,3DCCCCCDh
000000013F201733 movd xmm1,eax
000000013F201737 movd xmm2,eax
000000013F20173B movd xmm3,eax
000000013F20173F sub rsp,90h
000000013F201746 lea rcx,[rbp-40h]
000000013F20174A mov dword ptr [rsp+20h],3DCCCCCDh
000000013F201752 mov dword ptr [rsp+28h],3E4CCCCDh
000000013F20175A mov dword ptr [rsp+30h],3E4CCCCDh
000000013F201762 mov dword ptr [rsp+38h],3E4CCCCDh
000000013F20176A mov dword ptr [rsp+40h],3E4CCCCDh
000000013F201772 mov dword ptr [rsp+48h],3E99999Ah
000000013F20177A mov dword ptr [rsp+50h],3E99999Ah
000000013F201782 mov dword ptr [rsp+58h],3E99999Ah
000000013F20178A mov dword ptr [rsp+60h],3E99999Ah
000000013F201792 mov dword ptr [rsp+68h],3ECCCCCDh
000000013F20179A mov dword ptr [rsp+70h],3ECCCCCDh
000000013F2017A2 mov dword ptr [rsp+78h],3ECCCCCDh
000000013F2017AA mov dword ptr [rsp+80h],3ECCCCCDh
000000013F2017B5 call XMMatrixSet (13F201690h)
000000013F2017BA add rsp,90h
000000013F2017C1 leave
000000013F2017C2 ret
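The sub rsp,90h in the proc1 listing can be checked by hand. The sketch below redoes the arithmetic with assemble-time constants, assuming the Microsoft x64 convention in which every argument owns one 8-byte slot, including the four that travel in registers. NARGS, SLOT, ARGBYTES, FRAMESIZE and demo are illustrative names only, not symbols defined by HJWasm, and the snippet has not been run through an assembler here.

```asm
; Sketch only: the argument-area arithmetic that INVOKE does for us.
; All names below are hypothetical and exist only for this illustration.

NARGS     equ 17                        ; arguments passed to XMMatrixSet
SLOT      equ 8                         ; one 8-byte slot per argument
ARGBYTES  equ NARGS * SLOT              ; 136 = 88h, shadow slots included
FRAMESIZE equ (ARGBYTES + 15) and -16   ; rounded up to a multiple of 16: 144 = 90h

.code

demo proc
    sub rsp, FRAMESIZE                        ; same size as "sub rsp,90h"
    ; argument n (counting from 1) lives at [rsp + (n-1)*SLOT]
    mov dword ptr [rsp + 4*SLOT], 3DCCCCCDh   ; argument 5  at [rsp+20h], 0.1 as REAL4
    mov dword ptr [rsp + 16*SLOT], 3ECCCCCDh  ; argument 17 at [rsp+80h], 0.4 as REAL4
    add rsp, FRAMESIZE
    ret
demo endp

end
```

The thirteen stores in the listing, [rsp+20h] through [rsp+80h], are arguments 5 to 17. The slots at [rsp+0] through [rsp+18h] are the shadow space for the four register arguments. The immediates 3DCCCCCDh, 3E4CCCCDh, 3E99999Ah and 3ECCCCCDh are the single-precision encodings of 0.1, 0.2, 0.3 and 0.4.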
Quote from: hutch-- on March 16, 2017, 10:33:13 AM
There appears to be some redundancy in this desire, why would you use an "invoke" call when there are no memory operands involved in the argument list? You don't really want to use the stack as it involves writing to stack memory on call and at the procedure level the called proc must then translate the stack addresses back to different sized registers. Without the extra clutter the proposed form,

    INVOKE Func, xmm0, xmm1, xmm2, r9

would simply be with the registers loaded with whatever required values,

    call Func

Now given that in most instances the data for xmm, ymm registers must come from somewhere in the application and for performance reasons that memory must be aligned correctly, if you don't want to load different sized registers directly, you pass the addresses of the data items as 64 bit pointers in the normal manner.

The option of a macro something like "regcall" would also do the job but only again if you were performing the double process of loading registers first then calling the procedure.
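For readers who want to see the "load the registers, then call" form spelled out, here is a minimal sketch assuming the Microsoft x64 calling convention. Func, Caller, f1, f2 and f3 are hypothetical names made up for this illustration, and the snippet has not been assembled here. Under that convention the first three floating-point arguments travel in xmm0 to xmm2 and the fourth (integer) argument in r9, and the caller still has to reserve 32 bytes of shadow space even though nothing is passed on the stack.

```asm
; Sketch only: a register-only call written by hand, without INVOKE.
; Func, Caller and the data labels are hypothetical.

.data
f1 REAL4 1.0
f2 REAL4 2.0
f3 REAL4 3.0

.code

Func proc
    addss xmm0, xmm1        ; xmm0 = arg1 + arg2
    addss xmm0, xmm2        ; xmm0 = arg1 + arg2 + arg3
    mov   rax, r9           ; copy the integer argument, purely for illustration
    ret
Func endp

Caller proc
    sub   rsp, 28h          ; 32 bytes shadow space + 8 so rsp is 16-byte aligned at the call
    movss xmm0, f1          ; first argument
    movss xmm1, f2          ; second argument
    movss xmm2, f3          ; third argument
    mov   r9, 4             ; fourth argument
    call  Func
    add   rsp, 28h
    ret
Caller endp

end
```

The 28h is the part that is easy to get wrong by hand: on entry rsp is 8 bytes off a 16-byte boundary because of the pushed return address, so 20h of shadow space plus 8 restores the alignment before the call.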
Quote from: jj2007 on March 16, 2017, 06:08:42 AM
Would MyFunc move xmm0 into the stack, or use it directly?
Quote from: aw27 on March 16, 2017, 03:50:52 PM
I could not disagree more. The real problem is to align the stack, especially when you have a lot of parameters in your call. It is good to know that INVOKE does all the calculations for us.
Look at the following code, where proc1 sets the 16 values of a 4x4 matrix: the first row will be all 0.1, the second all 0.2, the third all 0.3 and the fourth all 0.4.
Quote from: johnsa on March 16, 2017, 08:47:19 PM
we've tried to get as close to that as possible with hjwasm, especially using stackbase:rsp / option win64:11
I think with this addition of xmm regs to invoke for float arguments it should be pretty much bang on. It tries to make the invoke/prologue/epilogue generation as optimal as possible and deal with removing anything unused or not needed while supporting all the many combinations.
Quote from: coder on March 16, 2017, 08:37:46 PM
It is not about aligning the stack. IMHO, what you need exactly is a custom-built PROC and INVOKE for your own specific needs because AFAIK, there's no single PROC/INVOKE set out there that has the fits-all capability when dealing with uneven parameters. Not even from the likes of NASM and FASM. Of course it can be done with macros but the overhead may outweigh its benefits.
Quote from: johnsa on March 16, 2017, 08:08:39 PM
So we have quite a few things on the list now worthy to promote it as 2.21. The list of changes is:
1) Fix aw27's bug with sub rsp,8
2) Double check local alignments to 16
3) Add arch flag to allow generated code to use either sse or avx
4) Support xmm reg type arguments to invoke in fastcall x64
Quote from: coder on March 16, 2017, 08:59:27 PM
  Quote from: johnsa on March 16, 2017, 08:47:19 PM
  we've tried to get as close to that as possible with hjwasm, especially using stackbase:rsp / option win64:11
  I think with this addition of xmm regs to invoke for float arguments it should be pretty much bang on. It tries to make the invoke/prologue/epilogue generation as optimal as possible and deal with removing anything unused or not needed while supporting all the many combinations.
In other words, HJWASM is trying to anticipate all the other custom needs of its users. For how long and how far can you go with it? That beats one design idea of the MS 64-bit ABI: that most of the pre-entry work (alignment, saving volatiles etc.) is the responsibility of the user code / callers and not the modules. Excessive wrapping and abstracting may jeopardize stability and portability in the long run. I have nothing against HJWASM. You guys are doing a great job, but the limit must be set somewhere.