The MASM Forum

64 bit assembler => 64 bit assembler. Conceptual Issues => Topic started by: habran on June 14, 2013, 03:39:17 PM

Title: Static RSP built in JWasm
Post by: habran on June 14, 2013, 03:39:17 PM
I am back, as I promised 8)

I have succeeded in building STATIC RSP into JWasm, and it works fine.
How does it work?
Use these options:

option win64:7
option frame:auto


Why all this?

Because RSP is static (it does not move between the prologue and the epilogue), we can address params and locals relative to the RSP register rather than RBP,
and we are rewarded with RBP as a free register to use!!! :bgrin:
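A minimal C sketch of that address arithmetic, assuming the standard Win64 stack layout: on entry [rsp] holds the return address and the four 8-byte home slots for RCX, RDX, R8 and R9 sit right above it; the prologue then moves RSP down by a fixed number of bytes (PUSHes plus SUB RSP,n) and leaves it alone until the epilogue. The helper names are hypothetical and not taken from the JWasm sources.

```c
#include <assert.h>

/* Hypothetical helpers (not JWasm code). `frame` is the total number of
   bytes the prologue moved RSP down (PUSHes plus SUB RSP,n). Because RSP
   stays put after the prologue, every displacement below is a constant. */

/* Home slot 0..3 (RCX, RDX, R8, R9) lies above the return address. */
static unsigned home_slot_disp(unsigned frame, unsigned slot)
{
    assert(slot < 4);
    return frame + 8u /* return address */ + 8u * slot;
}

/* Locals start right above the outgoing home space the procedure reserves
   for its own calls (20h bytes if it calls anything, otherwise 0). */
static unsigned local_disp(unsigned reserved_stack, unsigned local_offset)
{
    return reserved_stack + local_offset;
}
```

For example, a procedure with USES rbx rdi and one DWORD local that pushes RDI and subtracts 8 has frame = 16, so RBX's home slot is [rsp+18h]; with sub rsp,30h after the push it becomes [rsp+40h]. Both values match the listings posted later in this thread.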

The attached folder contains the changed sources and a 64-bit JWasm.exe.

Here is an example:


testproc4 PROC FRAME
                LOCAL z:DWORD

                mov eax,22
                mov z,eax
                ret
testproc4 endp

;produces this:
000000013F02116E  sub         rsp,8 
000000013F021172  mov         eax,16h 
000000013F021177  mov         dword ptr [rsp],eax 
000000013F02117A  add         rsp,8 
000000013F02117E  ret
Title: Re: Static RSP built in JWasm
Post by: johnsa on June 15, 2013, 12:55:39 AM
Nice work! :) Getting another register back helps, given how much the 64-bit fastcall ABI trashes registers all over the place.
Title: Re: Static RSP built in JWasm
Post by: habran on June 15, 2013, 05:59:48 AM
Thanks johnsa :biggrin:
With this we not only get RBP back, the code is also more compact, because the home space is used to store registers.

I have to remind you that in this version the RBP register is saved by default and is ready to use at any time (which also helps with stack alignment).

I am working on another version where RBP will not be saved.
It is working already, but there is still a problem that hasn't been solved.

The JWasm version in the folder also comes with the .FOR feature.

This is now the most advanced assembler in the Universe :t
Title: Re: Static RSP built in JWasm
Post by: habran on June 19, 2013, 09:14:44 PM
I succeeded in building the version that doesn't push RBP, which is more similar to a C prologue :t
I will just test it a little before I post it here :biggrin:
Title: Re: Static RSP built in JWasm
Post by: habran on June 22, 2013, 07:09:54 AM
Here it is 8)

johnsa worked together with me on the debug info, and Japheth gave some hints on how to make debugging work :t

Now both versions work correctly with debug info for locals and params.

For now, only MSVC12 understands the debug info.

I have replaced the folder at the top with the new version, with working debug info.
Title: Re: Static RSP built in JWasm
Post by: habran on June 25, 2013, 05:55:44 AM
There was a problem with the pushed-RBP version reading locals correctly :icon_redface:
Both versions had a problem with a second RET in a function because of:
      CurrProc->e.procinfo->stored_reg=0;
      CurrProc->e.procinfo->pushed_reg=0;
When I removed it, everything worked fine.
I have fixed it and replaced both versions 8)
Now both versions are working beautifully :t
Title: Re: Static RSP built in JWasm
Post by: habran on June 26, 2013, 09:34:57 PM
This last version with no RBP saved is now OK :t
Some tweaking was still necessary.
Please test it :biggrin:

Title: Re: Static RSP built in JWasm
Post by: japheth on June 29, 2013, 02:26:14 PM

Cool!

However, if a "frame pointer omission" feature is to be added to jwasm, it should be implemented for 32-bit as well.

This makes the user interface of this implementation a bit unfortunate, because the flags in OPTION:WIN64 are intended for the Win64 ABI only.

Title: Re: Static RSP built in JWasm
Post by: habran on June 29, 2013, 02:57:24 PM
Thanks, japheth, for the flowers :biggrin:

I am still working on debug info for different debuggers.
I just figured out how to make it work for MSVC 2005,
but I would like to do the same for WinDbg 6.11 and 6.12.
I am planning to add switches for different debuggers.
Quote from: japheth on June 29, 2013, 02:26:14 PM
However, if a "frame pointer omission" feature is to be added to jwasm, it should be implemented for 32-bit as well.
I am a little bit skeptical about 32-bit "frame pointer omission", because
people who have already written and tested programs in 32 bit will not
go through all of their programs to get rid of push/pop commands,
and for writing new programs it would be like going back to the past.
IMO new programs should be written in 64 bit wherever possible.

However, if people think that it is a good idea, it will be no big deal to tweak it all a bit.

Anyway, thank you, japheth, for taking the time to go through the source :t

I am pretty happy with how it all works together :bgrin:
Title: Re: Static RSP built in JWasm
Post by: japheth on June 29, 2013, 06:44:31 PM
Quote from: habran on June 29, 2013, 02:57:24 PM
because people who have already written and tested programs in 32 bit will not
go through all of their programs to get rid of push/pop commands

It's a matter of course that a frame pointer omission feature for 32-bit must be able to allow PUSH and POP instructions.

That's not impossible, but requires a bit more work. Evidently, since the assembler itself cannot reliably track PUSH and POP inside a procedure, it's the programmer's duty to do this - all the assembler can do is to "assist".

The preferred approach is a new option that will allow a more comprehensive control of how stack variables are accessed. Additionally, a few macros could track the current value of ESP by renaming a few instructions with OPTION RENAMEKEYWORD.
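A rough C model of the bookkeeping such macros would have to do, assuming 32-bit code where every PUSH/POP moves ESP by 4 bytes. The struct and function names are hypothetical; the point is that a stack variable's [esp+n] displacement equals its offset from the post-prologue ESP plus whatever has been pushed or subtracted since.

```c
#include <assert.h>

/* Hypothetical tracker for 32-bit frame pointer omission. esp_delta counts
   the bytes pushed (or subtracted) since the prologue finished. */
struct esp_tracker {
    int esp_delta;
};

static void track_push(struct esp_tracker *t)           { t->esp_delta += 4; }
static void track_pop(struct esp_tracker *t)            { assert(t->esp_delta >= 4); t->esp_delta -= 4; }
static void track_sub_esp(struct esp_tracker *t, int n) { t->esp_delta += n; }
static void track_add_esp(struct esp_tracker *t, int n) { assert(t->esp_delta >= n); t->esp_delta -= n; }

/* Displacement to emit for a variable that lives at [esp+base_disp]
   right after the prologue. */
static int esp_disp(const struct esp_tracker *t, int base_disp)
{
    return base_disp + t->esp_delta;
}
```

Because such a tracker only follows instructions in textual order, a PUSH inside one branch of an .IF leaves the counted delta wrong on the other path. That is why the programmer has to keep the stack balanced and the assembler can only "assist".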

Quote
IMO new programs should be written in 64 bit wherever possible

I still prefer my good, old 32-bit XP.  :icon_cool:

Title: Re: Static RSP built in JWasm
Post by: habran on June 29, 2013, 07:43:17 PM
OPTION RENAMEKEYWORD is a good idea :idea:
I agree with you that it would be possible to make it work,
and it would be more sophisticated than the 64-bit version.
However, you are on holiday from JWasm, remember ;)

Perhaps, because this version is dedicated to Margaret Thatcher,
you don't want to "thatch" it any more :biggrin:

Quote
I still prefer my good, old 32-bit XP. :icon_cool:

I still love the Commodore 64; unfortunately it is obsolete now :(

I will make a proposal here:

If at least five members of this forum vote for a 32-bit "static stack", then it is worth building it into JWasm,
and then we will talk about who will implement it :biggrin:

 
Title: Re: Static RSP built in JWasm
Post by: japheth on June 29, 2013, 09:37:56 PM
Quote from: habran on June 29, 2013, 07:43:17 PM
If at least five members of this forum vote for a 32-bit "static stack", then it is worth building it into JWasm,
and then we will talk about who will implement it :biggrin:

I'm afraid there aren't five members here who know what we're talking about.
Title: Re: Static RSP built in JWasm
Post by: habran on June 29, 2013, 10:12:13 PM
You are exaggerating :shock:

If so, why would we go through the trouble of building it :(
Title: Re: Static RSP built in JWasm
Post by: sinsi on June 29, 2013, 10:17:04 PM
I'm afraid there aren't five members here who know care what we're talking about.

:biggrin:
Title: Re: Static RSP built in JWasm
Post by: habran on June 29, 2013, 11:22:27 PM
What about you, sinsi, Amberman?
do you know care what we're talking about? :biggrin:
Title: Re: Static RSP built in JWasm
Post by: japheth on June 29, 2013, 11:36:51 PM
Quote from: habran on June 29, 2013, 10:12:13 PM
If so, why would we go through the trouble of building it :(

:bgrin:

Questions about causality are my favorites. Perhaps because

Title: Re: Static RSP built in JWasm
Post by: habran on June 29, 2013, 11:42:49 PM
Now you are talking... :t
I was just testing you ;)
So, what do you reckon?
Who will go through the trouble?
You, me or both? :biggrin:
Title: Re: Static RSP built in JWasm
Post by: nidud on June 29, 2013, 11:44:44 PM
deleted
Title: Re: Static RSP built in JWasm
Post by: jj2007 on June 30, 2013, 12:50:15 AM
Quote from: sinsi on June 29, 2013, 10:17:04 PM
I'm afraid there aren't five members here who know care what we're talking about.

Is it worth the effort? I can't speak for 64-bit code, but in 32-bit...
- [esp+n] instructions are longer than their [ebp+n] equivalents
- yes you can trace push & pop, even with simple macros, but it gets nasty if they happen inside branches or .if .elseif constructs
- performance is not a valid argument, because a) innermost loops should not call or invoke any code and b) if you really, really need ebp as an extra reg32, just push it before you enter the innermost loop.

So where is the real added value...?
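jj2007's first point comes from the instruction encoding. In a ModRM byte, r/m=100 does not select ESP but means "a SIB byte follows", so ESP can only be used as a base through an extra SIB byte. A small C sketch that encodes mov r32,[base+disp8] (opcode 8Bh) shows the one-byte difference; it handles only this single addressing form, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };  /* 3-bit register numbers */

/* Encode "mov dst, dword ptr [base+disp8]" into out[] and return its length.
   ModRM r/m=100 is the escape for "SIB byte follows", so an ESP base always
   costs one extra byte; EBP with mod=01 needs no SIB. */
static size_t encode_mov_load_disp8(int dst, int base, int8_t disp, uint8_t out[4])
{
    size_t n = 0;
    assert(dst >= EAX && dst <= EDI && base >= EAX && base <= EDI);
    out[n++] = 0x8B;                                  /* MOV r32, r/m32 */
    out[n++] = (uint8_t)(0x40 | (dst << 3) | base);   /* mod=01: [base+disp8] */
    if (base == ESP)
        out[n++] = 0x24;   /* SIB: scale=1, index=none, base=ESP */
    out[n++] = (uint8_t)disp;
    return n;
}
```

So mov eax,[ebp+8] encodes as 8B 45 08 (3 bytes) and mov eax,[esp+8] as 8B 44 24 08 (4 bytes). The same rule applies to [rsp+n] versus [rbp+n] in 64-bit code.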
Title: Re: Static RSP built in JWasm
Post by: habran on June 30, 2013, 01:30:51 PM
Hey nidud,
you have some good points there :icon14:

Hey jj2007,
where has your enthusiasm gone? ;)

So, japheth,
we have one person who is against (jj2007, as expected) :icon13:
one person who is for (nidud, as expected) :icon14:
and one who doesn't care (sinsi, unpredictable) :P

Are you still on holiday, or do you want to roll up your sleeves?
Title: Re: Static RSP built in JWasm
Post by: habran on June 30, 2013, 01:44:43 PM
Hey japheth,
I have figured out the switches for different debuggers.
Here they are:
/* codeview debug info option flags */
enum cvoption_flags {
        CVO_STATICTLS = 1, /* handle static tls */
        CVO_MSVC8     = 2, /* MSVC8 debugger */
        CVO_MSVC10    = 4, /* MSVC10 debugger */
        CVO_MSVC12    = 8, /* MSVC12 debugger */
};

So I use it like this:
option codeview:2  ;//create MSVC8 debug info


option codeview:8  ;//create MSVC12 debug info

I have tested it and it works fine.

I still have to figure out WinDbg 6.1, WinDbg 6.2 and MSVC10; then I will post the code here.

Here is the dbgcv.c code for this:

    } else {
        len = sizeof( struct cv_symrec_bprel32 );
        cv->ps = checkflush( cv->symbols, cv->ps, 1 + lcl->sym.name_size + len );
        cv->ps_br32->sr.size = sizeof( struct cv_symrec_bprel32 ) - sizeof(uint_16) + 1 + lcl->sym.name_size;
        cv->ps_br32->sr.type = S_BPREL32;
        if ( ModuleInfo.cv_opt & CVO_MSVC12 ) {
            cv->ps_br32->offset = lcl->sym.offset - sym_ReservedStack->value - proc->e.procinfo->xmmsize; // MODIFIED JOHNSA
            if ( lcl->sym.isparam ) {
                cv->ps_br32->offset += 0x10; //MODIFIED by johnsa
                if ( proc->e.procinfo->xmmsize ) cv->ps_br32->offset += proc->e.procinfo->xmmsize;
            }
        } else if ( ModuleInfo.cv_opt & CVO_MSVC8 ) {
            if ( lcl->sym.isparam ) {
                cv->ps_br32->offset = lcl->sym.offset + sym_ReservedStack->value + proc->e.procinfo->localsize + 8; // - 0x10; //MODIFIED by johnsa
                if ( proc->e.procinfo->xmmsize ) cv->ps_br32->offset += proc->e.procinfo->xmmsize + 8;
            } else {
                cv->ps_br32->offset = lcl->sym.offset + sym_ReservedStack->value + 16; // MODIFIED JOHNSA
                if ( proc->e.procinfo->xmmsize ) cv->ps_br32->offset += proc->e.procinfo->xmmsize - 16;
            }
        } else {
            cv->ps_br32->offset = lcl->sym.offset;
        }
        cv->ps_br32->type = lcl->sym.ext_idx1;
        DebugMsg(( "cv_write_symbol(%X): proc=%s, S_BPREL32, var=%s [memt=%X typeref=%X]\n",
                  GetPos(cv->symbols,cv->ps), proc->sym.name, lcl->sym.name, lcl->sym.mem_type, cv->ps_br32->type ));
    }

Title: Re: Static RSP built in JWasm
Post by: habran on July 01, 2013, 06:15:24 AM
I have figured out the debug info for the debuggers available from MS and found out
that WinDbg 6.12 cannot be used with this option, because it uses RBP for locals ::)
So the available debuggers are:

/* codeview debug info option flags */
enum cvoption_flags {
        CVO_STATICTLS    = 1, /* handle static tls */
        CVO_MSVC12       = 2, /* MSVC12 and MSVC10 debugger are the same */
        CVO_WINDBG62     = 4, /* WinDbg 6.2.8400.0 debugger */
        CVO_MSVC8        = 8, /* MSVC8 debugger */
};
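How such a bitmask might be consumed, as a small C sketch: the value given to OPTION CODEVIEW presumably ends up in ModuleInfo.cv_opt, which the dbgcv.c code above tests with &. The cv_target() helper and its priority order are assumptions made for illustration only, not code from this JWasm build.

```c
#include <assert.h>

enum cvoption_flags {
    CVO_STATICTLS = 1, /* handle static tls */
    CVO_MSVC12    = 2, /* MSVC12 and MSVC10 debuggers are the same */
    CVO_WINDBG62  = 4, /* WinDbg 6.2.8400.0 debugger */
    CVO_MSVC8     = 8, /* MSVC8 debugger */
};

/* Hypothetical: return the single debugger layout selected by cv_opt,
   ignoring CVO_STATICTLS. Returns 0 when no debugger bit is set,
   meaning "use the default layout". */
static unsigned cv_target(unsigned cv_opt)
{
    assert((cv_opt & ~15u) == 0);   /* only the four bits above exist */
    if (cv_opt & CVO_MSVC12)   return CVO_MSVC12;
    if (cv_opt & CVO_WINDBG62) return CVO_WINDBG62;
    if (cv_opt & CVO_MSVC8)    return CVO_MSVC8;
    return 0;
}
```

With this model, "option codeview:9" (static TLS plus MSVC8) still selects the MSVC8 layout, because the flags are independent bits.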


I have also concluded that the option without pushing RBP produces less code,
and I decided to stick with it,
so I replaced the folder at the top with that option.
Title: Re: Static RSP built in JWasm
Post by: japheth on July 01, 2013, 08:08:43 PM
Quote from: habran on July 01, 2013, 06:15:24 AM
that WinDbg 6.12 cannot be used with this option, because it uses RBP for locals ::)

Yes, of course. This is not a debugger issue. If your compiler/assembler emits S_BPREL32 codeview debugging info, then the debugger will assume that [E|R]BP is setup as frame pointer. If [E|R]BP is NOT setup, you MUST NOT emit S_BPREL32 records ( instead use S_REGREL32 ).

Also, for me WinDbg ( various versions ) and MSVC ( 5, 6,8 (VC 2005), 9 (VC 2008) and 10 ( VC 2010) debuggers work very well displaying locals ( I didn't test VC 2012 because, AFAIK, this BS won't run with XP anymore ).

So unless you're providing a test case that will reveal to me what the problem is, I will have to leave everything as it is.
Title: Re: Static RSP built in JWasm
Post by: habran on July 02, 2013, 05:59:32 AM
I tried to change it to S_REGREL32, but it wouldn't link:
Error   1   error LNK1103: debugging information corrupt; recompile module
I have to admit that I am not familiar with dbgcv.c, apart from cv->ps_br32->offset.

Why is it that all the other debuggers have no problem with S_BPREL32, only WinDbg 6.12?
Quote
So unless you're providing a test case that will reveal to me what the problem is, I will have to leave everything as it is.
What do you mean by that statement?
Title: Re: Static RSP built in JWasm
Post by: japheth on July 02, 2013, 09:07:05 AM
Quote from: habran on July 02, 2013, 05:59:32 AM
I tried to change it to S_REGREL32, but it wouldn't link:
Error   1   error LNK1103: debugging information corrupt; recompile module   

You probably just replaced S_BPREL32 by S_REGREL32? It's not THAT simple - the S_REGREL32 record has an additional field, a  "register number", which must be set. See codeview debug info documentation.

Quote
Why is it that all the other debuggers have no problem with S_BPREL32, only WinDbg 6.12?

I don't know ... must be a miracle ... perhaps the Hand of God.

Quote
What do you mean by that statement?

AFAIU, johnsa reported this issue as a bug ( by PM ) - and I'm unable to see a bug.



Title: Re: Static RSP built in JWasm
Post by: habran on July 02, 2013, 10:28:21 AM
Quote
I don't know ... must be a miracle ... perhaps the Hand of God.

If you are trying to piss me off, you have to work harder on it ;)

Quote
You probably just replaced S_BPREL32 by S_REGREL32? It's not THAT simple - the S_REGREL32 record has an additional field, a "register number", which must be set. See codeview debug info documentation.

For you it would be chicken shit to make it work, but you are on bloody holiday,
so I have to sweat blood to do it ::)

Anyway, don't feel sorry for me; as you said before, we have to push these brains of ours if we want to
keep them.
If you stay on vacation a little bit longer, I will eventually become as smart as you are :biggrin:
Title: Re: Static RSP built in JWasm
Post by: habran on July 04, 2013, 02:31:33 PM
I have left only the version with no RBP pushed.
Debug info available:
MSVC12        = option codeview:2
WINDBG 6.2  = option codeview:4 
MSVC8          = option codeview:8
8)
Title: Re: Static RSP built in JWasm
Post by: habran on July 06, 2013, 08:46:35 PM
I have found out how to use S_REGREL32, and that fixes everything.
Thanks to your suggestion, japheth, all MS debuggers now work properly :t
There is no more need to use "option codeview:".
It took me a whole week of surfing the internet to find the info for the RSP value, which is 335;
it is contained in cvconst.h (https://code.google.com/p/opendbg/source/browse/branches/minidbg/inc/cvconst.h?r=126)

Using the struct:

struct cv_symrec_regrel32 { /* REGREL32 */
        struct cv_symrec sr;
        int_32      offset;  /* offset of symbol */
        uint_16     reg;     /* register index for symbol */
        cv_typeref  type;    /* Type index */
        //unsigned char name[1]; /* Length-prefixed name */
}; //REGREL32 added by habran



if ( ModuleInfo.win64_flags & W64F_STATICRSP ) {
    len = sizeof( struct cv_symrec_regrel32 );
    cv->ps = checkflush( cv->symbols, cv->ps, 1 + lcl->sym.name_size + len );
    cv->ps_rr32->sr.size = sizeof( struct cv_symrec_regrel32 ) - sizeof(uint_16) + 1 + lcl->sym.name_size;
    cv->ps_rr32->sr.type = S_REGREL32; // replaced S_BPREL32
    cv->ps_rr32->reg = CV_AMD64_RSP;   // use RSP register (335)
    cv->ps_rr32->offset = lcl->sym.offset + sym_ReservedStack->value;
    if ( proc->e.procinfo->ReservedStack > 32 ) cv->ps_rr32->offset += 16;
    if ( lcl->sym.isparam ) {
        cv->ps_rr32->offset += 0x10;
        if ( proc->e.procinfo->xmmsize ) cv->ps_rr32->offset += proc->e.procinfo->xmmsize;
    }
    cv->ps_rr32->type = lcl->sym.ext_idx1;
    DebugMsg(( "cv_write_symbol(%X): proc=%s, S_REGREL32, var=%s [memt=%X typeref=%X]\n",
              GetPos(cv->symbols,cv->ps), proc->sym.name, lcl->sym.name, lcl->sym.mem_type, cv->ps_rr32->type ));
}
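To see what the extra field changes on disk, here is a C sketch that serializes one such record byte by byte. It assumes a packed little-endian layout mirroring cv_symrec_regrel32 above (16-bit size, 16-bit record type, 32-bit offset, 16-bit register, 16-bit type index, then a length-prefixed name), i.e. that cv_typeref is 16 bits wide. The numeric value of S_REGREL32 is not given in this thread, so the caller passes it in; write_regrel32() is a hypothetical helper, not JWasm's checkflush()-based writer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CV_AMD64_RSP 335   /* register number for RSP, from cvconst.h */

/* Little-endian stores; each returns the number of bytes written. */
static size_t put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); return 2; }
static size_t put32(uint8_t *p, uint32_t v) { put16(p, (uint16_t)v); put16(p + 2, (uint16_t)(v >> 16)); return 4; }

/* Serialize a REGREL32-style record; returns the total bytes written.
   The size field excludes itself, like
   sizeof(struct) - sizeof(uint_16) + 1 + name_size in dbgcv.c. */
static size_t write_regrel32(uint8_t *out, uint16_t rec_type, int32_t offset,
                             uint16_t reg, uint16_t type_index, const char *name)
{
    size_t name_len = strlen(name);
    size_t n = 0;
    assert(name_len < 256);                     /* one-byte length prefix */
    n += put16(out + n, (uint16_t)(2 + 4 + 2 + 2 + 1 + name_len)); /* size */
    n += put16(out + n, rec_type);
    n += put32(out + n, (uint32_t)offset);
    n += put16(out + n, reg);                   /* the field S_BPREL32 lacks */
    n += put16(out + n, type_index);
    out[n++] = (uint8_t)name_len;
    memcpy(out + n, name, name_len);
    return n + name_len;
}
```

For a local named "z" the size field comes out as 12 (11 plus the name length), which matches the dbgcv.c formula when the struct is packed to 12 bytes, and the register bytes are 4Fh 01h (335).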
Title: Re: Static RSP built in JWasm
Post by: japheth on July 07, 2013, 11:08:50 AM
Quote from: habran on July 06, 2013, 08:46:35 PM
It took me a whole week of surfing the internet

In case it was the first time: welcome to the pleasure of chasing undocumented M$ stuff!  :bgrin:

Quote
to find the info for the RSP value, which is 335;
it is contained in cvconst.h (https://code.google.com/p/opendbg/source/browse/branches/minidbg/inc/cvconst.h?r=126)

Nice find and valuable information! I knew that the register number for RBP was 334 ( because that is what ML64 emits, it has obviously abandoned S_BPREL32 ), but would have assumed that the number for RSP is 333 then ( this would have matched the Intel register ordering ).
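The surprise is easy to see side by side. In this C sketch the enum follows the x86-64 ModRM encoding order, and the table holds the CodeView AMD64 register numbers as listed in cvconst.h, where the 64-bit registers are ordered rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp starting at 328. The function name is hypothetical.

```c
#include <assert.h>

/* x86-64 encoding order of the legacy registers (ModRM numbers 0..7). */
enum x64_reg { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI };

/* CodeView AMD64 register numbers for the same registers (cvconst.h).
   CodeView does not follow the encoding order, so RSP does not come
   right before RBP: it comes right after it. */
static int cv_amd64_reg(enum x64_reg r)
{
    static const int cv[] = {
        [RAX] = 328, [RCX] = 330, [RDX] = 331, [RBX] = 329,
        [RSP] = 335, [RBP] = 334, [RSI] = 332, [RDI] = 333,
    };
    assert((unsigned)r <= RDI);
    return cv[r];
}
```

Guessing RSP as "RBP minus one", as the encoding order would suggest, lands on 333, which is RDI in this numbering.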
Title: Re: Static RSP built in JWasm
Post by: habran on July 07, 2013, 01:52:16 PM
Thanks, japheth :biggrin:
I am very pleased that I succeeded in making everything work as I planned.
The feeling of success is the greatest reward for hard work;
it doesn't matter if only a few people understand the value of your work,
those few people count :t
Title: Re: Static RSP built in JWasm
Post by: japheth on July 08, 2013, 01:24:17 PM
Quote from: japheth on July 07, 2013, 11:08:50 AM
because that is what ML64 emits, it has obviously abandoned S_BPREL32

I wondered why ML64 has abandoned S_BPREL32; after a bit of testing it turned out that it apparently wasn't just the joy to do things differently. S_BPREL32 works correctly for register EBP only - that is, if the upper 32-bits of RBP aren't zero, S_BPREL32 isn't appropriate.

It's a bug, but usually it doesn't matter, because even if the base address of your 64-bit binary is beyond the 4 GB frontier, the stack will still reside in the first 4 GB.

Here's a test case:

ExitProcess proto :dword

includelib <kernel32.lib>

.data?

stack db 8000h dup (?)
eos  dq 4 dup (?)

.code

option frame:auto
option win64:3

p2 proc frame a1:dword, a2:qword
local l1:qword
local l2:qword
mov rax,123
mov l1, rax
mov rcx,234
mov l2, rcx
ret
p2 endp

start proc frame
mov rsp, offset eos
invoke p2, 1, 2
invoke ExitProcess, 0
start endp

end start


assemble: jwasm -Zi -win64 test.asm
link: link /debug /base:0x140000000 test.obj /libpath:\wininc\lib64
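A C model of the failure mode described above, assuming the debugger resolves an S_BPREL32 symbol as (low 32 bits of RBP) + offset, while S_REGREL32 uses the full 64-bit register named in the record. The function names are hypothetical, and the RBP value used in the checks is made up, chosen to lie above 4 GB like the stack in the test case above.

```c
#include <assert.h>
#include <stdint.h>

/* S_BPREL32 as the debugger effectively treats it: EBP + offset. */
static uint64_t addr_bprel32(uint64_t rbp, int32_t offset)
{
    uint64_t ebp = (uint32_t)rbp;            /* upper 32 bits are lost */
    return ebp + (uint64_t)(int64_t)offset;
}

/* S_REGREL32: full 64-bit register + offset. */
static uint64_t addr_regrel32(uint64_t reg, int32_t offset)
{
    return reg + (uint64_t)(int64_t)offset;
}
```

With a frame register of 140008000h, a local at offset -8 is at 140007FF8h, but the S_BPREL32 view points at 40007FF8h. When the stack lies below 4 GB, as it normally does on Windows, the two agree, which is why S_BPREL32 works "by chance".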
Title: Re: Static RSP built in JWasm
Post by: habran on July 08, 2013, 07:15:36 PM
Quote
It's a bug, but usually it doesn't matter, because even if the base address of your 64-bit binary is beyond the 4 GB frontier, the stack will still reside in the first 4 GB.
Does it mean that we need to use RSP for a larger stack allocation?
Is it an Intel bug?
Title: Re: Static RSP built in JWasm
Post by: japheth on July 09, 2013, 02:25:07 PM
Quote from: habran on July 08, 2013, 07:15:36 PM
Does it mean that we need to use RSP for a larger stack allocation?

Not at all. It just means what I already did say: that due to Windows' current 64-bit implementation details, the upper 32 bits of RSP ( and RBP, if it's used as frame pointer ) are zero - and that's why S_BPREL32 works "by chance".
Title: Re: Static RSP built in JWasm
Post by: habran on July 09, 2013, 03:01:25 PM
So, if I understood correctly, in any case we cannot allocate a stack larger than 4 GB?
Title: Re: Static RSP built in JWasm
Post by: habran on July 13, 2013, 06:41:56 PM
I have fixed one bug and improved the saving of XMM registers.

Before, we had this:

00000000000D1176  mov         qword ptr [rsp+8],rcx 
00000000000D117B  mov         qword ptr [rsp+10h],rdx 
00000000000D1180  sub         rsp,38h                        ; here we subtract from RSP for the XMM register saves
00000000000D1184  movdqa      xmmword ptr [rsp],xmm1 
00000000000D1189  movdqa      xmmword ptr [rsp+10h],xmm2 
00000000000D118F  movdqa      xmmword ptr [aVar],xmm3 
00000000000D1195  sub         rsp,30h                       ; here we subtract from RSP again for locals and shadow space
00000000000D1199  mov         eax,dword ptr [val2] 
00000000000D119D  mov         dword ptr [bVar],eax 
00000000000D11A1  mov         qword ptr [val],21h 
00000000000D11AA  mov         rdx,qword ptr [val] 
00000000000D11AF  mov         rcx,0D4008h 
00000000000D11B9  call        printf (0D1292h) 
00000000000D11BE  mov         rax,22h   
00000000000D11C5  mov         qword ptr [aVar],rax 
00000000000D11CA  mov         rdx,qword ptr [aVar] 
00000000000D11CF  mov         rcx,0D400Fh 
00000000000D11D9  call        printf (0D1292h)   
00000000000D11DE  call        testproc2 (0D11FAh)   
00000000000D11E3  movdqa      xmm1,xmmword ptr [rsp+40h]  ; wrong displacement, should be 30h
00000000000D11E9  movdqa      xmm2,xmmword ptr [rsp+50h]  ; wrong displacement, should be 40h
00000000000D11EF  movdqa      xmm3,xmmword ptr [rsp+60h]  ; wrong displacement, should be 50h
00000000000D11F5  add         rsp,68h 
00000000000D11F9  ret   


After the fix:

0000000000DE1176  mov         qword ptr [rsp+8],rcx 
0000000000DE117B  mov         qword ptr [rsp+10h],rdx 
0000000000DE1180  sub         rsp,68h                ; here we subtract the space for XMM and locals at once
0000000000DE1184  movdqa      xmmword ptr [rsp+30h],xmm1 
0000000000DE118A  movdqa      xmmword ptr [rsp+40h],xmm2 
0000000000DE1190  movdqa      xmmword ptr [rsp+50h],xmm3 
0000000000DE1196  mov         eax,dword ptr [val2] 
0000000000DE119A  mov         dword ptr [bVar],eax 
0000000000DE119E  mov         qword ptr [val],21h 
0000000000DE11A7  mov         rdx,qword ptr [val] 
0000000000DE11AC  mov         rcx,0DE4008h 
0000000000DE11B6  call        printf (0DE1292h) 
0000000000DE11BB  mov         rax,22h   
0000000000DE11C2  mov         qword ptr [aVar],rax 
0000000000DE11C7  mov         rdx,qword ptr [aVar] 
0000000000DE11CC  mov         rcx,0DE400Fh 
0000000000DE11D6  call        printf (0DE1292h)   
0000000000DE11DB  call        testproc2 (0DE11F7h)       
0000000000DE11E0  movdqa      xmm1,xmmword ptr [rsp+30h]  ; now the location is correct
0000000000DE11E6  movdqa      xmm2,xmmword ptr [rsp+40h] 
0000000000DE11EC  movdqa      xmm3,xmmword ptr [rsp+50h] 
0000000000DE11F2  add         rsp,68h 
0000000000DE11F6  ret
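Why the old displacements were off can be written down as one line of arithmetic. In the old prologue the XMM saves happened between two SUB RSP instructions, so every saved slot moved up by the second SUB (30h) before the epilogue reloaded it. A minimal C sketch, with a hypothetical helper name:

```c
#include <assert.h>

/* Old prologue: SUB RSP,38h; MOVDQA saves at [rsp+0], [rsp+10h], [rsp+20h];
   then SUB RSP,30h. A slot saved at [rsp+save_disp] before the second SUB
   is found at [rsp+save_disp+second_sub] afterwards, so that is the
   displacement the epilogue must use to reload it. */
static unsigned rebase_disp(unsigned save_disp, unsigned second_sub)
{
    assert(save_disp % 16u == 0);   /* MOVDQA slots sit 16 bytes apart */
    return save_disp + second_sub;
}
```

This gives 30h, 40h and 50h, the values noted as correct in the old listing. With a single SUB RSP,68h, as in the fixed prologue, the save and reload displacements are simply identical.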


Title: Re: Static RSP built in JWasm
Post by: habran on July 13, 2013, 07:00:31 PM
I have found out that debugging depends on the linker:
if you build with MSVC8, you can debug it with the MSVC8 debugger or WinDbg 6.12;
if you build it with MSVC12, you can debug it with the MSVC12 debugger or WinDbg 6.2.
Title: Re: Static RSP built in JWasm
Post by: habran on July 17, 2013, 06:43:57 AM
Sorry, mea culpa :icon_redface:
The last bug has been eliminated (hopefully) :bgrin:
Title: Re: Static RSP built in JWasm
Post by: habran on July 18, 2013, 09:40:21 AM
The last bug was not really the last one; it just pretended to be :icon_eek:
but this one was a real one, just undercover :biggrin:
Now everything glides :t
Title: Re: Static RSP built in JWasm
Post by: japheth on July 18, 2013, 10:54:49 PM
Quote from: habran on July 18, 2013, 09:40:21 AM
but this one was a real one, just undercover :biggrin:

Wow, great!

I always wonder, when releasing another jwasm version with a few dozen bug fixes, how any of the previous versions could ever have been regarded as "stable".  :bgrin:
Title: Re: Static RSP built in JWasm
Post by: habran on July 18, 2013, 11:53:14 PM
We realists always hope for the best and at the same time expect the worst :biggrin:
Now I am experiencing the responsibility that programmers take on by publishing their code :icon_mrgreen:
but at the same time I am happy to be able to create this perfect tool 8)
Title: Re: Static RSP built in JWasm
Post by: habran on July 19, 2013, 08:58:55 PM
One more small fix for debug info :biggrin:
Title: Re: Static RSP built in JWasm
Post by: habran on July 21, 2013, 09:56:12 AM
I have replaced the folder with a better one ::)
If you are curious why I am insisting on this version of JWasm when Japheth has published a new version,
JWasm211, which does include option STACKBASE, the answer is:
1) JWasm211 doesn't align the first local to 16 bytes
2) JWasm211 doesn't use the home space, if it is free, to store registers
3) JWasm211 doesn't have .for/.endfor

I appreciate Japheth and his precious JWasm; I just added a little bit more to it :t
Title: Re: Static RSP built in JWasm
Post by: habran on July 30, 2013, 12:07:02 PM
I have uploaded a new, better version of JWasm.exe at the top, with some more fixes.
As far as I have tested it, debug info now works in all situations.

I wanted to upload the complete source with the MSVC12 project and the exe, but it doesn't fit within the allowed forum upload size;
even without the exe it is not small enough when compressed with WinZip.
So I will attach the main folder with the .c files here, and the H folder in a new post.
I think the upload size limit should be increased to 1 MB :biggrin:
Title: Re: Static RSP built in JWasm
Post by: habran on July 30, 2013, 12:09:56 PM
Here are the headers; just decompress them and drop them into the JWasm folder.
Title: Re: Static RSP built in JWasm
Post by: habran on August 17, 2013, 07:19:23 PM
Hi again :biggrin:

I have worked more on JWasm and added some more sophisticated features to it.
Now it can decide by itself whether there is a need for reserved stack:
if a function has no INVOKE inside, there is no need to allocate the stack space.
Also, if there is a USES clause and there is space for up to 4 registers in the home space,
it will PUSH the last one for alignment instead of emitting SUB RSP,8.

All together this makes it an intelligent tool;
you no longer need to use:
option win64:0
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

It will realize by itself that there is no need for a PROLOGUE.

The binaries are at the top of this thread.

It has now advanced so much that the Germans would call it "Vorsprung durch Technik" ;)

Here is the .c source
Title: Re: Static RSP built in JWasm
Post by: habran on August 17, 2013, 07:22:16 PM
Here are the headers
Title: Re: Static RSP built in JWasm
Post by: habran on August 24, 2013, 07:43:13 PM
I have a little correction in proc.c, line 2162 ::)

This code did not do what I expected :shock:
It was supposed to check whether the number of registers remaining to be pushed is odd,
and if not, to make it so:

  else if (grcount>4){
       i=grcount-r;
       if (grcount & 1 == 0){
           for (i=0;i<4;i++){
           if (info->home_used[i]==0) break;
          }
           info->home_used[i]=1;
        }
     }


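Replace it with this:
  else if (grcount > 4){
       r = grcount - r;               /* registers remaining to be pushed */
       if (!(r & 1)){                 /* even count: claim one more free home slot */
           for (i = 0; i < 4; i++){
               if (info->home_used[i] == 0) break;
           }
           if (i < 4)                 /* all four home slots may already be taken */
               info->home_used[i] = 1;
       }
  }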


I have replaced the binaries at the top with the corrected build.
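The core of this fix is C operator precedence: == binds more tightly than &, so the old condition never tested the low bit at all (the old code also tested grcount instead of the remainder it had just computed into i). A small C sketch, with hypothetical function names, that spells out how the compiler reads the old test:

```c
#include <assert.h>

/* "grcount & 1 == 0" is parsed as "grcount & (1 == 0)". Since (1 == 0)
   is 0, the expression is always 0 and the branch never runs. The
   parentheses below make that reading explicit. */
static int is_even_as_written(int x) { return x & (1 == 0); }

/* The corrected test used in the replacement code. */
static int is_even_fixed(int x) { return !(x & 1); }
```

Most compilers warn about the unparenthesized form (for example GCC's -Wparentheses), which is a cheap way to catch this class of bug.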

Title: Re: Static RSP built in JWasm
Post by: habran on August 26, 2013, 01:37:12 PM
I have uploaded a new version at the top with some more bug fixes;
the folder also contains proc.c with the changes 8)
Title: Re: Static RSP built in JWasm
Post by: habran on August 28, 2013, 04:56:36 PM
Some more fixes :biggrin:
In the folder at the top of the thread are new binaries and three source files;
replace those sources in your previous source folder with these new ones.
Here are some examples of what it does: 8)


testproc4 PROC FRAME
                 
                mov eax,20
                mov edx,eax
                ret 
testproc4 endp

;produces this:
0000000000A011FA  mov         eax,14h 
0000000000A011FF  mov         edx,eax 
0000000000A01201  ret 



testproc4 PROC FRAME
                LOCAL z:DWORD
                 
                mov eax,20
                mov z,eax
                 ret 
testproc4 endp

;produces this:
00000000010D11FA  sub         rsp,8 
00000000010D11FE  mov         eax,14h 
00000000010D1203  mov         dword ptr [rsp],eax 
00000000010D1206  add         rsp,8 
00000000010D120A  ret


testproc4 PROC FRAME USES rbx rdi
                LOCAL z:DWORD
                 
                mov eax,20
                mov z,eax
                ret 
testproc4 endp

;produces this:
00000000013B11FA  mov         qword ptr [rsp+8],rbx 
00000000013B11FF  push        rdi 
00000000013B1200  sub         rsp,8 
00000000013B1204  mov         eax,14h 
00000000013B1209  mov         dword ptr [rsp],eax 
00000000013B120C  mov         rbx,qword ptr [rsp+18h] 
00000000013B1211  add         rsp,8 
00000000013B1215  pop         rdi 
00000000013B1216  ret



testproc4 PROC FRAME USES rbx rdi
                LOCAL z:DWORD
                 
                mov eax,20
                mov z,eax
                invoke printf,rcx,rbx
                ret 
testproc4 endp
;produces this:
0000000000B911FA  mov         qword ptr [rsp+8],rbx 
0000000000B911FF  push        rdi 
0000000000B91200  sub         rsp,30h 
0000000000B91204  mov         eax,14h 
0000000000B91209  mov         dword ptr [rsp+20h],eax 
0000000000B9120D  mov         rdx,rbx 
0000000000B91210  call        printf (0B91358h) 
0000000000B91215  mov         rbx,qword ptr [rsp+40h] 
0000000000B9121A  add         rsp,30h 
0000000000B9121E  pop         rdi 
0000000000B9121F  ret
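The SUB RSP amounts in these listings can be reproduced with a few rules. The C function below is a hypothetical model inferred from the listings, not JWasm's proc.c logic: round the locals to 16 bytes when the procedure calls something or saves XMM registers (8 bytes otherwise), add 20h of home space for callees only if there is an INVOKE, add 16 bytes per saved XMM register, and pad by 8 when needed so that RSP is 16-byte aligned after the prologue (on entry RSP is 8 modulo 16 because of the return address).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the SUB RSP,n amount.
   locals : bytes of LOCAL variables
   pushes : registers saved with PUSH (the others go to home space)
   calls  : whether the PROC contains INVOKE/CALL
   xmm    : XMM registers saved with MOVDQA (16 bytes each) */
static unsigned frame_sub(unsigned locals, unsigned pushes, bool calls, unsigned xmm)
{
    bool must_align = calls || xmm != 0;
    unsigned size;

    if (must_align)
        size = (locals + 15u) & ~15u;   /* first local on a 16-byte boundary */
    else
        size = (locals + 7u) & ~7u;     /* no calls: 8-byte granularity */
    if (calls)
        size += 32u;                    /* home space for the callees */
    size += 16u * xmm;                  /* XMM save area */
    /* return address + pushes + size must be a multiple of 16 */
    if (must_align && (8u + 8u * pushes + size) % 16u != 0)
        size += 8u;
    return size;
}
```

The last check below corresponds to the fixed XMM listing earlier in the thread, assuming its two locals aVar (QWORD) and bVar (DWORD) take 12 bytes; the XMM save area then starts at 20h + 10h = 30h, as in that listing.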

Title: Re: Static RSP built in JWasm
Post by: habran on August 28, 2013, 05:08:09 PM
As you can see, this assembler has got brains ;)
There is no need to use other options to improve the produced code.
Title: Re: Static RSP built in JWasm
Post by: japheth on August 28, 2013, 09:21:09 PM
Quote from: habran on August 28, 2013, 04:56:36 PM

;produces this:
00000000013B11FA  mov         qword ptr [rsp+8],rbx   ; <--
00000000013B11FF  push        rdi 
00000000013B1200  sub         rsp,8 
00000000013B1204  mov         eax,14h 
00000000013B1209  mov         dword ptr [rsp],eax 
00000000013B120C  mov         rbx,qword ptr [rsp+18h]   ; <--
00000000013B1211  add         rsp,8 
00000000013B1215  pop         rdi 
00000000013B1216  ret


I'm not really happy with this code generation. Why is rbx not pushed like rdi?

This is not just an aesthetic issue, as you probably might assume - it makes the code incompatible with Win64 SEH.

See this - admittedly "advanced" - sample:


;--- Win64 SEH sample, requires jwasm.
;--- it demonstrates:
;--- a) how to install exception handlers in 64-bit
;--- b) how a handler may "refuse" to handle the exception
;--- c) how to "unwind" via RtlUnwind() or RtlUnwindEx()
;--- d) that an exception handler may be called twice,
;---    see "A Crash Course on the Depths of Win32 Structured Exception Handling"
;---    by Matt Pietrek, MSDN 01/1997.

option casemap:none
option win64:3
option frame:auto

.nolist
.nocref
WIN32_LEAN_AND_MEAN equ 1
include windows.inc
include ntdll.inc
include excpt.inc
include stdio.inc
.cref
.list

;UNWFUNC textequ <RtlUnwind>
UNWFUNC textequ <RtlUnwindEx>

ExceptionExecuteHandler equ 4

includelib <kernel32.lib>
includelib <msvcrt.lib>

CStr macro text:vararg
local sym
.const
sym db text, 0
.code
exitm <offset sym>
endm

.code

func1_eh proc frame pRecord:ptr EXCEPTION_RECORD, pFrame:ptr, pContext:ptr CONTEXT

mov rcx, pRecord
invoke printf, CStr("func1_eh( pRecord=%p [code=%X flags=%X prevRec=%p addr=%p], pFrame=%p, pContext=%p )",10), rcx,
[rcx].EXCEPTION_RECORD.ExceptionCode,
[rcx].EXCEPTION_RECORD.ExceptionFlags,
[rcx].EXCEPTION_RECORD.ExceptionRecord,
[rcx].EXCEPTION_RECORD.ExceptionAddress,
pFrame, pContext

mov rcx, pContext
invoke printf, CStr("func1_eh: context.flags=%X",10), [rcx].CONTEXT.ContextFlags

mov eax, ExceptionContinueSearch

ret
align 8

func1_eh endp


func1 proc frame:func1_eh uses rbx rsi rdi

local lcl1:dword

mov lcl1, 12345678h
mov rbx, -1
mov rsi, -1
mov rdi, -1
invoke printf, CStr("func1: rbp=%p rbx=%p rsi=%p rdi=%p",10), rbp, rbx, rsi, rdi

invoke RaiseException, 0E2003456h, 0, 0, 0

invoke printf, CStr("func1: exit, rbp=%p rbx=%p rsi=%p rdi=%p lcl1=%X",10), rbp, rbx, rsi, rdi, lcl1
ret
align 8

func1  endp

main_eh proc frame pRecord:ptr EXCEPTION_RECORD, pFrame:ptr, pContext:ptr CONTEXT

mov rcx, pRecord
invoke printf, CStr("main_eh( pRecord=%p [code=%X flags=%X prevRec=%p addr=%p], pFrame=%p, pContext=%p )",10), rcx,
[rcx].EXCEPTION_RECORD.ExceptionCode,
[rcx].EXCEPTION_RECORD.ExceptionFlags,
[rcx].EXCEPTION_RECORD.ExceptionRecord,
[rcx].EXCEPTION_RECORD.ExceptionAddress,
pFrame, pContext

mov rcx, pContext
invoke printf, CStr("main_eh: context.flags=%X",10), [rcx].CONTEXT.ContextFlags

mov rcx, pRecord
.if !( [rcx].EXCEPTION_RECORD.ExceptionFlags & 2 )
invoke printf, CStr("main_eh: calling ", @CatStr(!", %UNWFUNC, !"), "(), rsp=%p, rbp=%p",10), rsp, rbp
ifidn UNWFUNC, <RtlUnwindEx>
invoke RtlUnwindEx, pFrame, offset returnaddr, pRecord, NULL, pContext, NULL
else
invoke RtlUnwind, pFrame, offset returnaddr, pRecord, NULL
endif
returnaddr:
invoke printf, CStr("main_eh: back from ", @CatStr(!", %UNWFUNC, !"), "(), rsp=%p, rbp=%p",10), rsp, rbp
;--- the 64-bit unwind has restored all registers, including RSP!
;--- hence one cannot execute a RET.
jmp cont_addr
; mov eax, ExceptionContinueExecution
.else
mov eax, ExceptionContinueSearch
.endif
ret
align 8

main_eh endp

main proc frame:main_eh

local lcl1:dword

mov lcl1, 12345678h
;--- initialize non-volatile registers to see if the contents remain unchanged
mov rbx, 055667788deadbeefh
mov rsi, 05555aaaa5555aaaah
mov rdi, 08765432112345678h
invoke printf, CStr("main: rsp=%p rbp=%p rbx=%p rsi=%p rdi=%p",10), rsp, rbp, rbx, rsi, rdi

call func1
cont_addr::
invoke printf, CStr("main: exit, rbp=%p rbx=%p rsi=%p rdi=%p lcl1=%X",10), rbp, rbx, rsi, rdi, lcl1
ret
align 8

main endp

mainCRTStartup proc frame
call main
invoke ExitProcess, 0
mainCRTStartup endp

end mainCRTStartup



I'm sorry that it's a complicated sample, but I don't know how to make it simpler without losing vital information.

The point is: the sample won't run as expected if it is generated with your version of jwasm, while the standard jwasm v2.10 and the v2.11 prerelease both generate a sample that runs correctly.

The reason for the incompatibility is somewhat hidden in how Win64 performs the "unwind": with your version of jwasm, the OS simply cannot know how to restore all registers.



Title: Re: Static RSP built in JWasm
Post by: habran on August 28, 2013, 11:42:30 PM
Hi Japheth :biggrin:
I am glad that you paid attention to this version
I have tested and looked through the code  above
I agree that it doesn't work, however I don't consider it good programming practice
look at the reference: "by Matt Pietrek, MSDN 01/1997."
that article was not written with 64 bit code in mind
I would use jmp exit instead of ret and it would work fine

here (http://www.codemachine.com/article_x64deepdive.html) the usage of the shadow space is nicely explained

in my example I use the last register to align the stack to 16 bytes,
and I use the unused shadow space to store registers to reduce stack usage

I saw this in the disassembly of MSVC-compiled C functions; they do it exactly the same way I do
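The alignment arithmetic behind this trick can be sketched in C. It assumes the standard Win64 rule that RSP must be 16-byte aligned at every CALL, so it is 8 mod 16 on entry (the CALL has just pushed the return address). frame_alloc is a hypothetical helper for illustration, not code taken from JWasm or MSVC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many bytes to subtract from RSP after
   `npushes` register pushes so that RSP is 16-byte aligned again.
   `locals` is the size of the local variables, `outgoing` the size of
   the area reserved for calls made by this procedure (0 for a leaf).
   On entry RSP == 8 (mod 16) because CALL pushed the return address. */
static uint32_t frame_alloc(uint32_t npushes, uint32_t locals, uint32_t outgoing)
{
    uint32_t s = (locals + outgoing + 7u) & ~7u;   /* round up to whole qwords */
    if ((8u + 8u * npushes + s) % 16u != 0)
        s += 8u;                                    /* one qword of padding */
    assert((8u + 8u * npushes + s) % 16u == 0);     /* RSP now 16-byte aligned */
    return s;
}
```

For testproc4 at the top of the thread (no pushes, one DWORD local, no calls) this gives 8, matching its sub rsp,8. With two nonvolatile registers and a 32-byte call area, pushing both costs 16 + frame_alloc(2, 0, 32) = 56 bytes, while storing one in a free home slot and pushing the other costs 8 + frame_alloc(1, 0, 32) = 40 bytes.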
Title: Re: Static RSP built in JWasm
Post by: habran on August 29, 2013, 05:53:15 AM
sorry Japheth :P
I actually just glanced at your example because it was late at night
Quote:
I would use jmp exit instead of ret and it would work fine
that was bullshit
I will play around to see why it is not working
Title: Re: Static RSP built in JWasm
Post by: habran on August 29, 2013, 06:19:35 AM
It is probably possible to "teach" SEH how to handle this case :icon_confused:
Title: Re: Static RSP built in JWasm
Post by: habran on August 29, 2013, 01:02:43 PM
here is an example of how MSVC12 compiles C code:

UINT_PTR XIndexOffset(ASMEDIT *asme, const XCHARINDEX *charin, XCHARINDEX *charout, INT_PTR offset, int newline)
{
000000013F710900  mov         qword ptr [rsp+18h],rbx 
000000013F710905  push        rbp 
000000013F710906  sub         rsp,40h 
  XCHARINDEX charcnt=*charin;
000000013F71090A  mov         rax,qword ptr [rdx] 
000000013F71090D  mov         r10,qword ptr [rdx+8] 
000000013F710911  mov         qword ptr [charin],rdi 
  INT_PTR offsetcnt=offset;
  int nSub;
  BYTE nLineBreak;

  if (newline)
000000013F710916  mov         edi,dword ptr [newline] 
000000013F71091A  mov         qword ptr [ciCount],rax 
000000013F71091F  mov         rax,qword ptr [rdx+10h] 
000000013F710923  mov         qword ptr [charout],r14 
000000013F710928  mov         qword ptr [rsp+30h],rax 
000000013F71092D  mov         rbp,r9 
000000013F710930  mov         r14,r8 
...
...
000000013F710B08  mov         rbx,qword ptr [offset] 
000000013F710B0D  add         rsp,40h 
000000013F710B11  pop         rbp 
000000013F710B12  ret 

Title: Re: Static RSP built in JWasm
Post by: japheth on August 29, 2013, 04:52:54 PM

The problem with your jwasm version is that the listing of the prologue code is messed up if option -Sg has been set. This makes it virtually impossible to see which SEH primitives your program has created inside the prologue.

But, as you may have noticed, my example is just a slightly modified version of a sample included in Wininc (it's in the Sampl64\SEHSmpl folder). This folder also contains a Masm64-compatible version, which has to emit the SEH primitives manually. I suggest using that version for experiments, adding by hand the prologue code that your jwasm version creates.

If you're lucky, you just have to emit a .SAVEREG directive for the register value saved in the shadow space.
Title: Re: Static RSP built in JWasm
Post by: habran on August 29, 2013, 06:04:08 PM
thanks Japheth :t
that's a good idea, I'll try it tonight :biggrin:

Title: Re: Static RSP built in JWasm
Post by: habran on August 29, 2013, 08:57:10 PM
I have done it, but it breaks :(

here is the code where I implemented it:


/* added for W64F_STATICRSP */
static void win64_StoreRegHome( struct proc_info *info )
/*******************************************************/
{
    int          i = 0;
    int          cnt;
    int          grcount = 0;
    int          sizestd = 0;
    int          r;
    uint_16      *regist;

    info->stored_reg = 0;
    if ( info->regslist ) {
        /* count the general-purpose (non-XMM) registers in the USES list */
        for( regist = info->regslist, cnt = *regist++; cnt; cnt--, regist++, i++ ) {
            if ( GetValueSp( *regist ) & OP_XMM ) continue;
            else ++grcount;
        }
        /* r = number of free home (shadow) slots */
        for ( i = 0, r = 0; i < 4; i++ ) {
            if ( info->home_used[i] == 0 ) ++r;
        }
        /* depending on grcount and r, mark some free home slots as used */
        if ( r ) {
            if ( grcount == 1 ) memset( info->home_used, 1, 4 );
            else if ( grcount == 2 && r >= 2 ) {
                for ( i = 0; i < 4; i++ ) {
                    if ( info->home_used[i] == 0 ) break;
                }
                for ( ++i; i < 4; i++ )
                    info->home_used[i] = 1;
            }
            else if ( grcount == 3 ) {
                if ( r == 1 ) memset( info->home_used, 1, 4 );
                if ( r >= 3 ) {
                    for ( i = 0; i < 4; i++ ) {
                        if ( info->home_used[i] == 0 ) break;
                    }
                    for ( ++i; i < 4; i++ ) {
                        if ( info->home_used[i] == 0 ) break;
                    }
                    for ( ++i; i < 4; i++ )
                        info->home_used[i] = 1;
                }
            }
            else if ( grcount == 4 && r == 4 ) {
                info->home_used[4] = 1;
            }
            else if ( grcount > 4 ) {
                r = grcount - r;
                if ( !( r & 1 ) ) {
                    for ( i = 0; i < 4; i++ ) {
                        if ( info->home_used[i] == 0 ) break;
                    }
                    info->home_used[i] = 1;
                }
            }
        }
        /* store registers in free home slots (first four slots only);
           emit .SAVEREG for the nonvolatile ones */
        for( i = 0, regist = info->regslist, cnt = *regist++; cnt; cnt--, regist++, i++ ) {
            if ( GetValueSp( *regist ) & OP_XMM ) continue;
            else {
                sizestd += 8;
                if ( i < 4 ) {
                    if ( info->home_used[i] == 0 ) {
                        AddLineQueueX( "mov [%r+%u], %r", T_RSP, NUMQUAL sizestd, *regist );
                        if ( ( 1 << GetRegNo( *regist ) ) & win64_nvgpr )
                            AddLineQueueX( "%r %r, %u", T_DOT_SAVEREG, *regist, NUMQUAL sizestd );
                        info->stored_reg++;
                    }
                    else {
                        /* slot in use: retry the same register with the next slot */
                        cnt++; regist--;
                    }
                }
            }
        } /* end for */
    }
    return;
}

Title: Re: Static RSP built in JWasm
Post by: japheth on August 30, 2013, 12:07:20 AM
Quote from: habran on August 29, 2013, 08:57:10 PM
I have done it, but it breaks :(

Alles muss man selber machen! ("You have to do everything yourself!") :icon_mrgreen:

I did a few experiments with the ML64-compatible version. Result: saving a register's contents in the shadow space works; the OS will restore the contents if the proper .SAVEREG directive has been used.

However: the offset to use with .SAVEREG must not be calculated from the current value of RSP, but from the value RSP will have after the prologue! Here's the prologue that worked for me:


func1 proc frame:func1_eh

@localsize = 5*8

; push rbx
; .pushreg rbx
mov [rsp+8], rbx                              ;save register rbx in shadow space
.savereg rbx, 8+16+@localsize ;!!! offset must be the "offset" to the "final" RSP

push rsi
.pushreg rsi
push rdi
.pushreg rdi
sub rsp,@localsize
.allocstack @localsize
.endprolog


and the epilogue looks like this:

add rsp, @localsize
pop rdi
pop rsi
; pop rbx
mov rbx, [rsp+8]
ret
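Why the offset has to be measured from the final RSP can be shown with a toy C model of the unwinder replaying this func1 prologue. The op names echo the real UWOP_PUSH_NONVOL, UWOP_ALLOC_* and UWOP_SAVE_NONVOL unwind codes, but the table layout, the simulated stack, the addresses and the register values are assumptions made up for this sketch; the real unwinder reads the UNWIND_INFO stored in the image.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: NOT the real UNWIND_CODE format. */
enum { RBX, RSI, RDI, NREGS };
enum op_kind { PUSH_NONVOL, ALLOC, SAVE_NONVOL };
struct unwind_op { enum op_kind kind; int reg; uint64_t arg; };

#define STACK_QWORDS 64
static uint64_t mem[STACK_QWORDS];            /* simulated stack, byte-addressed */

static uint64_t rd(uint64_t a) { assert(a / 8 < STACK_QWORDS); return mem[a / 8]; }
static void wr(uint64_t a, uint64_t v) { assert(a / 8 < STACK_QWORDS); mem[a / 8] = v; }

/* Executes: mov [rsp+8],rbx / push rsi / push rdi / sub rsp,40.
   Returns RSP after the prologue. */
static uint64_t run_prologue(uint64_t entry, const uint64_t caller[NREGS])
{
    memset(mem, 0xCC, sizeof mem);            /* fill the stack with garbage */
    wr(entry, 0x1234);                        /* return address pushed by CALL */
    wr(entry + 8, caller[RBX]);               /* mov [rsp+8], rbx (home slot) */
    uint64_t rsp = entry;
    rsp -= 8; wr(rsp, caller[RSI]);           /* push rsi */
    rsp -= 8; wr(rsp, caller[RDI]);           /* push rdi */
    return rsp - 40;                          /* sub rsp, @localsize */
}

/* Undoes the prologue ops in reverse order. SAVE_NONVOL offsets are taken
   relative to the RSP left by the prologue (frame_base), not to the RSP
   at the time of the mov. */
static uint64_t virtual_unwind(const struct unwind_op *ops, int n,
                               uint64_t rsp, uint64_t regs[NREGS])
{
    const uint64_t frame_base = rsp;
    for (int i = n - 1; i >= 0; i--) {
        switch (ops[i].kind) {
        case ALLOC:       rsp += ops[i].arg; break;
        case PUSH_NONVOL: regs[ops[i].reg] = rd(rsp); rsp += 8; break;
        case SAVE_NONVOL: regs[ops[i].reg] = rd(frame_base + ops[i].arg); break;
        }
    }
    return rsp;                               /* points at the return address */
}

/* Runs prologue + unwind and returns the rbx value the unwinder recovers. */
static uint64_t recovered_rbx(const struct unwind_op *ops, int n)
{
    const uint64_t caller[NREGS] = { 0x1111, 0x2222, 0x3333 };
    uint64_t regs[NREGS] = { UINT64_MAX, UINT64_MAX, UINT64_MAX }; /* func1 sets -1 */
    uint64_t rsp = virtual_unwind(ops, n, run_prologue(256, caller), regs);
    assert(rsp == 256 && rd(rsp) == 0x1234);
    assert(regs[RSI] == caller[RSI] && regs[RDI] == caller[RDI]);
    return regs[RBX];
}
```

With the entry { SAVE_NONVOL, RBX, 8 + 16 + 40 } the model gets the caller's rbx back. Dropping that entry leaves rbx at func1's -1, and giving it the entry-relative offset 8 reads a local slot instead (here the 0xCC fill).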

Title: Re: Static RSP built in JWasm
Post by: habran on August 30, 2013, 06:32:26 AM
you meant: "If you want the job done properly, do it yourself" :biggrin:
I have tried it with:

if (info->home_used[i]==0){
    AddLineQueueX( "mov [%r+%u], %r",T_RSP, NUMQUAL sizestd, *regist );
    if ( ( 1 << GetRegNo( *regist ) ) & win64_nvgpr )
        AddLineQueueX( "%r %r, %u", T_DOT_SAVEREG, *regist, NUMQUAL sizestd + 16 + info->localsize);
    info->stored_reg++;
}
else {
    cnt++;regist--;
}
}

It still breaks :(
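The offset japheth describes reduces to simple arithmetic, sketched below with a hypothetical helper savereg_offset (not JWasm code). It assumes the layout of his prologue: the register is stored in a home slot at entry, then npushes registers are pushed and localsize bytes are subtracted from RSP. Under that assumption the constant 16 in the attempt above stands for the two pushes of rsi and rdi, so it only matches when exactly two registers are pushed after the store.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: .SAVEREG offset for a register stored with
   "mov [rsp+home_off], reg" at procedure entry, before the prologue pushes
   `npushes` registers and subtracts `localsize` bytes from RSP.
   The offset is measured from the RSP left by the prologue. */
static uint32_t savereg_offset(uint32_t home_off, uint32_t npushes, uint32_t localsize)
{
    /* home_off must address one of the four home slots: 8, 16, 24 or 32 */
    assert(home_off >= 8 && home_off <= 32 && home_off % 8 == 0);
    return home_off + 8u * npushes + localsize;
}
```

With japheth's numbers, savereg_offset(8, 2, 5*8) gives 64, which is the 8+16+@localsize of his .savereg line.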
Title: Re: Static RSP built in JWasm
Post by: japheth on August 30, 2013, 05:23:56 PM
Quote from: habran on August 30, 2013, 06:32:26 AM
It still breaks :(

This probably has nothing to do with unwinding. It looks like you're allocating too little stack space:


main_eh:
  0000000000000128: 48 89 4C 24 08                               mov         qword ptr [rsp+8],rcx
  000000000000012D: 48 89 54 24 10                               mov         qword ptr [rsp+10h],rdx
  0000000000000132: 4C 89 44 24 18                               mov         qword ptr [rsp+18h],r8
  0000000000000137: 55                                           push        rbp
  0000000000000138: 48 8B EC                                     mov         rbp,rsp
  000000000000013B: 48 83 EC 20                                  sub         rsp,20h
  000000000000013F: 48 8B 4D 10                                  mov         rcx,qword ptr [rbp+10h]
  0000000000000143: 48 8B 45 20                                  mov         rax,qword ptr [rbp+20h]
  0000000000000147: 48 89 44 24 38                               mov         qword ptr [rsp+38h],rax    ;<---- !!!!!!
  000000000000014C: 48 8B 45 18                                  mov         rax,qword ptr [rbp+18h]
  0000000000000150: 48 89 44 24 30                               mov         qword ptr [rsp+30h],rax    ;<---- !!!!!!
  0000000000000155: 48 8B 41 10                                  mov         rax,qword ptr [rcx+10h]
  0000000000000159: 48 89 44 24 28                               mov         qword ptr [rsp+28h],rax     ;<---- !!!!!!
  000000000000015E: 48 8B 41 08                                  mov         rax,qword ptr [rcx+8]
  0000000000000162: 48 89 44 24 20                               mov         qword ptr [rsp+20h],rax    ;<---- !!!!!!
  0000000000000167: 44 8B 49 04                                  mov         r9d,dword ptr [rcx+4]
  000000000000016B: 44 8B 01                                     mov         r8d,dword ptr [rcx]
  000000000000016E: 48 8B D1                                     mov         rdx,rcx
  0000000000000171: 48 B9 00 00 00 00 00 00 00 00                mov         rcx,offset ??0004
  000000000000017B: E8 00 00 00 00                               call        printf


In this excerpt you're allocating 20h bytes of stack space, but the printf call that follows passes eight arguments and therefore needs 40h bytes.

Perhaps you did optimize a bit too much  :icon_mrgreen:
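The 40h figure follows from the Win64 calling convention: the caller reserves 32 bytes of home space for RCX, RDX, R8 and R9, plus one 8-byte slot per further argument, starting at [rsp+20h]. A minimal C sketch of that arithmetic (the helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Bytes the caller must have below RSP at the CALL: at least four
   8-byte home slots, plus one slot per argument beyond the fourth. */
static uint32_t outgoing_arg_bytes(uint32_t nargs)
{
    uint32_t slots = nargs < 4 ? 4 : nargs;
    return slots * 8u;
}

/* Offset from RSP, at the CALL, of argument `index` (0-based) out of
   `nargs`. Arguments 0..3 travel in registers; their slots are home space. */
static uint32_t arg_slot_offset(uint32_t index, uint32_t nargs)
{
    assert(index < nargs);
    return index * 8u;
}
```

For the printf call in main_eh (format string plus seven values), outgoing_arg_bytes(8) is 40h, and arguments 4 to 7 land at [rsp+20h] to [rsp+38h], exactly the stores marked in the listing. With only 20h allocated after push rbp, those slots hold the saved RBP, main_eh's own return address, and the home slots where pRecord and pFrame were spilled.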
Title: Re: Static RSP built in JWasm
Post by: habran on August 30, 2013, 09:50:32 PM
You are right; however, there are no locals in that function, which is why only the reserved stack (the 20h bytes of shadow space) is allocated
I tried adding 4 dummy QWORD locals and then it works
I am actually very busy with my job and have not had enough time to study the sources
I hope I will have more time to play with it on Sunday and hopefully find the solution

thank you, Japheth, for getting involved in this case, I appreciate that  :t
Title: Re: Static RSP built in JWasm
Post by: habran on August 31, 2013, 06:20:42 AM
I did not pay attention to what you told me ::)
Quote:
Masm64-compatible version, which has to emit the SEH-primitives manually
now I understand what you were talking about and I will work on it as soon as I have a little bit of time
I tested your version rc10; it is working fine and I am looking forward to your source code  :t
however, the 16-byte alignment at the beginning of the proc is missing and that bothers me
my last version has been fixed so that it works as I wanted and emits debug info correctly
the only thing left now is that damn SEH thing :(
I'll concentrate on making it work
Title: Re: Static RSP built in JWasm
Post by: habran on September 08, 2013, 09:16:21 AM
I have uploaded a new version at the top of the thread
there was an error restoring the stack when USES listed XMM registers without any general-purpose registers
replace the files with the new ones and rebuild the project