I am back as I promised 8)
I have succeeded in building STATIC RSP into JWasm and it works fine
how does it work?
use these options
option win64:7
option frame:auto
why all this?
because RSP is static we can address params and locals relative to the RSP register, rather than using RBP
and we are rewarded with an RBP register that is free to use!!! :bgrin:
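A rough sketch of the arithmetic behind this, not JWasm's actual code. Once the prologue has finished, RSP no longer moves, so every local and every parameter home slot sits at a fixed distance from it. The helper names below are made up for illustration, and the layout assumes the usual Win64 picture: frame at the bottom, then any pushed registers, then the return address, then the home slots.

```c
#include <stdint.h>

/* Hypothetical helpers, not taken from the JWasm sources. */

/* Offset of a local from RSP once the prologue has run "sub rsp, frame_size".
   local_pos is the local's position inside the frame, counted from its bottom. */
static uint32_t rsp_offset_of_local(uint32_t local_pos)
{
    return local_pos;
}

/* Offset of the home slot of parameter `index` (0-based) from RSP.
   Above the frame sit any pushed registers, then the return address. */
static uint32_t rsp_offset_of_param(uint32_t frame_size, uint32_t pushed_regs,
                                    uint32_t index)
{
    return frame_size + 8u * pushed_regs + 8u /* return address */ + 8u * index;
}
```

With no frame and nothing pushed, the first parameter lands at [rsp+8], which is where a prologue stores RCX before it adjusts RSP.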
In the attached folder are the changed sources and a 64-bit JWasm.exe
here are some examples:
testproc4 PROC FRAME
LOCAL z:DWORD
mov eax,22
mov z,eax
ret
testproc4 endp
000000013F02116E sub rsp,8
000000013F021172 mov eax,16h
000000013F021177 mov dword ptr [rsp],eax
000000013F02117A add rsp,8
000000013F02117E ret
Nice work! :) Getting another register back helps, given how many registers the 64-bit fastcall ABI trashes all over the place.
thanks johnsa :biggrin:
with this we not only get RBP back, but the code is also more compact because the home space is used to store registers
I have to remind you that RBP register is saved by default and is ready to use any time (that helps for the stack alignment)
I am working on another version where RBP will not be saved
It is working already but there are still some unsolved problems
the JWasm version in the folder comes together with the .FOR feature
this is now the most advanced assembler in the Universe :t
I succeeded in building the version without pushing RBP, which is more similar to a C prologue :t
I will just test it a little bit before I post it here :biggrin:
here it is 8)
johnsa worked together with me on the debug info and Japheth gave some hints on how to make debug info work :t
now both versions work correctly with debug info for locals and params
for now only MSVC12 understands the debug info
I have replaced the folder at the top with the new version with working debug info
the pushed-RBP version had some problem reading locals correctly :icon_redface:
both versions had a problem with a second RET in a function because of:
CurrProc->e.procinfo->stored_reg=0;
CurrProc->e.procinfo->pushed_reg=0;
when I removed it everything worked fine
I have fixed it and replaced both versions 8)
now both versions are working beautifully :t
this last version with no RBP saved is now OK :t
some tweaking was still necessary
please test it :biggrin:
Cool!
However, if a "frame pointer omission" feature is to be added to jwasm, it should be implemented for 32-bit as well.
This makes the user interface of this implementation a bit unfortunate, because the flags in OPTION:WIN64 are intended for the Win64 ABI only.
thanks japheth for flowers :biggrin:
I am still working on debug info for different debuggers
I just figured out how to make it for MSVC 2005
but I would like to do that for WinDbg 6.11 and 6.12
I am planning to make switches for different debuggers
Quote from: japheth on June 29, 2013, 02:26:14 PM
However, if a "frame pointer omission" feature is to be added to jwasm, it should be implemented for 32-bit as well.
I am a little bit skeptical about 32 bit "frame pointer omission" because
people who have already written and tested programs in 32 bit will not
go through all of their programs to get rid of push/pop commands
and for writing new programs it would be like going back into the past
IMO new programs should be written in 64 bit wherever possible
however, if people think that it is a good idea it will be no big deal to tweak it all a bit
anyway, thank you japheth for taking time to go through source :t
I am pretty happy how it all together works :bgrin:
Quote from: habran on June 29, 2013, 02:57:24 PM
because people who have already written and tested programs in 32 bit will not
go through all of their programs to get rid of push/pop commands
It's a matter of course that a frame pointer omission feature for 32-bit must be able to allow PUSH and POP instructions.
That's not impossible, but requires a bit more work. Evidently, since the assembler itself cannot reliably track PUSH and POP inside a procedure, it's the programmer's duty to do this - all the assembler can do is to "assist".
The preferred approach is a new option that will allow a more comprehensive control of how stack variables are accessed. In addition, a few macros could track the current value of ESP by renaming a few instructions with OPTION RENAMEKEYWORD.
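To make the bookkeeping concrete, here is a minimal sketch in C of what such tracking has to do. It is only a model: the type and function names are invented, and in the assembler the same counting would be done by macros wrapped around PUSH and POP.

```c
#include <stdint.h>

/* Hypothetical tracker: counts the bytes pushed since the prologue finished. */
typedef struct { int32_t pushed; } esp_tracker;

static void track_push(esp_tracker *t) { t->pushed += 4; }
static void track_pop(esp_tracker *t)  { t->pushed -= 4; }

/* frame_offset is the variable's offset from ESP as it was right after
   the prologue; the result is its offset from the current ESP. */
static int32_t current_esp_offset(const esp_tracker *t, int32_t frame_offset)
{
    return frame_offset + t->pushed;
}
```

The counter follows source order, not execution order, which is why the programmer still has to keep PUSH and POP balanced on every path.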
Quote
IMO new programs should be written in 64 bit where ever possible
I still prefer my good, old 32-bit XP. :icon_cool:
OPTION RENAMEKEYWORD is a good idea :idea:
I agree with you that it would be possible to make it work
and it would be more sophisticated than 64 bit
however, you are on holidays from JWasm, remember ;)
perhaps, because this version is dedicated to Margaret Thatcher
you don't want to "thatch" it any more :biggrin:
Quote
I still prefer my good, old 32-bit XP. :icon_cool:
I still love Commodore 64, unfortunately it is obsolete now :(
I will suggest a proposal here:
If at least five members from this forum vote for 32 bit "static stack", then it is worth building it into JWasm
then we will talk about who will implement it :biggrin:
Quote from: habran on June 29, 2013, 07:43:17 PM
If at least five members from this forum vote for 32 bit "static stack", then it is worth building it into JWasm
then we will talk about who will implement it :biggrin:
I'm afraid there aren't five members here who know what we're talking about.
you are exaggerating :shock:
If so, why would we go through the trouble to build it :(
I'm afraid there aren't five members here who care what we're talking about.
:biggrin:
what about you sinsi, Amberman?
do you care what we're talking about? :biggrin:
Quote from: habran on June 29, 2013, 10:12:13 PM
If so, why would we go through the trouble to build it :(
:bgrin:
Questions about causality are my favorites. Perhaps because
- we are bored and have nothing better to do?
- it's an intellectual challenge?
- we want to improve our skills in ???
- for aesthetic reasons - to make the assembler "more complete"? :icon_mrgreen:
now you are talking... :t
I was just testing you ;)
so, what do you reckon?
who will go through the trouble?
you, me or both? :biggrin:
deleted
Quote from: sinsi on June 29, 2013, 10:17:04 PM
I'm afraid there aren't five members here who care what we're talking about.
Is it worth the effort? I can't speak for 64-bit code, but in 32-bit...
- [esp+n] instructions are longer than their [ebp+n] equivalents
- yes you can trace push & pop, even with simple macros, but it gets nasty if they happen inside branches or .if .elseif constructs
- performance is not a valid argument, because a) innermost loops should not call or invoke any code and b) if you really, really need ebp as an extra reg32, just push it before you enter the innermost loop.
So where is the real added value...?
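The first point is easy to check by hand. The sketch below encodes `mov eax, [base+disp8]` for 32-bit code; the function name is made up, but the bytes follow the standard x86 ModRM/SIB rules: ESP as a base register always needs an extra SIB byte.

```c
#include <stdint.h>
#include <stddef.h>

/* Encode "mov eax, [base + disp8]" for 32-bit code; returns the byte count.
   base_reg is the x86 register number: 4 = ESP, 5 = EBP. */
static size_t encode_mov_eax_mem_disp8(uint8_t base_reg, int8_t disp, uint8_t out[4])
{
    size_t n = 0;
    out[n++] = 0x8B;                                 /* opcode: MOV r32, r/m32 */
    if (base_reg == 4) {                             /* ESP cannot be encoded in ModRM alone */
        out[n++] = 0x44;                             /* ModRM: mod=01, reg=EAX, rm=100 (SIB follows) */
        out[n++] = 0x24;                             /* SIB: scale=1, no index, base=ESP */
    } else {
        out[n++] = (uint8_t)(0x40 | (base_reg & 7)); /* ModRM: mod=01, reg=EAX, rm=base */
    }
    out[n++] = (uint8_t)disp;
    return n;
}
```

So `mov eax,[ebp+8]` is 8B 45 08 (3 bytes) while `mov eax,[esp+8]` is 8B 44 24 08 (4 bytes): one byte more per stack access.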
hey nidud
you have some good points there :icon14:
hey jj2007,
where has your enthusiasm gone ;)
so, japheth
we have one person who is against (jj2007, as expected) :icon13:
one person who is for (nidud, as expected) :icon14:
and one who doesn't care (sinsi, unpredictable) :P
are you still on holidays or do you want to roll up your sleeves?
hey japheth
I have figured out the switches for different debuggers
here are:
/* codeview debug info option flags */
enum cvoption_flags {
CVO_STATICTLS = 1, /* handle static tls */
CVO_MSVC8 = 2, /* MSVC8 debugger */
CVO_MSVC10 = 4, /* MSVC10 debugger */
CVO_MSVC12 = 8, /* MSVC12 debugger */
};
so I use it like this:
option codeview:2 ;//create MSVC8 debug info
option codeview:8 ;//create MSVC12 debug info
I have tested it and it works fine
I still have to figure out WinDbg 6.1, WinDbg 6.2 and MSVC10, then I will post the code here
here is dbgcv.c code for this:
} else {
len = sizeof( struct cv_symrec_bprel32 );
cv->ps = checkflush( cv->symbols, cv->ps, 1 + lcl->sym.name_size + len );
cv->ps_br32->sr.size = sizeof( struct cv_symrec_bprel32 ) - sizeof(uint_16) + 1 + lcl->sym.name_size;
cv->ps_br32->sr.type = S_BPREL32;
if ( ModuleInfo.cv_opt & CVO_MSVC12 ){
cv->ps_br32->offset = lcl->sym.offset - sym_ReservedStack->value - proc->e.procinfo->xmmsize; // MODIFIED JOHNSA
if (proc->e.procinfo->xmmsize)cv->ps_br32->offset;
if (lcl->sym.isparam)
{
cv->ps_br32->offset +=0x10; //MODIFIED by johnsa
if (proc->e.procinfo->xmmsize)cv->ps_br32->offset += (proc->e.procinfo->xmmsize);
}
}
else if ( ModuleInfo.cv_opt & CVO_MSVC8 ){
if (lcl->sym.isparam)
{
cv->ps_br32->offset = lcl->sym.offset + sym_ReservedStack->value + proc->e.procinfo->localsize + 8;// - 0x10; //MODIFIED by johnsa
if (proc->e.procinfo->xmmsize)cv->ps_br32->offset += proc->e.procinfo->xmmsize + 8;
}
else{
cv->ps_br32->offset = lcl->sym.offset + sym_ReservedStack->value + 16; // MODIFIED JOHNSA
if (proc->e.procinfo->xmmsize)cv->ps_br32->offset += proc->e.procinfo->xmmsize - 16;
}
}
else {
cv->ps_br32->offset = lcl->sym.offset;
}
cv->ps_br32->type = lcl->sym.ext_idx1;
DebugMsg(( "cv_write_symbol(%X): proc=%s, S_BPREL32, var=%s [memt=%X typeref=%X]\n",
GetPos(cv->symbols,cv->ps), proc->sym.name, lcl->sym.name, lcl->sym.mem_type, cv->ps_br32->type ));
}
I have figured out the debug info for the debuggers available from MS and found out
that WinDbg 6.12 cannot be used with this option because it uses RBP for locals ::)
so the available debuggers are these:
/* codeview debug info option flags */
enum cvoption_flags {
CVO_STATICTLS = 1, /* handle static tls */
CVO_MSVC12 = 2, /* MSVC12 and MSVC10 debugger are the same */
CVO_WINDBG62 = 4, /* WinDbg 6.2.8400.0 debugger */
CVO_MSVC8 = 8, /* MSVC8 debugger */
};
I have also concluded that the option without pushing RBP produces less code
and decided to stick with it
so I replaced the folder at the top with that option
Quote from: habran on July 01, 2013, 06:15:24 AM
that WinDbg 6.12. can not be used with this option because it uses RBP for locals ::)
Yes, of course. This is not a debugger issue. If your compiler/assembler emits S_BPREL32 codeview debugging info, then the debugger will assume that [E|R]BP is set up as frame pointer. If [E|R]BP is NOT set up, you MUST NOT emit S_BPREL32 records ( instead use S_REGREL32 ).
Also, for me WinDbg ( various versions ) and MSVC ( 5, 6,8 (VC 2005), 9 (VC 2008) and 10 ( VC 2010) debuggers work very well displaying locals ( I didn't test VC 2012 because, AFAIK, this BS won't run with XP anymore ).
So unless you're providing a test case that will reveal to me what the problem is, I will have to leave everything as it is.
I tried to change to S_REGREL32 but it wouldn't link:
Error 1 error LNK1103: debugging information corrupt; recompile module
I have to admit that I am not familiar with dbgcv.c except for cv->ps_br32->offset
why is it that all other debuggers have no problem with S_BPREL32, only WinDbg 6.12?
Quote
So unless you're providing a test case that will reveal to me what the problem is, I will have to leave everything as it is.
what do you mean by that statement?
Quote from: habran on July 02, 2013, 05:59:32 AM
I tried to change to S_REGREL32 but it wouldn't link:
Error 1 error LNK1103: debugging information corrupt; recompile module
You probably just replaced S_BPREL32 by S_REGREL32? It's not THAT simple - the S_REGREL32 record has an additional field, a "register number", which must be set. See codeview debug info documentation.
Quote
why is it that all other debuggers have no problem with S_BPREL32, only WinDbg 6.12?
I don't know ... must be a miracle ... perhaps the Hand of God.
Quote
what do you mean by that statement?
AFAIU, johnsa reported this issue as a bug ( by PM ) - and I'm unable to see a bug.
Quote
I don't know ... must be a miracle ... perhaps the Hand of God.
If you are trying to piss me off, you have to work harder on it ;)
Quote
You probably just replaced S_BPREL32 by S_REGREL32? It's not THAT simple - the S_REGREL32 record has an additional field, a "register number", which must be set. See codeview debug info documentation.
for you it would be a chicken shit to make it work, but you are on bloody holidays
so I have to sweat blood to make it ::)
anyway, don't feel sorry for me, as you said before, we have to push these brains of ours if we want to keep them
if you stay a little bit longer on the vacation I will eventually become as smart as you are :biggrin:
I have left only the version with no RBP pushed
debug info available:
MSVC12 = option codeview:2
WINDBG 6.2 = option codeview:4
MSVC8 = option codeview:8
8)
I have found out how to use S_REGREL32 and that fixes everything
thanks to japheth's suggestion, now all MS debuggers work properly :t
there is no more need to use "option codeview:"
It took me a whole week of surfing the internet to find the register number for RSP, which is 335
it is contained in cvconst.h (https://code.google.com/p/opendbg/source/browse/branches/minidbg/inc/cvconst.h?r=126)
using the struct:
struct cv_symrec_regrel32 { /* REGREL32 */
struct cv_symrec sr;
int_32 offset; /* offset of symbol */
uint_16 reg; /* register index for symbol */
cv_typeref type; /* Type index */
//unsigned char name[1]; /* Length-prefixed name */
}; //REGREL32 added by habran
if (ModuleInfo.win64_flags & W64F_STATICRSP){
len = sizeof( struct cv_symrec_regrel32 );
cv->ps = checkflush( cv->symbols, cv->ps, 1 + lcl->sym.name_size + len );
cv->ps_rr32->sr.size = sizeof( struct cv_symrec_regrel32 ) - sizeof(uint_16) + 1 + lcl->sym.name_size;
cv->ps_rr32->sr.type = S_REGREL32;// replaced S_BPREL32
cv->ps_rr32->reg = CV_AMD64_RSP; // use RSP register
cv->ps_rr32->offset = lcl->sym.offset + sym_ReservedStack->value ;
if ((proc->e.procinfo->ReservedStack > 32)) cv->ps_rr32->offset += 16;
if (lcl->sym.isparam)
{
cv->ps_rr32->offset +=0x10;
if (proc->e.procinfo->xmmsize)cv->ps_rr32->offset += proc->e.procinfo->xmmsize;
}
cv->ps_rr32->type = lcl->sym.ext_idx1;
DebugMsg(( "cv_write_symbol(%X): proc=%s, S_REGREL32, var=%s [memt=%X typeref=%X]\n",
GetPos(cv->symbols,cv->ps), proc->sym.name, lcl->sym.name, lcl->sym.mem_type, cv->ps_rr32->type ));
}
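For readers who do not have dbgcv.c at hand, here is a stand-alone sketch of the record that the code above fills in. It is not the JWasm routine: the function name is invented, the type index is assumed to be 16 bits wide, the host is assumed to be little-endian, and the record type is passed in as a parameter because the numeric value of S_REGREL32 is not given in this thread. Only the register number 335 comes from the post above.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define CV_AMD64_RSP 335   /* register number quoted in this thread (cvconst.h) */

/* Writes one register-relative symbol record into buf and returns its total length.
   Assumed layout: size, type, offset, reg, typeref, length-prefixed name. */
static size_t write_regrel32(uint8_t *buf, uint16_t rectype, int32_t offset,
                             uint16_t reg, uint16_t typeref, const char *name)
{
    uint8_t  namelen = (uint8_t)strlen(name);   /* names longer than 255 are not handled */
    /* the size field does not count itself */
    uint16_t size = (uint16_t)(2 + 4 + 2 + 2 + 1 + namelen);
    size_t   n = 0;

    memcpy(buf + n, &size, 2);      n += 2;
    memcpy(buf + n, &rectype, 2);   n += 2;
    memcpy(buf + n, &offset, 4);    n += 4;
    memcpy(buf + n, &reg, 2);       n += 2;     /* the field S_BPREL32 does not have */
    memcpy(buf + n, &typeref, 2);   n += 2;
    buf[n++] = namelen;
    memcpy(buf + n, name, namelen); n += namelen;
    return n;
}
```

The size computed here, 11 plus the name length, matches the expression `sizeof( struct cv_symrec_regrel32 ) - sizeof(uint_16) + 1 + lcl->sym.name_size` under the same 16-bit type index assumption.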
Quote from: habran on July 06, 2013, 08:46:35 PM
It took me whole week of surfing the internet
In case it was the first time: welcome to the pleasure of chasing undocumented M$ stuff! :bgrin:
Quote
to find info for RSP value which is 335
it is contained in cvconst.h (https://code.google.com/p/opendbg/source/browse/branches/minidbg/inc/cvconst.h?r=126)
Nice find and valuable information! I knew that the register number for RBP was 334 ( because that is what ML64 emits, it has obviously abandoned S_BPREL32 ), but would have assumed that the number for RSP is 333 then ( this would have matched the Intel register ordering ).
thanks japheth :biggrin:
I am very pleased that I succeeded in making everything work as I planned
the feeling of success is the greatest reward for hard work
it doesn't matter if only a few people understand the value of your work
these few people count :t
Quote from: japheth on July 07, 2013, 11:08:50 AM
because that is what ML64 emits, it has obviously abandoned S_BPREL32
I wondered why ML64 has abandoned S_BPREL32; after a bit of testing it turned out that it apparently wasn't just the joy to do things differently. S_BPREL32 works correctly for register EBP only - that is, if the upper 32-bits of RBP aren't zero, S_BPREL32 isn't appropriate.
It's a bug, but usually it doesn't matter, because even if the base address of your 64-bit binary is beyond the 4 GB frontier, the stack will still reside in the first 4 GB.
Here's a test case:
ExitProcess proto :dword
includelib <kernel32.lib>
.data?
stack db 8000h dup (?)
eos dq 4 dup (?)
.code
option frame:auto
option win64:3
p2 proc frame a1:dword, a2:qword
local l1:qword
local l2:qword
mov rax,123
mov l1, rax
mov rcx,234
mov l2, rcx
ret
p2 endp
start proc frame
mov rsp, offset eos
invoke p2, 1, 2
invoke ExitProcess, 0
start endp
end start
assemble: jwasm -Zi -win64 test.asm
link: link /debug /base:0x140000000 test.obj /libpath:\wininc\lib64
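The symptom can be modelled in a few lines of C. This is a model of the behaviour described above, not of any debugger's real code: one function resolves a frame-relative variable with the full 64-bit RBP, the other uses only EBP, the low 32 bits. Both function names are made up.

```c
#include <stdint.h>

/* What a debugger should compute: full 64-bit frame pointer plus offset. */
static uint64_t addr_full(uint64_t rbp, int32_t offset)
{
    return rbp + (uint64_t)(int64_t)offset;
}

/* What the behaviour described above amounts to: only EBP is used. */
static uint64_t addr_ebp_only(uint64_t rbp, int32_t offset)
{
    return (uint64_t)(uint32_t)((uint32_t)rbp + (uint32_t)offset);
}
```

As long as the stack lives below 4 GB the two results agree. With the test case above, where RSP points into a .data? buffer of an image based at 0x140000000, they differ.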
Quote
It's a bug, but usually it doesn't matter, because even if the base address of your 64-bit binary is beyond the 4 GB frontier, the stack will still reside in the first 4 GB.
does it mean that we need to use RSP for a greater stack allocation?
is it an Intel bug?
Quote from: habran on July 08, 2013, 07:15:36 PM
does it mean that we need to use RSP for a greater stack allocation?
Not at all. It just means what I already did say: that due to Windows current 64-bit implementation details the upper 32 bits of RSP ( and RBP, if it's used as frame pointer ) are zero - and that's why S_BPREL32 works "by chance".
so, if I understood correctly, in any case we cannot allocate more than 4 GB of stack?
I have fixed one bug and improved the saving of xmm registers
before we had this:
00000000000D1176 mov qword ptr [rsp+8],rcx
00000000000D117B mov qword ptr [rsp+10h],rdx
00000000000D1180 sub rsp,38h ;here we subtract rsp for locals xmm regs
00000000000D1184 movdqa xmmword ptr [rsp],xmm1
00000000000D1189 movdqa xmmword ptr [rsp+10h],xmm2
00000000000D118F movdqa xmmword ptr [aVar],xmm3
00000000000D1195 sub rsp,30h ;here we subtract rsp again for locals and shadows
00000000000D1199 mov eax,dword ptr [val2]
00000000000D119D mov dword ptr [bVar],eax
00000000000D11A1 mov qword ptr [val],21h
00000000000D11AA mov rdx,qword ptr [val]
00000000000D11AF mov rcx,0D4008h
00000000000D11B9 call printf (0D1292h)
00000000000D11BE mov rax,22h
00000000000D11C5 mov qword ptr [aVar],rax
00000000000D11CA mov rdx,qword ptr [aVar]
00000000000D11CF mov rcx,0D400Fh
00000000000D11D9 call printf (0D1292h)
00000000000D11DE call testproc2 (0D11FAh)
00000000000D11E3 movdqa xmm1,xmmword ptr [rsp+40h] ;wrong displacement, should be 30h
00000000000D11E9 movdqa xmm2,xmmword ptr [rsp+50h] ;wrong displacement, should be 40h
00000000000D11EF movdqa xmm3,xmmword ptr [rsp+60h] ;wrong displacement, should be 50h
00000000000D11F5 add rsp,68h
00000000000D11F9 ret
after fix:
0000000000DE1176 mov qword ptr [rsp+8],rcx
0000000000DE117B mov qword ptr [rsp+10h],rdx
0000000000DE1180 sub rsp,68h ;here we subtract at once the space for xmm and locals
0000000000DE1184 movdqa xmmword ptr [rsp+30h],xmm1
0000000000DE118A movdqa xmmword ptr [rsp+40h],xmm2
0000000000DE1190 movdqa xmmword ptr [rsp+50h],xmm3
0000000000DE1196 mov eax,dword ptr [val2]
0000000000DE119A mov dword ptr [bVar],eax
0000000000DE119E mov qword ptr [val],21h
0000000000DE11A7 mov rdx,qword ptr [val]
0000000000DE11AC mov rcx,0DE4008h
0000000000DE11B6 call printf (0DE1292h)
0000000000DE11BB mov rax,22h
0000000000DE11C2 mov qword ptr [aVar],rax
0000000000DE11C7 mov rdx,qword ptr [aVar]
0000000000DE11CC mov rcx,0DE400Fh
0000000000DE11D6 call printf (0DE1292h)
0000000000DE11DB call testproc2 (0DE11F7h)
0000000000DE11E0 movdqa xmm1,xmmword ptr [rsp+30h] ;now location is correct
0000000000DE11E6 movdqa xmm2,xmmword ptr [rsp+40h]
0000000000DE11EC movdqa xmm3,xmmword ptr [rsp+50h]
0000000000DE11F2 add rsp,68h
0000000000DE11F6 ret
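The numbers in the fixed listing can be reproduced with a small layout calculation. This is a hypothetical helper modelled on the listing, not the code in proc.c, and it assumes that no registers are pushed before the allocation.

```c
#include <stdint.h>

/* One allocation, with the XMM save area placed above locals/shadow space. */
typedef struct {
    uint32_t frame_size;   /* operand of the single "sub rsp, N" */
    uint32_t xmm_base;     /* RSP offset of the first saved XMM register */
} frame_layout;

static frame_layout layout_frame(uint32_t locals_and_shadow, uint32_t xmm_count)
{
    frame_layout f;
    f.xmm_base = (locals_and_shadow + 15u) & ~15u;  /* movdqa needs 16-byte alignment */
    /* On entry RSP is 8 mod 16 (return address); with nothing pushed,
       8 extra bytes bring it back to a multiple of 16. */
    f.frame_size = f.xmm_base + 16u * xmm_count + 8u;
    return f;
}

/* RSP offset of the index-th saved XMM register; same value in prologue and epilogue. */
static uint32_t xmm_slot_offset(const frame_layout *f, uint32_t index)
{
    return f->xmm_base + 16u * index;
}
```

For 30h bytes of locals and shadow space plus three XMM registers this gives a frame of 68h and save slots at 30h, 40h and 50h. Because RSP is adjusted only once, the epilogue uses the same offsets as the prologue.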
I have found out that debugging depends on the linker
if you build with MSVC8 you can debug it with the MSVC8 debugger or WinDbg 6.12
if you build with MSVC12 you can debug it with the MSVC12 debugger or WinDbg 6.2
sorry, mea culpa :icon_redface:
the last bug eliminated(hopefully) :bgrin:
the last bug was not really the last, it just pretended to be the one :icon_eek:
but this one was real one, just undercover :biggrin:
now everything glides :t
Quote from: habran on July 18, 2013, 09:40:21 AM
but this one was real one, just undercover :biggrin:
Wow, great!
I always wonder, when releasing another jwasm version with a few dozen bug fixes, how any of the previous versions could ever have been regarded as "stable". :bgrin:
we realists always hope for the best and at the same time expect the worst :biggrin:
I now experience the responsibility which programmers have when publishing their code :icon_mrgreen:
but at the same time I am happy to be able to create that perfect tool 8)
one more small fix for debug info :biggrin:
I have replaced the folder with the better one ::)
If you are curious why I am insisting on this version of jwasm when Japheth has published a new version,
JWasm211, which does include option STACKBASE, the answer is:
1) JWasm211 doesn't align the first local to 16 bytes
2) JWasm211 doesn't use the home space to store registers when it is free
3) JWasm211 doesn't have .for/.endfor
I appreciate Japheth and his precious JWasm, I just added a little bit more to it :t
I uploaded at the top a new, better version of JWasm.exe with some more fixes
as far as I tested it, now debug info works in all situations
I wanted to upload the complete source with the MSVC12 project and exe but it doesn't fit in the allowed forum size
even without the exe it is not small enough when compressed with winzip
so, I will attach here the main folder with the .c files and in a new post the H folder
I think that the upload size should be increased to 1 MB :biggrin:
here are headers, just decompress it and drop it in JWasm folder
Hi again :biggrin:
I have worked more on JWasm and added some more sophisticated features to it
Now it can decide by itself whether reserved stack is needed
if a function has no invoke inside, there is no need to allocate the stack space
also, if there is a USES clause and room for up to 4 registers in the home space
it will PUSH the last one for alignment instead of SUB RSP,8
all together this makes it an intelligent tool
you no longer need to use:
option win64:0
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
it will realize by itself that there is no need for a PROLOGUE
the binaries are at the top of this thread
it is now advanced so much that Germans would call it "Vorsprung durch Technik" ;)
here is the .c source
here are headers
I have a little correction in proc.c, line 2162 ::)
this code did not do what I expected :shock:
it was supposed to check whether the number of registers remaining to be pushed is odd
and if not, to make it so
else if (grcount>4){
i=grcount-r;
if (grcount & 1 == 0){
for (i=0;i<4;i++){
if (info->home_used[i]==0) break;
}
info->home_used[i]=1;
}
}
replace with this:
else if (grcount > 4){
r = grcount-r;
if (!(r & 1 )){
for (i=0;i<4;i++){
if (info->home_used[i]==0) break;
}
info->home_used[i]=1;
}
}
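One reason the first version could never work is C operator precedence: `==` binds tighter than `&`, so `grcount & 1 == 0` is parsed as `grcount & (1 == 0)`, which is always zero. A minimal demonstration follows; the function names are invented, and the parentheses in the first function only spell out how the compiler reads the original expression.

```c
/* How C parses "n & 1 == 0": the comparison is evaluated first,
   so the result is n & 0 and the test is never true. */
static int is_even_as_written(int n)
{
    return n & (1 == 0);
}

/* The corrected form, as used in the replacement code. */
static int is_even_fixed(int n)
{
    return !(n & 1);
}
```

The replacement code also tests `r`, the number of registers left over for pushing, instead of `grcount`.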
I have replaced binaries at the top with corrected build
I uploaded at the top a new version with some more bug fixes
the folder also contains the proc.c with changes 8)
Some more fixes :biggrin:
in the folder at the top of the thread are new binaries and three source files
replace the sources in the former source folder with these new ones
here are some examples of what it does: 8)
testproc4 PROC FRAME
mov eax,20
mov edx,eax
ret
testproc4 endp
;produces this:
0000000000A011FA mov eax,14h
0000000000A011FF mov edx,eax
0000000000A01201 ret
testproc4 PROC FRAME
LOCAL z:DWORD
mov eax,20
mov z,eax
ret
testproc4 endp
;produces this:
00000000010D11FA sub rsp,8
00000000010D11FE mov eax,14h
00000000010D1203 mov dword ptr [rsp],eax
00000000010D1206 add rsp,8
00000000010D120A ret
testproc4 PROC FRAME USES rbx rdi
LOCAL z:DWORD
mov eax,20
mov z,eax
ret
testproc4 endp
;produces this:
00000000013B11FA mov qword ptr [rsp+8],rbx
00000000013B11FF push rdi
00000000013B1200 sub rsp,8
00000000013B1204 mov eax,14h
00000000013B1209 mov dword ptr [rsp],eax
00000000013B120C mov rbx,qword ptr [rsp+18h]
00000000013B1211 add rsp,8
00000000013B1215 pop rdi
00000000013B1216 ret
testproc4 PROC FRAME USES rbx rdi
LOCAL z:DWORD
mov eax,20
mov z,eax
invoke printf,rcx,rbx
ret
testproc4 endp
;produces this:
0000000000B911FA mov qword ptr [rsp+8],rbx
0000000000B911FF push rdi
0000000000B91200 sub rsp,30h
0000000000B91204 mov eax,14h
0000000000B91209 mov dword ptr [rsp+20h],eax
0000000000B9120D mov rdx,rbx
0000000000B91210 call printf (0B91358h)
0000000000B91215 mov rbx,qword ptr [rsp+40h]
0000000000B9121A add rsp,30h
0000000000B9121E pop rdi
0000000000B9121F ret
as you can see this assembler has got brains ;)
there is no need to use other options to improve produced code
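The `sub rsp` amounts in the four listings above can be reproduced with a short calculation. This is a guess at the decision logic, inferred only from those listings and not taken from proc.c; the function name and parameters are made up.

```c
#include <stdint.h>

/* Guess at the decision logic, inferred only from the four listings above. */
static uint32_t stack_adjust(uint32_t locals, uint32_t pushed_regs, int has_invoke)
{
    uint32_t size = (locals + 7u) & ~7u;   /* locals rounded up to whole qwords */

    if (!has_invoke)
        return size;                       /* leaf: no shadow space, no realignment */

    size += 32u;                           /* shadow space for the callee */
    /* entry RSP is 8 mod 16; every push moves it by 8;
       the frame must bring it back to a multiple of 16 */
    if (((8u + 8u * pushed_regs + size) & 15u) != 0u)
        size += 8u;
    return size;
}
```

A result of zero corresponds to the first listing, where no prologue is emitted at all. The 30h of the last listing comes from 8 bytes of locals, 32 bytes of shadow space and 8 bytes of padding after one PUSH.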
Quote from: habran on August 28, 2013, 04:56:36 PM
;produces this:
00000000013B11FA mov qword ptr [rsp+8],rbx
00000000013B11FF push rdi
00000000013B1200 sub rsp,8
00000000013B1204 mov eax,14h
00000000013B1209 mov dword ptr [rsp],eax
00000000013B120C mov rbx,qword ptr [rsp+18h]
00000000013B1211 add rsp,8
00000000013B1215 pop rdi
00000000013B1216 ret
I'm not really happy with this code generation. Why is rbx not pushed like rdi?
This is not just an aesthetic issue, as you probably might assume - it makes the code incompatible with Win64 SEH.
See this - admittedly "advanced" - sample:
;--- Win64 SEH sample, requires jwasm.
;--- it demonstrates:
;--- a) how to install exception handlers in 64-bit
;--- b) how a handler may "refuse" to handle the exception
;--- c) how to "unwind" via RtlUnwind() or RtlUnwindEx()
;--- d) that an exception handler may be called twice,
;--- see "A Crash Course on the Depths of Win32 Structured Exception Handling"
;--- by Matt Pietrek, MSDN 01/1997.
option casemap:none
option win64:3
option frame:auto
.nolist
.nocref
WIN32_LEAN_AND_MEAN equ 1
include windows.inc
include ntdll.inc
include excpt.inc
include stdio.inc
.cref
.list
;UNWFUNC textequ <RtlUnwind>
UNWFUNC textequ <RtlUnwindEx>
ExceptionExecuteHandler equ 4
includelib <kernel32.lib>
includelib <msvcrt.lib>
CStr macro text:vararg
local sym
.const
sym db text, 0
.code
exitm <offset sym>
endm
.code
func1_eh proc frame pRecord:ptr EXCEPTION_RECORD, pFrame:ptr, pContext:ptr CONTEXT
mov rcx, pRecord
invoke printf, CStr("func1_eh( pRecord=%p [code=%X flags=%X prevRec=%p addr=%p], pFrame=%p, pContext=%p )",10), rcx,
[rcx].EXCEPTION_RECORD.ExceptionCode,
[rcx].EXCEPTION_RECORD.ExceptionFlags,
[rcx].EXCEPTION_RECORD.ExceptionRecord,
[rcx].EXCEPTION_RECORD.ExceptionAddress,
pFrame, pContext
mov rcx, pContext
invoke printf, CStr("func1_eh: context.flags=%X",10), [rcx].CONTEXT.ContextFlags
mov eax, ExceptionContinueSearch
ret
align 8
func1_eh endp
func1 proc frame:func1_eh uses rbx rsi rdi
local lcl1:dword
mov lcl1, 12345678h
mov rbx, -1
mov rsi, -1
mov rdi, -1
invoke printf, CStr("func1: rbp=%p rbx=%p rsi=%p rdi=%p",10), rbp, rbx, rsi, rdi
invoke RaiseException, 0E2003456h, 0, 0, 0
invoke printf, CStr("func1: exit, rbp=%p rbx=%p rsi=%p rdi=%p lcl1=%X",10), rbp, rbx, rsi, rdi, lcl1
ret
align 8
func1 endp
main_eh proc frame pRecord:ptr EXCEPTION_RECORD, pFrame:ptr, pContext:ptr CONTEXT
mov rcx, pRecord
invoke printf, CStr("main_eh( pRecord=%p [code=%X flags=%X prevRec=%p addr=%p], pFrame=%p, pContext=%p )",10), rcx,
[rcx].EXCEPTION_RECORD.ExceptionCode,
[rcx].EXCEPTION_RECORD.ExceptionFlags,
[rcx].EXCEPTION_RECORD.ExceptionRecord,
[rcx].EXCEPTION_RECORD.ExceptionAddress,
pFrame, pContext
mov rcx, pContext
invoke printf, CStr("main_eh: context.flags=%X",10), [rcx].CONTEXT.ContextFlags
mov rcx, pRecord
.if !( [rcx].EXCEPTION_RECORD.ExceptionFlags & 2 )
invoke printf, CStr("main_eh: calling ", @CatStr(!", %UNWFUNC, !"), "(), rsp=%p, rbp=%p",10), rsp, rbp
ifidn UNWFUNC, <RtlUnwindEx>
invoke RtlUnwindEx, pFrame, offset returnaddr, pRecord, NULL, pContext, NULL
else
invoke RtlUnwind, pFrame, offset returnaddr, pRecord, NULL
endif
returnaddr:
invoke printf, CStr("main_eh: back from ", @CatStr(!", %UNWFUNC, !"), "(), rsp=%p, rbp=%p",10), rsp, rbp
;--- the 64-bit unwind has restored all registers, including RSP!
;--- hence one cannot execute a RET.
jmp cont_addr
; mov eax, ExceptionContinueExecution
.else
mov eax, ExceptionContinueSearch
.endif
ret
align 8
main_eh endp
main proc frame:main_eh
local lcl1:dword
mov lcl1, 12345678h
;--- initialize non-volatile registers to see if the contents remain unchanged
mov rbx, 055667788deadbeefh
mov rsi, 05555aaaa5555aaaah
mov rdi, 08765432112345678h
invoke printf, CStr("main: rsp=%p rbp=%p rbx=%p rsi=%p rdi=%p",10), rsp, rbp, rbx, rsi, rdi
call func1
cont_addr::
invoke printf, CStr("main: exit, rbp=%p rbx=%p rsi=%p rdi=%p lcl1=%X",10), rbp, rbx, rsi, rdi, lcl1
ret
align 8
main endp
mainCRTStartup proc frame
call main
invoke ExitProcess, 0
mainCRTStartup endp
end mainCRTStartup
I'm sorry that it's a complicated sample, but I don't know how to make it simpler without losing vital information.
The point is: the sample won't run as expected if it is generated with your version of jwasm - while the standard jwasm v2.10 and also the v2.11 prerelease have no problems generating a "running" sample.
The reason for the incompatibility is somewhat hidden in how Win64 handles the "unwind" thingy - with your version of jwasm, it simply cannot know how to restore all registers.
Hi Japheth :biggrin:
I am glad that you paid attention to this version
I have tested and looked through the code above
I agree that it doesn't work, however I don't consider it good programming
look: by Matt Pietrek, MSDN 01/1997.
it was not meant to handle 64 bit code when it was first written
I would use jmp exit instead of ret and it would work fine
the usage of the shadow space is nicely explained here (http://www.codemachine.com/article_x64deepdive.html)
in my example I am using the last register to align to 16 bytes
and I use unused shadow space to store registers to reduce usage of the stack
I saw it in the disassembly of MSVC C functions, they do it exactly the same way as I did
sorry Japheth :P
I actually just glanced at your example because it was late at night
Quote
I would use jmp exit instead of ret and it would work fine
that was bullshit
I will play around to see why it is not working
It is probably possible to "teach" SEH how to handle this case :icon_confused:
here is an example of how MSVC12 processes C code:
UINT_PTR XIndexOffset(ASMEDIT *asme, const XCHARINDEX *charin, XCHARINDEX *charout, INT_PTR offset, int newline)
{
000000013F710900 mov qword ptr [rsp+18h],rbx
000000013F710905 push rbp
000000013F710906 sub rsp,40h
XCHARINDEX charcnt=*charin;
000000013F71090A mov rax,qword ptr [rdx]
000000013F71090D mov r10,qword ptr [rdx+8]
000000013F710911 mov qword ptr [charin],rdi
INT_PTR offsetcnt=offset;
int nSub;
BYTE nLineBreak;
if (newline)
000000013F710916 mov edi,dword ptr [newline]
000000013F71091A mov qword ptr [ciCount],rax
000000013F71091F mov rax,qword ptr [rdx+10h]
000000013F710923 mov qword ptr [charout],r14
000000013F710928 mov qword ptr [rsp+30h],rax
000000013F71092D mov rbp,r9
000000013F710930 mov r14,r8
...
...
000000013F710B08 mov rbx,qword ptr [offset]
000000013F710B0D add rsp,40h
000000013F710B11 pop rbp
000000013F710B12 ret
The problem with your jwasm version is that the listing of the prologue code is messed up if option -Sg has been set. This makes it virtually impossible to see what SEH-primitives your program has created inside the prologue.
But, as you may have noticed, my example is just a slightly modified version of a sample included in Wininc ( it's in the Sampl64\SEHSmpl folder ). And in this folder there is also a Masm64-compatible version, which has to emit the SEH-primitives manually. I suggest to use this version for experiments, adding the prologue code that your jwasm version is creating by hand.
If you're lucky, you just have to emit a .SAVEREG directive for the register value saved in the shadow space.
thanks Japheth :t
that's a good idea, I'll try it tonight :biggrin:
I have done it but it breaks :(
here is the code where I implemented it:
/* added for W64F_STATICRSP */
static void win64_StoreRegHome( struct proc_info *info )
/*******************************************************/
{
    int i = 0;
    int cnt;
    int grcount=0;
    int sizestd =0;
    int r;
    uint_16 *regist;
    info->stored_reg = 0;
    if ( info->regslist ) {
        for( regist = info->regslist,cnt = *regist++; cnt; cnt--, regist++,i++ ) {
            if ( GetValueSp( *regist ) & OP_XMM ) continue;
            else ++grcount;
        }
        for (i=0,r=0;i<4;i++){
            if (info->home_used[i]==0) ++r;
        }
        if (r){
            if (grcount==1) memset(info->home_used, 1, 4);
            else if (grcount==2 && r >= 2){
                for (i=0;i<4;i++){
                    if (info->home_used[i]==0) break;
                }
                for (++i;i<4;i++)
                    info->home_used[i]=1;
            }
            else if (grcount==3){
                if ( r == 1) memset(info->home_used, 1, 4);
                if ( r >= 3){
                    for (i=0;i<4;i++){
                        if (info->home_used[i]==0) break;
                    }
                    for (++i;i<4;i++){
                        if (info->home_used[i]==0) break;
                    }
                    for (++i;i<4;i++)
                        info->home_used[i]=1;
                }
            }
            else if (grcount==4 && r == 4){
                info->home_used[4]=1;
            }
            else if (grcount > 4){
                r = grcount-r;
                if (!(r & 1 )){
                    for (i=0;i<4;i++){
                        if (info->home_used[i]==0) break;
                    }
                    info->home_used[i]=1;
                }
            }
        }
        for( i=0,regist = info->regslist,cnt = *regist++; cnt; cnt--, regist++,i++ ) {
            if ( GetValueSp( *regist ) & OP_XMM ) continue;
            else {
                sizestd += 8;
                if (i < 4) {
                    if (info->home_used[i]==0){
                        AddLineQueueX( "mov [%r+%u], %r",T_RSP, NUMQUAL sizestd, *regist );
                        if ( ( 1 << GetRegNo( *regist ) ) & win64_nvgpr )
                            AddLineQueueX( "%r %r, %u", T_DOT_SAVEREG, *regist, NUMQUAL sizestd);
                        info->stored_reg++;
                    }
                    else {
                        cnt++;regist--;
                    }
                }
            }
        }/* end for */
    }
    return;
}
Quote from: habran on August 29, 2013, 08:57:10 PM
I have done it but it breaks :(
Alles muss man selber machen! (You have to do everything yourself!) :icon_mrgreen:
I did a few experiments with the ML64-compatible version. Result: saving a register's contents into the shadow space works, and the OS will restore the contents if the proper .SAVEREG directive has been used.
However: the offset to use with .SAVEREG must not be calculated from the value of RSP at the point of the store, but from the value RSP will have after the prologue! Here's the prologue that worked for me:
func1 proc frame:func1_eh
@localsize = 5*8
; push rbx
; .pushreg rbx
mov [rsp+8], rbx ;save register rbx in shadow space
.savereg rbx, 8+16+@localsize ;!!! offset must be the "offset" to the "final" RSP
push rsi
.pushreg rsi
push rdi
.pushreg rdi
sub rsp,@localsize
.allocstack @localsize
.endprolog
and the epilogue looks like this:
add rsp, @localsize
pop rdi
pop rsi
; pop rbx
mov rbx, [rsp+8]
ret
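The offset arithmetic in that prologue can be checked with a small C sketch. This is only an illustration: the helper name savereg_offset and its parameters are made up here and are not part of JWasm. It assumes the register is stored at [rsp + entry_offset] before any push, and that every push moves RSP by 8 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a JWasm function): offset to pass to .SAVEREG
 * for a register stored at [rsp + entry_offset] on procedure entry.
 *   entry_offset: offset used in the "mov [rsp+N], reg" store
 *   pushes:       number of 8-byte pushes executed after that store
 *   local_size:   bytes subtracted from RSP by "sub rsp, local_size"
 * The result is relative to the final RSP, i.e. RSP after the prologue. */
static uint32_t savereg_offset(uint32_t entry_offset, uint32_t pushes,
                               uint32_t local_size)
{
    /* stack slots are 8 bytes wide in 64-bit mode */
    assert(entry_offset % 8 == 0 && local_size % 8 == 0);
    return entry_offset + pushes * 8 + local_size;
}
```

For the prologue above, savereg_offset(8, 2, 5*8) gives 64 (40h), which matches 8+16+@localsize. With no pushes and no locals the offset is the same as at the store, which is the only case where using the unadjusted value would be right.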
you meant: "If you want the job done properly do it yourself" :biggrin:
I have tried it with:
    if (info->home_used[i]==0){
        AddLineQueueX( "mov [%r+%u], %r",T_RSP, NUMQUAL sizestd, *regist );
        if ( ( 1 << GetRegNo( *regist ) ) & win64_nvgpr )
            AddLineQueueX( "%r %r, %u", T_DOT_SAVEREG, *regist, NUMQUAL sizestd + 16 + info->localsize);
        info->stored_reg++;
    }
    else {
        cnt++;regist--;
    }
}
It still breaks :(
Quote from: habran on August 30, 2013, 06:32:26 AM
It still breaks :(
This probably has nothing to do with unwinding. It looks like you're allocating too little stack space:
main_eh:
0000000000000128: 48 89 4C 24 08 mov qword ptr [rsp+8],rcx
000000000000012D: 48 89 54 24 10 mov qword ptr [rsp+10h],rdx
0000000000000132: 4C 89 44 24 18 mov qword ptr [rsp+18h],r8
0000000000000137: 55 push rbp
0000000000000138: 48 8B EC mov rbp,rsp
000000000000013B: 48 83 EC 20 sub rsp,20h
000000000000013F: 48 8B 4D 10 mov rcx,qword ptr [rbp+10h]
0000000000000143: 48 8B 45 20 mov rax,qword ptr [rbp+20h]
0000000000000147: 48 89 44 24 38 mov qword ptr [rsp+38h],rax ;<---- !!!!!!
000000000000014C: 48 8B 45 18 mov rax,qword ptr [rbp+18h]
0000000000000150: 48 89 44 24 30 mov qword ptr [rsp+30h],rax ;<---- !!!!!!
0000000000000155: 48 8B 41 10 mov rax,qword ptr [rcx+10h]
0000000000000159: 48 89 44 24 28 mov qword ptr [rsp+28h],rax ;<---- !!!!!!
000000000000015E: 48 8B 41 08 mov rax,qword ptr [rcx+8]
0000000000000162: 48 89 44 24 20 mov qword ptr [rsp+20h],rax ;<---- !!!!!!
0000000000000167: 44 8B 49 04 mov r9d,dword ptr [rcx+4]
000000000000016B: 44 8B 01 mov r8d,dword ptr [rcx]
000000000000016E: 48 8B D1 mov rdx,rcx
0000000000000171: 48 B9 00 00 00 00 00 00 00 00 mov rcx,offset ??0004
000000000000017B: E8 00 00 00 00 call printf
In this excerpt you're allocating 20h bytes of stack space, but the call to printf later on needs 40h bytes: 20h for the shadow space plus 20h for the four arguments stored at [rsp+20h] to [rsp+38h].
Perhaps you optimized a bit too much :icon_mrgreen:
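The 40h figure can be reproduced with a short C sketch. The helper name outgoing_area is hypothetical and not taken from the JWasm sources. It assumes every argument occupies one 8-byte slot, that the first four slots are the shadow space for RCX, RDX, R8 and R9, and that the result is rounded up to 16 bytes so an already aligned frame stays aligned.

```c
#include <stdint.h>

/* Hypothetical helper (not a JWasm function): bytes a Win64 caller has to
 * reserve at the bottom of its frame for a call with nargs arguments.
 * The 4-slot shadow space is always needed, even for fewer arguments. */
static uint32_t outgoing_area(uint32_t nargs)
{
    uint32_t slots = nargs < 4 ? 4 : nargs;  /* at least the shadow space */
    uint32_t bytes = slots * 8;              /* one 8-byte slot per argument */
    return (bytes + 15u) & ~15u;             /* round up to a 16-byte multiple */
}
```

The printf call in the listing passes eight arguments (RCX, RDX, R8D, R9D and four more on the stack), so outgoing_area(8) is 64 bytes, i.e. 40h, while the prologue reserved only 20h. A procedure that makes several calls would need the maximum over all of them.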
You are right; however, there are no locals in that function, which is why only the size of the reserved stack is allocated
I tried adding 4 QWORD dummy locals and then it works
I am actually very busy with my job and did not have enough time to study the sources
I hope on Sunday I will have more time to play with it and hopefully find the solution
thank you Japheth for engaging yourself in this case, I appreciate that :t
I did not pay attention to what you told me ::)
Quote: "Masm64-compatible version, which has to emit the SEH-primitives manually"
now I understand what you were talking about and I will work on it as soon as I have a little bit of time
I tested your version rc10, it is working fine and I am looking forward to your source code :t
however, the 16-byte alignment at the beginning of the proc is missing and that bothers me
my last version has been fixed so that it works as I wanted and emits debug info correctly
the only thing now is that damn SEH thing :(
I'll concentrate on that to make it work
I have uploaded a new version at the top of the thread
there was an error with restoring the stack when USES was used with xmm registers but no general-purpose registers
replace the files with the new ones and rebuild the project