Let's say I have a structure with 3 or more members.
mystruct STRUCT
_a dd ?
_b dd ?
_c dd ?
mystruct ENDS
I would like _a to have a negative offset when mystruct is being addressed:
_a = -4
_b = 0
_c = 4
Rather than use EQU, is it possible to address _a in a structure with a negative offset?
Say I use the instruction
inc dword ptr [eax][mystruct._a]
I'd like this to be translated into
inc dword ptr [eax-4]
No. Structure field offsets always start at zero:
mystruct._a = 0, mystruct._b = 4, mystruct._c = 8
Not possible with a STRUCT, but no problem for a macro, e.g. SetGlobals (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1015).
Example:
include \masm32\MasmBasic\MasmBasic.inc ; download (http://masm32.com/board/index.php?topic=94.0)
mystruct STRUCT
_a dd ?
_b dd ?
_c dd ?
mystruct ENDS
SetGlobals ms:mystruct
Init
SetGlobals ; no args = set ebx
inc ms._a ; Olly (http://www.ollydbg.de/version2.html) shows inc dword ptr [ebx-80]
inkey str$(ms._a), 13, 10
Exit
end start
I guess you want the short encodings for a structure that is longer than 128 bytes, right?
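As a rough illustration of what "short encodings" means here (the byte values are the standard IA-32 encodings; the register and displacements are arbitrary choices for the example): a displacement in the range -128..127 fits in one byte, anything outside that range takes four.

```asm
; sketch only: ebx and the displacements are arbitrary choices for illustration
inc dword ptr [ebx-80]    ; FF 43 B0             3 bytes, 8-bit displacement
inc dword ptr [ebx+127]   ; FF 43 7F             3 bytes, largest positive 8-bit displacement
inc dword ptr [ebx+200]   ; FF 83 C8 00 00 00    6 bytes, 32-bit displacement
```

Pointing the base register 128 bytes into a block therefore makes 256 bytes reachable with the short form instead of 128.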
Quote from: peter_asm on June 24, 2014, 05:17:27 AM
Rather than use EQU, is it possible to address _a in a structure as negative offset?
subtract the corresponding structure field offset:
e.g. : inc [eax-mystruct._b].mystruct._a
However, it seems like a bad design - you might better describe your actual problem.
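A small sketch of that suggestion, assuming eax holds the address of the _b member rather than the start of the structure (so _b sits at displacement 0). The displacements in the comments follow from the field offsets 0, 4 and 8:

```asm
mystruct STRUCT
  _a dd ?
  _b dd ?
  _c dd ?
mystruct ENDS

; assumption: eax = address of the _b member, not the start of the structure
inc [eax-mystruct._b].mystruct._a   ; displacement -4 + 0 = -4 -> inc dword ptr [eax-4]
inc [eax-mystruct._b].mystruct._b   ; displacement -4 + 4 =  0 -> inc dword ptr [eax]
inc [eax-mystruct._b].mystruct._c   ; displacement -4 + 8 = +4 -> inc dword ptr [eax+4]
```

The structure itself is unchanged; only the base expression is biased by the offset of the member the pointer refers to.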
i'm with qWord - seems like bad design
but, it can be done
this might work....
mystruct STRUCT
_a dd ?
_b dd ?
_c dd ?
mystruct ENDS
o_a EQU mystruct._a - mystruct._b
o_b EQU mystruct._b - mystruct._b
o_c EQU mystruct._c - mystruct._b
.DATA?
mst mystruct <>
.CODE
mov eax,offset mst._b
mov ecx,[eax+o_a]
mov edx,[eax+o_c]
in theory, you should be able to replace the "+" in the last 2 lines with "."
okay, thanks for feedback. I'll stick with EQU instead of using structure.
To answer why I wanted to do it, I was avoiding prologue/epilogue code for local stack variables.
Just playing around with the stack, that's all.
Quote from: peter_asm on June 24, 2014, 07:54:04 AM
okay, thanks for feedback. I'll stick with EQU instead of using structure.
To answer why I wanted to do it, I was avoiding prologue/epilogue code for local stack variables.
Just playing around with the stack, that's all.
peter,
it seems to be a good idea, but you never need any negative offset to access parameters or local variables. And you should use ESP.
The problem is that you have to write one structure for each procedure. Using EQU seems to be easier.
nidud gives you an example (you don't need «mov eax, esp»).
See the example below
deleted
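A minimal sketch of ESP-relative access with only non-negative offsets (this is not the deleted example; the procedure SumTwo, its labels, and its stack layout are hypothetical, and 32-bit stdcall is assumed):

```asm
; sketch only: SumTwo and its layout are hypothetical, 32-bit stdcall assumed
OPTION PROLOGUE:None
OPTION EPILOGUE:None
SumTwo PROC dwArg1:DWORD,dwArg2:DWORD
_dwArg2 TEXTEQU <dword ptr [esp+16]> ;dwArg2
_dwArg1 TEXTEQU <dword ptr [esp+12]> ;dwArg1
; [esp+8] ;RETurn address
_dwLocal2 TEXTEQU <dword ptr [esp+4]> ;dwLocal2
_dwLocal1 TEXTEQU <dword ptr [esp]> ;dwLocal1
sub esp,8 ;reserve two local dwords; the labels are valid only while ESP stays here
mov eax,_dwArg1
mov _dwLocal1,eax
mov eax,_dwArg2
mov _dwLocal2,eax
mov eax,_dwLocal1
add eax,_dwLocal2 ;EAX = dwArg1 + dwArg2
add esp,8 ;discard the locals
ret 8
SumTwo ENDP
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
```

Every offset is zero or positive because ESP points at the lowest local. The price is that any push, pop, or call inside the body shifts all of the offsets, so the labels have to be adjusted or ESP left alone.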
pre-P4 (SSE1)
------------------------------------------------------
3266213 cycles - 0: standard (ebp)
3268743 cycles - 1: no pro/epilogue (esp)
3272430 cycles - 2: no pro/epilogue (eax)
3270374 cycles - 0: standard (ebp)
3267652 cycles - 1: no pro/epilogue (esp)
3269672 cycles - 2: no pro/epilogue (eax)
3264680 cycles - 0: standard (ebp)
3267378 cycles - 1: no pro/epilogue (esp)
3288766 cycles - 2: no pro/epilogue (eax)
--- ok ---
Older AMD chip from my University Computer:
AMD Athlon(tm) Dual Core Processor 5000B (SSE3)
------------------------------------------------------
1697913 cycles - 0: standard (ebp)
1706533 cycles - 1: no pro/epilogue (esp)
1700249 cycles - 2: no pro/epilogue (eax)
1695385 cycles - 0: standard (ebp)
1712269 cycles - 1: no pro/epilogue (esp)
1693233 cycles - 2: no pro/epilogue (eax)
1693354 cycles - 0: standard (ebp)
1711493 cycles - 1: no pro/epilogue (esp)
1695662 cycles - 2: no pro/epilogue (eax)
--- ok ---
Gunther
nidud,
yes, I know this (I did a lot of tests on it, for instance in the converters that I posted here; the results are elsewhere on the forum).
But we may gain ebp if we need it.
Using eax or ebp seems to give the same result in your tests.
Generally, I write standard versions, then I modify them to use ESP and test them. More often than not it gives me the same result, but I gain ebp.
It seems to me that your test is not the best test in this particular case (but ... ?).
:t
Just as a side note, jWasm has the option stackbase, which allows omitting EBP as the frame pointer. When modifying ESP, the equate @StackBase must be corrected appropriately:
option stackbase:esp
foo proc arg:DWORD
LOCAL x:DWORD,y:DWORD
mov eax,arg
mov x,eax
push eax
@StackBase = @StackBase + 4
mov y,-123
pop eax
@StackBase = @StackBase - 4
ret
foo endp
Unfortunately the INVOKE directive currently does not respect @StackBase, thus it can't be used with locals or procedure parameters.
qWord,
interesting note! I use invoke in any case and I compute all the things I want. :t
nidud,
I got this:
***** Time table *****
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
341 cycles, ConvertFloat10DX, direct, fxam, fxtract, esp - 10 digits
341 cycles, ConvertFloat10DY, direct, examine, fxtract, esp - 10 digits
347 cycles, ConvertReal10DX, direct, fxam, fxtract, esp - 10 digits
356 cycles, ConvertReal10DY, direct, examine, fxtract, esp - 10 digits
356 cycles, ConvertReal10DR, direct, examine, fxtract, ebp - 10 digits
356 cycles, ConvertFloat10DR, direct, examine, fxtract, ebp - 10 digits
365 cycles, ConvertFloat10DRD, direct, examine, fxtract, ebp - 10 digits
365 cycles, ConvertReal10DRD, direct, examine, fxtract, ebp - 10 digits
411 cycles, ConvertFloat10DYD, direct, examine, fxtract, esp - 10 digits
414 cycles, ConvertFloat10DXD, direct, fxam, fxtract, esp - 10 digits
418 cycles, ConvertReal10DYD, direct, examine, fxtract, esp - 10 digits
423 cycles, ConvertReal10DXD, direct, fxam, fxtract, esp - 10 digits
475 cycles, ConvertFloat10DF, direct, examine, fyl2x, ebp - 10 digits
478 cycles, ConvertFloat10DS, direct, fxam, fyl2x, ebp - 10 digits
479 cycles, ConvertReal10DF, direct, examine, fyl2x, ebp - 10 digits
486 cycles, ConvertReal10DSD, direct, fxam, fyl2x, ebp - 10 digits
487 cycles, ConvertFloat10DFD, direct, examine, fyl2x, ebp - 10 digits
487 cycles, ConvertFloat10DSD, direct, fxam, fyl2x, ebp - 10 digits
489 cycles, ConvertReal10DS, direct, fxam, fyl2x, ebp - 10 digits
492 cycles, ConvertReal10DFD, direct, examine, fyl2x, ebp - 10 digits
716 cycles, ConvertFloat10CT, BCD-CT, fxam, fxtract, esp - 10 digits
724 cycles, ConvertFloat10BX, BCD, fxam, fxtract, ebp - 10 digits
729 cycles, ConvertReal10CTD, BCD-CT, fxam, fxtract, esp - 10 digits
731 cycles, ConvertReal10BX, BCD, fxam, fxtract, ebp - 10 digits
739 cycles, ConvertFloat10BY, BCD, examine, fxtract, esp - 10 digits
739 cycles, ConvertFloat10CTD, BCD-CT, fxam, fxtract, esp - 10 digits
742 cycles, ConvertReal10BFD, BCD, fxam, fxtract, esp - 10 digits
744 cycles, ConvertFloat10BF, BCD, fxam, fxtract, esp - 10 digits
744 cycles, ConvertReal10BYD, BCD, examine, fxtract, esp - 10 digits
745 cycles, ConvertFloat10BFD, BCD, fxam, fxtract, esp - 10 digits
745 cycles, ConvertFloat10BYD, BCD, examine, fxtract, esp - 10 digits
746 cycles, ConvertReal10BY, BCD, examine, fxtract, esp - 10 digits
749 cycles, ConvertReal10BXD, BCD, fxam, fxtract, ebp - 10 digits
750 cycles, ConvertFloat10BXD, BCD, fxam, fxtract, ebp - 10 digits
752 cycles, ConvertReal10BF, BCD, fxam, fxtract, esp - 10 digits
775 cycles, ConvertReal10CT, BCD-CT, fxam, fxtract, esp - 10 digits
1147 cycles, ConvertFloat10ZX, BCD - old - 10 digits
1154 cycles, ConvertFloat10Z, BCD -old - 10 digits
2704 cycles, ConvertFloat10DWD, direct,Save FPU, fxam, fxtract, esp -10 digits
2709 cycles, ConvertReal10DWD, direct,Save FPU, fxam, fxtract, esp -10 digits
2798 cycles, ConvertFloat10DW, direct,Save FPU, fxam, fxtract, esp -10 digits
2816 cycles, ConvertReal10DW, direct,Save FPU, fxam, fxtract, esp -10 digits
2989 cycles, ConvertFloat10BW, BCD, Save FPU, fxam, fxtract, ebp - 10 digits
2995 cycles, ConvertReal10BW, BCD, Save FPU, fxam, fxtract, ebp - 10 digits
3042 cycles, ConvertFloat10, BCD, Save FPU -old - 10 digits
3104 cycles, ConvertFloat10BWD, BCD, Save FPU, fxam, fxtract, ebp - 10 digits
3126 cycles, ConvertReal10BWD, BCD, Save FPU, fxam, fxtract, ebp - 10 digits
********** END **********
Quote from: qWord on June 25, 2014, 12:10:49 AM
Just as side note, jWasm has the option stackbase, which allows to omit EBP as frame pointer.
Déjà vu, in a different context (http://www.masmforum.com/board/index.php?topic=11766.msg88926#msg88926). Locals relative to esp aren't any faster but [esp+n] is one byte longer than [ebp+n]. However, "frameless" seems a magic word for assembler programmers ;-)
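The one-byte difference comes from the instruction encoding: ESP as a base register always needs an extra SIB byte. A sketch with the standard IA-32 encodings (the displacement 8 and the eax destination are arbitrary choices for the example):

```asm
; sketch only: the displacement 8 and eax are arbitrary choices for illustration
mov eax,[ebp+8]   ; 8B 45 08       3 bytes: opcode, ModRM, disp8
mov eax,[esp+8]   ; 8B 44 24 08    4 bytes: opcode, ModRM, SIB, disp8
```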
CPU manufacturers have optimized use of EBP for a stack frame
however, the MASM stack frames (default prologue and epilogue) are less than ideal
i often find myself writing my own stack frame code to get what i want :P
i may use ESP, but only if i access the stack frame a few times
These are the results:
Test for correctness:
Frame OFF: 123456 61728
Frame ON: 123456 123456
Timings:
1831 cycles for 100*call stack_frame_on
1810 cycles for 100*stack_frame_OFF
Code sizes:
Frame on: 136
Frame off: 142
--- ok --- :t
a well-managed stack frame is a thing of beauty :biggrin:
OPTION PROLOGUE:None
OPTION EPILOGUE:None
FrameFunc PROC dwArg1:DWORD,dwArg2:DWORD
;-----------------------------------------------------
;use TEXTEQU to create labels
;if you want to add or remove locals, just change the offsets here
;it also gives you a nice "graphic" view of the overall stack frame
_dwArg2 TEXTEQU <dword ptr [ebp+28]> ;dwArg2
_dwArg1 TEXTEQU <dword ptr [ebp+24]> ;dwArg1
; [ebp+20] ;RETurn address
; [ebp+16] ;preserved EBX
; [ebp+12] ;preserved ESI
; [ebp+8] ;preserved EDI
_dwEaxRet TEXTEQU <dword ptr [ebp+4]> ;return value for EAX
; [ebp] ;preserved EBP
_dwLocal1 TEXTEQU <dword ptr [ebp-4]> ;dwLocal1
_dwLocal2 TEXTEQU <dword ptr [ebp-8]> ;dwLocal2
_dwLocal3 TEXTEQU <dword ptr [ebp-12]> ;dwLocal3
_lpBuff1 TEXTEQU <dword ptr [ebp-16]> ;lpBuff1
_lpBuff2 TEXTEQU <dword ptr [ebp-20]> ;lpBuff2
;-----------------------------------------------------
xor edx,edx
push ebx
push esi
push edi
push edx
push ebp
mov ebp,esp
;code to calculate the initial value of dwLocal1 in EAX
push eax ;[EBP-4] = _dwLocal1
;code to calculate the initial value of dwLocal2 in EAX
push eax ;[EBP-8] = _dwLocal2
;code to calculate the initial value of dwLocal3 in EAX
push eax ;[EBP-12] = _dwLocal3
;create placeholders for the local buffer pointers
push edx ;[EBP-16] = _lpBuff1
push edx ;[EBP-20] = _lpBuff2
;now, we want to calculate the size of the "buffers"
;they might be string buffers, structures, or local arrays
;their sizes should create 4-aligned addresses
mov edx,esp
;code to calculate the size of Buff1 in EAX
sub edx,eax
mov _lpBuff1,edx ;_lpBuff1 = address of Buff1
;code to calculate the size of Buff2 in EAX
sub edx,eax
mov _lpBuff2,edx ;_lpBuff2 = address of Buff2
;probe the stack down to create the buffers
ASSUME FS:Nothing
stack_probe_loop:
push eax
mov esp,fs:[8] ;FS:[8] = TEB.StackLimit
cmp edx,esp
jb stack_probe_loop
ASSUME FS:ERROR
mov esp,edx
;at this point, the stack frame is all set up - no need to use "[ebp-xx]"
;the local dwords may be accessed by using labels like "_dwLocal1"
;the local buffer address may be accessed by using labels like "_lpBuff1"
;the value to be returned in EAX may be accessed by "_dwEaxRet"
;body of function code
;when it comes time to exit, just execute LEAVE
;that discards all the local dwords and buffers
;the return value for EAX is set up for exit
;EBP, EDI, ESI, and EBX are restored
leave
pop eax
pop edi
pop esi
pop ebx
ret 8
FrameFunc ENDP
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
EDIT: of course, if you have a fixed-size buffer, you can add that before the variable-sized ones, as well :P
Hey Dave,
we have more things to do :biggrin:
Quote from: nidud on June 25, 2014, 02:06:37 AM
It depends on what the aim of the test was, but since it included negative values I assumed locals were used, and arguments were used, so you have to take into account usage as well as calling convention
note: I have no negative values.
My point of view is this: I prefer to test a particular, well-defined procedure or piece of code that we need for something rather than derive a general rule (?) from a general test...
It seems that you prefer to use your tests to show what you want to show.
This is why I don't follow your point of view.
DAVE !!!
Really appreciate the "a well-managed stack frame is a thing of beauty" (http://masm32.com/board/index.php?topic=3326.msg35046#msg35046) post. :icon_cool:
I've always wondered how to identify stack frames,...assuming that I could actually find the stack,... :biggrin:
this one adds a fixed-size buffer (or structure, array)
OPTION PROLOGUE:None
OPTION EPILOGUE:None
FrameFunc PROC dwArg1:DWORD,dwArg2:DWORD
;-----------------------------------------------------
;use TEXTEQU to create labels
;if you want to add or remove locals, just change the offsets here
;it also gives you a nice "graphic" view of the overall stack frame
_dwArg2 TEXTEQU <dword ptr [ebp+28]> ;dwArg2
_dwArg1 TEXTEQU <dword ptr [ebp+24]> ;dwArg1
; [ebp+20] ;RETurn address
; [ebp+16] ;preserved EBX
; [ebp+12] ;preserved ESI
; [ebp+8] ;preserved EDI
_dwEaxRet TEXTEQU <dword ptr [ebp+4]> ;return value for EAX
; [ebp] ;preserved EBP
_dwLocal1 TEXTEQU <dword ptr [ebp-4]> ;dwLocal1
_dwLocal2 TEXTEQU <dword ptr [ebp-8]> ;dwLocal2
_dwLocal3 TEXTEQU <dword ptr [ebp-12]> ;dwLocal3
_lpBuff2 TEXTEQU <dword ptr [ebp-16]> ;lpBuff2
_lpBuff3 TEXTEQU <dword ptr [ebp-20]> ;lpBuff3
_Buff1 TEXTEQU <[ebp-60]> ;Buff1 = 40 byte fixed size buffer
;-----------------------------------------------------
xor edx,edx
push ebx
push esi
push edi
push edx
push ebp
mov ebp,esp
;code to calculate the initial value of dwLocal1 in EAX
push eax ;[EBP-4] = _dwLocal1
;code to calculate the initial value of dwLocal2 in EAX
push eax ;[EBP-8] = _dwLocal2
;code to calculate the initial value of dwLocal3 in EAX
push eax ;[EBP-12] = _dwLocal3
;create placeholders for the variable-size local buffer pointers
push edx ;[EBP-16] = _lpBuff2
push edx ;[EBP-20] = _lpBuff3
;now, we want to calculate the size of the "buffers"
;they might be string buffers, structures, or local arrays
;their sizes should create 4-aligned addresses
lea edx,_Buff1 ;start with the address of the lowest fixed-size buffer
;code to calculate the size of Buff2 in EAX
sub edx,eax
mov _lpBuff2,edx ;_lpBuff2 = address of Buff2
;code to calculate the size of Buff3 in EAX
sub edx,eax
mov _lpBuff3,edx ;_lpBuff3 = address of Buff3
;probe the stack down to create the buffers
ASSUME FS:Nothing
stack_probe_loop:
push eax
mov esp,fs:[8] ;FS:[8] = TEB.StackLimit
cmp edx,esp
jb stack_probe_loop
ASSUME FS:ERROR
mov esp,edx
;at this point, the stack frame is all set up - no need to use "[ebp-xx]"
;the local dwords may be accessed by using labels like "_dwLocal1"
;the local fixed-size buffer may be addressed directly by _Buff1, perhaps using LEA
;the local variable-size buffer addresses may be accessed by using labels like "_lpBuff2"
;the value to be returned in EAX may be accessed by "_dwEaxRet"
;body of function code
;when it comes time to exit, just execute LEAVE
;that discards all the local dwords and buffers
;the return value for EAX is set up for exit
;EBP, EDI, ESI, and EBX are restored
leave
pop eax
pop edi
pop esi
pop ebx
ret 8
FrameFunc ENDP
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
Yes, I see it and I see your posts, nidud, but I don't follow your solution either. Nothing wrong with that. It seems that you didn't see what I wrote, but you do exactly what you want to do.
EDIT: I got some results in milliseconds where esp is better. But that tells me nothing, as I said before. That's all.
Quote from: nidud on June 25, 2014, 02:06:37 AM
Using ESP is slower than using other register but the call is faster if EBP is not used as base pointer.
No difference here, except for fastcall, of course:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
------------------------------------------------------
3235176 cycles - 0: standard (ebp)
3231541 cycles - 1: no pro/epilogue (esp)
3229054 cycles - 2: no pro/epilogue (eax)
2223710 cycles - 3: fastcall(eax,edx)
Btw, your code builds only with JWasm.
I would not worry too much about FASTCALL, it's always been easy enough to do your own. For those old enough to have written DLLs in Win16, you had very little stack space to play with, so one of the many tricks apart from passing in registers was to allocate a number of globals that you could write data to and then call the DLL function. Worked fine and was fast enough. :biggrin:
Quote from: hutch-- on June 25, 2014, 07:24:26 PM
I would not worry too much about FASTCALL, it's always been easy enough to do your own. For those old enough to have written DLLs in Win16, you had very little stack space to play with, so one of the many tricks apart from passing in registers was to allocate a number of globals that you could write data to and then call the DLL function. Worked fine and was fast enough. :biggrin:
Right. :t
Gunther
Fastcall is an issue for compilers. In assembler, you just use the registers you like. If we were forced to adopt one convention here, it should be M$, because Masm32 is clearly a Windows library. But we are not forced to adopt one, and why pick a stupid one that uses ecx instead of eax, the register in which Windows returns its result?
From what I can tell, the Win64 version of FASTCALL is a leftover from clapped out RISC compiler theory where some hardware had a large collection of registers to play with. It has always seemed ridiculous to me that RISC theory is being taught when the majority of computers around the world are CISC, and while you can waste registers in x86-64 simulating RISC, it comes at the cost of trying to make x86 like processors that are long dead.
The old STDCALL was infinitely extendable in terms of call depth, being purely stack based; the only effective limit was stack memory. With a register-based convention you run out of registers real fast with a high argument count, and it also limits techniques like recursion unless you supplement the register base with stack memory. In a common algo like a quick sort, you can easily get depths of thousands, and while the better designs avoid this, you just cannot write a recursive algo with registers alone.
I have tended to see Win64 as much the same mess as the old Win16: obscure and untidy function calling mechanisms. Win32 was designed by the old VAX guys and it was free of most of the Microsoft YUK, but it appears that with Microsoft doing the design of what is another hybrid OS version (Win7 and 8), they have gone back to badly designed and obscure calling methods.
It will probably not be until you have a full long mode 64 bit OS version that you will have a chance to write clean code like Win32 again. Do everything in 64 bit on hardware that has enough memory (Win64 does not have that yet) and you will get faster, cleaner code for it. To do this well the OS will need, in the future, terabyte volumes of memory, and the hardware is not yet up to this.
Think in terms of 4K video running at 60 frames a second and in the future faster again, 120 or higher and you are going to start to need far more powerful hardware than is currently available. I can run 1080 at 60 frames a second OK on my old quad (monitor does not support higher) and current video looks very good but the future is going to be much faster. By this time current 64 bit Windows will be old junk, something like all the fanfare that surrounded the introduction of the i386 so long ago.