The MASM Forum

General => The Campus => Topic started by: peter_asm on June 24, 2014, 05:17:27 AM

Title: Negative offset in structure
Post by: peter_asm on June 24, 2014, 05:17:27 AM
Let's say I have a structure with 3 or more members.

mystruct STRUCT
  _a  dd ?
  _b  dd ?
  _c  dd ?
mystruct ENDS


I would like _a to have a negative offset when mystruct is addressed.


_a = -4
_b = 0
_c = 4


Rather than use EQU, is it possible to address _a in a structure as negative offset?
Say I use the instruction

inc dword ptr [eax][mystruct._a]

I'd like this to be translated into

inc dword ptr [eax-4]
Title: Re: Negative offset in structure
Post by: RuiLoureiro on June 24, 2014, 06:24:08 AM
No.
          mystruct._a= 0       mystruct._b= 4       mystruct._c= 8
Title: Re: Negative offset in structure
Post by: jj2007 on June 24, 2014, 06:43:28 AM
Not possible with a STRUCT, but no problem for a macro, e.g. SetGlobals (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1015).
Example:

include \masm32\MasmBasic\MasmBasic.inc      ; download (http://masm32.com/board/index.php?topic=94.0)

mystruct STRUCT
  _a  dd ?
  _b  dd ?
  _c  dd ?
mystruct ENDS

  SetGlobals ms:mystruct
  Init
  SetGlobals      ; no args = set ebx
  inc ms._a      ; Olly (http://www.ollydbg.de/version2.html) shows inc dword ptr [ebx-80]
  inkey str$(ms._a), 13, 10
  Exit
end start


I guess you want the short encodings for a structure that is longer than 128 bytes, right?
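
The short encodings in question are the 8-bit displacement forms, which only reach offsets from -128 to +127. A minimal sketch of the idea, assuming a hypothetical 256-byte structure bigstruct and a base register biased by 128 bytes (BIAS, bigstruct and big are made-up names; this illustrates the encoding, not how SetGlobals is implemented):

```asm
.386
.model flat, stdcall
option casemap:none

bigstruct STRUCT                     ; hypothetical structure, 256 bytes
  fields dd 64 dup (?)
bigstruct ENDS

BIAS EQU 128                         ; assumption: base points 128 bytes into the block

.data?
big bigstruct <>

.code
start:
    push ebx                         ; ebx is callee-saved
    mov  ebx, offset big             ; unbiased base
    inc  dword ptr [ebx+200]         ; FF 83 C8 00 00 00  (32-bit displacement, 6 bytes)

    mov  ebx, offset big + BIAS      ; biased base
    inc  dword ptr [ebx+200-BIAS]    ; FF 43 48           (8-bit displacement, 3 bytes)
    pop  ebx
    ret
end start
```

Both inc instructions touch the same dword; only the second fits the short form because its displacement (72) lies within -128..+127.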
Title: Re: Negative offset in structure
Post by: qWord on June 24, 2014, 06:46:25 AM
Quote from: peter_asm on June 24, 2014, 05:17:27 AM
Rather than use EQU, is it possible to address _a in a structure as negative offset?
Subtract the offset of the structure field that the register points to, e.g.:
inc [eax-mystruct._b].mystruct._a

However, it seems like a bad design - you might better describe your actual problem.
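
A minimal sketch of that subtraction, assuming eax has been loaded with the address of the _b member (the variable name mst and the surrounding program skeleton are illustrative assumptions):

```asm
.386
.model flat, stdcall
option casemap:none

mystruct STRUCT
  _a  dd ?
  _b  dd ?
  _c  dd ?
mystruct ENDS

.data?
mst mystruct <>                              ; hypothetical instance

.code
start:
    mov eax, offset mst._b                   ; eax points at _b, not at the start
    inc [eax-mystruct._b].mystruct._a        ; offset -4+0 -> inc dword ptr [eax-4]
    inc [eax-mystruct._b].mystruct._b        ; offset -4+4 -> inc dword ptr [eax]
    inc [eax-mystruct._b].mystruct._c        ; offset -4+8 -> inc dword ptr [eax+4]
    ret
end start
```

The structure itself keeps its normal offsets (0, 4, 8); the negative displacement comes entirely from biasing the address expression by mystruct._b.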
Title: Re: Negative offset in structure
Post by: dedndave on June 24, 2014, 07:00:26 AM
i'm with qWord - seems like bad design
but, it can be done

this might work....

mystruct STRUCT
  _a  dd ?
  _b  dd ?
  _c  dd ?
mystruct ENDS

o_a EQU mystruct._a - mystruct._b
o_b EQU mystruct._b - mystruct._b
o_c EQU mystruct._c - mystruct._b

    .DATA?

mst mystruct <>

    .CODE

    mov     eax,offset mst._b
    mov     ecx,[eax+o_a]
    mov     edx,[eax+o_c]


in theory, you should be able to replace the "+" in the last 2 lines with "."
Title: Re: Negative offset in structure
Post by: peter_asm on June 24, 2014, 07:54:04 AM
okay, thanks for feedback. I'll stick with EQU instead of using structure.
To answer why I wanted to do it, I was avoiding prologue/epilogue code for local stack variables.
Just playing around with the stack, that's all.
Title: Re: Negative offset in structure
Post by: nidud on June 24, 2014, 08:55:57 AM
deleted
Title: Re: Negative offset in structure
Post by: nidud on June 24, 2014, 09:12:30 AM
deleted
Title: Re: Negative offset in structure
Post by: RuiLoureiro on June 24, 2014, 07:44:36 PM
Quote from: peter_asm on June 24, 2014, 07:54:04 AM
okay, thanks for feedback. I'll stick with EQU instead of using structure.
To answer why I wanted to do it, I was avoiding prologue/epilogue code for local stack variables.
Just playing around with the stack, that's all.
peter,
            it seems to be a good idea, but you never need a negative offset
            to access parameters or local variables. And you should use ESP.
            The problem is that you have to write one structure for each procedure.
            Using EQU seems to be easier.
            nidud gives you an example (you don't need «mov eax, esp»).
            See the example below
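
A minimal sketch of the ESP-plus-EQU approach described in this post, assuming a hypothetical stdcall routine AddTwo with two DWORD arguments and two DWORD locals (all names and the layout are illustrative, not taken from any post or attachment in this thread):

```asm
.386
.model flat, stdcall
option casemap:none

; offsets from ESP after "sub esp, LOCALS" (all non-negative)
LOCALS  EQU 8                 ; bytes reserved for locals
loc1    EQU 0                 ; first local
loc2    EQU 4                 ; second local
arg1    EQU LOCALS + 4        ; skip the locals and the return address
arg2    EQU LOCALS + 8

.code

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

AddTwo PROC a:DWORD, b:DWORD
    sub esp, LOCALS           ; make room for the locals, EBP stays untouched
    mov eax, [esp+arg1]
    mov [esp+loc1], eax       ; loc1 = first argument
    mov eax, [esp+arg2]
    mov [esp+loc2], eax       ; loc2 = second argument
    mov eax, [esp+loc1]
    add eax, [esp+loc2]       ; result in eax
    add esp, LOCALS           ; discard the locals
    ret 8                     ; stdcall: callee removes both arguments
AddTwo ENDP

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

start:
    push 2
    push 40
    call AddTwo               ; eax = 40 + 2
    ret
end start
```

The equates are only valid while ESP keeps the value it has after the sub; any push or pop inside the body shifts every offset by 4.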
Title: Re: Negative offset in structure
Post by: nidud on June 24, 2014, 10:03:15 PM
deleted
Title: Re: Negative offset in structure
Post by: FORTRANS on June 24, 2014, 10:33:33 PM

pre-P4 (SSE1)
------------------------------------------------------
3266213 cycles - 0: standard (ebp)
3268743 cycles - 1: no pro/epilogue (esp)
3272430 cycles - 2: no pro/epilogue (eax)

3270374 cycles - 0: standard (ebp)
3267652 cycles - 1: no pro/epilogue (esp)
3269672 cycles - 2: no pro/epilogue (eax)

3264680 cycles - 0: standard (ebp)
3267378 cycles - 1: no pro/epilogue (esp)
3288766 cycles - 2: no pro/epilogue (eax)

--- ok ---
Title: Re: Negative offset in structure
Post by: Gunther on June 24, 2014, 11:19:22 PM
Older AMD chip from my University Computer:

AMD Athlon(tm) Dual Core Processor 5000B (SSE3)
------------------------------------------------------
1697913 cycles - 0: standard (ebp)
1706533 cycles - 1: no pro/epilogue (esp)
1700249 cycles - 2: no pro/epilogue (eax)

1695385 cycles - 0: standard (ebp)
1712269 cycles - 1: no pro/epilogue (esp)
1693233 cycles - 2: no pro/epilogue (eax)

1693354 cycles - 0: standard (ebp)
1711493 cycles - 1: no pro/epilogue (esp)
1695662 cycles - 2: no pro/epilogue (eax)

--- ok ---


Gunther
Title: Re: Negative offset in structure
Post by: RuiLoureiro on June 24, 2014, 11:30:11 PM
nidud,
           yes, i know this (i did a lot of tests about it,
           for instance in the converters that i posted here;
           the results are elsewhere on the forum).
           But we gain ebp if we need it.
           Using eax or ebp seems to give the same result in your tests.
           Generally, i write standard versions, then i modify them
           to use ESP and i test them. More or less, it gives me
           the same result, but i gain ebp.
           It seems to me that your test is not a good/best test
           in this particular case (but ... ?).
            :t
Title: Re: Negative offset in structure
Post by: qWord on June 25, 2014, 12:10:49 AM
Just as a side note, JWasm has the option stackbase, which allows omitting EBP as the frame pointer. When ESP is modified, the equate @StackBase must be corrected accordingly:
option stackbase:esp

foo proc arg:DWORD
LOCAL x:DWORD,y:DWORD

mov eax,arg
mov x,eax

push eax
@StackBase = @StackBase + 4

mov y,-123

pop eax
@StackBase = @StackBase - 4

ret

foo endp

Unfortunately the INVOKE directive currently does not respect @StackBase, thus it can't be used with locals or procedure parameters.
Title: Re: Negative offset in structure
Post by: RuiLoureiro on June 25, 2014, 12:22:20 AM
qWord,
            interesting note !
            I use invoke in any case
            and i compute all things i want.  :t

nidud,
            i got this

***** Time table *****
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
341  cycles, ConvertFloat10DX, direct, fxam, fxtract, esp - 10 digits
341  cycles, ConvertFloat10DY, direct, examine, fxtract, esp - 10 digits
347  cycles, ConvertReal10DX, direct, fxam, fxtract, esp - 10 digits
356  cycles, ConvertReal10DY, direct, examine, fxtract, esp - 10 digits
356  cycles, ConvertReal10DR, direct, examine, fxtract, ebp - 10 digits
356  cycles, ConvertFloat10DR, direct, examine, fxtract, ebp - 10 digits
365  cycles, ConvertFloat10DRD, direct, examine, fxtract, ebp - 10 digits
365  cycles, ConvertReal10DRD, direct, examine, fxtract, ebp - 10 digits
411  cycles, ConvertFloat10DYD, direct, examine, fxtract, esp - 10 digits
414  cycles, ConvertFloat10DXD, direct, fxam, fxtract, esp - 10 digits
418  cycles, ConvertReal10DYD, direct, examine, fxtract, esp - 10 digits
423  cycles, ConvertReal10DXD, direct, fxam, fxtract, esp - 10 digits
475  cycles, ConvertFloat10DF, direct, examine, fyl2x, ebp - 10 digits
478  cycles, ConvertFloat10DS, direct, fxam, fyl2x, ebp - 10 digits
479  cycles, ConvertReal10DF, direct, examine, fyl2x, ebp - 10 digits
486  cycles, ConvertReal10DSD, direct, fxam, fyl2x, ebp - 10 digits
487  cycles, ConvertFloat10DFD, direct, examine, fyl2x, ebp - 10 digits
487  cycles, ConvertFloat10DSD, direct, fxam, fyl2x, ebp - 10 digits
489  cycles, ConvertReal10DS, direct, fxam, fyl2x, ebp - 10 digits
492  cycles, ConvertReal10DFD, direct, examine, fyl2x, ebp - 10 digits
716  cycles, ConvertFloat10CT, BCD-CT, fxam, fxtract, esp - 10 digits
724  cycles, ConvertFloat10BX, BCD, fxam, fxtract, ebp - 10 digits
729  cycles, ConvertReal10CTD, BCD-CT, fxam, fxtract, esp - 10 digits
731  cycles, ConvertReal10BX, BCD, fxam, fxtract, ebp - 10 digits
739  cycles, ConvertFloat10BY, BCD, examine, fxtract, esp - 10 digits
739  cycles, ConvertFloat10CTD, BCD-CT, fxam, fxtract, esp - 10 digits
742  cycles, ConvertReal10BFD, BCD, fxam, fxtract, esp - 10 digits
744  cycles, ConvertFloat10BF, BCD, fxam, fxtract, esp - 10 digits
744  cycles, ConvertReal10BYD, BCD, examine, fxtract, esp - 10 digits
745  cycles, ConvertFloat10BFD, BCD, fxam, fxtract, esp - 10 digits
745  cycles, ConvertFloat10BYD, BCD, examine, fxtract, esp - 10 digits
746  cycles, ConvertReal10BY, BCD, examine, fxtract, esp - 10 digits
749  cycles, ConvertReal10BXD, BCD, fxam, fxtract, ebp - 10 digits
750  cycles, ConvertFloat10BXD, BCD, fxam, fxtract, ebp - 10 digits
752  cycles, ConvertReal10BF, BCD, fxam, fxtract, esp - 10 digits
775  cycles, ConvertReal10CT, BCD-CT, fxam, fxtract, esp - 10 digits
1147  cycles, ConvertFloat10ZX, BCD - old - 10 digits
1154  cycles, ConvertFloat10Z, BCD -old - 10 digits
2704  cycles, ConvertFloat10DWD, direct,Save FPU, fxam, fxtract, esp -10 digits
2709  cycles, ConvertReal10DWD, direct,Save FPU, fxam, fxtract, esp -10 digits
2798  cycles, ConvertFloat10DW, direct,Save FPU, fxam, fxtract, esp -10 digits
2816  cycles, ConvertReal10DW, direct,Save FPU, fxam, fxtract, esp -10 digits
2989  cycles, ConvertFloat10BW, BCD, Save FPU, fxam, fxtract, ebp - 10 digits
2995  cycles, ConvertReal10BW, BCD, Save FPU, fxam, fxtract, ebp - 10 digits
3042  cycles, ConvertFloat10, BCD, Save FPU -old - 10 digits
3104  cycles, ConvertFloat10BWD, BCD, Save FPU, fxam, fxtract, ebp - 10 digits
3126  cycles, ConvertReal10BWD, BCD, Save FPU, fxam, fxtract, ebp - 10 digits
********** END **********
Title: Re: Negative offset in structure
Post by: jj2007 on June 25, 2014, 01:41:43 AM
Quote from: qWord on June 25, 2014, 12:10:49 AM
Just as a side note, JWasm has the option stackbase, which allows omitting EBP as the frame pointer.

Déjà vu, in a different context (http://www.masmforum.com/board/index.php?topic=11766.msg88926#msg88926). Locals relative to esp aren't any faster but [esp+n] is one byte longer than [ebp+n]. However, "frameless" seems a magic word for assembler programmers ;-)
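
The extra byte is the SIB byte that every ESP-based address needs. A small sketch with the encodings in comments, assuming 32-bit code and an 8-bit displacement (the program skeleton is only there to make it assemble):

```asm
.386
.model flat, stdcall
option casemap:none

.code
start:
    push ebp
    mov  ebp, esp            ; now ebp == esp
    mov  eax, [ebp+4]        ; 8B 45 04      (opcode, ModRM, disp8)
    mov  eax, [esp+4]        ; 8B 44 24 04   (opcode, ModRM, SIB, disp8)
    pop  ebp
    ret
end start
```

Both loads read the same dword here; the ESP form is one byte longer purely because of the addressing encoding.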
Title: Re: Negative offset in structure
Post by: dedndave on June 25, 2014, 01:45:26 AM
CPU manufacturers have optimized use of EBP for a stack frame

however, the MASM stack frames (default prologue and epilogue) are less than ideal
i often find myself writing my own stack frame code to get what i want   :P

i may use ESP, but only if i access the stack frame a few times
Title: Re: Negative offset in structure
Post by: nidud on June 25, 2014, 02:06:37 AM
deleted
Title: Re: Negative offset in structure
Post by: RuiLoureiro on June 25, 2014, 02:07:22 AM
These are the results:

Test for correctness:
Frame OFF: 123456       61728
Frame  ON: 123456       123456

Timings:
1831    cycles for 100*call stack_frame_on
1810    cycles for 100*stack_frame_OFF

Code sizes:
Frame on:       136
Frame off:      142
--- ok ---  :t
Title: Re: Negative offset in structure
Post by: dedndave on June 25, 2014, 02:53:29 AM
a well-managed stack frame is a thing of beauty   :biggrin:

        OPTION  PROLOGUE:None
        OPTION  EPILOGUE:None

FrameFunc PROC dwArg1:DWORD,dwArg2:DWORD

;-----------------------------------------------------

;use TEXTEQU to create labels
;if you want to add or remove locals, just change the offsets here
;it also gives you a nice "graphic" view of the overall stack frame

_dwArg2    TEXTEQU <dword ptr [ebp+28]>  ;dwArg2
_dwArg1    TEXTEQU <dword ptr [ebp+24]>  ;dwArg1
;                             [ebp+20]   ;RETurn address
;                             [ebp+16]   ;preserved EBX
;                             [ebp+12]   ;preserved ESI
;                             [ebp+8]    ;preserved EDI
_dwEaxRet  TEXTEQU <dword ptr [ebp+4]>   ;return value for EAX
;                             [ebp]      ;preserved EBP
_dwLocal1  TEXTEQU <dword ptr [ebp-4]>   ;dwLocal1
_dwLocal2  TEXTEQU <dword ptr [ebp-8]>   ;dwLocal2
_dwLocal3  TEXTEQU <dword ptr [ebp-12]>  ;dwLocal3
_lpBuff1   TEXTEQU <dword ptr [ebp-16]>  ;lpBuff1
_lpBuff2   TEXTEQU <dword ptr [ebp-20]>  ;lpBuff2

;-----------------------------------------------------

        xor     edx,edx
        push    ebx
        push    esi
        push    edi
        push    edx
        push    ebp
        mov     ebp,esp

;code to calculate the initial value of dwLocal1 in EAX

        push    eax                      ;[EBP-4] = _dwLocal1

;code to calculate the initial value of dwLocal2 in EAX

        push    eax                      ;[EBP-8] = _dwLocal2

;code to calculate the initial value of dwLocal3 in EAX

        push    eax                      ;[EBP-12] = _dwLocal3

;create placeholders for the local buffer pointers

        push    edx                      ;[EBP-16] = _lpBuff1
        push    edx                      ;[EBP-20] = _lpBuff2

;now, we want to calculate the size of the "buffers"
;they might be string buffers, structures, or local arrays
;their sizes should create 4-aligned addresses

        mov     edx,esp

;code to calculate the size of Buff1 in EAX

        sub     edx,eax
        mov     _lpBuff1,edx             ;_lpBuff1 = address of Buff1

;code to calculate the size of Buff2 in EAX

        sub     edx,eax
        mov     _lpBuff2,edx             ;_lpBuff2 = address of Buff2

;probe the stack down to create the buffers

        ASSUME  FS:Nothing

stack_probe_loop:
        push    eax
        mov     esp,fs:[8]               ;FS:[8] = TEB.StackLimit
        cmp     edx,esp
        jb      stack_probe_loop

        ASSUME  FS:ERROR

        mov     esp,edx

;at this point, the stack frame is all set up - no need to use "[ebp-xx]"
;the local dwords may be accessed by using labels like "_dwLocal1"
;the local buffer address may be accessed by using labels like "_lpBuff1"
;the value to be returned in EAX may be accessed by "_dwEaxRet"


;body of function code


;when it comes time to exit, just execute LEAVE
;that discards all the local dwords and buffers
;the return value for EAX is set up for exit
;EBP, EDI, ESI, and EBX are restored

        leave
        pop     eax
        pop     edi
        pop     esi
        pop     ebx
        ret     8

FrameFunc ENDP

        OPTION  PROLOGUE:PrologueDef
        OPTION  EPILOGUE:EpilogueDef


EDIT: of course, if you have a fixed-size buffer, you can add that before the variable-sized ones, as well   :P
Title: Re: Negative offset in structure
Post by: RuiLoureiro on June 25, 2014, 03:15:55 AM
Hey Dave,
                 we have more things to do  :biggrin:
Title: Re: Negative offset in structure
Post by: RuiLoureiro on June 25, 2014, 03:21:51 AM
Quote from: nidud on June 25, 2014, 02:06:37 AM
It depends on what the aim of the test was, but since it included negative values I assumed locals were used, and arguments were used, so you have to take into account usage as well as calling convention
note: i have no negative values.

My point of view is this:
i prefer to test a particular, well-defined procedure or piece of
code that we need for something, rather than to get a general
rule (?) from a general test...
It seems that you prefer to use your tests to show what
you want to show.
This is why i don't follow your point of view.

Title: Re: Negative offset in structure
Post by: Zen on June 25, 2014, 03:55:11 AM
DAVE !!!
Really appreciate the "a well-managed stack frame is a thing of beauty" (http://masm32.com/board/index.php?topic=3326.msg35046#msg35046) post. :icon_cool:
I've always wondered how to identify stack frames,...assuming that I could actually find the stack,... :biggrin:
Title: Re: Negative offset in structure
Post by: dedndave on June 25, 2014, 04:20:55 AM
this one adds a fixed-size buffer (or structure, array)

        OPTION  PROLOGUE:None
        OPTION  EPILOGUE:None

FrameFunc PROC dwArg1:DWORD,dwArg2:DWORD

;-----------------------------------------------------

;use TEXTEQU to create labels
;if you want to add or remove locals, just change the offsets here
;it also gives you a nice "graphic" view of the overall stack frame

_dwArg2    TEXTEQU <dword ptr [ebp+28]>  ;dwArg2
_dwArg1    TEXTEQU <dword ptr [ebp+24]>  ;dwArg1
;                             [ebp+20]   ;RETurn address
;                             [ebp+16]   ;preserved EBX
;                             [ebp+12]   ;preserved ESI
;                             [ebp+8]    ;preserved EDI
_dwEaxRet  TEXTEQU <dword ptr [ebp+4]>   ;return value for EAX
;                             [ebp]      ;preserved EBP
_dwLocal1  TEXTEQU <dword ptr [ebp-4]>   ;dwLocal1
_dwLocal2  TEXTEQU <dword ptr [ebp-8]>   ;dwLocal2
_dwLocal3  TEXTEQU <dword ptr [ebp-12]>  ;dwLocal3
_lpBuff2   TEXTEQU <dword ptr [ebp-16]>  ;lpBuff2
_lpBuff3   TEXTEQU <dword ptr [ebp-20]>  ;lpBuff3
_Buff1     TEXTEQU           <[ebp-60]>  ;Buff1 = 40 byte fixed size buffer

;-----------------------------------------------------

        xor     edx,edx
        push    ebx
        push    esi
        push    edi
        push    edx
        push    ebp
        mov     ebp,esp

;code to calculate the initial value of dwLocal1 in EAX

        push    eax                      ;[EBP-4] = _dwLocal1

;code to calculate the initial value of dwLocal2 in EAX

        push    eax                      ;[EBP-8] = _dwLocal2

;code to calculate the initial value of dwLocal3 in EAX

        push    eax                      ;[EBP-12] = _dwLocal3

;create placeholders for the variable-size local buffer pointers

        push    edx                      ;[EBP-16] = _lpBuff2
        push    edx                      ;[EBP-20] = _lpBuff3

;now, we want to calculate the size of the "buffers"
;they might be string buffers, structures, or local arrays
;their sizes should create 4-aligned addresses

        lea     edx,_Buff1               ;start with the address of the lowest fixed-size buffer

;code to calculate the size of Buff2 in EAX

        sub     edx,eax
        mov     _lpBuff2,edx             ;_lpBuff2 = address of Buff2

;code to calculate the size of Buff3 in EAX

        sub     edx,eax
        mov     _lpBuff3,edx             ;_lpBuff3 = address of Buff3

;probe the stack down to create the buffers

        ASSUME  FS:Nothing

stack_probe_loop:
        push    eax
        mov     esp,fs:[8]               ;FS:[8] = TEB.StackLimit
        cmp     edx,esp
        jb      stack_probe_loop

        ASSUME  FS:ERROR

        mov     esp,edx

;at this point, the stack frame is all set up - no need to use "[ebp-xx]"
;the local dwords may be accessed by using labels like "_dwLocal1"
;the local fixed-size buffer may be addressed directly by _Buff1, perhaps using LEA
;the local variable-size buffer addresses may be accessed by using labels like "_lpBuff2"
;the value to be returned in EAX may be accessed by "_dwEaxRet"


;body of function code


;when it comes time to exit, just execute LEAVE
;that discards all the local dwords and buffers
;the return value for EAX is set up for exit
;EBP, EDI, ESI, and EBX are restored

        leave
        pop     eax
        pop     edi
        pop     esi
        pop     ebx
        ret     8

FrameFunc ENDP

        OPTION  PROLOGUE:PrologueDef
        OPTION  EPILOGUE:EpilogueDef
Title: Re: Negative offset in structure
Post by: nidud on June 25, 2014, 05:16:10 AM
deleted
Title: Re: Negative offset in structure
Post by: nidud on June 25, 2014, 05:31:12 AM
deleted
Title: Re: Negative offset in structure
Post by: RuiLoureiro on June 25, 2014, 05:49:28 AM
Yes, i see it and i see your posts, nidud.
but i don't follow your solution either,
though there is nothing wrong with it. It seems that you
didn't see what i wrote, but you do
exactly what you want to do.

EDIT: i got some results in milliseconds
         where esp is better.
         But that tells me nothing, as i said
         before. That's all.
Title: Re: Negative offset in structure
Post by: nidud on June 25, 2014, 07:24:46 AM
deleted
Title: Re: Negative offset in structure
Post by: jj2007 on June 25, 2014, 08:27:01 AM
Quote from: nidud on June 25, 2014, 02:06:37 AM
Using ESP is slower than using other register but the call is faster if EBP is not used as base pointer.
No difference here, except for fastcall, of course:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
------------------------------------------------------
3235176 cycles - 0: standard (ebp)
3231541 cycles - 1: no pro/epilogue (esp)
3229054 cycles - 2: no pro/epilogue (eax)
2223710 cycles - 3: fastcall(eax,edx)


btw your code builds only with JWasm.
Title: Re: Negative offset in structure
Post by: nidud on June 25, 2014, 09:25:56 AM
deleted
Title: Re: Negative offset in structure
Post by: nidud on June 25, 2014, 10:28:18 AM
deleted
Title: Re: Negative offset in structure
Post by: hutch-- on June 25, 2014, 07:24:26 PM
I would not worry too much about FASTCALL, it's always been easy enough to do your own. For those old enough to have written DLLs in Win16, you had very little stack space to play with, so one of the many tricks, apart from passing in registers, was to allocate a number of globals that you could write data to and then call the DLL function. Worked fine and was fast enough.  :biggrin:
Title: Re: Negative offset in structure
Post by: Gunther on June 25, 2014, 07:25:52 PM
Quote from: hutch-- on June 25, 2014, 07:24:26 PM
I would not worry too much about FASTCALL, it's always been easy enough to do your own. For those old enough to have written DLLs in Win16, you had very little stack space to play with, so one of the many tricks, apart from passing in registers, was to allocate a number of globals that you could write data to and then call the DLL function. Worked fine and was fast enough.  :biggrin:

Right.  :t

Gunther
Title: Re: Negative offset in structure
Post by: nidud on June 25, 2014, 08:19:58 PM
deleted
Title: Re: Negative offset in structure
Post by: jj2007 on June 25, 2014, 10:18:34 PM
Fastcall is an issue for compilers. In assembler, you just use the registers you like. If we were forced to adopt one here, it should be M$, because Masm32 is clearly a Windows library. But we are not forced to adopt one, and why pick a stupid one that uses ecx instead of eax, the register in which Windows returns its results?
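
A minimal sketch of "just use the registers you like", assuming a hypothetical routine AddRegs that takes its two arguments in eax and edx and returns the result in eax (the name and the register choice are illustrative, not a convention from this thread):

```asm
.386
.model flat, stdcall
option casemap:none

.code

; hypothetical routine: arguments in EAX and EDX, result in EAX
AddRegs PROC
    add eax, edx             ; no stack access, no prologue or epilogue
    ret
AddRegs ENDP

start:
    mov  eax, 40
    mov  edx, 2
    call AddRegs             ; eax = 40 + 2
    ret
end start
```

Because the PROC declares no parameters and no locals, the assembler emits no frame code, so the call costs only the call/ret pair.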
Title: Re: Negative offset in structure
Post by: nidud on June 26, 2014, 12:14:35 AM
deleted
Title: Re: Negative offset in structure
Post by: hutch-- on June 26, 2014, 01:41:13 AM
From what I can tell, the Win64 version of FASTCALL is a leftover from clapped out RISC compiler theory where some hardware had a large collection of registers to play with. It has always seemed ridiculous to me that RISC theory is being taught when the majority of computers around the world are CISC, and while you can waste registers in x86-64 simulating RISC, it comes at the cost of trying to make x86 like processors that are long dead.

The old STDCALL was infinitely extendable in terms of call depth, being purely stack based, and the only effective limit was stack memory. With register passing you run out of registers real fast with a high argument count, and it also limits techniques like recursion unless you supplement the register base with stack memory. In a common algo like a quick sort, you can easily get depths of thousands, and while the better designs avoid this, you just cannot write a recursive algo with registers alone.

I have tended to see Win64 as much the same mess as the old Win16: obscure and untidy function calling mechanisms. Win32 was designed by the old VAX guys and it was free of most of the Microsoft YUK, but it appears that with Microsoft doing the design of what is another hybrid OS version (Win7 and 8), they have gone back to badly designed and obscure calling methods.

It will probably not be until you have a full long mode 64 bit OS version that you will have a chance to write clean code like Win32 again; do everything in 64 bit on hardware that has enough memory (Win64 does not have that yet) and you will get faster, cleaner code for it. To do this well the OS will need, in the future, terabyte volumes of memory, and the hardware is not yet up to this.

Think in terms of 4K video running at 60 frames a second and in the future faster again, 120 or higher and you are going to start to need far more powerful hardware than is currently available. I can run 1080 at 60 frames a second OK on my old quad (monitor does not support higher) and current video looks very good but the future is going to be much faster. By this time current 64 bit Windows will be old junk, something like all the fanfare that surrounded the introduction of the i386 so long ago.