CreateThread overhead

Started by jj2007, January 31, 2021, 12:56:21 PM


mikeburr

xchg rax, rsp                      ; antique junk
this is going to lock the bus ...
regards mikeb

hutch--

 :biggrin:

> Is that just your opinion, or can you prove it, maybe with a crispy example of crashing code?

No, I will just use yours. Having to match pushes and pops leaves the code open to alignment errors. Try 3 pushes and pops.

It should look like this, not the macros but the underlying mnemonics.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include64\masm64rt.inc

    .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

    call tst
    .exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

PROCALIGN                                                   ; align proc with no stack frame

tst proc

    USING rsi, rdi, r12, r13                                ; list regs to be saved
    LOCAL pbuf  :QWORD                                      ; buffer pointer
    LOCAL buff[128]:BYTE                                    ; buffer

    SaveRegs                                                ; save listed regs

    mov pbuf, ptr$(buff)                                    ; get pointer to buffer

    mov rsi, 1                                              ; write something to 4 regs
    mov rdi, 2
    mov r12, 3
    mov r13, 4

    mcat pbuf, str$(rsi)," ",str$(rdi)," ", \               ; convert and join 4 strings
               str$(r12)," ",str$(r13)

    rcall MessageBox,0,pbuf,"MASM64",MB_ICONINFORMATION     ; call the MessageBox function

    RestoreRegs                                             ; restore listed regs
    ret

tst endp

STACKFRAME                                                  ; restore default stack frame

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    end


Disasm.

; IN
mov qword ptr [rbp+0x80], rsi
mov qword ptr [rbp+0x88], rdi
mov qword ptr [rbp+0x90], r12
mov qword ptr [rbp+0x98], r13

; OUT

mov rsi, qword ptr [rbp+0x80]
mov rdi, qword ptr [rbp+0x88]
mov r12, qword ptr [rbp+0x90]
mov r13, qword ptr [rbp+0x98]


Look MUM, no stack twiddling.  :tongue:

jj2007

Quote from: mikeburr on April 23, 2021, 01:18:33 PM
xchg rax, rsp                      ; antique junk
this is going to lock the bus ...
regards mikeb

Not correct, Mike - study the docs. The instruction is certainly slow. It will delay this proc, which prints debug output to the console, by a few cycles. Right now I am too lazy to calculate whether that's in the order of nano- or picoseconds...
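
For context on the bus-lock question: as far as the Intel manuals describe it, xchg only carries implicit LOCK semantics when one of its operands is in memory; the register-to-register form does not. A minimal sketch in plain ml64 syntax, where `slot` and `xchg_forms` are names made up for illustration and are not part of jdebP:

```asm
; Sketch only: plain ml64 syntax, no MASM64 SDK macros.
; 'slot' and 'xchg_forms' are hypothetical names.
    .data
slot dq 0

    .code
xchg_forms proc
    xchg rax, rdx               ; register-register: no implicit LOCK
    xchg rax, qword ptr slot    ; memory operand: LOCK semantics apply implicitly
    ret
xchg_forms endp

    end
```

So `xchg rax, rsp` falls in the first category: it may be slow, but it is not a locked operation.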

Quote from: hutch-- on April 23, 2021, 01:51:28 PM
:biggrin:

> Is that just your opinion, or can you prove it, maybe with a crispy example of crashing code?

No, I will just use yours. Having to match pushes and pops leaves the code open to alignment errors. Try 3 pushes and pops.

This is a library proc. Contrary to what you seem to believe, I am perfectly able to calculate the number of pushes required to maintain the 16-byte alignment. In this context, and only in this context, pushing the regs is the best way to save them all.

Btw you didn't prove that my code could crash. You didn't because it cannot crash.
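
The counting argument can be sketched as follows, assuming the Win64 convention that rsp is 16-byte aligned immediately before every call (so it is 8 bytes off on entry) and a proc without a stack frame. This is an illustration in plain ml64 syntax, not the library code under discussion; `callee` and `three_pushes` are hypothetical names:

```asm
; Sketch only: frameless proc, no SDK macros.
; 'callee' and 'three_pushes' are hypothetical names.
    .code

callee proc
    ret
callee endp

three_pushes proc
                        ; on entry: rsp = 16n + 8 (return address on stack)
    push rsi            ; rsp = 16n
    push rdi            ; rsp = 16n - 8
    push r12            ; rsp = 16n - 16, 16-byte aligned again
    sub rsp, 32         ; shadow space for the callee, alignment unchanged
    call callee
    add rsp, 32
    pop r12             ; pops mirror the pushes in reverse order
    pop rdi
    pop rsi
    ret
three_pushes endp

    end
```

Under the same assumptions, four pushes would leave rsp 8 bytes off, so that variant would need `sub rsp, 40` (or one extra dummy push) before the call.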

Quote from: hutch-- on April 23, 2021, 08:15:08 AM
xchg rax, rsp                      ; antique junk
or eax, 0xffffffff
cdq                                ; more antique junk

Simple C examples and their Assembly output from GCC 4.9.0
foo(int, int):
  mov eax, edi
  cdq
  idiv  esi
  ret

foo(int, int, int):
  mov eax, edi
  mov ecx, edx
  cdq
  idiv  esi
  cdq
  idiv  ecx
  ret

hutch--

 :biggrin:

> Btw you didn't prove that my code could crash. You didn't because it cannot crash.

Unless you only use 3 instead of four pushes or pops. Manual stack twiddling is dangerous unreliable code and you should know that by now.

> Simple C examples and their Assembly output from GCC 4.9.0

Now you are trying to make me laugh, taking your instruction reference from a C compiler and GCC at that.

jj2007

Quote from: hutch-- on April 23, 2021, 06:41:07 PM
:biggrin:

> Btw you didn't prove that my code could crash. You didn't because it cannot crash.

Unless you only use 3 instead of four pushes or pops. Manual stack twiddling is dangerous unreliable code and you should know that by now.

I do know that, and I am able to count to three.

Quote from: hutch-- on April 23, 2021, 08:15:08 AM
cdq                                ; more antique junk

> Simple C examples and their Assembly output from GCC 4.9.0

Now you are trying to make me laugh, taking your instruction reference from a C compiler and GCC at that.

I am so sorry that I mentioned this crappy Open Sauce compiler - my apologies! Since you are so unhappy that I don't refer to the true and only Microsoft C compiler:

VS2017 compiler emitting 2 division instructions for a division/remainder pair
00007FF790061FA0 42 8B 04 1F          mov         eax,dword ptr [rdi+r11] 
00007FF790061FA4 99                   cdq 
00007FF790061FA5 F7 7E 28             idiv        eax,dword ptr [rsi+28h] 
00007FF790061FA8 4C 63 D0             movsxd      r10,eax 
00007FF790061FAB 42 8B 04 1F          mov         eax,dword ptr [rdi+r11] 
00007FF790061FAF 99                   cdq 
00007FF790061FB0 F7 7E 28             idiv        eax,dword ptr [rsi+28h] 
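
What the listing illustrates: cdq sign-extends eax into edx:eax, and a single idiv then leaves the quotient in eax and the remainder in edx, so the second cdq/idiv pair repeats work already done. A hand-written sketch in plain ml64 syntax, assuming the Win64 argument registers (ecx = dividend, edx = divisor); `div_pair` is a hypothetical name:

```asm
; Sketch only: one idiv yields both quotient and remainder.
; 'div_pair' is a hypothetical name.
    .code
div_pair proc
    mov r8d, edx        ; keep the divisor, cdq is about to overwrite edx
    mov eax, ecx        ; dividend into eax
    cdq                 ; sign-extend eax into edx:eax
    idiv r8d            ; eax = quotient, edx = remainder
    ret                 ; quotient returned in eax, remainder still in edx
div_pair endp

    end
```

As with any idiv, a zero divisor or an overflowing quotient raises a divide error, so a real routine would need to guard against those inputs.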

hutch--

 :biggrin:

> I am so sorry that I mentioned this crappy Open Sauce compiler

No, you mentioned "crappy Open Sauce compiler". I referred to a C compiler AND GCC at that.

Since when did assembler programmers use a C compiler as their reference for writing assembler? You may find the Intel manuals a lot more informative.

You can keep avoiding the obvious, that you are trying to use an unreliable technique left over from Win32, but in Win64 you need to leave this old junk behind and write modern x64 code, not clapped-out unreliable hybrids left over from Win32.

jj2007

You called cdq "antique junk", and I demonstrated that both GCC and Microsoft Visual C use what you call "antique junk" :cool:

Quote from: hutch-- on April 23, 2021, 09:04:07 PM
you are trying to use an unreliable technique left over from Win32

There is nothing unreliable about the technique I am using in the jdebP procedure - in the hands of the expert. As stated earlier, this is a library function. Newbies are not allowed to touch it.

Once upon a time, Steve Hutchesson was proud that assembler programmers could use different techniques than the dumb C compilers.

hutch--

 :biggrin:

I still fail to see why you are preaching the virtues of aping C compilers when the code you are defending is ancient junk.

Don't expect that simply because something is in a C compiler output that it's good code. Over time they have produced their fair share of crap code as it gets immortalised in each generation of compiler and rarely ever gets changed.

> Once upon a time, Steve Hutchesson was proud that assembler programmers could use different techniques than the dumb C compilers.

Seems you have not learnt that lesson and want to keep aping the junky end of C compiler output.

There are a couple of things that you need to change: abandon old junk instructions and only use the fast stuff, AND stop trying to marry Win32 STDCALL and x64 and only use Win64 FASTCALL, where you stop modifying the stack.


nidud

#55
deleted

jj2007

You are kidding, nidud. I suggest you read the manual of the push instruction :cool:

nidud

#57
deleted

jj2007

I stand corrected, Nidud - congrats :thumbsup:

Yes, this stuff is almost 5 years old, and I had forgotten what a daredevil I was in May 2016 :greensml:

So, my advice: don't use the JBasic deb macro in the release version of your programs :cool:

Still, I'd be curious to see what exactly happens when an interrupt takes over, and finds rsp in the .data? section :rolleyes:

regsave db 512+reqb dup(?) ; fxsave: only 416 bytes overwritten by CPU

.CODE
jdebP proc
  mov QWORD ptr regsave, rsp
  lea rsp, regsave+reqb    ; <<<<<<<<<<<<<<< put rsp in the .data? section!

hutch--

 :biggrin:

> I'd be curious to see what exactly happens when an interrupt takes over

Aha, yet another unreliable technique in the field of contraception.  :skrewy:

What happened to that rock solid UASMBasic ?  :tongue: