Author Topic: CreateThread overhead  (Read 14874 times)

mikeburr

  • Member
  • **
  • Posts: 189
Re: CreateThread overhead
« Reply #45 on: April 23, 2021, 01:18:33 PM »
xchg rax, rsp                      ; antique junk
this is going to lock the bus ...
regards mikeb

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 10583
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: CreateThread overhead
« Reply #46 on: April 23, 2021, 01:51:28 PM »
 :biggrin:

> Is that just your opinion, or can you prove it, maybe with a crispy example of crashing code?

No, I will just use yours. Having to match pushes and pops leaves the code open to alignment errors. Try 3 pushes and pops.

It should look like this, not the macros but the underlying mnemonics.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include64\masm64rt.inc

    .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

 entry_point proc

    call tst
    .exit

 entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

PROCALIGN                                                   ; align proc with no stack frame

tst proc

    USING rsi, rdi, r12, r13                                ; list regs to be saved
    LOCAL pbuf  :QWORD                                      ; buffer pointer
    LOCAL buff[128]:BYTE                                    ; buffer

    SaveRegs                                                ; save listed regs

    mov pbuf, ptr$(buff)                                    ; get pointer to buffer

    mov rsi, 1                                              ; write something to 4 regs
    mov rdi, 2
    mov r12, 3
    mov r13, 4

    mcat pbuf, str$(rsi)," ",str$(rdi)," ", \               ; convert and join 4 strings
               str$(r12)," ",str$(r13)

    rcall MessageBox,0,pbuf,"MASM64",MB_ICONINFORMATION     ; call the MessageBox function

    RestoreRegs                                             ; restore listed regs
    ret

tst endp

STACKFRAME                                                  ; restore default stack frame

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    end


Disasm.

; IN
 mov qword ptr [rbp+0x80], rsi
 mov qword ptr [rbp+0x88], rdi
 mov qword ptr [rbp+0x90], r12
 mov qword ptr [rbp+0x98], r13

; OUT

 mov rsi, qword ptr [rbp+0x80]
 mov rdi, qword ptr [rbp+0x88]
 mov r12, qword ptr [rbp+0x90]
 mov r13, qword ptr [rbp+0x98]


Look MUM, no stack twiddling.  :tongue:
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

jj2007

  • Member
  • *****
  • Posts: 13957
  • Assembly is fun ;-)
    • MasmBasic
Re: CreateThread overhead
« Reply #47 on: April 23, 2021, 05:10:33 PM »
> xchg rax, rsp                      ; antique junk
> this is going to lock the bus ...
> regards mikeb

Not correct, Mike - study the docs. The instruction is certainly slow. It will delay this proc, which prints debug output to the console, by a few cycles. Right now I am too lazy to calculate whether that's on the order of nano- or picoseconds...
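
To make the distinction concrete: going by the Intel manuals, xchg asserts the lock only when one of its operands is in memory; a register-to-register form such as xchg rax, rsp has nothing to lock. A rough C sketch of the memory case follows. It assumes a C11 compiler targeting x86-64, which typically (not necessarily) turns atomic_exchange into the memory form of xchg. The name swap_with_memory is invented for this illustration.

```c
#include <stdatomic.h>

/* Memory operand: on x86-64 this is typically compiled to
   xchg [mem], reg, which is implicitly locked.
   (Assumption about typical compiler output, not a guarantee.) */
int swap_with_memory(atomic_int *slot, int value)
{
    /* Stores value into *slot and returns what was there before. */
    return atomic_exchange(slot, value);
}
```

Swapping two values that both live in registers needs no such guarantee, which is why the register form is not a bus-locking operation.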

> :biggrin:
>
> > Is that just your opinion, or can you prove it, maybe with a crispy example of crashing code?
>
> No, I will just use yours. Having to match pushes and pops leaves the code open to alignment errors. Try 3 pushes and pops.

This is a library proc. Contrary to what you seem to believe, I am perfectly able to calculate the number of pushes required to maintain the 16-byte alignment. In this context, and only in this context, pushing the regs is the best way to save them all.
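
The arithmetic behind that claim can be sketched in a few lines of C. This is a rough model, not the library code: it assumes the Windows x64 convention that rsp is 16-byte aligned just before a call, so its low hex digit is 8 on entry to the callee, and that each push moves rsp down by 8 bytes. The names rsp_after_pushes and is_call_aligned are made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: value of rsp after a number of 8-byte pushes. */
static uint64_t rsp_after_pushes(uint64_t rsp_on_entry, unsigned pushes)
{
    return rsp_on_entry - 8u * (uint64_t)pushes;
}

/* True if rsp is on a 16-byte boundary, as required before a call. */
static bool is_call_aligned(uint64_t rsp)
{
    return (rsp % 16u) == 0;
}
```

With an entry value whose low hex digit is 8, an odd number of pushes leaves rsp aligned for the next call and an even number does not, provided nothing else adjusts rsp in between. A frame set up with push rbp counts as one of the pushes.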

Btw you didn't prove that my code could crash. You didn't because it cannot crash.

> xchg rax, rsp                      ; antique junk
>  or eax, 0xffffffff
>  cdq                                ; more antique junk

Simple C examples and their Assembly output from GCC 4.9.0
foo(int, int):
  mov eax, edi
  cdq
  idiv  esi
  ret

foo(int, int, int):
  mov eax, edi
  mov ecx, edx
  cdq
  idiv  esi
  cdq
  idiv  ecx
  ret
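
For readers without the sources at hand: the first listing is what one would expect from a plain signed division. The C below is an assumed reconstruction (the original source is not shown in this thread), together with a small model of what cdq does, namely copying the sign bit of eax into every bit of edx so that idiv sees a 64-bit dividend in edx:eax. The name cdq_model is invented for this sketch.

```c
#include <stdint.h>

/* Assumed source for the two-argument listing: signed divide. */
int foo(int a, int b)
{
    return a / b;
}

/* Model of cdq: edx becomes all ones if eax is negative, else zero. */
uint32_t cdq_model(int32_t eax)
{
    return eax < 0 ? 0xFFFFFFFFu : 0u;
}
```

Without that sign extension, idiv would treat whatever happened to be in edx as the upper half of the dividend, so a signed 32-bit divide needs cdq or an equivalent.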

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 10583
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: CreateThread overhead
« Reply #48 on: April 23, 2021, 06:41:07 PM »
 :biggrin:

> Btw you didn't prove that my code could crash. You didn't because it cannot crash.

Unless you only use 3 instead of four pushes or pops. Manual stack twiddling is dangerous unreliable code and you should know that by now.

> Simple C examples and their Assembly output from GCC 4.9.0

Now you are trying to make me laugh, taking your instruction reference from a C compiler and GCC at that.

jj2007

  • Member
  • *****
  • Posts: 13957
  • Assembly is fun ;-)
    • MasmBasic
Re: CreateThread overhead
« Reply #49 on: April 23, 2021, 06:54:14 PM »
> :biggrin:
>
> > Btw you didn't prove that my code could crash. You didn't because it cannot crash.
>
> Unless you only use 3 instead of four pushes or pops. Manual stack twiddling is dangerous unreliable code and you should know that by now.

I do know that, and I am able to count to three.

> cdq                                ; more antique junk
>
> > Simple C examples and their Assembly output from GCC 4.9.0
>
> Now you are trying to make me laugh, taking your instruction reference from a C compiler and GCC at that.

I am so sorry that I mentioned this crappy Open Sauce compiler - my apologies! Since you are so unhappy that I don't refer to the true and only Microsoft C compiler:

VS2017 compiler emitting 2 division instructions for a division/remainder pair
00007FF790061FA0 42 8B 04 1F          mov         eax,dword ptr [rdi+r11] 
00007FF790061FA4 99                   cdq 
00007FF790061FA5 F7 7E 28             idiv        eax,dword ptr [rsi+28h] 
00007FF790061FA8 4C 63 D0             movsxd      r10,eax 
00007FF790061FAB 42 8B 04 1F          mov         eax,dword ptr [rdi+r11] 
00007FF790061FAF 99                   cdq 
00007FF790061FB0 F7 7E 28             idiv        eax,dword ptr [rsi+28h] 

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 10583
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: CreateThread overhead
« Reply #50 on: April 23, 2021, 09:04:07 PM »
 :biggrin:

> I am so sorry that I mentioned this crappy Open Sauce compiler

No, you mentioned "crappy Open Sauce compiler". I referred to a C compiler AND GCC at that.

Since when did assembler programmers use a C compiler as their reference for writing assembler? You may find the Intel manuals a lot more informative.

You can keep avoiding the obvious: you are trying to use an unreliable technique left over from Win32. In Win64 you need to leave this old junk behind and write modern x64 code, not clapped-out, unreliable hybrids left over from Win32.

jj2007

  • Member
  • *****
  • Posts: 13957
  • Assembly is fun ;-)
    • MasmBasic
Re: CreateThread overhead
« Reply #51 on: April 23, 2021, 09:06:41 PM »
You called cdq "antique junk", and I demonstrated that both GCC and Microsoft Visual C use what you call "antique junk" :cool:

> you are trying to use an unreliable technique left over from Win32

There is nothing unreliable about the technique I am using in the jdebP procedure - in the hands of the expert. As stated earlier, this is a library function. Newbies are not allowed to touch it.

Once upon a time, Steve Hutchesson was proud that assembler programmers could use different techniques than the dumb C compilers.

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 10583
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: CreateThread overhead
« Reply #52 on: April 23, 2021, 09:56:30 PM »
 :biggrin:

I still fail to see why you are preaching the virtues of aping C compilers when the code you are defending is ancient junk.

Don't expect that something is good code simply because it is in a C compiler's output. Over time they have produced their fair share of crap code, as it gets immortalised in each generation of compiler and rarely ever gets changed.

> Once upon a time, Steve Hutchesson was proud that assembler programmers could use different techniques than the dumb C compilers.

Seems you have not learnt that lesson and want to keep aping the junky end of C compiler output.

There are a couple of things that you need to change: abandon old junk instructions and only use the fast stuff, AND stop trying to marry Win32 STDCALL and x64. Use only Win64 FASTCALL, where you stop modifying the stack.

jj2007

  • Member
  • *****
  • Posts: 13957
  • Assembly is fun ;-)
    • MasmBasic
Re: CreateThread overhead
« Reply #53 on: April 23, 2021, 10:06:42 PM »
Take it easy, Hutch :biggrin:

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 10583
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: CreateThread overhead
« Reply #54 on: April 23, 2021, 10:48:47 PM »
 :skrewy:

nidud

  • Member
  • *****
  • Posts: 2388
    • https://github.com/nidud/asmc
Re: CreateThread overhead
« Reply #55 on: April 24, 2021, 12:48:42 AM »
deleted
« Last Edit: February 26, 2022, 04:47:17 AM by nidud »

jj2007

  • Member
  • *****
  • Posts: 13957
  • Assembly is fun ;-)
    • MasmBasic
Re: CreateThread overhead
« Reply #56 on: April 24, 2021, 01:18:39 AM »
You are kidding, nidud. I suggest you read the manual of the push instruction :cool:

nidud

  • Member
  • *****
  • Posts: 2388
    • https://github.com/nidud/asmc
Re: CreateThread overhead
« Reply #57 on: April 24, 2021, 01:43:28 AM »
deleted
« Last Edit: February 26, 2022, 04:47:29 AM by nidud »

jj2007

  • Member
  • *****
  • Posts: 13957
  • Assembly is fun ;-)
    • MasmBasic
Re: CreateThread overhead
« Reply #58 on: April 24, 2021, 02:13:46 AM »
I stand corrected, Nidud - congrats :thumbsup:

Yes, this stuff is almost 5 years old, and I had forgotten what a daredevil I was in May 2016 :greensml:

So, my advice: don't use the JBasic deb macro in the release version of your programs :cool:

Still, I'd be curious to see what exactly happens when an interrupt takes over, and finds rsp in the .data? section :rolleyes:

regsave db 512+reqb dup(?) ; fxsave: only 416 bytes overwritten by CPU

.CODE
jdebP proc
  mov QWORD ptr regsave, rsp
  lea rsp, regsave+reqb    ; <<<<<<<<<<<<<<< put rsp in the .data? section!

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 10583
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: CreateThread overhead
« Reply #59 on: April 24, 2021, 09:56:37 AM »
 :biggrin:

> I'd be curious to see what exactly happens when an interrupt takes over

Aha, yet another unreliable technique in the field of contraception.  :skrewy:

What happened to that rock solid UASMBasic?  :tongue: