Author Topic: CreateThread overhead  (Read 2162 times)

jj2007

  • Member
  • *****
  • Posts: 11434
  • Assembler is fun ;-)
    • MasmBasic
Re: CreateThread overhead
« Reply #60 on: April 24, 2021, 11:20:38 AM »
What happened to your explorative spirit of assembly programming? Nowadays scared of new techniques, and of the code police waiting for you behind the next corner?

Here is what I found on interrupts and the role of the stack; it's very little but it's sufficiently clear that the exotic technique I used in the JBasic debugging proc (jdebP) will not pose any problems. Btw JBasic (my dual 64-/32-bit assembly framework for MASM, UAsm and AsmC) has almost nothing to do with MasmBasic (32-bit only and really rock solid :biggrin:) - they share some syntax but are otherwise independent of each other.

A bit detailed info about interrupts in Windows
Discussion in 'Operating Systems' started by mbk1969, Jan 4, 2019
Quote
When a hardware exception or interrupt is generated, the processor records enough machine state on the kernel stack of the thread that’s interrupted to return to that point in the control flow and continue execution as if nothing had happened. If the thread was executing in user mode, Windows switches to the thread’s kernel-mode stack. Windows then creates a trap frame on the kernel stack of the interrupted thread into which it stores the execution state of the thread.

Is it valid to write below ESP?
Quote
Hardware interrupts can't use the user stack; that would let user-space crash the kernel with mov esp, 0, or worse take over the kernel by having another thread in the user-space process modify return addresses while an interrupt handler was running. This is why kernels always configure things so interrupt context is pushed onto the kernel stack.
...
So the question becomes: is there anything on Windows that can asynchronously run code using the user-space stack between two arbitrary instructions? (i.e. any equivalent to a Unix signal handler.)

As far as we can tell, SEH (Structured Exception Handling) is the only real obstacle to what you propose for user-space code on current 32 and 64-bit Windows.
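
For comparison, the Unix case mentioned in the quote can be sketched in C. This is only an illustration, not Windows code. It assumes a platform where the stack grows downward and where the handler runs on the interrupted thread's own stack (no alternate signal stack). The names `on_signal` and `handler_runs_below_caller` are made up for this sketch; `signal` and `raise` are standard C. `raise` delivers the signal synchronously here, so this shows where the handler frame lands, not a truly asynchronous interruption.

```c
#include <assert.h>
#include <signal.h>
#include <stdint.h>

/* Address of a local inside the handler, recorded for later inspection.
   Assumption: storing a uintptr_t from a handler is fine on this platform. */
static volatile uintptr_t handler_frame_addr = 0;

static void on_signal(int signo)
{
    volatile int marker = signo;              /* lives in the handler's frame */
    handler_frame_addr = (uintptr_t)&marker;
}

/* Returns 1 if the handler's local sat at a lower address than a local of
   the function that was running when the signal arrived, else 0. */
int handler_runs_below_caller(void)
{
    volatile int caller_local = 0;
    void (*previous)(int) = signal(SIGINT, on_signal);

    assert(previous != SIG_ERR);
    raise(SIGINT);                            /* handler runs before raise returns */
    signal(SIGINT, previous);                 /* put the old disposition back */

    return handler_frame_addr != 0 &&
           handler_frame_addr < (uintptr_t)&caller_local;
}
```

Where those assumptions hold, the function should return 1: the handler's frame is built below the interrupted code's locals, which is the reason data parked below the stack pointer (past any red zone) can be overwritten on such systems.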

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 8317
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: CreateThread overhead
« Reply #61 on: April 24, 2021, 11:41:32 AM »
 :biggrin:

> What happened to your explorative spirit of assembly programming? Nowadays scared of new techniques, and the code police behind the next corner?

There are two things here: the creative genius of assembly language programmers and the hardware you are using to run it. To get creative genius off the ground, you first have to make the hardware happy by not feeding it garbage, and this is filtered through the operating system.

Once you get the hardware and the OS happy, THEN and only THEN do you start to develop the "explorative spirit of assembly programming" free of the hardware and OS victimising you for feeding it crap.
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

jj2007

Re: CreateThread overhead
« Reply #62 on: April 24, 2021, 11:45:38 AM »
Perfect. Do you have any evidence that the hardware might be unhappy if you use mov esp, 0 in a user mode procedure? We could write a little test, of course, in case somebody is really interested in the subject.

Raymond Chen's "Why do we even need to define a red zone? Can’t I just use my stack for anything?" is a good read for the scared among us (with some broken links, he's becoming sloppy). I feel tempted to write a demo but have too many other things on my plate ;-)

P.S.: I couldn't resist, here is the proggie (source and 32-/64-bit executables attached):
Code:
include \Masm32\MasmBasic\Res\JBasic.inc ; ## console demo, builds in 32- or 64-bit mode with UAsm, ML, AsmC ##
usedeb=1 ; 1=use the deb macro
.code
DefProc SayHi proc argString:SIZE_P, argDword, argDouble:REAL8 ; #1 is a pointer size argument
Local v1, v2:REAL8, rc:RECT
  deb 4, "debug:", v1, rc.right, argDword, argDouble, $argString        ; this calls the dangerous jdebP!!!
  mov rsp, 12345 ; ***** CODE POLICE, where are you??? *****
  ret
SayHi endp
Init ; OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
  PrintLine Chr$("This program was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format.")
  xor ebx, ebx
@@:
jinvoke Sleep, 100
jinvoke SetConsoleCursorPosition, rv(GetStdHandle, STD_OUTPUT_HANDLE), 040000h ; Print At(0, 4)
Print Str$("** ct=%i **\n", rbx)
jinvoke SayHi, Chr$("Argument #1 is a string"), 222222222, FP8(33333.33333)
inc ebx
cmp ebx, 99
jbe @B
  Inkey "The hardware seems happy with mov rsp, 12345"
EndOfCode

Output:
Code:
This program was assembled with ml64 in 64-bit format.

** ct=99 **
debug:
v1      0
rc.right        0
argDword        222222222
argDouble       33333.333330000
$argString      Argument #1 is a string

The hardware seems happy with mov rsp, 12345

hutch--

Re: CreateThread overhead
« Reply #63 on: April 24, 2021, 04:27:40 PM »
There are a couple of code police that just won't listen to you: the CPU will tell you, with absolutely no negotiation, that you have tried to use an invalid opcode. The OS is even nastier: no matter what you want from it, if you stuff something up it will say horrible things about your code and make you look like a jerk.

Now other mere mortals from time to time try to help others out when it comes to problems of that type, and they then become the code police if they have the audacity to try to help others understand that the CPU and OS have boundaries which both will enforce.

When I see you peddling crap coding techniques that have a number of preconditions attached to them, like having to push in pairs to maintain alignment, you are peddling the JJ ABI, not the real one.

For years I had to deal with donkeys going HE HAW HE HAW because they promoted ignoring the Win32 Intel ABI, since that may have worked on an old Win9x version, but when the arse fell out of their code they just disappeared.

jj2007

Re: CreateThread overhead
« Reply #64 on: April 24, 2021, 07:18:45 PM »
When you have no proof and no arguments, what's your solution? Insults...

hutch--

Re: CreateThread overhead
« Reply #65 on: April 24, 2021, 08:14:55 PM »
 :biggrin:

Stop trying to pull my leg. While you may float the idea of the JJABI, as long as it contains crap code it is not worth listening to. I know you can do better, as I have seen your JJWASMBASIC, which was robust and reliable, but this dodgy, unreliable 64-bit code is not up to scratch yet.

jj2007

Re: CreateThread overhead
« Reply #66 on: April 24, 2021, 09:07:57 PM »
> while you may float the idea of the JJABI, as long as it contains crap code

I don't float the idea of a JJABI. All my 64-bit code is absolutely ABI-compliant, with the exception of the jdebP proc that makes an unorthodox use of rsp. Since it's a leaf proc and obviously in userland, this usage of rsp is not a problem, as demonstrated with the proggie posted above.

hutch--

Re: CreateThread overhead
« Reply #67 on: April 24, 2021, 09:25:16 PM »
There is a simple solution to this hiatus you have arrived at: rewrite it so it's not dodgy, unreliable code. You can defend crap like that as being some form of exception, but it needs to be changed, and you can leave 32-bit STDCALL behind forever in 64-bit code.

hutch--

Re: CreateThread overhead
« Reply #68 on: April 25, 2021, 12:14:10 PM »
JJ,

Here is your Christmas present for the year before last, or perhaps the year before that. Each rerun of the pair, SaveEmAll and RestoreEmAll, will overwrite the previous result.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include64\masm64rt.inc

  ; --------------------------------------------------

    AllocRegSpace MACRO

      IsLoaded@@@@ equ (1)

      .data?
      .rax dq ?
      .rbx dq ?
      .rcx dq ?
      .rdx dq ?
      .rsi dq ?
      .rdi dq ?
      .r8  dq ?
      .r9  dq ?
      .r10 dq ?
      .r11 dq ?
      .r12 dq ?
      .r13 dq ?
      .r14 dq ?
      .r15 dq ?
      .rbp dq ?
      .rsp dq ?

      .code

    ENDM

  ; --------------------------------------------------

    SaveEmAll MACRO

      IFNDEF IsLoaded@@@@
        AllocRegSpace
      ENDIF

      mov .rax, rax
      mov .rbx, rbx
      mov .rcx, rcx
      mov .rdx, rdx
      mov .rsi, rsi
      mov .rdi, rdi
      mov .r8,  r8
      mov .r9,  r9
      mov .r10, r10
      mov .r11, r11
      mov .r12, r12
      mov .r13, r13
      mov .r14, r14
      mov .r15, r15
      mov .rbp, rbp
      mov .rsp, rsp
    ENDM

  ; --------------------------------------------------

    RestoreEmAll MACRO

      IFNDEF IsLoaded@@@@
        echo Register Space Not Allocated
        .err                    ; raise the error before leaving the macro
        exitm
      ENDIF

      mov rax, .rax
      mov rbx, .rbx
      mov rcx, .rcx
      mov rdx, .rdx
      mov rsi, .rsi
      mov rdi, .rdi
      mov r8,  .r8
      mov r9,  .r9
      mov r10, .r10
      mov r11, .r11
      mov r12, .r12
      mov r13, .r13
      mov r14, .r14
      mov r15, .r15
      mov rbp, .rbp
      mov rsp, .rsp

    ENDM

  ; --------------------------------------------------

    .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

 entry_point proc

    SaveEmAll
    RestoreEmAll

    SaveEmAll

    conout str$(rax),lf
    conout str$(rbx),lf
    conout str$(rcx),lf
    conout str$(rdx),lf
    conout str$(rsi),lf
    conout str$(rdi),lf
    conout str$(r8),lf
    conout str$(r9),lf
    conout str$(r10),lf
    conout str$(r11),lf
    conout str$(r12),lf
    conout str$(r13),lf
    conout str$(r14),lf
    conout str$(r15),lf
    conout str$(rbp),lf
    conout str$(rsp),lf,lf,lf

    mov rax, 1234
    mov rbx, 5678
    mov rcx, 9012
    mov rdx, 3456           ; change register content
    mov rsi, 7890
    mov rdi, 1234
    add rsp, rbp
    rol rdx, 4

    RestoreEmAll

    conout str$(rax),lf
    conout str$(rbx),lf
    conout str$(rcx),lf
    conout str$(rdx),lf
    conout str$(rsi),lf
    conout str$(rdi),lf
    conout str$(r8),lf
    conout str$(r9),lf
    conout str$(r10),lf
    conout str$(r11),lf
    conout str$(r12),lf
    conout str$(r13),lf
    conout str$(r14),lf
    conout str$(r15),lf
    conout str$(rbp),lf
    conout str$(rsp),lf,lf,lf

    waitkey

    .exit

 entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    end
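
The "each rerun overwrites the previous result" behaviour comes from using one global save area. A minimal C model of that idea is sketched below. It is not the macro itself: `RegFile`, `save_all` and `restore_all` are hypothetical names, and the register file is simulated with an array instead of touching real registers.

```c
#include <stdint.h>

enum { REG_COUNT = 16 };                  /* rax..r15 in the macro above */

/* Simulated register file; index 0 plays the role of rax, and so on. */
typedef struct {
    uint64_t r[REG_COUNT];
} RegFile;

/* One global slot, standing in for the .data? block of the macro. */
static RegFile g_saved;

/* Copy the whole simulated register file into the global slot. */
void save_all(const RegFile *cpu)
{
    g_saved = *cpu;
}

/* Copy the global slot back; only the most recent save is available. */
void restore_all(RegFile *cpu)
{
    *cpu = g_saved;
}
```

Calling `save_all` twice and then `restore_all` brings back the second snapshot only. Since the slot is shared by the whole process, two threads using it at the same time would also overwrite each other's values in this model.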

jj2007

Re: CreateThread overhead
« Reply #69 on: April 25, 2021, 12:43:40 PM »
Wow, a bit bloated but nonetheless impressive! I love global variables :thumbsup:

Code:
48:8905 38210000         | mov [14000349C],rax      |
48:891D 39210000         | mov [1400034A4],rbx      |
48:890D 3A210000         | mov [1400034AC],rcx      |
48:8915 3B210000         | mov [1400034B4],rdx      |
48:8935 3C210000         | mov [1400034BC],rsi      |
48:893D 3D210000         | mov [1400034C4],rdi      |
4C:8905 3E210000         | mov [1400034CC],r8       |
4C:890D 3F210000         | mov [1400034D4],r9       |
4C:8915 40210000         | mov [1400034DC],r10      |
4C:891D 41210000         | mov [1400034E4],r11      |
4C:8925 42210000         | mov [1400034EC],r12      |
4C:892D 43210000         | mov [1400034F4],r13      |
4C:8935 44210000         | mov [1400034FC],r14      |
4C:893D 45210000         | mov [140003504],r15      |
48:892D 46210000         | mov [14000350C],rbp      |
48:8925 47210000         | mov [140003514],rsp      |

nidud

  • Member
  • *****
  • Posts: 2143
    • https://github.com/nidud/asmc
Re: CreateThread overhead
« Reply #70 on: April 25, 2021, 09:01:26 PM »
There is, however, built-in functionality in the OS to capture the context of the CPU. The most common instruction to do this is int 3.

    int 3 ; capture the context
    foo() ; change the context..
    int 3 ; capture the context

Here's a sample that captures the breakpoints and copies the current context to a double buffer.
Code:
include windows.inc
include stdio.inc

EH_STACK_INVALID    equ 08h
EH_NONCONTINUABLE   equ 01h
EH_UNWINDING        equ 02h
EH_EXIT_UNWIND      equ 04h
EH_NESTED_CALL      equ 10h

    .data
    pContext PCONTEXT 0
    pCont_id dword 0

    .code

except proc uses rsi rdi ExceptionRecord   : PEXCEPTION_RECORD,
                         EstablisherFrame  : ptr dword,
                         ContextRecord     : PCONTEXT,
                         DispatcherContext : LPDWORD

    mov eax,[rcx].EXCEPTION_RECORD.ExceptionFlags
    .switch
    .case eax & EH_UNWINDING
    .case eax & EH_EXIT_UNWIND
    .case eax & EH_STACK_INVALID
    .case eax & EH_NONCONTINUABLE
        .return ExceptionContinueSearch
    .case eax & EH_NESTED_CALL
        ExitProcess(1)
    .case pContext == NULL
    .case [rcx].EXCEPTION_RECORD.ExceptionCode != STATUS_BREAKPOINT
        .return ExceptionContinueSearch
    .endsw

    imul eax,pCont_id,CONTEXT
    xor pCont_id,1
    mov rdi,pContext
    add rdi,rax
    mov rsi,r8
    mov ecx,CONTEXT
    rep movsb
    inc [r8].CONTEXT._Rip
    mov eax,ExceptionContinueExecution
    ret

except endp

main proc frame:except

  local Context[2]:CONTEXT

    mov pContext,&Context

    int 3
    printf("Registers changed:\n")
    int 3
    for reg,<Rax,Rcx,Rdx,Rbx,Rsp,Rbp,Rsi,Rdi,R8,R9,R10,R11,R12,R13,R14,R15>
      .if Context._&reg& != Context[CONTEXT]._&reg&
          printf(" &reg&:\t%016I64X, %016I64X\n", Context._&reg&, Context[CONTEXT]._&reg& )
      .endif
      endm
    ret

main endp

    end

Registers changed:
 Rax:   000000000014F550, 0000000000000013
 Rcx:   0000000000000001, 000000000014F017
 Rdx:   0000000000525D40, 0000000000000014
 R8:    0000000000525DD0, 000000000014EF68
 R9:    000000005965AB9F, 000000000014F428
 R10:   000000000014F6F0, 0000000000000000
 R11:   0000000000000004, 0000000000000246
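
The double-buffer bookkeeping in the handler above (copy into slot 0 or 1, then flip the index) can be sketched in portable C. This is a model only: `Snapshot` is a hypothetical stand-in for CONTEXT, and `capture` and `count_changed` are names invented for the sketch, not part of any Windows API.

```c
#include <stdint.h>
#include <string.h>

enum { REG_COUNT = 16 };

/* Hypothetical stand-in for CONTEXT: just the general-purpose registers. */
typedef struct {
    uint64_t reg[REG_COUNT];
} Snapshot;

typedef struct {
    Snapshot slot[2];
    unsigned next;                  /* 0 or 1: slot the next capture fills */
} DoubleBuffer;

/* Store the current state in the active slot, then flip to the other one. */
void capture(DoubleBuffer *db, const Snapshot *current)
{
    memcpy(&db->slot[db->next], current, sizeof *current);
    db->next ^= 1u;
}

/* Count the registers whose values differ between the two slots. */
int count_changed(const DoubleBuffer *db)
{
    int changed = 0;
    for (int i = 0; i < REG_COUNT; ++i) {
        if (db->slot[0].reg[i] != db->slot[1].reg[i])
            ++changed;
    }
    return changed;
}
```

After two captures the buffer holds a before and an after picture, and a loop like `count_changed` plays the role of the register comparison in main.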

jj2007

Re: CreateThread overhead
« Reply #71 on: April 26, 2021, 02:05:23 AM »
That looks nice, nidud :thumbsup:

However, throwing exceptions to save some regs might be a bit of an overkill. For the 64-bit version of the deb macro, it can be done in 72 bytes (48 for the 32-bit version):
Code:
jRegSave$ equ <#rax#rcx#rdx#rbx#rsp#rbp#rsi#rdi#r8 #r9 #r10#r11#r12#r13#r14#r15#>
jdebP proc export
  mov DefSize ptr regsave, rsp ; save the current stack pointer
  lea rsp, regsave+reqb ; regsave: db 512+18*8 dup(?)
  fxsave [rsp+reqbX] ; take care of xmmregs
  is=(1+@64)*32 ; 32 for 32-bit, 64 for 64-bit code
  WHILE is gt 1
jRS3$ SUBSTR jRegSave$, is-2, 3
push jRS3$ ; push all regs to the .data? section
is=is-4 ; #r15 -> #r14 ... #rax
  ENDM
  mov rax, [rsp-2*DefSize] ; old rsp
  mov rdx, [rax]
  mov [rsp+4*DefSize], rax ; rsp
  add DefSize ptr [rsp+4*DefSize], DefSize ; correct for ret addr
  xchg rax, rsp
  sub rdx, 5 ; size of a call
  mov [rax-DefSize], rdx ; rip
  ret
jdebP endp

The macro posted by Hutch above needs 67% more space for the same job. Plus, it needs that space for every use of deb, while the unorthodox jdebP solution costs 72 bytes once, plus only 6 bytes for every call.
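
The "once versus every use" trade-off can be put into numbers with a small C sketch. The figures are assumptions read off this thread: 72 bytes for the 64-bit jdebP body, 6 bytes per call, and 16 stores of 7 bytes each for one SaveEmAll expansion, as counted from the disassembly in Reply #69 (RestoreEmAll is not included). The function names are invented for the sketch.

```c
#include <stddef.h>

/* Sizes taken from the posts above; treat them as assumptions. */
enum {
    PROC_BODY_BYTES    = 72,   /* jdebP, 64-bit version, paid once       */
    PROC_CALL_BYTES    = 6,    /* cost per use of deb, as stated         */
    INLINE_STORE_BYTES = 7,    /* one "mov [mem], reg" in the listing    */
    INLINE_STORE_COUNT = 16    /* registers stored by SaveEmAll          */
};

/* Total code bytes if the save sequence is expanded inline at every use. */
size_t inline_macro_bytes(size_t uses)
{
    return uses * (size_t)INLINE_STORE_BYTES * (size_t)INLINE_STORE_COUNT;
}

/* Total code bytes if one shared procedure is called from every use. */
size_t shared_proc_bytes(size_t uses)
{
    if (uses == 0)
        return 0;              /* procedure not needed at all */
    return (size_t)PROC_BODY_BYTES + uses * (size_t)PROC_CALL_BYTES;
}
```

With these inputs a single use comes to 112 bytes inline against 78 bytes for the shared procedure, and ten uses come to 1120 against 132, so the gap widens with every additional use.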

daydreamer

  • Member
  • *****
  • Posts: 1586
  • building nextdoor
Re: CreateThread overhead
« Reply #72 on: April 26, 2021, 08:42:47 PM »
I thought it was only me trying to make a 1k demo, but now I'm curious, Jochen: is a 1k or 4k demo possible in 64-bit, now that I see your code size reduction in 64-bit??? :greenclp:
SIMD fan and macro fan
why assembly is fastest is because its switch has no (brakes) breaks
:P

jj2007

Re: CreateThread overhead
« Reply #73 on: April 26, 2021, 08:59:31 PM »
I don't practice size reduction, daydreamer, except when I can do it :badgrin: