The MASM Forum

General => The Campus => Topic started by: kcvinu on June 09, 2024, 07:59:47 AM

Title: How to optimize execution speed
Post by: kcvinu on June 09, 2024, 07:59:47 AM
Hi all,
I started my masm64 journey with a DLL that creates a simple window. The DLL is called from Python with the ctypes library. So far so good. But it takes almost 3 times longer than my other DLL, which was created in the C3 programming language. For those of you not familiar with C3, it's a C-like language. The C3 DLL takes 11-15 ms, but the masm64 DLL takes 40-45 ms. What should I do to get optimal speed?
Here is my asm code.
; Program : A dll to make a window from python.
; Author : kcvinu

include C:\masm64\include64\masm64rt.inc

.data?
hInstance     dq ?
hIcon         dq ?
hCursor       dq ?
hBrush        dq ?
     
.data
  classname db "KCV_Window",0
  caption db "മാസം വിൻഡൊ", 0   ; This is my native language, MALAYALAM! (roughly "MASM Window")

.CODE

; This will be exported.
NewForm Proc
    mov hInstance, rv(GetModuleHandle,0)
    mov hIcon,     rv(LoadIcon,hInstance,10)
    mov hCursor,   rv(LoadCursor,0,IDC_ARROW)
    mov hBrush,    rv(CreateSolidBrush,00EEEEEEh)
    mov rax, rv(makeWindow)
    RET
NewForm endp

LibMain proc instance:DWORD, reason:DWORD, unused:DWORD
    ret
LibMain endp

makeWindow proc
    LOCAL wc      :WNDCLASSEX
   
    mov wc.cbSize,         SIZEOF WNDCLASSEX
    mov wc.style,          CS_BYTEALIGNCLIENT or CS_BYTEALIGNWINDOW
    mov wc.lpfnWndProc,    ptr$(WndProc)
    mov wc.cbClsExtra,     0
    mov wc.cbWndExtra,     0
    mrm wc.hInstance,      hInstance
    mrm wc.hIcon,          hIcon
    mrm wc.hCursor,        hCursor
    mrm wc.hbrBackground,  hBrush
    mov wc.lpszMenuName,   0
    mov wc.lpszClassName,  ptr$(classname)
    mrm wc.hIconSm,        hIcon

    invoke RegisterClassEx, ADDR wc
   
    invoke CreateWindowEx, 0, \
                          ADDR classname, addr caption, \
                          WS_OVERLAPPEDWINDOW or WS_VISIBLE,\
                          100, 100, 500, 400, 0,0,hInstance,0   
    ret
makeWindow endp

; This function also exported
showForm proc handle:HWND
    invoke ShowWindow, handle, 5
    invoke UpdateWindow, handle
    call msgloop
    ret
showForm endp

; Shamelessly copied from Examples section
msgloop proc
    LOCAL msg    :MSG
    LOCAL pmsg   :QWORD
    mov pmsg, ptr$(msg)                     ; get the msg structure address
    jmp gmsg                                ; jump directly to GetMessage()
  mloop:
    invoke TranslateMessage,pmsg
    invoke DispatchMessage,pmsg
  gmsg:
    test rax, rv(GetMessage,pmsg,0,0,0)     ; loop until GetMessage returns zero
    jnz mloop
    ret
msgloop endp

WndProc proc hWin:QWORD,uMsg:QWORD,wParam:QWORD,lParam:QWORD
    .switch uMsg
        .case WM_DESTROY
            invoke PostQuitMessage, NULL
    .endsw
    invoke DefWindowProc, hWin, uMsg, wParam, lParam
    ret
WndProc endp
End

This is my command to assemble and link.
ml64 /c /nologo py1.asm && link /ENTRY:LibMain /DLL /DEF:py1.def /OUT:py1.dll py1.obj
Please feel free to ask about anything I missed in this post. I didn't include attachments; please comment if you want my files.
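The link command above references py1.def, whose contents aren't shown in the post. A minimal module-definition file for this DLL could look like the sketch below; the export list is an assumption based on the source comments that mark NewForm and showForm as exported.

```
; py1.def - module-definition file for py1.dll (sketch)
; NewForm creates the window, showForm runs the message loop
LIBRARY py1
EXPORTS
    NewForm
    showForm
```

LibMain is not listed because it is only the entry point named with /ENTRY, not a function Python calls.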
 
Title: Re: How to optimize execution speed
Post by: NoCforMe on June 09, 2024, 08:03:27 AM
Is the C3 DLL 32-bit or 64-bit?
Title: Re: How to optimize execution speed
Post by: kcvinu on June 09, 2024, 08:09:10 AM
Hi NoCforMe,
The C3 dll is 64 bit.
Title: Re: How to optimize execution speed
Post by: NoCforMe on June 09, 2024, 08:28:40 AM
It looks like it should be plenty fast.
The only things I can think of are the calls to LoadIcon(), LoadCursor() and CreateSolidBrush(), all of which will take some time. (The other DLL undoubtedly has to call LoadCursor()).
Does the other DLL load an icon?
Title: Re: How to optimize execution speed
Post by: kcvinu on June 09, 2024, 08:36:40 AM
Quote: Does the other DLL load an icon?

Yes! And it does more work, since it is a full working library.
Title: Re: How to optimize execution speed
Post by: NoCforMe on June 09, 2024, 08:47:37 AM
Just shooting in the dark here:
You can probably eliminate the calls to ShowWindow() and UpdateWindow(). I've found these to be completely unnecessary, as long as you include the WS_VISIBLE style when you create the window.
Title: Re: How to optimize execution speed
Post by: kcvinu on June 09, 2024, 08:54:41 AM
Let me try that. But I have a question. I tested without those two functions and it ran without any problem, but it was a single window with no controls. What if we create some controls after creating the window handle? Do we then need to call ShowWindow and UpdateWindow?
Title: Re: How to optimize execution speed
Post by: NoCforMe on June 09, 2024, 09:02:51 AM
I don't think so, but very easy to find out:
I would code it without those calls and see if the controls show up. Again, so long as they have the WS_VISIBLE style, they should show up right away.
If not, just put a call to UpdateWindow() in. But that shouldn't be necessary.

I've noticed that a lot of programmers tend to be superstitious about this, and I see needless calls like these sprinkled here and there, like some kind of charm or spell ...
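As a concrete example of a control that should show up without extra calls: a push button created as a child of the main window with WS_CHILD or WS_VISIBLE. This is a sketch in the same masm64 style; the procedure addButton, the data labels btnClass/btnText, the position and size, and the control ID 101 are all hypothetical, and hParent is assumed to be the handle returned by CreateWindowEx.

```asm
.data
  btnClass db "BUTTON", 0          ; predefined system class for buttons
  btnText  db "Click me", 0

.code
; Hypothetical helper: create a push button inside hParent.
; WS_CHILD makes it a child control, WS_VISIBLE makes it appear
; together with its parent without ShowWindow/UpdateWindow.
addButton proc hParent:QWORD
    invoke CreateWindowEx, 0, ADDR btnClass, ADDR btnText, \
                           WS_CHILD or WS_VISIBLE or BS_PUSHBUTTON, \
                           20, 20, 120, 30, \
                           hParent, 101, hInstance, 0
    ; for a child window the hMenu slot carries the control ID (101 here)
    ret                            ; RAX = button handle, or 0 on failure
addButton endp
```

Call it with the main window's handle right after CreateWindowEx succeeds, saving that handle first, because addButton returns the button's handle in RAX.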
Title: Re: How to optimize execution speed
Post by: zedd151 on June 09, 2024, 10:48:31 AM
One thing caught my eye, kcvinu...
; This will be exported.   
NewForm Proc
    mov hInstance, rv(GetModuleHandle,0)
    mov hIcon,    rv(LoadIcon,hInstance,10)
    mov hCursor,  rv(LoadCursor,0,IDC_ARROW)
    mov hBrush,    rv(CreateSolidBrush,00EEEEEEh)
    mov rax, rv(makeWindow)
    RET
NewForm endp

How often is this called? Are you constantly creating the same brush, or loading the same icon?
Or is that call only made once? I assume only called once, but I have to ask to be sure.
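If NewForm could be called more than once, one way to avoid reloading the icon and cursor and creating a new brush every time is to guard the one-time setup with a flag. A sketch, assuming a hypothetical initDone variable and otherwise the same names as the first post:

```asm
.data?
  initDone dq ?                     ; hypothetical flag: 0 until the one-time setup has run

.code
NewForm proc
    .if initDone == 0               ; first call only: load resources, create the brush
        mov hInstance, rv(GetModuleHandle,0)
        mov hIcon,     rv(LoadIcon,hInstance,10)
        mov hCursor,   rv(LoadCursor,0,IDC_ARROW)
        mov hBrush,    rv(CreateSolidBrush,00EEEEEEh)
        mov initDone,  1
    .endif
    ; makeWindow still runs on every call. After the first call its
    ; RegisterClassEx fails with ERROR_CLASS_ALREADY_EXISTS, which is
    ; harmless: CreateWindowEx keeps using the class registered first.
    invoke makeWindow               ; RAX = new window handle, or 0
    ret
NewForm endp
```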

Quote: ; Shamelessly copied from Examples section
:biggrin: Many of us have used code from there at one point or another. Totally legit usage.


Another thing I noticed: are you sure these arguments are all DWORDs?
LibMain proc instance:DWORD, reason:DWORD, unused:DWORD
    ret
LibMain endp
And no return value?
I often see 'mov rax, 1' or 'mov rax, TRUE' before the return.

From a masm64 sdk example
LibMain proc instance:QWORD,reason:QWORD,unused:QWORD

    .if reason == DLL_PROCESS_ATTACH
      mov rax, TRUE                         ; return TRUE so DLL will start

    .elseif reason == DLL_PROCESS_DETACH

    .elseif reason == DLL_THREAD_ATTACH

    .elseif reason == DLL_THREAD_DETACH

    .endif

    ret

LibMain endp
The .if block can be omitted since we always want the dll to start.
Title: Re: How to optimize execution speed
Post by: kcvinu on June 09, 2024, 11:19:10 AM
@sudoku,
Thanks for the reply.

Quote: How often is this called?

Only once, right before registering the window. And FYI, I just omitted the entry-point function with /NOENTRY.
Title: Re: How to optimize execution speed
Post by: zedd151 on June 09, 2024, 11:42:16 AM
If you are only calling it once, I don't see how 30-ish milliseconds makes much difference overall. It would be much different if you were making repeated calls to it (where the extra waiting time is cumulative), in my opinion.

Maybe the coding gurus here have a way to speed up your code, or can offer other suggestions.
Title: Re: How to optimize execution speed
Post by: kcvinu on June 09, 2024, 11:49:43 AM
Okay. I have made some changes as per your suggestions and now ended up with this.
; Program : A dll to make a window from python.
; Author : kcvinu

include C:\masm64\include64\masm64rt.inc

.data?
hInstance     dq ?
hIcon         dq ?
hCursor       dq ?
hBrush        dq ?
     
.data
  classname db "KCV_Window", 0
  caption db "Just a window", 0

.CODE

registerClass proc
    LOCAL wc      :WNDCLASSEX
    mov hInstance, rv(GetModuleHandle, 0)
    mov hIcon,     rv(LoadIcon,hInstance, 10)
    mov hCursor,   rv(LoadCursor, 0, IDC_ARROW)
    mov hBrush,    rv(CreateSolidBrush,00EEEEEEh)
   
    mov wc.cbSize,         SIZEOF WNDCLASSEX
    mov wc.style,          CS_OWNDC or CS_HREDRAW or CS_VREDRAW or CS_BYTEALIGNCLIENT or CS_BYTEALIGNWINDOW
    mov wc.lpfnWndProc,    ptr$(WndProc)
    mov wc.cbClsExtra,     0
    mov wc.cbWndExtra,     0
    mrm wc.hInstance,      hInstance
    mrm wc.hIcon,          hIcon
    mrm wc.hCursor,        hCursor
    mrm wc.hbrBackground,  hBrush
    mov wc.lpszMenuName,   0
    mov wc.lpszClassName,  ptr$(classname)
    mrm wc.hIconSm,        hIcon
    invoke RegisterClassEx, ADDR wc
    ret
registerClass endp

NewForm Proc
    invoke registerClass
    invoke CreateWindowEx, 0, \
                          ADDR classname, addr caption, \
                          WS_OVERLAPPEDWINDOW or WS_VISIBLE,\
                          100, 100, 500, 400, 0,0,hInstance,0

    ret
NewForm endp

LibMain proc instance:QWORD,reason:QWORD,unused:QWORD
    .if reason == DLL_PROCESS_ATTACH
        mov rax, TRUE                         ; return TRUE so DLL will start
    .endif
    ret
LibMain endp


showForm proc
    LOCAL msg    :MSG
    LOCAL pmsg   :QWORD
    mov pmsg, ptr$(msg)                     ; get the msg structure address
    jmp gmsg                                ; jump directly to GetMessage()
  mloop:
    invoke TranslateMessage,pmsg
    invoke DispatchMessage,pmsg
  gmsg:
    test rax, rv(GetMessage,pmsg,0,0,0)     ; loop until GetMessage returns zero
    jnz mloop
    ret
showForm endp

WndProc proc hWin:QWORD,uMsg:QWORD,wParam:QWORD,lParam:QWORD
    .switch uMsg
        .case WM_DESTROY
            invoke PostQuitMessage, NULL
    .endsw
    invoke DefWindowProc, hWin, uMsg, wParam, lParam
    ret
WndProc endp
End

And this is the command:
ml64 /nologo /c py1.asm && link /ENTRY:LibMain /DLL /DEF:py1.def /nologo /SUBSYSTEM:windows /OUT:py1.dll py1.obj
Now it takes 30+ ms.
Title: Re: How to optimize execution speed
Post by: zedd151 on June 09, 2024, 11:54:29 AM
30+ milliseconds relative to the first version, or 30 ish milliseconds total?

Try only
--------------
mov rax, TRUE
ret
--------------

Without the .if statement... in LibMain
Title: Re: How to optimize execution speed
Post by: NoCforMe on June 09, 2024, 11:57:55 AM
Quote from: kcvinu on June 09, 2024, 11:49:43 AM
; Program : A dll to make a window from python.
LibMain proc instance:QWORD,reason:QWORD,unused:QWORD
    .if reason == DLL_PROCESS_ATTACH
        mov rax, TRUE                        ; return TRUE so DLL will start
    .endif
    ret
LibMain endp
Just one little thing: what is RAX if reason != DLL_PROCESS_ATTACH?
Answer: undefined.
You might want to set it to zero in that case. (Remember, you're going to return something no matter what, meaning whatever is in RAX.)
Title: Re: How to optimize execution speed
Post by: NoCforMe on June 09, 2024, 11:59:08 AM
Quote from: sudoku on June 09, 2024, 11:54:29 AM
30+ milliseconds relative to the first version, or 30 ish milliseconds total?

Try only
--------------
mov rax, TRUE
ret
--------------

Without the .if statement... in LibMain
Oh, come on: you can't be serious.
A simple comparison isn't going to take any time at all.
Title: Re: How to optimize execution speed
Post by: zedd151 on June 09, 2024, 12:06:36 PM
Quote from: NoCforMe on June 09, 2024, 11:59:08 AM
Oh, come on: you can't be serious.
A simple comparison isn't going to take any time at all.
While I cannot speak for every .dll ever written, all of my qEditor plugins use

mov eax, TRUE ; analogous to 'mov rax, TRUE' here...
ret

With no detrimental effects. Saves a couple bytes though.
So I thought it was worth a try, as he is trying to speed up the .dll loading time.
Title: Re: How to optimize execution speed
Post by: kcvinu on June 09, 2024, 12:18:25 PM
@NoCforMe

Quote: You might want to set it to zero in that case. (Remember, you're going to return something no matter what, meaning whatever is in RAX.)

Got the point.
Title: Re: How to optimize execution speed
Post by: kcvinu on June 09, 2024, 12:20:40 PM
It's my bedtime here. See you all later. Thanks for the help. I hope we can find the issue. Good night to all.
Title: Re: How to optimize execution speed
Post by: NoCforMe on June 09, 2024, 12:32:51 PM
Quote from: sudoku on June 09, 2024, 12:06:36 PM
Quote from: NoCforMe on June 09, 2024, 11:59:08 AM
Oh, come on: you can't be serious.
A simple comparison isn't going to take any time at all.
While I cannot speak for every .dll ever written, all of my qEditor plugins use

mov eax, TRUE ; analogous to 'mov rax, TRUE' here...
ret

With no detrimental effects. Saves a couple bytes though.
No, that's fine, the assumption here being that the only "reason" you really care about in LibMain() is DLL_PROCESS_ATTACH, which requires a return value of TRUE in order for the DLL to load. So no problem there.

What I was objecting to was the idea that eliminating the check for the value of "reason" would make any discernible improvement in execution speed. It won't. As an old programming teacher of mine would put it, it's "in the noise". In other words, of no consequence.
Title: Re: How to optimize execution speed
Post by: TimoVJL on June 09, 2024, 06:02:27 PM
DLL_PROCESS_ATTACH gives an opportunity to save the DLL's HINSTANCE and return TRUE.
For the other reasons, the return value isn't used.

DllMain entry point (https://learn.microsoft.com/en-us/windows/win32/dlls/dllmain)

Dynamic-Link Library Entry-Point Function (https://learn.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-entry-point-function)

If you return FALSE from DLL_PROCESS_ATTACH, will you get a DLL_PROCESS_DETACH? (https://devblogs.microsoft.com/oldnewthing/20080808-00/?p=21313)

For a data-only or resource-only DLL:
/NOENTRY (No Entry Point) (https://learn.microsoft.com/en-us/cpp/build/reference/noentry-no-entry-point?view=msvc-170)
Title: Re: How to optimize execution speed
Post by: Vortex on June 09, 2024, 07:07:20 PM
Hi kcvinu,

You may want to use drive-relative paths (no drive letter), since the Masm32/64 setup can be on a different root partition:

Instead of this:

include C:\masm64\include64\masm64rt.inc
this one is preferable:

include \masm64\include64\masm64rt.inc
Title: Re: How to optimize execution speed
Post by: kcvinu on June 09, 2024, 11:06:59 PM
@TimoVJL,
Thanks for the links. Let me check that.
Quote: DLL_PROCESS_ATTACH gives an opportunity to save the DLL's HINSTANCE and return TRUE
That's a nice idea; I can avoid the GetModuleHandle call.
Title: Re: How to optimize execution speed
Post by: kcvinu on June 09, 2024, 11:09:12 PM
@Vortex,
Is it? Thanks, let me try. But there's one problem: I have VS2022 installed, so when I type ml64 in cmd, it starts the ml64.exe from VS's tools directory.
Title: Re: How to optimize execution speed
Post by: NoCforMe on June 10, 2024, 03:38:12 AM
;====================================
; DLL main entry point proc
;====================================

DLLmain PROC hInstDLL:HINSTANCE, reason:DWORD, reserved:DWORD
; Store instance handle where we can get at it:
MOV EAX, hInstDLL
MOV InstanceHandle, EAX
MOV EAX, TRUE
RET

DLLmain ENDP

Since the instance handle is one of the parameters to that function, might as well use it.
Title: Re: How to optimize execution speed
Post by: kcvinu on June 10, 2024, 05:10:48 AM
Oh, but I used this:
LibMain proc instance:QWORD,reason:QWORD,unused:QWORD
    .if reason == DLL_PROCESS_ATTACH
        mov hInstance, rcx
        mov rax, TRUE
    .endif
    ret
LibMain endp
Title: Re: How to optimize execution speed
Post by: TimoVJL on June 10, 2024, 05:43:58 AM
Quote from: kcvinu on June 10, 2024, 05:10:48 AM
Oh, but I used this:
LibMain proc instance:QWORD,reason:QWORD,unused:QWORD
    .if reason == DLL_PROCESS_ATTACH
        mov hInstance, rcx
        mov rax, TRUE                       
    .endif
    ret
LibMain endp
That's the normal way to handle it, but mov rax, TRUE can go after .endif.

Title: Re: How to optimize execution speed
Post by: NoCforMe on June 10, 2024, 05:49:31 AM
Quote from: kcvinu on June 10, 2024, 05:10:48 AM
Oh, but I used this:
LibMain proc instance:QWORD,reason:QWORD,unused:QWORD
    .if reason == DLL_PROCESS_ATTACH
        mov hInstance, rcx
        mov rax, TRUE                       
    .endif
    ret
LibMain endp
Yeah, that's probably safer.
Though my code works fine for me, since by the time any "reason" other than DLL_PROCESS_ATTACH is called, the instance handle has been safely stored and isn't used again. But this way is better.

But move the mov rax, TRUE to after the .endif as Timo suggested.
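Putting Timo's and NoCforMe's suggestions together: save the instance handle on DLL_PROCESS_ATTACH and return TRUE on every path, so RAX is never undefined. This is a sketch in masm64 style, not code from the thread. It uses a plain cmp on the low DWORD of reason, because fdwReason is a 32-bit value in the DllMain signature and the x64 convention doesn't guarantee the upper half of its 64-bit slot is zero; the label name lm_done is arbitrary.

```asm
LibMain proc instance:QWORD, reason:QWORD, unused:QWORD
    cmp DWORD PTR reason, DLL_PROCESS_ATTACH   ; test only the low 32 bits of fdwReason
    jne lm_done                                ; other reasons: nothing to do
    mov rax, instance                          ; hinstDLL = this DLL's module handle (RCX on entry)
    mov hInstance, rax                         ; save it for RegisterClassEx / CreateWindowEx
  lm_done:
    mov rax, TRUE                              ; defined return value on every path
    ret
LibMain endp
```

With hInstance set here, the GetModuleHandle call in registerClass can be dropped, as kcvinu noted earlier.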
Title: Re: How to optimize execution speed
Post by: Vortex on June 10, 2024, 05:52:38 AM
Hi kcvinu,

You can copy ml64.exe to the \masm64\bin64 folder and use a batch file to build your project :

\masm64\bin64\ml64 /c Source.asm
\masm64\bin64\polink /SUBSYSTEM:WINDOWS /LARGEADDRESSAWARE /ENTRY:start Source.obj
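Adapted to this DLL, such a batch file might look like the sketch below. It assumes ml64.exe has been copied into \masm64\bin64 as described, that polink accepts the /DLL, /DEF, /ENTRY and /OUT switches in the same form as link does in the first post, and the file names py1.asm and py1.def from that post.

```bat
@echo off
rem build.bat (sketch): assemble and link py1.dll with the masm64 tools
\masm64\bin64\ml64 /c /nologo py1.asm
rem stop here if the assembler reported an error
if errorlevel 1 goto :eof
\masm64\bin64\polink /DLL /DEF:py1.def /ENTRY:LibMain /SUBSYSTEM:WINDOWS /OUT:py1.dll py1.obj
```

Because both tools are called by absolute path, the VS2022 ml64.exe on the PATH is not picked up.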
Title: Re: How to optimize execution speed
Post by: kcvinu on June 10, 2024, 06:51:55 AM
@NoCforMe,
Quote: But move the mov rax, TRUE to after the .endif as Timo suggested.

Okay.

@Vortex
Quote: You can copy ml64.exe to the \masm64\bin64 folder and use a batch file to build your project:
Yeah, but I am using cmder. In cmder we can use aliases or even make a Task; Tasks are similar to batch files. But most of the time I prefer aliases. Since we can chain the two commands with the "&&" operator, we can easily make an alias for this. And since the commands contain absolute paths, we can run these aliases from anywhere: we don't need "cd" to go to the project directory, nor do we need to start the cmd window from the project folder.
Title: Re: How to optimize execution speed
Post by: NoCforMe on June 10, 2024, 07:43:38 AM
Quote from: kcvinu on June 10, 2024, 06:51:55 AM
Yeah, but I am using cmder.
Does cmder = Commander?
Title: Re: How to optimize execution speed
Post by: kcvinu on June 10, 2024, 08:25:53 AM
@NoCforMe,
Quote: Does cmder = Commander?
No, it's a wrapper around the well-known console emulator ConEmu. This is the page:
Cmder Home page (https://cmder.app/)
Title: Re: How to optimize execution speed
Post by: NoCforMe on June 10, 2024, 08:30:50 AM
So I take it you come from a *nix background?
Title: Re: How to optimize execution speed
Post by: kcvinu on June 10, 2024, 09:14:06 AM
Quote from: NoCforMe on June 10, 2024, 08:30:50 AM
So I take it you come from a *nix background?

No! I have been a Windows user for more than a decade. I had only a few days of *nix experience and I didn't like it. But I like the way they use a terminal for everything, so I learned about the Windows alternatives for that.