Hi all,
I started my masm64 journey with a DLL that creates a simple window. The DLL is called from Python with the ctypes library. So far so good. But it takes almost 3 times longer than my other DLL, which was written in the C3 programming language. For those of you not familiar with C3, it's a C-like language. The C3 DLL takes 11-15 ms, but the masm64 DLL takes 40-45 ms. What should I do to get optimal speed?
Here is my asm code.
; Program : A dll to make a window from python.
; Author : kcvinu
include C:\masm64\include64\masm64rt.inc
.data?
hInstance dq ?
hIcon dq ?
hCursor dq ?
hBrush dq ?
.data
classname db "KCV_Window",0
caption db "മാസം വിൻഡൊ", 0 ; This is my native language MALAYALAM!
.CODE
; This will be exported.
NewForm Proc
mov hInstance, rv(GetModuleHandle,0)
mov hIcon, rv(LoadIcon,hInstance,10)
mov hCursor, rv(LoadCursor,0,IDC_ARROW)
mov hBrush, rv(CreateSolidBrush,00EEEEEEh)
mov rax, rv(makeWindow)
RET
NewForm endp
LibMain proc instance:DWORD, reason:DWORD, unused:DWORD
ret
LibMain endp
makeWindow proc
LOCAL wc :WNDCLASSEX
mov wc.cbSize, SIZEOF WNDCLASSEX
mov wc.style, CS_BYTEALIGNCLIENT or CS_BYTEALIGNWINDOW
mov wc.lpfnWndProc, ptr$(WndProc)
mov wc.cbClsExtra, 0
mov wc.cbWndExtra, 0
mrm wc.hInstance, hInstance
mrm wc.hIcon, hIcon
mrm wc.hCursor, hCursor
mrm wc.hbrBackground, hBrush
mov wc.lpszMenuName, 0
mov wc.lpszClassName, ptr$(classname)
mrm wc.hIconSm, hIcon
invoke RegisterClassEx, ADDR wc
invoke CreateWindowEx, 0, \
ADDR classname, addr caption, \
WS_OVERLAPPEDWINDOW or WS_VISIBLE,\
100, 100, 500, 400, 0,0,hInstance,0
ret
makeWindow endp
; This function also exported
showForm proc handle:HWND
invoke ShowWindow, handle, 5
invoke UpdateWindow, handle
call msgloop
ret
showForm endp
; Shamelessly copied from Examples section
msgloop proc
LOCAL msg :MSG
LOCAL pmsg :QWORD
mov pmsg, ptr$(msg) ; get the msg structure address
jmp gmsg ; jump directly to GetMessage()
mloop:
invoke TranslateMessage,pmsg
invoke DispatchMessage,pmsg
gmsg:
test rax, rv(GetMessage,pmsg,0,0,0) ; loop until GetMessage returns zero
jnz mloop
ret
msgloop endp
WndProc proc hWin:QWORD,uMsg:QWORD,wParam:QWORD,lParam:QWORD
.switch uMsg
.case WM_DESTROY
invoke PostQuitMessage, NULL
.endsw
invoke DefWindowProc, hWin, uMsg, wParam, lParam
ret
WndProc endp
End
This is my command to assemble and link.
ml64 /c /nologo py1.asm && link /ENTRY:LibMain /DLL /DEF:py1.def /OUT:py1.dll py1.obj
Please feel free to ask anything I missed in this post. I didn't include attachments. Please comment if you want my files.
Is the C3 DLL 32-bit or 64-bit?
Hi NoCforMe,
The C3 dll is 64 bit.
It looks like it should be plenty fast.
The only things I can think of are the calls to LoadIcon(), LoadCursor() and CreateSolidBrush(), all of which will take some time. (The other DLL undoubtedly has to call LoadCursor()).
Does the other DLL load an icon?
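Before guessing which call is slow, it may help to time the DLL load and the exported call separately from the Python side. The following is only a sketch: it assumes the DLL is named py1.dll (taken from the link command above) and sits in the current directory, and that NewForm is exported as posted. The helper names time_ms and profile_dll are made up for illustration.

```python
import ctypes
import os
import sys
import time


def time_ms(fn, *args):
    """Run fn(*args) once and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


def profile_dll(path):
    """Time the DLL load and the NewForm call separately (Windows only)."""
    lib, load_ms = time_ms(ctypes.WinDLL, path)
    # NewForm returns a 64-bit HWND. The ctypes default restype is a C int,
    # which would truncate the handle, so declare a pointer-sized result.
    lib.NewForm.restype = ctypes.c_void_p
    lib.NewForm.argtypes = []
    hwnd, call_ms = time_ms(lib.NewForm)
    return {"load_ms": load_ms, "NewForm_ms": call_ms, "hwnd": hwnd}


if __name__ == "__main__":
    dll_path = os.path.abspath("py1.dll")  # assumed location of the DLL
    if sys.platform == "win32" and os.path.exists(dll_path):
        print(profile_dll(dll_path))
    else:
        # Fallback so the sketch still runs elsewhere: time a stand-in call.
        _, ms = time_ms(sum, range(1000))
        print(f"stand-in call took {ms:.3f} ms")
```

If most of the time shows up in load_ms, the cost is in loading the DLL and its imports rather than in LoadIcon(), LoadCursor() or CreateSolidBrush().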
Quote: Does the other DLL load an icon?
Yes! And it is doing some more tasks as it is a working library.
Just shooting in the dark here:
You can probably eliminate the calls to ShowWindow() and UpdateWindow(). I've found these to be completely unnecessary, as long as you include the WS_VISIBLE style when you create the window.
Let me try that. But there is a question. I tested without those two functions and it ran without any problem. But it was a single window. There were no controls. What if we create some controls after creating the window handle? Do we then need to call ShowWindow and UpdateWindow?
I don't think so, but very easy to find out:
I would code it without those calls and see if the controls show up. Again, so long as they have the WS_VISIBLE style, they should show up right away.
If not, just put a call to UpdateWindow() in. But that shouldn't be necessary.
I've noticed that a lot of programmers tend to be superstitious about this, and I see needless calls like these sprinkled here and there, like some kind of charm or spell ...
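To make the point about WS_VISIBLE concrete, here is a small sketch of the style bits involved. The constant values are the ones defined in winuser.h; the helper name is_visible_at_creation is made up for illustration, and the sketch only inspects flags, it does not create any window.

```python
# Window style constants, with the values defined in winuser.h.
WS_OVERLAPPEDWINDOW = 0x00CF0000
WS_VISIBLE = 0x10000000
WS_CHILD = 0x40000000


def is_visible_at_creation(style):
    """True if the style asks the system to show the window when it is created."""
    return bool(style & WS_VISIBLE)


# The style the posted code passes to CreateWindowEx for the main window.
main_style = WS_OVERLAPPEDWINDOW | WS_VISIBLE
# A typical control style: a child window that is visible from the start.
child_style = WS_CHILD | WS_VISIBLE
# A control created without WS_VISIBLE stays hidden until ShowWindow is called.
hidden_child_style = WS_CHILD

for name, style in [("main", main_style),
                    ("child", child_style),
                    ("hidden child", hidden_child_style)]:
    print(f"{name}: {style:#010x} visible={is_visible_at_creation(style)}")
```

So the question of whether ShowWindow() is needed for controls comes down to whether each control's style includes WS_VISIBLE.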
One thing caught my eye, kcvinu...
; This will be exported.
NewForm Proc
mov hInstance, rv(GetModuleHandle,0)
mov hIcon, rv(LoadIcon,hInstance,10)
mov hCursor, rv(LoadCursor,0,IDC_ARROW)
mov hBrush, rv(CreateSolidBrush,00EEEEEEh)
mov rax, rv(makeWindow)
RET
NewForm endp
How often is this called? Are you constantly creating the same brush, or loading the same icon?
Or is that call only made once? I assume only called once, but I have to ask to be sure.
Quote: ; Shamelessly copied from Examples section
:biggrin: Many of us have used code from there at one point or another. Totally legit usage.
Another thing I noticed: are you sure that these arguments are all DWORDs?
LibMain proc instance:DWORD, reason:DWORD, unused:DWORD
ret
LibMain endp
And no return value?
I often see 'mov rax, 1' or 'mov rax, TRUE' before the return.
From a masm64 sdk example
LibMain proc instance:QWORD,reason:QWORD,unused:QWORD
.if reason == DLL_PROCESS_ATTACH
mov rax, TRUE ; return TRUE so DLL will start
.elseif reason == DLL_PROCESS_DETACH
.elseif reason == DLL_THREAD_ATTACH
.elseif reason == DLL_THREAD_DETACH
.endif
ret
LibMain endp
The .if block can be omitted since we always want the dll to start.
@sudoku,
Thanks for the reply.
Quote: How often is this called?
Only one time, right before registering the window. And FYI, I just omitted the entry point function with /NOENTRY.
If you are only calling it once, I don't see how 30-ish milliseconds makes a lot of difference overall. In my opinion it would be much different if you were making recursive calls to it, where the extra waiting time is cumulative.
Maybe the coding gurus here have a way to speed up your code, or can offer other suggestions.
Okay. I have made some changes as per your suggestions and now ended up with this.
; Program : A dll to make a window from python.
; Author : kcvinu
include C:\masm64\include64\masm64rt.inc
.data?
hInstance dq ?
hIcon dq ?
hCursor dq ?
hBrush dq ?
.data
classname db "KCV_Window", 0
caption db "Just a window", 0
.CODE
registerClass proc
LOCAL wc :WNDCLASSEX
mov hInstance, rv(GetModuleHandle, 0)
mov hIcon, rv(LoadIcon,hInstance, 10)
mov hCursor, rv(LoadCursor, 0, IDC_ARROW)
mov hBrush, rv(CreateSolidBrush,00EEEEEEh)
mov wc.cbSize, SIZEOF WNDCLASSEX
mov wc.style, CS_OWNDC or CS_HREDRAW or CS_VREDRAW or CS_BYTEALIGNCLIENT or CS_BYTEALIGNWINDOW
mov wc.lpfnWndProc, ptr$(WndProc)
mov wc.cbClsExtra, 0
mov wc.cbWndExtra, 0
mrm wc.hInstance, hInstance
mrm wc.hIcon, hIcon
mrm wc.hCursor, hCursor
mrm wc.hbrBackground, hBrush
mov wc.lpszMenuName, 0
mov wc.lpszClassName, ptr$(classname)
mrm wc.hIconSm, hIcon
invoke RegisterClassEx, ADDR wc
ret
registerClass endp
NewForm Proc
invoke registerClass
invoke CreateWindowEx, 0, \
ADDR classname, addr caption, \
WS_OVERLAPPEDWINDOW or WS_VISIBLE,\
100, 100, 500, 400, 0,0,hInstance,0
ret
NewForm endp
LibMain proc instance:QWORD,reason:QWORD,unused:QWORD
.if reason == DLL_PROCESS_ATTACH
mov rax, TRUE ; return TRUE so DLL will start
.endif
ret
LibMain endp
showForm proc
LOCAL msg :MSG
LOCAL pmsg :QWORD
mov pmsg, ptr$(msg) ; get the msg structure address
jmp gmsg ; jump directly to GetMessage()
mloop:
invoke TranslateMessage,pmsg
invoke DispatchMessage,pmsg
gmsg:
test rax, rv(GetMessage,pmsg,0,0,0) ; loop until GetMessage returns zero
jnz mloop
ret
showForm endp
WndProc proc hWin:QWORD,uMsg:QWORD,wParam:QWORD,lParam:QWORD
.switch uMsg
.case WM_DESTROY
invoke PostQuitMessage, NULL
.endsw
invoke DefWindowProc, hWin, uMsg, wParam, lParam
ret
WndProc endp
End
And this is the command line:
ml64 /nologo /c py1.asm && link /ENTRY:LibMain /DLL /DEF:py1.def /nologo /SUBSYSTEM:windows /OUT:py1.dll py1.obj
Now the speed is 30+ ms
30+ milliseconds relative to the first version, or 30 ish milliseconds total?
Try only
--------------
mov rax, TRUE
ret
--------------
Without the .if statement... in LibMain
Quote from: kcvinu on June 09, 2024, 11:49:43 AM
; Program : A dll to make a window from python.
LibMain proc instance:QWORD,reason:QWORD,unused:QWORD
.if reason == DLL_PROCESS_ATTACH
mov rax, TRUE ; return TRUE so DLL will start
.endif
ret
LibMain endp
Just one little thing: what is RAX if reason != DLL_PROCESS_ATTACH?
Answer: undefined.
You might want to set it to zero in that case. (Remember, you're going to return something no matter what, meaning whatever is in RAX.)
Quote from: sudoku on June 09, 2024, 11:54:29 AM
30+ milliseconds relative to the first version, or 30 ish milliseconds total?
Try only
--------------
mov rax, TRUE
ret
--------------
Without the .if statement... in LibMain
Oh, come on: you can't be serious.
A simple comparison isn't going to take any time at all.
Quote from: NoCforMe on June 09, 2024, 11:59:08 AM
Oh, come on: you can't be serious.
A simple comparison isn't going to take any time at all.
While I cannot speak for every .dll ever written, all of my qEditor plugins use
mov eax, TRUE ; analogous to 'mov rax, TRUE' here...
ret
With no detrimental effects. Saves a couple bytes though.
So I thought it was worth a try, as he is trying to speed up the .dll loading time.
@NoCforMe
Quote: You might want to set it to zero in that case. (Remember, you're going to return something no matter what, meaning whatever is in RAX.)
Got the point.
It's my bedtime here. See you all later. Thanks for the help. I hope we can find the issue. Goodnight to all.
Quote from: sudoku on June 09, 2024, 12:06:36 PM
Quote from: NoCforMe on June 09, 2024, 11:59:08 AM
Oh, come on: you can't be serious.
A simple comparison isn't going to take any time at all.
While I cannot speak for every .dll ever written, all of my qEditor plugins use
mov eax, TRUE ; analogous to 'mov rax, TRUE' here...
ret
With no detrimental effects. Saves a couple bytes though.
No, that's fine, the assumption here being that the only "reason" you really care about in LibMain() is DLL_PROCESS_ATTACH, which requires a return value of TRUE in order for the DLL to load. So no problem there.
What I was objecting to was the idea that eliminating the check for the value of "reason" would make any discernible improvement in execution speed. It won't. As an old programming teacher of mine would put it, it's "in the noise". In other words, of no consequence.
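One rough way to tell whether a change is "in the noise" is to time each variant several times and compare the medians against the spread of the samples. The sketch below assumes exactly that crude rule; the helper names are made up, and the sample timings are invented for illustration only (they are not measurements from this thread, merely in the same ballpark as the figures quoted above).

```python
import statistics


def summarize(samples_ms):
    """Return (median, sample standard deviation) of timings in milliseconds."""
    return statistics.median(samples_ms), statistics.stdev(samples_ms)


def differs_beyond_noise(a_ms, b_ms):
    """Crude check: do the medians differ by more than the combined spread?"""
    med_a, sd_a = summarize(a_ms)
    med_b, sd_b = summarize(b_ms)
    return abs(med_a - med_b) > (sd_a + sd_b)


# Hypothetical timings, invented for illustration only.
with_check = [31.2, 30.4, 33.9, 30.8, 32.5]     # LibMain with the .if test
without_check = [30.9, 32.1, 30.2, 33.4, 31.0]  # LibMain without it
masm_dll = [41.0, 43.5, 40.2, 44.8, 42.1]
c3_dll = [11.5, 14.2, 12.0, 13.8, 12.6]

print("reason check matters:", differs_beyond_noise(with_check, without_check))
print("masm vs C3 differs:  ", differs_beyond_noise(masm_dll, c3_dll))
```

With numbers like these, the first comparison is swamped by run-to-run variation while the second is not, which is the distinction being made here. A proper statistical test would be more rigorous, but this is usually enough to decide where to look next.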
DLL_PROCESS_ATTACH gives you the opportunity to save the DLL's HINSTANCE and return TRUE.
On the other occasions the return value isn't used.
DllMain entry point (https://learn.microsoft.com/en-us/windows/win32/dlls/dllmain)
Dynamic-Link Library Entry-Point Function (https://learn.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-entry-point-function)
If you return FALSE from DLL_PROCESS_ATTACH, will you get a DLL_PROCESS_DETACH? (https://devblogs.microsoft.com/oldnewthing/20080808-00/?p=21313)
For a data / resource-only DLL:
/NOENTRY (No Entry Point) (https://learn.microsoft.com/en-us/cpp/build/reference/noentry-no-entry-point?view=msvc-170)
Hi kcvinu,
You may want to specify paths without a drive letter, as the Masm32/64 setup can be on different root partitions:
Instead of this :
include C:\masm64\include64\masm64rt.inc
this one is preferable :
include \masm64\include64\masm64rt.inc
@TimoVJL,
Thanks for the links. Let me check that.
Quote: DLL_PROCESS_ATTACH gives you the opportunity to save the DLL's HINSTANCE and return TRUE.
That's a nice idea, I can avoid the GetModuleHandle call.
@Vortex,
Is it? Thanks, let me try. But there is one problem: I have VS2022 installed, so when I type ml64 in cmd, it starts the ml64.exe from VS's tools directory.
;====================================
; DLL main entry point proc
;====================================
DLLmain PROC hInstDLL:HINSTANCE, reason:DWORD, reserved:DWORD
; Store instance handle where we can get at it:
MOV EAX, hInstDLL
MOV InstanceHandle, EAX
MOV EAX, TRUE
RET
DLLmain ENDP
Since the instance handle is one of the parameters to that function, might as well use it.
Oh, but I used this:
LibMain proc instance:QWORD,reason:QWORD,unused:QWORD
.if reason == DLL_PROCESS_ATTACH
mov hInstance, rcx
mov rax, TRUE
.endif
ret
LibMain endp
Quote from: kcvinu on June 10, 2024, 05:10:48 AM
Oh, but I used this:
LibMain proc instance:QWORD,reason:QWORD,unused:QWORD
.if reason == DLL_PROCESS_ATTACH
mov hInstance, rcx
mov rax, TRUE
.endif
ret
LibMain endp
That's the normal way to handle it, but mov rax, TRUE can go after the .endif.
Quote from: kcvinu on June 10, 2024, 05:10:48 AM
Oh, but I used this:
LibMain proc instance:QWORD,reason:QWORD,unused:QWORD
.if reason == DLL_PROCESS_ATTACH
mov hInstance, rcx
mov rax, TRUE
.endif
ret
LibMain endp
Yeah, that's probably safer.
Though my code works fine for me, since by the time any "reason" other than DLL_PROCESS_ATTACH is called, the instance handle has been safely stored and isn't used again. But this way is better.
But move the mov rax, TRUE to after the .endif as Timo suggested.
Hi kcvinu,
You can copy ml64.exe to the \masm64\bin64 folder and use a batch file to build your project :
\masm64\bin64\ml64 /c Source.asm
\masm64\bin64\polink /SUBSYSTEM:WINDOWS /LARGEADDRESSAWARE /ENTRY:start Source.obj
@NoCforMe,
Quote: But move the mov rax, TRUE to after the .endif as Timo suggested.
Okay.
@Vortex
Quote: You can copy ml64.exe to the \masm64\bin64 folder and use a batch file to build your project :
Yeah, but I am using cmder. We can use aliases in cmder or even make a Task. Tasks are more like batch files, but most of the time I prefer aliases. Since we can chain the two commands with the "&&" operator, it is easy to make an alias for this. Since the commands contain absolute paths, we can run these aliases from anywhere. We don't need "CD" to go to the project directory, nor do we need to start the cmd window from the project folder.
Quote from: kcvinu on June 10, 2024, 06:51:55 AM
Yeah, but I am using cmder.
Does cmder = Commander?
@NoCforMe,
Quote: Does cmder = Commander?
No, it's a wrapper for the well-known console emulator ConEmu. This is the page:
Cmder Home page (https://cmder.app/)
So I take it you come from a *nix background?
Quote from: NoCforMe on June 10, 2024, 08:30:50 AM
So I take it you come from a *nix background?
No! I have been a Windows user for more than a decade. I had only a few days of *nix experience and I didn't like it. But I like the way they use a terminal for everything, so I learned about the Windows alternatives for that.