Just for fun, a speed test: dynamic loading (LoadLibrary+GetProcAddress) vs static linking (*.lib).
The guinea pig is GetTickCount itself, simply because it's the only sufficiently short and fast API around.
The test uses
a) ordinary MasmBasic with static linking
b) the dual assembly variant JBasic, which uses a stub that loads all addresses at runtime.
The only real difference is that the static version uses a jump table, while the dynamic version calls GetTickCount directly. The difference is remarkable, though: 546 to 469 ticks.
Results:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Assembled with ML in 32-bit format using MasmBasic
100000000 iterations
562 ticks
546 ticks
530 ticks
562 ticks
546 ticks
577 ticks
546 ticks
546 ticks
546 ticks
546 ticks
Assembled with ML in 32-bit format using JBasic
100000000 iterations
484 ticks
483 ticks
468 ticks
469 ticks
483 ticks
468 ticks
468 ticks
484 ticks
483 ticks
468 ticks
For comparison, the 64-bit version:
Assembled with ml64 in 64-bit format using JBasic
100000000 iterations
343 ticks
328 ticks
343 ticks
343 ticks
343 ticks
344 ticks
343 ticks
343 ticks
343 ticks
343 ticks
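To put the tick counts above into perspective, here is a minimal sketch that converts them into time per call. It assumes that one tick is one millisecond (which is what GetTickCount reports) and that the CPU runs at its nominal clock, which turbo modes may invalidate. The helper names `ns_per_call` and `cycles_per_call` are made up for this illustration.

```c
/* Hypothetical helpers, not part of the original test programs. */

/* GetTickCount reports milliseconds, so one tick is 1e6 nanoseconds. */
static double ns_per_call(double ticks_ms, double iterations)
{
    return ticks_ms * 1.0e6 / iterations;
}

/* Convert nanoseconds to clock cycles at a nominal frequency in GHz.
   Assumption: the core really runs at that frequency (no turbo, no throttling). */
static double cycles_per_call(double ns, double ghz)
{
    return ns * ghz;
}
```

With the typical values from the first post, `ns_per_call(546, 1e8)` gives 5.46 ns and `ns_per_call(469, 1e8)` gives 4.69 ns. The gap of about 0.77 ns is roughly two cycles at the nominal 2.5 GHz, which is plausible for one extra indirect jump through a jump table.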
P.S.: Just in case you feel confirmed that 64-bit code is faster...:
000007FEFCED1120 | 8B 0C 25 04 00 FE 7F | mov ecx, dword ptr ds:[7FFE0004] |
000007FEFCED1127 | 48 8B 04 25 20 03 FE 7F | mov rax, qword ptr ds:[7FFE0320] |
000007FEFCED112F | 48 0F AF C1 | imul rax, rcx |
000007FEFCED1133 | 48 C1 E8 18 | shr rax, 18 |
000007FEFCED1137 | C3 | ret |
GetTickCount /$ /EB 02 jmp short 768B8FD8
768B8FD6 |> |F3: prefix rep:
768B8FD7 |. |90 nop
768B8FD8 |> \8B0D 2403FE7F mov ecx, [7FFE0324]
768B8FDE |. 8B15 2003FE7F mov edx, [7FFE0320]
768B8FE4 |. A1 2803FE7F mov eax, [7FFE0328]
768B8FE9 |. 3BC8 cmp ecx, eax
768B8FEB |.^ 75 E9 jnz short 768B8FD6
768B8FED |. A1 0400FE7F mov eax, [7FFE0004]
768B8FF2 |. F7E2 mul edx
768B8FF4 |. C1E1 08 shl ecx, 8
768B8FF7 |. 0FAF0D 0400FE7F imul ecx, [7FFE0004]
768B8FFE |. 0FACD0 18 shrd eax, edx, 18
768B9002 |. C1EA 18 shr edx, 18
768B9005 |. 03C1 add eax, ecx
768B9007 \. C3 retn
The 32-bit version is much longer; it's a miracle that it doesn't take twice as long :bgrin:
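For readers who prefer C to disassembly, here is a rough sketch of what the two listings compute. It assumes that [7FFE0004] is a 32-bit tick multiplier and that [7FFE0320] and [7FFE0324] are the low and high halves of a 64-bit tick counter in the shared user data page; the function names are invented for this illustration. The retry loop of the 32-bit version, which rereads the high half to guard against a torn read, is left out.

```c
#include <stdint.h>

/* 64-bit listing: imul rax, rcx ; shr rax, 18h */
static uint32_t ticks_ms_64(uint32_t multiplier, uint64_t tick_count)
{
    return (uint32_t)((tick_count * multiplier) >> 24);
}

/* 32-bit listing: the same product, built from two 32-bit halves. */
static uint32_t ticks_ms_32(uint32_t multiplier, uint32_t low, uint32_t high)
{
    uint64_t prod    = (uint64_t)low * multiplier;  /* mul edx -> edx:eax */
    uint32_t hi_part = (high << 8) * multiplier;    /* shl ecx, 8 ; imul ecx, [multiplier] */
    return (uint32_t)(prod >> 24) + hi_part;        /* shrd eax, edx, 18h ; add eax, ecx */
}
```

Both functions return the same 32-bit value for the same inputs; the 32-bit code simply needs more instructions to do a 64-bit multiply and shift in 32-bit registers. As an illustrative value only, a multiplier of 0x0FA00000 corresponds to 15.625 ms per tick, so a tick counter of 64 yields 1000 ms.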
Nice work and thanks! :icon14:
So the code is the GetTickCount function from the API, in both the 32-bit and 64-bit versions? :redface:
Quote from: felipe on April 10, 2017, 02:04:57 AM
Nice work and thanks! :icon14:
So the code is the GetTickCount function from the API, in both the 32-bit and 64-bit versions? :redface:
Yes, as you can see from the addresses, e.g. 000007FEFCED1120 - that's 64 bits wide and in the kernel.
That's great, thanks a lot! :biggrin:
Thanks for the test, JJ. That's what I thought: static linking is faster than calling it dynamically.
Results:
Static
AMD Ryzen 5 2400G with Radeon Vega Graphics
Assembled with ML in 32-bit format using MasmBasic
100000000 iterations
390 ticks
375 ticks
406 ticks
375 ticks
375 ticks
360 ticks
375 ticks
375 ticks
390 ticks
391 ticks
Dynamic
Assembled with ML in 32-bit format using JBasic
100000000 iterations
438 ticks
453 ticks
438 ticks
437 ticks
422 ticks
469 ticks
437 ticks
422 ticks
438 ticks
421 ticks
From the import section, the OS creates the IAT for imported functions.
So use a proper import library for DLLs and stop wasting the program user's time with so-called clever code that does its own dynamic DLL loading. That is only useful when a DLL is loaded only when it is needed, as with delayed linking.
Quote from: TimoVJL on April 03, 2025, 07:53:28 AM
So use a proper import library for DLLs and stop wasting the program user's time with so-called clever code that does its own dynamic DLL loading...
If done properly, it is transparent to the program user. Why would they care?
And it works. What is wrong with using this method?
qeditor.exe (The editor for masm32 SDK) has been using this method to load user plugins since at least "9 October, 2008" when the qeditor.chm help file (in the masm32 SDK) for qeditor version 4 was written - with no issues.
The names of the user plugins of course were not yet known during the assembly of the program, but supplied to qeditor in "menus.ini" by the user and loaded by qeditor using LoadLibrary and GetProcAddress. :smiley:
This topic is 8 years old btw, nothing new. :wink2:
Also, this test is posted in the Laboratory to test the speed difference, not as a recommendation for every day usage. The first line in the topic states "Just for fun".
You have waited 8 years to share your opinion? :joking: You must have thought that this topic was created just today. :smiley:
Quote from: TimoVJL on April 03, 2025, 07:53:28 AM
From the import section, the OS creates the IAT for imported functions.
So use a proper import library for DLLs and stop wasting the program user's time with so-called clever code that does its own dynamic DLL loading. That is only useful when a DLL is loaded only when it is needed, as with delayed linking.
Me ??? I rarely use dynamic dlls (with GetProcAddress etc). I tested it now due to the other thread we were talking about here (https://masm32.com/board/index.php?topic=12674.msg137802#msg137802)
I don't think he was addressing you, guga.
You were just submitting your test results - although, a bit late it seems. :biggrin:
Here's mine. :tongue: Better late than never. :biggrin:
Dynamic 32 bit
Assembled with ML in 32-bit format using JBasic
100000000 iterations
265 ticks
266 ticks
266 ticks
265 ticks
266 ticks
266 ticks
265 ticks
266 ticks
265 ticks
266 ticks
Static 32 bit
Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
Assembled with ML in 32-bit format using MasmBasic
100000000 iterations
282 ticks
265 ticks
266 ticks
281 ticks
281 ticks
266 ticks
266 ticks
281 ticks
281 ticks
266 ticks
Dynamic 64 bit
Assembled with HJWasm32 in 64-bit format using JBasic
100000000 iterations
125 ticks
125 ticks
125 ticks
110 ticks
125 ticks
125 ticks
109 ticks
125 ticks
156 ticks
125 ticks
Both 32-bit versions are about the same. 64-bit is slightly more than twice as fast for me. :biggrin:
I could not run the three programs one after another from a batch file, btw jj. I had to run them separately. They did not return to let the next one run.
Quote from: guga on April 03, 2025, 01:38:46 AM
That's what I thought: static linking is faster than calling it dynamically.
My tests said the opposite :cool:
In real life, results should be identical because you end up using identical kernel etc functions.
Quote from: jj2007 on April 04, 2025, 08:33:13 AM
Quote from: guga on April 03, 2025, 01:38:46 AM
That's what I thought: static linking is faster than calling it dynamically.
My tests said the opposite :cool:
In real life, results should be identical because you end up using identical kernel etc functions.
Indeed, your results are different from mine. I wonder why that happens. Perhaps AMD handles those calls a bit differently?
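Test code in C:
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <stdio.h>

/* GetTickCount takes no arguments and returns a DWORD */
typedef DWORD (WINAPI *PFN_GETTICKCOUNT)(void);

int __cdecl main(void)
{
    int i;
    DWORD dw1, dw2 = 0;

    /* static: call resolved through the import library */
    dw1 = GetTickCount();
    for (i = 0; i < 100000000; i++)
        dw2 = GetTickCount();
    printf("%lu ticks static\n", (unsigned long)(dw2 - dw1));

    /* dynamic: resolve the address at runtime */
    HMODULE hMod = LoadLibraryA("kernel32.dll");
    if (!hMod) return 1;
    PFN_GETTICKCOUNT pGetTickCount = (PFN_GETTICKCOUNT)GetProcAddress(hMod, "GetTickCount");
    if (!pGetTickCount) return 1;
    dw1 = GetTickCount();
    for (i = 0; i < 100000000; i++)
        dw2 = pGetTickCount();
    printf("%lu ticks dynamic\n", (unsigned long)(dw2 - dw1));
    return 0;
}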
32-bit
640 ticks static
562 ticks dynamic
64-bit
499 ticks static
422 ticks dynamic
516 ticks static
531 ticks dynamic
532 ticks static
531 ticks dynamic
547 ticks static
531 ticks dynamic
---- more iterations: ----
5312 ticks static
5422 ticks dynamic
5328 ticks static
5344 ticks dynamic
5438 ticks static
5312 ticks dynamic
5328 ticks static
5312 ticks dynamic
Quote from: TimoVJL on April 04, 2025, 07:28:26 PM
Test code in C
32 bit:
480 ticks static
492 ticks dynamic
64 bit:
322 ticks static
317 ticks dynamic
Quote from: TimoVJL on April 03, 2025, 07:53:28 AM
So use a proper import library for DLLs and stop wasting the program user's time with so-called clever code that does its own dynamic DLL loading
irony? :biggrin:
Quote from: zedd151 on April 04, 2025, 10:41:43 PM
irony? :biggrin:
But at least I really test things.
Earlier I was thinking about the additional time needed to get function addresses from a DLL with LoadLibrary / GetProcAddress.
Those are often useful, for example to avoid requiring that a DLL be present when the program starts.
:smiley: No FreeLibrary call?
That should be considered too, if calculating the extra time spent using this method, imo.
No need, if you know how things work.
FreeLibrary() is only needed if you want to free resources because a DLL isn't used anymore.
Quote from: TimoVJL on Today at 12:17:00 AM
No need, if you know how things work.
FreeLibrary() is only needed if you want to free resources because a DLL isn't used anymore.
I thought it was worth a mention, in any case. :smiley:
Quote from: TimoVJL on April 04, 2025, 11:11:02 PM
the additional time needed to get function addresses from a DLL with LoadLibrary / GetProcAddress
I timed that for my Windows GUI template: about 0.6 milliseconds, once at program start :cool:
Re FreeLibrary: only necessary if you need to free resources but keep the program running - a rather exotic requirement. A simple ExitProcess takes care of libraries.
Quote from: jj2007 on Today at 02:58:40 AM
Re FreeLibrary: only necessary if you need to free resources but keep the program running - a rather exotic requirement. A simple ExitProcess takes care of libraries.
Oh. That, I did not know. I thought that any libraries opened with LoadLibrary needed to be freed with FreeLibrary explicitly, before exiting. :toothy: It was only in special cases that I had ever used it. I.e., temporarily loading a plugin for one of my editors for example.
Raymond Chen on ExitProcess (https://masm32.com/board/index.php?topic=6053.0)