Dynamic vs static linking

Started by jj2007, April 08, 2017, 06:56:56 PM

jj2007

Just for fun, a speed test of dynamic linking (LoadLibrary+GetProcAddress) vs static linking (*.lib).

The guinea pig is GetTickCount itself, simply because it's the only sufficiently short and fast API around.

The test uses
a) ordinary MasmBasic with static linking
b) the dual assembly variant JBasic, which uses a stub that loads all addresses at runtime.

The only real difference is that the static version calls GetTickCount through a jump table, while the dynamic version calls it directly through the address loaded at runtime. The difference is remarkable, though: 546 ticks (static) vs 469 ticks (dynamic).

Results:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Assembled with ML in 32-bit format using MasmBasic
100000000 iterations
562 ticks
546 ticks
530 ticks
562 ticks
546 ticks
577 ticks
546 ticks
546 ticks
546 ticks
546 ticks


Assembled with ML in 32-bit format using JBasic
100000000 iterations
484 ticks
483 ticks
468 ticks
469 ticks
483 ticks
468 ticks
468 ticks
484 ticks
483 ticks
468 ticks


For comparison, the 64-bit version:
Assembled with ml64 in 64-bit format using JBasic
100000000 iterations
343 ticks
328 ticks
343 ticks
343 ticks
343 ticks
344 ticks
343 ticks
343 ticks
343 ticks
343 ticks


P.S.: Just in case this confirms your belief that 64-bit code is faster..., here is GetTickCount in 64-bit and in 32-bit code:
000007FEFCED1120 | 8B 0C 25 04 00 FE 7F     | mov ecx, dword ptr ds:[7FFE0004]   |
000007FEFCED1127 | 48 8B 04 25 20 03 FE 7F  | mov rax, qword ptr ds:[7FFE0320]   |
000007FEFCED112F | 48 0F AF C1              | imul rax, rcx                      |
000007FEFCED1133 | 48 C1 E8 18              | shr rax, 18                        |
000007FEFCED1137 | C3                       | ret                                |


GetTickCount              /$ /EB 02                      jmp short 768B8FD8
768B8FD6                  |> |F3:                        prefix rep:
768B8FD7                  |. |90                         nop
768B8FD8                  |> \8B0D 2403FE7F              mov ecx, [7FFE0324]
768B8FDE                  |.  8B15 2003FE7F              mov edx, [7FFE0320]
768B8FE4                  |.  A1 2803FE7F                mov eax, [7FFE0328]
768B8FE9                  |.  3BC8                       cmp ecx, eax
768B8FEB                  |.^ 75 E9                      jnz short 768B8FD6
768B8FED                  |.  A1 0400FE7F                mov eax, [7FFE0004]
768B8FF2                  |.  F7E2                       mul edx
768B8FF4                  |.  C1E1 08                    shl ecx, 8
768B8FF7                  |.  0FAF0D 0400FE7F            imul ecx, [7FFE0004]
768B8FFE                  |.  0FACD0 18                  shrd eax, edx, 18
768B9002                  |.  C1EA 18                    shr edx, 18
768B9005                  |.  03C1                       add eax, ecx
768B9007                  \.  C3                         retn


The 32-bit version is much longer; it's a miracle that it doesn't take twice as long :bgrin:

felipe

Nice work and thanks!  :icon14:
So the code shown is the GetTickCount function from the API, in both the 32-bit and 64-bit versions?  :redface:

jj2007

Quote from: felipe on April 10, 2017, 02:04:57 AM
Nice work and thanks!  :icon14:
So the code shown is the GetTickCount function from the API, in both the 32-bit and 64-bit versions?  :redface:

Yes. As you can see from the addresses, e.g. 000007FEFCED1120, that one is 64 bits wide and lies in the system DLLs.


guga

Thanks for the test, JJ. That's what I thought: static linking is faster than dynamic linking.

Results:
Static
AMD Ryzen 5 2400G with Radeon Vega Graphics
Assembled with ML in 32-bit format using MasmBasic
100000000 iterations
390 ticks
375 ticks
406 ticks
375 ticks
375 ticks
360 ticks
375 ticks
375 ticks
390 ticks
391 ticks

Dynamic
Assembled with ML in 32-bit format using JBasic
100000000 iterations
438 ticks
453 ticks
438 ticks
437 ticks
422 ticks
469 ticks
437 ticks
422 ticks
438 ticks
421 ticks
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

TimoVJL

The OS creates the IAT (Import Address Table) for imported functions from the import section.
So use a proper import library for DLLs and stop wasting the program user's time with so-called clever code that does its own dynamic DLL loading. That is useful only when a DLL is loaded on demand, as with delayed linking.
May the source be with you

zedd151

Quote from: TimoVJL on April 03, 2025, 07:53:28 AM
So use a proper import library for DLLs and stop wasting the program user's time with so-called clever code that does its own dynamic DLL loading...

If done properly, it is transparent to the program user. Why would they care?
And it works. What is wrong with using this method?

qeditor.exe (the editor for the masm32 SDK) has been using this method to load user plugins, with no issues, since at least "9 October, 2008", when the qeditor.chm help file (in the masm32 SDK) for qeditor version 4 was written.
The names of the user plugins were of course not yet known when the program was assembled; the user supplies them in "menus.ini", and qeditor loads them using LoadLibrary and GetProcAddress.  :smiley:

This topic is 8 years old btw, nothing new.  :wink2:
Also, this test is posted in the Laboratory to test the speed difference, not as a recommendation for every day usage. The first line in the topic states "Just for fun".

You have waited 8 years to share your opinion?  :joking:  You must have thought that this topic was created just today.  :smiley:
¯\_(ツ)_/¯   :azn:

'As we don't do "requests", show us your code first.'  -  hutch—

guga

Quote from: TimoVJL on April 03, 2025, 07:53:28 AM
The OS creates the IAT (Import Address Table) for imported functions from the import section.
So use a proper import library for DLLs and stop wasting the program user's time with so-called clever code that does its own dynamic DLL loading. That is useful only when a DLL is loaded on demand, as with delayed linking.


Me??? I rarely use dynamically loaded DLLs (with GetProcAddress etc.). I only tested it now because of the other thread we were talking about here.
zedd151

I don't think he was addressing you, guga.

You were just submitting your test results - although, a bit late it seems.  :biggrin:

Here's mine.  :tongue:  Better late than never.  :biggrin:
Dynamic 32 bit
Assembled with ML in 32-bit format using JBasic
100000000 iterations
265 ticks
266 ticks
266 ticks
265 ticks
266 ticks
266 ticks
265 ticks
266 ticks
265 ticks
266 ticks

Static 32 bit
Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
Assembled with ML in 32-bit format using MasmBasic
100000000 iterations
282 ticks
265 ticks
266 ticks
281 ticks
281 ticks
266 ticks
266 ticks
281 ticks
281 ticks
266 ticks

Dynamic 64 bit
Assembled with HJWasm32 in 64-bit format using JBasic
100000000 iterations
125 ticks
125 ticks
125 ticks
110 ticks
125 ticks
125 ticks
109 ticks
125 ticks
156 ticks
125 ticks

Both 32-bit versions are about the same. The 64-bit version is slightly more than twice as fast for me.  :biggrin:
Btw jj, I could not run the three programs one after another from a batch file. I had to run them separately, because they did not return, so the next one never started.

jj2007

Quote from: guga on April 03, 2025, 01:38:46 AM
That's what I thought: static linking is faster than dynamic linking.

My tests said the opposite :cool:

In real life, the results should be identical, because both versions end up calling the same kernel32 (etc.) functions.