Dynamic vs static linking

Started by jj2007, April 08, 2017, 06:56:56 PM


jj2007

Just for fun, a speed test: dynamic linking (LoadLibrary+GetProcAddress) vs. static linking (*.lib).

The guinea pig is GetTickCount itself, simply because it's the only sufficiently short and fast API around.

The test uses
a) ordinary MasmBasic with static linking
b) the dual assembly variant JBasic, which uses a stub that loads all addresses at runtime.

The only real difference is that the static version goes through a jump table, while the dynamic version calls GetTickCount directly. The difference is remarkable, though: typically 546 vs. 469 ticks.

Results:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Assembled with ML in 32-bit format using MasmBasic
100000000 iterations
562 ticks
546 ticks
530 ticks
562 ticks
546 ticks
577 ticks
546 ticks
546 ticks
546 ticks
546 ticks


Assembled with ML in 32-bit format using JBasic
100000000 iterations
484 ticks
483 ticks
468 ticks
469 ticks
483 ticks
468 ticks
468 ticks
484 ticks
483 ticks
468 ticks


For comparison, the 64-bit version:
Assembled with ml64 in 64-bit format using JBasic
100000000 iterations
343 ticks
328 ticks
343 ticks
343 ticks
343 ticks
344 ticks
343 ticks
343 ticks
343 ticks
343 ticks
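
To put the three results on a per-call basis, here is a rough back-of-the-envelope sketch. It assumes that one tick is one millisecond, that the CPU runs at its nominal 2.5 GHz (turbo boost ignored), and that loop overhead is counted as part of each call; the names `CPU_HZ`, `ITERATIONS` and `per_call` are made up for the illustration.

```python
CPU_HZ = 2.5e9            # nominal clock from the CPU string (assumption: no turbo)
ITERATIONS = 100_000_000  # loop count reported by the test

def per_call(ticks_ms):
    """Convert a total in milliseconds to (ns per call, approx. cycles per call)."""
    ns = ticks_ms * 1e6 / ITERATIONS
    cycles = ns * 1e-9 * CPU_HZ
    return ns, cycles

# Typical values taken from the three result lists above
for label, ms in (("32-bit static", 546), ("32-bit dynamic", 469), ("64-bit dynamic", 343)):
    ns, cycles = per_call(ms)
    print(f"{label}: {ns:.2f} ns/call, ~{cycles:.1f} cycles")
```

Under those assumptions the gap between the two 32-bit variants is roughly two cycles per call. Note also that the readings jump in steps of 15 or 16 ticks, so differences between individual runs say little.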


P.S.: Just in case you feel confirmed that 64-bit code is faster, here is what GetTickCount itself looks like, 64-bit first and 32-bit below:
000007FEFCED1120 | 8B 0C 25 04 00 FE 7F     | mov ecx, dword ptr ds:[7FFE0004]   |
000007FEFCED1127 | 48 8B 04 25 20 03 FE 7F  | mov rax, qword ptr ds:[7FFE0320]   |
000007FEFCED112F | 48 0F AF C1              | imul rax, rcx                      |
000007FEFCED1133 | 48 C1 E8 18              | shr rax, 18                        |
000007FEFCED1137 | C3                       | ret                                |


GetTickCount              /$ /EB 02                      jmp short 768B8FD8
768B8FD6                  |> |F3:                        prefix rep:
768B8FD7                  |. |90                         nop
768B8FD8                  |> \8B0D 2403FE7F              mov ecx, [7FFE0324]
768B8FDE                  |.  8B15 2003FE7F              mov edx, [7FFE0320]
768B8FE4                  |.  A1 2803FE7F                mov eax, [7FFE0328]
768B8FE9                  |.  3BC8                       cmp ecx, eax
768B8FEB                  |.^ 75 E9                      jnz short 768B8FD6
768B8FED                  |.  A1 0400FE7F                mov eax, [7FFE0004]
768B8FF2                  |.  F7E2                       mul edx
768B8FF4                  |.  C1E1 08                    shl ecx, 8
768B8FF7                  |.  0FAF0D 0400FE7F            imul ecx, [7FFE0004]
768B8FFE                  |.  0FACD0 18                  shrd eax, edx, 18
768B9002                  |.  C1EA 18                    shr edx, 18
768B9005                  |.  03C1                       add eax, ecx
768B9007                  \.  C3                         retn


The 32-bit version is much longer; it's a miracle that it doesn't take twice as long.
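
For anyone wondering whether the two listings really compute the same thing, here is a small Python sketch that mimics the arithmetic of both. It assumes that 7FFE0004 holds a 32-bit tick multiplier and 7FFE0320 a 64-bit tick counter (as far as I know these are TickCountMultiplier and TickCount in the shared user data page); the function names and the sample multiplier 0x0FA00000 are only illustrative, and the retry loop of the 32-bit version (cmp ecx, eax / jnz) is left out because the sketch starts from an already consistent counter value.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def tick_count_64(tick, multiplier):
    """Arithmetic of the 64-bit listing: imul rax, rcx / shr rax, 18h."""
    rax = (tick * multiplier) & MASK64    # imul keeps the low 64 bits
    rax >>= 24                            # shr rax, 18h
    return rax & MASK32                   # the API returns a DWORD (eax)

def tick_count_32(tick, multiplier):
    """Arithmetic of the 32-bit listing, split into low and high dwords."""
    low = tick & MASK32                   # mov edx, [7FFE0320]
    high = (tick >> 32) & MASK32          # mov ecx, [7FFE0324]
    product = low * multiplier            # mul edx -> edx:eax
    ecx = (high << 8) & MASK32            # shl ecx, 8
    ecx = (ecx * multiplier) & MASK32     # imul ecx, [7FFE0004]
    eax = (product >> 24) & MASK32        # shrd eax, edx, 18h
    return (eax + ecx) & MASK32           # add eax, ecx

# Hypothetical sample values, not read from a real machine
MULTIPLIER = 0x0FA00000                   # 15.625 * 2**24, i.e. 15.625 ms per tick
for tick in (1000, 123456789, (3 << 32) + 42):
    assert tick_count_64(tick, MULTIPLIER) == tick_count_32(tick, MULTIPLIER)
print(tick_count_64(1000, MULTIPLIER))    # 15625
```

So the extra instructions in the 32-bit version are there because the 64-bit counter has to be read and multiplied in two halves; the result is the same.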

felipe

Nice work and thanks!
So, the code is the GetTickCount function from the API, in both its 32-bit and 64-bit versions?

jj2007

Quote from: felipe on April 10, 2017, 02:04:57 AM
Nice work and thanks!
So, the code is the GetTickCount function from the API, in both its 32-bit and 64-bit versions?

Yes, as you can see from the addresses, e.g. 000007FEFCED1120: that's a 64-bit address, and it sits in the system's kernel DLL rather than in the test program.

felipe