**** generating 10000 random numbers & writing them to a buffer ****
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
++-++++++11 of 20 tests valid,
11906 kCycles for 10 * random Str$() MasmBasic
18391 kCycles for 10 * random dwtoa Masm32 SDK
16720 kCycles for 10 * random Str$() MasmBasic with saving
23448 kCycles for 10 * random dwtoa Masm32 SDK with saving
139015 kCycles for 10 * random sprintf CRT
156212 kCycles for 10 * random sprintf CRT with saving
11292 kCycles for 10 * random Str$() MasmBasic
18338 kCycles for 10 * random dwtoa Masm32 SDK
16328 kCycles for 10 * random Str$() MasmBasic with saving
29636 kCycles for 10 * random dwtoa Masm32 SDK with saving
139799 kCycles for 10 * random sprintf CRT
145937 kCycles for 10 * random sprintf CRT with saving
13457 kCycles for 10 * random Str$() MasmBasic
24111 kCycles for 10 * random dwtoa Masm32 SDK
17040 kCycles for 10 * random Str$() MasmBasic with saving
23754 kCycles for 10 * random dwtoa Masm32 SDK with saving
139204 kCycles for 10 * random sprintf CRT
121430 kCycles for 10 * random sprintf CRT with saving
101 bytes for random Str$() MasmBasic
113 bytes for random dwtoa Masm32 SDK
89 bytes for random Str$() MasmBasic with saving
90 bytes for random dwtoa Masm32 SDK with saving
121 bytes for random sprintf CRT
89 bytes for random sprintf CRT with saving
edi points to a buffer that gets filled with strings of the format n<tab>random number<CrLf>:
0 220383915
1 771014011
2 2113869234
3 901510269
4 1077232086
5 1507169316
etc
MakeStringsMB proc uses edi
xor ecx, ecx
.Repeat
Str$(ecx, dest:edi) ; write #number to edi
mov edi, edx ; edx points to the end of the string
mov al, 9
stosb
Str$(Rand(7fffffffh), dest:edi) ; write #random to edi
mov edi, edx
mov ax, 0A0Dh
stosw
inc ecx
.Until ecx>numstrings
xor eax, eax
stosb
ret
MakeStringsMB endp
MakeStringsM32 proc uses edi ebx
xor ebx, ebx
.Repeat
invoke dwtoa, ebx, edi ; pretty fast Masm32 library algo
add edi, len(edi) ; dwtoa doesn't tell us how many bytes were copied...
mov al, 9
stosb
invoke dwtoa, Rand(7fffffffh), edi ; use MasmBasic Rand() to make sure the choice of PRNG does not influence timings
add edi, len(edi)
mov ax, 0A0Dh
stosw
inc ebx
.Until ebx>numstrings
xor eax, eax
stosb
ret
MakeStringsM32 endp
MakeStringsCRT proc uses edi ebx
xor ebx, ebx
.Repeat
invoke crt_sprintf, edi, chr$("%i", 9), ebx ; slow CRT
add edi, rv(lstrlen, edi) ; sprintf does return the character count in eax, but this version measures the string again with lstrlen
invoke crt_sprintf, edi, chr$("%i", 13, 10), Rand(7fffffffh)
add edi, rv(lstrlen, edi)
inc ebx
.Until ebx>numstrings
xor eax, eax
stosb
ret
MakeStringsCRT endp
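A side note on the CRT variant: sprintf returns the number of characters written (not counting the terminating zero) in eax, so the lstrlen calls could be dropped. A minimal sketch, assuming the same setup as MakeStringsCRT above; the name MakeStringsCRTret is made up for illustration, and this variant has not been timed here:

```masm
MakeStringsCRTret proc uses edi ebx	; hypothetical name, not part of the original testbed
  xor ebx, ebx
  .Repeat
	invoke crt_sprintf, edi, chr$("%i", 9), ebx	; writes n<tab>
	add edi, eax		; eax = number of characters written by sprintf
	invoke crt_sprintf, edi, chr$("%i", 13, 10), Rand(7fffffffh)	; writes random number<CrLf>
	add edi, eax		; advance past the second string
	inc ebx
  .Until ebx>numstrings
  xor eax, eax
  stosb			; zero terminator for the whole buffer
  ret
MakeStringsCRTret endp
```

This would take lstrlen out of the measurement, so any remaining gap to Str$() and dwtoa would be down to sprintf itself.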
Hi JJ !!
What happens if you put a MasmBasic routine in a DLL?
Regards, HSE.
Quote from: HSE on May 05, 2022, 11:31:15 AM
What happens if you put a MasmBasic routine in a DLL?
Hi Hector,
There is an even better solution: the static library in \Masm32\MasmBasic\MasmBasic.lib :thumbsup:
:biggrin: :biggrin: :biggrin: I'm thinking of fair timings! The CRT functions are in a DLL.
Quote from: HSE on May 05, 2022, 08:48:09 PM
:biggrin: :biggrin: :biggrin: I'm thinking of fair timings! The CRT functions are in a DLL.
So what? The CRT DLL resides in the address space of your process. You may have two cycles more for an additional jmp, but that's all.
Quote from: jj2007 on May 05, 2022, 08:51:04 PM
So what? The CRT DLL resides in the address space of your process. You may have two cycles more for an additional jmp, but that's all.
Yes, everybody says that. I don't know. I don't remember those timings.
A DLL proc is never as fast as a local proc. You can test this by using an identical procedure locally and in a DLL.
Quote from: hutch-- on May 05, 2022, 09:09:45 PM
A DLL proc is never as fast as a local proc. You can test this by using an identical procedure locally and in a DLL.
:thumbsup:
Quote from: hutch-- on May 05, 2022, 09:09:45 PM
A DLL proc is never as fast as a local proc. You can test this by using an identical procedure locally and in a DLL.
Oompf.... - you are right, Hutch. But why is that so? Both routines are being called in exactly the same way... :rolleyes:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
6852 cycles for 100 * CRT strlen
5609 cycles for 100 * CRT strlen local
2017 cycles for 100 * MasmBasic Len
5397 cycles for 100 * Masm32 StrLen
8678 cycles for 100 * Masm32 len
6906 cycles for 100 * CRT strlen
5571 cycles for 100 * CRT strlen local
2012 cycles for 100 * MasmBasic Len
5396 cycles for 100 * Masm32 StrLen
8728 cycles for 100 * Masm32 len
6853 cycles for 100 * CRT strlen
5571 cycles for 100 * CRT strlen local
2021 cycles for 100 * MasmBasic Len
5405 cycles for 100 * Masm32 StrLen
8692 cycles for 100 * Masm32 len
6847 cycles for 100 * CRT strlen
5613 cycles for 100 * CRT strlen local
2009 cycles for 100 * MasmBasic Len
5397 cycles for 100 * Masm32 StrLen
8687 cycles for 100 * Masm32 len
14 bytes for CRT strlen
137 bytes for CRT strlen local
10 bytes for MasmBasic Len
10 bytes for Masm32 StrLen
10 bytes for Masm32 len
100 = eax CRT strlen
100 = eax CRT strlen local
100 = eax MasmBasic Len
100 = eax Masm32 StrLen
100 = eax Masm32 len
I guess it has to do with the load address and calling method of a DLL function. If it were a larger function, the call overhead would matter less and less, but in practice that was the result I found.
I see the result but find it difficult to believe :sad:
What happens between the call near msvcrt.strlen and the arrival at 76C443D3? Olly says nothing happens in between :cool:
00401040 /$ BB 63000000 mov ebx, 63
00401045 |. 8D49 00 lea ecx, [ecx]
00401048 |> CC /int3
00401049 |. 68 2E804000 |push offset 0040802E ; /string = "Hello, this is a simple string intended for testing string algos. It has 100 characters without zero"
0040104E |. FF15 AC8F4000 |call near [<&msvcrt.strlen>] ; \MSVCRT.strlen
00401054 |. 83C4 04 |add esp, 4
00401057 |. 4B |dec ebx
00401058 |.^ 79 EE \jns short 00401048
0040105A \. C3 retn
...
76C443D2 \. C3 retn
76C443D3 /$ 8B4C24 04 mov ecx, [string] ; ASCII "Hello, this is a simple string intended for testing string algos. It has 100 characters without zero"
76C443D7 |. F7C1 03000000 test ecx, 00000003
76C443DD |. 74 1A jz short 76C443F9
76C443DF |> 8A01 /mov al, [ecx]
76C443E1 |. 83C1 01 |add ecx, 1
76C443E4 |. 84C0 |test al, al
76C443E6 |. 74 4C |jz short 76C44434
76C443E8 |. F7C1 03000000 |test ecx, 00000003
76C443EE |.^ 75 EF \jnz short 76C443DF
76C443F0 |. 83C0 00 add eax, 0
I doubt that a disassembly will show you what is happening. The mechanism of a DLL is more complex than that of an executable: preferred load address, calling mechanism, and it's all OS based, not simple mnemonics. It probably means you are only beating it by 9.9^
I made another test:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
7054 cycles for 100 * CRT strlen
5627 cycles for 100 * CRT strlen local
383 cycles for 100 * DoNothing local
379 cycles for 100 * DoNothing DLL
6854 cycles for 100 * CRT strlen
5587 cycles for 100 * CRT strlen local
382 cycles for 100 * DoNothing local
379 cycles for 100 * DoNothing DLL
6856 cycles for 100 * CRT strlen
5608 cycles for 100 * CRT strlen local
386 cycles for 100 * DoNothing local
386 cycles for 100 * DoNothing DLL
6876 cycles for 100 * CRT strlen
5573 cycles for 100 * CRT strlen local
382 cycles for 100 * DoNothing local
379 cycles for 100 * DoNothing DLL
There is something special about the CRT: The ratio strlen DLL : strlen local is 1.23 :cool:
DLL attached - see for yourself.
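To put numbers on that, a quick calculation from the second run of this test (just arithmetic on the figures above, not a new measurement):

```latex
\begin{align*}
\frac{t_{\text{strlen DLL}}}{t_{\text{strlen local}}} &= \frac{6854}{5587} \approx 1.23 \\[4pt]
\frac{6854 - 5587}{100} &\approx 12.7 \ \text{cycles per call (strlen, DLL vs. local)} \\[4pt]
\frac{379 - 382}{100} &= -0.03 \ \text{cycles per call (DoNothing, DLL vs. local)}
\end{align*}
```

So the call into a DLL by itself costs nothing measurable in this test; the roughly 13 extra cycles per call show up only with strlen.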
AMD Athlon(tm) II X2 220 Processor (SSE3)
13350 cycles for 100 * CRT strlen
11621 cycles for 100 * CRT strlen local
4554 cycles for 100 * MasmBasic Len
7781 cycles for 100 * Masm32 StrLen
14032 cycles for 100 * Masm32 len
13175 cycles for 100 * CRT strlen
12431 cycles for 100 * CRT strlen local
4564 cycles for 100 * MasmBasic Len
7804 cycles for 100 * Masm32 StrLen
19606 cycles for 100 * Masm32 len
13177 cycles for 100 * CRT strlen
11576 cycles for 100 * CRT strlen local
4543 cycles for 100 * MasmBasic Len
7807 cycles for 100 * Masm32 StrLen
19776 cycles for 100 * Masm32 len
13126 cycles for 100 * CRT strlen
11224 cycles for 100 * CRT strlen local
4552 cycles for 100 * MasmBasic Len
7781 cycles for 100 * Masm32 StrLen
14100 cycles for 100 * Masm32 len
14 bytes for CRT strlen
137 bytes for CRT strlen local
10 bytes for MasmBasic Len
10 bytes for Masm32 StrLen
10 bytes for Masm32 len
100 = eax CRT strlen
100 = eax CRT strlen local
100 = eax MasmBasic Len
100 = eax Masm32 StrLen
100 = eax Masm32 len
--- ok ---
Crash: not found (line 49): InstrDLL
AMD Athlon(tm) II X2 220 Processor (SSE3)
13127 cycles for 100 * CRT strlen
10990 cycles for 100 * CRT strlen local
4523 cycles for 100 * MasmBasic Len
7781 cycles for 100 * Masm32 StrLen
14378 cycles for 100 * Masm32 len
40584 cycles for 100 * Instr (statically linked)
**** generating 10000 random numbers & writing them to a buffer ****
11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (SSE4)
+++++++++++9 of 20 tests valid,
4686 kCycles for 10 * random Str$() MasmBasic
7285 kCycles for 10 * random dwtoa Masm32 SDK
11436 kCycles for 10 * random Str$() MasmBasic with saving
11748 kCycles for 10 * random dwtoa Masm32 SDK with saving
47242 kCycles for 10 * random sprintf CRT
53419 kCycles for 10 * random sprintf CRT with saving
4745 kCycles for 10 * random Str$() MasmBasic
7507 kCycles for 10 * random dwtoa Masm32 SDK
9678 kCycles for 10 * random Str$() MasmBasic with saving
12300 kCycles for 10 * random dwtoa Masm32 SDK with saving
45147 kCycles for 10 * random sprintf CRT
53920 kCycles for 10 * random sprintf CRT with saving
4928 kCycles for 10 * random Str$() MasmBasic
7463 kCycles for 10 * random dwtoa Masm32 SDK
9880 kCycles for 10 * random Str$() MasmBasic with saving
11934 kCycles for 10 * random dwtoa Masm32 SDK with saving
59330 kCycles for 10 * random sprintf CRT
52693 kCycles for 10 * random sprintf CRT with saving
101 bytes for random Str$() MasmBasic
113 bytes for random dwtoa Masm32 SDK
89 bytes for random Str$() MasmBasic with saving
90 bytes for random dwtoa Masm32 SDK with saving
121 bytes for random sprintf CRT
89 bytes for random sprintf CRT with saving
--- ok ---
Quote from: TimoVJL on May 06, 2022, 07:57:19 PM
Crash: not found (line 49): InstrDLL
Sorry - wrong archive. Attached the good one; the DLL must be extracted to the executable's folder.
@LiaoMi: Thanks :thup: