In an attempt to create a faster ascii to dword algo, I stumbled over a little bug in \Masm32\m32lib\atodw.asm:
71 bytes for s2d
54 bytes for atodw2
Testing 1234567890
162 ms for StringToDword result=1234567890
145 ms for atodw (old) result=1234567890
143 ms for atodw (new) result=1234567890
165 ms for StringToDword result=1234567890
140 ms for atodw (old) result=1234567890
140 ms for atodw (new) result=1234567890
179 ms for StringToDword result=1234567890
144 ms for atodw (old) result=1234567890
140 ms for atodw (new) result=1234567890
Testing -1234567890
165 ms for StringToDword result=-1234567890
152 ms for atodw (old) result=1498830546
144 ms for atodw (new) result=-1234567890
162 ms for StringToDword result=-1234567890
151 ms for atodw (old) result=1498830546
143 ms for atodw (new) result=-1234567890
163 ms for StringToDword result=-1234567890
152 ms for atodw (old) result=1498830546
142 ms for atodw (new) result=-1234567890
The atodw algo fails for negative numbers. The bug should not be attributed to the author of the algo, but rather to the authors of various assemblers that allow a wrong syntax :cool:
P.S.: Timings welcome, on my machine the new algo is slower, but maybe there are positive surprises with other CPUs :biggrin:
Try the signed version, its a later faster version in the library.
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
.486 ; force 32 bit code
.model flat, stdcall ; memory model & calling convention
option casemap :none ; case sensitive
atol PROTO :DWORD
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 4
atol proc lpSrc:DWORD
xor eax, eax ; zero EAX
mov edx, [esp+4]
movzx ecx, BYTE PTR [edx]
add edx, 1
cmp ecx, "-" ; test for sign
jne lbl0
add eax, 1 ; set EAX if sign
movzx ecx, BYTE PTR [edx]
add edx, 1
lbl0:
push eax ; store sign on stack
xor eax, eax ; so eax*10 will be 0 for first digit
lbl1:
sub ecx, 48
jc lbl2
lea eax, [eax+eax*4] ; mul eax by 5
lea eax, [ecx+eax*2] ; mul eax by 2 and add digit value
movzx ecx, BYTE PTR [edx] ; get next digit
add edx, 1
jmp lbl1
lbl2:
pop ecx ; retrieve sign
test ecx, ecx
jnz lbl3
ret 4
lbl3:
neg eax ; negative return value is sign set
ret 4
atol endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end
atol works correctly :thumbsup:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Testing 1234567890
167 ms for StringToDword result=1234567890
148 ms for atodw (old) result=1234567890
139 ms for atodw (new) result=1234567890
108 ms for atol result=1234567890
167 ms for StringToDword result=1234567890
147 ms for atodw (old) result=1234567890
154 ms for atodw (new) result=1234567890
105 ms for atol result=1234567890
161 ms for StringToDword result=1234567890
146 ms for atodw (old) result=1234567890
140 ms for atodw (new) result=1234567890
108 ms for atol result=1234567890
Testing -1234567890
164 ms for StringToDword result=-1234567890
158 ms for atodw (old) result=1498830546
146 ms for atodw (new) result=-1234567890
104 ms for atol result=-1234567890
170 ms for StringToDword result=-1234567890
160 ms for atodw (old) result=1498830546
163 ms for atodw (new) result=-1234567890
113 ms for atol result=-1234567890
164 ms for StringToDword result=-1234567890
158 ms for atodw (old) result=1498830546
149 ms for atodw (new) result=-1234567890
108 ms for atol result=-1234567890
Here is my algo, unfortunately not faster than the two Masm32 library algos :sad:
.data
align 16
num0 db "****************" ; Ascii 42, just below dot, minus and plus sign
.code
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
StringToDword proc pString ; uses esi pString
pop edx
pop eax ; pString
push edx
push esi
movaps xmm1, oword ptr num0 ; get stars as mask
movzx edx, byte ptr [eax]
cmp dl, "-"
sete dl
push edx ; flag negate
add eax, edx ; +1 if negative
movups xmm0, oword ptr [eax]
pcmpgtb xmm0, xmm1
pmovmskb edx, xmm0
not edx
bsf edx, edx
add eax, edx ; pointer to end of number
xor esi, esi
neg edx
align 2
@@:
movzx ecx, byte ptr [eax+edx] ; there is no faster solution
and ecx, 15
if 0
imul esi, esi, 10
add esi, ecx
else ; a tick faster
lea esi, [esi+4*esi] ; *5
lea esi, [ecx+2*esi] ; *10+ecx
endif
inc edx
js @B
xchg eax, esi
pop edx
dec edx ; flag negate
.if Zero?
neg eax
.endif
pop esi
ret
StringToDword endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
71 bytes for s2d
54 bytes for atodw2
AMD Ryzen 5 3400G with Radeon Vega Graphics
Testing 1234567890
118 ms for StringToDword result=1234567890
82 ms for atodw (old) result=1234567890
81 ms for atodw (new) result=1234567890
69 ms for atol result=1234567890
118 ms for StringToDword result=1234567890
82 ms for atodw (old) result=1234567890
81 ms for atodw (new) result=1234567890
70 ms for atol result=1234567890
118 ms for StringToDword result=1234567890
82 ms for atodw (old) result=1234567890
81 ms for atodw (new) result=1234567890
72 ms for atol result=1234567890
Testing -1234567890
121 ms for StringToDword result=-1234567890
89 ms for atodw (old) result=1498830546
85 ms for atodw (new) result=-1234567890
80 ms for atol result=-1234567890
121 ms for StringToDword result=-1234567890
90 ms for atodw (old) result=1498830546
84 ms for atodw (new) result=-1234567890
81 ms for atol result=-1234567890
121 ms for StringToDword result=-1234567890
89 ms for atodw (old) result=1498830546
85 ms for atodw (new) result=-1234567890
80 ms for atol result=-1234567890
--- hit any key ---
71 bytes for s2d
54 bytes for atodw2
AMD Ryzen 9 5950X 16-Core Processor
Testing 1234567890
53 ms for StringToDword result=1234567890
53 ms for atodw (old) result=1234567890
51 ms for atodw (new) result=1234567890
45 ms for atol result=1234567890
53 ms for StringToDword result=1234567890
53 ms for atodw (old) result=1234567890
51 ms for atodw (new) result=1234567890
40 ms for atol result=1234567890
54 ms for StringToDword result=1234567890
53 ms for atodw (old) result=1234567890
58 ms for atodw (new) result=1234567890
40 ms for atol result=1234567890
Testing -1234567890
55 ms for StringToDword result=-1234567890
59 ms for atodw (old) result=1498830546
57 ms for atodw (new) result=-1234567890
40 ms for atol result=-1234567890
55 ms for StringToDword result=-1234567890
60 ms for atodw (old) result=1498830546
58 ms for atodw (new) result=-1234567890
43 ms for atol result=-1234567890
55 ms for StringToDword result=-1234567890
58 ms for atodw (old) result=1498830546
57 ms for atodw (new) result=-1234567890
43 ms for atol result=-1234567890
--- hit any key ---
@fearless :thumbsup:
AMD-Ryzen-9-5950X-vs-AMD-Ryzen-5-3400G (https://cpu.userbenchmark.com/Compare/AMD-Ryzen-9-5950X-vs-AMD-Ryzen-5-3400G/4086vsm825156)
With the two algos, the very old atodw was for unsigned conversions, atol was for signed.
122 ms for StringToDword result=-1234567890
101 ms for atodw (old) result=1498830546
93 ms for atodw (new) result=-1234567890
69 ms for atol result=-1234567890
--- hit any key ---
Quote from: hutch-- on November 18, 2021, 02:00:59 AM
atodw was for unsigned conversions
\Masm32\m32lib\atodw.asm
atodw proc String:DWORD
; ----------------------------------------
; Convert decimal string into dword value
; return value in eax
; ----------------------------------------
push esi
push edi
xor eax, eax
mov esi, [String]
xor ecx, ecx
xor edx, edx
mov al, [esi]
inc esi
cmp al, 2D ; <<<<<<<<<<<<<<<<<<<<<<< the bug <<<<<<<<<<<<<<<<<<<<
jne proceed
mov al, byte ptr [esi]
not edx
inc esi
jmp proceed
@@:
sub al, 30h
lea ecx, dword ptr [ecx+4*ecx]
lea ecx, dword ptr [eax+2*ecx]
mov al, byte ptr [esi]
inc esi
proceed:
or al, al
jne @B
lea eax, dword ptr [edx+ecx]
xor eax, edx
pop edi
pop esi
ret
atodw endp
2Dh is the minus sign;
cmp al, 2D translates as
cmp al, 2
:thumbsup:
Intel(R) Core(TM) i3-10100 CPU @ 3.60GHz
89 ms for StringToDword result=1234567890
89 ms for atodw (new) result=1234567890
61 ms for atol result=1234567890
96 ms for StringToDword result=-1234567890
91 ms for atodw (new) result=-1234567890
60 ms for atol result=-1234567890
:arrow_down:
71 bytes for s2d
54 bytes for atodw2
11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
Testing 1234567890
74 ms for StringToDword result=1234567890
75 ms for atodw (old) result=1234567890
73 ms for atodw (new) result=1234567890
55 ms for atol result=1234567890
74 ms for StringToDword result=1234567890
77 ms for atodw (old) result=1234567890
71 ms for atodw (new) result=1234567890
45 ms for atol result=1234567890
74 ms for StringToDword result=1234567890
76 ms for atodw (old) result=1234567890
73 ms for atodw (new) result=1234567890
45 ms for atol result=1234567890
Testing -1234567890
76 ms for StringToDword result=-1234567890
82 ms for atodw (old) result=1498830546
75 ms for atodw (new) result=-1234567890
47 ms for atol result=-1234567890
75 ms for StringToDword result=-1234567890
82 ms for atodw (old) result=1498830546
74 ms for atodw (new) result=-1234567890
46 ms for atol result=-1234567890
76 ms for StringToDword result=-1234567890
84 ms for atodw (old) result=1498830546
79 ms for atodw (new) result=-1234567890
48 ms for atol result=-1234567890
--- hit any key ---
LiaoMi,
What turbo frequency does the i7 run at. I gather its a low power notebook CPU which accounts for the low base frequency, I was just curious at what turbo speed it ran at under load.
Quote from: hutch-- on November 18, 2021, 10:17:09 AM
LiaoMi,
What turbo frequency does the i7 run at. I gather its a low power notebook CPU which accounts for the low base frequency, I was just curious at what turbo speed it ran at under load.
Hi hutch,
"The 11800H brings 8 cores and 16 threads with 24 MB of L3 cache, a 2.3 GHz base clock at 45W, plus turbo frequencies that range from 4.6 GHz on up to 2 cores, to 4.2 GHz all-core." - https://www.techspot.com/review/2262-intel-core-i7-11800h/
The
Intel Core i7-11800H is a high-end octa-core SoC for gaming laptops and mobile workstations. It is belongs to the Tiger Lake H45 family and was announced in mid 2021. It integrates eight Willow Cove processor cores (16 threads thanks to Hyper-Threading). The base clock speed depends on the TDP setting and is
2.3 GHz at 45 W. The
single core Boost can be as high as 4.6 GHz while all cores can run at up to 4.2 GHz. The CPU offers 24 MB of Level 3 cache and supports DDR4-3200 memory.
vs
The
Intel Core i9-11900H is a high end octa core SoC for gaming laptops and mobile workstations. It is based on the Tiger Lake H45 generation and was announced in mid 2021. It integrates eight Willow Cove processor cores (16 threads thanks to HyperThreading). The base clock speed depends on the TDP setting and
at 45 Watt is at 2.5 GHz. The
single core boost speed can reach up to 4.9 GHz, all cores can reach up to 4.5 GHz. The CPU offers 24 MB level 3 cache and supports DDR4-3200 memory.
https://www.notebookcheck.net/i5-11300H-vs-i9-11900H-vs-i7-11800H_13058_13143_13145.247596.0.html
:tongue:
Looks good, with a laptop, is there any tricks for cooling ? Most computer use is not demanding but if you start thrashing the CPU on all cores, you need some trick cooling solution. With modern laptops, can you access the guts to put a source of air flow into it. Just recently I hooked a USB charged air blower and while I use it to blow dust out of things, I wondered if you could ge a small blower that improved the cooling inside the laptop case.
Quote from: hutch-- on November 21, 2021, 03:19:31 PM
Looks good, with a laptop, is there any tricks for cooling ? Most computer use is not demanding but if you start thrashing the CPU on all cores, you need some trick cooling solution. With modern laptops, can you access the guts to put a source of air flow into it. Just recently I hooked a USB charged air blower and while I use it to blow dust out of things, I wondered if you could ge a small blower that improved the cooling inside the laptop case.
Hi Hutch,
in my case, the design and materials of heat conductors were improved:
(https://www.excaliberpc.com/images/resources/746896/bfb6197ddbec491aab4ad46de57bf6f4.png)
The use of additional turbo boosters for cooling will not bring big changes, in dynamic tests a slight difference of 3 - 5 degrees was noticeable, it doesn't really affect the whole picture. I have a laptop stand: https://www.amazon.co.uk/Cooler-Master-Adjustable-Ergonomic-R9-NBC-U3PK-GP/dp/B00ED3WMTC
The cooling in my laptop can be compared with the Noctua version of the fan (https://tech4gamers.com/noctua-nh-l9a-am4-low-profile-cpu-cooler-review/ - https://www.youtube.com/watch?v=AAkXPsLwtsY), in my case there will be two fans, which means the cooling rates will be slightly higher.
On economy profile, the results will be as follows: https://i.ibb.co/RSt8r2q/Fur-Mark-STR.jpg
On turbo profile, the results will be as follows: https://i.ibb.co/JyLFqx6/Fur-Mark-STR2-Turbo.jpg
It seems that on all sites there is an error in the description, for the reason that I could not raise the frequency to 4.2 GHz. Here's another description, which is closer to reality:
Frequency 2.30 GHz
CPU Cores 8
CPU Threads 16
Turbo (1 Core) 4.60 GHz
Turbo (8 Cores): 3.40 GHz
https://cpu-comparison.com/intel-core-i7-11800h/
Most likely this frequency 4.2 GHz refers only to temporal acceleration:
CPU Power Limit 1 (Long Duration) 200 W / 56.00 sec (Unlocked)
Maybe somewhere else there is a power limitation :tongue: :azn: