News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

Masm32 SDK: bug in atodw

Started by jj2007, November 17, 2021, 12:25:46 PM

Previous topic - Next topic

jj2007

In an attempt to create a faster ascii to dword algo, I stumbled over a little bug in \Masm32\m32lib\atodw.asm:
71      bytes for s2d
54      bytes for atodw2

Testing 1234567890

162 ms for StringToDword        result=1234567890
145 ms for atodw (old)          result=1234567890
143 ms for atodw (new)          result=1234567890

165 ms for StringToDword        result=1234567890
140 ms for atodw (old)          result=1234567890
140 ms for atodw (new)          result=1234567890

179 ms for StringToDword        result=1234567890
144 ms for atodw (old)          result=1234567890
140 ms for atodw (new)          result=1234567890

Testing -1234567890

165 ms for StringToDword        result=-1234567890
152 ms for atodw (old)          result=1498830546
144 ms for atodw (new)          result=-1234567890

162 ms for StringToDword        result=-1234567890
151 ms for atodw (old)          result=1498830546
143 ms for atodw (new)          result=-1234567890

163 ms for StringToDword        result=-1234567890
152 ms for atodw (old)          result=1498830546
142 ms for atodw (new)          result=-1234567890


The atodw algo fails for negative numbers. The bug should not be attributed to the author of the algo, but rather to the authors of various assemblers that allow a wrong syntax :cool:

P.S.: Timings welcome, on my machine the new algo is slower, but maybe there are positive surprises with other CPUs :biggrin:

hutch--

Try the signed version, its a later faster version in the library.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

    .486                      ; force 32 bit code
    .model flat, stdcall      ; memory model & calling convention
    option casemap :none      ; case sensitive

    atol PROTO :DWORD

    .code

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 4

atol proc lpSrc:DWORD

    xor eax, eax                ; zero EAX
    mov edx, [esp+4]
    movzx ecx, BYTE PTR [edx]
    add edx, 1
    cmp ecx, "-"                ; test for sign
    jne lbl0
    add eax, 1                  ; set EAX if sign
    movzx ecx, BYTE PTR [edx]
    add edx, 1

  lbl0:
    push eax                    ; store sign on stack
    xor eax, eax                ; so eax*10 will be 0 for first digit

  lbl1:
    sub ecx, 48
    jc  lbl2
    lea eax, [eax+eax*4]        ; mul eax by 5
    lea eax, [ecx+eax*2]        ; mul eax by 2 and add digit value
    movzx ecx, BYTE PTR [edx]   ; get next digit
    add edx, 1
    jmp lbl1

  lbl2:
    pop ecx                     ; retrieve sign
    test ecx, ecx
    jnz lbl3
    ret 4

  lbl3:
    neg eax                     ; negative return value is sign set
    ret 4

atol endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

end

jj2007

atol works correctly :thumbsup:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz

Testing 1234567890

167 ms for StringToDword        result=1234567890
148 ms for atodw (old)          result=1234567890
139 ms for atodw (new)          result=1234567890
108 ms for atol                 result=1234567890

167 ms for StringToDword        result=1234567890
147 ms for atodw (old)          result=1234567890
154 ms for atodw (new)          result=1234567890
105 ms for atol                 result=1234567890

161 ms for StringToDword        result=1234567890
146 ms for atodw (old)          result=1234567890
140 ms for atodw (new)          result=1234567890
108 ms for atol                 result=1234567890

Testing -1234567890

164 ms for StringToDword        result=-1234567890
158 ms for atodw (old)          result=1498830546
146 ms for atodw (new)          result=-1234567890
104 ms for atol                 result=-1234567890

170 ms for StringToDword        result=-1234567890
160 ms for atodw (old)          result=1498830546
163 ms for atodw (new)          result=-1234567890
113 ms for atol                 result=-1234567890

164 ms for StringToDword        result=-1234567890
158 ms for atodw (old)          result=1498830546
149 ms for atodw (new)          result=-1234567890
108 ms for atol                 result=-1234567890


Here is my algo, unfortunately not faster than the two Masm32 library algos :sad:

.data
align 16
num0 db "****************" ; Ascii 42, just below dot, minus and plus sign
.code
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
StringToDword proc pString ; uses esi pString
  pop edx
  pop eax ; pString
  push edx
  push esi
  movaps xmm1, oword ptr num0 ; get stars as mask
  movzx edx, byte ptr [eax]
  cmp dl, "-"
  sete dl
  push edx ; flag negate
  add eax, edx ; +1 if negative
  movups xmm0, oword ptr [eax]
  pcmpgtb xmm0, xmm1
  pmovmskb edx, xmm0
  not edx
  bsf edx, edx
  add eax, edx ; pointer to end of number
  xor esi, esi
  neg edx
  align 2
@@:
  movzx ecx, byte ptr [eax+edx] ; there is no faster solution
  and ecx, 15
  if 0
imul esi, esi, 10
add esi, ecx
  else ; a tick faster
lea esi, [esi+4*esi] ; *5
lea esi, [ecx+2*esi] ; *10+ecx
  endif
  inc edx
  js @B
  xchg eax, esi
  pop edx
  dec edx ; flag negate
  .if Zero?
neg eax
  .endif
  pop esi
  ret
StringToDword endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

TimoVJL

71      bytes for s2d
54      bytes for atodw2
AMD Ryzen 5 3400G with Radeon Vega Graphics

Testing 1234567890

118 ms for StringToDword        result=1234567890
82 ms for atodw (old)           result=1234567890
81 ms for atodw (new)           result=1234567890
69 ms for atol                  result=1234567890

118 ms for StringToDword        result=1234567890
82 ms for atodw (old)           result=1234567890
81 ms for atodw (new)           result=1234567890
70 ms for atol                  result=1234567890

118 ms for StringToDword        result=1234567890
82 ms for atodw (old)           result=1234567890
81 ms for atodw (new)           result=1234567890
72 ms for atol                  result=1234567890

Testing -1234567890

121 ms for StringToDword        result=-1234567890
89 ms for atodw (old)           result=1498830546
85 ms for atodw (new)           result=-1234567890
80 ms for atol                  result=-1234567890

121 ms for StringToDword        result=-1234567890
90 ms for atodw (old)           result=1498830546
84 ms for atodw (new)           result=-1234567890
81 ms for atol                  result=-1234567890

121 ms for StringToDword        result=-1234567890
89 ms for atodw (old)           result=1498830546
85 ms for atodw (new)           result=-1234567890
80 ms for atol                  result=-1234567890
--- hit any key ---
May the source be with you

fearless

71      bytes for s2d
54      bytes for atodw2
AMD Ryzen 9 5950X 16-Core Processor

Testing 1234567890

53 ms for StringToDword         result=1234567890
53 ms for atodw (old)           result=1234567890
51 ms for atodw (new)           result=1234567890
45 ms for atol                  result=1234567890

53 ms for StringToDword         result=1234567890
53 ms for atodw (old)           result=1234567890
51 ms for atodw (new)           result=1234567890
40 ms for atol                  result=1234567890

54 ms for StringToDword         result=1234567890
53 ms for atodw (old)           result=1234567890
58 ms for atodw (new)           result=1234567890
40 ms for atol                  result=1234567890

Testing -1234567890

55 ms for StringToDword         result=-1234567890
59 ms for atodw (old)           result=1498830546
57 ms for atodw (new)           result=-1234567890
40 ms for atol                  result=-1234567890

55 ms for StringToDword         result=-1234567890
60 ms for atodw (old)           result=1498830546
58 ms for atodw (new)           result=-1234567890
43 ms for atol                  result=-1234567890

55 ms for StringToDword         result=-1234567890
58 ms for atodw (old)           result=1498830546
57 ms for atodw (new)           result=-1234567890
43 ms for atol                  result=-1234567890
--- hit any key ---

TimoVJL

May the source be with you

hutch--

With the two algos, the very old atodw was for unsigned conversions, atol was for signed.

122 ms for StringToDword        result=-1234567890
101 ms for atodw (old)          result=1498830546
93 ms for atodw (new)           result=-1234567890
69 ms for atol                  result=-1234567890
--- hit any key ---

jj2007

#7
Quote from: hutch-- on November 18, 2021, 02:00:59 AM
atodw was for unsigned conversions

\Masm32\m32lib\atodw.asm
atodw proc String:DWORD

  ; ----------------------------------------
  ; Convert decimal string into dword value
  ; return value in eax
  ; ----------------------------------------

    push esi
    push edi

    xor eax, eax
    mov esi, [String]
    xor ecx, ecx
    xor edx, edx
    mov al, [esi]
    inc esi
    cmp al, 2D   ; <<<<<<<<<<<<<<<<<<<<<<< the bug <<<<<<<<<<<<<<<<<<<<
    jne proceed
    mov al, byte ptr [esi]
    not edx
    inc esi
    jmp proceed

  @@:
    sub al, 30h
    lea ecx, dword ptr [ecx+4*ecx]
    lea ecx, dword ptr [eax+2*ecx]
    mov al, byte ptr [esi]
    inc esi

  proceed:
    or al, al
    jne @B
    lea eax, dword ptr [edx+ecx]
    xor eax, edx

    pop edi
    pop esi

    ret

atodw endp


2Dh is the minus sign; cmp al, 2D translates as cmp al, 2

HSE

 :thumbsup:

Intel(R) Core(TM) i3-10100 CPU @ 3.60GHz

89 ms for StringToDword         result=1234567890
89 ms for atodw (new)           result=1234567890
61 ms for atol                  result=1234567890

96 ms for StringToDword         result=-1234567890
91 ms for atodw (new)           result=-1234567890
60 ms for atol                  result=-1234567890
Equations in Assembly: SmplMath

LiaoMi

 :arrow_down:
71      bytes for s2d
54      bytes for atodw2
11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz

Testing 1234567890

74 ms for StringToDword result=1234567890
75 ms for atodw (old)           result=1234567890
73 ms for atodw (new)           result=1234567890
55 ms for atol                  result=1234567890

74 ms for StringToDword result=1234567890
77 ms for atodw (old)           result=1234567890
71 ms for atodw (new)           result=1234567890
45 ms for atol                  result=1234567890

74 ms for StringToDword result=1234567890
76 ms for atodw (old)           result=1234567890
73 ms for atodw (new)           result=1234567890
45 ms for atol                  result=1234567890

Testing -1234567890

76 ms for StringToDword result=-1234567890
82 ms for atodw (old)           result=1498830546
75 ms for atodw (new)           result=-1234567890
47 ms for atol                  result=-1234567890

75 ms for StringToDword result=-1234567890
82 ms for atodw (old)           result=1498830546
74 ms for atodw (new)           result=-1234567890
46 ms for atol                  result=-1234567890

76 ms for StringToDword result=-1234567890
84 ms for atodw (old)           result=1498830546
79 ms for atodw (new)           result=-1234567890
48 ms for atol                  result=-1234567890
--- hit any key ---

hutch--

LiaoMi,

What turbo frequency does the i7 run at. I gather its a low power notebook CPU which accounts for the low base frequency, I was just curious at what turbo speed it ran at under load.

LiaoMi

Quote from: hutch-- on November 18, 2021, 10:17:09 AM
LiaoMi,

What turbo frequency does the i7 run at. I gather its a low power notebook CPU which accounts for the low base frequency, I was just curious at what turbo speed it ran at under load.

Hi hutch,

"The 11800H brings 8 cores and 16 threads with 24 MB of L3 cache, a 2.3 GHz base clock at 45W, plus turbo frequencies that range from 4.6 GHz on up to 2 cores, to 4.2 GHz all-core." - https://www.techspot.com/review/2262-intel-core-i7-11800h/

The Intel Core i7-11800H is a high-end octa-core SoC for gaming laptops and mobile workstations. It is belongs to the Tiger Lake H45 family and was announced in mid 2021. It integrates eight Willow Cove processor cores (16 threads thanks to Hyper-Threading). The base clock speed depends on the TDP setting and is 2.3 GHz at 45 W. The single core Boost can be as high as 4.6 GHz while all cores can run at up to 4.2 GHz. The CPU offers 24 MB of Level 3 cache and supports DDR4-3200 memory.
vs
The Intel Core i9-11900H is a high end octa core SoC for gaming laptops and mobile workstations. It is based on the Tiger Lake H45 generation and was announced in mid 2021. It integrates eight Willow Cove processor cores (16 threads thanks to HyperThreading). The base clock speed depends on the TDP setting and at 45 Watt is at 2.5 GHz. The single core boost speed can reach up to 4.9 GHz, all cores can reach up to 4.5 GHz. The CPU offers 24 MB level 3 cache and supports DDR4-3200 memory.
https://www.notebookcheck.net/i5-11300H-vs-i9-11900H-vs-i7-11800H_13058_13143_13145.247596.0.html

:tongue:

hutch--

Looks good, with a laptop, is there any tricks for cooling ? Most computer use is not demanding but if you start thrashing the CPU on all cores, you need some trick cooling solution. With modern laptops, can you access the guts to put a source of air flow into it. Just recently I hooked a USB charged air blower and while I use it to blow dust out of things, I wondered if you could ge a small blower that improved the cooling inside the laptop case.

LiaoMi

Quote from: hutch-- on November 21, 2021, 03:19:31 PM
Looks good, with a laptop, is there any tricks for cooling ? Most computer use is not demanding but if you start thrashing the CPU on all cores, you need some trick cooling solution. With modern laptops, can you access the guts to put a source of air flow into it. Just recently I hooked a USB charged air blower and while I use it to blow dust out of things, I wondered if you could ge a small blower that improved the cooling inside the laptop case.

Hi Hutch,

in my case, the design and materials of heat conductors were improved:


The use of additional turbo boosters for cooling will not bring big changes, in dynamic tests a slight difference of 3 - 5 degrees was noticeable, it doesn't really affect the whole picture. I have a laptop stand: https://www.amazon.co.uk/Cooler-Master-Adjustable-Ergonomic-R9-NBC-U3PK-GP/dp/B00ED3WMTC

The cooling in my laptop can be compared with the Noctua version of the fan (https://tech4gamers.com/noctua-nh-l9a-am4-low-profile-cpu-cooler-review/ - https://www.youtube.com/watch?v=AAkXPsLwtsY), in my case there will be two fans, which means the cooling rates will be slightly higher.

On economy profile, the results will be as follows: https://i.ibb.co/RSt8r2q/Fur-Mark-STR.jpg
On turbo profile, the results will be as follows: https://i.ibb.co/JyLFqx6/Fur-Mark-STR2-Turbo.jpg

It seems that on all sites there is an error in the description, for the reason that I could not raise the frequency to 4.2 GHz. Here's another description, which is closer to reality:

Frequency   2.30 GHz
CPU Cores   8
CPU Threads   16
Turbo (1 Core)   4.60 GHz
Turbo (8 Cores):   3.40 GHz
https://cpu-comparison.com/intel-core-i7-11800h/

Most likely this frequency 4.2 GHz refers only to temporal acceleration:
CPU Power Limit 1 (Long Duration)   200 W / 56.00 sec  (Unlocked)

Maybe somewhere else there is a power limitation  :tongue: :azn: