The MASM Forum

General => The Laboratory => Topic started by: hutch-- on January 25, 2013, 01:58:28 PM

Title: Ascii to DWORD replacement
Post by: hutch-- on January 25, 2013, 01:58:28 PM
What I am after is a good clean replacement for Iczelion's old algo that has known problems. I am loath to touch heirlooms and while I am happy enough with the extended version, it would be useful to have a conventional version as well. What I would be interested in seeing is a modern version that does exactly the same thing and can be used as a drop in replacement for Iczelion's antique.

Requirement is no stack frame, no ancient string instructions and no high level masm operators, just genuine low level fast code.

This is the original spec.


atodw

atodw proc uses edi esi String:PTR BYTE

Description
atodw converts a decimal string to dword.
Note that the parameter String is an address of DWORD size.

Parameter
1. String The address of the decimal string to convert

Return Value
The DWORD value is returned in eax.

Example
invoke atodw,ADDR MyDecimalString
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 25, 2013, 02:03:34 PM
Quote
no ancient string instructions

 :biggrin:

crap - i was gonna use LODSB, STOSB, and LOOP
Title: Re: Ascii to DWORD replacement
Post by: jj2007 on January 25, 2013, 02:53:42 PM
Quote
Requirement is no stack frame, no ancient string instructions and no high level masm operators, just genuine low level fast code.

What else? Should fit in one para aka 16 bytes, at least ten times as fast as the current version?
Be more specific, Hutch :biggrin:

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
atodwJJ proc Src
  push esi
  xor edx, edx        ; will be integer value
  mov esi, [esp+8]    ; Src ([esp+4] holds the return address after push esi)
  lea ecx, [esi+9]    ; max address
@@:
  movzx eax, byte ptr [esi]   ; load a char
  lea edx, [edx+4*edx]        ; edx=5*edx
  lea edx, [2*edx+eax-48]     ; edx=10*edx+(eax-48)
  inc esi
  cmp ecx, esi        ; >9 digits means we need the FPU; 4294967295 aka 2^32-1 is the limit
  jne @B
  pop esi
  xchg eax, edx
  retn 4

atodwJJ endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 25, 2013, 03:17:55 PM
here is my first whack at it
it will handle signed or unsigned
and, it's UNICORN aware   :biggrin:
no error checking
Code:
        OPTION  PROLOGUE:None
        OPTION  EPILOGUE:None

a2dw    PROC    lpAscStr:LPSTR

        push    esi
        mov     esi,[esp+8]
        xor     eax,eax
        movzx   edx,byte ptr [esi]
        mov     cl,10
        cmp     dl,'-'
        push    edx
        jnz     a2dw02

        IFDEF __UNICODE__
            add     esi,2
        ELSE
            inc     esi
        ENDIF
        jmp short a2dw01

a2dw00: mul     cl
        lea     eax,[eax+edx-30h]

a2dw01: mov     dl,[esi]

a2dw02:
        IFDEF __UNICODE__
            add     esi,2
        ELSE
            inc     esi
        ENDIF
        or      dl,dl
        jnz     a2dw00

        pop     edx
        cmp     dl,'-'
        jnz     a2dw03

        neg     eax

a2dw03: pop     esi
        ret     4

a2dw    ENDP

        OPTION  PROLOGUE:PrologueDef
        OPTION  EPILOGUE:EpilogueDef

it's untested, but gives you the concept

EDIT: removed an unneeded line of code
changed MOV CL,10 to MOV ECX,10
rearranged a couple instructions in the preamble

EDIT: oops - went back to MOV CL,10 and MUL CL
that's not going to work - lol
i need another register   :(
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 25, 2013, 03:53:33 PM
ok - tested it, this time - lol
signed or unsigned, UNICORN, no error checking
Code:
;***********************************************************************************************

        OPTION  PROLOGUE:None
        OPTION  EPILOGUE:None

a2dwDave PROC   lpAscStr:LPSTR

        push    esi
        push    ebx
        mov     esi,[esp+12]
        xor     eax,eax
        movzx   ebx,byte ptr [esi]
        mov     ecx,10
        cmp     bl,'-'
        push    ebx
        jnz     a2dw02

        IFDEF __UNICODE__
            add     esi,2
        ELSE
            inc     esi
        ENDIF
        jmp short a2dw01

a2dw00: mul     ecx
        lea     eax,[eax+ebx-30h]

a2dw01: mov     bl,[esi]

a2dw02:
        IFDEF __UNICODE__
            add     esi,2
        ELSE
            inc     esi
        ENDIF
        or      bl,bl
        jnz     a2dw00

        pop     edx
        cmp     dl,'-'
        jnz     a2dw03

        neg     eax

a2dw03: pop     ebx
        pop     esi
        ret     4

a2dwDave ENDP

        OPTION  PROLOGUE:PrologueDef
        OPTION  EPILOGUE:EpilogueDef

;***********************************************************************************************
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 25, 2013, 04:08:27 PM
i get about 130 cycles on my prescott for '4294967295',0 or '-2147483648',0
doesn't seem to care if it's UNICORN or not
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 25, 2013, 04:33:30 PM
i changed this
Code:
a2dw00: mul     ecx
        lea     eax,[eax+ebx-30h]
to this
Code:
a2dw00: lea     eax,[4*eax+eax]
        lea     eax,[2*eax+ebx-30h]

almost twice as fast, 79 clock cycles
that means i don't need an extra register
new version coming...
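
A rough C model of that swap, for anyone who wants to check the arithmetic: the two LEAs compute acc*5 and then acc*5*2 + digit, which should match the MUL version as long as everything wraps modulo 2^32 the way the registers do. The function names are made up for this sketch and are not part of any posted routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models:  mul ecx (ecx = 10)  /  lea eax,[eax+ebx-30h] */
static uint32_t step_mul(uint32_t acc, unsigned char ch)
{
    return acc * 10u + ch - 0x30u;
}

/* Models:  lea eax,[4*eax+eax]  /  lea eax,[2*eax+ebx-30h] */
static uint32_t step_lea(uint32_t acc, unsigned char ch)
{
    acc = 4u * acc + acc;            /* acc * 5 */
    return 2u * acc + ch - 0x30u;    /* acc * 10 + digit */
}

/* Hypothetical self-check: both steps agree for every digit,
   including accumulators that wrap around 2^32. */
static void check_steps_agree(void)
{
    static const uint32_t samples[] = { 0u, 9u, 429496729u, 0x80000000u, 0xFFFFFFFFu };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        for (unsigned char ch = '0'; ch <= '9'; ++ch)
            assert(step_mul(samples[i], ch) == step_lea(samples[i], ch));
}
```

This only shows that the results are identical; it says nothing about cycle counts, which depend on the CPU.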
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 25, 2013, 04:41:02 PM
73 cycles for '4294967295',0 or '-2147483648',0
Code:
;***********************************************************************************************

        OPTION  PROLOGUE:None
        OPTION  EPILOGUE:None

a2dwDave PROC   lpszAscStr:LPSTR

;Ascii to Dword, Signed or Unsigned, UNICODE Aware, No Error Checking
;DednDave, 1-2013

;-------------------------------------------------

        mov     edx,[esp+4]
        xor     eax,eax
        movzx   ecx,byte ptr [edx]
        cmp     cl,'-'
        push    ecx
        jnz     a2dw02

        IFDEF __UNICODE__
            add     edx,2
        ELSE
            inc     edx
        ENDIF
        jmp short a2dw01

a2dw00: lea     eax,[4*eax+eax]
        lea     eax,[2*eax+ecx-30h]

a2dw01: mov     cl,[edx]

a2dw02:
        IFDEF __UNICODE__
            add     edx,2
        ELSE
            inc     edx
        ENDIF
        or      cl,cl
        jnz     a2dw00

        pop     edx
        cmp     dl,'-'
        jnz     a2dw03

        neg     eax

a2dw03: ret     4

a2dwDave ENDP

        OPTION  PROLOGUE:PrologueDef
        OPTION  EPILOGUE:EpilogueDef

;***********************************************************************************************
Title: Re: Ascii to DWORD replacement
Post by: hutch-- on January 25, 2013, 05:16:59 PM
I just found the version that was supposed to replace the old one. It has not been renamed to match the old one.



; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

    align 16

atou proc String:DWORD

  ; ------------------------------------------------
  ; Convert decimal string into UNSIGNED DWORD value
  ; ------------------------------------------------

    mov edx, [esp+4]
    xor ecx, ecx
    movzx eax, BYTE PTR [edx]
    test eax, eax
    jz quit

  lpst:
    add edx, 1
    lea ecx, [ecx+ecx*4]            ; mul ecx * 5
    lea ecx, [eax+ecx*2-48]
    movzx eax, BYTE PTR [edx]
    test eax, eax
    jz quit

    add edx, 1
    lea ecx, [ecx+ecx*4]            ; mul ecx * 5
    lea ecx, [eax+ecx*2-48]
    movzx eax, BYTE PTR [edx]
    test eax, eax
    jnz lpst

  quit:
    lea eax, [ecx]

    ret 4

atou endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
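
For readers who prefer to follow the control flow in C, here is a minimal model of the loop above, assuming it behaves as it reads: two digits are consumed per pass, with a null test after each one. The name atou_model is invented for this sketch, and like the asm it does no validation and no overflow check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C model of the unrolled-by-two loop in atou.
   acc plays the role of ECX, c the role of EAX, p the role of EDX. */
static uint32_t atou_model(const char *s)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    uint32_t acc = 0;
    unsigned c = *p;

    while (c != 0) {
        ++p;
        acc = (acc + 4u * acc) * 2u + c - 48u;   /* first copy of the body */
        c = *p;
        if (c == 0)
            break;
        ++p;
        acc = (acc + 4u * acc) * 2u + c - 48u;   /* second copy of the body */
        c = *p;
    }
    return acc;
}
```

An empty string returns 0, and anything that is not a digit is simply folded into the result, the same as in the assembler version.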
Title: Re: Ascii to DWORD replacement
Post by: hutch-- on January 26, 2013, 12:51:31 AM
JJ,

The new version is much faster than an old version you posted some time ago but the new version does not return the correct results.



OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

atodJJ proc String:DWORD

   mov edx, [esp+4]
   xor eax, eax

@@:   movzx ecx, byte ptr [edx]
   inc edx
   lea eax, [eax+eax*4]
   cmp byte ptr [edx], ch
   lea eax, [ecx+eax*2-30h]
   jnz @b
   ret 4

atodJJ   endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

 ; OPTION PROLOGUE:NONE
 ; OPTION EPILOGUE:NONE
 ;
 ; atodJJ proc String:DWORD   ; Src
 ;
 ;   push esi
 ;   xor edx, edx                  ; will be integer value
 ;   mov esi, [esp+8]              ; Src ([esp+4] holds the return address after push esi)
 ;   lea ecx, [esi+9]              ; max address
 ;
 ; @@:
 ;   movzx eax, byte ptr [esi]     ; load a char
 ;   lea edx, [edx+4*edx]          ; edx=5*edx
 ;   lea edx, [2*edx+eax-48]       ; edx=10*edx+(eax-48)
 ;   inc esi
 ;   cmp ecx, esi                  ; >9 digits means we need the FPU; 4294967295 aka 2^32-1 is the limit
 ;   jne @B
 ;
 ;   pop esi
 ;   xchg eax, edx
 ;   retn 4
 ;
 ; atodJJ endp
 ;
 ; OPTION PROLOGUE:PrologueDef
 ; OPTION EPILOGUE:EpilogueDef
Title: Re: Ascii to DWORD replacement
Post by: jj2007 on January 26, 2013, 01:28:42 AM
Quote
JJ,

The new version is much faster than an old version you posted some time ago but the new version does not return the correct results.

Hutch,
I have not tested it much, little time for that, and there is no error checking yet, but the few strings I tried were ok... probably because they were exactly 8 characters long :bgrin:

Thanks,
JJ
Title: Re: Ascii to DWORD replacement
Post by: qWord on January 26, 2013, 01:44:21 AM
maybe a bit off topic, but such a short algo can also be simply inlined:
Code:
; return: EAX = number
; tchr2dw psz
; tchr2dw &sz
; tchr2dw ADDR sz
; tchr2dw Addr sz
; tchr2dw addr sz
tchr2dw macro psz:req
LOCAL l1,l2
% FOR arg,<reparg(psz)>
IF @InStr(1,<&arg>,<ADDR >) OR @InStr(1,<&arg>,<Addr >) OR @InStr(1,<&arg>,<addr >)
lea ecx,@SubStr(<&arg>,5)
ELSEIFIDN @SubStr(<&arg>,1,1),<!&>
lea ecx,@SubStr(<&arg>,2)
ELSE
mov ecx,arg
ENDIF
EXITM
ENDM

    xor eax,eax
l1: movzx edx,TCHAR ptr [ecx]
test edx,edx
jz l2
lea eax,[eax+eax*4]
lea eax,[eax*2+edx-'0']
lea ecx,[ecx+SIZEOF TCHAR]
jmp l1
l2:
endm
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 26, 2013, 01:56:22 AM
Jochen's routine does not check for null termination
looks ok, otherwise
although, you could swap usage of EAX and EDX and save the XCHG instruction   :P

i like qWord's idea   :P
saves the call/ret overhead
but, i would mod it a little
Code:
        mov     edx,arg
        xor     eax,eax
        jmp short loop01

loop00: lea     eax,[eax+eax*4]
        lea     eax,[eax*2+ecx-'0']

loop01: movzx   ecx,TCHAR ptr [edx]
        test    ecx,ecx
        lea     edx,[edx+sizeof TCHAR]
        jnz     loop00

add a sign check and you've got a cool macro   :biggrin:
Title: Re: Ascii to DWORD replacement
Post by: jj2007 on January 26, 2013, 03:21:47 AM
Hi all,
Here is a new version with correct results and error checks. Of course, it's now slow and bloated :(

testing 1:      1        ok
testing -1:     -1       ok
testing 123:    123      ok
testing 123456789:      123456789        ok
testing 1234567890:     source too long
testing -9876x54321:    x is an invalid character
testing -987.654321:    . is an invalid character
testing -987654321:     -987654321       ok
testing -9876543210:    source too long
testing -987654h:       h is an invalid character
48      bytes for atodw
11      bytes for Dec2Dword
6       bytes for Dec2Dword2
62      bytes for atodwJJ

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
++++++++++++++++++++
93      cycles for 10 * Dec2Dword
724     cycles for 10 * atodwJJ
3554    cycles for 10 * a2dw

93      cycles for 10 * Dec2Dword
724     cycles for 10 * atodwJJ
3558    cycles for 10 * a2dw

93      cycles for 10 * Dec2Dword
725     cycles for 10 * atodwJJ
3553    cycles for 10 * a2dw

92      cycles for 10 * Dec2Dword
725     cycles for 10 * atodwJJ
3556    cycles for 10 * a2dw

93      cycles for 10 * Dec2Dword
724     cycles for 10 * atodwJJ
3693    cycles for 10 * a2dw


@Dave: Eliminating xchg eax, edx is not a good idea. Two bytes longer and many cycles slower...
Title: Re: Ascii to DWORD replacement
Post by: Gunther on January 26, 2013, 06:46:47 AM
My results:

Code:
testing 1: 1 ok
testing -1: -1 ok
testing 123: 123 ok
testing 123456789: 123456789 ok
testing 1234567890: source too long
testing -9876x54321: x is an invalid character
testing -987.654321: . is an invalid character
testing -987654321: -987654321 ok
testing -9876543210: source too long
testing -987654h: h is an invalid character
48 bytes for atodw
11 bytes for Dec2Dword
6 bytes for Dec2Dword2
62 bytes for atodwJJ

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
loop overhead is approx. 35/10 cycles


23 cycles for 10 * Dec2Dword
1015 cycles for 10 * atodwJJ
1570 cycles for 10 * a2dw

24 cycles for 10 * Dec2Dword
407 cycles for 10 * atodwJJ
1615 cycles for 10 * a2dw

23 cycles for 10 * Dec2Dword
720 cycles for 10 * atodwJJ
1575 cycles for 10 * a2dw

23 cycles for 10 * Dec2Dword
1022 cycles for 10 * atodwJJ
2183 cycles for 10 * a2dw

24 cycles for 10 * Dec2Dword
1018 cycles for 10 * atodwJJ
2180 cycles for 10 * a2dw

--- ok ---

Gunther
Title: Re: Ascii to DWORD replacement
Post by: jj2007 on January 26, 2013, 07:30:47 AM
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

680     cycles for 10 * atodwJJ
3967    cycles for 10 * a2dw

682     cycles for 10 * atodwJJ
3860    cycles for 10 * a2dw


A factor five, not bad ;-)
Title: Re: Ascii to DWORD replacement
Post by: frktons on January 26, 2013, 09:01:04 AM
Quote
testing 1:      1        ok
testing -1:     -1       ok
testing 123:    123      ok
testing 123456789:      123456789        ok
testing 1234567890:     source too long
testing -9876x54321:    x is an invalid character
testing -987.654321:    . is an invalid character
testing -987654321:     -987654321       ok
testing -9876543210:    source too long
testing -987654h:       h is an invalid character
48      bytes for atodw
11      bytes for Dec2Dword
6       bytes for Dec2Dword2
62      bytes for atodwJJ

Intel(R) Pentium(R) 4 CPU 3.20GHz (SSE3)
++++++++++++++++++++
269     cycles for 10 * Dec2Dword
1921    cycles for 10 * atodwJJ
7708    cycles for 10 * a2dw

269     cycles for 10 * Dec2Dword
1911    cycles for 10 * atodwJJ
7478    cycles for 10 * a2dw

266     cycles for 10 * Dec2Dword
1949    cycles for 10 * atodwJJ
7799    cycles for 10 * a2dw

266     cycles for 10 * Dec2Dword
1919    cycles for 10 * atodwJJ
7819    cycles for 10 * a2dw

270     cycles for 10 * Dec2Dword
1926    cycles for 10 * atodwJJ
7735    cycles for 10 * a2dw
Title: Re: Ascii to DWORD replacement
Post by: FORTRANS on January 27, 2013, 12:35:55 AM
Hi,

#1 factor of ~6.4
#2 factor of ~5.3
#3 factor of ~5.9

Regards,

Steve N.

Code:
testing 1: 1 ok
testing -1: -1 ok
testing 123: 123 ok
testing 123456789: 123456789 ok
testing 1234567890: source too long
testing -9876x54321: x is an invalid character
testing -987.654321: . is an invalid character
testing -987654321: -987654321 ok
testing -9876543210: source too long
testing -987654h: h is an invalid character
48 bytes for atodw
11 bytes for Dec2Dword
6 bytes for Dec2Dword2
62 bytes for atodwJJ

pre-P4 (SSE1)
loop overhead is approx. 28/10 cycles


107 cycles for 10 * Dec2Dword
781 cycles for 10 * atodwJJ
4991 cycles for 10 * a2dw

107 cycles for 10 * Dec2Dword
781 cycles for 10 * atodwJJ
4992 cycles for 10 * a2dw

107 cycles for 10 * Dec2Dword
781 cycles for 10 * atodwJJ
4992 cycles for 10 * a2dw

107 cycles for 10 * Dec2Dword
781 cycles for 10 * atodwJJ
4994 cycles for 10 * a2dw

107 cycles for 10 * Dec2Dword
781 cycles for 10 * atodwJJ
4996 cycles for 10 * a2dw


--- ok ---

testing 1: 1 ok
testing -1: -1 ok
testing 123: 123 ok
testing 123456789: 123456789 ok
testing 1234567890: source too long
testing -9876x54321: x is an invalid character
testing -987.654321: . is an invalid character
testing -987654321: -987654321 ok
testing -9876543210: source too long
testing -987654h: h is an invalid character
48 bytes for atodw
11 bytes for Dec2Dword
6 bytes for Dec2Dword2
62 bytes for atodwJJ

pre-P4
++++++++++++++++++++
164 cycles for 10 * Dec2Dword
1285 cycles for 10 * atodwJJ
6793 cycles for 10 * a2dw

163 cycles for 10 * Dec2Dword
1277 cycles for 10 * atodwJJ
6908 cycles for 10 * a2dw

162 cycles for 10 * Dec2Dword
1276 cycles for 10 * atodwJJ
6784 cycles for 10 * a2dw

162 cycles for 10 * Dec2Dword
1277 cycles for 10 * atodwJJ
6797 cycles for 10 * a2dw

161 cycles for 10 * Dec2Dword
1272 cycles for 10 * atodwJJ
6791 cycles for 10 * a2dw


--- ok ---

testing 1: 1 ok
testing -1: -1 ok
testing 123: 123 ok
testing 123456789: 123456789 ok
testing 1234567890: source too long
testing -9876x54321: x is an invalid character
testing -987.654321: . is an invalid character
testing -987654321: -987654321 ok
testing -9876543210: source too long
testing -987654h: h is an invalid character
48 bytes for atodw
11 bytes for Dec2Dword
6 bytes for Dec2Dword2
62 bytes for atodwJJ

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
loop overhead is approx. 17/10 cycles


63 cycles for 10 * Dec2Dword
682 cycles for 10 * atodwJJ
4038 cycles for 10 * a2dw

63 cycles for 10 * Dec2Dword
683 cycles for 10 * atodwJJ
4037 cycles for 10 * a2dw

63 cycles for 10 * Dec2Dword
690 cycles for 10 * atodwJJ
4093 cycles for 10 * a2dw

64 cycles for 10 * Dec2Dword
684 cycles for 10 * atodwJJ
3935 cycles for 10 * a2dw

63 cycles for 10 * Dec2Dword
690 cycles for 10 * atodwJJ
4016 cycles for 10 * a2dw


--- ok ---
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 27, 2013, 02:16:40 AM
it would be nice if we were also testing "atou", posted by Hutch
and "a2dwDave", posted by me   :P

http://masm32.com/board/index.php?topic=1357.msg13648#msg13648
http://masm32.com/board/index.php?topic=1357.msg13646#msg13646

i ran a simple test of these 2, and "atou" is ~12 cycles faster than "a2dwDave" (4294967295)
but, a2dwDave handles signed/unsigned, and is UNICODE aware

Code:
    atou: 52bytes
a2dwDave: 43bytes
    atou: 64 64 64 63 63
a2dwDave: 75 75 75 75 75
Title: Re: Ascii to DWORD replacement
Post by: hutch-- on January 27, 2013, 06:45:53 AM
Dave,

At the going rate the code in "atou" is probably what I will end up using. I have a very fast version by Lingo as well, and either of these will work as an exact drop in replacement for Iczelion's old algo. It is part of the old rule that you never ever ever change the functionality of a published function, as it breaks many folks' code. I have versions that do both signed and unsigned but they are not drop in replacements for the old proc.
Title: Re: Ascii to DWORD replacement
Post by: jj2007 on January 27, 2013, 07:00:56 AM
atou is certainly fast, but... no negative numbers, no error check ::)

testing a2dwDave, 1:      1
testing -1:     -1
testing 123:    123
testing 123456789:      123456789
testing 1234567890:     1234567890
testing -9876x54321:    -1293319729
testing -987.654321:    -1278719729
testing -987654321:     -987654321
testing -9876543210:    -1286608618
testing -987654h:       -9876596

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
loop overhead is approx. 17/10 cycles


322     cycles for 10 * atou
516     cycles for 10 * a2dwDave
677     cycles for 10 * atodwJJ
3991    cycles for 10 * a2dw

322     cycles for 10 * atou
517     cycles for 10 * a2dwDave
677     cycles for 10 * atodwJJ
4026    cycles for 10 * a2dw

321     cycles for 10 * atou
515     cycles for 10 * a2dwDave
677     cycles for 10 * atodwJJ
4005    cycles for 10 * a2dw

323     cycles for 10 * atou
515     cycles for 10 * a2dwDave
677     cycles for 10 * atodwJJ
4035    cycles for 10 * a2dw

321     cycles for 10 * atou
515     cycles for 10 * a2dwDave
677     cycles for 10 * atodwJJ
4034    cycles for 10 * a2dw
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 27, 2013, 12:12:08 PM
i understand, Hutch - no prob
i just saw how easy it was to add that support - lol
i guess i could take out the sign check and it would fit in there
UNICODE support would not affect any pre-existing software

prescott w/htt
Code:
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
loop overhead is approx. 29/10 cycles

559     cycles for 10 * atou
742     cycles for 10 * a2dwDave
1291    cycles for 10 * atodwJJ
4607    cycles for 10 * a2dw

559     cycles for 10 * atou
774     cycles for 10 * a2dwDave
1421    cycles for 10 * atodwJJ
5146    cycles for 10 * a2dw

559     cycles for 10 * atou
843     cycles for 10 * a2dwDave
1227    cycles for 10 * atodwJJ
4560    cycles for 10 * a2dw

563     cycles for 10 * atou
834     cycles for 10 * a2dwDave
1407    cycles for 10 * atodwJJ
5041    cycles for 10 * a2dw

563     cycles for 10 * atou
789     cycles for 10 * a2dwDave
1231    cycles for 10 * atodwJJ
4651    cycles for 10 * a2dw
Title: Re: Ascii to DWORD replacement
Post by: jj2007 on January 27, 2013, 12:41:28 PM
Quote
...exact drop in replacement for Iczelion's old algo. It is part of the old rule that you never ever ever change the functionality of a published function, as it breaks many folks' code.

Masmlib.chm:
Quote
atodw

atodw proc uses edi esi String:PTR BYTE

Description
atodw converts a decimal string to dword.

Note that the parameter String is an address of DWORD size.

Parameter
1. String The address of the decimal string to convert

Return Value
The DWORD value is returned in eax.

The algo I posted does exactly what the documentation says. In addition, it has two new features:

a) error checking:
   print src, ":", 9
   invoke atodwJJ, src
   .if Sign?
      print "source too long", 13, 10
   .else
      add edx, "0"
      .if !Zero?
         push edx
         print esp,  " is an invalid character", 13, 10
         pop edx
      .else
         print str$(eax), 9, " ok", 13, 10
      .endif
   .endif


b) it reads negative numbers correctly, where the old algo would silently produce an error, a behaviour that is not documented in masmlib.chm

How that can break existing code remains a mystery for a poor noob like me ;)

(besides, lazy as I am, Val (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1179) and MovVal (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1180) offer better value for money :icon_mrgreen:)
Title: Re: Ascii to DWORD replacement
Post by: hutch-- on January 27, 2013, 05:21:58 PM
 :biggrin:

Unsigned DWORDs don't have negative numbers. While I have nothing against multi-purpose functions, I am of the view that existing functions should never suffer from "function creep" as it ends up being the source of broken code.
Title: Re: Ascii to DWORD replacement
Post by: jj2007 on January 27, 2013, 06:11:45 PM
Quote
Unsigned DWORDs don't have negative numbers.

.if Instr("decimal string", "unsigned")
   Print "you are right"
.else
   Print "RTFM (i.e. \Masm32\help\Masmlib.chm)"
.endif
 ;)

(strange that a phrase search for "undocumented non-feature" gives only 18 hits ::))
Title: Re: Ascii to DWORD replacement
Post by: MichaelW on January 27, 2013, 07:36:54 PM
It seems to me that "decimal string" specifies a string of decimal digits, and “signed decimal string” specifies a string of decimal digits with a sign. And there is nothing in the atodw source to deal with a sign.

Title: Re: Ascii to DWORD replacement
Post by: jj2007 on January 27, 2013, 08:32:07 PM
Quote
It seems to me that "decimal string" specifies a string of decimal digits, and “signed decimal string” specifies a string of decimal digits with a sign.

Michael,

Show me one example on the World Wide Web where the absence of the word "signed" implies that the conversion is unsigned.

Or one HLL that converts "-123" to 253123

Or one example of code that gets broken because it relies on invoke atodw, chr$("-123") returning
253123
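
For what it is worth, the figure 253123 is consistent with a routine that has no working sign handling and subtracts '0' in an 8-bit register: '-' is 2Dh, and 2Dh - 30h wraps to FDh, i.e. 253, which then gets treated as the leading "digit". The C sketch below models that reading. It is an assumption about how the number arises, not a transcription of the atodw source, and the function name is invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: every character is treated as a digit, and the
   subtraction of '0' is done on 8 bits, so '-' (45) - '0' (48) becomes 253. */
static uint32_t atodw_no_sign_model(const char *s)
{
    assert(s != NULL);
    uint32_t acc = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != 0; ++p) {
        uint8_t digit = (uint8_t)(*p - '0');   /* wraps modulo 256 */
        acc = acc * 10u + digit;               /* 253, 2531, 25312, 253123 */
    }
    return acc;
}
```

With "-123" the accumulator goes 253, 2531, 25312, 253123, which matches the value quoted above; plain digit strings are unaffected.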

But this is stuff for one of the endless religious debates on whether or eax, eax destroys a precious register or not, so I better pull out here :lol:

testing :       0 is an invalid character
testing 0:      0        ok
testing 1:      1        ok
testing -1:     -1       ok
testing 123:    123      ok
testing +123:   123      ok
testing          +123:  123      ok
testing 123456789:      123456789        ok
testing 1234567890:     1234567890       ok
testing 9876543210:     source too long
testing -9876x54321:    x is an invalid character
testing -987.654321:    . is an invalid character
testing -987654321:     -987654321       ok
testing -9876543210:    source too long
testing -987654h:       h is an invalid character
80      bytes for atodwJJ

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
loop overhead is approx. 17/10 cycles


324     cycles for 10 * atou
519     cycles for 10 * a2dwDave
699     cycles for 10 * atodwJJ
3930    cycles for 10 * a2dw

323     cycles for 10 * atou
518     cycles for 10 * a2dwDave
698     cycles for 10 * atodwJJ
3970    cycles for 10 * a2dw


EDIT: Attached code renamed to "Asc2DwSafe". Actually, since it's still more than a factor five faster than the current Masm32 a2dw, it should correctly be named Asc2DwSafeAndFast ;-)
Title: Re: Ascii to DWORD replacement
Post by: hutch-- on January 27, 2013, 08:42:08 PM
Have a look at Iczelion's old algo and you will see that it does not test for sign, it is a DWORD algo and that means unsigned. Now I am sure you could add all sorts of functionality to any algo you like, check the registry for a valid registration number, use a lookup table to test if the string is in EBCDIC format but the original request was for a drop in replacement for Iczelion's old algo, not a redesign full of stuff that the old algo did not have.
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 27, 2013, 09:46:21 PM
well - the question is - does it break existing code
certainly, when we went from v10 to v11, several functions were updated to allow unicode support
the behaviour of many things changed - the "drop-in" functionality rule was bent all to hell

personally, this is a function that i rarely use
i can see using it for user input, primarily
which means that, before i pass a string to the function,
i have to do a lot of pre-parsing to ensure the user doesn't think "xyzzy" is a number
i have to test or limit the number of digits and, in some cases, the format of the string

i might also use a function like this to convert text file info
i think the same rules would apply

at least, that's if i am writing a real-world app
if i am writing a trivial, the rules can be looser   :P
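
The kind of pre-parsing described above might look roughly like this in C. This is only a sketch under stated assumptions: the helper name parse_dword_checked and its bool/out-parameter interface are invented here, and the policy (non-empty, digits only, value must fit in an unsigned DWORD) is one possible choice rather than anything the library does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true and stores the value only if s is a
   non-empty string of decimal digits whose value fits in 32 unsigned bits. */
static bool parse_dword_checked(const char *s, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    uint32_t acc = 0;
    size_t n = 0;

    for (; s[n] != '\0'; ++n) {
        if (s[n] < '0' || s[n] > '9')
            return false;                       /* "xyzzy" is not a number */
        uint32_t d = (uint32_t)(s[n] - '0');
        if (acc > (UINT32_MAX - d) / 10u)
            return false;                       /* acc*10 + d would overflow */
        acc = acc * 10u + d;
    }
    if (n == 0)
        return false;                           /* empty string */
    *out = acc;
    return true;
}
```

The overflow test also bounds the number of significant digits, so no separate length check is needed; leading zeros are accepted in this version.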
Title: Re: Ascii to DWORD replacement
Post by: hutch-- on January 27, 2013, 11:30:13 PM
The catch with this view is that it justifies function creep, whereas my own view is that if you want a function with extra functionality, you write a new one. As far as UNICODE support goes, I have tended to write 2 functions, one ASCII, the other UNICODE, and switch between them based on the equate. This allows better code for each than switching between them.

Rather than a pseudo philosophical debate, I was hoping that someone had a faster one than "atou" or Lingo's faster one. The only reason why I have not used Lingo's algo is that it does an untidy exit with a JMP that messes up the call/ret pairing and, as testing showed, interferes with the following code.

There are basically two ideas of library design being applied here, I see library modules as components where the alternative is trying to build objects, the difference between assembler programming and high level languages.  :P
Title: Re: Ascii to DWORD replacement
Post by: MichaelW on January 28, 2013, 01:58:17 AM
Quote
It seems to me that "decimal string" specifies a string of decimal digits, and “signed decimal string” specifies a string of decimal digits with a sign.

Michael,

Show me one example on the World Wide Web where the absence of the word "signed" implies that the conversion is unsigned.

Actually I had intended to put an appropriate smiley after that sentence, but forgot :icon_confused:
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 28, 2013, 04:32:15 AM
well - i am with Hutch on one thing
we can attempt to improve the algo without regard to the added bells and whistles
for example, my routine could easily be modified to not handle signed values or unicode
conversely, support for unicode or signed strings can easily be added to any of the others   :P

what is left is the inner loop that converts ASCII chars into binary via Horner's Rule
the atou algo posted by Hutch simply unrolls the loop once to achieve a little gain

one way we try to speed things up is by using an LUT
that approach doesn't seem to apply, in this case

another way to speed it up is by handling a series of items en masse, in parallel
it may be practical to grab 2 chars and multiply the first by
100 and the second by 10, or something along that line
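
One possible reading of the two-chars-at-a-time idea, sketched in C: the running total is multiplied by 100 and a pair of digits is added as d0*10 + d1, with a single leftover digit handled the usual way. The function name is hypothetical and this is a model of the arithmetic only, not a claim that it would be faster in assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: Horner's rule in base 100, two digits per step.
   No validation and no overflow check, like the routines in this thread. */
static uint32_t atou_pairs(const char *s)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    uint32_t acc = 0;

    while (p[0] != 0) {
        if (p[1] == 0) {                         /* odd digit left over */
            acc = acc * 10u + (p[0] - 48u);
            break;
        }
        acc = acc * 100u + (p[0] - 48u) * 10u + (p[1] - 48u);
        p += 2;
    }
    return acc;
}
```

For "123" the steps are 12, then 12*10 + 3 = 123. The two digit terms are independent of the running total, which is where any parallelism would have to come from.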
Title: Re: Ascii to DWORD replacement
Post by: jj2007 on January 28, 2013, 05:01:35 AM
I consider the "added bells and whistles" pretty important. Few programmers, including those who read manuals, would expect "-123" to be translated to 253123 - imagine that "horrible coding error" happens in a real life paid for application. Imagine (quoting "it is a DWORD algo and that means unsigned") that the Microsoft Macro Assembler aka Masm would assign 253123 to
.data
MyVar   DWORD -1

... without even issuing a warning ::)

It is a pity that Iczelion is not here to comment on this debate. Unless he has become the director of a museum, I believe he would be in favour of improving his algo.

P.S.: In the previous version of the algo posted above, I had forgotten the range "2147483648" ... "4294967295". These strings are now correctly interpreted (one byte more, one cycle less :biggrin:)
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 28, 2013, 05:21:55 AM
well - i tend to write my own code for apps, anyways   :P
each case is a little different and i can optimize in the direction according to need
i can also add or remove features that each specific case dictates
i use the masm32 library for trivials and mostly for test/debug - very handy

back to the basic algo....
here is a little teaser to inspire thought   :biggrin:
Code:
        add     ecx,eax               ;ECX = ECX + EAX
        lea     eax,[32*eax+eax]      ;EAX = 33 * EAX
        add     ecx,eax               ;ECX = ECX + (34 * EAX)
        lea     eax,[2*eax+ecx]       ;EAX = (66 * EAX) + ECX + (34 * EAX) = (100 * EAX) + ECX

well - it looked good, on paper - lol
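
Checking the teaser in C: the arithmetic does come out as 100*EAX + ECX. The catch, as far as the instruction encoding goes, is that LEA only accepts scale factors of 1, 2, 4 or 8, so [32*eax+eax] cannot be assembled as written. The functions below are just a model with made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* Models the four-instruction teaser; A and C are the starting EAX and ECX. */
static uint32_t teaser_times_100(uint32_t eax, uint32_t ecx)
{
    ecx += eax;              /* ECX = C + A */
    eax = 32u * eax + eax;   /* EAX = 33A  (scale 32 is not encodable in one LEA) */
    ecx += eax;              /* ECX = C + 34A */
    eax = 2u * eax + ecx;    /* EAX = 66A + C + 34A = 100A + C */
    return eax;
}

/* Hypothetical self-check over a small range of inputs. */
static void check_teaser(void)
{
    for (uint32_t a = 0; a < 1000u; ++a)
        assert(teaser_times_100(a, 7u) == 100u * a + 7u);
}
```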
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 28, 2013, 06:33:37 AM
another idea is to approach it as a set of cases
we might select the "major" case set based on which dword the first null is found in
Code:
case0_0 db 0        ;invalid case, but we should probably return 0
case0_1 db '1',0
case0_2 db '12',0
case0_3 db '123',0

case1_4 db '1234',0
case1_5 db '12345',0
case1_6 db '123456',0
case1_7 db '1234567',0

case2_8 db '12345678',0
case2_9 db '123456789',0
case2_0 db '1234567890',0
case2_x db '12345678901'     ;illegal case

then, we can use the "fast null finder" algo....
Code:
        mov     ecx,[edx]
        and     ecx,7F7F7F7Fh
        sub     ecx,01010101h
        and     ecx,80808080h
        jnz     test_one_of_the_bytes_is_null_or_80h
that algo is fast because it assumes that 80h is rare in the string
that particularly applies in this application, because valid string characters are never 80h
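
A small C model of that test, to show what it does and does not detect. Masking with 7F7F7F7Fh clears the top bit of every byte, so after subtracting 01010101h a top bit can only be set where a byte was 00h or 80h (or above such a byte, through the borrow), which is why the result is an "any byte" answer rather than an exact position. The helper names are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any byte of w is 00h or 80h; zero otherwise. */
static uint32_t has_null_or_80h(uint32_t w)
{
    w &= 0x7F7F7F7Fu;        /* and ecx,7F7F7F7Fh */
    w -= 0x01010101u;        /* sub ecx,01010101h */
    return w & 0x80808080u;  /* and ecx,80808080h */
}

/* Loads four bytes the way  mov ecx,[edx]  would; memcpy sidesteps
   alignment and aliasing concerns in C. */
static uint32_t load_dword(const void *p)
{
    uint32_t w;
    assert(p != NULL);
    memcpy(&w, p, sizeof w);
    return w;
}
```

For example, has_null_or_80h(load_dword("1234")) is zero because the first four bytes are all digits, while the same call on "123" is nonzero because the terminator falls inside the dword.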

now, we want to process 0 to 4 decimal chars at a time
i.e., 5 "sub-cases" - one of which is "we are done"
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 28, 2013, 10:19:59 AM
i devised a little test to make comparison somewhat "real world"
i use 6 test strings, then divide the result by 6.....
Code:
        ALIGN 4
szTest01 db '123',0
        ALIGN 4
szTest02 db '1234567',0
        ALIGN 4
szTest03 db '1234567890',0

        ALIGN 4
         db 0                ;mis-align
szTest04 db '123',0
        ALIGN 4
         db 0                ;mis-align
szTest05 db '1234567',0
        ALIGN 4
         db 0                ;mis-align
szTest06 db '1234567890',0

the atou function runs about 52 cycles average on the 6 test strings
that's a little under 8 cycles per char
hard to beat that, really   :P
Title: Re: Ascii to DWORD replacement
Post by: hutch-- on January 28, 2013, 11:58:20 AM
 :biggrin:

Quote
I consider the "added bells and whistles" pretty important. Few programmers, including those who read manuals, would expect "-123" to be translated to 253123 - imagine that "horrible coding error" happens in a real life paid for application. Imagine (quoting "it is a DWORD algo and that means unsigned") that the Microsoft Macro Assembler aka Masm would assign 253123 to
.data
MyVar   DWORD -1
... without even issuing a warning

Assembler programmers are supposed to know the difference between signed and unsigned (in various higher level languages, the distinction between DWORD and LONG); if they don't, they find out REAL SOON.  :P Way better for it to go BANG when you get it wrong than to hide behind pseudo objects that hide the blunders.

The debate is not about improving Iczelion's old algo, everyone and their dog has done that; it is about what will be used as a drop in replacement, and here changing the functionality of the algo is a big NO NO as it risks breaking someone's code. Writing a "different" algo is uncontentious: get it going, make sure it's reliable and document it properly.

What I refuse to do is get involved in function creep, add this, add that, more bells and whistles and you eventually end up with crap like most higher level languages, a function that is both slow and does everything badly but HEY, it's got all sorts of bells and whistles. (Yeah yeah, but you can always get a faster box to make crap look like it's fast)  :badgrin:

I do have a couple of faster versions: the "_ex" version outperforms it over a wide range of test conditions but at the cost of being larger, and Lingo's version is also faster but has a scruffy exit that messes up the call / ret pairing. I was foolish enough to expect that a few of the folks here may have had a faster one than atou, I don't need a slower different version.
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 28, 2013, 12:10:13 PM
well....
the ability to support signed values, in this case, does not violate the ability to support unsigned values
this is one of the rare cases where that statement can be made

still, i can see the case, in terms of backward compatibility, as well
if someone writes code with this version of the library, we shouldn't cause it to not work on the previous one
while that isn't always true, we don't have to go out of our way to cause it to happen - lol

at any rate....
let's see the algo that lingo had
it shouldn't be too hard to fix the call/ret code and see how it performs
either way, it would be interesting to see the algo, from a design point of view
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 28, 2013, 01:39:20 PM
Quote
I was foolish enough to expect that a few of the folks here may have had a faster one than atou, I don't need a slower different version.

that's pretty crappy
really, if you don't want our help, don't ask for it - we'll all be happier
Title: Re: Ascii to DWORD replacement
Post by: hutch-- on January 28, 2013, 03:51:56 PM
Tolerate me here, when I posted the original question I was not looking for "help", I have only been writing assembler for the last 20 years and 32 bit for over 15 years; what I asked was whether any of our members had a fast algo that I could use as a drop in for the old one. It was an entirely reasonable peer request to what used to be a range of members who wrote fast code. Pseudo philosophical debate, re-interpretations and function redesign just make the task really hard to keep track of, and it's why I rarely post much code these days, as the noise exceeds the value of what you get back from it.

I did not write the old algo and I am loath to touch an heirloom like that, so I went looking for a drop in replacement, given that we once had members who did write fast code and had a range of algos floating around that were useful for such requests.
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 28, 2013, 04:08:39 PM
the original post doesn't come off that way

i was willing to set the bells and whistles and philosophical discussion aside and try a few twists on algo design
that is the part that i find interesting, to be honest

at the end of the day, you are going to use the Iczelion replacement code, anyways
it was a waste of my time

and, tolerate me, here, but this isn't a very kind statement to aim at people who are trying to be helpful
Quote
I was foolish enough to expect that a few of the folks here may have had a faster one than atou

i don't really care which routine you use - lol
as i mentioned, i usually like to write my own stuff, anyways

the way it is, i wouldn't use this routine in a real-world app
because it eats up erroneous strings and spits out a smiley face
if the string has a non-numeric, is too long, is null, you won't know about it
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 28, 2013, 04:19:46 PM
this shaves a few clock cycles off, overall about 5% faster on a variety of strings

Code:
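    mov     edx, [esp+4]                ; edx = address of the string
    movzx   eax, byte ptr [edx]         ; first character
    add     edx, 1
    test    eax, eax
    jz      quit                        ; empty string returns 0
    sub     eax, 48                     ; first digit: ASCII to value
    jmp     short entry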

lpst:
    lea     eax, [eax+eax*4]            ; mul eax * 5
    add     edx, 1
    lea     eax, [ecx+eax*2-48]

entry:
    movzx   ecx, byte ptr [edx]
    test    ecx, ecx
    jz      quit

    lea     eax, [eax+eax*4]            ; mul eax * 5
    add     edx, 1
    lea     eax, [ecx+eax*2-48]
    movzx   ecx, byte ptr [edx]
    test    ecx, ecx
    jnz     lpst

quit:
    ret     4
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 28, 2013, 05:02:34 PM
a while back, i needed a function like this to process user input (console)
in this case, i wanted to allow unsigned values, only
the routine that did the conversion was actually a helper routine
but - it would return error status for the top-level function
Code:
;******************************************************************

Str2Uint PROC    lpString:LPSTR

;Convert an ASCII Decimal String to a 32-Bit Unsigned Integer
;
;Call With: lpString = address of zero-terminated ASCII numeric string
;
;  Returns: EAX = status:
;                 0 = no error, EDX is valid
;                 1 = "value too large" error
;                 2 = "non-numeric character in string" error
;           EDX = 32-bit unsigned integer if EAX = 0
;                 undefined if EAX is not 0
;
;Also Uses: ECX, all other registers are preserved

;------------------------------------------------------------------

the top-level function could then display error strings and re-prompt the user for input
Code:
;******************************************************************

ReadUnsigned PROC

;Read an Unsigned Integer from Console Keyboard
;
;  If the user enters a value that is too large or any non-numeric
;characters, a re-try prompt is displayed.
;
;Call With: Nothing
;
;  Returns: EAX = EDX = 32-bit unsigned integer value
;
;Also Uses: ECX, all other registers are preserved

;------------------------------------------------------------------

speed isn't a big deal - how fast can the guy enter numbers, anyways - lol
but, handling the cases was important

because i only wanted to allow unsigned integers, i could have filtered out non-numerics at the keyboard
but - you still have to know if the value is too large
4294967296 should not return a 0
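
For anyone who wants to see the checks spelled out, here is a C sketch of the interface described in the Str2Uint header above: a status (0, 1 or 2) plus a value that is only meaningful when the status is 0. It is not the original assembly; the names `str2uint_model` and `str2uint_result` are made up, and treating an empty string as 0 is an assumption the post doesn't cover.

```c
#include <stdint.h>

/* status mirrors EAX, value mirrors EDX in the asm interface */
typedef struct { int status; uint32_t value; } str2uint_result;

static str2uint_result str2uint_model(const char *s)
{
    str2uint_result r = { 0, 0 };
    uint32_t value = 0;
    for (; *s != '\0'; ++s) {
        if (*s < '0' || *s > '9') {
            r.status = 2;                     /* non-numeric character in string */
            return r;
        }
        uint32_t digit = (uint32_t)(*s - '0');
        /* value*10 + digit must not exceed 4294967295 */
        if (value > (UINT32_MAX - digit) / 10u) {
            r.status = 1;                     /* value too large */
            return r;
        }
        value = value * 10u + digit;
    }
    r.value = value;                          /* status stays 0: value is valid */
    return r;
}
```

The range test is done before the multiply, so 4294967295 is accepted and 4294967296 is reported as too large instead of wrapping to 0.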
Title: Re: Ascii to DWORD replacement
Post by: jj2007 on January 28, 2013, 08:58:23 PM
Quote
I was foolish enough to expect that a few of the folks here may have had a faster one than atou, I don't need a slower different version.

Hutch,

I understand your concerns about breaking existing code. You asked for an atodw replacement, we tried to offer a version that does exactly the same but faster.

What I posted returns exactly the same in eax when fed valid positive strings, so it should not break any existing code. It's also a factor of five faster.

What you chose in the end returns exactly the same in eax when fed valid positive strings, is even faster than mine, but if the user has any doubt about the validity of the strings (leading whitespace, + and -, dots in 123.456), then the error checking will eat up the advantage.

But OK, there are cases where you have a huge text file to read in and no doubts about the format, so it will be convenient to have a really fast algo like atou. Nonetheless, it would enhance the usability of Masm32 if you could add a line to the documentation saying "no signs, no whitespace allowed". The mere assumption that assembler programmers see DWORD and understand unsigned is risky, given that ML.EXE interprets DWORD -1 as ffffffffh.

 :icon14:
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 28, 2013, 11:50:41 PM
i can see where handling signed values is a problem
on previous versions, these input values were not supported
if you now allow it, then 4294967295 = -1
well - that would be a problem if previous versions denied the minus sign

Quote
But OK, there are cases where you have a huge text file to read in and no doubts about the format
i can't think of any practical cases where that is true
if it's a text file, then the user may have edited it
unless - your application just wrote that file - then you know the format
but - oops - if i just wrote a file with a large number of dword values, i should have stored them as binary dwords
Title: Re: Ascii to DWORD replacement
Post by: hutch-- on January 29, 2013, 12:38:21 AM
> The mere assumption that assembler programmers see DWORD and understand unsigned is risky, given that ML.EXE interprets DWORD -1 as ffffffffh.

This difference has something to do with where we come from. In mnemonic notation ML sees 32 bits as 32 bits without interpreting it as either signed or unsigned. As of course you would be aware, it is how you evaluate a 32 bit value that determines if it's signed or unsigned. Some higher level languages name the distinction as DWORD versus LONG but assembler has 32 bit memory and register locations, nothing more.

Iczelion's old algo was never a signed version, it only ever produced unsigned results and when I asked about a drop in replacement, it was to replace that functionality, not add to it or improve its functionality list. It may be appropriate in a high level language to keep adding baggage to hold the hands of the inexperienced ALA VC, VB etc .... but with assembler, it's just extra junk that has no place in the typical targets for assembler code. Disassemble core OS code or very high performance critical code and you don't find junk in it at all, this is what the masm32 project was pointed at.

The problem for me maintaining 240 odd modules in a library is the signal to noise ratio, in the past in this forum I could ask members if they had code or algorithms to perform particular functions and they understood what was being targeted but of late all I get is pseudo philosophical debate, attempts to re-interpret the task and very little grasp of what it takes to maintain a library of procedures that have been used for years by a massive number of people. This is why I rarely ever post code any longer, the effort I put in and the difficulty tracking the results make it unviable.
Title: Re: Ascii to DWORD replacement
Post by: MichaelW on January 29, 2013, 01:37:47 AM
Quote
the way it is, i wouldn't use this routine in a real-world app
because it eats up erroneous strings and spits out a smiley face
if the string has a non-numeric, is too long, is null, you won't know about it

FWIW, for the heavily used CRT conversion functions:
Quote
The function stops reading the input string at the first character that it cannot recognize as part of a number.

Even for QuickBASIC and VB, hold-your-hand HLLs aimed at non-programmers, the VAL function implemented this same behavior, although like the CRT functions, it would skip over leading whitespace, and flag out-of-range values as an error.


Title: Re: Ascii to DWORD replacement
Post by: jj2007 on January 29, 2013, 01:51:54 AM
I don't mind if

include \masm32\include\masm32rt.inc

.code
start:
   MsgBox 0, str$(rv(atodw, " 123 ")), "Should be 123:", MB_OK
   exit
end start

shows me 2401470 but at least the manual should explain why it does so :biggrin:
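
Both odd results quoted in this thread (253123 for "-123" and 2401470 for " 123 ") can be reproduced with a small C model. It assumes the old routine subtracts 48 from each character in an 8-bit register and adds that byte to ten times the running total. That is an assumption about the old code, not a copy of it, and `old_atodw_model` is a made-up name.

```c
#include <stdint.h>

/* Hypothetical model: per character, an 8-bit "char - 48" is added
   to 10 * total, with no validation of any kind. */
static uint32_t old_atodw_model(const char *s)
{
    uint32_t total = 0;
    for (; *s != '\0'; ++s) {
        /* 8-bit subtract: ' ' (32) wraps to 240, '-' (45) wraps to 253 */
        uint8_t digit = (uint8_t)((unsigned char)*s - 48u);
        total = total * 10u + digit;
    }
    return total;
}
```

Under that assumption the leading space contributes 240 as if it were a digit, and the trailing one adds another 240 after the final multiply by ten, which gives 2401470.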
Title: Re: Ascii to DWORD replacement
Post by: Tedd on January 29, 2013, 02:24:24 AM
Just to add more noise.. :badgrin:

DWORD is non-signed -- it's neither explicitly signed nor unsigned, it comes down to interpretation within the context.
"atou32" is unambiguously unsigned, and "atoi32" is unambiguously signed. Future functions should use these names instead, and deprecate atodw (atodw should still be available for compatibility, but the other two should be preferred for new code.)

As for the atodw replacement, the biggest issue is that it doesn't attempt to validate its input and instead produces incorrect results. I don't believe replacing a function that does no validation with one that does will break anything, as any program relying on this function will already have had to check its input separately (or simply behave erroneously when fed bad input.) So checking the input again will not break anything, but could be argued to waste multiple clock cycles.

However, it's generally more efficient to check the input as you convert it, rather than in a separate step. Of course, this will slow down the function itself, though not the operation as a whole. The only case where it's a downside is when you have 10,000 numbers to convert which are already known to be inherently valid and checking them again would waste entire milliseconds; but I don't expect this usage appears much outside of contrived test cases. The common usage is converting a single user inputted number, and the function should do that correctly.

In any case, the functions should clearly document what they do so these issues don't come as a surprise -- that means the authors should document their own functions!
Title: Re: Ascii to DWORD replacement
Post by: hutch-- on January 29, 2013, 03:05:14 AM
The distinction here is still between "component" and "object", the new algo does what it's supposed to do, it converts an ascii string within the DWORD integer range into a DWORD and it does not pretend to do anything else. Now if your task is converting user input it's easy enough to exhaustively filter the input to ensure that what you feed into the conversion is purely numeric and within range but there are enough other tasks where you don't want the extra padding. Try ripping the guts out of a massive log file, parsing the Nth word which is numbers then feeding it to a conversion and the result fed into an array. The last thing you need is to filter it again, especially if the log file runs into the many millions of entries.

This is why you design a high performance library as components which you then use to construct objects if you need them. The problem with constructing objects first up is they often don't fit other tasks. The disease that afflicts many high level languages is exactly the failure to understand why you isolate components so that you don't end up with bloated bundles of junk. VB, VC, Pascal, Java and so on are full of chyte like this and it is the main contributor to bloated slow sloppy and unreliable code.

Components have a very good characteristic when it comes to reliability of code, if you get it wrong it goes BANG (OS says naughty things about your app etc ...) and you only have one option, get it right. This may be anathema in VB or JAVA but this is an assembler library and low level code is where the action is in terms of performance. If you want high level library functions that hold your poor hot sweaty little hand and don't let you make mistakes, try JAVA or VB, that is what they are there for, those folks who don't want to know what a pointer is can safely dump the contents into an array rather than directly address the data.  :P
Title: Re: Ascii to DWORD replacement
Post by: Tedd on January 29, 2013, 03:44:35 AM
Quote
The distinction here is still between "component" and "object", the new algo does what its supposed to do, it converts an ascii string within the DWORD integer range into a DWORD and it does not pretend to do anything else.
It also does what it's not supposed to do, it converts nonsense "*$&%!" into a supposedly valid DWORD with no indication.

Quote
Now if your task is converting user input its easy enough to exhaustively filter the input to ensure that what you feed into the conversion is purely numeric and within range but there are enough other tasks where you don't want the extra padding.
True, it's easy, but if you do it in more than one place then it makes sense to do it at the same time and avoid unnecessary code duplication.

Quote
try ripping the guts out of a massive log file, parsing the Nth word which is numbers then feeding it to a conversion and the result fed into an array. The last thing you need is to filter it again, especially if the log file runs into the many millions of entries.
Why would you need to filter it again? If you know your input is already valid, you do no further checking and accept the return value as-is. Usage is the same in this case. As for slow-down, millions of 'extra' cycles still only account for a few seconds at most, and this is an insignificant portion of the processing time.


You can rant about hand-holding and going bang if there's an error, but sanity checking is advisable. Programs should not throw an exception and die whenever they encounter a typo.

Obviously the choice is yours in the end, but functions should at least have documentation on their limitations, e.g. "Note: this function does not check input, it will happily convert 'dfghjkl' into a dword."
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 29, 2013, 04:51:55 AM
well - that function returns the value in EAX - no changing that
to conform with "standards", EAX would be used for status

maybe the best thing you could do is to return 0 for all non-valid strings
because the EDX register returned nothing on older versions, it could be used for status
but - we are back to discussing design philosophy - lol

i look at it this way.....
whatever they were using a2dw for, before, didn't need validation
for the most part, i suspect that's trivials and test pieces
Title: Re: Ascii to DWORD replacement
Post by: hutch-- on January 29, 2013, 10:37:37 AM
There are times when I feel like a voice crying in the wilderness, "don't bloat this stuff", "don't write crap code", "don't go down the wide and easy path to destruction" etc etc ....

Let's face it, a conversion IS a conversion and like the vast number of other functions used in Windows programming, it requires user controlled input, in this case a string comprised of ascii numbers only. Now like most other functions you can pass the wrong data to it and get nonsense results but we are talking about assembler programmers here, not learner VB or similar.

Input control is no big deal and it varies from place to place. From a GUI application, most often you filter the edit control so that only numbers can be entered; if it's floating point you also allow a period. If the input is from the console, which is an ever decreasing task in modern times, you get the string from StdIn and have a look at it first and squawk an error if it has non-numeric characters in it. What you don't do is put this string filtering crap in the conversion because unless it handles every case of what can be fed to it as string, you end up with duplication or redundancy when you need a different case.

You make objects from components, do it the other way around and you end up with VB, VC, Delphi and similar crap that tries unsuccessfully to hold the hand of the inexperienced.

Now notwithstanding such weighty considerations, the masm32 library will remain a component library as per its original design but I would not want to stifle the creativity of folks who want to do it differently, I have been encouraging people to do exactly that for many years now, if it "don't fit" roll your own.  :biggrin:
Title: Re: Ascii to DWORD replacement
Post by: mineiro on January 29, 2013, 04:14:50 PM
I tried this tonight: a2dw_min returns eax=0 on error, and a version that does no checking follows too:
;----------------
Edited afterwards: I removed the checking algo because it does not check for more than 10 digits, and the shl by 3 misses some carries; sorry for the inconvenience. The algo that does not check anything is:
Code:
atou_min proc String:DWORD

  ;mov edx,[esp+4]
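pop edx                         ; discard the return address (it is still in memory below esp)
pop edx                         ; edx = address of the string
  xor eax,eax                   ; eax = running total
i = 0
repeat 10                       ; unrolled: at most 10 digits
movzx ecx,byte ptr [edx+i]      ; load the next character
test ecx,ecx
jz @F                           ; terminator: done
lea ecx,[eax*8+ecx-30h]         ; ecx = 8*total + digit
lea eax,[eax*2+ecx]             ; eax = 2*total + ecx = 10*total + digit
i = i+1
endm
align 4
@@:
jmp dword ptr [esp-8]           ; return through the popped return address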
;ret 4 
atou_min endp
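
The two LEA pairs seen in this thread (8x + 2x here, and 5x then 2x in the looped version posted earlier) compute the same `10*total + digit` step. The C sketch below models both with 32-bit wraparound so they can be compared; the function names are made up, and the model leaves out the 10 digit limit of the unrolled version.

```c
#include <stdint.h>

/* lea ecx,[eax*8+ecx-30h] / lea eax,[eax*2+ecx] */
static uint32_t step_8_plus_2(uint32_t total, uint32_t ch)
{
    uint32_t ecx = total * 8u + ch - 0x30u;   /* 8*total + digit  */
    return total * 2u + ecx;                  /* 10*total + digit */
}

/* lea eax,[eax+eax*4] / lea eax,[ecx+eax*2-48] */
static uint32_t step_5_times_2(uint32_t total, uint32_t ch)
{
    uint32_t eax = total + total * 4u;        /* 5*total          */
    return ch + eax * 2u - 48u;               /* 10*total + digit */
}

/* Unchecked conversion: applies `step` to every byte up to the terminator. */
static uint32_t atou_model(const char *s, uint32_t (*step)(uint32_t, uint32_t))
{
    uint32_t total = 0;
    for (; *s != '\0'; ++s)
        total = step(total, (unsigned char)*s);
    return total;
}
```

Because everything is modulo 2^32, both forms agree on every input, including strings that are not numbers at all.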
Title: Re: Ascii to DWORD replacement
Post by: sinsi on January 29, 2013, 04:39:39 PM
I would agree that it's better that the caller validate the string first, because they control that part.
It might not need validation so the overhead disappears, whereas if it's in the conversion routine it's redundant.
Where does validation end? Can we skip spaces/tabs? What about a null pointer?

In the old days we would use the carry flag to indicate an error, why did MS get rid of that  ::)
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 29, 2013, 09:52:07 PM
yah - it made it easy - lol
i guess they figure they can fit more info into EAX
then - they use 32 bits to return 0 or 1 - if you want the error code, call GetLastError   :lol:
Title: Re: Ascii to DWORD replacement
Post by: jj2007 on January 29, 2013, 10:41:11 PM
Quote
I would agree that it's better that the caller validate the string first, because they control that part.
It might not need validation so the overhead disappears, whereas if it's in the conversion routine it's redundant.
Where does validation end? Can we skip spaces/tabs? What about a null pointer?

Put things into perspective:
atodwJJ:        0.042 seconds per 1000000 loops
Val()  :        0.136 seconds per 1000000 loops

The "slow" one skips leading and trailing spaces and tabs, and it doesn't care if the string contains a dot, or if it ends with "h" or "b" or "y" or "d" or "e2", or if it starts with "0x" or "$". But it does throw an error if the string ends with "x" or if it finds other stuff that is not in our list of valid number formats.

This kind of algo is what coders need 99% of their time - except in those cases where 0.136 seconds per Million invokes is too slow, and where they can be absolutely sure that the format is always a correct positive decimal string.

Both kinds of algos have their place, and I agree with Hutch that a really fast replacement for atodw doesn't need the bells and whistles. The only point of contention is how drastic the warnings in the documentation should be, to prevent beginners from using atodw as if it were a fool-proof Val(whatever).
Title: Re: Ascii to DWORD replacement
Post by: hutch-- on January 30, 2013, 01:37:22 AM
Here is a quickly written scruffy to test user input from a source like the console StdIn. (Warning, this is a 1:30am model)  :biggrin:



IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    is_str_int PROTO :DWORD

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    mov edx, rv(is_str_int,"  -12345678")
    print ustr$(edx),13,10

    mov edx, rv(is_str_int,"  12345678")
    print ustr$(edx),13,10

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

is_str_int proc pstring:DWORD

    mov eax, [esp+4]
    sub eax, 1

  tlb:                      ; skip leading spaces and tabs
    add eax, 1
    cmp BYTE PTR [eax], 32
    je tlb
    cmp BYTE PTR [eax], 9
    je tlb

    sub eax, 1

  chlp:                     ; test against integer character range
    add eax, 1
    cmp BYTE PTR [eax], 0
    je iszero
    cmp BYTE PTR [eax], 48
    jb invalid_char
    cmp BYTE PTR [eax], 57
    jna chlp

  invalid_char:
    xor eax, eax            ; return zero if invalid character
    ret 4

  iszero:
    mov eax, 1              ; return non zero if integer string
    ret 4

is_str_int endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
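
For readers following along without MASM32 installed, the same filter logic can be sketched in C. This is a model of the is_str_int procedure above, not a replacement for it, and `is_str_int_model` is a made-up name. Like the assembly, it accepts an empty or all-blank string and does no range check.

```c
/* Returns 1 when s is optional leading spaces/tabs followed only by
   the digits 0-9 up to the terminator, otherwise 0. */
static int is_str_int_model(const char *s)
{
    while (*s == ' ' || *s == '\t')   /* tlb: skip leading spaces (32) and tabs (9) */
        ++s;
    for (; *s != '\0'; ++s)           /* chlp: every remaining byte must be '0'..'9' */
        if (*s < '0' || *s > '9')
            return 0;                 /* invalid_char */
    return 1;                         /* iszero: reached the terminator */
}
```

The two strings used in main give 0 for "  -12345678" and 1 for "  12345678".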
Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 30, 2013, 02:18:36 AM
if you are going to pass through the string, you may as well process it   :biggrin:
Title: Re: Ascii to DWORD replacement
Post by: hutch-- on January 30, 2013, 09:57:50 AM
 :biggrin:

Yeah, but which string, unsigned, signed, floating point, various scientific notation etc etc .... Are you going to fill a conversion with every possible case of data error that can be fed to it ? Some experience in library design will help you here, the reason why you write re-usable components is to avoid duplication and redundancy (too many tests for the same thing). Why would you add that much crap to a conversion if you are writing a GUI input where the filtering is done in the edit control ?

You are confusing objects (by a particular theory) with components. If you are writing a library that has many objects, often doing similar things, then you write re-usable components and call them from your higher level objects, this way the finer granularity of your library yields smaller more efficient executables.
Title: Re: Ascii to DWORD replacement
Post by: frktons on January 30, 2013, 10:19:14 AM
I think the check of input values has to be done before calling the conversion
routine. No need to duplicate them.

Title: Re: Ascii to DWORD replacement
Post by: dedndave on January 30, 2013, 11:03:09 AM
Quote
:biggrin:

Yeah, but which string, unsigned, signed, floating point, various scientific notation etc etc ....
Are you going to fill a conversion with every possible case of data error that can be fed to it ?

i think Jochen has a nice routine like that   :t
it would be a nice addition to the masm32 library
Title: Re: Ascii to DWORD replacement
Post by: hutch-- on January 30, 2013, 12:32:33 PM
 :biggrin:

> i think Jochen has a nice routine like that it would be a nice addition to the masm32 library

No, it's a nice addition to MasmBasic, that is what JJ wrote it for. A library for MASM needs much finer granularity to avoid duplication and redundancy, the difference is the conceptual difference between a high level language and a low level language, small isolated procedures built as separate object modules that can be combined in very flexible ways to produce a wide range of functionality.
Title: Re: Ascii to DWORD replacement
Post by: jj2007 on January 30, 2013, 07:33:49 PM
Quote
:biggrin:

> i think Jochen has a nice routine like that it would be a nice addition to the masm32 library

No, its a nice addition to MasmBasic, that is what JJ wrote it for. A library for MASM needs much finer granularity to avoid duplication and redundancy, ...

Mixing Masm32 and MasmBasic is perhaps not a good idea, I agree. Val (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1179)(My$) (and MovVal (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1180)) are "all you can eat" macros that emulate the behaviour of the corresponding HLL (in this case, GfaBasic (http://www.cheek.org/theview/2008/20081110.htm)).

There is an issue with the design logic, however.

First, we tend to say "it's better to have a high-speed specialised algo than a slow allrounder". Right, but MasmBasic's Val/MovVal, while able to "eat" decimal, bin and two hex formats, and correctly handling whitespace, is five times as fast as the current atodw... Now that will change with the superfast new Masm32 atodw, fine.

Second, "I want to be sure the algo does only what I want it to do". I.e. atodw should not convert "-123" if the documentation says "unsigned decimal" (and one can argue whether writing "DWORD" is sufficient). However, if you feed "-123" to atodw, then the algo should complain, otherwise that "design advantage" is nil.

Last but not least, I still believe that even the experienced programmer who starts with Masm32 needs a very explicit warning that atodw, if fed with " 123 ", produces rubbish. Not knowing that behaviour produces the kind of bug that makes us spend sleepless nights, and it would be easy to avoid it.
Title: Re: Ascii to DWORD replacement
Post by: MichaelW on January 30, 2013, 08:34:00 PM
Quote
I think the check of input values has to be done before calling the conversion routine. No need to duplicate them.

I agree. A separate general-purpose “filter” implemented as a procedure, or better, as a macro that generates procedures that are specific to the filtering requirements (IOW that have one argument, the address of the string, and where the specific “filter” is hard coded).
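
As a rough illustration of the macro idea, here is a C analogue rather than a MASM macro. `DEFINE_STR_FILTER` and the two generated function names are made up; each generated function takes one argument, the address of the string, and has its accept test hard coded.

```c
/* Generates a function `name` that returns 1 when every character of the
   string satisfies `accept` (an expression in the variable c), else 0. */
#define DEFINE_STR_FILTER(name, accept)                  \
    static int name(const char *s)                       \
    {                                                    \
        for (; *s != '\0'; ++s) {                        \
            unsigned char c = (unsigned char)*s;         \
            if (!(accept))                               \
                return 0;                                \
        }                                                \
        return 1;                                        \
    }

/* Two filters with different hard-coded requirements. */
DEFINE_STR_FILTER(is_unsigned_dec, c >= '0' && c <= '9')
DEFINE_STR_FILTER(is_hex_digits, (c >= '0' && c <= '9') ||
                                 (c >= 'A' && c <= 'F') ||
                                 (c >= 'a' && c <= 'f'))
```

The caller picks the filter that matches its input and then hands the string to the unchecked conversion. Note that an empty string passes both filters in this sketch, so a length check would be a separate requirement.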