
Ascii to DWORD replacement

Started by hutch--, January 25, 2013, 01:58:28 PM


hutch--

What I am after is a good clean replacement for Iczelion's old algo, which has known problems. I am loath to touch heirlooms, and while I am happy enough with the extended version, it would be useful to have a conventional version as well. What I would be interested in seeing is a modern version that does exactly the same thing and can be used as a drop-in replacement for Iczelion's antique.

Requirement is no stack frame, no ancient string instructions and no high level masm operators, just genuine low level fast code.

This is the original spec.


atodw

atodw proc uses edi esi String:PTR BYTE

Description
atodw converts a decimal string to dword.
Note that the parameter String is an address of DWORD size.

Parameter
1. String: the address of the decimal string to convert

Return Value
The DWORD value is returned in eax.

Example
invoke atodw,ADDR MyDecimalString
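For anyone checking candidate replacements against this spec, a minimal C model of the contract may help as a yardstick. It assumes the string holds only the digits '0'..'9' followed by a zero terminator, does no validation, and lets the result wrap modulo 2^32 the way 32-bit register arithmetic does; the name `atodw_ref` is hypothetical and not part of the MASM32 library.

```c
#include <stdint.h>

/* Hypothetical reference model of the atodw contract (not the MASM32 source).
   Assumes: digits only, zero-terminated, no sign, no error checking.
   uint32_t arithmetic wraps modulo 2^32, like a 32-bit register. */
uint32_t atodw_ref(const char *s)
{
    uint32_t n = 0;
    while (*s != '\0') {
        n = n * 10u + ((uint32_t)(unsigned char)*s - 48u);  /* 48 = '0' */
        s++;
    }
    return n;
}
```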

dedndave

Quote
no ancient string instructions

:biggrin:

crap - i was gonna use LODSB, STOSB, and LOOP

jj2007

Quote from: hutch-- on January 25, 2013, 01:58:28 PM
Requirement is no stack frame, no ancient string instructions and no high level masm operators, just genuine low level fast code.

What else? Should fit in one para aka 16 bytes, at least ten times as fast as the current version?
Be more specific, Hutch :biggrin:

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
atodwJJ proc Src
  push esi
  xor edx, edx        ; will be integer value
  mov esi, [esp+8]    ; Src: [esp] = saved esi, [esp+4] = return address
  lea ecx, [esi+9]    ; max address
@@:
  movzx eax, byte ptr [esi]   ; load a char
  lea edx, [edx+4*edx]        ; edx=5*edx
  lea edx, [2*edx+eax-48]     ; edx=10*edx+(eax-48)
  inc esi
  cmp ecx, esi        ; >9 digits means we need the FPU; 4294967295 aka 2^32-1 is the limit
  jne @B
  pop esi
  xchg eax, edx
  retn 4

atodwJJ endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
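This loop always runs nine iterations (esi up to esi+9) and never looks at the terminator. A C model of that behaviour, under the hypothetical name `fixed9_model`, shows the consequence: a nine-digit string converts correctly, while a shorter one keeps folding the following bytes into the result (each zero byte turns n into 10*n - 48). So the caller would have to guarantee nine readable digit bytes.

```c
#include <stdint.h>

/* Hypothetical model of the fixed-count loop in atodwJJ: it always
   consumes exactly 9 bytes and never checks for the zero terminator. */
uint32_t fixed9_model(const unsigned char *p)
{
    uint32_t n = 0;
    for (int i = 0; i < 9; i++) {
        n = n * 5u;                 /* lea edx, [edx+4*edx] */
        n = n * 2u + p[i] - 48u;    /* lea edx, [2*edx+eax-48] */
    }
    return n;
}
```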

dedndave

here is my first whack at it
it will handle signed or unsigned
and, it's UNICORN aware   :biggrin:
no error checking
        OPTION  PROLOGUE:None
        OPTION  EPILOGUE:None

a2dw    PROC    lpAscStr:LPSTR

        push    esi
        mov     esi,[esp+8]
        xor     eax,eax
        movzx   edx,byte ptr [esi]
        mov     cl,10
        cmp     dl,'-'
        push    edx
        jnz     a2dw02

        IFDEF __UNICODE__
            add     esi,2
        ELSE
            inc     esi
        ENDIF
        jmp short a2dw01

a2dw00: mul     cl
        lea     eax,[eax+edx-30h]

a2dw01: mov     dl,[esi]

a2dw02:
        IFDEF __UNICODE__
            add     esi,2
        ELSE
            inc     esi
        ENDIF
        or      dl,dl
        jnz     a2dw00

        pop     edx
        cmp     dl,'-'
        jnz     a2dw03

        neg     eax

a2dw03: pop     esi
        ret     4

a2dw    ENDP

        OPTION  PROLOGUE:PrologueDef
        OPTION  EPILOGUE:EpilogueDef


it's untested, but gives you the concept

EDIT: removed an unneeded line of code
changed MOV CL,10 to MOV ECX,10
rearranged a couple instructions in the preamble

EDIT: oops - went back to MOV CL,10 and MUL CL
that's not going to work - lol
i need another register   :(
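Why MUL CL fails: the 8-bit form computes AX = AL * CL and leaves bits 16..31 of EAX untouched, so only the low byte of the running total takes part in the multiply. A small C model of the draft loop (hypothetical names `mul_cl` and `draft_model`, unsigned digits only) shows it staying correct while the total fits in AL, then going wrong: '10000' comes out as 2320, because 1000 mod 256 = 232.

```c
#include <stdint.h>

/* Models MUL r/m8: AX = AL * CL, upper 16 bits of EAX unchanged. */
static uint32_t mul_cl(uint32_t eax, uint8_t cl)
{
    uint16_t ax = (uint16_t)((uint8_t)eax * cl);   /* AL * CL, 16-bit product */
    return (eax & 0xFFFF0000u) | ax;
}

/* Hypothetical model of the first draft: multiply, then add the digit. */
uint32_t draft_model(const char *s)
{
    uint32_t eax = 0;
    for (; *s != '\0'; s++)
        eax = mul_cl(eax, 10) + ((uint32_t)(unsigned char)*s - 48u);
    return eax;
}
```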

dedndave

ok - tested it, this time - lol
signed or unsigned, UNICORN, no error checking
;***********************************************************************************************

        OPTION  PROLOGUE:None
        OPTION  EPILOGUE:None

a2dwDave PROC   lpAscStr:LPSTR

        push    esi
        push    ebx
        mov     esi,[esp+12]
        xor     eax,eax
        movzx   ebx,byte ptr [esi]
        mov     ecx,10
        cmp     bl,'-'
        push    ebx
        jnz     a2dw02

        IFDEF __UNICODE__
            add     esi,2
        ELSE
            inc     esi
        ENDIF
        jmp short a2dw01

a2dw00: mul     ecx
        lea     eax,[eax+ebx-30h]

a2dw01: mov     bl,[esi]

a2dw02:
        IFDEF __UNICODE__
            add     esi,2
        ELSE
            inc     esi
        ENDIF
        or      bl,bl
        jnz     a2dw00

        pop     edx
        cmp     dl,'-'
        jnz     a2dw03

        neg     eax

a2dw03: pop     ebx
        pop     esi
        ret     4

a2dwDave ENDP

        OPTION  PROLOGUE:PrologueDef
        OPTION  EPILOGUE:EpilogueDef

;***********************************************************************************************
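The IFDEF __UNICODE__ blocks only change the pointer step (2 bytes instead of 1); the loop still reads one byte per character with mov bl,[esi]. With little-endian UTF-16, as Windows uses for WCHAR, that byte is the low half of each character, so ASCII digits convert the same way. A C sketch with a hypothetical `step` parameter and the made-up name `a2dw_stride` illustrates this; like the asm, it tests only the low byte for the terminator.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a2dwDave's byte/word handling.
   step = 1 for ANSI strings, 2 for little-endian UTF-16 (__UNICODE__).
   Only the low byte of each character is read, like "mov bl,[esi]". */
uint32_t a2dw_stride(const unsigned char *p, size_t step)
{
    int neg = (p[0] == '-');
    uint32_t n = 0;
    if (neg)
        p += step;                     /* skip the sign character */
    for (; p[0] != 0; p += step)       /* terminator test on the low byte only */
        n = n * 10u + (p[0] - 48u);
    return neg ? 0u - n : n;           /* neg eax */
}
```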

dedndave

i get about 130 cycles on my prescott for '4294967295',0 or '-2147483648',0
doesn't seem to care if it's UNICORN or not

dedndave

i changed this
a2dw00: mul     ecx
        lea     eax,[eax+ebx-30h]

to this
a2dw00: lea     eax,[4*eax+eax]
        lea     eax,[2*eax+ebx-30h]


almost twice as fast, 79 clock cycles
that means i don't need an extra register
new version coming...
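The replacement works because LEA can only scale by 1, 2, 4 or 8, so the multiply by 10 is split into two steps: x5 = x + 4x, then 2*x5 + c - 48, where c is the character code. Both forms wrap modulo 2^32 in the same way, so the identity holds for every input; the C sketch below (hypothetical names `lea_pair` and `lea_pair_matches_mul`) spot-checks it, including values near the top of the range.

```c
#include <stdint.h>

/* Models "lea eax,[4*eax+eax]" followed by "lea eax,[2*eax+ebx-30h]". */
uint32_t lea_pair(uint32_t eax, uint32_t ebx)
{
    eax = 4u * eax + eax;           /* eax * 5 */
    eax = 2u * eax + ebx - 0x30u;   /* (eax * 5) * 2 + (char - '0') */
    return eax;
}

/* Returns 1 if the LEA pair equals eax*10 + (c - '0') for all sampled inputs. */
int lea_pair_matches_mul(void)
{
    for (uint32_t x = 0; x < 100000u; x += 7u)
        for (uint32_t c = '0'; c <= '9'; c++)
            if (lea_pair(x, c) != x * 10u + (c - 0x30u))
                return 0;
    /* values near 2^32, where the arithmetic wraps */
    for (uint32_t x = 0xFFFFFF00u; x != 0; x++)
        if (lea_pair(x, '9') != x * 10u + 9u)
            return 0;
    return 1;
}
```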

dedndave

73 cycles for '4294967295',0 or '-2147483648',0
;***********************************************************************************************

        OPTION  PROLOGUE:None
        OPTION  EPILOGUE:None

a2dwDave PROC   lpszAscStr:LPSTR

;Ascii to Dword, Signed or Unsigned, UNICODE Aware, No Error Checking
;DednDave, 1-2013

;-------------------------------------------------

        mov     edx,[esp+4]
        xor     eax,eax
        movzx   ecx,byte ptr [edx]
        cmp     cl,'-'
        push    ecx
        jnz     a2dw02

        IFDEF __UNICODE__
            add     edx,2
        ELSE
            inc     edx
        ENDIF
        jmp short a2dw01

a2dw00: lea     eax,[4*eax+eax]
        lea     eax,[2*eax+ecx-30h]

a2dw01: mov     cl,[edx]

a2dw02:
        IFDEF __UNICODE__
            add     edx,2
        ELSE
            inc     edx
        ENDIF
        or      cl,cl
        jnz     a2dw00

        pop     edx
        cmp     dl,'-'
        jnz     a2dw03

        neg     eax

a2dw03: ret     4

a2dwDave ENDP

        OPTION  PROLOGUE:PrologueDef
        OPTION  EPILOGUE:EpilogueDef

;***********************************************************************************************
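One unsigned loop plus a final NEG covers both signed and unsigned input, because two's-complement negation modulo 2^32 maps 2147483648 to 0x80000000, which is -2147483648 when read as a signed DWORD. A C model of this final version (hypothetical name `a2dw_model`, ANSI build only, no error checking) shows it with the two strings used in the timing run.

```c
#include <stdint.h>

/* Hypothetical C model of the final a2dwDave (ANSI build, no error checking).
   Digits are accumulated as unsigned; a leading '-' negates at the end. */
uint32_t a2dw_model(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned char first = *p;              /* push ecx: remember the first char */
    uint32_t eax = 0;
    if (first == '-')
        p++;
    for (; *p != 0; p++)
        eax = eax * 10u + (*p - 48u);
    return first == '-' ? 0u - eax : eax;  /* neg eax */
}
```

Read the result as int32_t when the input is signed.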

hutch--

I just found the version that was supposed to replace the old one. It has not been renamed to match the old one.



; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

    align 16

atou proc String:DWORD

  ; ------------------------------------------------
  ; Convert decimal string into UNSIGNED DWORD value
  ; ------------------------------------------------

    mov edx, [esp+4]
    xor ecx, ecx
    movzx eax, BYTE PTR [edx]
    test eax, eax
    jz quit

  lpst:
    add edx, 1
    lea ecx, [ecx+ecx*4]            ; mul ecx * 5
    lea ecx, [eax+ecx*2-48]
    movzx eax, BYTE PTR [edx]
    test eax, eax
    jz quit

    add edx, 1
    lea ecx, [ecx+ecx*4]            ; mul ecx * 5
    lea ecx, [eax+ecx*2-48]
    movzx eax, BYTE PTR [edx]
    test eax, eax
    jnz lpst

  quit:
    lea eax, [ecx]

    ret 4

atou endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
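atou is unrolled twice: each pass through lpst handles two characters with a terminator test after each, so the backward branch is taken once per two digits. A C sketch of the same structure (hypothetical name `atou_unrolled`), with comments mapping each line to the asm:

```c
#include <stdint.h>

/* Hypothetical C sketch of atou's 2x unrolled loop. */
uint32_t atou_unrolled(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    uint32_t n = 0;                 /* ecx */
    uint32_t c = *p;                /* movzx eax, BYTE PTR [edx] */
    while (c != 0) {                /* test eax, eax / jz quit */
        p++;
        n = n * 5u;                 /* lea ecx, [ecx+ecx*4] */
        n = c + n * 2u - 48u;       /* lea ecx, [eax+ecx*2-48] */
        c = *p;
        if (c == 0)
            break;                  /* jz quit */

        p++;
        n = n * 5u;
        n = c + n * 2u - 48u;
        c = *p;                     /* jnz lpst: re-tested at the loop top */
    }
    return n;                       /* lea eax, [ecx] */
}
```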

hutch--

JJ,

The new version is much faster than an old version you posted some time ago but the new version does not return the correct results.



OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

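atodJJ proc String:DWORD

   mov edx, [esp+4]
   xor eax, eax

@@:
   movzx ecx, byte ptr [edx]     ; load a char; ch is now 0
   inc edx
   lea eax, [eax+eax*4]          ; eax = 5*eax
   cmp byte ptr [edx], ch        ; is the NEXT byte the zero terminator?
   lea eax, [ecx+eax*2-30h]      ; eax = 10*eax+(ecx-48); lea leaves the flags alone
   jnz @b
   ret 4

atodJJ   endp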

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; OPTION PROLOGUE:NONE
; OPTION EPILOGUE:NONE
;
; atodJJ proc String:DWORD   ; Src
;
;   push esi
;   xor edx, edx                  ; will be integer value
;   mov esi, [esp+8]              ; Src: [esp] = saved esi, [esp+4] = return address
;   lea ecx, [esi+9]              ; max address
;
; @@:
;   movzx eax, byte ptr [esi]     ; load a char
;   lea edx, [edx+4*edx]          ; edx=5*edx
;   lea edx, [2*edx+eax-48]       ; edx=10*edx+(eax-48)
;   inc esi
;   cmp ecx, esi                  ; >9 digits means we need the FPU; 4294967295 aka 2^32-1 is the limit
;   jne @B
;
;   pop esi
;   xchg eax, edx
;   retn 4
;
; atodJJ endp
;
; OPTION PROLOGUE:PrologueDef
; OPTION EPILOGUE:EpilogueDef
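The uncommented atodJJ drops the nine-character limit and instead tests the byte after the current one: ch is always 0 after movzx ecx, byte ptr [...], so cmp byte ptr [edx], ch compares against zero, and the LEA in between does not disturb the flags. One case worth checking is the empty string: the first character is converted before any terminator test, so it yields 0 - 48 (4294967248) and the byte after the terminator is read as well. A C model under the hypothetical name `atodJJ_model`:

```c
#include <stdint.h>

/* Hypothetical model of the uncommented atodJJ: convert the current byte,
   then look one byte ahead for the zero terminator (do/while shape). */
uint32_t atodJJ_model(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    uint32_t eax = 0;
    unsigned char c;
    do {
        c = *p++;                   /* movzx ecx, byte ptr [edx] / inc edx */
        eax = c + eax * 10u - 48u;  /* the two lea instructions */
    } while (*p != 0);              /* cmp byte ptr [edx], ch / jnz @b */
    return eax;
}
```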

jj2007

Quote from: hutch-- on January 26, 2013, 12:51:31 AM
JJ,

The new version is much faster than an old version you posted some time ago but the new version does not return the correct results.

Hutch,
I have not tested it much, little time for that, and there is no error checking yet, but the few strings I tried were ok... probably because they were exactly 8 characters long :bgrin:

Thanks,
JJ

qWord

maybe a bit off topic, but such a short algo can also simply be inlined:
; return: EAX = number
; tchr2dw psz
; tchr2dw &sz
; tchr2dw ADDR sz
; tchr2dw Addr sz
; tchr2dw addr sz
tchr2dw macro psz:req
LOCAL l1,l2
% FOR arg,<reparg(psz)>
IF @InStr(1,<&arg>,<ADDR >) OR @InStr(1,<&arg>,<Addr >) OR @InStr(1,<&arg>,<addr >)
lea ecx,@SubStr(<&arg>,5)
ELSEIFIDN @SubStr(<&arg>,1,1),<!&>
lea ecx,@SubStr(<&arg>,2)
ELSE
mov ecx,arg
ENDIF
EXITM
ENDM

    xor eax,eax
l1: movzx edx,TCHAR ptr [ecx]
    test edx,edx
    jz l2
    lea eax,[eax+eax*4]
    lea eax,[eax*2+edx-'0']
    lea ecx,[ecx+SIZEOF TCHAR]
    jmp l1
l2:
endm

dedndave

Jochen's routine does not check for null termination
looks ok, otherwise
although, you could swap usage of EAX and EDX and save the XCHG instruction   :P

i like qWord's idea   :P
saves the call/ret overhead
but, i would mod it a little
        mov     edx,arg
        xor     eax,eax
        jmp short loop01

loop00: lea     eax,[eax+eax*4]
        lea     eax,[eax*2+ecx-'0']

loop01: movzx   ecx,TCHAR ptr [edx]
        test    ecx,ecx
        lea     edx,[edx+sizeof TCHAR]
        jnz     loop00


add a sign check and you've got a cool macro   :biggrin:
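In C terms, the macro and the rotated loop amount to an inlined conversion with no call/ret overhead. The sketch below adds the suggested sign check; `tchar_t` stands in for TCHAR and is an assumption (plain char, i.e. an ANSI build), as is the name `tchr2dw_inline`. As in the macro body with its jmp short loop01, the load-and-test step comes first, so an empty string returns 0.

```c
#include <stdint.h>

typedef char tchar_t;   /* assumption: ANSI build; UNICODE would need a 16-bit type */

/* Hypothetical inline equivalent of the tchr2dw macro, plus a sign check. */
static inline uint32_t tchr2dw_inline(const tchar_t *p)
{
    uint32_t eax = 0;
    int neg = (*p == '-');
    if (neg)
        p++;
    /* load-and-test first, then accumulate: the rotated loop of the macro */
    for (uint32_t c = (unsigned char)*p; c != 0; c = (unsigned char)*++p)
        eax = eax * 10u + (c - 48u);   /* the two lea instructions */
    return neg ? 0u - eax : eax;
}
```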

jj2007

Hi all,
Here is a new version with correct results and error checks. Of course, it's now slow and bloated :(

testing 1:      1        ok
testing -1:     -1       ok
testing 123:    123      ok
testing 123456789:      123456789        ok
testing 1234567890:     source too long
testing -9876x54321:    x is an invalid character
testing -987.654321:    . is an invalid character
testing -987654321:     -987654321       ok
testing -9876543210:    source too long
testing -987654h:       h is an invalid character
48      bytes for atodw
11      bytes for Dec2Dword
6       bytes for Dec2Dword2
62      bytes for atodwJJ

AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
++++++++++++++++++++
93      cycles for 10 * Dec2Dword
724     cycles for 10 * atodwJJ
3554    cycles for 10 * a2dw

93      cycles for 10 * Dec2Dword
724     cycles for 10 * atodwJJ
3558    cycles for 10 * a2dw

93      cycles for 10 * Dec2Dword
725     cycles for 10 * atodwJJ
3553    cycles for 10 * a2dw

92      cycles for 10 * Dec2Dword
725     cycles for 10 * atodwJJ
3556    cycles for 10 * a2dw

93      cycles for 10 * Dec2Dword
724     cycles for 10 * atodwJJ
3693    cycles for 10 * a2dw


@Dave: Eliminating xchg eax, edx is not a good idea. Two bytes longer and many cycles slower...
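The error-checked source itself isn't shown here, but the printed results pin down the rules fairly well: an optional leading '-', at most nine digits (consistent with the nine-digit limit in the earlier atodwJJ comment), and the first non-digit reported as invalid. Below is a hypothetical C reconstruction of those rules only, not JJ's code; the names `parse_status` and `a2dw_checked` are made up. It scans left to right, so an invalid character is reported as soon as it is met.

```c
#include <stdint.h>

enum parse_status { PARSE_OK, PARSE_TOO_LONG, PARSE_INVALID_CHAR };

/* Hypothetical reconstruction of the checks implied by the test output.
   On PARSE_INVALID_CHAR, *bad receives the offending character. */
enum parse_status a2dw_checked(const char *s, uint32_t *out, char *bad)
{
    int neg = (*s == '-');
    int digits = 0;
    uint32_t n = 0;
    if (neg)
        s++;
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9') {
            *bad = *s;
            return PARSE_INVALID_CHAR;
        }
        if (++digits > 9)              /* nine digits always fit in 32 bits */
            return PARSE_TOO_LONG;
        n = n * 10u + (uint32_t)(*s - '0');
    }
    *out = neg ? 0u - n : n;
    return PARSE_OK;
}
```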

Gunther

My results:


testing 1: 1 ok
testing -1: -1 ok
testing 123: 123 ok
testing 123456789: 123456789 ok
testing 1234567890: source too long
testing -9876x54321: x is an invalid character
testing -987.654321: . is an invalid character
testing -987654321: -987654321 ok
testing -9876543210: source too long
testing -987654h: h is an invalid character
48 bytes for atodw
11 bytes for Dec2Dword
6 bytes for Dec2Dword2
62 bytes for atodwJJ

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (SSE4)
loop overhead is approx. 35/10 cycles


23 cycles for 10 * Dec2Dword
1015 cycles for 10 * atodwJJ
1570 cycles for 10 * a2dw

24 cycles for 10 * Dec2Dword
407 cycles for 10 * atodwJJ
1615 cycles for 10 * a2dw

23 cycles for 10 * Dec2Dword
720 cycles for 10 * atodwJJ
1575 cycles for 10 * a2dw

23 cycles for 10 * Dec2Dword
1022 cycles for 10 * atodwJJ
2183 cycles for 10 * a2dw

24 cycles for 10 * Dec2Dword
1018 cycles for 10 * atodwJJ
2180 cycles for 10 * a2dw

--- ok ---

