Fast DwordtoHex ?

Started by guga, November 27, 2015, 11:16:24 PM

Grincheux

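I suggest:


Hex PROC USES edi __lpszString:LPSTR,__dwNumber:DWORD

    mov edi,__lpszString
    mov edx,__dwNumber
    mov ecx,8                   ; always emit 8 digits, trailing zeros included

@Loop :

    mov eax,edx
    shr eax,28                  ; top nibble -> 0..15

    cmp al,10
    jl @NotAlpha

    add al,'A' - 10             ; 10..15 -> 'A'..'F'
    jmp @Next

@NotAlpha :

    add al,'0'                  ; 0..9 -> '0'..'9'
@Next :

    mov Byte Ptr [edi],al
    add edi,1
    shl edx,4                   ; next nibble moves to the top
    sub ecx,1
    jnz @Loop

    mov Byte Ptr [edi],0        ; zero terminator
    ret

Hex ENDP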
Kenavo (Bye)
----------------------
Help me if you can, I'm feeling down...
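
For anyone who wants to check the logic outside the assembler, here is a minimal C sketch of the same nibble-at-a-time idea. The name hex_by_nibble is made up for illustration. It counts eight iterations explicitly; stopping as soon as the shifted value becomes zero would drop trailing zero digits, e.g. 12345670h would come out as 1234567.

```c
#include <stdint.h>

/* Hypothetical C model of a shift-and-compare converter.
   buf must hold at least 9 bytes: 8 hex digits plus the terminator. */
void hex_by_nibble(char *buf, uint32_t value)
{
    for (int i = 0; i < 8; i++) {
        uint32_t nibble = value >> 28;      /* top 4 bits, 0..15 */
        buf[i] = (char)(nibble < 10 ? '0' + nibble : 'A' + (nibble - 10));
        value <<= 4;                        /* next nibble moves to the top */
    }
    buf[8] = '\0';
}
```

Called with a 9-byte buffer and 0x12345670, it fills the buffer with "12345670".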

jj2007

Works fine:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

3371    cycles for 100 * dw2hex
3589    cycles for 100 * Hex$ Grincheux
42320   cycles for 100 * CRT sprintf
526     cycles for 100 * Bin2Hex
605     cycles for 100 * Bin2Hex2 cx
1321    cycles for 100 * Bin2Hex6
1626    cycles for 100 * FastHex

3375    cycles for 100 * dw2hex
3518    cycles for 100 * Hex$ Grincheux
42303   cycles for 100 * CRT sprintf
524     cycles for 100 * Bin2Hex
605     cycles for 100 * Bin2Hex2 cx
1321    cycles for 100 * Bin2Hex6
1626    cycles for 100 * FastHex

Grincheux

JJ, you surprise me; my code is very, very slow. I will rewrite it. How do you count the cycles? It is very interesting.

jj2007

Quote from: Grincheux on December 07, 2015, 01:36:26 PM
JJ, you surprise me; my code is very, very slow. I will rewrite it. How do you count the cycles? It is very interesting.

Philippe,

Your solution is not slow: it's on par with the standard Masm32 algo, and roughly 12 times faster than the C runtime library.
Bin2Hex is faster still because it's a table-based solution.

Regarding cycles: it's complicated but public. Open the source in WordPad or RichMasm and search for counter_begin, a macro defined in:
include \masm32\macros\timers.asm      ; download from the Masm32 Laboratory

dedndave

This is the template I use with Michael's timing routines:

;###############################################################################################

        .XCREF
        .NoList
        INCLUDE    \Masm32\Include\Masm32rt.inc
        .686p
        .MMX
        .XMM
        INCLUDE    \Masm32\Macros\Timers.asm
        .List

;###############################################################################################

Loop_Count = 10000         ;adjust so that each pass is roughly 0.5 seconds or longer

;###############################################################################################

        .DATA

;***********************************************************************************************

        .DATA?

;###############################################################################################

        .CODE

;***********************************************************************************************

main    PROC

        INVOKE  GetCurrentProcess
        INVOKE  SetProcessAffinityMask,eax,1
        INVOKE  Sleep,750

        mov     ecx,5

Loop00: push    ecx

        counter_begin Loop_Count,HIGH_PRIORITY_CLASS

;code to time goes here

        counter_end

        print   str$(eax),32
        pop     ecx
        dec     ecx
        jnz     Loop00

        print   chr$(13,10)
        inkey
        INVOKE  ExitProcess,0

main    ENDP

;###############################################################################################

        END     main
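
The template boils down to: time a loop, time the same loop empty, subtract, divide by the loop count. Below is a rough C model of that arithmetic only. It is an assumption about what counter_begin / counter_end do internally, not a transcription of timers.asm. The names tick_fn, net_ticks_per_call, sim_now and sim_work are invented for this sketch, and the simulated counter merely stands in for a real cycle counter so that the example gives the same answer on every run.

```c
#include <stdint.h>

typedef uint64_t (*tick_fn)(void);   /* hypothetical tick source */
typedef void (*work_fn)(void);       /* code under test */

static void empty_work(void) {}

/* Ticks elapsed while calling `work` `count` times. */
static uint64_t time_loop(tick_fn now, work_fn work, uint32_t count)
{
    uint64_t start = now();
    for (uint32_t i = 0; i < count; i++)
        work();
    return now() - start;
}

/* Net ticks per call: timed loop minus an empty reference loop. */
uint64_t net_ticks_per_call(tick_fn now, work_fn work, uint32_t count)
{
    if (count == 0)
        return 0;
    uint64_t overhead = time_loop(now, empty_work, count);
    uint64_t total    = time_loop(now, work, count);
    uint64_t net      = total > overhead ? total - overhead : 0;
    return net / count;
}

/* Simulated counter: reading it costs 1 tick, sim_work costs 7 ticks. */
static uint64_t sim_ticks;
uint64_t sim_now(void)  { return sim_ticks++; }
void     sim_work(void) { sim_ticks += 7; }
```

With the simulated counter, net_ticks_per_call(sim_now, sim_work, 100) comes out as 7, because the one tick spent reading the counter shows up in both loops and cancels. On real hardware the result jitters, which is why the template pins the process to one core, raises the priority and prints five passes.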

hutch--

See if this has got any legs.

In the .DATA section.

    .data
    align 16
      hex_table2 \
        db "000102030405060708090A0B0C0D0E0F"
        db "101112131415161718191A1B1C1D1E1F"
        db "202122232425262728292A2B2C2D2E2F"
        db "303132333435363738393A3B3C3D3E3F"
        db "404142434445464748494A4B4C4D4E4F"
        db "505152535455565758595A5B5C5D5E5F"
        db "606162636465666768696A6B6C6D6E6F"
        db "707172737475767778797A7B7C7D7E7F"
        db "808182838485868788898A8B8C8D8E8F"
        db "909192939495969798999A9B9C9D9E9F"
        db "A0A1A2A3A4A5A6A7A8A9AAABACADAEAF"
        db "B0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF"
        db "C0C1C2C3C4C5C6C7C8C9CACBCCCDCECF"
        db "D0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF"
        db "E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEF"
        db "F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF"


The algorithm.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

utoh proc valu:DWORD,pbuf:DWORD

  ; -----------------------------------------------
  ; convert an unsigned DWORD value to a hex string
  ; -----------------------------------------------

    push ebx

    mov eax, [esp+8][4]                     ; pbuf
    mov ebx, [esp+4][4]                     ; valu

    movzx ecx, bh                           ; 3rd byte
    mov dx, WORD PTR [ecx*2+hex_table2]
    mov [eax+4], dx

    movzx ecx, bl                           ; 4th byte
    mov dx, WORD PTR [ecx*2+hex_table2]
    mov [eax+6], dx

    bswap ebx

    movzx ecx, bl                           ; 1st byte
    mov dx, WORD PTR [ecx*2+hex_table2]
    mov [eax], dx

    movzx ecx, bh                           ; 2nd byte
    mov dx, WORD PTR [ecx*2+hex_table2]
    mov [eax+2], dx

    mov DWORD PTR [eax+8], 0                ; terminate buffer

    pop ebx

    ret 8

utoh endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤


Note that the buffer must be at least 12 bytes long: 8 hex digits plus the 4 zero bytes written by the final DWORD store.
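
A hedged C model of the table lookup, for checking the byte order and the buffer size. The names hex_pairs, init_hex_pairs and utoh_model are invented here. The sketch fills the table in a loop to stay short, whereas the post writes it out as data, and it uses shifts where the assembler version uses bl/bh and bswap.

```c
#include <stdint.h>
#include <string.h>

/* 256 two-character entries, "00" .. "FF", same layout as hex_table2. */
static char hex_pairs[512];

/* Fill the table once before the first conversion. */
void init_hex_pairs(void)
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 0; i < 256; i++) {
        hex_pairs[2 * i]     = digits[i >> 4];
        hex_pairs[2 * i + 1] = digits[i & 15];
    }
}

/* One lookup per byte; each lookup yields two characters.
   buf needs 12 bytes, mirroring the DWORD of zeros stored at offset 8. */
char *utoh_model(uint32_t value, char *buf)
{
    memcpy(buf + 0, &hex_pairs[2 * ((value >> 24) & 0xFF)], 2);  /* 1st byte */
    memcpy(buf + 2, &hex_pairs[2 * ((value >> 16) & 0xFF)], 2);  /* 2nd byte */
    memcpy(buf + 4, &hex_pairs[2 * ((value >>  8) & 0xFF)], 2);  /* 3rd byte */
    memcpy(buf + 6, &hex_pairs[2 * ( value        & 0xFF)], 2);  /* 4th byte */
    memset(buf + 8, 0, 4);                                       /* terminator */
    return buf;
}
```

After init_hex_pairs(), calling utoh_model(0x12345678, buf) with a 12-byte buf leaves "12345678" in it. There are four lookups and no compares or branches, which is the whole point of the table.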

jj2007

Quote from: hutch-- on December 08, 2015, 09:34:21 AM
See if this has got any legs.

Quite an improvement on the standard Masm32 algo, and roughly 43 times faster than the CRT ;-)

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
4342    cycles for 100 * dw2hex
1230    cycles for 100 * utoh (Hutch)
52286   cycles for 100 * CRT sprintf
667     cycles for 100 * Bin2Hex
764     cycles for 100 * Bin2Hex2 cx
765     cycles for 100 * Bin2Hex3 ecx
1645    cycles for 100 * Bin2Hex6
1716    cycles for 100 * FastHex

4121    cycles for 100 * dw2hex
1233    cycles for 100 * utoh (Hutch)
52256   cycles for 100 * CRT sprintf
665     cycles for 100 * Bin2Hex
766     cycles for 100 * Bin2Hex2 cx
763     cycles for 100 * Bin2Hex3 ecx
1661    cycles for 100 * Bin2Hex6
2029    cycles for 100 * FastHex

4122    cycles for 100 * dw2hex
1221    cycles for 100 * utoh (Hutch)
52273   cycles for 100 * CRT sprintf
671     cycles for 100 * Bin2Hex
766     cycles for 100 * Bin2Hex2 cx
768     cycles for 100 * Bin2Hex3 ecx
1655    cycles for 100 * Bin2Hex6
1848    cycles for 100 * FastHex

hutch--


Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz (SSE4)

3717    cycles for 100 * dw2hex
950     cycles for 100 * utoh (Hutch)
60822   cycles for 100 * CRT sprintf
659     cycles for 100 * Bin2Hex
659     cycles for 100 * Bin2Hex2 cx
659     cycles for 100 * Bin2Hex3 ecx
1515    cycles for 100 * Bin2Hex6
1621    cycles for 100 * FastHex

3716    cycles for 100 * dw2hex
950     cycles for 100 * utoh (Hutch)
60830   cycles for 100 * CRT sprintf
658     cycles for 100 * Bin2Hex
659     cycles for 100 * Bin2Hex2 cx
659     cycles for 100 * Bin2Hex3 ecx
1515    cycles for 100 * Bin2Hex6
1621    cycles for 100 * FastHex

3716    cycles for 100 * dw2hex
950     cycles for 100 * utoh (Hutch)
60818   cycles for 100 * CRT sprintf
658     cycles for 100 * Bin2Hex
659     cycles for 100 * Bin2Hex2 cx
659     cycles for 100 * Bin2Hex3 ecx
1514    cycles for 100 * Bin2Hex6
1621    cycles for 100 * FastHex

00345678        = eax dw2hex
12345678        = eax utoh (Hutch)
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
00345678        = eax Bin2Hex3 ecx
12345678        = eax Bin2Hex6
012345678       = eax FastHex

Siekmanski

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

3855    cycles for 100 * dw2hex
1098    cycles for 100 * utoh (Hutch)
50477   cycles for 100 * CRT sprintf
665     cycles for 100 * Bin2Hex
760     cycles for 100 * Bin2Hex2 cx
767     cycles for 100 * Bin2Hex3 ecx
1550    cycles for 100 * Bin2Hex6
1859    cycles for 100 * FastHex

3848    cycles for 100 * dw2hex
1113    cycles for 100 * utoh (Hutch)
50596   cycles for 100 * CRT sprintf
661     cycles for 100 * Bin2Hex
763     cycles for 100 * Bin2Hex2 cx
759     cycles for 100 * Bin2Hex3 ecx
1557    cycles for 100 * Bin2Hex6
1762    cycles for 100 * FastHex

3849    cycles for 100 * dw2hex
1111    cycles for 100 * utoh (Hutch)
50491   cycles for 100 * CRT sprintf
661     cycles for 100 * Bin2Hex
760     cycles for 100 * Bin2Hex2 cx
766     cycles for 100 * Bin2Hex3 ecx
1536    cycles for 100 * Bin2Hex6
1962    cycles for 100 * FastHex

00345678        = eax dw2hex
12345678        = eax utoh (Hutch)
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
00345678        = eax Bin2Hex3 ecx
12345678        = eax Bin2Hex6
012345678       = eax FastHex

--- ok ---
Creative coders use backward thinking techniques as a strategy.

hutch--

There was not much to tweak in this algo, but I reused ECX for both the table index and the loaded digit pair, which freed up EDX to hold the value and removed the push / pop of EBX. With a fall-through algo like this the times did improve. I don't have the right stuff set up to use JJ's testbed, so I could not test it against the other reference algos.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

align 16

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

utoh proc valu:DWORD,pbuf:DWORD

  ; -----------------------------------------------
  ; convert an unsigned DWORD value to a hex string
  ; the "pbuf" address is also returned in EAX
  ; -----------------------------------------------

    mov edx, [esp+4]                        ; valu
    mov eax, [esp+8]                        ; pbuf

    movzx ecx, dl                           ; 4th byte
    mov cx, WORD PTR [ecx*2+hex_table2]
    mov [eax+6], cx

    movzx ecx, dh                           ; 3rd byte
    mov cx, WORD PTR [ecx*2+hex_table2]
    mov [eax+4], cx

    bswap edx

    movzx ecx, dl                           ; 1st byte
    mov cx, WORD PTR [ecx*2+hex_table2]
    mov [eax], cx

    movzx ecx, dh                           ; 2nd byte
    mov cx, WORD PTR [ecx*2+hex_table2]
    mov [eax+2], cx

    mov DWORD PTR [eax+8], 0                ; terminate buffer

    ret 8

utoh endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

jj2007

utoh (Hutch): 1183 -> 1061 cycles
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
4119    cycles for 100 * dw2hex
1061    cycles for 100 * utoh (Hutch)
52100   cycles for 100 * CRT sprintf
659     cycles for 100 * Bin2Hex
758     cycles for 100 * Bin2Hex2 cx
756     cycles for 100 * Bin2Hex3 ecx
1646    cycles for 100 * Bin2Hex6
1750    cycles for 100 * FastHex

hutch--

I did a couple of other variants, put them in a rough benchmark and zipped it. The FASTCALL variant is clearly faster than the others as it's free of stack overhead.


-------------
test accuracy
-------------
12345678 FASTCALL
12345678 NO STACK FRAME
12345678 NO STACK FRAME VARIANT
12345678 WITH STACK FRAME
------------
algo timings
------------
358 FASTCALL
577 NO STACK FRAME
608 NO STACK FRAME VARIANT
639 WITH STACK FRAME
Press any key to continue ...

jj2007

Quote from: hutch-- on December 09, 2015, 09:08:03 AM
The FASTCALL variant is clearly faster than the others as it's free of stack overhead.

Included:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

4114    cycles for 100 * dw2hex
670     cycles for 100 * utoh (Hutch)
52229   cycles for 100 * CRT sprintf
661     cycles for 100 * Bin2Hex
761     cycles for 100 * Bin2Hex2 cx
759     cycles for 100 * Bin2Hex3 ecx
1650    cycles for 100 * Bin2Hex6
1674    cycles for 100 * FastHex


Bin2Hex preserves ecx (MasmBasic ABI ;)), but that has no influence on the timings.

For comparison:

Hutch:
  ; value to convert passed in ECX
  ; output buffer address passed in EAX
  ; -----------------------------------

    movzx edx, cl                           ; 4th byte
    mov dx, WORD PTR [edx*2+hex_table2]
    mov [eax+6], dx

    movzx edx, ch                           ; 3rd byte
    mov dx, WORD PTR [edx*2+hex_table2]
    mov [eax+4], dx

    bswap ecx

    movzx edx, cl                           ; 1st byte
    mov dx, WORD PTR [edx*2+hex_table2]
    mov [eax], dx

    movzx edx, ch                           ; 2nd byte
    mov dx, WORD PTR [edx*2+hex_table2]
    mov [eax+2], dx

    mov DWORD PTR [eax+8], 0                ; terminate buffer

    ret



JJ:
Bin2Hex proc    ; value passed in eax, pBuffer returned in eax
mov edx, offset xHex
cmp dword ptr [edx], 0
je crtHexTable
push ecx
movzx ecx, al
movzx ecx, word ptr [edx+ecx*2]
mov [edx+512+6], cx
movzx ecx, ah
movzx ecx, word ptr [edx+ecx*2]
mov [edx+512+4], cx
shr eax, 16
movzx ecx, al
movzx ecx, word ptr [edx+ecx*2]
mov [edx+512+2], cx
movzx ecx, ah
movzx ecx, word ptr [edx+ecx*2]
mov [edx+512+0], cx
lea eax, [edx+512] ; return pBuffer
pop ecx
ret


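Hutch's version needs 600 bytes, mine 138. The reason is simply that mine creates the table at run time, on the first call (the crtHexTable branch), instead of storing all 512 bytes in the executable. Note that the currently attached version does not require MasmBasic.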
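
The size difference can be sketched in C as a table built on first use. This is only a model under stated assumptions: it assumes xHex is zero-initialised storage with the output buffer placed 512 bytes in, as the [edx+512] stores suggest. The names x_hex, build_table and bin2hex_model are invented for the example.

```c
#include <stdint.h>
#include <string.h>

/* Zero-initialised storage: 512 bytes of table followed by the output
   buffer. Uninitialised statics typically take no space in the file. */
static char x_hex[512 + 12];

static void build_table(void)
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 0; i < 256; i++) {
        x_hex[2 * i]     = digits[i >> 4];
        x_hex[2 * i + 1] = digits[i & 15];
    }
}

/* Returns a pointer to an internal buffer, overwritten by the next call. */
const char *bin2hex_model(uint32_t value)
{
    if (x_hex[0] == 0)              /* first call: table not built yet */
        build_table();
    char *out = x_hex + 512;
    for (int i = 0; i < 4; i++) {
        unsigned byte = (value >> (24 - 8 * i)) & 0xFF;
        memcpy(out + 2 * i, x_hex + 2 * byte, 2);
    }
    return out;                     /* bytes 8..11 stay zero: terminator */
}
```

The trade-off is one extra compare per call and a result that lives in a shared internal buffer, so it is not reentrant, in exchange for keeping the 512 table bytes out of the executable.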

hutch--

Thanks for adding that, I had no way to compare these algos to the ones in your benchmark.


Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz (SSE4)

3721    cycles for 100 * dw2hex
569     cycles for 100 * utoh (Hutch)
60842   cycles for 100 * CRT sprintf
667     cycles for 100 * Bin2Hex
690     cycles for 100 * Bin2Hex2 cx
665     cycles for 100 * Bin2Hex3 ecx
1543    cycles for 100 * Bin2Hex6
1629    cycles for 100 * FastHex

3722    cycles for 100 * dw2hex
569     cycles for 100 * utoh (Hutch)
61080   cycles for 100 * CRT sprintf
664     cycles for 100 * Bin2Hex
662     cycles for 100 * Bin2Hex2 cx
665     cycles for 100 * Bin2Hex3 ecx
1521    cycles for 100 * Bin2Hex6
1627    cycles for 100 * FastHex

3721    cycles for 100 * dw2hex
569     cycles for 100 * utoh (Hutch)
60834   cycles for 100 * CRT sprintf
664     cycles for 100 * Bin2Hex
665     cycles for 100 * Bin2Hex2 cx
665     cycles for 100 * Bin2Hex3 ecx
1521    cycles for 100 * Bin2Hex6
1627    cycles for 100 * FastHex

20      bytes for dw2hex
600     bytes for utoh (Hutch)
29      bytes for CRT sprintf
138     bytes for Bin2Hex
150     bytes for Bin2Hex2 cx
214     bytes for Bin2Hex3 ecx
616     bytes for Bin2Hex6
66      bytes for FastHex

00345678        = eax dw2hex
12345678        = eax utoh (Hutch)
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
00345678        = eax Bin2Hex3 ecx
12345678        = eax Bin2Hex6
012345678       = eax FastHex

--- ok ---

TWell

AMD Athlon(tm) II X2 220 Processor (SSE3) 2.8 GHz

8781    cycles for 100 * dw2hex
797     cycles for 100 * utoh (Hutch)
78313   cycles for 100 * CRT sprintf
903     cycles for 100 * Bin2Hex
903     cycles for 100 * Bin2Hex2 cx
933     cycles for 100 * Bin2Hex3 ecx
4241    cycles for 100 * Bin2Hex6
2025    cycles for 100 * FastHex

8676    cycles for 100 * dw2hex
797     cycles for 100 * utoh (Hutch)
78244   cycles for 100 * CRT sprintf
908     cycles for 100 * Bin2Hex
922     cycles for 100 * Bin2Hex2 cx
919     cycles for 100 * Bin2Hex3 ecx
4227    cycles for 100 * Bin2Hex6
2034    cycles for 100 * FastHex

8712    cycles for 100 * dw2hex
810     cycles for 100 * utoh (Hutch)
78209   cycles for 100 * CRT sprintf
991     cycles for 100 * Bin2Hex
916     cycles for 100 * Bin2Hex2 cx
925     cycles for 100 * Bin2Hex3 ecx
4480    cycles for 100 * Bin2Hex6
2191    cycles for 100 * FastHex