The MASM Forum

General => The Laboratory => Topic started by: guga on November 27, 2015, 11:16:24 PM

Title: Fast DwordtoHex ?
Post by: guga on November 27, 2015, 11:16:24 PM
Hi guys, I'm facing a small problem.

I need a fast dword-to-hex algorithm that produces short strings with exactly one leading zero.
Example:
Input AAB12F
Output string: 0AAB12F

Input AA
Output string: 0AA

Input FFFFFFFF
Output string: 0FFFFFFFF

etc.

I'm using Biterider's (http://www.asmcommunity.net/forums/topic/?id=17763&page=2) optimized algorithm (no loops):


Proc dwtohexEx:
    Arguments @dNumber, @pBuffer
    Uses edx, ecx, eax, edi

    mov edx D@dNumber
    mov ecx edx
    shr edx 4
    and edx 0F0F0F0F
    and ecx 0F0F0F0F

    mov eax edx
    mov edi ecx

    add edx (080808080 - 0A0A0A0A) ; Build mask to discern digit > 9
    add ecx (080808080 - 0A0A0A0A)
    shr edx 4
    shr ecx 4
    not edx
    not ecx
    and edx 07070707 ; Mask digit > 9 ... mask = 0111
    and ecx 07070707
    add edx eax ; Add 'A' - '9' - 1 (= 7) if digit > 9
    add ecx edi
    add edx 030303030 ; Add ascii '0'
    add ecx 030303030

    mov edi D@pBuffer ; Using edi is faster
    mov B$edi+7 cl
    mov B$edi+6 dl
    mov B$edi+5 ch
    mov B$edi+4 dh
    shr ecx 16
    shr edx 16
    mov B$edi+3 cl
    mov B$edi+2 dl
    mov B$edi+1 ch
    mov B$edi+0 dh
    mov B$edi+8 0

EndP


But the result is always a fixed 8-character string (e.g. input 012A, output 0000012A) instead of the short form I need. Does anyone know a fast algorithm that can do this conversion, as in the examples I posted (input 12A, output 012A)?
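For readers less familiar with RosAsm, here is a C model of the branchless nibble conversion above. It is a sketch, and the function name `dword_to_hex8` is hypothetical, not part of Biterider's code. It also shows why the output is always 8 characters: all eight nibbles are converted unconditionally, and nothing tests for leading zeros.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the branchless (SWAR) conversion in dwtohexEx.
   buf must hold at least 9 bytes; the result is always 8 hex digits. */
void dword_to_hex8(uint32_t n, char *buf)
{
    uint32_t hi = (n >> 4) & 0x0F0F0F0Fu;  /* high nibble of each byte */
    uint32_t lo = n & 0x0F0F0F0Fu;         /* low nibble of each byte  */

    /* Adding 0x76 (0x80 - 0x0A) sets bits 4-6 of a byte to 111 when the
       nibble is 0..9 and to 000 when it is 10..15; no carry crosses bytes.
       Shift, invert and mask: 7 where the digit is > 9, else 0. */
    uint32_t mhi = ~((hi + 0x76767676u) >> 4) & 0x07070707u;
    uint32_t mlo = ~((lo + 0x76767676u) >> 4) & 0x07070707u;

    /* 7 = 'A' - '9' - 1 bridges the gap between '9' and 'A'; then add '0'. */
    hi += mhi + 0x30303030u;
    lo += mlo + 0x30303030u;

    for (int i = 0; i < 4; i++) {          /* byte i of n -> chars 6-2i, 7-2i */
        buf[6 - 2 * i] = (char)((hi >> (8 * i)) & 0xFF);
        buf[7 - 2 * i] = (char)((lo >> (8 * i)) & 0xFF);
    }
    buf[8] = '\0';
}
```

With input 012Ah this yields "0000012A", which is the fixed-width behaviour described above.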
Title: Re: Fast DwordtoHex ?
Post by: jj2007 on November 28, 2015, 01:35:33 AM
Just use a buffer for the biggest possible string length plus 1.
- call the algo
- check the bytes at the start until you find a non-"0"
- put "0" before that byte, and use this as the start address.
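A minimal C sketch of this approach follows. It assumes a 10-byte buffer: one spare byte in front for the extra '0', eight digits, and the terminator. The name `hex_short_trim` is hypothetical. The fixed-width step is a plain nibble loop standing in for whatever fixed-width converter you already use. Zero keeps one digit, so it comes out as "00".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of "convert, then trim": write all 8 digits at buf+1,
   skip leading '0' digits, then put a single '0' just before the first
   significant digit. buf must hold at least 10 bytes.
   Returns a pointer into buf where the short string starts. */
char *hex_short_trim(uint32_t n, char *buf)
{
    static const char digits[] = "0123456789ABCDEF";
    char *p = buf + 1;                       /* leave room for the extra '0' */

    for (int i = 7; i >= 0; i--) {           /* fixed-width conversion */
        p[i] = digits[n & 0xF];
        n >>= 4;
    }
    p[8] = '\0';

    while (*p == '0' && p[1] != '\0')        /* skip zeros, keep >= 1 digit */
        p++;
    *--p = '0';                              /* prepend the single leading zero */
    return p;
}
```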
Title: Re: Fast DwordtoHex ?
Post by: fearless on November 28, 2015, 03:22:21 AM
Haven't tested it, and it could probably be optimized, but based on JJ's suggestion this is how I'd format the hex string. Hopefully that helps.

FormatHexString PROTO :DWORD, :DWORD

;-------------------------------------------------------------------------------
; Formats a hex string so that it starts with exactly one zero
; Input: lpszHexString
; Output: lpszFormattedString
;
; make sure the buffer pointed to by lpszFormattedString is large enough
; (length of lpszHexString + 2 bytes is always sufficient)
;
; Example: Invoke FormatHexString, Addr szHEX, Addr szMyNewHexString
;
;-------------------------------------------------------------------------------
FormatHexString PROC USES EDI ESI lpszHexString:DWORD, lpszFormattedString:DWORD
    LOCAL Position:DWORD
    LOCAL FlagFoundHexChars:DWORD
    LOCAL nMaxLen:DWORD ; length of the input string, filled in by szLen
   
    Invoke szLen, lpszHexString
    mov nMaxLen, eax
   
    mov edi, lpszFormattedString
    mov esi, lpszHexString
   
    mov byte ptr [edi], '0' ; start our formatted string with an ASCII zero character
    inc edi

    mov FlagFoundHexChars, FALSE ; set flag to false initially
    mov Position, 0
    mov eax, 0
    .WHILE eax < nMaxLen
       
        .IF FlagFoundHexChars == FALSE
            movzx eax, byte ptr [esi]
            .IF al != '0' ; ascii zero
                mov FlagFoundHexChars, TRUE ; looks like we found some ascii chars
                mov byte ptr [edi], al ; so start storing the first of them into our formatted string (edi)
                inc edi ; position for next char when we loop next and we branch to next bit below till end of string
            .ENDIF
        .ELSE ; we have a flag set, so we fetch rest of characters in string till we hit end of string or a null char
            movzx eax, byte ptr [esi]
            .IF al != 0 ; null
                mov byte ptr [edi], al ; start storing next byte in formatted string (edi)
                inc edi ; position for next char when we loop next
            .ELSE
                .BREAK ; break if null found
            .ENDIF
        .ENDIF

        inc esi
        inc Position
        mov eax, Position
    .ENDW
   
    mov byte ptr [edi], 0 ; final null of formatted string

    ret
   
FormatHexString ENDP
Title: Re: Fast DwordtoHex ?
Post by: dedndave on November 28, 2015, 04:13:34 AM
adding the 0 is simple enough
for converting to hex, a look-up table is likely the fastest
i would think a 512-byte table would work well
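Here is a C sketch of the 512-byte table. Entry b holds the two ASCII digits of byte value b, so a dword needs only four lookups and no per-nibble arithmetic. The names `hex_pairs`, `init_hex_pairs` and `dword_to_hex8_table` are hypothetical. The table is filled at start-up here rather than written out as a string literal.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 512-byte table: hex_pairs[2*b], hex_pairs[2*b+1] are the
   two ASCII hex digits of byte value b (0..255). */
static char hex_pairs[512];

static void init_hex_pairs(void)
{
    static const char digits[] = "0123456789ABCDEF";
    for (int b = 0; b < 256; b++) {
        hex_pairs[2 * b]     = digits[b >> 4];
        hex_pairs[2 * b + 1] = digits[b & 0xF];
    }
}

/* Fixed-width conversion: 8 digits, most significant byte first.
   buf must hold at least 9 bytes; call init_hex_pairs() once beforehand. */
void dword_to_hex8_table(uint32_t n, char *buf)
{
    for (int i = 0; i < 4; i++) {
        unsigned b = (n >> (24 - 8 * i)) & 0xFF;   /* byte 3 down to byte 0 */
        memcpy(buf + 2 * i, &hex_pairs[2 * b], 2); /* copy one digit pair */
    }
    buf[8] = '\0';
}
```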

if speed isn't that important, but you want UNICODE aware....

awDw2Hex PROC USES EDI dwVal:DWORD,lpBuf:LPSTR

;UNICODE aware Dword to Hex - DednDave, 3-2013

;  Returns: EAX = pointer to string buffer
;           ECX = length in characters (8)
;           EDX = original binary value

;the buffer must be large enough for at least 9 TCHAR's (includes null terminator)

;--------------------------------------------

    mov     ecx,8
    mov     edx,dwVal
    mov     edi,lpBuf
    push    ecx
    push    edx
    push    edi
    IFDEF __UNICODE__
        add     edi,16
        mov word ptr [edi],0
    ELSE
        add     edi,ecx
        mov byte ptr [edi],0
    ENDIF
    .repeat
        mov     eax,edx
        IFDEF __UNICODE__
            sub     edi,2
        ELSE
            dec     edi
        ENDIF
        and     eax,0Fh
        shr     edx,4
        cmp     al,0Ah
        sbb     al,69h
        das
        dec     ecx
        IFDEF __UNICODE__
            mov     [edi],ax
        ELSE
            mov     [edi],al
        ENDIF
    .until ZERO?
    pop     eax
    pop     edx
    pop     ecx
    ret

awDw2Hex ENDP


just modify that to add a 0, as required
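Here is a C sketch of one such modification, assuming the short form is wanted; the routine above always writes 8 digits. The name `dword_to_hex_short` is hypothetical. The ternary expression is the portable equivalent of the `cmp al,0Ah / sbb al,69h / das` sequence, which maps 0..15 to '0'..'9','A'..'F'.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variation of the right-to-left nibble loop: digits are
   produced least significant first, the loop stops once no significant
   digits remain, and a single '0' is prepended. out must hold >= 10 bytes.
   Returns the string length (without the terminator). */
size_t dword_to_hex_short(uint32_t n, char *out)
{
    char tmp[10];
    char *p = tmp + sizeof tmp;
    *--p = '\0';
    do {
        unsigned nib = n & 0xF;
        /* same mapping as cmp al,0Ah / sbb al,69h / das */
        *--p = (char)(nib < 10 ? '0' + nib : 'A' + nib - 10);
        n >>= 4;
    } while (n != 0);
    *--p = '0';                                   /* the single leading zero */
    size_t len = (size_t)(tmp + sizeof tmp - 1 - p);
    memcpy(out, p, len + 1);                      /* left-justify, incl. NUL */
    return len;
}
```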
Title: Re: Fast DwordtoHex ?
Post by: guga on November 28, 2015, 08:08:17 AM
Thanks, guys...

I'll give it a try. The most important thing for me is speed (that's why I used Biterider's algo), but I need to create a "short" output, not the whole 8-character string.

Perhaps using BSWAP at the beginning and then doing what JJ suggested?

I'll give it a try and test all of the algos to check their speed.
Title: Re: Fast DwordtoHex ?
Post by: jj2007 on November 28, 2015, 08:30:31 AM
Check
\Masm32\m32lib\dw2hex.asm
\Masm32\m32lib\dw2h_ex.asm

P.S.: I have hacked together an algo using a table, here are some results.
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

4144    cycles for 100 * dw2hex
7838    cycles for 100 * MB Hex$
51698   cycles for 100 * CRT sprintf
1066    cycles for 100 * Bin2Hex
765     cycles for 100 * Bin2Hex2

4143    cycles for 100 * dw2hex
7854    cycles for 100 * MB Hex$
51713   cycles for 100 * CRT sprintf
1066    cycles for 100 * Bin2Hex
765     cycles for 100 * Bin2Hex2

4141    cycles for 100 * dw2hex
7848    cycles for 100 * MB Hex$
51750   cycles for 100 * CRT sprintf
1066    cycles for 100 * Bin2Hex
765     cycles for 100 * Bin2Hex2

4145    cycles for 100 * dw2hex
7884    cycles for 100 * MB Hex$
51708   cycles for 100 * CRT sprintf
1065    cycles for 100 * Bin2Hex
764     cycles for 100 * Bin2Hex2

20      bytes for dw2hex
17      bytes for MB Hex$
29      bytes for CRT sprintf
225     bytes for Bin2Hex
150     bytes for Bin2Hex2

00345678        = eax dw2hex
00345678        = eax MB Hex$
345678  = eax CRT sprintf
345678  = eax Bin2Hex
00345678        = eax Bin2Hex2


As you can see, both CRT sprintf and the first variant of my algo can handle the short form.
Title: Re: Fast DwordtoHex ?
Post by: guga on November 29, 2015, 04:08:14 AM
Hi JJ

I analysed it and I'm trying to gain a bit more speed over dw2hex.asm and dw2h_ex.asm.

In my tests I made a faster variation of dw2hex using a fixed table (instead of computing it as in your example).

Can you please test it to check whether it is really that fast? In my tests it takes about half the time of dw2hex and is 10-18% faster than your Bin2Hex2.

The variation I made was:

[hex_table: B$ "000102030405060708090A0B0C0D0E0F"
            B$ "101112131415161718191A1B1C1D1E1F"
            B$ "202122232425262728292A2B2C2D2E2F"
            B$ "303132333435363738393A3B3C3D3E3F"
            B$ "404142434445464748494A4B4C4D4E4F"
            B$ "505152535455565758595A5B5C5D5E5F"
            B$ "606162636465666768696A6B6C6D6E6F"
            B$ "707172737475767778797A7B7C7D7E7F"
            B$ "808182838485868788898A8B8C8D8E8F"
            B$ "909192939495969798999A9B9C9D9E9F"
            B$ "A0A1A2A3A4A5A6A7A8A9AAABACADAEAF"
            B$ "B0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF"
            B$ "C0C1C2C3C4C5C6C7C8C9CACBCCCDCECF"
            B$ "D0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF"
            B$ "E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEF"
            B$ "F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF", 0]

Proc Bin2Hex6:
    Arguments @Input, @Output
    Local @DwordStorage
    Uses eax, edi ; preserves eax and edi on output

    mov eax D@Input
    mov edi D@Output

    mov D@DwordStorage eax
    movzx eax B@DwordStorage+3 | mov ax W$hex_table+eax*2 | stosw ; | mov W$edi ax. Using stosw is faster than mov W$ on an i7
    movzx eax B@DwordStorage+2 | mov ax W$hex_table+eax*2 | stosw ; | mov W$edi+2 ax. Using stosw is faster than mov W$ on an i7
    movzx eax B@DwordStorage+1 | mov ax W$hex_table+eax*2 | stosw ; | mov W$edi+4 ax. Using stosw is faster than mov W$ on an i7
    movzx eax B@DwordStorage+0 | mov ax W$hex_table+eax*2 | stosw ; | mov W$edi+6 ax. Using stosw is faster than mov W$ on an i7
    mov B$edi 0

EndP


The above version does not produce the shorter string. I'm trying to speed it up first, because if I use the "repeat+until" macros the code will slow down due to the loop.

I'll try replacing the loop with TEST instructions to see if that speeds it up a bit.
Title: Re: Fast DwordtoHex ?
Post by: jj2007 on November 29, 2015, 04:21:01 AM
Hi Guga,

Of course, my algo uses the same identical table. If you have an algo that uses this table, please post it (in Masm syntax), and I will add it to the testbed.

Btw if Repeat ... Until loops slow down your code, check what RosAsm generates. In Masm, the macro produces the fastest possible version.
Title: Re: Fast DwordtoHex ?
Post by: dedndave on November 29, 2015, 05:24:59 AM
i haven't looked at the current algorithms

but - if you can write the routine so the address is returned in EAX,
rather than left-justifying the string in a fixed buffer, it should help speed it up
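Here is a C sketch of this idea, with the hypothetical name `dword_to_hex_ptr`. Digits are written right-aligned at the end of the buffer and the start address is returned, so no copy or shift is needed. The trade-off is that the string does not begin at `buf`. A caller that needs it there, for example to append it to existing text, has to `memmove` it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: fill the buffer from the right and return a pointer
   to the first character instead of left-justifying the result.
   buf must hold at least 10 bytes. */
char *dword_to_hex_ptr(uint32_t n, char *buf)
{
    static const char digits[] = "0123456789ABCDEF";
    char *p = buf + 9;
    *p = '\0';
    do {
        *--p = digits[n & 0xF];   /* least significant digit first */
        n >>= 4;
    } while (n != 0);
    *--p = '0';                   /* single leading zero; at most 8 digits, so p >= buf */
    return p;
}
```

Usage: `char b[10]; char *s = dword_to_hex_ptr(0x1000, b);` leaves `s == b + 4`, pointing at "01000", with the first four bytes of `b` untouched.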
Title: Re: Fast DwordtoHex ?
Post by: dedndave on November 29, 2015, 05:45:56 AM
do you want a leading 0 when the first character is less than A ?
Title: Re: Fast DwordtoHex ?
Post by: guga on November 29, 2015, 06:28:08 AM
Hi Dave, yes. I need a leading 0 on every value, whatever its first digit (0 to F). Ex: 00, 01234A, 023, 0FFFF, etc.
Title: Re: Fast DwordtoHex ?
Post by: guga on November 29, 2015, 06:31:07 AM
JJ, I'll try porting it to MASM syntax so you can test it.
Title: Re: Fast DwordtoHex ?
Post by: guga on November 29, 2015, 07:19:34 AM
Hi JJ, here is the MASM syntax:



hex_table       db '000102030405060708090A0B0C0D0E0F'
                db '101112131415161718191A1B1C1D1E1F'
                db '202122232425262728292A2B2C2D2E2F'
                db '303132333435363738393A3B3C3D3E3F'
                db '404142434445464748494A4B4C4D4E4F'
                db '505152535455565758595A5B5C5D5E5F'
                db '606162636465666768696A6B6C6D6E6F'
                db '707172737475767778797A7B7C7D7E7F'
                db '808182838485868788898A8B8C8D8E8F'
                db '909192939495969798999A9B9C9D9E9F'
                db 'A0A1A2A3A4A5A6A7A8A9AAABACADAEAF'
                db 'B0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF'
                db 'C0C1C2C3C4C5C6C7C8C9CACBCCCDCECF'
                db 'D0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF'
                db 'E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEF'
                db 'F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF',0

; =============== S U B R O U T I N E =======================================

; Attributes: bp-based frame

Bin2Hex6 proc near

DwordStorage = dword ptr -4
Input = dword ptr  8
Output = dword ptr  0Ch

push ebp
mov ebp, esp
sub esp, 4
push eax
push edi
mov eax, [ebp+Input]
mov edi, [ebp+Output]
mov [ebp+DwordStorage], eax
movzx eax, byte ptr [ebp+DwordStorage+3]
mov ax, word ptr hex_table[eax*2]
stosw
movzx eax, byte ptr [ebp+DwordStorage+2]
mov ax, word ptr hex_table[eax*2]
stosw
movzx eax, byte ptr [ebp+DwordStorage+1]
mov ax, word ptr hex_table[eax*2]
stosw
movzx eax, byte ptr [ebp+DwordStorage]
mov ax, word ptr hex_table[eax*2]
stosw
mov byte ptr [edi], 0
pop edi
pop eax
mov esp, ebp
pop ebp
retn 8
Bin2Hex6 endp


Title: Re: Fast DwordtoHex ?
Post by: jj2007 on November 29, 2015, 08:03:43 AM
Congrats, Guga, it's much faster than the standard Masm32 algo:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

4120    cycles for 100 * dw2hex
7851    cycles for 100 * MB Hex$
52652   cycles for 100 * CRT sprintf
1071    cycles for 100 * Bin2Hex
772     cycles for 100 * Bin2Hex2 cx
1656    cycles for 100 * Bin2Hex6

4115    cycles for 100 * dw2hex
7832    cycles for 100 * MB Hex$
52043   cycles for 100 * CRT sprintf
1072    cycles for 100 * Bin2Hex
771     cycles for 100 * Bin2Hex2 cx
1660    cycles for 100 * Bin2Hex6

4115    cycles for 100 * dw2hex
7820    cycles for 100 * MB Hex$
52089   cycles for 100 * CRT sprintf
1069    cycles for 100 * Bin2Hex
771     cycles for 100 * Bin2Hex2 cx
1658    cycles for 100 * Bin2Hex6

4115    cycles for 100 * dw2hex
7819    cycles for 100 * MB Hex$
52067   cycles for 100 * CRT sprintf
1070    cycles for 100 * Bin2Hex
773     cycles for 100 * Bin2Hex2 cx
1662    cycles for 100 * Bin2Hex6

20      bytes for dw2hex
17      bytes for MB Hex$
29      bytes for CRT sprintf
225     bytes for Bin2Hex
150     bytes for Bin2Hex2 cx
616     bytes for Bin2Hex6

00345678        = eax dw2hex
00345678        = eax MB Hex$
345678  = eax CRT sprintf
345678  = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax Bin2Hex6
Title: Re: Fast DwordtoHex ?
Post by: guga on November 29, 2015, 12:01:19 PM
Thanks... can you test it with the other algos also preserving their registers? I would like to compare the true speed.
For example, my version saves the registers it uses (eax and edi), while the other versions don't save anything. I would like to test the functions with the same behaviour so the speed comparison is fair.

For example, when I use your version of Bin2Hex3, mine is still fast. I don't understand the different speeds.

For the benchmark tests I'm using the GUI version that Steve made, the one that uses the GetTickCount and SleepEx APIs as part of its calibration algo.
Title: Re: Fast DwordtoHex ?
Post by: guga on November 29, 2015, 12:11:56 PM
Btw, Dave and guys, I succeeded in making the version with the shorter output, and the speed stayed the same in my tests. It seems fast.


[hex_table: B$ "000102030405060708090A0B0C0D0E0F"
            B$ "101112131415161718191A1B1C1D1E1F"
            B$ "202122232425262728292A2B2C2D2E2F"
            B$ "303132333435363738393A3B3C3D3E3F"
            B$ "404142434445464748494A4B4C4D4E4F"
            B$ "505152535455565758595A5B5C5D5E5F"
            B$ "606162636465666768696A6B6C6D6E6F"
            B$ "707172737475767778797A7B7C7D7E7F"
            B$ "808182838485868788898A8B8C8D8E8F"
            B$ "909192939495969798999A9B9C9D9E9F"
            B$ "A0A1A2A3A4A5A6A7A8A9AAABACADAEAF"
            B$ "B0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF"
            B$ "C0C1C2C3C4C5C6C7C8C9CACBCCCDCECF"
            B$ "D0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF"
            B$ "E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEF"
            B$ "F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF", 0]

Proc Bin2Hex7:
    Arguments @Input, @Output
    Local @DwordStorage
    Uses eax, edi, ecx

    mov eax D@Input
    mov edi D@Output
    mov D@DwordStorage eax
    mov B$edi '0' | inc edi

    movzx eax B@DwordStorage+3
    Test_If eax eax
        On ax <= 0F, dec edi
        mov ax W$hex_table+eax*2 | stosw
        movzx eax B@DwordStorage+2 | mov ax W$hex_table+eax*2 | stosw
        movzx eax B@DwordStorage+1 | mov ax W$hex_table+eax*2 | stosw
        movzx eax B@DwordStorage+0 | mov ax W$hex_table+eax*2 | stosw
        mov B$edi 0
        ExitP
    Test_End

    movzx eax B@DwordStorage+2
    Test_If eax eax
        On ax <= 0F, dec edi
        mov ax W$hex_table+eax*2 | stosw
        movzx eax B@DwordStorage+1 | mov ax W$hex_table+eax*2 | stosw
        movzx eax B@DwordStorage+0 | mov ax W$hex_table+eax*2 | stosw
        mov B$edi 0
        ExitP
    Test_End


    movzx eax B@DwordStorage+1
    Test_If eax eax
        On ax <= 0F, dec edi
        mov ax W$hex_table+eax*2 | stosw
        movzx eax B@DwordStorage+0
    Test_Else
        movzx eax B@DwordStorage+0
        On ax <= 0F, dec edi
    Test_End

    mov ax W$hex_table+eax*2 | stosw
    mov B$edi 0

EndP
Title: Re: Fast DwordtoHex ?
Post by: dedndave on November 29, 2015, 01:11:41 PM
add this one to your tests   :P

0
01
012
0123
01234
012345
0123456
01234567
012345678

Press any key to continue ...
Title: Re: Fast DwordtoHex ?
Post by: guga on November 29, 2015, 04:41:51 PM
Thanks dave

It is quite close. The problem is that the pointer returned in eax is advanced inside the output buffer, which leaves zero bytes at the beginning of the buffer when the input is short. For example:
[OutputBuff: B$ 0 #12] ; 12 bytes long

call FastHex 01000, OutputBuff
OutputBuff = 0 0 0 0 0 "01000": five zero bytes at the start, followed by the converted hex string (eax points at "01000")

Although the result is correct, this may cause problems if the output is part of a larger string. For example:

[OutputBuff: B$ "Test" 0#256]

mov edi OutputBuff
add edi 4 ;
call FastHex 01000, edi

The result will be:
[OutputBuff: B$ "Test" 0 0 0 0 0
                     B$ "01000"
                     B$ 0....]

instead of
[OutputBuff: B$ "Test01000"
                     B$ 0....]

Concerning speed, I made a couple of tests. Here are the results:

(http://i67.tinypic.com/novhty.jpg)


Here is your code ported to RosAsm so that it behaves the same as the one I'm testing.
Both preserve the registers they use internally (with the "uses" macro, which is a simple push/pop). I'm doing this to be sure all the functions are timed under the same conditions. The main change I'll try next is to make yours return the length of the converted data in eax, so both functions behave exactly the same and I get a better idea of their speed.


Proc FastHex:
    Arguments @Input, @Output
    Uses ecx, edx

    mov eax D@Output | add eax 8
    mov ecx D@Input
    test ecx ecx
    mov D$eax 03030 | je P2>
   
FHex00: M6:
    movzx edx cl
    mov dx W$edx*2+hex_table
    mov W$eax  dx
    sub eax 02
    shr ecx 08 | jne M6<
    inc eax
    mov B$eax  030
   
FHex01: P2:
    cmp B$eax  030
    lea eax D$eax+01 | je P2<
    sub eax 02
EndP


Btw, I updated my version (it now returns the string length in eax):


Proc dwtoHex_Ex2:
    Arguments @Input, @Output
    Local @DwordStorage
    Uses edi

    mov eax D@Input
    mov edi D@Output
    mov D@DwordStorage eax
    mov B$edi '0' | inc edi

    movzx eax B@DwordStorage+3
    Test_If eax eax
        On ax <= 0F, dec edi
        mov ax W$hex_table+eax*2 | stosw
        movzx eax B@DwordStorage+2 | mov ax W$hex_table+eax*2 | stosw
        movzx eax B@DwordStorage+1 | mov ax W$hex_table+eax*2 | stosw
        movzx eax B@DwordStorage+0 | mov ax W$hex_table+eax*2 | stosw
        mov B$edi 0
        sub edi D@Output | mov eax edi
        ExitP
    Test_End

    movzx eax B@DwordStorage+2
    Test_If eax eax
        On ax <= 0F, dec edi
        mov ax W$hex_table+eax*2 | stosw
        movzx eax B@DwordStorage+1 | mov ax W$hex_table+eax*2 | stosw
        movzx eax B@DwordStorage+0 | mov ax W$hex_table+eax*2 | stosw
        mov B$edi 0
        sub edi D@Output | mov eax edi
        ExitP
    Test_End

    movzx eax B@DwordStorage+1
    Test_If eax eax
        On ax <= 0F, dec edi
        mov ax W$hex_table+eax*2 | stosw
        movzx eax B@DwordStorage+0
    Test_Else
        movzx eax B@DwordStorage+0
        On ax <= 0F, dec edi
    Test_End

    mov ax W$hex_table+eax*2 | stosw
    mov B$edi 0
    sub edi D@Output | mov eax edi

EndP
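The control flow of dwtoHex_Ex2 can be modeled in C as follows. This is a sketch: the name `dword_to_hex_short_table` is hypothetical, and a 16-character digit string stands in for the 512-byte `hex_table` to keep it short. The key point is that only the first significant byte needs a decision: it gets one digit if it is below 10h and two otherwise. Every later byte always contributes two digits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the short-output byte approach: skip zero high
   bytes, write '0', emit the first significant byte as one digit (< 0x10)
   or two, then every remaining byte as a pair. out must hold >= 10 bytes.
   Returns the string length (without the terminator). */
size_t dword_to_hex_short_table(uint32_t n, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    char *p = out;
    int i = 3;

    while (i > 0 && ((n >> (8 * i)) & 0xFF) == 0)   /* skip zero high bytes */
        i--;

    unsigned first = (n >> (8 * i)) & 0xFF;
    *p++ = '0';                          /* the single leading zero */
    if (first >= 0x10)                   /* two digits for the first byte */
        *p++ = digits[first >> 4];
    *p++ = digits[first & 0xF];

    for (i--; i >= 0; i--) {             /* remaining bytes: always 2 digits */
        unsigned b = (n >> (8 * i)) & 0xFF;
        *p++ = digits[b >> 4];
        *p++ = digits[b & 0xF];
    }
    *p = '\0';
    return (size_t)(p - out);
}
```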

Title: Re: Fast DwordtoHex ?
Post by: jj2007 on November 29, 2015, 07:23:51 PM
Quote from: dedndave on November 29, 2015, 01:11:41 PM
add this one to your tests   :P

It gets a bit crowded now :biggrin:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

4166    cycles for 100 * dw2hex
7871    cycles for 100 * MB Hex$
52117   cycles for 100 * CRT sprintf
660     cycles for 100 * Bin2Hex
760     cycles for 100 * Bin2Hex2 cx
1635    cycles for 100 * Bin2Hex6
1981    cycles for 100 * FastHex

4217    cycles for 100 * dw2hex
7799    cycles for 100 * MB Hex$
52408   cycles for 100 * CRT sprintf
659     cycles for 100 * Bin2Hex
759     cycles for 100 * Bin2Hex2 cx
1645    cycles for 100 * Bin2Hex6
1763    cycles for 100 * FastHex

4180    cycles for 100 * dw2hex
7841    cycles for 100 * MB Hex$
52083   cycles for 100 * CRT sprintf
658     cycles for 100 * Bin2Hex
778     cycles for 100 * Bin2Hex2 cx
1656    cycles for 100 * Bin2Hex6
1904    cycles for 100 * FastHex

4214    cycles for 100 * dw2hex
7866    cycles for 100 * MB Hex$
52062   cycles for 100 * CRT sprintf
660     cycles for 100 * Bin2Hex
757     cycles for 100 * Bin2Hex2 cx
1647    cycles for 100 * Bin2Hex6
1995    cycles for 100 * FastHex

20      bytes for dw2hex
17      bytes for MB Hex$
29      bytes for CRT sprintf
138     bytes for Bin2Hex
150     bytes for Bin2Hex2 cx
616     bytes for Bin2Hex6
66      bytes for FastHex

00345678        = eax dw2hex
00345678        = eax MB Hex$
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax Bin2Hex6
012345678       = eax FastHex
Title: Re: Fast DwordtoHex ?
Post by: sinsi on November 29, 2015, 10:35:28 PM

AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G (SSE4)
11919   cycles for 100 * dw2hex
10902   cycles for 100 * MB Hex$
54302   cycles for 100 * CRT sprintf
742     cycles for 100 * Bin2Hex
932     cycles for 100 * Bin2Hex2 cx
2277    cycles for 100 * Bin2Hex6
2171    cycles for 100 * FastHex

Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (SSE4)
3566    cycles for 100 * dw2hex
7054    cycles for 100 * MB Hex$
53879   cycles for 100 * CRT sprintf
589     cycles for 100 * Bin2Hex
722     cycles for 100 * Bin2Hex2 cx
1430    cycles for 100 * Bin2Hex6
1497    cycles for 100 * FastHex

Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz (SSE4)
3077    cycles for 100 * dw2hex
5258    cycles for 100 * MB Hex$
44338   cycles for 100 * CRT sprintf
526     cycles for 100 * Bin2Hex
521     cycles for 100 * Bin2Hex2 cx
1157    cycles for 100 * Bin2Hex6
1261    cycles for 100 * FastHex

Title: Re: Fast DwordtoHex ?
Post by: TWell on November 29, 2015, 10:44:50 PM
AMD Athlon(tm) II X2 220 Processor (SSE3)

8622    cycles for 100 * dw2hex
8122    cycles for 100 * MB Hex$
78598   cycles for 100 * CRT sprintf
902     cycles for 100 * Bin2Hex
901     cycles for 100 * Bin2Hex2 cx
4208    cycles for 100 * Bin2Hex6
2019    cycles for 100 * FastHex
Title: Re: Fast DwordtoHex ?
Post by: Siekmanski on November 29, 2015, 10:50:23 PM
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

3956    cycles for 100 * dw2hex
7890    cycles for 100 * MB Hex$
50529   cycles for 100 * CRT sprintf
665     cycles for 100 * Bin2Hex
760     cycles for 100 * Bin2Hex2 cx
1550    cycles for 100 * Bin2Hex6
1952    cycles for 100 * FastHex

3955    cycles for 100 * dw2hex
7845    cycles for 100 * MB Hex$
50451   cycles for 100 * CRT sprintf
663     cycles for 100 * Bin2Hex
760     cycles for 100 * Bin2Hex2 cx
1553    cycles for 100 * Bin2Hex6
1974    cycles for 100 * FastHex

3953    cycles for 100 * dw2hex
7889    cycles for 100 * MB Hex$
50417   cycles for 100 * CRT sprintf
662     cycles for 100 * Bin2Hex
762     cycles for 100 * Bin2Hex2 cx
1561    cycles for 100 * Bin2Hex6
2015    cycles for 100 * FastHex

3933    cycles for 100 * dw2hex
7851    cycles for 100 * MB Hex$
50340   cycles for 100 * CRT sprintf
661     cycles for 100 * Bin2Hex
766     cycles for 100 * Bin2Hex2 cx
1584    cycles for 100 * Bin2Hex6
1967    cycles for 100 * FastHex

20      bytes for dw2hex
17      bytes for MB Hex$
29      bytes for CRT sprintf
138     bytes for Bin2Hex
150     bytes for Bin2Hex2 cx
616     bytes for Bin2Hex6
66      bytes for FastHex

00345678        = eax dw2hex
00345678        = eax MB Hex$
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax Bin2Hex6
012345678       = eax FastHex
Title: Re: Fast DwordtoHex ?
Post by: guga on November 30, 2015, 01:29:50 AM
JJ, I'm trying to make the test functions behave the same way (I mean, all inside a regular proc instead of a void function), but I'm having problems with the syntax in MASM.

I rebuilt the function as:



Bin2Hex6 proc Input:DWORD, Output:DWORD
Local DwordStorage:DWORD

  push eax
  push edi

  mov eax, Input
  mov edi, Output
  mov DwordStorage, eax
  movzx eax, byte ptr [DwordStorage+3]
  mov ax, word ptr hex_table[eax*2]
  stosw
  movzx eax, byte ptr [DwordStorage+2]
  mov ax, word ptr hex_table[eax*2]
  stosw
  movzx eax, byte ptr [DwordStorage+1]
  mov ax, word ptr hex_table[eax*2]
  stosw
  movzx eax, byte ptr [DwordStorage]
  mov ax, word ptr hex_table[eax*2]
  stosw
  mov byte ptr [edi], 0

  pop edi
  pop eax

  ret
Bin2Hex6 endp

NameG equ <Bin2Hex6> ; assign a descriptive name here
TestG proc
  mov ebx, AlgoLoops-1 ; loop e.g. 100x
  align 4
  .Repeat
;push offset somestring
;push 12345678h
invoke Bin2Hex6, 12345678h, offset somestring
dec ebx
  .Until Sign?
  mov eax, offset somestring
  ret
TestG endp


But why can't MASM assemble it? It reports a symbol redefinition. Is this the proper syntax?
Title: Re: Fast DwordtoHex ?
Post by: jj2007 on November 30, 2015, 02:09:43 AM
Well... using Input and Output as equates is kind of courageous ;)
Title: Re: Fast DwordtoHex ?
Post by: guga on November 30, 2015, 02:28:00 AM
Equates? I thought they were arguments of the function :icon_mrgreen: It's been a long time since I last used MASM, but I succeeded in porting something whose output is closer to the others. Despite the differences in how the results are built, the timings are close :)

I hope the syntax is OK now. I rebuilt the Bin2Hex function so that it works as a proc with 2 arguments:


Bin2Hex proc near
_Input = dword ptr 8
_Output = dword ptr 0Ch
push ebp
mov ebp, esp
push ecx
push edx
push edi
mov eax, [ebp+_Input]
mov edi, [ebp+_Output]
mov edx, offset hex_table ; "000102030405060708090A0B0C0D0E0F1011121"...
movzx ecx, al
movzx ecx, word ptr [edx+ecx*2]
mov [edi+6], cx
movzx ecx, ah
movzx ecx, word ptr [edx+ecx*2]
mov [edi+4], cx
shr eax, 10h
movzx ecx, al
movzx ecx, word ptr [edx+ecx*2]
mov [edi+2], cx
movzx ecx, ah
movzx ecx, word ptr [edx+ecx*2]
mov [edi], cx
mov byte ptr [edi+8], 0
lea eax, [edi]
pop edi
pop edx
pop ecx
mov esp, ebp
pop ebp
retn 8
Bin2Hex endp



Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz (SSE4)

8867 cycles for 100 * dw2hex
7048 cycles for 100 * MB Hex$
48977 cycles for 100 * CRT sprintf
787 cycles for 100 * Bin2Hex
245 cycles for 100 * Bin2Hex2 cx
1243 cycles for 100 * Bin2Hex6
1679 cycles for 100 * FastHex

4721 cycles for 100 * dw2hex
9396 cycles for 100 * MB Hex$
65490 cycles for 100 * CRT sprintf
1332 cycles for 100 * Bin2Hex
558 cycles for 100 * Bin2Hex2 cx
1598 cycles for 100 * Bin2Hex6
1727 cycles for 100 * FastHex

4414 cycles for 100 * dw2hex
9983 cycles for 100 * MB Hex$
77615 cycles for 100 * CRT sprintf
1574 cycles for 100 * Bin2Hex
682 cycles for 100 * Bin2Hex2 cx
1934 cycles for 100 * Bin2Hex6
1990 cycles for 100 * FastHex

5609 cycles for 100 * dw2hex
11614 cycles for 100 * MB Hex$
81394 cycles for 100 * CRT sprintf
1708 cycles for 100 * Bin2Hex
760 cycles for 100 * Bin2Hex2 cx
1992 cycles for 100 * Bin2Hex6
2185 cycles for 100 * FastHex

20 bytes for dw2hex
17 bytes for MB Hex$
29 bytes for CRT sprintf
139 bytes for Bin2Hex
150 bytes for Bin2Hex2 cx
616 bytes for Bin2Hex6
66 bytes for FastHex

00345678 = eax dw2hex
00345678 = eax MB Hex$
345678 = eax CRT sprintf
12345678 = eax Bin2Hex
00345678 = eax Bin2Hex2 cx
12345678 = eax Bin2Hex6
012345678 = eax FastHex

--- ok ---



Title: Re: Fast DwordtoHex ?
Post by: jj2007 on November 30, 2015, 05:45:50 AM
Quote from: guga on November 30, 2015, 02:28:00 AM
Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz (SSE4)
787 cycles for 100 * Bin2Hex
245 cycles for 100 * Bin2Hex2 cx

Your i7 is cheating, Guga :eusa_naughty:

My code is fast but 2.45 cycles is fake 8)
Title: Re: Fast DwordtoHex ?
Post by: guga on November 30, 2015, 03:58:54 PM
Cheating? How is that possible?
I didn't touch Bin2Hex2 cx.


(http://i66.tinypic.com/i2tsw2.jpg)
Title: Re: Fast DwordtoHex ?
Post by: dedndave on November 30, 2015, 04:11:54 PM
i suggest you write a little test piece, like i did for FastHex, to verify proper results
if it takes 2 or 3 clock cycles to finish, it's a good bet that it isn't working to begin with
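A portable C version of such a test piece might look like this. It is a sketch, and `check_converter` and `reference_hex` are hypothetical names. The reference uses `snprintf` with the `%lX` conversion, which prints no leading zeros, to produce the format requested in this thread: "0" plus the significant digits, so 0 becomes "00". Adjust `reference_hex` if you prefer a lone "0" for zero. Inputs with trailing zero nibbles, such as 12340000h, are included because they catch loops that stop as soon as the remaining bits are zero.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reference: "0" followed by the hex digits without leading
   zeros (0 -> "00", 12Ah -> "012A"). out must hold at least 10 bytes. */
static void reference_hex(uint32_t n, char *out)
{
    snprintf(out, 10, "0%lX", (unsigned long)n);
}

/* Runs conv over a fixed set of inputs, prints each mismatch against the
   reference, and returns the number of mismatches. */
int check_converter(void (*conv)(uint32_t, char *))
{
    static const uint32_t inputs[] = {
        0, 1, 0x12, 0x123, 0x1234, 0x12345, 0x123456, 0x1234567, 0x12345678,
        0xF, 0x10, 0xFF, 0x100, 0x12340000, 0xFFFFFFFF
    };
    int bad = 0;
    for (size_t i = 0; i < sizeof inputs / sizeof inputs[0]; i++) {
        char got[16] = {0}, want[16];
        conv(inputs[i], got);
        reference_hex(inputs[i], want);
        if (strcmp(got, want) != 0) {
            printf("%08lX: got \"%s\", want \"%s\"\n",
                   (unsigned long)inputs[i], got, want);
            bad++;
        }
    }
    return bad;
}
```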
Title: Re: Fast DwordtoHex ?
Post by: guga on November 30, 2015, 04:24:46 PM
There is something very weird. I isolated JJ's Bin2Hex2 cx so that the app displays only this algo (I deleted all the others), and I keep getting different results every time I run the app.

To get these different results, all I did was open the app and close it, wait 3 to 5 seconds, open it again and close it, and so on.
(http://i65.tinypic.com/2rftvrq.jpg)
Title: Re: Fast DwordtoHex ?
Post by: jj2007 on November 30, 2015, 05:49:45 PM
Hi Guga,

I was just making fun, but this is indeed weird! I thought it was just a strange outlier. These are my very stable results:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
3386    cycles for 100 * dw2hex
6473    cycles for 100 * MB Hex$
42045   cycles for 100 * CRT sprintf
1141    cycles for 100 * Bin2Hex Guga
613     cycles for 100 * Bin2Hex2 cx
1328    cycles for 100 * Bin2Hex6
1348    cycles for 100 * FastHex

3389    cycles for 100 * dw2hex
6370    cycles for 100 * MB Hex$
42029   cycles for 100 * CRT sprintf
1147    cycles for 100 * Bin2Hex Guga
612     cycles for 100 * Bin2Hex2 cx
1329    cycles for 100 * Bin2Hex6
1633    cycles for 100 * FastHex


Try setting AlgoLoops or TimerLoops ten times higher; sometimes this helps to stabilise timings.
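The reasoning behind raising the loop counts can be shown with a portable C sketch. The names are hypothetical, and it uses `clock()`, which is far coarser than the cycle counters the testbeds in this thread report. Each trial runs many calls so that timer resolution and interruptions matter less. Keeping the fastest of several trials then discards runs that other activity disturbed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical converter used only as a timing subject. */
static void sample_hex(uint32_t n, char *out)
{
    snprintf(out, 16, "0%lX", (unsigned long)n);
}

/* Times `loops` calls of conv per trial, repeats for `trials` trials and
   returns the fastest trial in seconds (clock() measures processor time). */
double min_seconds(void (*conv)(uint32_t, char *), long loops, int trials)
{
    static char sink[16];                 /* shared output keeps the calls observable */
    double best = -1.0;
    for (int t = 0; t < trials; t++) {
        clock_t start = clock();
        for (long i = 0; i < loops; i++)
            conv((uint32_t)i * 2654435761u, sink);   /* vary the input value */
        double s = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || s < best)
            best = s;
    }
    return best;
}
```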
Title: Re: Fast DwordtoHex ?
Post by: Grincheux on December 07, 2015, 06:21:12 AM
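I suggest:


Hex PROC USES EDI __lpszString:LPSTR,__dwNumber:DWORD

mov edi,__lpszString
mov edx,__dwNumber
mov ecx,8                       ; always emit 8 nibbles

@Loop:

mov eax,edx
and eax,0f0000000h
shr eax,28                      ; top nibble -> AL

cmp al,10
jb @NotAlpha                    ; unsigned compare

add al,'A' - 10
jmp @Next

@NotAlpha:

add al,'0'
@Next:

mov Byte Ptr [edi],al
add edi,1
shl edx,4
dec ecx                         ; count nibbles: testing EDX after SHL would stop early
jnz @Loop                       ; and drop trailing zero nibbles (12340000h -> "1234")

mov Byte Ptr [edi],0
ret

Hex ENDP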
Title: Re: Fast DwordtoHex ?
Post by: jj2007 on December 07, 2015, 08:33:43 AM
Works fine:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

3371    cycles for 100 * dw2hex
3589    cycles for 100 * Hex$ Grincheux
42320   cycles for 100 * CRT sprintf
526     cycles for 100 * Bin2Hex
605     cycles for 100 * Bin2Hex2 cx
1321    cycles for 100 * Bin2Hex6
1626    cycles for 100 * FastHex

3375    cycles for 100 * dw2hex
3518    cycles for 100 * Hex$ Grincheux
42303   cycles for 100 * CRT sprintf
524     cycles for 100 * Bin2Hex
605     cycles for 100 * Bin2Hex2 cx
1321    cycles for 100 * Bin2Hex6
1626    cycles for 100 * FastHex
Title: Re: Fast DwordtoHex ?
Post by: Grincheux on December 07, 2015, 01:36:26 PM
JJ, you surprise me: mine is very, very slow. I will redo it. How do you compute the cycles? It is very interesting.
Title: Re: Fast DwordtoHex ?
Post by: jj2007 on December 07, 2015, 06:41:06 PM
Quote from: Grincheux on December 07, 2015, 01:36:26 PM
JJ, you surprise me: mine is very, very slow. I will redo it. How do you compute the cycles? It is very interesting.

Philippe,

Your solution is not slow: it's on par with the standard Masm32 algo, and about a factor of 12 faster than the C runtime library's sprintf.
Bin2Hex is even faster because it's a table-based solution.

Re cycles, it's complicated but public: Open the source in WordPad or RichMasm and search for counter_begin - a macro:
include \masm32\macros\timers.asm      ; download from the Masm32 Laboratory (http://masm32.com/board/index.php?topic=49.0)
Title: Re: Fast DwordtoHex ?
Post by: dedndave on December 08, 2015, 04:50:54 AM
this is the template i use with Michael's timing routines

;###############################################################################################

        .XCREF
        .NoList
        INCLUDE    \Masm32\Include\Masm32rt.inc
        .686p
        .MMX
        .XMM
        INCLUDE    \Masm32\Macros\Timers.asm
        .List

;###############################################################################################

Loop_Count = 10000         ;adjust so that each pass is roughly 0.5 seconds or longer

;###############################################################################################

        .DATA

;***********************************************************************************************

        .DATA?

;###############################################################################################

        .CODE

;***********************************************************************************************

main    PROC

        INVOKE  GetCurrentProcess
        INVOKE  SetProcessAffinityMask,eax,1
        INVOKE  Sleep,750

        mov     ecx,5

Loop00: push    ecx

        counter_begin Loop_Count,HIGH_PRIORITY_CLASS

;code to time goes here

        counter_end

        print   str$(eax),32
        pop     ecx
        dec     ecx
        jnz     Loop00

        print   chr$(13,10)
        inkey
        INVOKE  ExitProcess,0

main    ENDP

;###############################################################################################

        END     main
Title: Re: Fast DwordtoHex ?
Post by: hutch-- on December 08, 2015, 09:34:21 AM
See if this has got any legs.

In the .DATA section.

    .data
    align 16
      hex_table2 \
        db "000102030405060708090A0B0C0D0E0F"
        db "101112131415161718191A1B1C1D1E1F"
        db "202122232425262728292A2B2C2D2E2F"
        db "303132333435363738393A3B3C3D3E3F"
        db "404142434445464748494A4B4C4D4E4F"
        db "505152535455565758595A5B5C5D5E5F"
        db "606162636465666768696A6B6C6D6E6F"
        db "707172737475767778797A7B7C7D7E7F"
        db "808182838485868788898A8B8C8D8E8F"
        db "909192939495969798999A9B9C9D9E9F"
        db "A0A1A2A3A4A5A6A7A8A9AAABACADAEAF"
        db "B0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF"
        db "C0C1C2C3C4C5C6C7C8C9CACBCCCDCECF"
        db "D0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF"
        db "E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEF"
        db "F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF"


The algorithm.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

utoh proc valu:DWORD,pbuf:DWORD

  ; -----------------------------------------------
  ; convert an unsigned DWORD value to a hex string
  ; -----------------------------------------------

    push ebx

    mov eax, [esp+8][4]                     ; pbuf
    mov ebx, [esp+4][4]                     ; valu

    movzx ecx, bh                           ; 3rd byte
    mov dx, WORD PTR [ecx*2+hex_table2]
    mov [eax+4], dx

    movzx ecx, bl                           ; 4th byte
    mov dx, WORD PTR [ecx*2+hex_table2]
    mov [eax+6], dx

    bswap ebx

    movzx ecx, bl                           ; 1st byte
    mov dx, WORD PTR [ecx*2+hex_table2]
    mov [eax], dx

    movzx ecx, bh                           ; 2nd byte
    mov dx, WORD PTR [ecx*2+hex_table2]
    mov [eax+2], dx

    mov DWORD PTR [eax+8], 0                ; terminate buffer

    pop ebx

    ret 8

utoh endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤


Note that the buffer should be 12 bytes long, because the terminator is written as a full DWORD at offset 8.
Title: Re: Fast DwordtoHex ?
Post by: jj2007 on December 08, 2015, 10:32:40 AM
Quote from: hutch-- on December 08, 2015, 09:34:21 AM
See if this has got any legs.

Quite an improvement on the standard Masm32 algo, and a factor of 43 faster than CRT ;-)

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
4342    cycles for 100 * dw2hex
1230    cycles for 100 * utoh (Hutch)
52286   cycles for 100 * CRT sprintf
667     cycles for 100 * Bin2Hex
764     cycles for 100 * Bin2Hex2 cx
765     cycles for 100 * Bin2Hex3 ecx
1645    cycles for 100 * Bin2Hex6
1716    cycles for 100 * FastHex

4121    cycles for 100 * dw2hex
1233    cycles for 100 * utoh (Hutch)
52256   cycles for 100 * CRT sprintf
665     cycles for 100 * Bin2Hex
766     cycles for 100 * Bin2Hex2 cx
763     cycles for 100 * Bin2Hex3 ecx
1661    cycles for 100 * Bin2Hex6
2029    cycles for 100 * FastHex

4122    cycles for 100 * dw2hex
1221    cycles for 100 * utoh (Hutch)
52273   cycles for 100 * CRT sprintf
671     cycles for 100 * Bin2Hex
766     cycles for 100 * Bin2Hex2 cx
768     cycles for 100 * Bin2Hex3 ecx
1655    cycles for 100 * Bin2Hex6
1848    cycles for 100 * FastHex
Title: Re: Fast DwordtoHex ?
Post by: hutch-- on December 08, 2015, 11:02:16 AM

Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz (SSE4)

3717    cycles for 100 * dw2hex
950     cycles for 100 * utoh (Hutch)
60822   cycles for 100 * CRT sprintf
659     cycles for 100 * Bin2Hex
659     cycles for 100 * Bin2Hex2 cx
659     cycles for 100 * Bin2Hex3 ecx
1515    cycles for 100 * Bin2Hex6
1621    cycles for 100 * FastHex

3716    cycles for 100 * dw2hex
950     cycles for 100 * utoh (Hutch)
60830   cycles for 100 * CRT sprintf
658     cycles for 100 * Bin2Hex
659     cycles for 100 * Bin2Hex2 cx
659     cycles for 100 * Bin2Hex3 ecx
1515    cycles for 100 * Bin2Hex6
1621    cycles for 100 * FastHex

3716    cycles for 100 * dw2hex
950     cycles for 100 * utoh (Hutch)
60818   cycles for 100 * CRT sprintf
658     cycles for 100 * Bin2Hex
659     cycles for 100 * Bin2Hex2 cx
659     cycles for 100 * Bin2Hex3 ecx
1514    cycles for 100 * Bin2Hex6
1621    cycles for 100 * FastHex

00345678        = eax dw2hex
12345678        = eax utoh (Hutch)
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
00345678        = eax Bin2Hex3 ecx
12345678        = eax Bin2Hex6
012345678       = eax FastHex
Title: Re: Fast DwordtoHex ?
Post by: Siekmanski on December 08, 2015, 12:40:31 PM
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (SSE4)

3855    cycles for 100 * dw2hex
1098    cycles for 100 * utoh (Hutch)
50477   cycles for 100 * CRT sprintf
665     cycles for 100 * Bin2Hex
760     cycles for 100 * Bin2Hex2 cx
767     cycles for 100 * Bin2Hex3 ecx
1550    cycles for 100 * Bin2Hex6
1859    cycles for 100 * FastHex

3848    cycles for 100 * dw2hex
1113    cycles for 100 * utoh (Hutch)
50596   cycles for 100 * CRT sprintf
661     cycles for 100 * Bin2Hex
763     cycles for 100 * Bin2Hex2 cx
759     cycles for 100 * Bin2Hex3 ecx
1557    cycles for 100 * Bin2Hex6
1762    cycles for 100 * FastHex

3849    cycles for 100 * dw2hex
1111    cycles for 100 * utoh (Hutch)
50491   cycles for 100 * CRT sprintf
661     cycles for 100 * Bin2Hex
760     cycles for 100 * Bin2Hex2 cx
766     cycles for 100 * Bin2Hex3 ecx
1536    cycles for 100 * Bin2Hex6
1962    cycles for 100 * FastHex

00345678        = eax dw2hex
12345678        = eax utoh (Hutch)
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
00345678        = eax Bin2Hex3 ecx
12345678        = eax Bin2Hex6
012345678       = eax FastHex

--- ok ---
Title: Re: Fast DwordtoHex ?
Post by: hutch-- on December 09, 2015, 05:17:48 AM
There was not much to tweak in this algo, but I reused ECX, freed up EDX and removed the push/pop of EBX, and with a fall-through algo like this the times did improve. I don't have the right setup to use JJ's testbed, so I could not test it against the other reference algos.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

align 16

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

utoh proc valu:DWORD,pbuf:DWORD

  ; -----------------------------------------------
  ; convert an unsigned DWORD value to a hex string
  ; the "pbuf" address is also returned in EAX
  ; -----------------------------------------------

    mov edx, [esp+4]                        ; valu
    mov eax, [esp+8]                        ; pbuf

    movzx ecx, dl                           ; 4th byte
    mov cx, WORD PTR [ecx*2+hex_table2]
    mov [eax+6], cx

    movzx ecx, dh                           ; 3rd byte
    mov cx, WORD PTR [ecx*2+hex_table2]
    mov [eax+4], cx

    bswap edx

    movzx ecx, dl                           ; 1st byte
    mov cx, WORD PTR [ecx*2+hex_table2]
    mov [eax], cx

    movzx ecx, dh                           ; 2nd byte
    mov cx, WORD PTR [ecx*2+hex_table2]
    mov [eax+2], cx

    mov DWORD PTR [eax+8], 0                ; terminate buffer

    ret 8

utoh endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Title: Re: Fast DwordtoHex ?
Post by: jj2007 on December 09, 2015, 05:26:41 AM
1183->1061 :t
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
4119    cycles for 100 * dw2hex
1061    cycles for 100 * utoh (Hutch)
52100   cycles for 100 * CRT sprintf
659     cycles for 100 * Bin2Hex
758     cycles for 100 * Bin2Hex2 cx
756     cycles for 100 * Bin2Hex3 ecx
1646    cycles for 100 * Bin2Hex6
1750    cycles for 100 * FastHex
Title: Re: Fast DwordtoHex ?
Post by: hutch-- on December 09, 2015, 09:08:03 AM
I did a couple of other variants, put them in a rough benchmark and zipped it. The FASTCALL variant is clearly faster than the others as it's free of stack overhead.


-------------
test accuracy
-------------
12345678 FASTCALL
12345678 NO STACK FRAME
12345678 NO STACK FRAME VARIANT
12345678 WITH STACK FRAME
------------
algo timings
------------
358 FASTCALL
577 NO STACK FRAME
608 NO STACK FRAME VARIANT
639 WITH STACK FRAME
Press any key to continue ...
Title: Re: Fast DwordtoHex ?
Post by: jj2007 on December 09, 2015, 01:37:36 PM
Quote from: hutch-- on December 09, 2015, 09:08:03 AM
The FASTCALL variant is clearly faster than the others as it's free of stack overhead.

Included:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

4114    cycles for 100 * dw2hex
670     cycles for 100 * utoh (Hutch)
52229   cycles for 100 * CRT sprintf
661     cycles for 100 * Bin2Hex
761     cycles for 100 * Bin2Hex2 cx
759     cycles for 100 * Bin2Hex3 ecx
1650    cycles for 100 * Bin2Hex6
1674    cycles for 100 * FastHex


Bin2Hex preserves ecx (MasmBasic ABI ;)), but that has no influence on the timings.

For comparison:

Hutch:
  ; value to convert passed in ECX
  ; output buffer address passed in EAX
  ; -----------------------------------

    movzx edx, cl                           ; 4th byte
    mov dx, WORD PTR [edx*2+hex_table2]
    mov [eax+6], dx

    movzx edx, ch                           ; 3rd byte
    mov dx, WORD PTR [edx*2+hex_table2]
    mov [eax+4], dx

    bswap ecx

    movzx edx, cl                           ; 1st byte
    mov dx, WORD PTR [edx*2+hex_table2]
    mov [eax], dx

    movzx edx, ch                           ; 2nd byte
    mov dx, WORD PTR [edx*2+hex_table2]
    mov [eax+2], dx

    mov DWORD PTR [eax+8], 0                ; terminate buffer

    ret



JJ:
Bin2Hex proc    ; value passed in eax, pBuffer returned in eax
mov edx, offset xHex
cmp dword ptr [edx], 0
je crtHexTable ; table still empty: build it on the first call (code not shown)
push ecx
movzx ecx, al
movzx ecx, word ptr [edx+ecx*2]
mov [edx+512+6], cx
movzx ecx, ah
movzx ecx, word ptr [edx+ecx*2]
mov [edx+512+4], cx
shr eax, 16
movzx ecx, al
movzx ecx, word ptr [edx+ecx*2]
mov [edx+512+2], cx
movzx ecx, ah
movzx ecx, word ptr [edx+ecx*2]
mov [edx+512+0], cx
lea eax, [edx+512] ; return pBuffer
pop ecx
ret


Hutch's version needs 600 bytes, mine 138; the reason is simply that mine builds the table at runtime on the first call instead of storing it in the executable. Note that the currently attached version does not require MasmBasic.
Title: Re: Fast DwordtoHex ?
Post by: hutch-- on December 09, 2015, 02:18:13 PM
Thanks for adding that, I had no way to compare these algos to the ones in your benchmark.


Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz (SSE4)

3721    cycles for 100 * dw2hex
569     cycles for 100 * utoh (Hutch)
60842   cycles for 100 * CRT sprintf
667     cycles for 100 * Bin2Hex
690     cycles for 100 * Bin2Hex2 cx
665     cycles for 100 * Bin2Hex3 ecx
1543    cycles for 100 * Bin2Hex6
1629    cycles for 100 * FastHex

3722    cycles for 100 * dw2hex
569     cycles for 100 * utoh (Hutch)
61080   cycles for 100 * CRT sprintf
664     cycles for 100 * Bin2Hex
662     cycles for 100 * Bin2Hex2 cx
665     cycles for 100 * Bin2Hex3 ecx
1521    cycles for 100 * Bin2Hex6
1627    cycles for 100 * FastHex

3721    cycles for 100 * dw2hex
569     cycles for 100 * utoh (Hutch)
60834   cycles for 100 * CRT sprintf
664     cycles for 100 * Bin2Hex
665     cycles for 100 * Bin2Hex2 cx
665     cycles for 100 * Bin2Hex3 ecx
1521    cycles for 100 * Bin2Hex6
1627    cycles for 100 * FastHex

20      bytes for dw2hex
600     bytes for utoh (Hutch)
29      bytes for CRT sprintf
138     bytes for Bin2Hex
150     bytes for Bin2Hex2 cx
214     bytes for Bin2Hex3 ecx
616     bytes for Bin2Hex6
66      bytes for FastHex

00345678        = eax dw2hex
12345678        = eax utoh (Hutch)
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
00345678        = eax Bin2Hex3 ecx
12345678        = eax Bin2Hex6
012345678       = eax FastHex

--- ok ---
Title: Re: Fast DwordtoHex ?
Post by: TWell on December 09, 2015, 08:28:10 PM
AMD Athlon(tm) II X2 220 Processor (SSE3) 2.8 GHz

8781    cycles for 100 * dw2hex
797     cycles for 100 * utoh (Hutch)
78313   cycles for 100 * CRT sprintf
903     cycles for 100 * Bin2Hex
903     cycles for 100 * Bin2Hex2 cx
933     cycles for 100 * Bin2Hex3 ecx
4241    cycles for 100 * Bin2Hex6
2025    cycles for 100 * FastHex

8676    cycles for 100 * dw2hex
797     cycles for 100 * utoh (Hutch)
78244   cycles for 100 * CRT sprintf
908     cycles for 100 * Bin2Hex
922     cycles for 100 * Bin2Hex2 cx
919     cycles for 100 * Bin2Hex3 ecx
4227    cycles for 100 * Bin2Hex6
2034    cycles for 100 * FastHex

8712    cycles for 100 * dw2hex
810     cycles for 100 * utoh (Hutch)
78209   cycles for 100 * CRT sprintf
991     cycles for 100 * Bin2Hex
916     cycles for 100 * Bin2Hex2 cx
925     cycles for 100 * Bin2Hex3 ecx
4480    cycles for 100 * Bin2Hex6
2191    cycles for 100 * FastHex
Title: Re: Fast DwordtoHex ?
Post by: dedndave on December 09, 2015, 09:57:27 PM
prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)

23333   cycles for 100 * dw2hex
1261    cycles for 100 * utoh (Hutch)
173108  cycles for 100 * CRT sprintf
1684    cycles for 100 * Bin2Hex
1480    cycles for 100 * Bin2Hex2 cx
1549    cycles for 100 * Bin2Hex3 ecx
5593    cycles for 100 * Bin2Hex6
2880    cycles for 100 * FastHex

23375   cycles for 100 * dw2hex
1266    cycles for 100 * utoh (Hutch)
173666  cycles for 100 * CRT sprintf
1460    cycles for 100 * Bin2Hex
1485    cycles for 100 * Bin2Hex2 cx
1474    cycles for 100 * Bin2Hex3 ecx
5608    cycles for 100 * Bin2Hex6
2874    cycles for 100 * FastHex

23111   cycles for 100 * dw2hex
1275    cycles for 100 * utoh (Hutch)
172950  cycles for 100 * CRT sprintf
1465    cycles for 100 * Bin2Hex
1481    cycles for 100 * Bin2Hex2 cx
1481    cycles for 100 * Bin2Hex3 ecx
5592    cycles for 100 * Bin2Hex6
2805    cycles for 100 * FastHex
Title: Re: Fast DwordtoHex ?
Post by: FORTRANS on December 10, 2015, 02:58:50 AM
Hi,

Quote from: jj2007 on December 09, 2015, 01:37:36 PM
Note that the currently attached version does not require MasmBasic.

   In that case, some oldies.  Somewhat odd utoh results for
the P-MMX?


{P-MMX}
pre-P4
15186 cycles for 100 * dw2hex
7585 cycles for 100 * utoh (Hutch)
253408 cycles for 100 * CRT sprintf
7057 cycles for 100 * Bin2Hex
7626 cycles for 100 * Bin2Hex2 cx
7585 cycles for 100 * Bin2Hex3 ecx
8991 cycles for 100 * Bin2Hex6
6013 cycles for 100 * FastHex

14894 cycles for 100 * dw2hex
8126 cycles for 100 * utoh (Hutch)
245623 cycles for 100 * CRT sprintf
7622 cycles for 100 * Bin2Hex
7604 cycles for 100 * Bin2Hex2 cx
7590 cycles for 100 * Bin2Hex3 ecx
8918 cycles for 100 * Bin2Hex6
5857 cycles for 100 * FastHex

13313 cycles for 100 * dw2hex
7410 cycles for 100 * utoh (Hutch)
245176 cycles for 100 * CRT sprintf
8035 cycles for 100 * Bin2Hex
7633 cycles for 100 * Bin2Hex2 cx
6907 cycles for 100 * Bin2Hex3 ecx
9412 cycles for 100 * Bin2Hex6
6373 cycles for 100 * FastHex

20 bytes for dw2hex
600 bytes for utoh (Hutch)
29 bytes for CRT sprintf
138 bytes for Bin2Hex
150 bytes for Bin2Hex2 cx
214 bytes for Bin2Hex3 ecx
616 bytes for Bin2Hex6
66 bytes for FastHex

00345678 = eax dw2hex
12345678 = eax utoh (Hutch)
345678 = eax CRT sprintf
12345678 = eax Bin2Hex
00345678 = eax Bin2Hex2 cx
00345678 = eax Bin2Hex3 ecx
12345678 = eax Bin2Hex6
012345678 = eax FastHex

--- ok ---
{P-III}
pre-P4 (SSE1)

7668 cycles for 100 * dw2hex
1311 cycles for 100 * utoh (Hutch)
166093 cycles for 100 * CRT sprintf
1614 cycles for 100 * Bin2Hex
2016 cycles for 100 * Bin2Hex2 cx
1617 cycles for 100 * Bin2Hex3 ecx
3698 cycles for 100 * Bin2Hex6
5975 cycles for 100 * FastHex

7687 cycles for 100 * dw2hex
1312 cycles for 100 * utoh (Hutch)
166001 cycles for 100 * CRT sprintf
1728 cycles for 100 * Bin2Hex
2025 cycles for 100 * Bin2Hex2 cx
1652 cycles for 100 * Bin2Hex3 ecx
3717 cycles for 100 * Bin2Hex6
5958 cycles for 100 * FastHex

7719 cycles for 100 * dw2hex
1322 cycles for 100 * utoh (Hutch)
166667 cycles for 100 * CRT sprintf
1625 cycles for 100 * Bin2Hex
2023 cycles for 100 * Bin2Hex2 cx
1613 cycles for 100 * Bin2Hex3 ecx
3718 cycles for 100 * Bin2Hex6
5966 cycles for 100 * FastHex

20 bytes for dw2hex
600 bytes for utoh (Hutch)
29 bytes for CRT sprintf
138 bytes for Bin2Hex
150 bytes for Bin2Hex2 cx
214 bytes for Bin2Hex3 ecx
616 bytes for Bin2Hex6
66 bytes for FastHex

00345678 = eax dw2hex
12345678 = eax utoh (Hutch)
345678 = eax CRT sprintf
12345678 = eax Bin2Hex
00345678 = eax Bin2Hex2 cx
00345678 = eax Bin2Hex3 ecx
12345678 = eax Bin2Hex6
012345678 = eax FastHex

--- ok ---


Regards,

Steve N.
Title: Re: Fast DwordtoHex ?
Post by: hutch-- on December 10, 2015, 04:34:55 AM
Steve,

The variations on the older hardware are probably due to the different handling of Intel's complex addressing modes. It's been a long time since I worked on a PIII or earlier, but from memory you preferred to load the table address into a register first and then process it; this reduced the address to registers only, rather than a table OFFSET combined with registers. The techniques changed with the PIV and then again with the Core2 series. So far the i7 I am using behaves much like the Core2 Quad I used to use, and that is what I tested the last set of algos on.
Title: Re: Fast DwordtoHex ?
Post by: FORTRANS on December 10, 2015, 09:48:58 AM
Hi Steve,

   Thanks for the reply.  So complex addressing is the difference.
I should have remembered that from when I first got a Pentium Pro
computer.  It performed quite a bit better than the Pentiums that
were then current, if you had a good 32-bit compiler.  16-bit code
was only faster due to the faster clock and better cache.

Regards,

Steve N.
Title: Re: Fast DwordtoHex ?
Post by: dedndave on December 11, 2015, 04:03:02 AM
i like the Pentium MMX - it makes my code look the best   :lol:
Title: Re: Fast DwordtoHex ?
Post by: guga on December 14, 2015, 01:11:57 AM
 :greenclp:
Title: Re: Fast DwordtoHex ?
Post by: hutch-- on December 14, 2015, 10:17:50 AM
Dave,

If you can optimise code for a Prescott you will find the later processors easy in comparison. The Prescott processors had the longest and fussiest pipeline of any of the Intel processors I remember.
Title: Re: Fast DwordtoHex ?
Post by: dedndave on December 17, 2015, 12:16:21 AM
i would say it's easiest to optimize for whatever processor you are currently using - lol
Title: Re: Fast DwordtoHex ?
Post by: guga on December 17, 2015, 04:35:55 AM
Dave

which processors have the RDTSCP opcode? Only the i7 and above?

And RDTSC? Does it exist on a PII or PIII too? (If so, how do I detect it?)

I mean, rdtscp is detectable through bit 27 of the result of cpuid with function 1 (in eax). But which bit is used to recognize rdtsc?

I'm not sure if rdtsc exists in all Pentium series or only from the PII or PIII and above.
Title: Re: Fast DwordtoHex ?
Post by: dedndave on December 17, 2015, 05:58:41 AM
i believe RDTSC is supported on all Pentiums and newer (that's from memory, so check it)

the CPUID bit is named TSC
it is CPUID function 1, EDX bit 4
Title: Re: Fast DwordtoHex ?
Post by: guga on December 17, 2015, 08:05:18 AM
Many thanks, Dave  :t