The MASM Forum

General => The Laboratory => Topic started by: guga on July 28, 2023, 10:32:41 AM

Title: benchmark for another Bin2Hex, pls
Post by: guga on July 28, 2023, 10:32:41 AM
Hi guys

Can someone test this Bin2Hex I adapted from an old one?

I rebuilt 2 of them to test:
dwtoHex_Guga_SSE
FastHex (it's similar to the above, but doesn't use SSE. I didn't change the original name, just put my code into it)

Both use a huge hexadecimal table for my tests.

Note: Despite the speed, the downside is that the function is huge, because of the table.
Title: Re: benchmark for another Bin2Hex, pls
Post by: zedd151 on July 28, 2023, 10:37:46 AM
Intel(R) Core(TM)2 Duo CPU    E8400  @ 3.00GHz (SSE4)

4149    cycles for 100 * dw2hex
4401    cycles for 100 * Hex$ Grincheux
61610  cycles for 100 * CRT sprintf
685    cycles for 100 * Bin2Hex
785    cycles for 100 * Bin2Hex2 cx
1286    cycles for 100 * dwtoHex_Guga_SSE
1434    cycles for 100 * dwtoHex_Guga

4070    cycles for 100 * dw2hex
4397    cycles for 100 * Hex$ Grincheux
61441  cycles for 100 * CRT sprintf
698    cycles for 100 * Bin2Hex
789    cycles for 100 * Bin2Hex2 cx
1285    cycles for 100 * dwtoHex_Guga_SSE
1436    cycles for 100 * dwtoHex_Guga

4073    cycles for 100 * dw2hex
4391    cycles for 100 * Hex$ Grincheux
61415  cycles for 100 * CRT sprintf
681    cycles for 100 * Bin2Hex
786    cycles for 100 * Bin2Hex2 cx
1287    cycles for 100 * dwtoHex_Guga_SSE
1434    cycles for 100 * dwtoHex_Guga

4070    cycles for 100 * dw2hex
4392    cycles for 100 * Hex$ Grincheux
61459  cycles for 100 * CRT sprintf
683    cycles for 100 * Bin2Hex
785    cycles for 100 * Bin2Hex2 cx
1289    cycles for 100 * dwtoHex_Guga_SSE
1437    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
64      bytes for Hex$ Grincheux
29      bytes for CRT sprintf
138    bytes for Bin2Hex
150    bytes for Bin2Hex2 cx
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

00345678        = eax dw2hex
12345678        = eax Hex$ Grincheux
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

--- ok ---

second run
Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz (SSE4)

4076    cycles for 100 * dw2hex
4403    cycles for 100 * Hex$ Grincheux
61353   cycles for 100 * CRT sprintf
691     cycles for 100 * Bin2Hex
795     cycles for 100 * Bin2Hex2 cx
1301    cycles for 100 * dwtoHex_Guga_SSE
1444    cycles for 100 * dwtoHex_Guga

4080    cycles for 100 * dw2hex
4401    cycles for 100 * Hex$ Grincheux
61223   cycles for 100 * CRT sprintf
708     cycles for 100 * Bin2Hex
800     cycles for 100 * Bin2Hex2 cx
1295    cycles for 100 * dwtoHex_Guga_SSE
1444    cycles for 100 * dwtoHex_Guga

4309    cycles for 100 * dw2hex
4419    cycles for 100 * Hex$ Grincheux
61311   cycles for 100 * CRT sprintf
707     cycles for 100 * Bin2Hex
812     cycles for 100 * Bin2Hex2 cx
1316    cycles for 100 * dwtoHex_Guga_SSE
1460    cycles for 100 * dwtoHex_Guga

4092    cycles for 100 * dw2hex
4413    cycles for 100 * Hex$ Grincheux
61683   cycles for 100 * CRT sprintf
708     cycles for 100 * Bin2Hex
811     cycles for 100 * Bin2Hex2 cx
1315    cycles for 100 * dwtoHex_Guga_SSE
1444    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
64      bytes for Hex$ Grincheux
29      bytes for CRT sprintf
138     bytes for Bin2Hex
150     bytes for Bin2Hex2 cx
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

00345678        = eax dw2hex
12345678        = eax Hex$ Grincheux
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

--- ok ---
Title: Re: benchmark for another Bin2Hex, pls
Post by: Gunther on July 28, 2023, 10:51:12 AM
guga,

here are my results:
2787    cycles for 100 * dw2hex
2400    cycles for 100 * Hex$ Grincheux
41084   cycles for 100 * CRT sprintf
505     cycles for 100 * Bin2Hex
534     cycles for 100 * Bin2Hex2 cx
616     cycles for 100 * dwtoHex_Guga_SSE
658     cycles for 100 * dwtoHex_Guga

2807    cycles for 100 * dw2hex
2454    cycles for 100 * Hex$ Grincheux
41195   cycles for 100 * CRT sprintf
501     cycles for 100 * Bin2Hex
502     cycles for 100 * Bin2Hex2 cx
612     cycles for 100 * dwtoHex_Guga_SSE
659     cycles for 100 * dwtoHex_Guga

2886    cycles for 100 * dw2hex
2294    cycles for 100 * Hex$ Grincheux
40487   cycles for 100 * CRT sprintf
506     cycles for 100 * Bin2Hex
505     cycles for 100 * Bin2Hex2 cx
666     cycles for 100 * dwtoHex_Guga_SSE
694     cycles for 100 * dwtoHex_Guga

2751    cycles for 100 * dw2hex
2453    cycles for 100 * Hex$ Grincheux
40054   cycles for 100 * CRT sprintf
506     cycles for 100 * Bin2Hex
522     cycles for 100 * Bin2Hex2 cx
625     cycles for 100 * dwtoHex_Guga_SSE
662     cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
64      bytes for Hex$ Grincheux
29      bytes for CRT sprintf
138     bytes for Bin2Hex
150     bytes for Bin2Hex2 cx
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

00345678        = eax dw2hex
12345678        = eax Hex$ Grincheux
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

--- ok ---

I hope this has been helpful to you.
Title: Re: benchmark for another Bin2Hex, pls
Post by: guga on July 28, 2023, 11:05:26 AM
Tks guys

I'm trying to see whether the algorithm is worth keeping with a huge table, but it seems that on Intel it loses speed somehow? The original version is closer to the results from Bin2Hex, but AMD seems to handle such huge tables better. Maybe huge tables like that are not worthwhile, because in the end we are talking about a speed difference of less than 30%-40%, which also seems to depend heavily on the processor?

Your results are a bit surprising to me. I thought Intel would deal better with such situations :dazzled: .

AMD Ryzen 5 2400G with Radeon Vega Graphics     (SSE4)

7155    cycles for 100 * dw2hex
2999    cycles for 100 * Hex$ Grincheux
46349   cycles for 100 * CRT sprintf
1017    cycles for 100 * Bin2Hex
854     cycles for 100 * Bin2Hex2 cx
655     cycles for 100 * dwtoHex_Guga_SSE
761     cycles for 100 * dwtoHex_Guga

7213    cycles for 100 * dw2hex
2883    cycles for 100 * Hex$ Grincheux
45477   cycles for 100 * CRT sprintf
863     cycles for 100 * Bin2Hex
893     cycles for 100 * Bin2Hex2 cx
655     cycles for 100 * dwtoHex_Guga_SSE
732     cycles for 100 * dwtoHex_Guga

7224    cycles for 100 * dw2hex
2890    cycles for 100 * Hex$ Grincheux
44463   cycles for 100 * CRT sprintf
866     cycles for 100 * Bin2Hex
815     cycles for 100 * Bin2Hex2 cx
693     cycles for 100 * dwtoHex_Guga_SSE
732     cycles for 100 * dwtoHex_Guga

7516    cycles for 100 * dw2hex
3078    cycles for 100 * Hex$ Grincheux
46002   cycles for 100 * CRT sprintf
929     cycles for 100 * Bin2Hex
880     cycles for 100 * Bin2Hex2 cx
661     cycles for 100 * dwtoHex_Guga_SSE
804     cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
64      bytes for Hex$ Grincheux
29      bytes for CRT sprintf
138     bytes for Bin2Hex
150     bytes for Bin2Hex2 cx
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

00345678        = eax dw2hex
12345678        = eax Hex$ Grincheux
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

--- ok ---
Title: Re: benchmark for another Bin2Hex, pls
Post by: jj2007 on July 28, 2023, 11:24:33 AM
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

3333    cycles for 100 * dw2hex
3600    cycles for 100 * Hex$ Grincheux
42239  cycles for 100 * CRT sprintf
540    cycles for 100 * Bin2Hex
629    cycles for 100 * Bin2Hex2 cx
736    cycles for 100 * dwtoHex_Guga_SSE
837    cycles for 100 * dwtoHex_Guga

3331    cycles for 100 * dw2hex
3600    cycles for 100 * Hex$ Grincheux
42094  cycles for 100 * CRT sprintf
565    cycles for 100 * Bin2Hex
621    cycles for 100 * Bin2Hex2 cx
732    cycles for 100 * dwtoHex_Guga_SSE
815    cycles for 100 * dwtoHex_Guga

3318    cycles for 100 * dw2hex
3602    cycles for 100 * Hex$ Grincheux
42300  cycles for 100 * CRT sprintf
555    cycles for 100 * Bin2Hex
621    cycles for 100 * Bin2Hex2 cx
829    cycles for 100 * dwtoHex_Guga_SSE
843    cycles for 100 * dwtoHex_Guga

3328    cycles for 100 * dw2hex
3603    cycles for 100 * Hex$ Grincheux
42090  cycles for 100 * CRT sprintf
539    cycles for 100 * Bin2Hex
621    cycles for 100 * Bin2Hex2 cx
732    cycles for 100 * dwtoHex_Guga_SSE
814    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
64      bytes for Hex$ Grincheux
29      bytes for CRT sprintf
138    bytes for Bin2Hex
150    bytes for Bin2Hex2 cx
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

00345678        = eax dw2hex
12345678        = eax Hex$ Grincheux
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

It's difficult to beat Bin2Hex (https://masm32.com/board/index.php?msg=52677) ;-)
Title: Re: benchmark for another Bin2Hex, pls
Post by: zedd151 on July 28, 2023, 11:30:57 AM
I have an excuse... my computer is old and tired, like me.  :tongue:
Seeing your results, guga, compared to mine... I feel a little less old, and a little less tired. ... at least partly... :biggrin:
Some fare better on your machine, others fare better on mine.
Title: Re: benchmark for another Bin2Hex, pls
Post by: guga on July 28, 2023, 11:40:19 AM
Quote from: jj2007 on July 28, 2023, 11:24:33 AM
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

3333    cycles for 100 * dw2hex
3600    cycles for 100 * Hex$ Grincheux
42239   cycles for 100 * CRT sprintf
540     cycles for 100 * Bin2Hex
629     cycles for 100 * Bin2Hex2 cx
736     cycles for 100 * dwtoHex_Guga_SSE
837     cycles for 100 * dwtoHex_Guga

3331    cycles for 100 * dw2hex
3600    cycles for 100 * Hex$ Grincheux
42094   cycles for 100 * CRT sprintf
565     cycles for 100 * Bin2Hex
621     cycles for 100 * Bin2Hex2 cx
732     cycles for 100 * dwtoHex_Guga_SSE
815     cycles for 100 * dwtoHex_Guga

3318    cycles for 100 * dw2hex
3602    cycles for 100 * Hex$ Grincheux
42300   cycles for 100 * CRT sprintf
555     cycles for 100 * Bin2Hex
621     cycles for 100 * Bin2Hex2 cx
829     cycles for 100 * dwtoHex_Guga_SSE
843     cycles for 100 * dwtoHex_Guga

3328    cycles for 100 * dw2hex
3603    cycles for 100 * Hex$ Grincheux
42090   cycles for 100 * CRT sprintf
539     cycles for 100 * Bin2Hex
621     cycles for 100 * Bin2Hex2 cx
732     cycles for 100 * dwtoHex_Guga_SSE
814     cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
64      bytes for Hex$ Grincheux
29      bytes for CRT sprintf
138     bytes for Bin2Hex
150     bytes for Bin2Hex2 cx
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

00345678        = eax dw2hex
12345678        = eax Hex$ Grincheux
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

Hi JJ. Tks...


Those results are a bit odd to me. It seems that on AMD dwtoHex_Guga_SSE is faster than Bin2Hex, and on Intel it's the opposite.

The difference between the two is not that big, but since this new version uses a huge table, I'll keep the good old Bin2Hex. The speed benefit (at least on my AMD) doesn't seem to compensate for the final size of the function: 245848 bytes (due to the table size).
Title: Re: benchmark for another Bin2Hex, pls
Post by: NoCforMe on July 28, 2023, 04:56:38 PM
OK, since we seem to be racing different binary-to-hex conversion routines, let me throw mine into the mix:

XlatTable    DB "0123456789ABCDEF"

;====================================
; Bin2Hex()
;
; On entry,
;    ECX--> buffer to write hex chars. to
;    EDX = value to convert
;====================================

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

Bin2Hex    PROC

    PUSH    EBX
    PUSH    EDI

    MOV    EBX, OFFSET XlatTable
    MOV    EDI, ECX
    MOV    CL, 24                ;# bits to shift right.
    MOV    CH, 4                ;# of bytes to convert.

hloop:    MOV    EAX, EDX            ;Original value.
    SHR    EAX, CL                ;Shift byte to right.
    MOV    AH, AL                ;Save byte.
    SHR    AL, 4                ;Get high nybble.
    XLATB
    STOSB
    MOV    AL, AH                ;Get back byte.
    AND    AL, 0FH                ;Get low nybble.
    XLATB
    STOSB
    SUB    CL, 8                ;Next byte over.
    DEC    CH
    JNZ    hloop

    MOV    BYTE PTR [EDI], 0        ;Terminate buffer.

    POP    EDI
    POP    EBX
    RET

Bin2Hex    ENDP

Anyone care to test it?

I did not design this at all with speed in mind (because I just don't care), but I'm curious nonetheless. I did eliminate all memory references inside the loop except for XLATB and STOSB.
Title: Re: benchmark for another Bin2Hex, pls
Post by: NoCforMe on July 28, 2023, 05:30:01 PM
My entry #2: longer table, shorter code:

XlatTable    DB "000102030405060708090A0B0C0D0E0F"
        DB "101112131415161718191A1B1C1D1E1F"
        DB "202122232425262728292A2B2C2D2E2F"
        DB "303132333435363738393A3B3C3D3E3F"
        DB "404142434445464748494A4B4C4D4E4F"
        DB "505152535455565758595A5B5C5D5E5F"
        DB "606162636465666768696A6B6C6D6E6F"
        DB "707172737475767778797A7B7C7D7E7F"
        DB "808182838485868788898A8B8C8D8E8F"
        DB "909192939495969798999A9B9C9D9E9F"
        DB "A0A1A2A3A4A5A6A7A8A9AAABACADAEAF"
        DB "B0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF"
        DB "C0C1C2C3C4C5C6C7C8C9CACBCCCDCECF"
        DB "D0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF"
        DB "E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEF"
        DB "F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF"

;====================================
; Bin2Hex()
;
; On entry,
;    ECX--> buffer to write hex chars. to
;    EDX = value to convert
;====================================

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

Bin2Hex    PROC

    PUSH    EBX
    PUSH    EDI

    MOV    EBX, OFFSET XlatTable
    MOV    EDI, ECX
    MOV    CL, 24                ;# bits to shift right.
    MOV    CH, 4                ;# of bytes to convert.

hloop:    MOV    EAX, EDX            ;Original value.
    SHR    EAX, CL                ;Shift byte to right.
    AND    EAX, 0FFH            ;Isolate that byte.
    MOV    AX, [EBX + EAX*2]        ;Translate byte.
    STOSW
    SUB    CL, 8                ;Next byte over.
    DEC    CH
    JNZ    hloop

    MOV    BYTE PTR [EDI], 0        ;Terminate buffer.

    POP    EDI
    POP    EBX
    RET

Bin2Hex    ENDP

Title: Re: benchmark for another Bin2Hex, pls
Post by: jj2007 on July 28, 2023, 08:00:25 PM
Quote from: guga on July 28, 2023, 11:40:19 AM
Those results are a bit odd to me. It seems that on AMD dwtoHex_Guga_SSE is faster than Bin2Hex, and on Intel it's the opposite.

The difference between the two is not that big, but since this new version uses a huge table, I'll keep the good old Bin2Hex. The speed benefit (at least on my AMD) doesn't seem to compensate for the final size of the function: 245848 bytes (due to the table size).

Bin2Hex is pretty fast, indeed, and I've just spent hours trying to make it faster, without success.

Have you thought of gathering instructions such as VGATHERDPS?

Btw if table size bothers you, look at crtHexTable3 in the source :cool:

New version with NoCforMe's #2:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

3341    cycles for 100 * dw2hex
1358    cycles for 100 * NoCforMe
540    cycles for 100 * Bin2Hex
739    cycles for 100 * dwtoHex_Guga_SSE
833    cycles for 100 * dwtoHex_Guga

3336    cycles for 100 * dw2hex
1270    cycles for 100 * NoCforMe
540    cycles for 100 * Bin2Hex
734    cycles for 100 * dwtoHex_Guga_SSE
789    cycles for 100 * dwtoHex_Guga

3336    cycles for 100 * dw2hex
1271    cycles for 100 * NoCforMe
540    cycles for 100 * Bin2Hex
732    cycles for 100 * dwtoHex_Guga_SSE
1413    cycles for 100 * dwtoHex_Guga

3363    cycles for 100 * dw2hex
1270    cycles for 100 * NoCforMe
540    cycles for 100 * Bin2Hex
731    cycles for 100 * dwtoHex_Guga_SSE
790    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572    bytes for NoCforMe
138    bytes for Bin2Hex
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga
Title: Re: benchmark for another Bin2Hex, pls
Post by: guga on July 28, 2023, 10:52:54 PM
Quote
Bin2Hex is pretty fast, indeed, and I've just spent hours trying to make it faster, without success.

Have you thought of gathering instructions such as VGATHERDPS?

Btw if table size bothers you, look at crtHexTable3 in the source :cool:

Hi JJ

Yeah, on Intel Bin2Hex is pretty fast. I'll stick with it. It seems unnecessary for me to reinvent the wheel, because my variation is only faster on AMD and uses a huge table for only 20-30% better performance. I could, however, make a variation of these functions that checks the processor, but it would definitely kill the general performance anyway.

About VGATHERDPS, I haven't implemented it yet. Is it available for x86 (32 bits)? Do you have an example using such instructions? I'll try to port it later when I have more free time.
Title: Re: benchmark for another Bin2Hex, pls
Post by: jj2007 on July 28, 2023, 11:09:15 PM
There is indeed quite a difference between Intel and AMD:

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

5494    cycles for 100 * dw2hex
1381    cycles for 100 * NoCforMe
675    cycles for 100 * Bin2Hex
496    cycles for 100 * dwtoHex_Guga_SSE
562    cycles for 100 * dwtoHex_Guga

5558    cycles for 100 * dw2hex
1425    cycles for 100 * NoCforMe
680    cycles for 100 * Bin2Hex
515    cycles for 100 * dwtoHex_Guga_SSE
564    cycles for 100 * dwtoHex_Guga

5536    cycles for 100 * dw2hex
1469    cycles for 100 * NoCforMe
676    cycles for 100 * Bin2Hex
466    cycles for 100 * dwtoHex_Guga_SSE
567    cycles for 100 * dwtoHex_Guga

5520    cycles for 100 * dw2hex
1438    cycles for 100 * NoCforMe
692    cycles for 100 * Bin2Hex
534    cycles for 100 * dwtoHex_Guga_SSE
579    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572    bytes for NoCforMe
138    bytes for Bin2Hex
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

Quote from: guga on July 28, 2023, 10:52:54 PM
About VGATHERDPS, I haven't implemented it yet. Is it available for x86 (32 bits)? Do you have an example using such instructions? I'll try to port it later when I have more free time.

No, I never tested it. The documentation is lousy :sad:
Title: Re: benchmark for another Bin2Hex, pls
Post by: Greenhorn on July 28, 2023, 11:34:56 PM
Another test result.

AMD Ryzen 7 3700X 8-Core Processor              (SSE4)

5736 cycles for 100 * dw2hex
2189 cycles for 100 * Hex$ Grincheux
29319 cycles for 100 * CRT sprintf
360 cycles for 100 * Bin2Hex
397 cycles for 100 * Bin2Hex2 cx
499 cycles for 100 * dwtoHex_Guga_SSE
523 cycles for 100 * dwtoHex_Guga

5731 cycles for 100 * dw2hex
2479 cycles for 100 * Hex$ Grincheux
29270 cycles for 100 * CRT sprintf
363 cycles for 100 * Bin2Hex
383 cycles for 100 * Bin2Hex2 cx
497 cycles for 100 * dwtoHex_Guga_SSE
525 cycles for 100 * dwtoHex_Guga

5741 cycles for 100 * dw2hex
2483 cycles for 100 * Hex$ Grincheux
29275 cycles for 100 * CRT sprintf
365 cycles for 100 * Bin2Hex
385 cycles for 100 * Bin2Hex2 cx
494 cycles for 100 * dwtoHex_Guga_SSE
524 cycles for 100 * dwtoHex_Guga

5732 cycles for 100 * dw2hex
2245 cycles for 100 * Hex$ Grincheux
29267 cycles for 100 * CRT sprintf
362 cycles for 100 * Bin2Hex
385 cycles for 100 * Bin2Hex2 cx
496 cycles for 100 * dwtoHex_Guga_SSE
523 cycles for 100 * dwtoHex_Guga

20 bytes for dw2hex
64 bytes for Hex$ Grincheux
29 bytes for CRT sprintf
138 bytes for Bin2Hex
150 bytes for Bin2Hex2 cx
245848 bytes for dwtoHex_Guga_SSE
76 bytes for dwtoHex_Guga

00345678 = eax dw2hex
12345678 = eax Hex$ Grincheux
345678 = eax CRT sprintf
12345678 = eax Bin2Hex
00345678 = eax Bin2Hex2 cx
12345678 = eax dwtoHex_Guga_SSE
12345678 = eax dwtoHex_Guga

--- ok ---
Title: Re: benchmark for another Bin2Hex, pls
Post by: guga on July 29, 2023, 03:07:02 AM
Tks Greenhorn

That's a bit weird. On your AMD processor, my version gives the same results as on Intel. But on my processor, the opposite seems to happen. I'm wondering what is causing such different results.

AMD Ryzen 5 2400G with Radeon Vega Graphics     (SSE4)

9124    cycles for 100 * dw2hex
2980    cycles for 100 * Hex$ Grincheux
51106   cycles for 100 * CRT sprintf
864     cycles for 100 * Bin2Hex
807     cycles for 100 * Bin2Hex2 cx
623     cycles for 100 * dwtoHex_Guga_SSE
713     cycles for 100 * dwtoHex_Guga

7591    cycles for 100 * dw2hex
4216    cycles for 100 * Hex$ Grincheux
47734   cycles for 100 * CRT sprintf
899     cycles for 100 * Bin2Hex
814     cycles for 100 * Bin2Hex2 cx
657     cycles for 100 * dwtoHex_Guga_SSE
716     cycles for 100 * dwtoHex_Guga

7496    cycles for 100 * dw2hex
5396    cycles for 100 * Hex$ Grincheux
55117   cycles for 100 * CRT sprintf
936     cycles for 100 * Bin2Hex
1289    cycles for 100 * Bin2Hex2 cx
858     cycles for 100 * dwtoHex_Guga_SSE
708     cycles for 100 * dwtoHex_Guga

8734    cycles for 100 * dw2hex
2919    cycles for 100 * Hex$ Grincheux
46808   cycles for 100 * CRT sprintf
900     cycles for 100 * Bin2Hex
807     cycles for 100 * Bin2Hex2 cx
860     cycles for 100 * dwtoHex_Guga_SSE
1015    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
64      bytes for Hex$ Grincheux
29      bytes for CRT sprintf
138     bytes for Bin2Hex
150     bytes for Bin2Hex2 cx
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

00345678        = eax dw2hex
12345678        = eax Hex$ Grincheux
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

--- ok ---
Title: Re: benchmark for another Bin2Hex, pls
Post by: guga on July 29, 2023, 03:09:31 AM
Quote from: jj2007 on July 28, 2023, 11:09:15 PM
No, I never tested it. The documentation is lousy :sad:

Hi JJ

I'll try to find some more info to see if it can be implemented on x86. I found a little information (for fasm as well) that may give an idea of where to start later.

https://board.flatassembler.net/topic.php?t=22502
https://www.felixcloutier.com/x86/vgatherdps:vgatherqps
https://stackoverflow.com/questions/21774454/how-are-the-gather-instructions-in-avx2-implemented
Title: Re: benchmark for another Bin2Hex, pls
Post by: zedd151 on July 29, 2023, 03:24:00 AM
With jj's testbed:
Intel(R) Core(TM)2 Duo CPU    E8400  @ 3.00GHz (SSE4)

3899    cycles for 100 * dw2hex
1692    cycles for 100 * NoCforMe
697    cycles for 100 * Bin2Hex
1224    cycles for 100 * dwtoHex_Guga_SSE
1374    cycles for 100 * dwtoHex_Guga

4106    cycles for 100 * dw2hex
1692    cycles for 100 * NoCforMe
701    cycles for 100 * Bin2Hex
1228    cycles for 100 * dwtoHex_Guga_SSE
1360    cycles for 100 * dwtoHex_Guga

4142    cycles for 100 * dw2hex
1689    cycles for 100 * NoCforMe
690    cycles for 100 * Bin2Hex
1220    cycles for 100 * dwtoHex_Guga_SSE
1357    cycles for 100 * dwtoHex_Guga

4129    cycles for 100 * dw2hex
1692    cycles for 100 * NoCforMe
690    cycles for 100 * Bin2Hex
1225    cycles for 100 * dwtoHex_Guga_SSE
1356    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572    bytes for NoCforMe
138    bytes for Bin2Hex
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

--- ok ---
Title: Re: benchmark for another Bin2Hex, pls
Post by: Greenhorn on July 29, 2023, 05:06:50 AM
Quote from: guga on July 29, 2023, 03:07:02 AM
Tks Greenhorn

That's a bit weird. On your AMD processor, my version gives the same results as on Intel. But on my processor, the opposite seems to happen. I'm wondering what is causing such different results.

I guess it's due to optimization of the Ryzen Arch with each generation.
Would be interesting to see how Ryzen 5000 and 7000 perform.
Title: Re: benchmark for another Bin2Hex, pls
Post by: fearless on July 29, 2023, 07:14:10 AM
Bin2HexTimingsNew2:

AMD Ryzen 9 5950X 16-Core Processor            (SSE4)

4671    cycles for 100 * dw2hex
1920    cycles for 100 * Hex$ Grincheux
22845  cycles for 100 * CRT sprintf
361    cycles for 100 * Bin2Hex
368    cycles for 100 * Bin2Hex2 cx
465    cycles for 100 * dwtoHex_Guga_SSE
475    cycles for 100 * dwtoHex_Guga

4671    cycles for 100 * dw2hex
1917    cycles for 100 * Hex$ Grincheux
23276  cycles for 100 * CRT sprintf
364    cycles for 100 * Bin2Hex
368    cycles for 100 * Bin2Hex2 cx
480    cycles for 100 * dwtoHex_Guga_SSE
454    cycles for 100 * dwtoHex_Guga

4698    cycles for 100 * dw2hex
1923    cycles for 100 * Hex$ Grincheux
22942  cycles for 100 * CRT sprintf
362    cycles for 100 * Bin2Hex
357    cycles for 100 * Bin2Hex2 cx
431    cycles for 100 * dwtoHex_Guga_SSE
438    cycles for 100 * dwtoHex_Guga

4682    cycles for 100 * dw2hex
1933    cycles for 100 * Hex$ Grincheux
23104  cycles for 100 * CRT sprintf
362    cycles for 100 * Bin2Hex
367    cycles for 100 * Bin2Hex2 cx
430    cycles for 100 * dwtoHex_Guga_SSE
439    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
64      bytes for Hex$ Grincheux
29      bytes for CRT sprintf
138    bytes for Bin2Hex
150    bytes for Bin2Hex2 cx
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

00345678        = eax dw2hex
12345678        = eax Hex$ Grincheux
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga



Bin2HexTimingsNew2JJ:

AMD Ryzen 9 5950X 16-Core Processor            (SSE4)

4650    cycles for 100 * dw2hex
1359    cycles for 100 * NoCforMe
355    cycles for 100 * Bin2Hex
445    cycles for 100 * dwtoHex_Guga_SSE
451    cycles for 100 * dwtoHex_Guga

4657    cycles for 100 * dw2hex
1385    cycles for 100 * NoCforMe
355    cycles for 100 * Bin2Hex
485    cycles for 100 * dwtoHex_Guga_SSE
473    cycles for 100 * dwtoHex_Guga

4659    cycles for 100 * dw2hex
1375    cycles for 100 * NoCforMe
355    cycles for 100 * Bin2Hex
434    cycles for 100 * dwtoHex_Guga_SSE
436    cycles for 100 * dwtoHex_Guga

4660    cycles for 100 * dw2hex
1376    cycles for 100 * NoCforMe
355    cycles for 100 * Bin2Hex
466    cycles for 100 * dwtoHex_Guga_SSE
437    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572    bytes for NoCforMe
138    bytes for Bin2Hex
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga
Title: Re: benchmark for another Bin2Hex, pls
Post by: jj2007 on July 29, 2023, 12:46:35 PM
Quote from: guga on July 29, 2023, 03:09:31 AM
Quote from: jj2007 on July 28, 2023, 11:09:15 PM
No, I never tested it. The documentation is lousy :sad:

Hi JJ

I'll try to find some more info to see if it can be implemented on x86. I found a little information (for fasm as well) that may give an idea of where to start later.

https://board.flatassembler.net/topic.php?t=22502
https://www.felixcloutier.com/x86/vgatherdps:vgatherqps
https://stackoverflow.com/questions/21774454/how-are-the-gather-instructions-in-avx2-implemented

At least, now I got the syntax right. This assembles with UAsm64:

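  mov eax, offset sometext            ; base address of the data to gather from
  int 3                               ; breakpoint, to inspect the result in a debugger
  nop
  vpgatherdd xmm0, [eax+xmm1], xmm2   ; xmm1 = four dword indices (no scale, so byte offsets), xmm2 = mask (cleared by the instruction)
  nop
  movd edx, xmm0                      ; lowest gathered dword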
Title: Re: benchmark for another Bin2Hex, pls
Post by: TimoVJL on July 29, 2023, 10:22:37 PM
AMD Ryzen 5 3400G with Radeon Vega Graphics     (SSE4)

6854    cycles for 100 * dw2hex
2959    cycles for 100 * Hex$ Grincheux
44193   cycles for 100 * CRT sprintf
911     cycles for 100 * Bin2Hex
780     cycles for 100 * Bin2Hex2 cx
620     cycles for 100 * dwtoHex_Guga_SSE
691     cycles for 100 * dwtoHex_Guga

6836    cycles for 100 * dw2hex
2805    cycles for 100 * Hex$ Grincheux
43706   cycles for 100 * CRT sprintf
834     cycles for 100 * Bin2Hex
778     cycles for 100 * Bin2Hex2 cx
621     cycles for 100 * dwtoHex_Guga_SSE
692     cycles for 100 * dwtoHex_Guga

7309    cycles for 100 * dw2hex
2890    cycles for 100 * Hex$ Grincheux
45566   cycles for 100 * CRT sprintf
837     cycles for 100 * Bin2Hex
849     cycles for 100 * Bin2Hex2 cx
625     cycles for 100 * dwtoHex_Guga_SSE
694     cycles for 100 * dwtoHex_Guga

7069    cycles for 100 * dw2hex
2887    cycles for 100 * Hex$ Grincheux
45794   cycles for 100 * CRT sprintf
835     cycles for 100 * Bin2Hex
923     cycles for 100 * Bin2Hex2 cx
626     cycles for 100 * dwtoHex_Guga_SSE
689     cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
64      bytes for Hex$ Grincheux
29      bytes for CRT sprintf
138     bytes for Bin2Hex
150     bytes for Bin2Hex2 cx
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

00345678        = eax dw2hex
12345678        = eax Hex$ Grincheux
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

-
Title: Re: benchmark for another Bin2Hex, pls
Post by: jj2007 on July 30, 2023, 06:06:35 AM
One more, please...

There is one AVX2 algo. If you run it on an older CPU, it will simply output "No Avx2" (I hope that works...)

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

5509    cycles for 100 * dw2hex
1372    cycles for 100 * NoCforMe
662    cycles for 100 * Bin2Hex
1082    cycles for 100 * Bin2Hex Avx2
653    cycles for 100 * dwtoHex_Guga_SSE
546    cycles for 100 * dwtoHex_Guga

5480    cycles for 100 * dw2hex
1373    cycles for 100 * NoCforMe
676    cycles for 100 * Bin2Hex
1074    cycles for 100 * Bin2Hex Avx2
495    cycles for 100 * dwtoHex_Guga_SSE
546    cycles for 100 * dwtoHex_Guga

5481    cycles for 100 * dw2hex
1381    cycles for 100 * NoCforMe
702    cycles for 100 * Bin2Hex
1085    cycles for 100 * Bin2Hex Avx2
495    cycles for 100 * dwtoHex_Guga_SSE
546    cycles for 100 * dwtoHex_Guga

5436    cycles for 100 * dw2hex
1520    cycles for 100 * NoCforMe
660    cycles for 100 * Bin2Hex
1074    cycles for 100 * Bin2Hex Avx2
495    cycles for 100 * dwtoHex_Guga_SSE
573    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572    bytes for NoCforMe
138    bytes for Bin2Hex
198    bytes for Bin2Hex Avx2
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
12345678        = eax Bin2Hex Avx2
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga
Title: Re: benchmark for another Bin2Hex, pls
Post by: zedd151 on July 30, 2023, 06:30:50 AM
Bin2HexTimingsNewG:
Intel(R) Core(TM)2 Duo CPU    E8400  @ 3.00GHz (SSE4)

4111    cycles for 100 * dw2hex
1678    cycles for 100 * NoCforMe
1236    cycles for 100 * Bin2Hex
23921  cycles for 100 * Bin2Hex Avx2
1243    cycles for 100 * dwtoHex_Guga_SSE
1360    cycles for 100 * dwtoHex_Guga

4067    cycles for 100 * dw2hex
1678    cycles for 100 * NoCforMe
1242    cycles for 100 * Bin2Hex
23994  cycles for 100 * Bin2Hex Avx2
1294    cycles for 100 * dwtoHex_Guga_SSE
1334    cycles for 100 * dwtoHex_Guga

4079    cycles for 100 * dw2hex
1696    cycles for 100 * NoCforMe
1131    cycles for 100 * Bin2Hex
23931  cycles for 100 * Bin2Hex Avx2
1243    cycles for 100 * dwtoHex_Guga_SSE
1342    cycles for 100 * dwtoHex_Guga

4070    cycles for 100 * dw2hex
1678    cycles for 100 * NoCforMe
1236    cycles for 100 * Bin2Hex
23971  cycles for 100 * Bin2Hex Avx2
1239    cycles for 100 * dwtoHex_Guga_SSE
1364    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572    bytes for NoCforMe
138    bytes for Bin2Hex
198    bytes for Bin2Hex Avx2
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
No Avx2 = eax Bin2Hex Avx2
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

--- ok ---

No Avx2 = eax Bin2Hex Avx2  :tongue:
Title: Re: benchmark for another Bin2Hex, pls
Post by: fearless on July 30, 2023, 06:42:09 AM
Bin2HexTimingsNewG:

AMD Ryzen 9 5950X 16-Core Processor            (SSE4)

4677    cycles for 100 * dw2hex
1368    cycles for 100 * NoCforMe
356    cycles for 100 * Bin2Hex
18422  cycles for 100 * Bin2Hex Avx2
442    cycles for 100 * dwtoHex_Guga_SSE
436    cycles for 100 * dwtoHex_Guga

4670    cycles for 100 * dw2hex
1374    cycles for 100 * NoCforMe
361    cycles for 100 * Bin2Hex
18377  cycles for 100 * Bin2Hex Avx2
428    cycles for 100 * dwtoHex_Guga_SSE
442    cycles for 100 * dwtoHex_Guga

4703    cycles for 100 * dw2hex
1370    cycles for 100 * NoCforMe
355    cycles for 100 * Bin2Hex
18341  cycles for 100 * Bin2Hex Avx2
432    cycles for 100 * dwtoHex_Guga_SSE
436    cycles for 100 * dwtoHex_Guga

4731    cycles for 100 * dw2hex
1373    cycles for 100 * NoCforMe
390    cycles for 100 * Bin2Hex
18299  cycles for 100 * Bin2Hex Avx2
464    cycles for 100 * dwtoHex_Guga_SSE
439    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572    bytes for NoCforMe
138    bytes for Bin2Hex
198    bytes for Bin2Hex Avx2
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
12345678        = eax Bin2Hex Avx2
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga
Title: Re: benchmark for another Bin2Hex, pls
Post by: jj2007 on July 30, 2023, 07:09:42 AM
OK, there was a little glitch. A corrected version is attached; it even builds without MasmBasic.

As you can see, the Avx2 version is not particularly fast. However, it was fun to code it, and I almost understood one of the exotic gather instructions ;-)

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

5513    cycles for 100 * dw2hex
1383    cycles for 100 * NoCforMe
674     cycles for 100 * Bin2Hex
1077    cycles for 100 * Bin2Hex Avx2
493     cycles for 100 * dwtoHex_Guga_SSE
565     cycles for 100 * dwtoHex_Guga

5470    cycles for 100 * dw2hex
1377    cycles for 100 * NoCforMe
672     cycles for 100 * Bin2Hex
1075    cycles for 100 * Bin2Hex Avx2
501     cycles for 100 * dwtoHex_Guga_SSE
549     cycles for 100 * dwtoHex_Guga

5439    cycles for 100 * dw2hex
1375    cycles for 100 * NoCforMe
668     cycles for 100 * Bin2Hex
1077    cycles for 100 * Bin2Hex Avx2
490     cycles for 100 * dwtoHex_Guga_SSE
551     cycles for 100 * dwtoHex_Guga

5543    cycles for 100 * dw2hex
1377    cycles for 100 * NoCforMe
716     cycles for 100 * Bin2Hex
1088    cycles for 100 * Bin2Hex Avx2
491     cycles for 100 * dwtoHex_Guga_SSE
555     cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572     bytes for NoCforMe
138     bytes for Bin2Hex
222     bytes for Bin2Hex Avx2
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
12345678        = eax Bin2Hex Avx2
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

Title: Re: benchmark for another Bin2Hex, pls
Post by: fearless on July 30, 2023, 08:04:19 AM
AMD Ryzen 9 5950X 16-Core Processor            (SSE4)

4713    cycles for 100 * dw2hex
1381    cycles for 100 * NoCforMe
358    cycles for 100 * Bin2Hex
18324  cycles for 100 * Bin2Hex Avx2
433    cycles for 100 * dwtoHex_Guga_SSE
452    cycles for 100 * dwtoHex_Guga

4699    cycles for 100 * dw2hex
1376    cycles for 100 * NoCforMe
357    cycles for 100 * Bin2Hex
18193  cycles for 100 * Bin2Hex Avx2
432    cycles for 100 * dwtoHex_Guga_SSE
438    cycles for 100 * dwtoHex_Guga

4716    cycles for 100 * dw2hex
1380    cycles for 100 * NoCforMe
357    cycles for 100 * Bin2Hex
18365  cycles for 100 * Bin2Hex Avx2
430    cycles for 100 * dwtoHex_Guga_SSE
438    cycles for 100 * dwtoHex_Guga

4695    cycles for 100 * dw2hex
1376    cycles for 100 * NoCforMe
357    cycles for 100 * Bin2Hex
18265  cycles for 100 * Bin2Hex Avx2
428    cycles for 100 * dwtoHex_Guga_SSE
450    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572    bytes for NoCforMe
138    bytes for Bin2Hex
222    bytes for Bin2Hex Avx2
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
12345678        = eax Bin2Hex Avx2
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga
Title: Re: benchmark for another Bin2Hex, pls
Post by: jj2007 on July 30, 2023, 08:10:32 AM
Wow, this is weird - and it's not an old CPU (https://www.cpubenchmark.net/cpu.php?cpu=AMD+Ryzen+9+5950X&id=3862) :rolleyes:

358    cycles for 100 * Bin2Hex
18324  cycles for 100 * Bin2Hex Avx2
Title: Re: benchmark for another Bin2Hex, pls
Post by: zedd151 on July 30, 2023, 08:12:24 AM
Bin2HexTimingsM32
Intel(R) Core(TM)2 Duo CPU    E8400  @ 3.00GHz (SSE4)

4101    cycles for 100 * dw2hex
1689    cycles for 100 * NoCforMe
698    cycles for 100 * Bin2Hex
1187    cycles for 100 * Bin2Hex Avx2
1223    cycles for 100 * dwtoHex_Guga_SSE
1357    cycles for 100 * dwtoHex_Guga

4099    cycles for 100 * dw2hex
1689    cycles for 100 * NoCforMe
691    cycles for 100 * Bin2Hex
1187    cycles for 100 * Bin2Hex Avx2
1218    cycles for 100 * dwtoHex_Guga_SSE
1356    cycles for 100 * dwtoHex_Guga

4101    cycles for 100 * dw2hex
1689    cycles for 100 * NoCforMe
782    cycles for 100 * Bin2Hex
1191    cycles for 100 * Bin2Hex Avx2
1219    cycles for 100 * dwtoHex_Guga_SSE
1358    cycles for 100 * dwtoHex_Guga

4093    cycles for 100 * dw2hex
1701    cycles for 100 * NoCforMe
714    cycles for 100 * Bin2Hex
1187    cycles for 100 * Bin2Hex Avx2
1218    cycles for 100 * dwtoHex_Guga_SSE
1351    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572    bytes for NoCforMe
138    bytes for Bin2Hex
222    bytes for Bin2Hex Avx2
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
No Avx2!        = eax Bin2Hex Avx2
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

--- ok ---
Title: Re: benchmark for another Bin2Hex, pls
Post by: TimoVJL on July 30, 2023, 03:05:17 PM
Quote from: jj2007 on July 30, 2023, 08:10:32 AM
Wow, this is weird - and it's not an old CPU (https://www.cpubenchmark.net/cpu.php?cpu=AMD+Ryzen+9+5950X&id=3862) :rolleyes:

358    cycles for 100 * Bin2Hex
18324  cycles for 100 * Bin2Hex Avx2
Maybe the AVX2 module powers up on demand.
Title: Re: benchmark for another Bin2Hex, pls
Post by: TimoVJL on July 31, 2023, 05:18:29 PM
AMD Ryzen 5 3400G with Radeon Vega Graphics     (SSE4)

6837    cycles for 100 * dw2hex
1796    cycles for 100 * NoCforMe
811     cycles for 100 * Bin2Hex
1406    cycles for 100 * Bin2Hex Avx2
602     cycles for 100 * dwtoHex_Guga_SSE
675     cycles for 100 * dwtoHex_Guga

6862    cycles for 100 * dw2hex
1729    cycles for 100 * NoCforMe
813     cycles for 100 * Bin2Hex
1406    cycles for 100 * Bin2Hex Avx2
597     cycles for 100 * dwtoHex_Guga_SSE
740     cycles for 100 * dwtoHex_Guga

6835    cycles for 100 * dw2hex
1709    cycles for 100 * NoCforMe
885     cycles for 100 * Bin2Hex
1419    cycles for 100 * Bin2Hex Avx2
604     cycles for 100 * dwtoHex_Guga_SSE
706     cycles for 100 * dwtoHex_Guga

6841    cycles for 100 * dw2hex
1792    cycles for 100 * NoCforMe
853     cycles for 100 * Bin2Hex
1323    cycles for 100 * Bin2Hex Avx2
608     cycles for 100 * dwtoHex_Guga_SSE
669     cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572     bytes for NoCforMe
138     bytes for Bin2Hex
222     bytes for Bin2Hex Avx2
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
12345678        = eax Bin2Hex Avx2
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

-
Title: Re: benchmark for another Bin2Hex, pls
Post by: zedd151 on July 31, 2023, 05:22:15 PM
Quote from: TimoVJL on July 30, 2023, 03:05:17 PM
Quote from: jj2007 on July 30, 2023, 08:10:32 AM
Wow, this is weird - and it's not an old CPU (https://www.cpubenchmark.net/cpu.php?cpu=AMD+Ryzen+9+5950X&id=3862) :rolleyes:

358    cycles for 100 * Bin2Hex
18324  cycles for 100 * Bin2Hex Avx2
Maybe the AVX2 module powers up on demand.

I'm wondering how the program is calculating cycles on my machine, since I apparently have no AVX2. Looking at the cycle counts, my computer is doing something to get those results... or is that just from running the 'overhead' code? guga? jj2007?
Title: Re: benchmark for another Bin2Hex, pls
Post by: jj2007 on July 31, 2023, 06:44:45 PM
Quote: I'm wondering how the program is calculating cycles on my machine, since I apparently have no AVX2. Looking at the cycle counts, my computer is doing something to get those results... or is that just from running the 'overhead' code? guga? jj2007?

Have a look at the source ;-)

  ; fall through
MbHxA2 proc
  mov edx, offset MbHexTable    ; edx points to the hex lookup table
  cmp byte ptr [edx], "0"       ; does the table already start with "0"?
  jne MbHxCT                    ; no: create the table first
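
The fragment above appears to test whether the lookup table's first byte is already "0" and to jump to a table-building routine if it is not. A minimal C sketch of that lazy-initialization idea follows. It is an illustration only, not a translation of the MasmBasic source: `hex_table`, `build_hex_table` and `bin2hex_lazy` are hypothetical names, and the table layout (256 entries of two characters) is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 256 entries of two hex characters each; zero-filled until first use. */
static char hex_table[256][2];

/* Fill the table: entry i holds the two uppercase hex digits of byte i. */
void build_hex_table(void)
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 0; i < 256; i++) {
        hex_table[i][0] = digits[i >> 4];
        hex_table[i][1] = digits[i & 15];
    }
}

/* Convert a 32-bit value to 8 hex digits plus a terminating NUL. */
void bin2hex_lazy(uint32_t value, char out[9])
{
    assert(out != NULL);
    /* Same kind of test as in the assembly: a built table starts with '0'. */
    if (hex_table[0][0] != '0')
        build_hex_table();
    for (int i = 0; i < 4; i++) {
        unsigned byte = (unsigned)(value >> (24 - 8 * i)) & 0xFFu;
        memcpy(out + 2 * i, hex_table[byte], 2);  /* two digits per input byte */
    }
    out[8] = '\0';
}
```

Only the first call pays for building the table; every later call goes straight to the four lookups.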
Title: Re: benchmark for another Bin2Hex, pls
Post by: Greenhorn on July 31, 2023, 07:23:04 PM
Quote from: TimoVJL on July 31, 2023, 05:18:29 PM
AMD Ryzen 5 3400G with Radeon Vega Graphics     (SSE4)

...
1406    cycles for 100 * Bin2Hex Avx2
...

How did you achieve this result, Timo?
Title: Re: benchmark for another Bin2Hex, pls
Post by: jj2007 on July 31, 2023, 08:06:35 PM
Maybe it's time to close this thread - over 100x faster than the CRT (>170x on AMD) should be enough, right?
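
The ratios can be checked from the timings in this post with a small C helper; `speedup` is a hypothetical name used only for this sketch. It divides the CRT sprintf cycle count by the Bin2Hex JJ cycle count from the same run.

```c
#include <assert.h>

/* Speedup of a routine relative to a baseline, both given in cycles. */
double speedup(double baseline_cycles, double routine_cycles)
{
    assert(routine_cycles > 0.0);
    return baseline_cycles / routine_cycles;
}
```

With the first Intel run, `speedup(52270, 443)` is about 118. With the last AMD run, `speedup(47613, 273)` is about 174, while the first AMD run (47836 against 287 cycles) comes out slightly lower at about 167.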

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

3363    cycles for 100 * dw2hex (Masm32 SDK)
1255    cycles for 100 * NoCforMe
52270   cycles for 100 * CRT sprintf
525     cycles for 100 * Bin2Hex
1900    cycles for 100 * Bin2Hex Avx2
443     cycles for 100 * Bin2Hex JJ
782     cycles for 100 * dwtoHex_Guga_SSE
818     cycles for 100 * dwtoHex_Guga

3339    cycles for 100 * dw2hex (Masm32 SDK)
1255    cycles for 100 * NoCforMe
52685   cycles for 100 * CRT sprintf
525     cycles for 100 * Bin2Hex
1900    cycles for 100 * Bin2Hex Avx2
443     cycles for 100 * Bin2Hex JJ
780     cycles for 100 * dwtoHex_Guga_SSE
817     cycles for 100 * dwtoHex_Guga

3331    cycles for 100 * dw2hex (Masm32 SDK)
1254    cycles for 100 * NoCforMe
52593   cycles for 100 * CRT sprintf
525     cycles for 100 * Bin2Hex
1899    cycles for 100 * Bin2Hex Avx2
443     cycles for 100 * Bin2Hex JJ
781     cycles for 100 * dwtoHex_Guga_SSE
816     cycles for 100 * dwtoHex_Guga

3332    cycles for 100 * dw2hex (Masm32 SDK)
1255    cycles for 100 * NoCforMe
52551   cycles for 100 * CRT sprintf
525     cycles for 100 * Bin2Hex
1899    cycles for 100 * Bin2Hex Avx2
443     cycles for 100 * Bin2Hex JJ
772     cycles for 100 * dwtoHex_Guga_SSE
818     cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex (Masm32 SDK)
572     bytes for NoCforMe
29      bytes for CRT sprintf
130     bytes for Bin2Hex
182     bytes for Bin2Hex Avx2
168     bytes for Bin2Hex JJ
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex (Masm32 SDK)
12345678        = eax NoCforMe
12345678        = eax CRT sprintf
12345678        = eax Bin2Hex
No Avx2!        = eax Bin2Hex Avx2
12345678        = eax Bin2Hex JJ
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)
5536    cycles for 100 * dw2hex (Masm32 SDK)
1390    cycles for 100 * NoCforMe
47836   cycles for 100 * CRT sprintf
680     cycles for 100 * Bin2Hex
1085    cycles for 100 * Bin2Hex Avx2
287     cycles for 100 * Bin2Hex JJ
493     cycles for 100 * dwtoHex_Guga_SSE
603     cycles for 100 * dwtoHex_Guga

5470    cycles for 100 * dw2hex (Masm32 SDK)
1405    cycles for 100 * NoCforMe
48769   cycles for 100 * CRT sprintf
671     cycles for 100 * Bin2Hex
1074    cycles for 100 * Bin2Hex Avx2
287     cycles for 100 * Bin2Hex JJ
494     cycles for 100 * dwtoHex_Guga_SSE
547     cycles for 100 * dwtoHex_Guga

5461    cycles for 100 * dw2hex (Masm32 SDK)
1428    cycles for 100 * NoCforMe
48496   cycles for 100 * CRT sprintf
669     cycles for 100 * Bin2Hex
1085    cycles for 100 * Bin2Hex Avx2
273     cycles for 100 * Bin2Hex JJ
490     cycles for 100 * dwtoHex_Guga_SSE
547     cycles for 100 * dwtoHex_Guga

5452    cycles for 100 * dw2hex (Masm32 SDK)
1413    cycles for 100 * NoCforMe
47613   cycles for 100 * CRT sprintf
668     cycles for 100 * Bin2Hex
1098    cycles for 100 * Bin2Hex Avx2
273     cycles for 100 * Bin2Hex JJ
490     cycles for 100 * dwtoHex_Guga_SSE
578     cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex (Masm32 SDK)
572     bytes for NoCforMe
29      bytes for CRT sprintf
130     bytes for Bin2Hex
182     bytes for Bin2Hex Avx2
168     bytes for Bin2Hex JJ
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex (Masm32 SDK)
12345678        = eax NoCforMe
12345678        = eax CRT sprintf
12345678        = eax Bin2Hex
12345678        = eax Bin2Hex Avx2
12345678        = eax Bin2Hex JJ
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga
Title: Re: benchmark for another Bin2Hex, pls
Post by: fearless on July 31, 2023, 08:52:58 PM
AMD Ryzen 9 5950X 16-Core Processor            (SSE4)

4696    cycles for 100 * dw2hex (Masm32 SDK)
1379    cycles for 100 * NoCforMe
26732  cycles for 100 * CRT sprintf
362    cycles for 100 * Bin2Hex
18474  cycles for 100 * Bin2Hex Avx2
587    cycles for 100 * Bin2Hex JJ
428    cycles for 100 * dwtoHex_Guga_SSE
439    cycles for 100 * dwtoHex_Guga

4699    cycles for 100 * dw2hex (Masm32 SDK)
1382    cycles for 100 * NoCforMe
26661  cycles for 100 * CRT sprintf
360    cycles for 100 * Bin2Hex
18560  cycles for 100 * Bin2Hex Avx2
284    cycles for 100 * Bin2Hex JJ
428    cycles for 100 * dwtoHex_Guga_SSE
440    cycles for 100 * dwtoHex_Guga

4708    cycles for 100 * dw2hex (Masm32 SDK)
1380    cycles for 100 * NoCforMe
26639  cycles for 100 * CRT sprintf
360    cycles for 100 * Bin2Hex
18588  cycles for 100 * Bin2Hex Avx2
284    cycles for 100 * Bin2Hex JJ
429    cycles for 100 * dwtoHex_Guga_SSE
439    cycles for 100 * dwtoHex_Guga

4699    cycles for 100 * dw2hex (Masm32 SDK)
1378    cycles for 100 * NoCforMe
26673  cycles for 100 * CRT sprintf
366    cycles for 100 * Bin2Hex
18880  cycles for 100 * Bin2Hex Avx2
289    cycles for 100 * Bin2Hex JJ
430    cycles for 100 * dwtoHex_Guga_SSE
440    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex (Masm32 SDK)
572    bytes for NoCforMe
29      bytes for CRT sprintf
130    bytes for Bin2Hex
182    bytes for Bin2Hex Avx2
168    bytes for Bin2Hex JJ
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex (Masm32 SDK)
12345678        = eax NoCforMe
12345678        = eax CRT sprintf
12345678        = eax Bin2Hex
12345678        = eax Bin2Hex Avx2
12345678        = eax Bin2Hex JJ
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga