benchmark for another Bin2Hex, pls

Started by guga, July 28, 2023, 10:32:41 AM

guga

Hi guys,

Can someone test this Bin2Hex I adapted from an old one?

I rebuilt two of them to test:
dwtoHex_Guga_SSE
FastHex (similar to the above, but without SSE; I didn't change the original name, I just put my code into it)

Both use a huge hexadecimal table for my tests.

Note: Despite the speed, the downside is that the function is huge, because of the table.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

zedd151

Intel(R) Core(TM)2 Duo CPU    E8400  @ 3.00GHz (SSE4)

4149    cycles for 100 * dw2hex
4401    cycles for 100 * Hex$ Grincheux
61610  cycles for 100 * CRT sprintf
685    cycles for 100 * Bin2Hex
785    cycles for 100 * Bin2Hex2 cx
1286    cycles for 100 * dwtoHex_Guga_SSE
1434    cycles for 100 * dwtoHex_Guga

4070    cycles for 100 * dw2hex
4397    cycles for 100 * Hex$ Grincheux
61441  cycles for 100 * CRT sprintf
698    cycles for 100 * Bin2Hex
789    cycles for 100 * Bin2Hex2 cx
1285    cycles for 100 * dwtoHex_Guga_SSE
1436    cycles for 100 * dwtoHex_Guga

4073    cycles for 100 * dw2hex
4391    cycles for 100 * Hex$ Grincheux
61415  cycles for 100 * CRT sprintf
681    cycles for 100 * Bin2Hex
786    cycles for 100 * Bin2Hex2 cx
1287    cycles for 100 * dwtoHex_Guga_SSE
1434    cycles for 100 * dwtoHex_Guga

4070    cycles for 100 * dw2hex
4392    cycles for 100 * Hex$ Grincheux
61459  cycles for 100 * CRT sprintf
683    cycles for 100 * Bin2Hex
785    cycles for 100 * Bin2Hex2 cx
1289    cycles for 100 * dwtoHex_Guga_SSE
1437    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
64      bytes for Hex$ Grincheux
29      bytes for CRT sprintf
138    bytes for Bin2Hex
150    bytes for Bin2Hex2 cx
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

00345678        = eax dw2hex
12345678        = eax Hex$ Grincheux
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

--- ok ---

second run
Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz (SSE4)

4076    cycles for 100 * dw2hex
4403    cycles for 100 * Hex$ Grincheux
61353   cycles for 100 * CRT sprintf
691     cycles for 100 * Bin2Hex
795     cycles for 100 * Bin2Hex2 cx
1301    cycles for 100 * dwtoHex_Guga_SSE
1444    cycles for 100 * dwtoHex_Guga

4080    cycles for 100 * dw2hex
4401    cycles for 100 * Hex$ Grincheux
61223   cycles for 100 * CRT sprintf
708     cycles for 100 * Bin2Hex
800     cycles for 100 * Bin2Hex2 cx
1295    cycles for 100 * dwtoHex_Guga_SSE
1444    cycles for 100 * dwtoHex_Guga

4309    cycles for 100 * dw2hex
4419    cycles for 100 * Hex$ Grincheux
61311   cycles for 100 * CRT sprintf
707     cycles for 100 * Bin2Hex
812     cycles for 100 * Bin2Hex2 cx
1316    cycles for 100 * dwtoHex_Guga_SSE
1460    cycles for 100 * dwtoHex_Guga

4092    cycles for 100 * dw2hex
4413    cycles for 100 * Hex$ Grincheux
61683   cycles for 100 * CRT sprintf
708     cycles for 100 * Bin2Hex
811     cycles for 100 * Bin2Hex2 cx
1315    cycles for 100 * dwtoHex_Guga_SSE
1444    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
64      bytes for Hex$ Grincheux
29      bytes for CRT sprintf
138     bytes for Bin2Hex
150     bytes for Bin2Hex2 cx
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

00345678        = eax dw2hex
12345678        = eax Hex$ Grincheux
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

--- ok ---

Gunther

guga,

here are my results:
2787    cycles for 100 * dw2hex
2400    cycles for 100 * Hex$ Grincheux
41084   cycles for 100 * CRT sprintf
505     cycles for 100 * Bin2Hex
534     cycles for 100 * Bin2Hex2 cx
616     cycles for 100 * dwtoHex_Guga_SSE
658     cycles for 100 * dwtoHex_Guga

2807    cycles for 100 * dw2hex
2454    cycles for 100 * Hex$ Grincheux
41195   cycles for 100 * CRT sprintf
501     cycles for 100 * Bin2Hex
502     cycles for 100 * Bin2Hex2 cx
612     cycles for 100 * dwtoHex_Guga_SSE
659     cycles for 100 * dwtoHex_Guga

2886    cycles for 100 * dw2hex
2294    cycles for 100 * Hex$ Grincheux
40487   cycles for 100 * CRT sprintf
506     cycles for 100 * Bin2Hex
505     cycles for 100 * Bin2Hex2 cx
666     cycles for 100 * dwtoHex_Guga_SSE
694     cycles for 100 * dwtoHex_Guga

2751    cycles for 100 * dw2hex
2453    cycles for 100 * Hex$ Grincheux
40054   cycles for 100 * CRT sprintf
506     cycles for 100 * Bin2Hex
522     cycles for 100 * Bin2Hex2 cx
625     cycles for 100 * dwtoHex_Guga_SSE
662     cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
64      bytes for Hex$ Grincheux
29      bytes for CRT sprintf
138     bytes for Bin2Hex
150     bytes for Bin2Hex2 cx
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

00345678        = eax dw2hex
12345678        = eax Hex$ Grincheux
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

--- ok ---

I hope this has been helpful to you.
You have to know the facts before you can distort them.

guga

Thanks, guys.

I'm trying to see whether the algorithm is worth keeping with a huge table, but it seems to lose speed on Intel somehow. The original version is closer to the Bin2Hex results, but AMD seems to handle such huge tables better. Maybe huge tables like that are not worthwhile, because in the end we are talking about a speed difference of less than 30%-40%, which also seems to depend heavily on the processor.

Your results are a bit surprising to me. I thought Intel would deal better with such situations :dazzled:.

AMD Ryzen 5 2400G with Radeon Vega Graphics     (SSE4)

7155    cycles for 100 * dw2hex
2999    cycles for 100 * Hex$ Grincheux
46349   cycles for 100 * CRT sprintf
1017    cycles for 100 * Bin2Hex
854     cycles for 100 * Bin2Hex2 cx
655     cycles for 100 * dwtoHex_Guga_SSE
761     cycles for 100 * dwtoHex_Guga

7213    cycles for 100 * dw2hex
2883    cycles for 100 * Hex$ Grincheux
45477   cycles for 100 * CRT sprintf
863     cycles for 100 * Bin2Hex
893     cycles for 100 * Bin2Hex2 cx
655     cycles for 100 * dwtoHex_Guga_SSE
732     cycles for 100 * dwtoHex_Guga

7224    cycles for 100 * dw2hex
2890    cycles for 100 * Hex$ Grincheux
44463   cycles for 100 * CRT sprintf
866     cycles for 100 * Bin2Hex
815     cycles for 100 * Bin2Hex2 cx
693     cycles for 100 * dwtoHex_Guga_SSE
732     cycles for 100 * dwtoHex_Guga

7516    cycles for 100 * dw2hex
3078    cycles for 100 * Hex$ Grincheux
46002   cycles for 100 * CRT sprintf
929     cycles for 100 * Bin2Hex
880     cycles for 100 * Bin2Hex2 cx
661     cycles for 100 * dwtoHex_Guga_SSE
804     cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
64      bytes for Hex$ Grincheux
29      bytes for CRT sprintf
138     bytes for Bin2Hex
150     bytes for Bin2Hex2 cx
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

00345678        = eax dw2hex
12345678        = eax Hex$ Grincheux
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

--- ok ---

jj2007

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

3333    cycles for 100 * dw2hex
3600    cycles for 100 * Hex$ Grincheux
42239  cycles for 100 * CRT sprintf
540    cycles for 100 * Bin2Hex
629    cycles for 100 * Bin2Hex2 cx
736    cycles for 100 * dwtoHex_Guga_SSE
837    cycles for 100 * dwtoHex_Guga

3331    cycles for 100 * dw2hex
3600    cycles for 100 * Hex$ Grincheux
42094  cycles for 100 * CRT sprintf
565    cycles for 100 * Bin2Hex
621    cycles for 100 * Bin2Hex2 cx
732    cycles for 100 * dwtoHex_Guga_SSE
815    cycles for 100 * dwtoHex_Guga

3318    cycles for 100 * dw2hex
3602    cycles for 100 * Hex$ Grincheux
42300  cycles for 100 * CRT sprintf
555    cycles for 100 * Bin2Hex
621    cycles for 100 * Bin2Hex2 cx
829    cycles for 100 * dwtoHex_Guga_SSE
843    cycles for 100 * dwtoHex_Guga

3328    cycles for 100 * dw2hex
3603    cycles for 100 * Hex$ Grincheux
42090  cycles for 100 * CRT sprintf
539    cycles for 100 * Bin2Hex
621    cycles for 100 * Bin2Hex2 cx
732    cycles for 100 * dwtoHex_Guga_SSE
814    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
64      bytes for Hex$ Grincheux
29      bytes for CRT sprintf
138    bytes for Bin2Hex
150    bytes for Bin2Hex2 cx
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

00345678        = eax dw2hex
12345678        = eax Hex$ Grincheux
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

It's difficult to beat Bin2Hex ;-)

zedd151

I have an excuse... my computer is old and tired, like me. :tongue:
Seeing your results, guga, compared to mine... I feel a little less old, and a little less tired... at least partly. :biggrin:
Some fare better on your machine, others fare better on mine.

guga

Quote from: jj2007 on July 28, 2023, 11:24:33 AM
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

Hi JJ. Thanks.

Those results are a bit odd to me. It seems that on AMD dwtoHex_Guga_SSE is faster than Bin2Hex, and on Intel it's the opposite.

The difference between the two is not that big, but since this new version uses a huge table, I'll keep the good old Bin2Hex. The speed benefit (at least on my AMD) doesn't seem to compensate for the final size of the function: 245848 bytes (due to the table size).
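To put numbers on that trade-off, here is a small sketch that computes the relative cycle difference between dwtoHex_Guga_SSE and Bin2Hex. It uses the lowest of the four timings posted above for the Ryzen 5 2400G and the i5-2450M. The helper name `percent_change` is made up for this illustration, and taking the minimum of each series is an assumption about how to summarise the runs, not something the benchmark itself does.

```python
def percent_change(baseline_cycles, candidate_cycles):
    """Relative difference in percent; negative means the candidate used fewer cycles."""
    return (candidate_cycles - baseline_cycles) / baseline_cycles * 100.0

# (Bin2Hex, dwtoHex_Guga_SSE): lowest cycle counts per 100 calls from the posts above.
timings = {
    "AMD Ryzen 5 2400G": (863, 655),
    "Intel Core i5-2450M": (539, 732),
}

for cpu, (bin2hex, guga_sse) in timings.items():
    print(f"{cpu}: {percent_change(bin2hex, guga_sse):+.1f}%")
# AMD Ryzen 5 2400G: -24.1%
# Intel Core i5-2450M: +35.8%
```

So on these two machines the table version saves roughly a quarter of the cycles on the AMD chip and costs roughly a third more on the Intel chip, which is in line with the 30%-40% range mentioned earlier in the thread.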

NoCforMe

OK, since we seem to be racing different binary--> hex conversion routines, let me throw mine into the mix:

XlatTable    DB "0123456789ABCDEF"

;====================================
; Bin2Hex()
;
; On entry,
;    ECX--> buffer to write hex chars. to
;    EDX = value to convert
;====================================

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

Bin2Hex    PROC

    PUSH    EBX
    PUSH    EDI

    MOV    EBX, OFFSET XlatTable
    MOV    EDI, ECX
    MOV    CL, 24                ;# bits to shift right.
    MOV    CH, 4                ;# of bytes to convert.

hloop:    MOV    EAX, EDX            ;Original value.
    SHR    EAX, CL                ;Shift byte to right.
    MOV    AH, AL                ;Save byte.
    SHR    AL, 4                ;Get high nybble.
    XLATB
    STOSB
    MOV    AL, AH                ;Get back byte.
    AND    AL, 0FH                ;Get low nybble.
    XLATB
    STOSB
    SUB    CL, 8                ;Next byte over.
    DEC    CH
    JNZ    hloop

    MOV    BYTE PTR [EDI], 0        ;Terminate buffer.

    POP    EDI
    POP    EBX
    RET

Bin2Hex    ENDP

Anyone care to test it?

I did not design this at all with speed in mind (because I just don't care), but I'm curious nonetheless. I did eliminate all memory references inside the loop except for XLATB and STOSB.
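For readers who want to check the logic without assembling it, the loop above can be modelled in a few lines of Python. This is only a sketch of the same nibble-by-nibble lookup, not a replacement for the routine: the name `bin2hex_nibbles` is hypothetical, and the model returns the eight characters instead of writing them, plus a terminating zero, into a caller-supplied buffer.

```python
XLAT_TABLE = b"0123456789ABCDEF"  # same 16-byte table as XlatTable

def bin2hex_nibbles(value):
    """Return the 8 uppercase hex characters of a 32-bit value, most significant byte first."""
    value &= 0xFFFFFFFF
    out = bytearray()
    for shift in (24, 16, 8, 0):               # CL counts down 24, 16, 8, 0
        byte = (value >> shift) & 0xFF         # SHR EAX, CL leaves this byte in AL
        out.append(XLAT_TABLE[byte >> 4])      # high nybble: first XLATB / STOSB
        out.append(XLAT_TABLE[byte & 0x0F])    # low nybble: second XLATB / STOSB
    return bytes(out)

print(bin2hex_nibbles(0x12345678).decode())    # prints 12345678
```

Each byte costs two table lookups and two single-byte stores, which is the part the second entry shortens.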
Assembly language programming should be fun. That's why I do it.

NoCforMe

My entry #2: longer table, shorter code:

XlatTable    DB "000102030405060708090A0B0C0D0E0F"
        DB "101112131415161718191A1B1C1D1E1F"
        DB "202122232425262728292A2B2C2D2E2F"
        DB "303132333435363738393A3B3C3D3E3F"
        DB "404142434445464748494A4B4C4D4E4F"
        DB "505152535455565758595A5B5C5D5E5F"
        DB "606162636465666768696A6B6C6D6E6F"
        DB "707172737475767778797A7B7C7D7E7F"
        DB "808182838485868788898A8B8C8D8E8F"
        DB "909192939495969798999A9B9C9D9E9F"
        DB "A0A1A2A3A4A5A6A7A8A9AAABACADAEAF"
        DB "B0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF"
        DB "C0C1C2C3C4C5C6C7C8C9CACBCCCDCECF"
        DB "D0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF"
        DB "E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEF"
        DB "F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF"

;====================================
; Bin2Hex()
;
; On entry,
;    ECX--> buffer to write hex chars. to
;    EDX = value to convert
;====================================

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

Bin2Hex    PROC

    PUSH    EBX
    PUSH    EDI

    MOV    EBX, OFFSET XlatTable
    MOV    EDI, ECX
    MOV    CL, 24                ;# bits to shift right.
    MOV    CH, 4                ;# of bytes to convert.

hloop:    MOV    EAX, EDX            ;Original value.
    SHR    EAX, CL                ;Shift byte to right.
    AND    EAX, 0FFH            ;Isolate that byte.
    MOV    AX, [EBX + EAX*2]        ;Translate byte.
    STOSW
    SUB    CL, 8                ;Next byte over.
    DEC    CH
    JNZ    hloop

    MOV    BYTE PTR [EDI], 0        ;Terminate buffer.

    POP    EDI
    POP    EBX
    RET

Bin2Hex    ENDP
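A 512-character table is easy to mistype, so it can help to generate it and compare it against the hand-written one. The Python sketch below builds the same byte-to-two-characters table and models the loop above. The names `PAIR_TABLE` and `bin2hex_pairs` are invented for this illustration, and like any model it says nothing about speed.

```python
# Entry i holds the two hex digits of byte value i, so the table is 256 * 2 = 512 bytes.
PAIR_TABLE = "".join(f"{i:02X}" for i in range(256)).encode("ascii")

def bin2hex_pairs(value):
    """Return the 8 uppercase hex characters of a 32-bit value, one table lookup per byte."""
    value &= 0xFFFFFFFF
    out = bytearray()
    for shift in (24, 16, 8, 0):                        # CL counts down 24, 16, 8, 0
        index = (value >> shift) & 0xFF                 # AND EAX, 0FFH
        out += PAIR_TABLE[index * 2 : index * 2 + 2]    # MOV AX, [EBX + EAX*2] / STOSW
    return bytes(out)

print(bin2hex_pairs(0x12345678).decode())               # prints 12345678
```

Printing `PAIR_TABLE` in 32-character rows gives text that can be pasted straight into the sixteen `DB` lines.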


jj2007

Quote from: guga on July 28, 2023, 11:40:19 AM
Those results are a bit odd to me. It seems that on AMD dwtoHex_Guga_SSE is faster than Bin2Hex, and on Intel it's the opposite.

The difference between the two is not that big, but since this new version uses a huge table, I'll keep the good old Bin2Hex. The speed benefit (at least on my AMD) doesn't seem to compensate for the final size of the function: 245848 bytes (due to the table size).

Bin2Hex is pretty fast, indeed, and I've just spent hours trying to make it faster, without success.

Have you thought of gather instructions such as VGATHERDPS?

Btw if table size bothers you, look at crtHexTable3 in the source :cool:

New version with NoCforMe's #2:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

3341    cycles for 100 * dw2hex
1358    cycles for 100 * NoCforMe
540    cycles for 100 * Bin2Hex
739    cycles for 100 * dwtoHex_Guga_SSE
833    cycles for 100 * dwtoHex_Guga

3336    cycles for 100 * dw2hex
1270    cycles for 100 * NoCforMe
540    cycles for 100 * Bin2Hex
734    cycles for 100 * dwtoHex_Guga_SSE
789    cycles for 100 * dwtoHex_Guga

3336    cycles for 100 * dw2hex
1271    cycles for 100 * NoCforMe
540    cycles for 100 * Bin2Hex
732    cycles for 100 * dwtoHex_Guga_SSE
1413    cycles for 100 * dwtoHex_Guga

3363    cycles for 100 * dw2hex
1270    cycles for 100 * NoCforMe
540    cycles for 100 * Bin2Hex
731    cycles for 100 * dwtoHex_Guga_SSE
790    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572    bytes for NoCforMe
138    bytes for Bin2Hex
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

guga

Quote
Bin2Hex is pretty fast, indeed, and I've just spent hours trying to make it faster, without success.

Have you thought of gather instructions such as VGATHERDPS?

Btw if table size bothers you, look at crtHexTable3 in the source :cool:

Hi JJ

Yeah, on Intel Bin2Hex is pretty fast. I'll stick with it. It seems unnecessary for me to reinvent the wheel, because my variation is only faster on AMD and uses a huge table for only 20-30% better performance. I could, however, make a variant of these functions that checks the processor, but that would definitely kill the overall performance anyway.

About VGATHERDPS: I haven't implemented it yet. Is it available on x86 (32 bits)? Do you have an example using such instructions? I'll try to port it later when I have more free time.

jj2007

There is indeed quite a difference between Intel and AMD:

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

5494    cycles for 100 * dw2hex
1381    cycles for 100 * NoCforMe
675    cycles for 100 * Bin2Hex
496    cycles for 100 * dwtoHex_Guga_SSE
562    cycles for 100 * dwtoHex_Guga

5558    cycles for 100 * dw2hex
1425    cycles for 100 * NoCforMe
680    cycles for 100 * Bin2Hex
515    cycles for 100 * dwtoHex_Guga_SSE
564    cycles for 100 * dwtoHex_Guga

5536    cycles for 100 * dw2hex
1469    cycles for 100 * NoCforMe
676    cycles for 100 * Bin2Hex
466    cycles for 100 * dwtoHex_Guga_SSE
567    cycles for 100 * dwtoHex_Guga

5520    cycles for 100 * dw2hex
1438    cycles for 100 * NoCforMe
692    cycles for 100 * Bin2Hex
534    cycles for 100 * dwtoHex_Guga_SSE
579    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572    bytes for NoCforMe
138    bytes for Bin2Hex
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

Quote from: guga on July 28, 2023, 10:52:54 PM
About VGATHERDPS: I haven't implemented it yet. Is it available on x86 (32 bits)? Do you have an example using such instructions? I'll try to port it later when I have more free time.

No, I never tested it. The documentation is lousy :sad:

Greenhorn

Another test result.

AMD Ryzen 7 3700X 8-Core Processor              (SSE4)

5736 cycles for 100 * dw2hex
2189 cycles for 100 * Hex$ Grincheux
29319 cycles for 100 * CRT sprintf
360 cycles for 100 * Bin2Hex
397 cycles for 100 * Bin2Hex2 cx
499 cycles for 100 * dwtoHex_Guga_SSE
523 cycles for 100 * dwtoHex_Guga

5731 cycles for 100 * dw2hex
2479 cycles for 100 * Hex$ Grincheux
29270 cycles for 100 * CRT sprintf
363 cycles for 100 * Bin2Hex
383 cycles for 100 * Bin2Hex2 cx
497 cycles for 100 * dwtoHex_Guga_SSE
525 cycles for 100 * dwtoHex_Guga

5741 cycles for 100 * dw2hex
2483 cycles for 100 * Hex$ Grincheux
29275 cycles for 100 * CRT sprintf
365 cycles for 100 * Bin2Hex
385 cycles for 100 * Bin2Hex2 cx
494 cycles for 100 * dwtoHex_Guga_SSE
524 cycles for 100 * dwtoHex_Guga

5732 cycles for 100 * dw2hex
2245 cycles for 100 * Hex$ Grincheux
29267 cycles for 100 * CRT sprintf
362 cycles for 100 * Bin2Hex
385 cycles for 100 * Bin2Hex2 cx
496 cycles for 100 * dwtoHex_Guga_SSE
523 cycles for 100 * dwtoHex_Guga

20 bytes for dw2hex
64 bytes for Hex$ Grincheux
29 bytes for CRT sprintf
138 bytes for Bin2Hex
150 bytes for Bin2Hex2 cx
245848 bytes for dwtoHex_Guga_SSE
76 bytes for dwtoHex_Guga

00345678 = eax dw2hex
12345678 = eax Hex$ Grincheux
345678 = eax CRT sprintf
12345678 = eax Bin2Hex
00345678 = eax Bin2Hex2 cx
12345678 = eax dwtoHex_Guga_SSE
12345678 = eax dwtoHex_Guga

--- ok ---
Kole Feut un Nordenwind gift en krusen Büdel un en lütten Pint.

guga

Thanks, Greenhorn.

That's a bit weird. On your AMD processor, my version shows the same pattern as on Intel: it is slower than Bin2Hex. But on my processor the opposite seems to happen. I wonder what is causing such different results.

AMD Ryzen 5 2400G with Radeon Vega Graphics     (SSE4)

9124    cycles for 100 * dw2hex
2980    cycles for 100 * Hex$ Grincheux
51106   cycles for 100 * CRT sprintf
864     cycles for 100 * Bin2Hex
807     cycles for 100 * Bin2Hex2 cx
623     cycles for 100 * dwtoHex_Guga_SSE
713     cycles for 100 * dwtoHex_Guga

7591    cycles for 100 * dw2hex
4216    cycles for 100 * Hex$ Grincheux
47734   cycles for 100 * CRT sprintf
899     cycles for 100 * Bin2Hex
814     cycles for 100 * Bin2Hex2 cx
657     cycles for 100 * dwtoHex_Guga_SSE
716     cycles for 100 * dwtoHex_Guga

7496    cycles for 100 * dw2hex
5396    cycles for 100 * Hex$ Grincheux
55117   cycles for 100 * CRT sprintf
936     cycles for 100 * Bin2Hex
1289    cycles for 100 * Bin2Hex2 cx
858     cycles for 100 * dwtoHex_Guga_SSE
708     cycles for 100 * dwtoHex_Guga

8734    cycles for 100 * dw2hex
2919    cycles for 100 * Hex$ Grincheux
46808   cycles for 100 * CRT sprintf
900     cycles for 100 * Bin2Hex
807     cycles for 100 * Bin2Hex2 cx
860     cycles for 100 * dwtoHex_Guga_SSE
1015    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
64      bytes for Hex$ Grincheux
29      bytes for CRT sprintf
138     bytes for Bin2Hex
150     bytes for Bin2Hex2 cx
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

00345678        = eax dw2hex
12345678        = eax Hex$ Grincheux
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

--- ok ---

guga

Quote from: jj2007 on July 28, 2023, 11:09:15 PM
No, I never tested it. The documentation is lousy :sad:

Hi JJ

I'll try to find more info to see whether it can be implemented on x86. I found a little information (for fasm as well) that may give an idea of where to start later:

https://board.flatassembler.net/topic.php?t=22502
https://www.felixcloutier.com/x86/vgatherdps:vgatherqps
https://stackoverflow.com/questions/21774454/how-are-the-gather-instructions-in-avx2-implemented