benchmark for another Bin2Hex, pls

Started by guga, July 28, 2023, 10:32:41 AM

zedd151

With jj's testbed:
Intel(R) Core(TM)2 Duo CPU    E8400  @ 3.00GHz (SSE4)

3899    cycles for 100 * dw2hex
1692    cycles for 100 * NoCforMe
697    cycles for 100 * Bin2Hex
1224    cycles for 100 * dwtoHex_Guga_SSE
1374    cycles for 100 * dwtoHex_Guga

4106    cycles for 100 * dw2hex
1692    cycles for 100 * NoCforMe
701    cycles for 100 * Bin2Hex
1228    cycles for 100 * dwtoHex_Guga_SSE
1360    cycles for 100 * dwtoHex_Guga

4142    cycles for 100 * dw2hex
1689    cycles for 100 * NoCforMe
690    cycles for 100 * Bin2Hex
1220    cycles for 100 * dwtoHex_Guga_SSE
1357    cycles for 100 * dwtoHex_Guga

4129    cycles for 100 * dw2hex
1692    cycles for 100 * NoCforMe
690    cycles for 100 * Bin2Hex
1225    cycles for 100 * dwtoHex_Guga_SSE
1356    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572    bytes for NoCforMe
138    bytes for Bin2Hex
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

--- ok ---
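
For readers new to the thread: every routine timed here turns the dword in eax into eight hex characters, which is why each result line reads 12345678. Below is a minimal C sketch of the plain nibble-lookup approach. `dword_to_hex` is a hypothetical name used only for illustration; it is not the code of any of the routines benchmarked above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not one of the routines benchmarked in this thread:
   writes the 8 uppercase hex digits of value into out, followed by a NUL. */
void dword_to_hex(uint32_t value, char out[9])
{
    static const char digits[] = "0123456789ABCDEF";
    assert(out != NULL);
    for (int i = 7; i >= 0; --i) {
        out[i] = digits[value & 0xFu];  /* lowest nibble becomes the rightmost digit */
        value >>= 4;                    /* move the next nibble into place */
    }
    out[8] = '\0';
}
```

Calling `dword_to_hex(0x12345678u, buf)` with a 9-byte buffer leaves the string 12345678 in `buf`. The faster entries in the tables presumably avoid the per-nibble loop, but that is a guess, not something the timings show.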

Greenhorn

Quote from: guga on July 29, 2023, 03:07:02 AM
Thanks, Greenhorn.

That's a bit weird. On your AMD processor, my version produces the same results as on Intel, but on my processor the opposite seems to happen. I'm wondering what is causing such different results.

I guess it's due to optimization of the Ryzen architecture with each generation.
It would be interesting to see how Ryzen 5000 and 7000 perform.
Kole Feut un Nordenwind gift en krusen Büdel un en lütten Pint.

fearless

Bin2HexTimingsNew2:

AMD Ryzen 9 5950X 16-Core Processor            (SSE4)

4671    cycles for 100 * dw2hex
1920    cycles for 100 * Hex$ Grincheux
22845  cycles for 100 * CRT sprintf
361    cycles for 100 * Bin2Hex
368    cycles for 100 * Bin2Hex2 cx
465    cycles for 100 * dwtoHex_Guga_SSE
475    cycles for 100 * dwtoHex_Guga

4671    cycles for 100 * dw2hex
1917    cycles for 100 * Hex$ Grincheux
23276  cycles for 100 * CRT sprintf
364    cycles for 100 * Bin2Hex
368    cycles for 100 * Bin2Hex2 cx
480    cycles for 100 * dwtoHex_Guga_SSE
454    cycles for 100 * dwtoHex_Guga

4698    cycles for 100 * dw2hex
1923    cycles for 100 * Hex$ Grincheux
22942  cycles for 100 * CRT sprintf
362    cycles for 100 * Bin2Hex
357    cycles for 100 * Bin2Hex2 cx
431    cycles for 100 * dwtoHex_Guga_SSE
438    cycles for 100 * dwtoHex_Guga

4682    cycles for 100 * dw2hex
1933    cycles for 100 * Hex$ Grincheux
23104  cycles for 100 * CRT sprintf
362    cycles for 100 * Bin2Hex
367    cycles for 100 * Bin2Hex2 cx
430    cycles for 100 * dwtoHex_Guga_SSE
439    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
64      bytes for Hex$ Grincheux
29      bytes for CRT sprintf
138    bytes for Bin2Hex
150    bytes for Bin2Hex2 cx
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

00345678        = eax dw2hex
12345678        = eax Hex$ Grincheux
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga
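
A side note on the 345678 line for CRT sprintf: it looks like what you get when a value with a zero top byte is printed without zero padding. The format string the testbed uses is not shown in the thread, so the following is only a sketch of that assumption, with hypothetical wrapper names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical wrappers for illustration; the format string behind the
   testbed's "CRT sprintf" entry is not shown in the thread. */
int hex_unpadded(uint32_t value, char *out, size_t size)
{
    assert(out != NULL && size >= 9);
    /* "%X" drops leading zeros: 0x00345678 is written as 345678 */
    return snprintf(out, size, "%X", (unsigned int)value);
}

int hex_padded(uint32_t value, char *out, size_t size)
{
    assert(out != NULL && size >= 9);
    /* "%08X" always writes 8 digits: 0x00345678 is written as 00345678 */
    return snprintf(out, size, "%08X", (unsigned int)value);
}
```

With "%08X" the sprintf output would line up with the 00345678 shown for dw2hex.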



Bin2HexTimingsNew2JJ:

AMD Ryzen 9 5950X 16-Core Processor            (SSE4)

4650    cycles for 100 * dw2hex
1359    cycles for 100 * NoCforMe
355    cycles for 100 * Bin2Hex
445    cycles for 100 * dwtoHex_Guga_SSE
451    cycles for 100 * dwtoHex_Guga

4657    cycles for 100 * dw2hex
1385    cycles for 100 * NoCforMe
355    cycles for 100 * Bin2Hex
485    cycles for 100 * dwtoHex_Guga_SSE
473    cycles for 100 * dwtoHex_Guga

4659    cycles for 100 * dw2hex
1375    cycles for 100 * NoCforMe
355    cycles for 100 * Bin2Hex
434    cycles for 100 * dwtoHex_Guga_SSE
436    cycles for 100 * dwtoHex_Guga

4660    cycles for 100 * dw2hex
1376    cycles for 100 * NoCforMe
355    cycles for 100 * Bin2Hex
466    cycles for 100 * dwtoHex_Guga_SSE
437    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572    bytes for NoCforMe
138    bytes for Bin2Hex
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

jj2007

Quote from: guga on July 29, 2023, 03:09:31 AM
Quote from: jj2007 on July 28, 2023, 11:09:15 PM
No, I never tested it. The documentation is lousy :sad:

Hi JJ

I'll try to find more info to see whether it can be implemented on x86. I found a little information (for fasm as well) that might give an idea of where to start later:

https://board.flatassembler.net/topic.php?t=22502
https://www.felixcloutier.com/x86/vgatherdps:vgatherqps
https://stackoverflow.com/questions/21774454/how-are-the-gather-instructions-in-avx2-implemented

At least, now I got the syntax right. This assembles with UAsm64:

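  mov eax, offset sometext          ; base address for the gather
  int 3                             ; breakpoint, to inspect the registers in a debugger
  nop
  vpgatherdd xmm0, [eax+xmm1], xmm2 ; xmm1 = dword indices, xmm2 = mask (sign bit of each dword enables the load)
  nop
  movd edx, xmm0                    ; lowest gathered dword into edx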
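
Since the documentation is hard going, here is a scalar C model of what the 128-bit vpgatherdd form does as I understand it. It is a sketch for reasoning about the instruction, not a replacement for the manual. `gather_dd_model` is a made-up name; `dst`, `index` and `mask` stand in for xmm0, xmm1 and xmm2, and `base` for the address in eax. The scale factor is assumed to be 1, as in the snippet.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the 128-bit form of VPGATHERDD with scale factor 1:
   dst = xmm0, base = the address in eax, index = xmm1, mask = xmm2. */
void gather_dd_model(uint32_t dst[4], const uint8_t *base,
                     const int32_t index[4], uint32_t mask[4])
{
    assert(dst != NULL && base != NULL && index != NULL && mask != NULL);
    for (int i = 0; i < 4; ++i) {
        if (mask[i] & 0x80000000u) {
            /* with scale 1 the signed index is a plain byte offset from base */
            memcpy(&dst[i], base + index[i], sizeof dst[i]);
        }
        /* elements whose mask sign bit is clear keep their old dst value */
        mask[i] = 0;  /* the instruction leaves the mask register zeroed */
    }
}
```

Two points are easy to miss: the mask register is destroyed by the instruction, so it has to be reloaded before every gather, and each selected element is a separate memory load.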

TimoVJL

AMD Ryzen 5 3400G with Radeon Vega Graphics     (SSE4)

6854    cycles for 100 * dw2hex
2959    cycles for 100 * Hex$ Grincheux
44193   cycles for 100 * CRT sprintf
911     cycles for 100 * Bin2Hex
780     cycles for 100 * Bin2Hex2 cx
620     cycles for 100 * dwtoHex_Guga_SSE
691     cycles for 100 * dwtoHex_Guga

6836    cycles for 100 * dw2hex
2805    cycles for 100 * Hex$ Grincheux
43706   cycles for 100 * CRT sprintf
834     cycles for 100 * Bin2Hex
778     cycles for 100 * Bin2Hex2 cx
621     cycles for 100 * dwtoHex_Guga_SSE
692     cycles for 100 * dwtoHex_Guga

7309    cycles for 100 * dw2hex
2890    cycles for 100 * Hex$ Grincheux
45566   cycles for 100 * CRT sprintf
837     cycles for 100 * Bin2Hex
849     cycles for 100 * Bin2Hex2 cx
625     cycles for 100 * dwtoHex_Guga_SSE
694     cycles for 100 * dwtoHex_Guga

7069    cycles for 100 * dw2hex
2887    cycles for 100 * Hex$ Grincheux
45794   cycles for 100 * CRT sprintf
835     cycles for 100 * Bin2Hex
923     cycles for 100 * Bin2Hex2 cx
626     cycles for 100 * dwtoHex_Guga_SSE
689     cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
64      bytes for Hex$ Grincheux
29      bytes for CRT sprintf
138     bytes for Bin2Hex
150     bytes for Bin2Hex2 cx
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

00345678        = eax dw2hex
12345678        = eax Hex$ Grincheux
345678  = eax CRT sprintf
12345678        = eax Bin2Hex
00345678        = eax Bin2Hex2 cx
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

-
May the source be with you

jj2007

One more, please...

There is one Avx2 algo. If you run it on an older cpu, it will simply output "No Avx2" (I hope that works...)
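
The check-then-fall-back idea can be sketched in C as below. This is not the code from the attachment: `cpu_has_avx2` and `bin2hex_or_message` are hypothetical names, the "AVX2 path" is a plain scalar stand-in, and the detection relies on a GCC/Clang builtin, which is an assumption about the compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 when the CPU reports AVX2, else 0. Uses a GCC/Clang builtin on
   x86; on other compilers or architectures this sketch simply reports 0. */
int cpu_has_avx2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}

/* Hypothetical dispatcher: writes 8 hex digits when has_avx2 is nonzero,
   otherwise the fallback text. The "AVX2 path" is a scalar stand-in here. */
void bin2hex_or_message(uint32_t value, char *out, size_t size, int has_avx2)
{
    assert(out != NULL && size >= 9);
    if (has_avx2)
        snprintf(out, size, "%08X", (unsigned int)value);
    else
        snprintf(out, size, "No Avx2");
}
```

A caller would pass `cpu_has_avx2()` as the last argument. Keeping the flag as a parameter makes both paths easy to exercise on any machine.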

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

5509    cycles for 100 * dw2hex
1372    cycles for 100 * NoCforMe
662    cycles for 100 * Bin2Hex
1082    cycles for 100 * Bin2Hex Avx2
653    cycles for 100 * dwtoHex_Guga_SSE
546    cycles for 100 * dwtoHex_Guga

5480    cycles for 100 * dw2hex
1373    cycles for 100 * NoCforMe
676    cycles for 100 * Bin2Hex
1074    cycles for 100 * Bin2Hex Avx2
495    cycles for 100 * dwtoHex_Guga_SSE
546    cycles for 100 * dwtoHex_Guga

5481    cycles for 100 * dw2hex
1381    cycles for 100 * NoCforMe
702    cycles for 100 * Bin2Hex
1085    cycles for 100 * Bin2Hex Avx2
495    cycles for 100 * dwtoHex_Guga_SSE
546    cycles for 100 * dwtoHex_Guga

5436    cycles for 100 * dw2hex
1520    cycles for 100 * NoCforMe
660    cycles for 100 * Bin2Hex
1074    cycles for 100 * Bin2Hex Avx2
495    cycles for 100 * dwtoHex_Guga_SSE
573    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572    bytes for NoCforMe
138    bytes for Bin2Hex
198    bytes for Bin2Hex Avx2
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
12345678        = eax Bin2Hex Avx2
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

zedd151

Bin2HexTimingsNewG:
Intel(R) Core(TM)2 Duo CPU    E8400  @ 3.00GHz (SSE4)

4111    cycles for 100 * dw2hex
1678    cycles for 100 * NoCforMe
1236    cycles for 100 * Bin2Hex
23921  cycles for 100 * Bin2Hex Avx2
1243    cycles for 100 * dwtoHex_Guga_SSE
1360    cycles for 100 * dwtoHex_Guga

4067    cycles for 100 * dw2hex
1678    cycles for 100 * NoCforMe
1242    cycles for 100 * Bin2Hex
23994  cycles for 100 * Bin2Hex Avx2
1294    cycles for 100 * dwtoHex_Guga_SSE
1334    cycles for 100 * dwtoHex_Guga

4079    cycles for 100 * dw2hex
1696    cycles for 100 * NoCforMe
1131    cycles for 100 * Bin2Hex
23931  cycles for 100 * Bin2Hex Avx2
1243    cycles for 100 * dwtoHex_Guga_SSE
1342    cycles for 100 * dwtoHex_Guga

4070    cycles for 100 * dw2hex
1678    cycles for 100 * NoCforMe
1236    cycles for 100 * Bin2Hex
23971  cycles for 100 * Bin2Hex Avx2
1239    cycles for 100 * dwtoHex_Guga_SSE
1364    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572    bytes for NoCforMe
138    bytes for Bin2Hex
198    bytes for Bin2Hex Avx2
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
No Avx2 = eax Bin2Hex Avx2
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

--- ok ---

No Avx2 = eax Bin2Hex Avx2  :tongue:

fearless

Bin2HexTimingsNewG:

AMD Ryzen 9 5950X 16-Core Processor            (SSE4)

4677    cycles for 100 * dw2hex
1368    cycles for 100 * NoCforMe
356    cycles for 100 * Bin2Hex
18422  cycles for 100 * Bin2Hex Avx2
442    cycles for 100 * dwtoHex_Guga_SSE
436    cycles for 100 * dwtoHex_Guga

4670    cycles for 100 * dw2hex
1374    cycles for 100 * NoCforMe
361    cycles for 100 * Bin2Hex
18377  cycles for 100 * Bin2Hex Avx2
428    cycles for 100 * dwtoHex_Guga_SSE
442    cycles for 100 * dwtoHex_Guga

4703    cycles for 100 * dw2hex
1370    cycles for 100 * NoCforMe
355    cycles for 100 * Bin2Hex
18341  cycles for 100 * Bin2Hex Avx2
432    cycles for 100 * dwtoHex_Guga_SSE
436    cycles for 100 * dwtoHex_Guga

4731    cycles for 100 * dw2hex
1373    cycles for 100 * NoCforMe
390    cycles for 100 * Bin2Hex
18299  cycles for 100 * Bin2Hex Avx2
464    cycles for 100 * dwtoHex_Guga_SSE
439    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572    bytes for NoCforMe
138    bytes for Bin2Hex
198    bytes for Bin2Hex Avx2
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
12345678        = eax Bin2Hex Avx2
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

jj2007

Ok, there was a little glitch. Corrected version attached, it even builds without MasmBasic.

As you can see, the Avx2 version is not particularly fast. However, it was fun to code it, and I almost understood one of the exotic gather instructions ;-)

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

5513    cycles for 100 * dw2hex
1383    cycles for 100 * NoCforMe
674     cycles for 100 * Bin2Hex
1077    cycles for 100 * Bin2Hex Avx2
493     cycles for 100 * dwtoHex_Guga_SSE
565     cycles for 100 * dwtoHex_Guga

5470    cycles for 100 * dw2hex
1377    cycles for 100 * NoCforMe
672     cycles for 100 * Bin2Hex
1075    cycles for 100 * Bin2Hex Avx2
501     cycles for 100 * dwtoHex_Guga_SSE
549     cycles for 100 * dwtoHex_Guga

5439    cycles for 100 * dw2hex
1375    cycles for 100 * NoCforMe
668     cycles for 100 * Bin2Hex
1077    cycles for 100 * Bin2Hex Avx2
490     cycles for 100 * dwtoHex_Guga_SSE
551     cycles for 100 * dwtoHex_Guga

5543    cycles for 100 * dw2hex
1377    cycles for 100 * NoCforMe
716     cycles for 100 * Bin2Hex
1088    cycles for 100 * Bin2Hex Avx2
491     cycles for 100 * dwtoHex_Guga_SSE
555     cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572     bytes for NoCforMe
138     bytes for Bin2Hex
222     bytes for Bin2Hex Avx2
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
12345678        = eax Bin2Hex Avx2
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga


fearless

AMD Ryzen 9 5950X 16-Core Processor            (SSE4)

4713    cycles for 100 * dw2hex
1381    cycles for 100 * NoCforMe
358    cycles for 100 * Bin2Hex
18324  cycles for 100 * Bin2Hex Avx2
433    cycles for 100 * dwtoHex_Guga_SSE
452    cycles for 100 * dwtoHex_Guga

4699    cycles for 100 * dw2hex
1376    cycles for 100 * NoCforMe
357    cycles for 100 * Bin2Hex
18193  cycles for 100 * Bin2Hex Avx2
432    cycles for 100 * dwtoHex_Guga_SSE
438    cycles for 100 * dwtoHex_Guga

4716    cycles for 100 * dw2hex
1380    cycles for 100 * NoCforMe
357    cycles for 100 * Bin2Hex
18365  cycles for 100 * Bin2Hex Avx2
430    cycles for 100 * dwtoHex_Guga_SSE
438    cycles for 100 * dwtoHex_Guga

4695    cycles for 100 * dw2hex
1376    cycles for 100 * NoCforMe
357    cycles for 100 * Bin2Hex
18265  cycles for 100 * Bin2Hex Avx2
428    cycles for 100 * dwtoHex_Guga_SSE
450    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572    bytes for NoCforMe
138    bytes for Bin2Hex
222    bytes for Bin2Hex Avx2
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
12345678        = eax Bin2Hex Avx2
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

jj2007

Wow, this is weird - and it's not an old CPU :rolleyes:

358    cycles for 100 * Bin2Hex
18324  cycles for 100 * Bin2Hex Avx2

zedd151

Bin2HexTimingsM32
Intel(R) Core(TM)2 Duo CPU    E8400  @ 3.00GHz (SSE4)

4101    cycles for 100 * dw2hex
1689    cycles for 100 * NoCforMe
698    cycles for 100 * Bin2Hex
1187    cycles for 100 * Bin2Hex Avx2
1223    cycles for 100 * dwtoHex_Guga_SSE
1357    cycles for 100 * dwtoHex_Guga

4099    cycles for 100 * dw2hex
1689    cycles for 100 * NoCforMe
691    cycles for 100 * Bin2Hex
1187    cycles for 100 * Bin2Hex Avx2
1218    cycles for 100 * dwtoHex_Guga_SSE
1356    cycles for 100 * dwtoHex_Guga

4101    cycles for 100 * dw2hex
1689    cycles for 100 * NoCforMe
782    cycles for 100 * Bin2Hex
1191    cycles for 100 * Bin2Hex Avx2
1219    cycles for 100 * dwtoHex_Guga_SSE
1358    cycles for 100 * dwtoHex_Guga

4093    cycles for 100 * dw2hex
1701    cycles for 100 * NoCforMe
714    cycles for 100 * Bin2Hex
1187    cycles for 100 * Bin2Hex Avx2
1218    cycles for 100 * dwtoHex_Guga_SSE
1351    cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572    bytes for NoCforMe
138    bytes for Bin2Hex
222    bytes for Bin2Hex Avx2
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
No Avx2!        = eax Bin2Hex Avx2
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

--- ok ---

TimoVJL

Quote from: jj2007 on July 30, 2023, 08:10:32 AM
Wow, this is weird - and it's not an old CPU :rolleyes:

358    cycles for 100 * Bin2Hex
18324  cycles for 100 * Bin2Hex Avx2

Maybe the AVX2 module powers up on demand.
May the source be with you

TimoVJL

AMD Ryzen 5 3400G with Radeon Vega Graphics     (SSE4)

6837    cycles for 100 * dw2hex
1796    cycles for 100 * NoCforMe
811     cycles for 100 * Bin2Hex
1406    cycles for 100 * Bin2Hex Avx2
602     cycles for 100 * dwtoHex_Guga_SSE
675     cycles for 100 * dwtoHex_Guga

6862    cycles for 100 * dw2hex
1729    cycles for 100 * NoCforMe
813     cycles for 100 * Bin2Hex
1406    cycles for 100 * Bin2Hex Avx2
597     cycles for 100 * dwtoHex_Guga_SSE
740     cycles for 100 * dwtoHex_Guga

6835    cycles for 100 * dw2hex
1709    cycles for 100 * NoCforMe
885     cycles for 100 * Bin2Hex
1419    cycles for 100 * Bin2Hex Avx2
604     cycles for 100 * dwtoHex_Guga_SSE
706     cycles for 100 * dwtoHex_Guga

6841    cycles for 100 * dw2hex
1792    cycles for 100 * NoCforMe
853     cycles for 100 * Bin2Hex
1323    cycles for 100 * Bin2Hex Avx2
608     cycles for 100 * dwtoHex_Guga_SSE
669     cycles for 100 * dwtoHex_Guga

20      bytes for dw2hex
572     bytes for NoCforMe
138     bytes for Bin2Hex
222     bytes for Bin2Hex Avx2
245848  bytes for dwtoHex_Guga_SSE
76      bytes for dwtoHex_Guga

12345678        = eax dw2hex
12345678        = eax NoCforMe
12345678        = eax Bin2Hex
12345678        = eax Bin2Hex Avx2
12345678        = eax dwtoHex_Guga_SSE
12345678        = eax dwtoHex_Guga

-
May the source be with you

zedd151

Quote from: TimoVJL on July 30, 2023, 03:05:17 PM
Quote from: jj2007 on July 30, 2023, 08:10:32 AM
Wow, this is weird - and it's not an old CPU :rolleyes:

358    cycles for 100 * Bin2Hex
18324  cycles for 100 * Bin2Hex Avx2

Maybe the AVX2 module powers up on demand.

I'm wondering how the program calculates cycles on my machine, since I apparently have no AVX2. Judging by the cycle counts, my computer is doing something to produce those results... or does that come from just running the 'overhead' code? guga? jj2007?