Dword to Bin$

Started by jj2007, April 14, 2024, 11:01:23 PM

jj2007

Lately, I often read posts (not here) by "experts" who claim that C/C++ is faster than assembly because, y'know, those compiler developers are real geniuses.

So far, the geniuses have no decent algo to offer for translating a number to a binary string. However, you can use invoke crt__itoa, DDvalue, ADDR _rv_bin_string_, 2 to achieve that (credits to MichaelW).
Does it work? Yes :thumbsup:
Is it faster than Assembly? Not really :rolleyes:
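
For readers who want to see what the conversion itself does, here is a minimal C sketch of the plain shift-and-test approach. The name dword_to_bin is made up for illustration; this is neither the MasmBasic Bin$ code nor the CRT implementation, just a simple baseline that produces 32 binary digits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread): writes the 32 binary digits
   of value, most significant bit first, followed by a terminating NUL.
   buf must have room for at least 33 bytes. */
static void dword_to_bin(uint32_t value, char *buf)
{
    assert(buf != NULL);
    for (int i = 0; i < 32; i++) {
        /* i = 0 tests bit 31, i = 31 tests bit 0 */
        buf[i] = (char)('0' + ((value >> (31 - i)) & 1u));
    }
    buf[32] = '\0';
}
```

Called with 0xEAAAA1C7 and a 33-byte buffer, it yields 11101010101010101010000111000111, the same string that appears in the results of this thread. Unlike this sketch, _itoa with radix 2 does not pad with leading zeros, so the two only agree in length when bit 31 is set.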

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

6449    cycles for 100 * bin$ (CRT)
991     cycles for 100 * Bin$ (MasmBasic)
627     cycles for 100 * new Bin$

6537    cycles for 100 * bin$ (CRT)
993     cycles for 100 * Bin$ (MasmBasic)
629     cycles for 100 * new Bin$

6420    cycles for 100 * bin$ (CRT)
975     cycles for 100 * Bin$ (MasmBasic)
630     cycles for 100 * new Bin$

6471    cycles for 100 * bin$ (CRT)
981     cycles for 100 * Bin$ (MasmBasic)
618     cycles for 100 * new Bin$

6427    cycles for 100 * bin$ (CRT)
988     cycles for 100 * Bin$ (MasmBasic)
627     cycles for 100 * new Bin$

Averages:
6449    cycles for bin$ (CRT)
987     cycles for Bin$ (MasmBasic)
628     cycles for new Bin$

bin$ (CRT)                              11101010101010101010000111000111
Bin$ (MasmBasic)                        11101010101010101010000111000111
new Bin$                                11101010101010101010000111000111

Usually, we are already happy if we beat the CRT by a factor of 2 :mrgreen:
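
To put numbers on that factor, the ratios can be worked out from the averages above. A throwaway C sketch (the name speedup is only illustrative):

```c
/* Illustrative helper: how many times faster the candidate is than the
   baseline, given two cycle counts measured for the same amount of work. */
static double speedup(double baseline_cycles, double candidate_cycles)
{
    return baseline_cycles / candidate_cycles;
}
```

With the averages from this run, speedup(6449, 987) is roughly 6.5 for the MasmBasic Bin$ and speedup(6449, 628) is roughly 10.3 for the new Bin$.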

jj2007

Version 2
AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)
loop overhead is approx. 106/100 cycles

240    cycles for 100 * bin$ (CRT)
975    cycles for 100 * Bin$ (MasmBasic)
553    cycles for 100 * new Bin$ P2 (4*glob/al, ah)
629    cycles for 100 * new Bin$ P3 (edi/al, ah)
570    cycles for 100 * new Bin$ P4 (4*glob/shr)
626    cycles for 100 * new Bin$ P5 (edi/shr)

234    cycles for 100 * bin$ (CRT)
986    cycles for 100 * Bin$ (MasmBasic)
552    cycles for 100 * new Bin$ P2 (4*glob/al, ah)
636    cycles for 100 * new Bin$ P3 (edi/al, ah)
552    cycles for 100 * new Bin$ P4 (4*glob/shr)
627    cycles for 100 * new Bin$ P5 (edi/shr)

254    cycles for 100 * bin$ (CRT)
975    cycles for 100 * Bin$ (MasmBasic)
560    cycles for 100 * new Bin$ P2 (4*glob/al, ah)
630    cycles for 100 * new Bin$ P3 (edi/al, ah)
552    cycles for 100 * new Bin$ P4 (4*glob/shr)
633    cycles for 100 * new Bin$ P5 (edi/shr)

230    cycles for 100 * bin$ (CRT)
977    cycles for 100 * Bin$ (MasmBasic)
550    cycles for 100 * new Bin$ P2 (4*glob/al, ah)
628    cycles for 100 * new Bin$ P3 (edi/al, ah)
559    cycles for 100 * new Bin$ P4 (4*glob/shr)
624    cycles for 100 * new Bin$ P5 (edi/shr)

241    cycles for 100 * bin$ (CRT)
986    cycles for 100 * Bin$ (MasmBasic)
552    cycles for 100 * new Bin$ P2 (4*glob/al, ah)
632    cycles for 100 * new Bin$ P3 (edi/al, ah)
553    cycles for 100 * new Bin$ P4 (4*glob/shr)
638    cycles for 100 * new Bin$ P5 (edi/shr)

Averages:
238    cycles for bin$ (CRT)
979    cycles for Bin$ (MasmBasic)
552    cycles for new Bin$ P2 (4*glob/al, ah)
630    cycles for new Bin$ P3 (edi/al, ah)
555    cycles for new Bin$ P4 (4*glob/shr)
629    cycles for new Bin$ P5 (edi/shr)

bin$ (CRT)                              11101010101010101010000111000111
Bin$ (MasmBasic)                        11101010101010101010000111000111
new Bin$ P2 (4*glob/al, ah)             11101010101010101010000111000111
new Bin$ P3 (edi/al, ah)                11101010101010101010000111000111
new Bin$ P4 (4*glob/shr)                11101010101010101010000111000111
new Bin$ P5 (edi/shr)                   11101010101010101010000111000111

Note that bin$ (CRT) runs with one tenth of the iterations.
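
The thread does not spell out how the P2 to P5 variants work. As a general illustration only (an assumption, not a reconstruction of any of them), one common way to speed up this kind of conversion is a lookup table that emits four digits per step instead of one. The names nibble_bits and dword_to_bin_table are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative table: the four-character bit pattern for each nibble 0..15. */
static const char nibble_bits[16][5] = {
    "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
    "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"
};

/* Hypothetical helper: writes the 32 binary digits of value, four digits
   per table lookup, followed by a terminating NUL.
   buf must have room for at least 33 bytes. */
static void dword_to_bin_table(uint32_t value, char *buf)
{
    for (int i = 0; i < 8; i++) {
        /* take nibbles from the most significant end downwards */
        unsigned nib = (value >> (28 - 4 * i)) & 0xFu;
        memcpy(buf + 4 * i, nibble_bits[nib], 4);
    }
    buf[32] = '\0';
}
```

This trades 64 useful bytes of table for eight iterations instead of 32. Whether it actually beats a bit-by-bit loop depends on the CPU and the compiler, so it would need timing just like the variants above.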

Now it would be nice to see some Intel timings :rolleyes:

jj2007

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
Averages:
288     cycles for bin$ (CRT)
1112    cycles for Bin$ (MasmBasic)
720     cycles for new Bin$ P2 (4*glob/al, ah)
714     cycles for new Bin$ P3 (edi/al, ah)
637     cycles for new Bin$ P4 (4*glob/shr)
725     cycles for new Bin$ P5 (edi/shr)

sinsi

Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz (SSE4)
loop overhead is approx. 79/100 cycles

Averages:
337     cycles for bin$ (CRT)
825     cycles for Bin$ (MasmBasic)
434     cycles for new Bin$ P2 (4*glob/al, ah)
495     cycles for new Bin$ P3 (edi/al, ah)
426     cycles for new Bin$ P4 (4*glob/shr)
486     cycles for new Bin$ P5 (edi/shr)

jj2007

Thanks, Sinsi - so P4 wins. The original version dates from 2008, 16 years ago, and in my source I noted "credits go to Sinsi" ;-)