Lately, I often read posts (not here) by "experts" who claim that C/C++ is faster than assembly because, y'know, those compiler developers are real geniuses.
So far, the geniuses have no decent algo to offer for translating a number to a binary string. However, you can use invoke crt__itoa, DDvalue, ADDR _rv_bin_string_, 2 to achieve that (credits to MichaelW (https://masm32.com/board/index.php?msg=18357)).
Does it work? Yes :thumbsup:
Is it faster than Assembly? Not really :rolleyes:
AMD Athlon Gold 3150U with Radeon Graphics (SSE4)
6449 cycles for 100 * bin$ (CRT)
991 cycles for 100 * Bin$ (MasmBasic)
627 cycles for 100 * new Bin$
6537 cycles for 100 * bin$ (CRT)
993 cycles for 100 * Bin$ (MasmBasic)
629 cycles for 100 * new Bin$
6420 cycles for 100 * bin$ (CRT)
975 cycles for 100 * Bin$ (MasmBasic)
630 cycles for 100 * new Bin$
6471 cycles for 100 * bin$ (CRT)
981 cycles for 100 * Bin$ (MasmBasic)
618 cycles for 100 * new Bin$
6427 cycles for 100 * bin$ (CRT)
988 cycles for 100 * Bin$ (MasmBasic)
627 cycles for 100 * new Bin$
Averages:
6449 cycles for bin$ (CRT)
987 cycles for Bin$ (MasmBasic)
628 cycles for new Bin$
bin$ (CRT) 11101010101010101010000111000111
Bin$ (MasmBasic) 11101010101010101010000111000111
new Bin$ 11101010101010101010000111000111
Usually, we are already happy if we beat the CRT by a factor of 2 :mrgreen:
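In case the task itself is not clear, here is a minimal sketch in portable C of what all the timed routines compute. It is not the code behind Bin$ and not the CRT implementation, just the naive one-bit-per-iteration loop; the name u32_to_bin and the fixed 32-digit width are my assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical helper, not from the post: writes the 32 binary digits of
   value into out, most significant bit first, followed by a terminating NUL.
   The caller must supply a buffer of at least 33 bytes. */
static void u32_to_bin(uint32_t value, char out[33])
{
    for (int i = 0; i < 32; i++) {
        /* Bit 31 lands in out[0], bit 0 in out[31]. */
        out[i] = (char)('0' + ((value >> (31 - i)) & 1u));
    }
    out[32] = '\0';
}
```

Calling u32_to_bin(0xEAAAA1C7u, buf) with a 33-byte buf yields 11101010101010101010000111000111, the same string shown in the listings. A loop like this is only the baseline; the point of the timings is how much a hand-tuned routine can shave off it.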
Version 2
AMD Athlon Gold 3150U with Radeon Graphics (SSE4)
loop overhead is approx. 106/100 cycles
240 cycles for 100 * bin$ (CRT)
975 cycles for 100 * Bin$ (MasmBasic)
553 cycles for 100 * new Bin$ P2 (4*glob/al, ah)
629 cycles for 100 * new Bin$ P3 (edi/al, ah)
570 cycles for 100 * new Bin$ P4 (4*glob/shr)
626 cycles for 100 * new Bin$ P5 (edi/shr)
234 cycles for 100 * bin$ (CRT)
986 cycles for 100 * Bin$ (MasmBasic)
552 cycles for 100 * new Bin$ P2 (4*glob/al, ah)
636 cycles for 100 * new Bin$ P3 (edi/al, ah)
552 cycles for 100 * new Bin$ P4 (4*glob/shr)
627 cycles for 100 * new Bin$ P5 (edi/shr)
254 cycles for 100 * bin$ (CRT)
975 cycles for 100 * Bin$ (MasmBasic)
560 cycles for 100 * new Bin$ P2 (4*glob/al, ah)
630 cycles for 100 * new Bin$ P3 (edi/al, ah)
552 cycles for 100 * new Bin$ P4 (4*glob/shr)
633 cycles for 100 * new Bin$ P5 (edi/shr)
230 cycles for 100 * bin$ (CRT)
977 cycles for 100 * Bin$ (MasmBasic)
550 cycles for 100 * new Bin$ P2 (4*glob/al, ah)
628 cycles for 100 * new Bin$ P3 (edi/al, ah)
559 cycles for 100 * new Bin$ P4 (4*glob/shr)
624 cycles for 100 * new Bin$ P5 (edi/shr)
241 cycles for 100 * bin$ (CRT)
986 cycles for 100 * Bin$ (MasmBasic)
552 cycles for 100 * new Bin$ P2 (4*glob/al, ah)
632 cycles for 100 * new Bin$ P3 (edi/al, ah)
553 cycles for 100 * new Bin$ P4 (4*glob/shr)
638 cycles for 100 * new Bin$ P5 (edi/shr)
Averages:
238 cycles for bin$ (CRT)
979 cycles for Bin$ (MasmBasic)
552 cycles for new Bin$ P2 (4*glob/al, ah)
630 cycles for new Bin$ P3 (edi/al, ah)
555 cycles for new Bin$ P4 (4*glob/shr)
629 cycles for new Bin$ P5 (edi/shr)
bin$ (CRT) 11101010101010101010000111000111
Bin$ (MasmBasic) 11101010101010101010000111000111
new Bin$ P2 (4*glob/al, ah) 11101010101010101010000111000111
new Bin$ P3 (edi/al, ah) 11101010101010101010000111000111
new Bin$ P4 (4*glob/shr) 11101010101010101010000111000111
new Bin$ P5 (edi/shr) 11101010101010101010000111000111
Note that bin$ (CRT) runs with only one tenth of the iterations, so its cycle counts are not directly comparable with the other rows.
Now it would be nice to see some Intel timings :rolleyes:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
Averages:
288 cycles for bin$ (CRT)
1112 cycles for Bin$ (MasmBasic)
720 cycles for new Bin$ P2 (4*glob/al, ah)
714 cycles for new Bin$ P3 (edi/al, ah)
637 cycles for new Bin$ P4 (4*glob/shr)
725 cycles for new Bin$ P5 (edi/shr)
Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz (SSE4)
loop overhead is approx. 79/100 cycles
Averages:
337 cycles for bin$ (CRT)
825 cycles for Bin$ (MasmBasic)
434 cycles for new Bin$ P2 (4*glob/al, ah)
495 cycles for new Bin$ P3 (edi/al, ah)
426 cycles for new Bin$ P4 (4*glob/shr)
486 cycles for new Bin$ P5 (edi/shr)
Thanks, Sinsi - so P4 wins. The original version dates from 2008, 16 years ago, and in my source I noted "credits go to Sinsi" ;-)