This is a by-product of the Qword to Unicode thread (http://masm32.com/board/index.php?topic=10022.0). What is remarkable here is that the 32-bit versions of all algos are at least a factor of 2 faster than their 64-bit equivalents. No idea why...
This program was assembled with ml64 in 64-bit format.
70 bytes for q2asc
117 bytes for UINT64
3026 ticks for CRT
3042 ticks for CRT
Result=12345678901234567890
203 ticks for q2asc
203 ticks for q2asc
281 ticks for q2asc
Result=12345678901234567890
1217 ticks for Baseform
1263 ticks for Baseform
1202 ticks for Baseform
Result=12345678901234567890
670 ticks for CRT
671 ticks for CRT
Result=123
31 ticks for q2asc
32 ticks for q2asc
31 ticks for q2asc
Result=123
156 ticks for Baseform
172 ticks for Baseform
156 ticks for Baseform
Result=123
This program was assembled with ml in 32-bit format.
48 bytes for q2asc
126 bytes for UINT64
1872 ticks for CRT
1888 ticks for CRT
Result=123456789
109 ticks for q2asc
78 ticks for q2asc
109 ticks for q2asc
Result=123456789
281 ticks for Baseform
281 ticks for Baseform
296 ticks for Baseform
Result=123456789
811 ticks for CRT
812 ticks for CRT
Result=123
31 ticks for q2asc
31 ticks for q2asc
47 ticks for q2asc
Result=123
78 ticks for Baseform
78 ticks for Baseform
78 ticks for Baseform
Result=123
Hi JJ!
Quote from: jj2007 on April 29, 2022, 08:44:29 PM
What is remarkable here is that the 32-bit versions of all algos are at least a factor of 2 faster than their 64-bit equivalents. No idea why...
You are comparing QWORD against DWORD :biggrin: (which is not fair)
Yet, for the same number, like 123, 64-bit can be faster.
422 ticks for CRT
406 ticks for CRT
Result=123
16 ticks for q2asc
15 ticks for q2asc
16 ticks for q2asc
Result=123
109 ticks for Baseform
94 ticks for Baseform
109 ticks for Baseform
Result=123
515 ticks for CRT
516 ticks for CRT
Result=123
16 ticks for q2asc
46 ticks for q2asc
32 ticks for q2asc
Result=123
31 ticks for Baseform
47 ticks for Baseform
31 ticks for Baseform
Result=123
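To put numbers on the QWORD-versus-DWORD point: a digit-by-digit conversion needs one division per output digit, so the 20-digit value from the 64-bit run costs roughly twice as many divisions as the 9-digit value from the 32-bit run. A minimal C sketch follows; divisions_needed is a made-up helper, not part of the test program, and it assumes the algo divides once per digit, which holds for the Baseform routine posted in this thread but is not confirmed for q2asc or CRT:

```c
#include <stdint.h>

/* Hypothetical helper: counts the divisions a digit-by-digit
   conversion needs, i.e. one per output digit. */
unsigned divisions_needed(uint64_t value, unsigned base)
{
    unsigned count = 0;

    if (base < 2)
        return 0;               /* invalid base: nothing to count */

    do {
        value /= base;          /* one division strips one digit */
        count++;
    } while (value != 0);

    return count;
}
```

Called with base 10, this gives 3 for 123, 9 for 123456789 and 20 for 12345678901234567890, so only the 123 runs compare like with like.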
HSE
Note: you are using a JBasic that nobody else has :biggrin:
Quote from: HSE on April 29, 2022, 09:50:59 PM
You are comparing QWORD against DWORD :biggrin: (which is not fair)
Yet, for the same number, like 123, 64-bit can be faster.
Valid point :thumbsup:
Now, with identical strings, timings are more or less the same for q2asc, but bitRAKE's Baseform algo is still slower in 64-bit:
This program was assembled with ml64 in 64-bit format.
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
202 ticks for q2asc
203 ticks for q2asc
203 ticks for q2asc
Result=1234567890
1045 ticks for Baseform
1061 ticks for Baseform
1061 ticks for Baseform
Result=1234567890
This program was assembled with ml in 32-bit format.
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
202 ticks for q2asc
250 ticks for q2asc
218 ticks for q2asc
Result=1234567890
656 ticks for Baseform
670 ticks for Baseform
656 ticks for Baseform
Result=1234567890
Quote from: HSE on April 29, 2022, 09:50:59 PM
Note: you are using a JBasic that nobody else has :biggrin:
You are right, sorry for that. If I find the time, I will update it tonight :cool:
Here is the modified bitRAKE algo I have tested; useW=1 is not faster but takes much more space:
;-------------------------------------------------------------------------------
; Proc UINT64__Baseform
; Modification from bitRAKE's fasmg_playground by JJ
; https://github.com/bitRAKE/fasmg_playground/blob/master/string/baseform.asm
;-------------------------------------------------------------------------------
align 16 ; align 64 makes ML64.exe choke
useW=0
if useW
digit_table dw '0','1','2','3','4','5','6','7','8','9'
dw 'A','B','C','D','E','F','G','H','I','J'
dw 'K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'
else
digit_table db "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
endif
UINT64__Baseform:
; RAX number to convert
; RCX number base to use [2,36]
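; Output goes to q2aBuffer as 16-bit chars (RDI is loaded below, not passed in); needs 65 chars for base 2, 14 for base 36
; Returns RAX = start of buffer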
push rbx
push rdi
lea rdi, q2aBuffer
push rdi
lea rbx, digit_table
push 0
A: xor edx, edx
div rcx
if useW
push SIZE_P ptr [rbx+rdx*2]
else
push SIZE_P ptr [rbx+rdx]
endif
test rax, rax
jnz A
B: pop rax
if useW
stosw
test al, al
elseif 0 ; not faster, one byte more
stosb
inc rdi
test al, al
else ; fastest option
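and eax, 127 ; keep only the 7-bit ASCII digit; sets ZF when the 0 terminator is popped
stosw ; store it as a 16-bit char (stosw leaves the flags alone for the jnz below)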
endif
jnz B
pop rax ; return start of buffer
pop rdi
pop rbx
ret
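For anyone who prefers C over ML64 syntax, here is a rough sketch of what the routine above does as far as I can read it: divide repeatedly by the base, park each digit on a stack (least significant first), then pop them in reverse and store each one as a 16-bit character. The names u64_to_wide and stack are made up for illustration, and unlike the assembly it returns the digit count rather than the buffer address:

```c
#include <stdint.h>
#include <stddef.h>

/* Same lookup table as the useW=0 variant. */
static const char digit_table[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Hypothetical C rendering of the push/pop idea.
   Writes value in the given base [2,36] as zero-terminated 16-bit chars.
   out must hold at least 65 elements (64 binary digits + terminator).
   Returns the number of digits written, terminator not counted. */
size_t u64_to_wide(uint64_t value, unsigned base, uint16_t *out)
{
    char stack[65];             /* plays the role of the CPU stack */
    size_t top = 0, n = 0;

    if (base < 2 || base > 36) {
        out[0] = 0;             /* invalid base: empty string */
        return 0;
    }

    stack[top++] = 0;           /* terminator goes in first, like "push 0" */
    do {                        /* loop A: least significant digit first */
        stack[top++] = digit_table[value % base];
        value /= base;
    } while (value != 0);

    for (;;) {                  /* loop B: pop in reverse order */
        uint16_t ch = (uint16_t)(stack[--top] & 127);
        out[n] = ch;            /* like stosw: one 16-bit char per digit */
        if (ch == 0)
            break;              /* the popped terminator ends the loop */
        n++;
    }
    return n;
}
```

The terminator is pushed before any digit, so it is the last thing popped; that is why the assembly can use the result of the and as its loop condition without a separate counter.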
Latest timings for 10 Million strings (10 chars in 32-bit, 20 in 64-bit mode):
This program was assembled with ml64 in 64-bit format.
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
70 bytes for q2asc
81 bytes for UINT64
421 ticks for q2asc
453 ticks for q2asc
436 ticks for q2asc
Result=12345678901234567890
2403 ticks for Baseform
2480 ticks for Baseform
2387 ticks for Baseform
Result=12345678901234567890
This program was assembled with ml in 32-bit format.
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
48 bytes for q2asc
91 bytes for UINT64
187 ticks for q2asc
203 ticks for q2asc
202 ticks for q2asc
Result=123456789
578 ticks for Baseform
577 ticks for Baseform
577 ticks for Baseform
Result=123456789
Quote from: jj2007 on April 30, 2022, 02:28:23 AM
Here is the modified bitRAKE algo
:biggrin: :biggrin: Not bitRAKE, I made that! (You came back to bitRAKE's way :thumbsup:)
JJ, your tests are not very useful because the most used numbers are 0 and 1 :biggrin: