Here is the modified bitRAKE algo I have tested; useW=1 is no faster and takes much more space:
;-------------------------------------------------------------------------------
; Proc UINT64__Baseform
; Modification from bitRAKE's fasmg_playground by JJ
; https://github.com/bitRAKE/fasmg_playground/blob/master/string/baseform.asm
;-------------------------------------------------------------------------------
align 16 ; align 64 makes ML64.exe choke
useW=0
if useW
digit_table dw '0','1','2','3','4','5','6','7','8','9'
dw 'A','B','C','D','E','F','G','H','I','J'
dw 'K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'
else
digit_table db "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
endif
UINT64__Baseform:
; RAX number to convert
; RCX number base to use [2,36]
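; Output goes to q2aBuffer as word-sized characters (RDI is saved and reloaded below)
; q2aBuffer must hold [65,14] words for base [2,36], terminator included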
  push rbx
  push rdi
  lea rdi, q2aBuffer
  push rdi
  lea rbx, digit_table
  push 0
A:
  xor edx, edx
  div rcx
if useW
  push SIZE_P ptr [rbx+rdx*2]
else
  push SIZE_P ptr [rbx+rdx]
endif
  test rax, rax
  jnz A
B:
  pop rax
if useW
  stosw
  test al, al
elseif 0 ; not faster, one byte more
  stosb
  inc rdi
  test al, al
else ; fastest option
  and eax, 127
  stosw
endif
  jnz B
  pop rax ; return start of buffer
  pop rdi
  pop rbx
  ret
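
For anyone who wants to check results against a reference, here is a minimal C sketch of the same push/pop idea. It is only a model under assumptions: the name u64_baseform, its signature, and the base check are mine and not part of the code above, and it writes byte characters where the assembly stores words with stosw.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical C model of the routine above; the name and signature are
   illustrative and not part of the original code. */
static const char digit_table[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Converts value to text in the given base (2..36).
   out must hold at least 65 chars (64 binary digits + terminator).
   Returns the number of digits written, or 0 for an invalid base
   (the base check is an addition; the assembly does not validate RCX). */
size_t u64_baseform(uint64_t value, unsigned base, char *out)
{
    char stack[64];              /* stands in for the CPU stack */
    size_t depth = 0, len = 0;

    if (base < 2 || base > 36)
        return 0;

    /* Loop A: divide, keep the remainder's digit, least significant first. */
    do {
        stack[depth++] = digit_table[value % base];
        value /= base;
    } while (value != 0);

    /* Loop B: pop in reverse so the most significant digit comes first. */
    while (depth > 0)
        out[len++] = stack[--depth];

    out[len] = '\0';             /* plays the role of the pushed 0 sentinel */
    return len;
}
```

The worst cases match the [65,14] sizes from the original comment: a 64-bit value needs at most 64 digits in base 2 and 13 digits in base 36, plus one terminator each.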
Latest timings for 10 million strings (10 chars in 32-bit, 20 in 64-bit mode):
This program was assembled with ml64 in 64-bit format.
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
70 bytes for q2asc
81 bytes for UINT64
421 ticks for q2asc
453 ticks for q2asc
436 ticks for q2asc
Result=12345678901234567890
2403 ticks for Baseform
2480 ticks for Baseform
2387 ticks for Baseform
Result=12345678901234567890
This program was assembled with ml in 32-bit format.
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
48 bytes for q2asc
91 bytes for UINT64
187 ticks for q2asc
203 ticks for q2asc
202 ticks for q2asc
Result=123456789
578 ticks for Baseform
577 ticks for Baseform
577 ticks for Baseform
Result=123456789