?WORD to Ascii

Started by jj2007, April 29, 2022, 08:44:29 PM

jj2007

This is a by-product of the Qword to Unicode thread. What is remarkable here is that the 32-bit versions of all algos are at least a factor of 2 faster than their 64-bit equivalents. No idea why...

This program was assembled with ml64 in 64-bit format.
70      bytes for q2asc
117     bytes for UINT64

3026 ticks for CRT
3042 ticks for CRT
    Result=12345678901234567890

203 ticks for q2asc
203 ticks for q2asc
281 ticks for q2asc
    Result=12345678901234567890

1217 ticks for Baseform
1263 ticks for Baseform
1202 ticks for Baseform
    Result=12345678901234567890


670 ticks for CRT
671 ticks for CRT
    Result=123

31 ticks for q2asc
32 ticks for q2asc
31 ticks for q2asc
    Result=123

156 ticks for Baseform
172 ticks for Baseform
156 ticks for Baseform
    Result=123


This program was assembled with ml in 32-bit format.
48      bytes for q2asc
126     bytes for UINT64

1872 ticks for CRT
1888 ticks for CRT
    Result=123456789

109 ticks for q2asc
78 ticks for q2asc
109 ticks for q2asc
    Result=123456789

281 ticks for Baseform
281 ticks for Baseform
296 ticks for Baseform
    Result=123456789


811 ticks for CRT
812 ticks for CRT
    Result=123

31 ticks for q2asc
31 ticks for q2asc
47 ticks for q2asc
    Result=123

78 ticks for Baseform
78 ticks for Baseform
78 ticks for Baseform
    Result=123

HSE

Hi JJ!

Quote from: jj2007 on April 29, 2022, 08:44:29 PM
What is remarkable here is that the 32-bit versions of all algos are at least a factor of 2 faster than their 64-bit equivalents. No idea why...
You are comparing QWORD against DWORD  :biggrin: (which is not fair)

Yet, for same number, like 123, 64 bits can be faster.

Code (64 bits)
422 ticks for CRT
406 ticks for CRT
    Result=123

16 ticks for q2asc
15 ticks for q2asc
16 ticks for q2asc
    Result=123

109 ticks for Baseform
94 ticks for Baseform
109 ticks for Baseform
    Result=123


Code (32 bits)
515 ticks for CRT
516 ticks for CRT
    Result=123

16 ticks for q2asc
46 ticks for q2asc
32 ticks for q2asc
    Result=123

31 ticks for Baseform
47 ticks for Baseform
31 ticks for Baseform
    Result=123
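
To put a number on the QWORD vs DWORD point: a plain divide-by-10 conversion does one division per output digit, so the 20-digit test value costs roughly twice as many steps as the 9-digit one before any 32/64-bit effect comes into play. A minimal C sketch, not taken from the thread (decimal_digits is a hypothetical helper name):

```c
#include <assert.h>
#include <stdint.h>

/* Count how many "divide by 10" steps a plain conversion loop needs,
   which equals the number of decimal digits (0 counts as one digit). */
unsigned decimal_digits(uint64_t value)
{
    unsigned steps = 0;
    do {
        value /= 10;
        ++steps;
    } while (value != 0);
    assert(steps <= 20);   /* a 64-bit value never has more than 20 digits */
    return steps;
}
```

With the values used in the first post, this gives 20 steps for 12345678901234567890, 9 for 123456789 and 3 for 123, so only the 123 runs compare like with like.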


HSE

Note: you are using a JBasic that nobody else has  :biggrin:
Equations in Assembly: SmplMath

jj2007

Quote from: HSE on April 29, 2022, 09:50:59 PM
You are comparing QWORD against DWORD  :biggrin: (which is not fair)

Yet, for same number, like 123, 64 bits can be faster.

Valid point :thumbsup:

Now, with identical strings, timings are more or less the same for q2asc but BitRake's Baseform algo is still slower in 64-bit:
This program was assembled with ml64 in 64-bit format.
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz

202 ticks for q2asc
203 ticks for q2asc
203 ticks for q2asc
    Result=1234567890

1045 ticks for Baseform
1061 ticks for Baseform
1061 ticks for Baseform
    Result=1234567890


This program was assembled with ml in 32-bit format.
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz

202 ticks for q2asc
250 ticks for q2asc
218 ticks for q2asc
    Result=1234567890

656 ticks for Baseform
670 ticks for Baseform
656 ticks for Baseform
    Result=1234567890


Quote from: HSE on April 29, 2022, 09:50:59 PM
Note: you are using a JBasic that nobody else has  :biggrin:

You are right, sorry for that. If I find the time, I will update it tonight :cool:

jj2007

Here is the modified bitRake algo I have tested; useW=1 is not faster but takes much more space:

;-------------------------------------------------------------------------------
;  Proc UINT64__Baseform
;  Modification from bitRAKE's fasmg_playground by JJ
;  https://github.com/bitRAKE/fasmg_playground/blob/master/string/baseform.asm
;-------------------------------------------------------------------------------
    align 16 ; align 64 makes ML64.exe choke
    useW=0
    if useW
digit_table dw '0','1','2','3','4','5','6','7','8','9'
dw 'A','B','C','D','E','F','G','H','I','J'
dw 'K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'
    else
digit_table db "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    endif
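UINT64__Baseform:
; RAX number to convert
; RCX number base to use [2,36]
; RDI string buffer of length [65,14] bytes (ignored in this version: output goes to q2aBuffer)
  push rbx
  push rdi
  lea rdi, q2aBuffer          ; fixed output buffer
  push rdi                    ; saved so the buffer start can be returned in rax
  lea rbx, digit_table
  push 0                      ; end marker, later stored as the terminator
A: xor edx, edx               ; divide rdx:rax by rcx, remainder in rdx
  div rcx
  if useW
  push SIZE_P ptr [rbx+rdx*2]
  else
  push SIZE_P ptr [rbx+rdx]   ; only the low byte is the digit
  endif
  test rax, rax
  jnz A                       ; loop until the quotient is zero
B: pop rax                    ; digits come back most significant first
  if useW
  stosw
  test al, al
  elseif 0 ; not faster, one byte more
  stosb
  inc rdi
  test al, al
  else ; fastest option
  and eax, 127                ; keep the character, ZF set on the end marker
  stosw                       ; stosw does not change the flags
  endif
  jnz B
  pop rax ; return start of buffer
  pop rdi
  pop rbx
  ret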
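
For readers who do not want to trace the stack juggling, here is roughly the same idea in C: divide repeatedly by the base, stack the digits as they come out (least significant first), then pop them to get the right order. This is only a sketch under assumptions: u64_to_base is a hypothetical name, it writes plain bytes into a caller-supplied buffer, whereas the routine above stores 16-bit characters with stosw into q2aBuffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert value to text in the given base (2..36).
   out needs room for 65 chars: 64 binary digits plus the terminator.
   Returns out. */
char *u64_to_base(uint64_t value, unsigned base, char *out)
{
    static const char digit_table[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    char stack[64];              /* digits arrive least significant first */
    size_t depth = 0;
    char *p = out;

    assert(base >= 2 && base <= 36);

    do {                         /* like loop A: divide, push the remainder's digit */
        stack[depth++] = digit_table[value % base];
        value /= base;
    } while (value != 0);

    while (depth > 0)            /* like loop B: pop, most significant digit first */
        *p++ = stack[--depth];
    *p = '\0';
    return out;
}
```

Calling u64_to_base(255, 16, buf) with a 65-byte buf leaves "FF" in it. The assembly avoids the separate array by using the CPU stack itself, with the pushed 0 doubling as loop sentinel and string terminator.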


Latest timings for 10 Million strings (10 chars in 32-bit, 20 in 64-bit mode):

This program was assembled with ml64 in 64-bit format.
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
70      bytes for q2asc
81      bytes for UINT64

421 ticks for q2asc
453 ticks for q2asc
436 ticks for q2asc
    Result=12345678901234567890

2403 ticks for Baseform
2480 ticks for Baseform
2387 ticks for Baseform
    Result=12345678901234567890

This program was assembled with ml in 32-bit format.
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
48      bytes for q2asc
91      bytes for UINT64

187 ticks for q2asc
203 ticks for q2asc
202 ticks for q2asc
    Result=123456789

578 ticks for Baseform
577 ticks for Baseform
577 ticks for Baseform
    Result=123456789
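
The harness that produced these tick counts is not shown in the thread. For anyone who wants to run a comparable measurement without it, a loop of the following shape is one option; this is a sketch under assumptions, with hypothetical names (u64_to_dec, time_conversions) and the C library's clock() standing in for whatever timer the original used, so absolute numbers will not match the ones above.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Plain base-10 conversion used as the workload: fills buf from the end
   and returns a pointer to the first digit. */
char *u64_to_dec(uint64_t value, char buf[21])
{
    char *p = buf + 20;
    *p = '\0';
    do {
        *--p = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);
    return p;
}

/* Convert the same value `iterations` times; return elapsed CPU time in ms. */
double time_conversions(uint64_t value, unsigned long iterations)
{
    char buf[21];
    volatile uint64_t input = value;  /* re-read each pass so the work is not hoisted */
    volatile char sink = 0;           /* result is consumed so the loop is not dropped */
    unsigned long i;
    clock_t start = clock();

    assert(start != (clock_t)-1);     /* clock() returns -1 when no timer is available */
    for (i = 0; i < iterations; ++i)
        sink = *u64_to_dec(input, buf);
    (void)sink;
    return 1000.0 * (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To mirror the test above, one would call time_conversions(12345678901234567890ULL, 10000000UL) in a 64-bit build and compare against the same call with a shorter value.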

HSE

Quote from: jj2007 on April 30, 2022, 02:28:23 AM
Here is the modified bitRake algo
:biggrin: :biggrin: Not bitRAKE, I made that! (You came back to bitRAKE's way  :thumbsup:)

JJ, your tests are not very useful because the most used numbers are 0 and 1  :biggrin: