Author Topic: qword to unicode?  (Read 4215 times)

HSE

  • Member
  • *****
  • Posts: 2257
  • AMD 7-32 / i3 10-64
Re: qword to unicode?
« Reply #15 on: April 28, 2022, 12:13:30 AM »
I found a very interesting procedure from bitRAKE for ASCII, which is pretty easy to adapt to Unicode:
Code: [Select]
.data
    align 64
    digit_table dw '0','1','2','3','4','5','6','7','8','9'
                dw 'A','B','C','D','E','F','G','H','I','J'
                dw 'K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'
.code
      ;-------------------------------------------------------------------------------
      ;  Proc UINT64__Baseform
      ;  Modification from bitRAKE's fasmg_playground
      ;  https://github.com/bitRAKE/fasmg_playground/blob/master/string/baseform.asm
      ;-------------------------------------------------------------------------------

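UINT64__Baseform:
; In:  RAX = number to convert
;      RCX = number base to use [2,36]
; Out: null-terminated UTF-16 string in q2aBuffer
;      (needs 65 words for base 2, 14 words for base 36)
       push rbx
       push rdi
       lea rdi, q2aBuffer           ; output buffer (fixed in this version)
       lea rbx, digit_table
       push 0                       ; terminator, popped last
A:     xor edx, edx
       div rcx                      ; RAX = quotient, RDX = remainder (digit value)
       push qword ptr [rbx+rdx*2]   ; low word of the pushed qword is the digit
       test rax, rax
       jnz A                        ; digits are pushed least significant first

B:     pop rax                      ; most significant digit comes off first
       stosw                        ; store the low word, RDI += 2
       test al, al                  ; only the pushed 0 has a zero low byte
       jnz B
       mov rax, rdi         ; comment for timing
       pop rdi
       pop rbx
       ret
; RCX unchanged
; RAX = address just past the null terminator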

Code: [Select]
    mov rax, 1500
    mov rcx, 10
    call UINT64__Baseform
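
As a cross-check for the routine above, here is a minimal C sketch of the same divide-and-stack idea: divide by the base, keep each remainder, then emit the digits in reverse order as UTF-16 code units. The function name `uq2base_w`, the local `stack` array and the 65-element buffer requirement are assumptions of this sketch, not part of bitRAKE's or HSE's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the routine; all names are illustrative only. */
static const char DIGITS[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Writes `value` in `base` (2..36) as a null-terminated UTF-16 string.
   `out` must hold at least 65 elements (64 binary digits + terminator).
   Returns the number of digits written, not counting the terminator. */
size_t uq2base_w(uint64_t value, unsigned base, uint16_t *out)
{
    uint16_t stack[64];   /* remainders arrive least significant first */
    size_t n = 0, i = 0;

    assert(base >= 2 && base <= 36);
    do {
        stack[n++] = (uint16_t)DIGITS[value % base];
        value /= base;
    } while (value != 0);

    while (n > 0)         /* pop in reverse: most significant first */
        out[i++] = stack[--n];
    out[i] = 0;
    return i;
}
```

With a `uint16_t buf[65]`, calling `uq2base_w(1500, 10, buf)` leaves the UTF-16 string "1500" in `buf` and returns 4.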

LATER: JJ, your algorithm fails because it cannot handle a "00" byte  :thdn:    :sad:
Equations in Assembly: SmplMath

HSE

Re: qword to unicode?
« Reply #16 on: April 28, 2022, 01:15:25 AM »
JJ, this works:
Code: [Select]
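q2a:
  push rsi
  push rdi
  mov rsi, offset q2aBuffer+32
  lea rdi, [rsi-32]         ; output starts at q2aBuffer
  FBSTP REAL10 ptr [rsi]    ; store ST(0) as 10-byte packed BCD and pop it

  push REAL10
  pop rdx                   ; byte index, counts down to 0
  mov r8, 0                 ; r8 stays 0 until the first non-zero byte
@@:
  movzx ecx, byte ptr [rsi+rdx]
  add r8, rcx
  test r8, r8
  je NoNumber               ; still in the leading zeros: skip this byte
  mov r8, 1                 ; from now on every byte is printed, "00" included
  mov eax, ecx
  shr al, 4                 ; high nibble = first digit of the pair
  or al, "0"
  stosw
  mov al, cl
  and al, 15                ; low nibble = second digit of the pair
  or al, "0"
  stosw
NoNumber:
  dec rdx
  jns @B
  pop rdi
  pop rsi
  ret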

Once the first non-zero byte has been found, a "00" byte is valid and must be printed.
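
The leading-zero logic can be modelled in a few lines of C. This is only a sketch under stated assumptions: `pack_bcd` and `bcd_to_utf16` are hypothetical names, the model walks just the nine digit bytes of the packed BCD value (byte 9 is the sign byte), and it appends a terminator for convenience, which the assembly routine does not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packs a non-negative value (< 10^18) in the layout FBSTP produces:
   bytes 0..8 hold two decimal digits each (low byte first),
   byte 9 is the sign byte (0 for non-negative). Hypothetical helper. */
void pack_bcd(uint64_t value, uint8_t bcd[10])
{
    assert(value < 1000000000000000000ULL);
    for (int i = 0; i < 9; i++) {
        unsigned lo = (unsigned)(value % 10); value /= 10;
        unsigned hi = (unsigned)(value % 10); value /= 10;
        bcd[i] = (uint8_t)((hi << 4) | lo);
    }
    bcd[9] = 0;
}

/* Unpacks the digit bytes to UTF-16, skipping zero bytes until the first
   non-zero byte is seen. `out` needs room for 19 elements.
   Returns the number of characters written, not counting the terminator. */
size_t bcd_to_utf16(const uint8_t bcd[10], uint16_t *out)
{
    size_t n = 0;
    int seen = 0;                     /* plays the role of r8 */

    for (int i = 8; i >= 0; i--) {
        uint8_t b = bcd[i];
        if (!seen && b == 0)
            continue;                 /* leading "00" byte: skip */
        seen = 1;                     /* later "00" bytes are real digits */
        out[n++] = (uint16_t)('0' + (b >> 4));
        out[n++] = (uint16_t)('0' + (b & 15));
    }
    out[n] = 0;
    return n;
}
```

In this model 1500 packs to the bytes 00h, 15h and unpacks to "1500", so the trailing "00" byte is kept. The value 123 unpacks to "0123": digits come in pairs, so a leading zero nibble survives.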

jj2007

  • Member
  • *****
  • Posts: 13300
  • Assembly is fun ;-)
    • MasmBasic
Re: qword to unicode?
« Reply #17 on: April 28, 2022, 02:19:52 AM »
Clever :thumbsup:

Do you really need the test r8, r8?

HSE

Re: qword to unicode?
« Reply #18 on: April 28, 2022, 02:23:50 AM »
Quote from: jj2007
Do you really need the test r8, r8?

Clever :thumbsup:

Biterider

  • Member
  • *****
  • Posts: 1045
  • ObjAsm Developer
    • ObjAsm
Re: qword to unicode?
« Reply #19 on: April 28, 2022, 07:09:06 AM »
Hi
I checked both procs, UINT64__Baseform (bitRAKE) and q2a.
Apart from the ugly leading "0" of q2a, UINT64__Baseform is faster (depending on argument size) and the representation base can be changed.

Argument = 123 => 2x faster (Base = 10)
Argument = 1234567890 => same performance

A signed version is not too hard to code.

Biterider

jj2007

Re: qword to unicode?
« Reply #20 on: April 28, 2022, 10:44:42 AM »
Very nice, Biterider :thumbsup:

Code: [Select]
This program was assembled with ml64 in 64-bit format.
87      bytes for q2a
125     bytes for UINT64

1482 ticks for crt swprintf
Result=123456789
452 ticks for q2a
484 ticks for q2a
452 ticks for q2a
468 ticks for q2a
Result=123456789
468 ticks for UINT64__Baseform
484 ticks for UINT64__Baseform
468 ticks for UINT64__Baseform
468 ticks for UINT64__Baseform
Result=123456789

For short strings up to 12345678, your 64-bit code is faster; above 123456789 mine is faster. Your 32-bit version is significantly faster.

The leading zero problem is solved. My routine is signed, but that's a minor difference, of course.

Attached source and executables (built with ML64, but I recommend UAsm64).

HSE

Re: qword to unicode?
« Reply #21 on: April 28, 2022, 12:03:54 PM »
Hi Biterider!

Quote from: Biterider
UINT64__Baseform is faster (depending on argument size) and the representation base can be changed.

Yes, very elegant and versatile. I think uq2baseW is a sufficiently descriptive name.

Quote from: Biterider
A signed version is not too hard to code.

That could be sq2baseW.
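
One way a signed variant could look, sketched in C rather than assembly: take the magnitude in unsigned arithmetic, emit a minus sign for negative input, then convert the magnitude as before. The name `sq2base_w` and the 66-element buffer requirement are assumptions of this sketch. Whether a minus sign is the expected output for bases other than 10 is a separate question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signed conversion: writes `value` in `base` (2..36) as a
   null-terminated UTF-16 string with a leading '-' for negative input.
   `out` must hold at least 66 elements (sign + 64 digits + terminator).
   Returns the number of characters written, not counting the terminator. */
size_t sq2base_w(int64_t value, unsigned base, uint16_t *out)
{
    static const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    uint16_t stack[64];
    size_t n = 0, i = 0;

    /* Unsigned negation is well defined even for INT64_MIN. */
    uint64_t mag = value < 0 ? 0 - (uint64_t)value : (uint64_t)value;

    assert(base >= 2 && base <= 36);
    if (value < 0)
        out[i++] = '-';
    do {
        stack[n++] = (uint16_t)digits[mag % base];
        mag /= base;
    } while (mag != 0);

    while (n > 0)
        out[i++] = stack[--n];
    out[i] = 0;
    return i;
}
```

Under these assumptions, -123 comes out as "-123" in base 10 and "-7B" in base 16.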

HSE

HSE

Re: qword to unicode?
« Reply #22 on: April 28, 2022, 12:11:28 PM »
JJ:

Do you need to adjust your glasses? :biggrin:

Quote from: jj2007
your 64-bit code is faster; above 123456789 mine is faster. Your 32-bit version is significantly faster.

bitRAKE may sound similar to Biterider, but they are two different, well-known people. I also deserve some credit: essentially, I changed some "e" to "r" :biggrin: :biggrin: :biggrin:

TimoVJL

  • Member
  • *****
  • Posts: 1231
Re: qword to unicode?
« Reply #23 on: April 28, 2022, 04:53:49 PM »
Old AMD
Code: [Select]

This program was assembled with ml in 32-bit format.
81      bytes for q2a
126     bytes for UINT64

281 ticks for crt swprintf
Result=123456789
31 ticks for q2a
47 ticks for q2a
47 ticks for q2a
47 ticks for q2a
Result=123456789
46 ticks for UINT64__Baseform
63 ticks for UINT64__Baseform
47 ticks for UINT64__Baseform
62 ticks for UINT64__Baseform
Result=111111111

--- hit any key ---
Code: [Select]

This program was assembled with ml64 in 64-bit format.
87      bytes for q2a
125     bytes for UINT64

219 ticks for crt swprintf
Result=123456789
31 ticks for q2a
47 ticks for q2a
46 ticks for q2a
32 ticks for q2a
Result=123456789
62 ticks for UINT64__Baseform
47 ticks for UINT64__Baseform
62 ticks for UINT64__Baseform
63 ticks for UINT64__Baseform
Result=111111111

--- hit any key ---
May the source be with you

Biterider

Re: qword to unicode?
« Reply #24 on: April 29, 2022, 06:09:02 AM »
Hi
While coding the signed version of UINT64__Baseform, I became unsure what we expect to see when converting, say, -123 (decimal) to base 16 or to base 2. Are minus signs allowed on bases other than 10?
Does anyone know the correct answer for sure?

Biterider

HSE

Re: qword to unicode?
« Reply #25 on: April 29, 2022, 06:46:53 AM »
Quote from: Biterider
Are minus signs allowed on bases other than 10?

:biggrin: Maybe that is the wrong question, because the answer is obvious.

A negative number is negative in any base; the number is always the same.

Perhaps the question is: are negative numbers actually written with a minus sign in bases other than 10?  :thumbsup:

It is just that, in computing, a negative number written in base 2 is not what we usually call a binary number, and a negative number written in base 16 is not what we usually call hexadecimal, because binary and hexadecimal normally show the two's-complement bit pattern of a fixed-size register.
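
To make the distinction concrete, here is a small C sketch that formats the same negative value both ways: as the raw 64-bit register contents and as sign plus magnitude. The function names `hex_register_view` and `hex_signed_view` are made up for this illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hex digits of the raw 64-bit pattern (two's-complement view).
   Returns the number of characters written, as snprintf does. */
int hex_register_view(int64_t value, char *out, size_t cap)
{
    return snprintf(out, cap, "%016" PRIX64, (uint64_t)value);
}

/* Minus sign followed by the magnitude in hex (mathematical view). */
int hex_signed_view(int64_t value, char *out, size_t cap)
{
    /* Unsigned negation avoids overflow for INT64_MIN. */
    uint64_t mag = value < 0 ? 0 - (uint64_t)value : (uint64_t)value;
    return snprintf(out, cap, "%s%" PRIX64, value < 0 ? "-" : "", mag);
}
```

For -123 the register view is "FFFFFFFFFFFFFF85" and the signed view is "-7B". Both describe the same number; only the first depends on the register width.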

jj2007

Re: qword to unicode?
« Reply #26 on: April 29, 2022, 09:42:14 AM »
Quote from: Biterider
Are minus signs allowed on bases other than 10?

They are not forbidden but highly unusual. In the meantime, I gave my routines a little speed boost - grateful for some timings:

Code: [Select]
This program was assembled with UAsm64 in 64-bit format.
87      bytes for q2a
72      bytes for q2asc
117     bytes for UINT64

2699 ticks for crt swprintf
2699 ticks for crt swprintf
    Result=123456789012345678

499 ticks for q2a
515 ticks for q2a
499 ticks for q2a
    Result=123456789012345678

187 ticks for q2asc
187 ticks for q2asc
187 ticks for q2asc
    Result=123456789012345678

1030 ticks for UINT64__Baseform
1045 ticks for UINT64__Baseform
1014 ticks for UINT64__Baseform
    Result=123456789012345678


686 ticks for crt swprintf
671 ticks for crt swprintf
    Result=123

453 ticks for q2a
436 ticks for q2a
453 ticks for q2a
    Result=123

31 ticks for q2asc
31 ticks for q2asc
31 ticks for q2asc
    Result=123

156 ticks for UINT64__Baseform
156 ticks for UINT64__Baseform
172 ticks for UINT64__Baseform
    Result=123

HSE

Re: qword to unicode?
« Reply #27 on: April 29, 2022, 10:02:48 AM »
Code: [Select]
2265 ticks for crt swprintf
2157 ticks for crt swprintf
    Result=123456789012345678

407 ticks for q2a
390 ticks for q2a
422 ticks for q2a
    Result=123456789012345678

125 ticks for q2asc
141 ticks for q2asc
109 ticks for q2asc
    Result=123456789012345678

766 ticks for UINT64__Baseform
781 ticks for UINT64__Baseform
766 ticks for UINT64__Baseform
    Result=123456789012345678


578 ticks for crt swprintf
609 ticks for crt swprintf
    Result=123

359 ticks for q2a
375 ticks for q2a
375 ticks for q2a
    Result=123

32 ticks for q2asc
15 ticks for q2asc
16 ticks for q2asc
    Result=123

109 ticks for UINT64__Baseform
110 ticks for UINT64__Baseform
125 ticks for UINT64__Baseform
    Result=123

--- hit any key ---

TimoVJL

Re: qword to unicode?
« Reply #28 on: April 29, 2022, 04:01:09 PM »
Old AMD
Code: [Select]

This program was assembled with UAsm64 in 64-bit format.
87      bytes for q2a
72      bytes for q2asc
117     bytes for UINT64

4103 ticks for crt swprintf
4056 ticks for crt swprintf
    Result=123456789012345678

468 ticks for q2a
484 ticks for q2a
468 ticks for q2a
    Result=123456789012345678

265 ticks for q2asc
265 ticks for q2asc
281 ticks for q2asc
    Result=123456789012345678

1607 ticks for UINT64__Baseform
1607 ticks for UINT64__Baseform
1622 ticks for UINT64__Baseform
    Result=123456789012345678


936 ticks for crt swprintf
936 ticks for crt swprintf
    Result=123

375 ticks for q2a
358 ticks for q2a
359 ticks for q2a
    Result=123

63 ticks for q2asc
31 ticks for q2asc
32 ticks for q2asc
    Result=123

140 ticks for UINT64__Baseform
140 ticks for UINT64__Baseform
141 ticks for UINT64__Baseform
    Result=123



--- hit any key ---

jj2007

Re: qword to unicode?
« Reply #29 on: April 29, 2022, 04:12:32 PM »
Thanks, Timo & Hector :thup: