Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)

jimg · April 03, 2024, 01:15:53 PM

Okay, that's going to have to wait for tomorrow.

But you do see that the magic number is the mantissa of .1, right?

So you multiply that mantissa by the mantissa of the number you want to divide by 10.

I didn't design this system, and it took me a while to figure it out.
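To make the "mantissa of .1" idea concrete: 0CCCCCCCDh is 0.1 written as a binary fraction with 35 fraction bits and rounded up, i.e. ceil(2^35 / 10). Its 32 stored bits read as 0.8 = 8 * 0.1, hence the usual "8 * 1/10" comment. Multiplying n by it and keeping bits 35 and up (edx after mul, then shr edx, 3) yields n / 10. The C below is a minimal sketch that assumes a 64-bit product stands in for the edx:eax pair; the macro and function names are illustrative, not taken from any routine in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* 0xCCCCCCCD == ceil(2^35 / 10): 0.1 as a fixed-point fraction with 35
   fraction bits, rounded up.  Read as a 32-bit fraction it is 0.8. */
#define MAGIC_DIV10 0xCCCCCCCDu

/* Models "mul ecx / shr edx, 3": the 64-bit product is edx:eax, and
   shifting right by 32 + 3 keeps only the integer part of n * 0.1. */
uint32_t div10_magic(uint32_t n)
{
    uint64_t product = (uint64_t)n * MAGIC_DIV10;  /* edx:eax */
    return (uint32_t)(product >> 35);              /* edx >> 3 */
}

/* Returns 1 if div10_magic agrees with real division for every n in
   [first, first + count), stopping early at UINT32_MAX. */
int div10_matches(uint32_t first, uint32_t count)
{
    for (uint32_t i = 0; i < count; i++) {
        uint32_t n = first + i;
        if (div10_magic(n) != n / 10)
            return 0;
        if (n == UINT32_MAX)
            break;
    }
    return 1;
}
```

Compilers emit this same multiply-and-shift when they see an unsigned 32-bit `n / 10`, which is a quick way to confirm the constant.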

NoCforMe · April 03, 2024, 01:41:21 PM

Well, I have to give you points for seeming to understand it.

I can wait.

sinsi · April 03, 2024, 03:21:26 PM

It's called a magic number for a reason

TimoVJL · April 03, 2024, 07:16:15 PM

AMD Athlon(tm) II X2 220 Processor (SSE3)
61      bytes for other

8105    cycles for 100 * dwtoa
6404    cycles for 100 * dw2str
48371   cycles for 100 * MasmBasic Str$()
34694   cycles for 100 * Ray's algo I
10410   cycles for 100 * Ray's algo, mod JimG

8207    cycles for 100 * dwtoa
6404    cycles for 100 * dw2str
26515   cycles for 100 * MasmBasic Str$()
34499   cycles for 100 * Ray's algo I
10303   cycles for 100 * Ray's algo, mod JimG

8109    cycles for 100 * dwtoa
6405    cycles for 100 * dw2str
26517   cycles for 100 * MasmBasic Str$()
34405   cycles for 100 * Ray's algo I
10303   cycles for 100 * Ray's algo, mod JimG

8104    cycles for 100 * dwtoa
6413    cycles for 100 * dw2str
27009   cycles for 100 * MasmBasic Str$()
34508   cycles for 100 * Ray's algo I
10303   cycles for 100 * Ray's algo, mod JimG

Averages:
8107    cycles for dwtoa
6404    cycles for dw2str
26763   cycles for MasmBasic Str$()
34504   cycles for Ray's algo I
10303   cycles for Ray's algo, mod JimG

20      bytes for dwtoa
82      bytes for dw2str
16      bytes for MasmBasic Str$()
110     bytes for Ray's algo I
153     bytes for Ray's algo, mod JimG

dwtoa                                   -123456789
dw2a                                    4171510507
dw2str                                  -123456789
MasmBasic Str$()                        -123456789
Ray's algo I                            171510507
Ray's algo, mod JimG                    -123456789

--- ok ---
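The dw2a line is the same 32-bit pattern read as unsigned: 2^32 - 123456789 = 4171510507. A minimal C sketch that formats one dword both ways, signed (as dwtoa, dw2str and Str$() print it above) and unsigned (as dw2a prints it); the function name is hypothetical. The signed value is derived arithmetically, so the sketch does not rely on an implementation-defined conversion.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formats one 32-bit pattern two ways: as a signed dword and as an
   unsigned dword.  Each output buffer needs at least 12 bytes. */
void format_both_ways(uint32_t bits, char *out_signed, char *out_unsigned)
{
    /* Two's complement: patterns above 7FFFFFFFh stand for bits - 2^32. */
    int64_t as_signed = (bits <= 0x7FFFFFFFu)
                            ? (int64_t)bits
                            : (int64_t)bits - ((int64_t)1 << 32);
    snprintf(out_signed, 12, "%lld", (long long)as_signed);
    snprintf(out_unsigned, 12, "%llu", (unsigned long long)bits);
}
```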

jj2007 · April 03, 2024, 10:15:32 PM

For the sake of completeness, Ray's algo used for converting to a bin$:

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

Averages:
29156   cycles for crt__itoa
928     cycles for MasmBasic Bin$()
45257   cycles for Ray's algo

26      bytes for crt__itoa
16      bytes for MasmBasic Bin$()
110     bytes for Ray's algo

crt__itoa                               100101101011010000111
MasmBasic Bin$()                        00000000000100101101011010000111
Ray's algo                              00000000000100101101011010000111

As expected, div ebx with ebx=2 is inefficient compared to a simple sar eax, 1.
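For a Bin$()-style conversion no division is needed at all: each bit is a shift and a mask away. A minimal C sketch of a fixed-width conversion (always 32 characters, like the MasmBasic and Ray's-algo output above); the function name is hypothetical. It shifts an unsigned value, which is the C counterpart of shr rather than sar; that difference only matters for a loop that stops at zero, since sar on a negative value sticks at -1. Here the loop runs a fixed 32 times.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Writes v as exactly 32 binary digits plus a terminating NUL.
   Bits are peeled off the low end with a shift, so the buffer is filled
   from right to left and no reversal pass is needed. */
void dword_to_bin32(uint32_t v, char out[33])
{
    out[32] = '\0';
    for (int i = 31; i >= 0; i--) {
        out[i] = (char)('0' + (v & 1u));  /* lowest remaining bit */
        v >>= 1;                          /* replaces div by 2 */
    }
}
```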

jj2007 · April 04, 2024, 01:21:34 AM

One more bin-to-dec, with a modification proposed by JimG:

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

Averages:
4440    cycles for dwtoa
2843    cycles for dw2str
16348   cycles for MasmBasic Str$()
13864   cycles for Ray's algo I
3767    cycles for Ray's algo, mod JimG

20      bytes for dwtoa
82      bytes for dw2str
16      bytes for MasmBasic Str$()
110     bytes for Ray's algo I
1129    bytes for Ray's algo, mod JimG

dwtoa                                   -123456789
dw2a                                    4171510507
dw2str                                  -123456789
MasmBasic Str$()                        -123456789
Ray's algo I                            171510507
Ray's algo, mod JimG                    -123456789

jimg · April 04, 2024, 04:59:41 AM

Quote from: NoCforMe on April 03, 2024, 01:41:21 PM
Well, I have to give you points for seeming to understand it.

I can wait.

I've given this a lot of thought, and I'm not good enough to explain this more than I have. If there is something in particular you don't understand, I'd be happy to explain that, but a blanket "explain everything" is beyond me.

six_L · April 04, 2024, 05:29:49 AM

Hi, Lingo

Quote
00001220 clock cycles, (Ray_AnyToAny)x1000, OutPut: 987654321098
00001130 clock cycles, (lingo_i2aA)x1000, OutPut: 987654321098
00000416 clock cycles, (Roberts_dqtoa)x1000, OutPut: 987654321098

Roberts_dqtoa:

dqtoa proc uses rbx rsi rdi dqValue:QWORD, lpBuffer:QWORD
	; -------------------------------------------------------------
	; convert QWORD to ascii string
	; dqValue is value to be converted
	; lpBuffer is the address of the receiving buffer
	; EXAMPLE:
	; invoke dqtoa,dqValue,ADDR buffer
	; -------------------------------------------------------------

	mov	rax, dqValue
	mov	rdi, lpBuffer
	test	rax,rax
	jnz	@pos
@zero:
	mov	word ptr [rdi],30h
	jmp	@exit
@pos:
	mov	rcx, 0cccccccccccccccdh	; ceil(2^67/10): 1/10 scaled by 2^67, i.e. 0.8 * 2^64
	mov	rsi, rdi
	.while (rax > 0)
		mov	rbx,rax		; save original
		mul	rcx		; num * 1/10 magic number. rax/10 = rax* magic number
		shr	rdx, 3		; drop the extra factor of 8: rdx = rbx / 10
		mov	rax,rdx		;
		lea	rdx,[rdx*4+rdx]	; *5
		add	rdx,rdx		; *10
		sub	rbx,rdx		; remainder
		add	bl,'0'		; get the ascii value
		mov	[rdi],bl	; save it
		add	rdi, 1		; get next digit
	.endw
	mov	byte ptr [rdi], 0       ; terminate the string

	; We now have all the digits, but in reverse order.
	.while (rsi < rdi)
		sub	rdi, 1
		mov	al, [rsi]
		mov	ah, [rdi]
		mov	[rdi], al
		mov	[rsi], ah
		add	rsi, 1
	.endw
@exit:
	ret

dqtoa endp
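A C model of the dqtoa loop, assuming a GCC/Clang-style compiler with the `unsigned __int128` extension to stand in for the rdx:rax result of `mul rcx`; the function names are illustrative. 0CCCCCCCCCCCCCCCDh is ceil(2^67 / 10), so keeping bits 67 and up of the product (rdx, then shr rdx, 3) gives value / 10. The model uses a do/while instead of the separate @zero path, which produces "0" for zero without a special case.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 0xCCCCCCCCCCCCCCCD == ceil(2^67 / 10). */
static const uint64_t MAGIC64 = 0xCCCCCCCCCCCCCCCDull;

/* Models "mul rcx / shr rdx, 3": keep bits 67 and up of n * MAGIC64.
   Assumes the non-standard unsigned __int128 type (GCC/Clang, 64-bit). */
uint64_t div10_u64(uint64_t n)
{
    unsigned __int128 product = (unsigned __int128)n * MAGIC64;  /* rdx:rax */
    return (uint64_t)(product >> 67);                            /* rdx >> 3 */
}

/* Same structure as dqtoa: emit digits least-significant first,
   NUL-terminate, then reverse in place.  buf needs 21 bytes. */
void dqtoa_model(uint64_t value, char *buf)
{
    char *end = buf;
    do {
        uint64_t q = div10_u64(value);
        *end++ = (char)('0' + (value - q * 10));  /* remainder digit */
        value = q;
    } while (value != 0);
    *end = '\0';
    for (char *lo = buf, *hi = end - 1; lo < hi; lo++, hi--) {
        char t = *lo;
        *lo = *hi;
        *hi = t;
    }
}
```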

NoCforMe · April 04, 2024, 05:35:24 AM

Quote from: jimg on April 04, 2024, 04:59:41 AM
Quote from: NoCforMe on April 03, 2024, 01:41:21 PM
Well, I have to give you points for seeming to understand it.

I can wait.

I've given this a lot of thought, and I'm not good enough to explain this more than I have. If there is something in particular you don't understand, I'd be happy to explain that, but a blanket "explain everything" is beyond me.

Maybe you're better at explaining than you think you are.

Let's try this: can you explain, in some detail, how the bits of the number being "divided" (multiplied) and the magic number get shuffled and manipulated to come up with the final answer? This would explain how the virtual binary point comes into play, and how we turn a lowly integer (like 0CCCC ... CCCDh) into a virtual floating-point number. Possible?

I know I'm asking a lot, but at this point I really do not understand how this works. And that bothers me. I don't like taking things that I don't understand for granted.

jj2007 · April 04, 2024, 05:44:37 AM

Quote from: six_L on April 04, 2024, 05:29:49 AM
lea rdx,[rdx*4+rdx]

That can be made faster, see reply #80.

lingo · April 04, 2024, 07:08:19 AM

Thank you six_L,

Perfect!

jimg · April 04, 2024, 07:35:50 AM

Quote from: NoCforMe on April 04, 2024, 05:35:24 AM
Let's try this: can you explain, in some detail, how the bits of the number being "divided" (multiplied) and the magic number get shuffled and manipulated to come up with the final answer? This would explain how the virtual binary point comes into play, and how we turn a lowly integer (like 0CCCC ... CCCDh) into a virtual floating-point number. Possible?

I know I'm asking a lot, but at this point I really do not understand how this works. And that bothers me. I don't like taking things that I don't understand for granted.

Okay, you made me take a closer look, and I found out my assumptions were wrong all along.
The answers come out correct and the algorithm works; it's just that the extra
multiply by 8, divide by 8 seems to be unnecessary as far as I can see. I'll have to write
a test program before I'm 100% sure, but it looks like it from here.
So, removing the factor of eight means we save a shr 3 instruction.
The "magic" number then becomes 1/8 of its old value, or 19999999 hex. Strange coincidence with
a decimal-looking number, but it's not. Originally, I thought the extra factor of eight
was required to maximize the precision of the numbers, left-justifying the binary fraction in the
maximum 32 bits. For now, it seems it's unnecessary. I tested on the min and max
32-bit binary numbers (minimum value of -2,147,483,648 and maximum value of 2,147,483,647)
and it seems to work.
This was my original explanation, in case it turns out we do need the extra *8
" 1/10 is a repeating number in binary 0.000110011001100....
we multiply by 8 to maximize the precision= .11001100110011001100
on the bottom end, we slid in additional bits from the repeating binary
then, since there is more repeating binary left, we round up to a D
We do all this because, without full 32 bits of precision, the last
digit of our result would suffer from truncation error and would be
incorrect.
"
What hogwash.

Anyway, here is my new dialog-

eax = number in binary = 1234

mov ebx,eax ; save original for the subtraction below that gets the remainder
mul ecx ; ecx = 19999999h = 1/10
1/10 in binary = 0.00011001100110011001100110011001 = 19999999 hex (first 32 fraction bits)

1234 * 19999999h = 1234 * 429,496,729 = 529,998,963,586 = 7B66666382 hex
7B66666382 => edx = 7B hex = 123, eax = 66666382 hex
edx holds the high dword of the product, i.e. the integer part of the answer;
the fractional part is in eax.
mov eax,edx ; save answer
lea edx,[edx*4+edx] ; *5 = 123*5 = 615
add edx,edx ; *2, so *10 in total = 1230
sub ebx,edx ; remainder 1234-1230 = 4, our last digit,
which we push onto the stack for saving

next time through we have 123 in eax and do the same thing

mov ebx,eax ; save original 123
mul ecx ; 19999999h = 1/10 gives 12 in edx, don't care about eax
mov eax,edx ; save answer
lea edx,[edx*4+edx] ; *5 = 12*5 = 60
add edx,edx ; *2 = 120
sub ebx,edx ; remainder 123-120 = 3, our next digit

etc. until the quotient in eax is zero; then we are done.
At the end we pop the digits off the stack and save them. You know this.

I hope this helps. If not, ask again.
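jimg says he still needs a test program before dropping the factor of eight. A minimal C sketch of such a test compares both constants with ordinary division, assuming a 64-bit product stands in for edx:eax (the function names are hypothetical). Worked by hand, n = 10 already separates them: 10 * 19999999h = 0FFFFFFFAh, whose high dword is 0, not 1. Because 19999999h is 2^32/10 truncated, 10k * 19999999h = k * 2^32 - 6k, so every nonzero multiple of 10 comes out one too small, while the magnitudes of the tested extremes (2,147,483,647 and 2,147,483,648) happen to come out right. 0CCCCCCCDh with shr 3 carries three more bits of 1/10, rounded up, which is enough for every 32-bit input.

```c
#include <assert.h>
#include <stdint.h>

/* Proposed constant: 1/10 truncated to 32 fraction bits, no shift. */
uint32_t div10_1999(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0x19999999u) >> 32);  /* edx only */
}

/* Original constant: 1/10 to 35 fraction bits, rounded up. */
uint32_t div10_cccd(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);  /* edx >> 3 */
}

/* Returns the first n in [0, limit] where f(n) != n / 10, or -1 if none. */
int64_t first_mismatch(uint32_t (*f)(uint32_t), uint32_t limit)
{
    for (uint64_t n = 0; n <= limit; n++)
        if (f((uint32_t)n) != (uint32_t)n / 10)
            return (int64_t)n;
    return -1;
}
```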

jj2007 · April 04, 2024, 08:50:09 AM

Quote from: six_L on April 04, 2024, 05:29:49 AM
Roberts_dqtoa:

No negative numbers?

NoCforMe · April 04, 2024, 10:09:28 AM

Quote from: jimg on April 04, 2024, 07:35:50 AM
Okay, you made me take a closer look and found out my assumptions were wrong all along.

Wow. Very impressive answer. I haven't had time to delve into it yet, but I will. Thanks!

lingo · April 04, 2024, 10:40:09 AM

Thank you six_L

Would you like to include my new 64-bit code in your perfect speed test, together with the Roberts_dqtoa code?

; Int_to_string - Convert a big integer into a string
; IN: RAX = binary integer (unsigned)
; RCX = address of the buffer; digits end at [RCX+23], the byte at [RCX+24] is not written (caller supplies the terminator)
; OUT: RAX = new start of the string in the buffer

align 16
db 90h
 
i2a64   proc   
        push rbx
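        lea  r9,  [rcx+24]                   ; r9 = end of the 24-byte buffer, digits are stored backwards
        mov  r8,  0cccccccccccccccdh         ; ceil(2^67/10): 1/10 scaled by 2^67
@@:        
        mov  rbx, rax
        mul  r8
        shr  rdx, 3                          ; rdx = q = rbx / 10
        sub  r9,  1
        lea  rcx, [rdx*4+rdx-18h]            ; rcx = 5*q - 24
        lea  rax, [rcx+rcx]                  ; rax = 10*q - '0'
        sub  rbx, rax                        ; rbx = n - 10*q + '0' = ASCII digit
        mov  rax, rdx
        mov  [r9],bl
        test rax, rax                        ; test all 64 bits of the quotient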
        jnz  @b
        mov  rax, r9                          ; return in RAX new start of the buffer
        pop  rbx
        ret
i2a64   endp
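A C model of the i2a64 loop that spells out the `lea ...-18h` trick: since 18h = 24, the two lea instructions build 10*q - 48, so `sub rbx, rax` leaves n - 10*q + '0', the digit already in ASCII, and no separate add of '0' is needed. The name is illustrative, and plain `/ 10` stands in for mul r8 / shr rdx, 3, which computes the same quotient. The model tests the full 64-bit quotient (as `test rax, rax` would); testing only eax would end the loop early whenever the quotient's low 32 bits are zero, e.g. for 42949672960, whose quotient is 2^32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Digits are written backwards, ending just before buf + 24, and the
   return value points at the first digit.  Like the assembly, it writes
   no terminator; the caller must make sure buf[24] == 0. */
char *i2a64_model(uint64_t value, char buf[25])
{
    char *p = buf + 24;
    do {
        uint64_t q = value / 10;       /* same result as mul r8 / shr rdx, 3 */
        uint64_t t = q * 10 - '0';     /* lea [rdx*4+rdx-18h], lea [rcx+rcx] */
        p--;
        *p = (char)(value - t);        /* value - 10q + '0' = ASCII digit */
        value = q;
    } while (value != 0);              /* full 64-bit test of the quotient */
    return p;
}
```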

Note:
The code for the minus sign and leading zeros must be kept out of the algorithm.
It's not normal to bloat the algorithm with 99.99% unusable code.
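Keeping the sign out of the digit loop, as suggested here (and as an answer to the question about negative numbers in Roberts_dqtoa), can look like the following minimal C sketch: an unsigned core writes digits backwards, and a thin wrapper negates in unsigned arithmetic (so INT64_MIN works too) and prepends '-'. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unsigned core: digits written backwards ending at buf + 24; returns
   the first digit.  The caller must make sure buf[24] == 0. */
static char *u64_digits(uint64_t v, char *buf)
{
    char *p = buf + 24;
    do {
        *--p = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    return p;
}

/* Signed wrapper: the sign is handled once, outside the digit loop.
   Negating in unsigned arithmetic makes INT64_MIN work as well. */
char *i64_to_str(int64_t v, char buf[25])
{
    uint64_t mag = (uint64_t)v;
    if (v < 0)
        mag = 0 - mag;
    char *p = u64_digits(mag, buf);
    if (v < 0)
        *--p = '-';
    return p;
}
```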

The MASM Forum
