Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)

Started by jj2007, April 01, 2024, 03:42:50 AM

jimg

okay, that's going to have to wait for tomorrow.

But you do see that the magic number is the mantissa of 0.1, right?

So you multiply that mantissa by the mantissa of the number you want to divide by 10.

I didn't design this system, and it took me a while to figure it out.
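
For anyone following along, here is a minimal Python sketch (not code from this thread) of what "multiply by the mantissa of .1" amounts to in the 32-bit case. The names MAGIC and div10 are made up for illustration, and it assumes the usual mul ecx / shr edx,3 sequence with ecx = 0CCCCCCCDh.

```python
# Hypothetical model of "mul ecx / shr edx, 3" with ecx = 0CCCCCCCDh.
MAGIC = 0xCCCCCCCD  # ceil(2**35 / 10): the bits of 0.1 scaled by 2**35, rounded up


def div10(n):
    """Divide an unsigned 32-bit n by 10 using only a multiply and shifts."""
    product = n * MAGIC   # the 64-bit result that mul leaves in edx:eax
    edx = product >> 32   # high dword of the product
    return edx >> 3       # shr edx, 3 removes the remaining factor of 8


# 0.1 as a double shows the same repeating 1001 pattern, rounded up at the end.
print(float.hex(0.1))        # 0x1.999999999999ap-4
print(hex(-(-2**35 // 10)))  # 0xcccccccd
print(div10(1234))           # 123
```

The product is 64 bits wide, so taking the high dword is a free shift right by 32; together with the shr 3 that makes a total shift of 35, which is why the constant is 0.1 scaled by 2**35.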

NoCforMe

Well, I have to give you points for seeming to understand it.

I can wait.
Assembly language programming should be fun. That's why I do it.

sinsi

🍺🍺🍺

TimoVJL

AMD Athlon(tm) II X2 220 Processor (SSE3)
61      bytes for other

8105    cycles for 100 * dwtoa
6404    cycles for 100 * dw2str
48371   cycles for 100 * MasmBasic Str$()
34694   cycles for 100 * Ray's algo I
10410   cycles for 100 * Ray's algo, mod JimG

8207    cycles for 100 * dwtoa
6404    cycles for 100 * dw2str
26515   cycles for 100 * MasmBasic Str$()
34499   cycles for 100 * Ray's algo I
10303   cycles for 100 * Ray's algo, mod JimG

8109    cycles for 100 * dwtoa
6405    cycles for 100 * dw2str
26517   cycles for 100 * MasmBasic Str$()
34405   cycles for 100 * Ray's algo I
10303   cycles for 100 * Ray's algo, mod JimG

8104    cycles for 100 * dwtoa
6413    cycles for 100 * dw2str
27009   cycles for 100 * MasmBasic Str$()
34508   cycles for 100 * Ray's algo I
10303   cycles for 100 * Ray's algo, mod JimG

Averages:
8107    cycles for dwtoa
6404    cycles for dw2str
26763   cycles for MasmBasic Str$()
34504   cycles for Ray's algo I
10303   cycles for Ray's algo, mod JimG

20      bytes for dwtoa
82      bytes for dw2str
16      bytes for MasmBasic Str$()
110     bytes for Ray's algo I
153     bytes for Ray's algo, mod JimG

dwtoa                                   -123456789
dw2a                                    4171510507
dw2str                                  -123456789
MasmBasic Str$()                        -123456789
Ray's algo I                            171510507
Ray's algo, mod JimG                    -123456789

--- ok ---
May the source be with you

jj2007

For the sake of completeness, Ray's algo used for converting to a bin$:
AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

Averages:
29156   cycles for crt__itoa
928     cycles for MasmBasic Bin$()
45257   cycles for Ray's algo

26      bytes for crt__itoa
16      bytes for MasmBasic Bin$()
110     bytes for Ray's algo

crt__itoa                               100101101011010000111
MasmBasic Bin$()                        00000000000100101101011010000111
Ray's algo                              00000000000100101101011010000111

As expected, the div ebx with ebx=2 is not efficient, compared to a simple sar eax, 1.
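
To make the comparison concrete, here is a small Python sketch of the two ways of peeling off bits. It is only a model, not the benchmarked MASM code: bin32_by_div and bin32_by_shift are hypothetical names, and the shift version assumes an unsigned value (a logical shift).

```python
def bin32_by_div(n):
    """Peel off one bit per step with a division by 2 (the div ebx approach)."""
    digits = []
    for _ in range(32):
        n, bit = divmod(n, 2)
        digits.append(str(bit))
    return "".join(reversed(digits))


def bin32_by_shift(n):
    """Same result using a mask and a one-bit shift instead of a division."""
    digits = []
    for _ in range(32):
        digits.append(str(n & 1))
        n >>= 1
    return "".join(reversed(digits))


# 1234567 = 12D687h, the bit pattern shown in the output above
print(bin32_by_shift(1234567))  # 00000000000100101101011010000111
```

Both functions return the same string; the only difference is that the shift version never needs a divide.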

jj2007

One more bin-to-dec, with a modification proposed by JimG:

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

Averages:
4440    cycles for dwtoa
2843    cycles for dw2str
16348   cycles for MasmBasic Str$()
13864   cycles for Ray's algo I
3767    cycles for Ray's algo, mod JimG

20      bytes for dwtoa
82      bytes for dw2str
16      bytes for MasmBasic Str$()
110     bytes for Ray's algo I
1129    bytes for Ray's algo, mod JimG

dwtoa                                   -123456789
dw2a                                    4171510507
dw2str                                  -123456789
MasmBasic Str$()                        -123456789
Ray's algo I                            171510507
Ray's algo, mod JimG                    -123456789

jimg

Quote from: NoCforMe on April 03, 2024, 01:41:21 PM
Well, I have to give you points for seeming to understand it.

I can wait.

I've given this a lot of thought, and I'm not good enough to explain this more than I have.  If there is something in particular you don't understand, I'd be happy to explain that, but a blanket "explain everything" is beyond me.

six_L

Hi, Lingo
Quote
00001220 clock cycles, (Ray_AnyToAny)x1000, OutPut: 987654321098
00001130 clock cycles, (lingo_i2aA)x1000, OutPut: 987654321098
00000416 clock cycles, (Roberts_dqtoa)x1000, OutPut: 987654321098

Roberts_dqtoa:
dqtoa proc uses rbx rsi rdi dqValue:QWORD, lpBuffer:QWORD
; -------------------------------------------------------------
; convert QWORD to ascii string
; dqValue is value to be converted
; lpBuffer is the address of the receiving buffer
; EXAMPLE:
; invoke dqtoa,dqValue,ADDR buffer
; -------------------------------------------------------------

    mov rax, dqValue
    mov rdi, lpBuffer
    test rax, rax
    jnz @pos
@zero:
    mov word ptr [rdi], 30h
    jmp @exit
@pos:
    mov rcx, 0cccccccccccccccdh ; 8 * 1/10
    mov rsi, rdi
    .while (rax > 0)
        mov rbx, rax            ; save original
        mul rcx                 ; num * 1/10 magic number. rax/10 = rax * magic number
        shr rdx, 3              ; magic number fixup
        mov rax, rdx
        lea rdx, [rdx*4+rdx]    ; *5
        add rdx, rdx            ; *10
        sub rbx, rdx            ; remainder
        add bl, '0'             ; get the ascii value
        mov [rdi], bl           ; save it
        add rdi, 1              ; get next digit
    .endw
    mov byte ptr [rdi], 0       ; terminate the string

    ; We now have all the digits, but in reverse order.
    .while (rsi < rdi)
        sub rdi, 1
        mov al, [rsi]
        mov ah, [rdi]
        mov [rdi], al
        mov [rsi], ah
        add rsi, 1
    .endw
@exit:
    ret

dqtoa endp
Say you, Say me, Say the codes together for ever.
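
For readers who want to step through the routine above without an assembler, here is a rough Python model of it. This is a sketch under assumptions, not a translation checked against the original build: the value is treated as unsigned 64-bit, a bytearray stands in for lpBuffer, and MAGIC64 is just a name chosen here for the constant in rcx.

```python
MAGIC64 = 0xCCCCCCCCCCCCCCCD  # ceil(2**67 / 10), the constant loaded into rcx


def dqtoa(value):
    """Model of dqtoa: emit digits least-significant first, then reverse in place."""
    if value == 0:
        return "0"                          # the @zero path
    buf = bytearray()
    rax = value
    while rax > 0:
        rbx = rax                           # save original
        rdx = ((rax * MAGIC64) >> 64) >> 3  # mul rcx, then shr rdx, 3: the quotient
        rax = rdx
        rbx -= rdx * 10                     # lea (*5), add (*2), sub: the remainder
        buf.append(rbx + ord("0"))          # store the ASCII digit
    lo, hi = 0, len(buf)                    # rsi and rdi in the second loop
    while lo < hi:
        hi -= 1
        buf[lo], buf[hi] = buf[hi], buf[lo]
        lo += 1
    return buf.decode("ascii")


print(dqtoa(987654321098))  # 987654321098
```

The in-place swap at the end mirrors the second .while loop: one pointer walks up from the start, the other walks down from the terminator, and they stop when they meet.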

NoCforMe

Quote from: jimg on April 04, 2024, 04:59:41 AM
Quote from: NoCforMe on April 03, 2024, 01:41:21 PM
Well, I have to give you points for seeming to understand it.

I can wait.
I've given this a lot of thought, and I'm not good enough to explain this more than I have.  If there is something in particular you don't understand, I'd be happy to explain that, but a blanket explain everything is beyond me.

Maybe you're better at explaining than you think you are.

Let's try this: can you explain, in some detail, how the bits of the number being "divided" (multiplied) and the magic number get shuffled and manipulated to come up with the final answer? This would explain how the virtual binary point comes into play, and how we turn a lowly integer (like 0CCCC ... CCCDh) into a virtual floating-point number. Possible?

I know I'm asking a lot, but at this point I really do not understand how this works. And that bothers me. I don't like taking things that I don't understand for granted.

jj2007


lingo

Quid sit futurum cras fuge quaerere.

jimg

Quote from: NoCforMe on April 04, 2024, 05:35:24 AM
Let's try this: can you explain, in some detail, how the bits of the number being "divided" (multiplied) and the magic number get shuffled and manipulated to come up with the final answer? This would explain how the virtual binary point comes into play, and how we turn a lowly integer (like 0CCCC ... CCCDh) into a virtual floating-point number. Possible?

I know I'm asking a lot, but at this point I really do not understand how this works. And that bothers me. I don't like taking things that I don't understand for granted.



Okay, you made me take a closer look, and I found out my assumptions were wrong all along.
The answers come out correct and the algorithm works; it's just that the extra
multiply by 8, divide by 8 seems unnecessary as far as I can see.  I'll have to write
a test program before I'm 100% sure, but it looks like it from here.
So, removing the factor of eight means we save a shr 3 instruction.
The "magic" number now becomes 1/8 as large, or 19999999 hex.  Strange coincidence with
a decimal looking number, but it's not.  Originally, I thought the extra factor of eight
was required to maximize the precision of the numbers, left justifying the binary in the
maximum 32 bits.  For now, it seems it's unnecessary.  I tested on the min and max
32 bit binary number (minimum value of -2,147,483,648 and a maximum value of 2,147,483,647)
and it seems to work.
This was my original explanation, in case it turns out we do need the extra *8
"              1/10 is a repeating number in binary  0.000110011001100....
              we multiply by 8 to maximize the precision= .11001100110011001100
              on the bottom end, we slid in additional bits from the repeating binary
              then, since there is more repeating binary left, we round up to a D
              We do all this because, without full 32 bits of precision, the last
              digit of our result would suffer from truncation error and would be
              incorrect.
"
What hogwash.

Anyway, here is my new dialog-


  eax = number in binary = 1234

  mov ebx,eax        ; save original for subtraction below to get remainder
  mul ecx            ; 19999999h = 1/10 
              1/10 in binary = 0.00011001100110011001100110011001 = 19999999 hex

        1234 * 19999999h = 1234 * 429,496,729 = 529,998,963,586 = 7B66666382 hex
                        7B66666382 =>  edx=7B eax=66666382  edx= 123
                           edx is the overflow, from the multiplication, the integer part of the answer
                                  the fractional part being in eax.
  mov eax,edx        ; save answer
  lea edx,[edx*4+edx] ; *5  = 123*5 = 615
  add edx,edx        ; *2 = *10 total = 1230
  sub ebx,edx        ; remainder  1234-1230 = 4  our last digit,
                          which we push onto the stack for saving
                         
                      next time through we have 123 in eax and do the same thing
                     
  mov ebx,eax        ; save original 123
  mul ecx            ; 19999999h = 1/10  gives 12 in edx, don't care about eax
  mov eax,edx        ; save answer
  lea edx,[edx*4+edx] ; *5 12*5=60
  add edx,edx        ; *2=120
  sub ebx,edx        ; remainder 123-120 = 3, our next digit
 
  etc. until the quotient in eax is zero, then we are done.
  at the end we pop the digits off the stack and save them.  You know this.
 
  I hope this helps.  If not, ask again.
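
A quick way to run the test mentioned above is a Python sketch rather than MASM. The function to_decimal and its parameters are hypothetical names made up for this check; it assumes unsigned 32-bit input and models only the quotient/remainder loop from the walkthrough, with a list standing in for the stack.

```python
def to_decimal(n, magic, shift):
    """Digit loop from the walkthrough: q = high dword of n * magic, shifted right."""
    digits = []
    while True:
        q = ((n * magic) >> 32) >> shift           # mul ecx, optional shr edx, shift
        digits.append(chr(ord("0") + n - q * 10))  # remainder becomes the digit
        n = q
        if n == 0:
            break
    return "".join(reversed(digits))               # pop the digits off the "stack"


print(to_decimal(1234, 0xCCCCCCCD, 3))  # 1234
print(to_decimal(1234, 0x19999999, 0))  # 1234
print((10 * 0x19999999) >> 32)          # 0, although 10 / 10 should give 1
```

Under this model the shortened constant handles 1234, but it undershoots on exact multiples of 10: 10 * 19999999h = 0FFFFFFFAh, which does not carry into edx. That suggests the rounded-up 0CCCCCCCDh with the shr 3 is still needed, at least as far as this sketch can tell.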




NoCforMe

Quote from: jimg on April 04, 2024, 07:35:50 AM
Okay, you made me take a closer look and found out my assumptions were wrong all along.
Wow. Very impressive answer. I haven't had time to delve into it yet, but I will. Thanks!

lingo

Thank you  six_L :thumbsup:

Would you like to include my new 64 bit code in your perfect speed test, together with Robert's code?

; Int_to_string - Convert a big integer into a string
; IN:  RAX = binary integer
;       RCX = address of location to store string
; OUT: RAX = new start of the string in the buffer
align 16
db 90h
 
i2a64   proc   
        push rbx
        lea  r9,  [rcx+24]                   ; lpBuffer
        mov  r8,  0cccccccccccccccdh         ; 8 * 1/10
@@:       
        mov  rbx, rax
        mul  r8
        shr  rdx, 3
        sub  r9,  1
        lea  rcx, [rdx*4+rdx-18h]
        lea  rax, [rcx+rcx]
        sub  rbx, rax
        mov  rax, rdx
        mov  [r9],bl
        test rax, rax                        ; test the full 64-bit quotient; eax alone misses values like 10 * 2^32
        jnz  @b
        mov  rax, r9                          ; return in RAX new start of the buffer
        pop  rbx
        ret
i2a64   endp
Note:
The code for the minus sign and leading zeros must be kept out of the algorithm.
It's not normal to bloat the algorithm with 99.99% unusable code.
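
The -18h in the lea is the neat part, so here is a rough Python model of the loop for anyone who wants to see why it works. It is a sketch under assumptions: i2a64_model, MAGIC64 and MASK64 are names invented here, a bytearray stands in for the buffer at RCX, and the loop is assumed to test the whole 64-bit quotient before exiting.

```python
MAGIC64 = 0xCCCCCCCCCCCCCCCD  # the constant in r8
MASK64 = (1 << 64) - 1        # registers wrap at 64 bits


def i2a64_model(value, buf):
    """Fill buf backwards from offset 24; return the offset of the first digit."""
    pos = 24                                # lea r9, [rcx+24]
    rax = value & MASK64
    while True:
        rbx = rax
        rdx = ((rax * MAGIC64) >> 64) >> 3  # mul r8, shr rdx, 3: the quotient q
        pos -= 1                            # sub r9, 1
        rcx = (rdx * 5 - 0x18) & MASK64     # lea rcx, [rdx*4+rdx-18h]
        rax = (rcx + rcx) & MASK64          # 10*q - 30h
        rbx = (rbx - rax) & MASK64          # n - 10*q + 30h = digit + '0'
        rax = rdx
        buf[pos] = rbx & 0xFF               # mov [r9], bl
        if rax == 0:                        # assumed: full 64-bit test
            break
    return pos


buf = bytearray(25)
start = i2a64_model(987654321098, buf)
print(buf[start:24].decode("ascii"))        # 987654321098
```

Folding the '0' into the lea saves the separate add bl,'0', and writing from the end of the buffer backwards avoids the reversal pass. The cost is that the caller gets back a pointer into the middle of the buffer, and nothing in the loop writes a terminating zero.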