Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)

ahsat · April 03, 2024, 07:08:35 AM

Quote from: NoCforMe on April 03, 2024, 06:28:08 AMhow it works?

Simply stated, you can divide by a number, or multiply by it's reciprocal and get the same result. The reciprocal of a number is dividing 1 by the number.

For example 1 divided by 2 is 0.5, thus 0.5 is the reciprocal of 2. So we can divide a number by 2, or we can multiply by 0.5 and get the same result.

Very simply stated, what they are doing is multiplying by the reciprocal of 10, instead of dividing my 10. But its the remainder of the divide they want, so that is what the rest of the code is about. I have not studied the code to see how they come up with the remainder.

NoCforMe · April 03, 2024, 07:14:28 AM

Thanks! That's what I was after, a simple explanation rather than a 5,000-word treatise on processor optimizations or whatever.

Still not sure how 0CCCCCCCDh works out to be the reciprocal of 10 ...

jimg · April 03, 2024, 07:42:47 AM

0ccccccc..ccc = 0.0001100110011001100110011..... in binary
you know the stuff to the left of the binary point are powers of two, 2^1,2^2,2^3, etc.
the stuff to the right are also, 2^-1,2^-2,2^-3,etc. (1/2, 1/4,1/8,1/16,etc)
so in .00001100110011 we have 0*1/2 + 0*1/4 + 0*1/8 + 1*1/16 + 1*1/32 + 0*1/64 etc. so

Code Select

                    sum
  1/16=.0625       .0625
+ 1/32=.03125      .09375
+ 1/256=.00390625  .09765625

etc, approaching .1
The final D is just the remaining CCCC... rounded up.

NoCforMe · April 03, 2024, 08:02:31 AM

OK ... I guess.

What's still hard to wrap my head around here is that what we're actually doing is multiplying, by a very large non-zero number, so I still don't get how we get multiplication by the reciprocal powers of two (2^-1, 2^-2, etc.) from that operation. How does that work? How do we get that "virtual" binary point?

Thanks to all who are putting up with my simple-mindedness here.

ahsat · April 03, 2024, 08:35:18 AM

Quote from: NoCforMe on April 03, 2024, 07:14:28 AMStill not sure how 0CCCCCCCDh works out to be the reciprocal of 10 ...

Neither am I, but I understand what they are doing, but not the details.

jimg · April 03, 2024, 08:56:04 AM

we are throwing away the overflow and keeping only the bits that line up the way we want. If you look closely, we are actually multiplying by 8*1/10 to get things lined up. Later down we shift edx over 3 places to divide out the extra 8.

jimg · April 03, 2024, 10:05:44 AM

okay, here is the latest and final for me. rearranged some registers and scrapped of a tiny bit of time.

Code Select

ctAnyToAnyMul:  ; setup   (needs a better name)
   test eax,eax
   jnz @f
   mov word ptr [edi],30h
   ret

@@: jns @f
    mov byte ptr [edi],'-'
    neg eax
    add edi, 1
@@: mov ecx,0CCCCCCCDh  ; 8 * 1/10
    call ctanydoit
    ret

align 16
ctanydoit:             ; this one needs a better name also
   push ebx            ; save ebx, or a digit
   mov ebx,eax         ; save original
   mul ecx             ; 0CCCCCCCDh 8 * 1/10   num * 1/10 magic number
   shr edx, 3          ; magic number fixup (divide by 8)
   mov eax,edx         ; save answer
   lea edx,[edx*4+edx] ; *5
   add edx,edx         ; *10
   sub ebx,edx         ; remainder
   test eax,eax
   .if !zero?
      call ctanydoit	; generate next digit
   .endif
   add bl,'0'
   mov [edi],bl
   inc edi
   pop ebx		; restore ebx, or get next digit
   ret

ahsat · April 03, 2024, 10:33:49 AM

Quote from: jimg on April 03, 2024, 10:05:44 AMokay, here is the latest and final for me.

How was the timing? Post what you got.

jj2007 · April 03, 2024, 11:00:09 AM

Quote from: jimg on April 03, 2024, 10:05:44 AMokay, here is the latest and final for me

Well done

Code Select

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)
Averages:
4483    cycles for dwtoa
2836    cycles for dw2str
16663   cycles for MasmBasic Str$()
13924   cycles for Ray's algo I
3870    cycles for Ray's algo, mod JimG

20      bytes for dwtoa
82      bytes for dw2str
16      bytes for MasmBasic Str$()
110     bytes for Ray's algo I
153     bytes for Ray's algo, mod JimG

dwtoa                                   -123456789
dw2a                                    4171510507
dw2str                                  -123456789
MasmBasic Str$()                        -123456789
Ray's algo I                            171510507
Ray's algo, mod JimG                    -123456789

lingo · April 03, 2024, 11:22:50 AM

Thank you jimg,

For big integers use my faster code.

Would you like to include my code in your speed test?

Code Select

; Int_to_string - Convert a big integer into an string
; IN:  RAX = binary integer
;      RCX = address of location to store string

align 16
db 15 dup(90h)
 
i2aA      proc        
          push   rdi
          xor    r9d, r9d
          mov    rdi, rcx
          mov    r8d, 10
          xor    ecx, ecx
          xor    edx, edx
@@1:
          div     r8                                                    ; divide by the number-base  
          shl     rcx, 8
          or      edx, 30h
          add     r9,  1  
          xchg    dl,  cl
          test    eax, eax
          je      @@3
          cmp     r9d, 8
          jne     @@1    
          push    rsi
          mov     rsi,rcx
          xor     ecx, ecx
@@2:
          div     r8                                                    ; divide by the number-base  
          shl     rcx, 8
          or      edx, 30h
          inc     r9                                                    ; r9 -> counter
          xchg    dl,  cl
          test    eax, eax
          jne     @@2    
          mov     [rdi],rcx
          mov     [rdi+r9-8],rsi
          pop     rsi        
          pop     rdi
          ret
align 16
@@3:
          mov    [rdi],rcx
          pop    rdi
          ret
i2aA      endp

Note:
The code for the minus sign and leading zeros must be kept out of the algorithm.
It's not normal to bloat the algorithm with 99.99% unusable code.

jj2007 · April 03, 2024, 11:28:46 AM

Quote from: jimg on April 03, 2024, 02:46:08 AMWhich one is faster depends on where it sits in the program and in memory I guess

Thanks, Ray. I checked that and you are right, there was a problem with the align_64 macro that I borrowed from Nidud years ago.

The latest test (above, reply #68) has an align 256. That assembles with UAsm only, though. M$ ml.exe throws an error.

jj2007 · April 03, 2024, 11:33:45 AM

Quote from: lingo on April 03, 2024, 11:22:50 AMWould you like to include my code in your speed test?

My speed test is 32-bit code. Besides, we all know how slow div is - no need to include a slow algo.

ahsat · April 03, 2024, 12:00:18 PM

Quote from: jj2007 on April 03, 2024, 11:28:46 AM
Quote from: jimg on April 03, 2024, 02:46:08 AMWhich one is faster depends on where it sits in the program and in memory I guess

Thanks, Ray. I checked that and you are right, there was a problem with the align_64 macro that I borrowed from Nidud years ago.

The latest test (above, reply #68) has an align 256. That assembles with UAsm only, though. M$ ml.exe throws an error.

I am glad you got it working.

jimg · April 03, 2024, 01:02:51 PM

Quote from: NoCforMe on April 03, 2024, 08:02:31 AMOK ... I guess.

What's still hard to wrap my head around here is that what we're actually doing is multiplying, by a very large non-zero number, so I still don't get how we get multiplication by the reciprocal powers of two (2^-1, 2^-2, etc.) from that operation. How does that work? How do we get that "virtual" binary point?

Thanks to all who are putting up with my simple-mindedness here.

You know how exponential number work, right?

2000 * 300 = (2.0*10^3) * (3.0*10^2) = (2*3)*10^(3+2)= 6* 10^5 = 600000
to multiply exponentials, you multiply the mantissas and add the exponents
Same for binary, it's just powers of 2 rather than 10.

So, if you look a the hex representation of .1 in IEEE754 floating point it looks like 3dcccccd hex.
The leftmost bit is the sign, followed by 8 bits of exponent followed by the mantissa with an implied 1 as the high bit.

exp=7bh=123, exponents are stored with 127 added, so 123-127=-4, so exponent is 2^-4
The mantissa for 1/10 is cccccdh. look familiar?

so to multiply by 1/10 you multiply your numbers mantissa by that mantissa, and adjust the exponent as needed.
The mantissa of your number is just the number with the exponent of (10^0=1) e.g. 1234 is 1234*10^0

Try these if you are hazy on floating point numbers.
FloatConverter
FloatToHex

NoCforMe · April 03, 2024, 01:06:22 PM

OK, I do understand floating-point. But what I don't understand is how you get an integer value (say 0CCCCCCCDh) to behave as if it were a floating-point value. It's the mechanics of the procedure that escape me: would you mind going through the process, step by step, including the shifts and all, for dummies like me?

I could actually use some pictures and diagrams (like in Alice's Restaurant: 8x10 color glossy photographs with circles and arrows and a paragraph on the back of each one explainin' what each one was).

The MASM Forum

News:

Dword to ascii (dw2a, dwtoa, dw2str, Str$, ...)

ahsat

NoCforMe

jimg

NoCforMe

ahsat

jimg

jimg

ahsat

jj2007

lingo

jj2007

jj2007

ahsat

jimg

NoCforMe