qword to ascii conversion

Started by allynm, June 06, 2012, 02:36:32 AM

jimg

Looking back over what's gone before, I find I was blinded by the apparently low number for Lingo's routine, but as you said earlier, I think the Dixon routine is just as fast, and it is much better commented and more logical, although the two bear an uncanny resemblance to each other. I have no idea which came first, but I'd put my money on Dixon. I just wasted two days screwing around with Lingo's routine, but I'm switching to Dixon simply because of the documentation :)

jj2007

Check here. It looks a bit garbled, but if you search Dixon inside the page, you'll find it.

Here is January 2004 code by Paul Dixon (Assembler embedded in PowerBasic). Looks different but gives you an idea how old the final version might be. Hutch converted a similar one to Masm, also in 2010: "This post is mainly for Paul Dixon as it is a modified version of his conversion algo that Ian_B modified"
pushad                 'save registers
sub esp,12             'create a bit of workspace on the stack
mov edi,esp            'point edi at workspace
fild  n&&              'load the data into FPU
mov eax,n&&            'must dereference it since it was passed as a parameter
fild qword [eax]
fbstp tbyte [edi]      'convert data to BCD and save it
mov esi,xp&            'pointer to result string
mov ecx,1              'loop counter for the 2 DWORDS holding the result
mov edx,0              'need to count backwards too as string is stored the opposite way to integer
lp:
mov eax,[edi+edx*4+1]  'do conversion 4 low nibbles at a time
and eax,&h0f0f0f0f
add eax,&h30303030
mov [ecx*8+esi+7],al
shr eax,8
mov [ecx*8+esi+5],al
shr eax,8
mov [ecx*8+esi+3],al
shr eax,8
mov [ecx*8+esi+1],al
shr eax,8
mov eax,[edi+edx*4+1]  'and 4 high nibbles at a time
and eax,&hf0f0f0f0
shr eax,4
add eax,&h30303030
mov [ecx*8+esi+6],al
shr eax,8
mov [ecx*8+esi+4],al
shr eax,8
mov [ecx*8+esi+2],al
shr eax,8
mov [ecx*8+esi],al
shr eax,8
inc edx
dec ecx                 'finished 2 DWORDs?
jns lp
mov eax,[edi]           'yes, now do the 2 left over digits that didn't fit (18 digits)
and eax,&h0f
add eax,&h30
mov [esi+17],al
mov eax,[edi]
and eax,&hf0
shr eax,4
add eax,&h30
mov [esi+16],al
add esp,12         'remove workspace from stack
popad              'restore registers

jimg

Thanks for everything, JJ.

Otherwise I'm done.  I'm happy with my cleanup of the Dixon routine.
Hopefully, this is my last post on the topic and I can move up one level in my stack.


edit:
And of course, the first time I went to use it, I needed the size of the string, so I modified the attached code to return it in eax.

jj2007


raymond

@jimg
Maybe I should have mentioned this earlier.

The Dixon procedure is fine as long as you realize and understand the limitations of using the fbstp instruction for the conversion. See
http://www.ray.masmcode.com/tutorial/fpuchap6.htm#fbstp

The above link also has a sub-link to an explanation of the 'packed BCD format' used by the FPU. That may help you understand what those conversion procedures are attempting to do.
Whenever you assume something, you risk being wrong half the time.
http://www.ray.masmcode.com
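
To make the limitation concrete, here is a minimal C model (not code from the thread) of what fild qword followed by fbstp leaves in memory. The name to_packed_bcd and the 0/1 return value are assumptions for illustration; real hardware signals an invalid operation for out-of-range values instead of returning a flag. The sketch shows the two points that matter for a qword conversion: only 18 decimal digits fit, and fild reads the qword as signed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper modelling FILD qword + FBSTP.
   bcd[0..8] hold 18 digits, two per byte (low nibble = lower digit);
   bcd[9] is the sign byte (0x00 positive, 0x80 negative).
   Returns 1 on success, 0 if the value does not fit in 18 digits. */
static int to_packed_bcd(uint64_t bits, uint8_t bcd[10])
{
    int negative = (bits >> 63) != 0;              /* fild treats the qword as signed */
    uint64_t mag = negative ? (~bits + 1u) : bits; /* magnitude of the signed value */

    if (mag > 999999999999999999ULL)               /* more than 18 digits: not representable */
        return 0;

    memset(bcd, 0, 10);
    for (int i = 0; i < 9; i++) {
        uint8_t lo = (uint8_t)(mag % 10); mag /= 10;
        uint8_t hi = (uint8_t)(mag % 10); mag /= 10;
        bcd[i] = (uint8_t)((hi << 4) | lo);
    }
    bcd[9] = negative ? 0x80 : 0x00;
    return 1;
}
```

An unsigned qword can need up to 20 digits, so any value of 10^18 or more is outside what this path can convert, and anything with the top bit set comes back as a negative number.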

jj2007

Ray,
Can you explain to us mere mortals where the BCD elements are in Dixon's code? I can't see them...
uqword proc ; lpbuf:DWORD, lpNumber:DWORD    ;unsigned QWORD to ASCII, Paul Dixon
mov ecx, [esp+2*4] ; lp qword number
mov eax, [ecx] ; eax->low dword
mov edx, [ecx+4] ; edx->high dword
mov ecx, [esp+1*4] ; ecx, lpbuf
or edx, edx ;if top word is not used then..
jz udword ; .. use unsigned Dword routine as it's faster
push ebp ;save registers that need to be saved
push esi
mov ebp, eax ; save a copy of low word for later
mov esi, edx ; save a copy of high word for later
mov pAnswer, ecx ; save a copy of buffer pointer for later, don't stack it or it's awkward to get back
; do 64 bit multiply by 2^110\1e14+1 to make it more likely I get no rounding errors = 0B424DC35 095CD810h
; this is a 4 part operation, LOxLO, LOxHI, HIxLO, HIxHI and add the 4 results offset appropriately
mov ecx, 095CD810h ; 2^110\1e14+1 low word
mul ecx ;
mov eax, esi ; get number high word, LSBs of MUL not needed so they're ignored
; nop
push edi
push ebx
mov edi, edx ; save high word of result
mul ecx ; now do high word mul
mov ecx, 0B424DC35h ; get ready for other half of MUL
add edi, eax ; Add low word into result
adc edx, 0 ; and handle the possible carry
mov eax, ebp ; Get low word of number again
mov ebx, edx ; save high word of answer
mul ecx ; do next part
add edi, eax ; add into answer
mov eax, esi ; get high word of number
adc ebx, edx ; add it in to answer #####? possible carry to higher word? probably not..
mul ecx ; do final mul
mov ecx, 1000000 ; ready for later
add ebx, eax ; add in result
adc edx, 0 ; and carry
add edi, 16384 ; round up last bit to decrease error to within that required.
adc ebx, 0
adc edx, 0
shrd edi, ebx, 14 ;correct for the 14 bit shift used to increase accuracy
shrd ebx, edx, 14
shr edx, 14 ; edx contains top 6 digits, edi:ebx contain the information to get the next 6 digits
; edx = 2D093h ebx= 70D42573h edi=603A5EDAh
; 64 bit multiply done
; result in edx:ebx:edi , original number in esi:ebp
; x 1000000 to get next 6 digits
mov eax, edi
mov esi, edx ;save top6 in esi
mul ecx
mov eax, ebx ;do low word x1 000 000
mov ebx, edx ;
mul ecx ;do high word x 1 000 000
add eax, ebx ;add both together
adc edx, 0
mov ebx, edx ;save 2nd 6 in ebx
; now get ((top6 x 1e6) + next6)*1e8 and sub from original number to leave last 8 digits
; since the 8 digits we want are all contained in the low word we can completely ignore the high word
; this allows imul and does away with carries to the high word
mov eax, esi ;get top 6
imul eax, ecx ;top 6 x 1 000 000
mov ecx, 100000000
add eax, ebx ;top6 x 1 000 000 + next 6
imul eax, ecx
sub ebp, eax ;ebp=last 8 digits
; 20 digits broken into 6,6,8 now display them
; esi=top6, ebx=next 6, ebp=last 8
; do top 6 digits
mov eax, esi ; get top 6 digits
mov edi, 68DB9h ; =2^32\10000+1
mul edi
mov esi, pAnswer ;offset TimeBuffer ;get pointer into answer buffer
mov ecx, 100 ;multiplier for later
jnc qnextrw1 ;if zero, supress them by ignoring
cmp edx, 9 ;1 digit or 2?
ja qZeroSupressedo ;2 digits, just continue with pairs of digits to the end
mov edx, dword ptr chartab[edx+edx] ;look up 2 digits
mov [esi], dh ;but only write the 1 we need, supress the leading zero
inc esi ;update pointer by 1
jmp QZoS1 ;continue with pairs of digits to the end
qnextrw1:
mul ecx ;get next 2 digits
jnc qnextrw2 ;if zero, supress them by ignoring
cmp edx, 9 ;1 digit or 2?
ja QZoS1a ;2 digits, just continue with pairs of digits to the end
mov edx, dword ptr chartab[edx+edx] ;look up 2 digits
mov [esi], dh ;but only write the 1 we need, supress the leading zero
inc esi ;update pointer by 1
jmp QZoS2 ;continue with pairs of digits to the end
qnextrw2:
mul ecx ;get next 2 digits
jnc qnextrw3 ;if zero, supress them by ignoring
cmp edx, 9 ;1 digit or 2?
ja QZoS2a ;2 digits, just continue with pairs of digits to the end
mov edx, dword ptr chartab[edx+edx] ;look up 2 digits
mov [esi], dh ;but only write the 1 we need, supress the leading zero
inc esi ;update pointer by 1
jmp QZoS3 ;continue with pairs of digits to the end
; next 6 digits
qnextrw3:
mov eax,ebx ;get 2nd 6 digits
mov ebx, 28F5C29h ;=2^32\100+1 ready for later
mul edi ;edi=2^32\10000+1
jnc qnextrw4 ;if zero, supress them by ignoring
cmp edx, 9 ;1 digit or 2?
ja QZSo3a ;2 digits, just continue with pairs of digits to the end
mov edx, dword ptr chartab[edx+edx] ;look up 2 digits
mov [esi], dh ;but only write the 1 we need, supress the leading zero
inc esi ;update pointer by 1
jmp QZSo4 ;continue with pairs of digits to the end
qnextrw4:
mul ecx ;get next 2 digits
jnc QZSo5 ;if zero, supress them by ignoring
cmp edx, 9 ;1 digit or 2?
ja QZSo4a ;2 digits, just continue with pairs of digits to the end
mov edx, dword ptr chartab[edx+edx] ;look up 2 digits
mov [esi], dh ;but only write the 1 we need, supress the leading zero
inc esi ;update pointer by 1
jmp QZSo5 ;continue with pairs of digits to the end
; done top 10 digits
; since we took a short cut to the DWORD routine at the start we can never get beyond here
; as the DWORD routine would handle it instead.
; At this point we are guaranteed to have exactly 10 digits to print so just jump to the relevant spot.
qZeroSupressedo:
mov edx, dword ptr chartab[edx+edx] ;look up the 2 digits
mov [esi], dx
add esi, 2
QZoS1:
mul ecx
QZoS1a:
mov edx, dword ptr chartab[edx+edx] ;look up the 2 digits
mov [esi], dx
add esi, 2
QZoS2:
mul ecx
QZoS2a:
mov edx, dword ptr chartab[edx+edx] ;look up the 2 digits
mov [esi], dx
add esi, 2
sj:
QZoS3:
;do next 6 digits
mov eax, ebx ;get 2nd 6 digits
mov ebx, 28F5C29h ;=2^32\100+1 ready for later
mul edi ;edi=2^32\10000+1
QZSo3a:
mov edx, dword ptr chartab[edx+edx]
mov [esi], dx
add esi, 2
QZSo4:
mul ecx
QZSo4a:
mov edx, dword ptr chartab[edx+edx]
mov [esi], dx
add esi, 2
QZSo5:
mul ecx
;QZSo5a:
mov edx, dword ptr chartab[edx+edx]
mov [esi], dx
add esi, 2
;do final 8 digits
mov eax, ebp ;get last 8 digits
mul ebx ;ebx=2^32\100+1
mov eax, edx
mov ebx, edx
mul edi ;edi=2^32\10000+1
mov edx, dword ptr chartab[edx+edx] ;look up next 2 digits
mov [esi], dx
add esi, 2
mul ecx
mov edx, dword ptr chartab[edx+edx] ;look up next 2 digits
mov [esi], dx
add esi, 2
mul ecx
mov edx, dword ptr chartab[edx+edx] ;look up next 2 digits
mov [esi], dx
add esi, 2
mov eax, ebx ;first 6 digits of last 8
imul eax, ecx ;x100 to shift into place
pop ebx
pop edi
mov edx, ebp ;last 8 - 6 just done gives final 2 digits
sub edx, eax ;look up final 2 digits
mov edx, dword ptr chartab[edx+edx]
mov [esi], dx
add esi, 2
mov byte ptr [esi],0 ;need to zero terminate
pop esi
pop ebp
AllDone:
mov eax, [esp+4]
ret 2*4
udword:
push edi ;save registers that need to be saved
push esi
mov esi, ecx ; sptr
mov edi,eax ;eax= x ;save a copy of the number
mov edx, 0D1B71759h ;=2^45\10000 13 bit extra shift
mul edx ;gives 6 high digits in edx
mov eax,68DB9h ;=2^32\10000+1
shr edx,13 ;correct for multiplier offset used to give better accuracy
jz short skiphighdigits ;if zero then don't need to process the top 6 digits
mov ecx,edx ;get a copy of high digits
imul ecx,10000 ;scale up high digits
sub edi,ecx ;subtract high digits from original. EDI now = lower 4 digits
mul edx ;get first 2 digits in edx
mov ecx,100 ;load ready for later
jnc short next1 ;if zero, supress them by ignoring
cmp edx,9 ;1 digit or 2?
ja short ZeroSupressed ;2 digits, just continue with pairs of digits to the end
mov edx,dword ptr chartab[edx*2] ;look up 2 digits
mov [esi],dh ;but only write the 1 we need, supress the leading zero
inc esi ;update pointer by 1
jmp short ZS1 ;continue with pairs of digits to the end
next1:
mul ecx ;get next 2 digits
jnc short next2 ;if zero, supress them by ignoring
cmp edx,9 ;1 digit or 2?
ja short ZS1a ;2 digits, just continue with pairs of digits to the end
mov edx,dword ptr chartab[edx*2] ;look up 2 digits
mov [esi],dh ;but only write the 1 we need, supress the leading zero
inc esi ;update pointer by 1
jmp short ZS2 ;continue with pairs of digits to the end
next2:
mul ecx ;get next 2 digits
jnc short next3 ;if zero, supress them by ignoring
cmp edx,9 ;1 digit or 2?
ja short ZS2a ;2 digits, just continue with pairs of digits to the end
mov edx,dword ptr chartab[edx*2] ;look up 2 digits
mov [esi],dh ;but only write the 1 we need, supress the leading zero
inc esi ;update pointer by 1
jmp short ZS3 ;continue with pairs of digits to the end
next3:
skiphighdigits:
mov eax,edi ;get lower 4 digits
mov ecx,100
mov edx,28F5C29h ;2^32\100 +1
mul edx
jnc short next4 ;if zero, supress them by ignoring
cmp edx,9 ;1 digit or 2?
ja short ZS3a ;2 digits, just continue with pairs of digits to the end
mov edx,dword ptr chartab[edx*2] ;look up 2 digits
mov [esi],dh ;but only write the 1 we need, supress the leading zero
inc esi ;update pointer by 1
jmp short ZS4 ;continue with pairs of digits to the end
next4:
mul ecx ;this is the last pair so don't supress a single zero
cmp edx,9 ;1 digit or 2?
ja short ZS4a ;2 digits, just continue with pairs of digits to the end
mov edx,dword ptr chartab[edx*2] ;look up 2 digits
mov [esi],dh ;but only write the 1 we need, supress the leading zero
mov byte ptr [esi+1],0 ;zero terminate string
jmp short xit ;all done
ZeroSupressed:
mov edx,dword ptr chartab[edx*2] ;look up 2 digits
mov [esi],dx
add esi,2 ;write them to answer
ZS1:
mul ecx ;get next 2 digits
ZS1a:
mov edx,dword ptr chartab[edx*2] ;look up 2 digits
mov [esi],dx ;write them to answer
add esi,2
ZS2:
mul ecx ;get next 2 digits
ZS2a:
mov edx,dword ptr chartab[edx*2] ;look up 2 digits
mov [esi],dx ;write them to answer
add esi,2
ZS3:
mov eax,edi ;get lower 4 digits
mov edx,28F5C29h ;2^32\100 +1
mul edx ;edx= top pair
ZS3a:
mov edx,dword ptr chartab[edx*2] ;look up 2 digits
mov [esi],dx ;write to answer
add esi,2 ;update pointer
ZS4:
mul ecx ;get final 2 digits
ZS4a:
mov edx,dword ptr chartab[edx*2] ;look them up
mov [esi],dx ;write to answer
mov byte ptr [esi+2],0 ;zero terminate string
xit:
pop esi ;restore used registers
pop edi
jmp AllDone
uqword endp
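
For readers who find the listing hard to follow, the core trick can be modelled in a few lines of C. This is only a sketch under assumptions, not a translation of the routine: div10000 and six_digits are made-up names, and the digit pairs are formed with / and % where the assembler looks them up in chartab (not shown in the thread, presumably a table of the character pairs "00" to "99"). The constants 0D1B71759h and 68DB9h are the ones from the listing.

```c
#include <stdint.h>

/* x / 10000 without a div: multiply by 2^45/10000 rounded up
   (0D1B71759h in the listing) and keep bits 45 and above. */
static uint32_t div10000(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xD1B71759u) >> 45);
}

/* Convert a value below 1,000,000 to exactly six ASCII digits.
   Multiplying by 2^32/10000+1 (68DB9h) puts the top digit pair in the
   high dword (edx) and a binary fraction in the low dword (eax).
   Each further multiply of that fraction by 100 pushes the next
   pair into the high dword, like the repeated "mul ecx". */
static void six_digits(uint32_t x, char out[7])
{
    uint64_t p = (uint64_t)x * 0x68DB9u;
    for (int i = 0; i < 3; i++) {
        uint32_t pair = (uint32_t)(p >> 32);      /* 0..99, the edx part */
        out[2 * i]     = (char)('0' + pair / 10);
        out[2 * i + 1] = (char)('0' + pair % 10);
        p = (p & 0xFFFFFFFFu) * 100u;             /* keep the fraction, scale by 100 */
    }
    out[6] = '\0';
}
```

The "+1" in the constants makes every intermediate result slightly too large rather than slightly too small. For inputs below 1,000,000 the accumulated error stays under one unit in the last pair, so the truncated digits come out exact.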

raymond

Quote
sub esp,12             'create a bit of workspace on the stack
mov edi,esp            'point edi at workspace
fild  n&&              'load the data into FPU
mov eax,n&&            'must dereference it since it was passed as a parameter
fild qword [eax]
fbstp tbyte [edi]      'convert data to BCD and save it
mov esi,xp&            'pointer to result string
mov ecx,1              'loop counter for the 2 DWORDS holding the result
mov edx,0              'need to count backwards too as string is stored the opposite way to integer
lp:
mov eax,[edi+edx*4+1]  'do conversion 4 low nibbles at a time

Above is part of Dixon's code which you posted previously.
The first red line (mov edi,esp) shows that EDI points to a workspace reserved on the stack by the previous instruction.
The following green line (the fild) indicates that the target qword is loaded onto the FPU.
The next red instruction (fbstp tbyte [edi]) specifies that the value gets converted to the BCD format and stored in the reserved workspace on the stack.
Then the BCD nibbles get recovered sequentially from the stack for unpacking, as pointed to by EDI/EDX in the next red instruction (mov eax,[edi+edx*4+1]) and other subsequent similar instructions.
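
The unpacking step described above can be sketched in C as follows. This is an illustrative model rather than Dixon's code: bcd18_to_ascii is a made-up name, the dword is assembled byte by byte so the sketch does not depend on host byte order, and the output is always 18 digits with leading zeros, as in the 2004 routine.

```c
#include <stdint.h>

/* bcd: the 10 bytes written by fbstp (bcd[9], the sign byte, is ignored here).
   out: receives 18 ASCII digits plus a terminating NUL. */
static void bcd18_to_ascii(const uint8_t bcd[10], char out[19])
{
    for (int k = 0; k < 2; k++) {                  /* k plays the role of edx */
        const uint8_t *p = bcd + 1 + 4 * k;        /* [edi+edx*4+1] */
        uint32_t w = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);

        /* four low nibbles and four high nibbles, each turned into ASCII at once */
        uint32_t lo = (w & 0x0F0F0F0Fu) + 0x30303030u;
        uint32_t hi = ((w & 0xF0F0F0F0u) >> 4) + 0x30303030u;

        char *dst = out + 8 * (1 - k);             /* 1-k plays the role of ecx */
        for (int i = 0; i < 4; i++) {
            dst[7 - 2 * i] = (char)((lo >> (8 * i)) & 0xFFu);
            dst[6 - 2 * i] = (char)((hi >> (8 * i)) & 0xFFu);
        }
    }
    /* the two digits in byte 0 did not fit into the two dwords */
    out[17] = (char)('0' + (bcd[0] & 0x0Fu));
    out[16] = (char)('0' + (bcd[0] >> 4));
    out[18] = '\0';
}
```

The string is filled from the back because packed BCD stores the least significant digits at the lowest address, while the text has to start with the most significant digit.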

jj2007

Quote from: raymond on July 11, 2017, 02:19:33 AM
Above is part of Dixon's code which you posted previously.

Oops, I forgot - the old routine he posted in 2004. Yes, that is BCD code, of course.

raymond

 :dazzled: Kind of confusing when the same author issues separate procedures for the same purpose.

Anyway, my initial comment was primarily to warn jimg about the use of such instructions without knowing about its limitations, in addition to explaining what it does.

jimg

I started out trying to use the FPU, but quickly realized it takes twice as long to do a single FINIT than the whole non-FPU routine.  Rather disheartening.

jj2007

Quote from: jimg on July 11, 2017, 03:19:39 PM
I started out trying to use the FPU, but quickly realized it takes twice as long to do a single FINIT than the whole non-FPU routine.

You need finit only once in your program, or at the beginning of a loop. In this earlier post, there are three FPU routines that are, for example, faster than umqtoa. Problem is that a handful of values are, ehm, incorrect because of rounding errors. Tweaking might solve this issue. As I wrote earlier, this is quite experimental :biggrin:

jimg

So I can assume that if I call some other routine in some other library not written by me, that it will not screw up the FPU?  Which implies by extension that if I write a general purpose routine, it is incumbent upon me to return the FPU in a clean state?

jj2007

Good question, Jim :t

There is a thread on MSDN social saying "The ABI requires that the control word is preserved", and a Masm32 thread on "Application Binary Interface (ABI), calling conventions and the like". On SOF, they discuss "Is it necessary to save the FPU state here?"

If anybody has official ABI info on the fpu in x86/x64, please post a link.

In practice, I never have seen the fpu control word change when calling a Windows function. If that function uses the fpu, and doesn't fully restore everything, then some register contents will be gone, of course.

The MasmBasic library has 65 occurrences of ffree st(7) - the usual way to ensure that the fpu behaves well when using it. Just tested with a Windows application under Win7-64, and it seems that Windows doesn't touch the fpu at all. Even after the WM_PAINT handler, all content is still there. Don't rely on that, it may differ between Windows versions.

jimg

In a program I've been working on, I always do a FINIT at the start of the several procs that use the FPU, and set the control word to truncate.  If the ABI requires that the control word be preserved, then I need to restore it before leaving each of these procs?

How does ffree st(7) make the fpu behave?  The description says it just sets it to empty.

Sounds like you have to worry about both ends.  You can't count on what state you get, but have to leave in a clean state.

RuiLoureiro

Quote from: jimg on July 12, 2017, 12:37:36 AM
In a program I've been working on, I always do a FINIT at the start of the several procs that use the FPU, and set the control word to truncate.  If the ABI requires that the control word be preserved
...
TheCalculator starts with FINIT and uses FINIT in the conversion routines only - string to real10 and real10 to string. And the control word is preserved. It seems it always works correctly. Whenever it finds an invalid operation it does FCLEX and exits with an error code.