FputoString format

NoCforMe · May 26, 2025, 08:27:25 AM

Quote from: daydreamer on May 26, 2025, 07:15:06 AMUnicode Character "∞" (U+221E)
For those who want to use it in gui control,use with invoke sendwmessage

I think "+INF" and "-INF" would be far safer bets than trying to display Unicode characters.

guga · May 26, 2025, 09:15:42 AM

Quote from: NoCforMe on May 26, 2025, 08:27:25 AM
Quote from: daydreamer on May 26, 2025, 07:15:06 AMUnicode Character "∞" (U+221E)
For those who want to use it in gui control,use with invoke sendwmessage

I think "+INF" and "-INF" would be far safer bets than trying to display Unicode characters.

Indeed. Currently, the function output this messages in case of errors:

+INFINITE
INDEFINITE
-INFINITE
QNAN
SNAN
Special INDEFINITE +INFINITE
Special INDEFINITE -INFINITE
Special INDEFINITE QNAN
Special INDEFINITE SNAN
Unknown FPU error

And for internal debugging, i added this while i´m testing for valid numbers

Valid Normal Negative
Valid Normal Positive
Valid Subnormal Negative
Valid Subnormal Positive
Valid Zero Number
Denormalized

I believe this covers all common FPU errors. Not sure if there are additional error modes to output.

And when the function identifies a Positive or Negative Subnormal value, it tries to de-normalize multiplying the input (Which is always converted to a tenbyte 1st) by 2^64 and immediately multiplied by it´s opposite (1/(2 ^64)) (That is the same as if i divided it again by 2^64)

NoCforMe · May 26, 2025, 12:51:10 PM

Quote from: guga on May 26, 2025, 09:15:42 AMIndeed. Currently, the function output this messages in case of errors:
QNAN
SNAN

Are those "quiet" and "signaling" NANs? Do users really want to be informed of internal FPU conditions at that level of detail?

Have you checked with the FPU-meister here, Raymond F? (I don't know enough about it to be of much help.) He should be able to tell us what information here is relevant and what isn't.

guga · May 26, 2025, 03:12:28 PM

Hi David. Yeah, i´m using a variation of his function to check for the categories. I´m just trying to output as many as possible, so it can be used in debuggers for additional information, for example.

RosAsm uses those information on the debugger (on a similar way as in Ollydbg or IdaPro), but i´m updating the function on a way it can be more portable so others can use as well.

This is the older FloattoAscii routine based on Raymonds fpu. Btw...I´ll try porting this to masm once i´m done.

Code Select

; Flags:

[REGULAR 0    SCIENTIFIC 1]

Proc FloatToAscii:
    Arguments @Source, @Destination, @Decimal, @Flag
    Local @temporary, @eSize, @oldcw, @truncw, @stword
    Structure @BCD 12, @bcdstr 0

        fclex                   ;clear exception flags on FPU

      ; Get the specified number of decimals for result (MAX = 15):
        On D@Decimal > 0F, mov D@Decimal 0F

      ; The FPU will be initialized only if the source parameter is not taken
      ; from the FPU itself (D@ Source <> &NULL):
        .If D@Source = &NULL
            fld st0             ;copy it to preserve the original value
        .Else
            mov eax D@Source
            If eax > 0400_000
                finit | fld T$eax
              ; Check first if value on FPU is valid or equal to zero:
                ftst                    ;test value on FPU
                fstsw W@stword          ;get result
                test W@stword 04000     ;check it for zero or NAN
                jz L0>                  ;continue if valid non-zero
                test W@stword 0100      ;now check it for NAN
                jnz L1>                 ;Src is NAN or infinity - cannot convert
                  ; Here: Value to be converted = 0
                    mov eax D@Destination | mov W$eax '0' ; Write '0', 0 szstring
                    mov eax &TRUE | finit | ExitP
            Else
L1:             finit | mov eax &FALSE | ExitP
            End_If
        .End_If

      ; Get the size of the number:
L0:     fld st0                 ;copy it
        fabs                    ;insures a positive value
        fld1 | fldl2t
        fdivp ST1 ST0           ;->1/[log2(10)]
        fxch | fyl2x            ;->[log2(Src)]/[log2(10)] = log10(Src)

        fstcw W@oldcw           ;get current control word
        mov ax W@oldcw
        or ax 0C00              ;code it for truncating
        mov W@truncw ax
        fldcw W@truncw          ;change rounding code of FPU to truncate

        fist D@eSize            ;store characteristic of logarithm
        fldcw W@oldcw           ;load back the former control word

        ftst                    ;test logarithm for its sign
        fstsw W@stword          ;get result
        test W@stword 0100      ;check if negative
        jz L0>
            dec D@eSize

L0:     On D@eSize > 15, mov D@Flag SCIENTIFIC

      ; Multiply the number by a power of 10 to generate a 16-digit integer:
L0:     fstp st0                ;get rid of the logarithm
        mov eax 15
        sub eax D@eSize         ;exponent required to get a 16-digit integer
        jz L0>                  ;no need if already a 16-digit integer
            mov D@temporary eax
            fild D@temporary
            fldl2t | fmulp ST1 ST0       ;->log2(10)*exponent
            fld st0 | frndint | fxch
            fsub st0 st1        ;keeps only the fractional part on the FPU
            f2xm1               ;->2^(fractional part)-1
            fld1
            faddp ST1 ST0       ;add 1 back
            fscale              ;re-adjust the exponent part of the REAL number
            fxch
            fstp st0
            fmulp ST1 ST0       ;->16-digit integer

L0:     fbstp T@bcdstr          ;transfer it as a 16-digit packed decimal
        fstsw W@stword          ;retrieve exception flags from FPU
        test W@stword 1         ;test for invalid operation
        jnz L1<<                ;clean-up and return error

      ; Unpack bcd, the 10 bytes returned by the FPU being in the little-endian style:
        push ecx, esi, edi
            lea esi D@bcdstr+9
            mov edi D@Destination
            mov al B$esi        ;sign byte
            dec esi | dec esi
            If al = 080
                mov al minusSign      ;insert sign if negative number
            Else
                mov al Space      ;insert space if positive number
            End_If
            stosb

            ...If D@Flag = REGULAR
              ; Verify number of decimals required vs maximum allowed:
                mov eax 15 | sub eax D@eSize
                cmp eax D@Decimal | jae L0>
                    mov D@Decimal eax

              ; ;check for integer digits:
L0:             mov ecx D@eSize
                or ecx ecx           ;is it negative
                jns L3>
                  ; Insert required leading 0 before decimal digits:
                    mov ax '0o' | stosw
                    neg ecx
                    cmp ecx D@Decimal | jbe L0>
                        jmp L8>>

L0:                 dec ecx | jz L0>
                        stosb | jmp L0<
L0:
                    mov ecx D@Decimal | inc ecx
                    add ecx D@eSize | jg L4>
                        jmp L8>>

              ; Do integer digits:
L3:             inc ecx
L0:             movzx eax B$esi | dec esi | ror ax 4 | ror ah 4
                add ax '00' | stosw | sub ecx 2 | jg L0<
                jz L0>
                    dec   edi

L0:             cmp D@Decimal 0 | jz L8>>
                    mov al pointSign | stosb
                    If ecx <> 0
                        mov al ah | stosb
                        mov ecx D@Decimal | dec ecx | jz L8>>
                    Else
                        mov ecx D@Decimal
                    End_If

              ; Do decimal digits:
L4:             movzx eax B$esi
                dec esi
                ror ax 4 | ror ah 4 | add ax 03030 | stosw
                sub ecx 2 | jg L4<
                jz L1>
                    dec edi
L1:             jmp L8>>

          ; scientific notation
            ...Else
                 mov ecx D@Decimal | inc ecx
                movzx eax B$esi | dec esi
                ror ax 4 | ror ah 4 | add ax '00' | stosb
                mov al pointSign | stosb
                mov al ah | stosb
                sub ecx 2 | jz L7>
                jns L0>
                    dec edi | jmp L7>
L0:             movzx eax B$esi
                dec esi
                ror ax 4 | ror ah 4
                add ax '00' | stosw | sub ecx 2 | jg L0<
                jz L7>
                    dec edi

L7:             mov al 'E' | stosb
                mov al plusSign, ecx D@eSize | or ecx ecx | jns L0>
                    mov al minusSign | neg ecx
L0:             stosb
              ; Note: the absolute value of the size could not exceed 4931
                mov eax ecx
                mov cl 100
                div cl          ;->thousands & hundreds in AL, tens & units in AH
                push eax
                    and eax 0FF ;keep only the thousands & hundreds
                    mov cl 10
                    div cl      ;->thousands in AL, hundreds in AH
                    add ax '00' ;convert to characters
                    stosw       ;insert them
                pop eax
                shr eax 8       ;get the tens & units in AL
                div cl          ;tens in AL, units in AH
                add ax '00'     ;convert to characters
                stosw           ;insert them
            ...End_If

L8:         mov B$edi Space         ;string terminating character
        pop edi, esi, ecx

        finit | mov eax D@eSize
EndP

And this is the one i´m currently updating.

Code Select

Proc FloatToString:
    Arguments @Float80Pointer, @InputFlag, @DestinationPointer, @TruncateBytes, @AddSubNormalMsg
    Local @ExponentSize, @ControlWord, @FPUStatusHandle, @tempdw, @extra10x, @FPUMode
    Structure @TmpStringBuff 128, @pTempoAsciiFpuDis 0, @pBCDtempoDis 64, @pTmpInputDis 96
    Uses esi, edi, edx, ecx, ebx

    ; @FPUStatusHandle = 0 Default
    ; @FPUStatusHandle = 1 Increased. Positive exponent value
    ; @FPUStatusHandle = 2 Decreased. Negative exponent value

    call 'RosMem.FastZeroMem' D@TmpStringBuff, 128
    mov D@ExponentSize 0, D@FPUStatusHandle 0, D@extra10x 0 D@FPUMode SpecialFPU_PosValid; D@IsNegative &FALSE,
    mov edi D@DestinationPointer, eax D@Float80Pointer

    ; always work with a cOpy of the input to prevent it being changed, specially when the number is subnormal.
    lea ebx D@pTmpInputDis
    finit | fclex | fstcw W@ControlWord ; Save FPU control word, clear exceptions, reset FPU
    If D@InputFlag = FPU_STR_REAL4_INT
        fild F$eax
    Else_if D@InputFlag = FPU_STR_REAL4_FLOAT
        fld F$eax
    Else_if D@InputFlag = FPU_STR_REAL8_INT
        fild R$eax
    Else_if D@InputFlag = FPU_STR_REAL8_FLOAT
        fld R$eax
    Else
        fld T$eax
    End_If
    fstp T$ebx
    fldcw W@ControlWord | fwait         ; Restore FPU control word

    ; Check for zero (positive or negative)
    .If_and D$ebx = 0, D$ebx+4 = 0
        If_Or W$ebx+8 = 0, W$ebx+8 = 08000
            ;mov W$ebx+8 0 ; no longer needed since we are working only with a copy
            mov B$edi '0', B$edi+1 0
            mov eax D@FPUMode
            ExitP
        End_If
    .End_If


    ; Handle sign and special number categories
    call RealTenFPUNumberCategory ebx
    mov D@FPUMode eax
    If eax >= SpecialFPU_QNAN ; do we have any special FPU being used ? Yes, display the proper message and exit
        mov ebx eax
        call WriteFPUErrorMsg eax, edi
        mov eax ebx
        ExitP
    End_If

    Test_If B$ebx+9 0_80
        mov B$edi '-' | inc edi
        xor B$ebx+9 0_80
    Test_End

    ; Special handling for subnormal numbers
    .If_Or D@FPUMode = SpecialFPU_PosSubNormal, D@FPUMode = SpecialFPU_NegSubNormal

        finit | fclex | fstcw W@ControlWord ; Save FPU control word, clear exceptions, reset FPU

        fld T$ebx                           ; Load the subnormal number into ST(0) (e.g., X)
        fld T$Float_NormalizationFactor ; Load 2^60 into ST(0), pushing X to ST(1)
        fmulp ST1 ST0                      ; Compute X * 2^60. Result (normalized) is in ST(0). ST(1) is popped.
                                            ; FPU Stack: ST(0) = X * 2^60

        fld T$Float_DenormalizationFactor ; Load 1/2^60 into ST(0), pushing (X * 2^60) to ST(1)
        fmulp ST1, ST0                      ; Compute (X * 2^60) * (1/2^60). Result (original X, now normalized) is in ST(0).
                                            ; FPU Stack: ST(0) = X (normalized)

        fstp T$ebx                          ; Store the normalized original value back to memory (original @Float80Pointer location)

        ; NO ADJUSTMENT TO D@ExponentSize IS NEEDED HERE,
        ; because the number's actual mathematical value hasn't changed.
        ; Its internal representation is just optimized.

        fldcw W@ControlWord | fwait         ; Restore FPU control word
    .End_If


    ; extract the exponent. 1e4933
    finit | fclex | fstcw W@ControlWord
    fld T$ebx
    call GetExponentFromST0 &FPU_EXCEPTION_INVALIDOPERATION__&FPU_EXCEPTION_DENORMALIZED__&FPU_EXCEPTION_ZERODIV__&FPU_EXCEPTION_OVERFLOW__&FPU_EXCEPTION_UNDERFLOW__&FPU_EXCEPTION_PRECISION__&FPU_PRECISION_64BITS
    mov D@ExponentSize eax
    ffree ST0
    .If D@ExponentSize < FPU_ROUND
        fld T$ebx
        fld st0 | frndint | fcomp st1 | fstsw ax
        Test_If ax &FPU_EXCEPTION_STACKFAULT
            lea ecx D@pBCDtempoDis
            fbstp T$ecx     ; -> TBYTE containing the packed digits
            fwait
            lea eax D@pTempoAsciiFpuDis
            lea ecx D@pBCDtempoDis
            call FloatToBCD_SSE ecx, eax
            mov eax FPU_MAXDIGITS+1 | mov ecx D@ExponentSize | sub eax ecx | inc ecx
            lea esi D@pTempoAsciiFpuDis | add esi eax

            If B$esi = '0'
                inc esi | dec ecx
            End_If

            call 'FastCRT.StrncpyEx' edi, esi, ecx | add edi eax
            xor eax eax
            jmp L9>>
        Test_End
        ffree ST0
    .End_If

    ; Necessary for FPU 80 Bits. If it is 0, the correct is only 0 and not 0.e+XXXXX.
    If D@ExponentSize = 080000000
        mov D@ExponentSize 0
    Else_If D@ExponentSize = 0
        mov D@ExponentSize 0
    End_If

    ; multiply the number by the power of 10 to generate required integer and store it as BCD

    ; We need to extract here all the exponents of a given number and multiply the result by the power of FPU_MAXDIGITS+1 (1e17)
    ; So, if our number is 4.256879e9, the result must be 4.256879e17. If we have 3e-2 the result is 3e17.
    ; If we have 0.1 the result is 1e17 and so on.
    ; If we have as a result a power of 1e16. It means that we need to decrease the iExp by 1, because the original
    ; exponential value is wrong.
    ; This result will be stored in ST0


    ..If D@ExponentSize <s 0
        mov eax D@ExponentSize | neg eax | add eax FPU_MAXDIGITS ; always add MaxDigits-1
        mov edx D@ExponentSize | lea edx D$eax+edx
        .If eax > 4932
            mov edx eax | sub edx 4932 | sub edx FPU_MAXDIGITS
            mov D@extra10x edx | add D@extra10x FPU_MAXDIGITS
            mov eax 4932
            If D@extra10x >= FPU_MAXDIGITS
                inc D@ExponentSize
            End_If
        .Else_If edx >= FPU_MAXDIGITS
            inc D@ExponentSize
        .End_If
    ..Else_If D@ExponentSize > 0
        mov eax FPU_MAXDIGITS+1 | sub eax D@ExponentSize
    ..Else ; Exponent size = 0
        mov eax FPU_MAXDIGITS+1
    ..End_If
    mov D@tempdw eax


    ; Apply scaling
    fild D@tempdw
    call ST0PowerOf10
    fld T$ebx | fmulp ST0 ST1

    If D@extra10x > 0
        ; Calculate the exponencial value of the extrabytes
        fild F@extra10x
        call ST0PowerOf10
        fmulp ST0 ST1 ; and multiply it to we get XXe17 or xxe16
    End_If

    ; now we must get the power of FPU_MAXDIGITS+1. In this case, we will get the value 1e17.
    fstp T$FloatToStr_ScaledValue ; Load scaled number back into ST0

    Fpu_If T$FloatToStr_ScaledValue < T$FloatToStr_Reference
        fld T$FloatToStr_ScaledValue ; Reload the scaled value into ST0
        fmul R$FloatToStr_Ten             ; Perform the multiplication
        dec D@ExponentSize
        fstp T$FloatToStr_ScaledValue ; Store the result back, or use it directly if this is the last FPU operation
    Fpu_End_If

    ; Now, get the FPU status *after* the macro has executed its comparison and potentially fmul.
    ; This will capture the FPU status word including any exception flags from the fmul.
    ; If FPURoundFix needs the comparison result (CF, ZF, PF), this still won't work perfectly
    ; because the SAHF already transferred them to EFLAGS, and we need to get the full FSW.
    ; However, if FPURoundFix mostly cares about *exceptions* (invalid, underflow, overflow), this might suffice.

    fstsw ax | fwait | mov D@FPUStatusHandle eax ; save exception flags. THis is the best place to capture all exceptions
                                                 ; including the oned related to the comparitions

    fld T$FloatToStr_ScaledValue ; Reload the final scaled value for fbstp.

    ; Final conversion to BCD
    lea ecx D@pBCDtempoDis
    fbstp T$ecx             ; ->TBYTE containing the packed digits
    fwait

    lea ecx D@pBCDtempoDis
    lea eax D@pTempoAsciiFpuDis
    call FloatToBCD_SSE ecx, eax

    ; Adjust the Exponent when some Exceptions occurs and try to Fix whenever is possible the rounding numbers
    lea eax D@pTempoAsciiFpuDis
    call FPURoundFix eax, D@FPUStatusHandle D@ExponentSize, D@TruncateBytes
    mov D@ExponentSize eax

    lea esi D@pTempoAsciiFpuDis | mov ecx D@ExponentSize
    inc ecx

    ..If_And ecx <= FPU_ROUND, ecx > 0
        mov eax 0
        While B$esi <= ' ' | inc esi | End_While
        While B$esi = '0'
            inc esi
            ; It may happens that on rare cases where we had an ecx = 0-1, we have only '0' on esi.
            ; So while we are cleaning it, if all is '0', we set edi to one single '0', to avoid we have a
            ; Empty String.
            If B$esi = 0
                mov B$edi '0' | inc edi
                jmp L9>>
            End_If
        End_While

        C_call 'FastCRT.FormatStr' edi, {'%s' 0}, esi | add edi D@ExponentSize | inc edi
        add esi D@ExponentSize | inc esi
        .If B$esi <> 0
            mov B$edi '.' | inc edi
            C_call 'FastCRT.FormatStr' edi, {'%s' 0}, esi | add edi eax
            While B$edi-1 = '0' | mov B$edi-1 0 | dec edi | End_While
            If B$edi-1 = '.'
                dec edi
            End_If
        .End_If
    ..Else
        While B$esi <= ' ' | inc esi | End_While
        .While B$esi = '0'
            inc esi
            If B$esi = 0
                mov B$edi '0' | inc edi
                jmp L9>>
            End_If
         .End_While

        .If B$esi <> 0
            movsb | mov B$edi '.' | inc edi
            C_call 'FastCRT.FormatStr' edi, {'%s' 0}, esi | add edi eax
            ; Clean last Zeros at the end of the Number String.
            While B$edi-1 = '0' | mov B$edi-1 0 | dec edi | End_While
            If B$edi-1 = '.'
                dec edi
            End_If

            mov B$edi 'e' | mov eax D@ExponentSize
            mov B$edi+1 '+'

            Test_If eax 0_8000_0000
                neg eax | mov B$edi+1 '-'
            Test_End

            inc edi | inc edi
            C_call 'FastCRT.FormatStr' edi, {'%d' 0}, eax | add edi eax
        .Else
            mov B$edi '0' | inc edi
        .End_If
    ..End_If

    ; For developers only:
    ; Uncomment these function if you want to analyse the Exceptions modes of the FPU Data.
    ;call TestingFPUExceptions D@FPUStatusHandle ; Control Word exceptions
    ;call TestingFPUStatusRegister D@FPUStatusHandle ; Status Registers envolved on the operation

L9:

    mov B$edi 0
    fldcw W@ControlWord | fwait

    .If D@AddSubNormalMsg = &TRUE
        If_Or D@FPUMode = SpecialFPU_PosSubNormal, D@FPUMode = SpecialFPU_NegSubNormal
            call 'FastCRT.StrCpy' edi, {B$ " (Denormalized)", 0}
        End_If
    .End_If
    mov eax D@FPUMode

EndP

The new version wil include the output formats that lacks in RosAsm. So, one can output the numbers in a wide range of formats as needed.

The goal is create a general usage function and use it as an export on a dll. And also, add this function as an extension to a FormatString i created for RosAsm (A variation of ws_printf). I´m quite done. Just need to review the code, clean and create the routines to handle non-scientific outputs, such as 0.000121234, 12344.5645 etc etc

daydreamer · May 26, 2025, 03:41:55 PM

dont forget to also include the common usage of SSE/SSE2 denormal and such,which is high probability fp code is run with SSE/SSE2 instead,even if you dont write SSE/SSE2 code yourself,invoking math function from a library might be coded in SSE/SSE2

I am curious if it works on all OSes used today :win7,win8,win10,win11 with unicode in a gui control,or functionality on earliest win7 lack unicode caps ?

guga · May 26, 2025, 10:37:49 PM

Quote from: daydreamer on May 26, 2025, 03:41:55 PMdont forget to also include the common usage of SSE/SSE2 denormal and such,which is high probability fp code is run with SSE/SSE2 instead,even if you dont write SSE/SSE2 code yourself,invoking math function from a library might be coded in SSE/SSE2

I am curious if it works on all OSes used today :win7,win8,win10,win11 with unicode in a gui control,or functionality on earliest win7 lack unicode caps ?

Hi Daydreamer

SSE/SSE2 denormal i´m using specific for a pow, log and exp functions. For FPU it´s not necessary, since i´m focusing in precision as well and FPU grant´s that when dealing with 80 bits. The helper functions i created that uses SSE inside the FPUtoString are for the BCD conversion, dwtoAscii conversion and to copy the error messages to the output.

The older FloatToBCD inside RosAsm was this:

Code Select


; new on 20/05/2025
Proc FloatToBCD:
    Arguments @Input, @Output
    Uses esi, edi, ecx, eax

    mov esi D@Input
    add esi 9
    mov edi D@Output

    ;  The 1st bytes of the BCD Will always be 0. So we will need to bypass them to properly
    ; achieve a better result in case we are dealing with negative exponent values or other non
    ; integer values that may result in more then One zero byte at the start. Example:
    ; Sometimes when we are analysing the number 1, it can have the following starting bytes:
    ; 00999999999999999 . So, the 1st Byte is ignored to the form in edi this: 09999999999.
    ; On this example it is the number 0.999999999999999e-6 that is in fact 1e-6
    mov ecx 10

    If B$esi = 0
        dec ecx
        dec esi
    End_If

    Do
        ; Process two nibbles at once
        mov al B$esi | mov ah al | shr al 4 | and ah 0F
        add ax '00'
        mov W$edi ax
        add edi 2
        dec esi
        dec ecx
    Repeat_Until_Zero
    mov B$edi 0

EndP

But, i updated it again to use SSE2 as:

Code Select

; equates for pshufd and friends
[SSE_INVERT_DWORDS 27]      ; invert the order of dwords
[SSE_ROTATE_64BITS 78]       ; the same things and result as SSE_SWAP_QWORDS, SSE_ROTATE_RIGHT_64BITS, SSE_ROTATE_LEFT_64BITS, SSE_ROTATE_64BITS

; global variables used:
[<16 SSE2_NibbleMask: B$ 0F,0F,0F,0F,0F,0F,0F,0F,0F,0F,0F,0F,0F,0F,0F,0F]
[<16 SSE2_AsciiZero:  B$ '0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0']

Proc FloatToBCD_SSE:
    Arguments @Input, @Output
    Structure @XmmPreserve 128, @XMMReg0Dis 0, @XMMReg1Dis 16, @XMMReg2Dis 32, @XMMReg3Dis 48
    Uses esi, edi, ecx, eax

    ; Preserve XMM registers efficiently
    movdqu X@XMMReg0Dis XMM0
    movdqu X@XMMReg1Dis XMM1
    movdqu X@XMMReg2Dis XMM2
    movdqu X@XMMReg3Dis XMM3

    mov esi D@Input
    mov edi D@Output

    ; Check for leading zero byte
    mov ecx 2
    If B$esi+9 = 0
        dec ecx
    End_If
    add esi ecx

    ; Prepare SSE2 constants
    movdqa xmm3 X$SSE2_NibbleMask    ; 0x0F0F0F0F...
    movdqa xmm2 X$SSE2_AsciiZero     ; '00' repeated

    ; Process 8 bytes at a time (SSE2 can do 16, but we're limited by BCD format)
    movq xmm0 Q$esi        ; Load bytes [esi+ecx..esi]
    MOVDQA xmm1 xmm0       ; xmm1 for lower nibbles
    PAND xmm1 xmm3         ; Isolate lower nibbles
    POR xmm1 xmm2          ; Convert lower nibbles to ASCII

    PSRLW xmm0 4           ; Shift upper nibbles to lower positions
    PAND xmm0 xmm3         ; Isolate shifted upper nibbles
    POR xmm0 xmm2          ; Convert upper nibbles to ASCII

    ;SSE_SWAP_D_HI_LOW xmm2 xmm0
    ; --- Interleave the digits (Upper, Lower, Upper, Lower...) ---
    ; The result for the first 16 digits will be in xmm0
    PUNPCKLBW xmm0 xmm1    ; xmm0 now has [U0, L0, U1, L1, U2, L2, U3, L3, U4, L4, U5, L5, U6, L6, U7, L7]


    PSHUFHW xmm0 xmm0 SSE_INVERT_DWORDS   ; Reverse 16-bit word order in high 64 bits of xmm0
    PSHUFLW xmm0 xmm0 SSE_INVERT_DWORDS   ; Reverse 16-bit word order in low 64 bits of xmm0
    PSHUFD  xmm0 xmm0 SSE_ROTATE_64BITS   ; Swap 64-bit halves and reverse dword order within them.

    ; Store 16 ASCII characters
    movdqu X$edi xmm0
    add edi 16

    ; process remaining:
   sub esi ecx
    Do
        ; Process two nibbles at once
        mov al B$esi | mov ah al | shr al 4 | and ah 0F
        add ax '00'
        mov W$edi ax
        add edi 2
        dec esi
        dec ecx
    Repeat_Until_Zero

    mov B$edi 0


    ; Restore XMM registers
    movdqu XMM0 X@XMMReg0Dis
    movdqu XMM1 X@XMMReg1Dis
    movdqu XMM2 X@XMMReg2Dis
    movdqu XMM3 X@XMMReg3Dis

EndP

I´m using just a few SSE2 functions to optimize the whole function a little bit. It´s not completely necessary, but it won´t hurt optimizing it, specially if the function is to be used in large databases where someone needs to convert the FPU to Ascii string. For example, i´ll have to use it anyway on a large set of 130.000 variables i'll use in the pow function. So it won´t hurt forcing the function to be a bit faster, while maintaining performance.

daydreamer · May 27, 2025, 04:55:03 PM

Guga great to code BCD using SSE2

,also because in x64 mode,the special x86 bcd opcodes wont work

but best performance when optimizing big ascii files are
1:allocate big enough buffer in memory which is fastest and convert all numbers to ascii text
and code that adds together strings "3.14159 "+"1.4215 "+ lot faster than print/fprint
2:block read/write whole buffer to files,compared to fprint 130.000 numbers lot faster

guga · May 27, 2025, 06:51:53 PM

Hi Daydreamer

Tks

I didn´t knew that problem in x64 related to bcd opcodes. I thought it could be done easily. When i ported this BCD routine to SSE2 for x86, it was a hell, because i wasn´t being able to reverse the order of all 18 to 20 ascii chars at once (i could use pshufb, but for now, i can´t try using SSE4 opcodes yet). Fortunately, i remembered a old test in another function that swapped words, and i gave a last try to see if that could be adapted to work as expected :)

About optimizing Big ascii files. Yeah, i´m aware that using the data in memory of such larger dataset is faster. But, i didn´t tested it yet on the pow function. My original idea still is use a small table of variables varying in few blocks (3 perhaps) of only 128, 256 or 512 Real8 as i did when succeeded to port the M$ function. The problem is that in both (mine and M$) it starts loosing precision after the 15th to 16th digit, and i have no idea how M$ scaled this thing. I mean, i couldn´t succeed to recreate the exact Math equations they did to produce the values on those tables.

That´s why i tried a new table regardless the size. The problem is that, as you saw, it may loose performance, and it will occupy a large space in the data section, resulting on a even larger dll (The math functions will be used on a dll, btw).
On the other hand, if i use them in memory, i´ll have to precalculate the values of all used tables, before pass them onto the pow (or log, ln etc etc) function. And it will kill performance making the code slow. Alternatively, to bypass this performance issue, i can simply add a Generate Table function, (or initially/Setup function - or whatever name it can be) and inform the users that if he wants to use the math functions, such initialization functions must be settled 1st.

Or since the functions will be used as exports apis inside a dll, i can use the initialization functions on the start of the Dll (Right after DLL_PROCESS_ATTACH), and if someone wants to use the pow, log, etc etc, it won´t loose his speed since the tables will be previously generated in memory.

I´ll have to test all of this to make sure what is the better to use. Ideally it would be better use just smaller tables but, no matter what math equation i try, i didn´t succeeded to make the result be precise up to the 17th digit yet and grant me the same value as in wolframalpha. The only way i found (yet) is with those larger tables.

NoCforMe · May 28, 2025, 05:01:37 AM

Quote from: guga on May 27, 2025, 06:51:53 PMHi Daydreamer

Tks I didn´t knew that problem in x64 related to bcd opcodes. I thought it could be done easily. When i ported this BCD routine to SSE2 for x86, it was a hell, because i wasn´t being able to reverse the order of all 18 to 20 ascii chars at once (i could use pshufb, but for now, i can´t try using SSE4 opcodes yet).

This is all very interesting, but tell us this:
Who the hell even uses BCD anymore?
I'm pretty sure that format went the way of the dinosaurs with "big iron" IBM mainframes and such.
Do you know of even one instance of someone actually using that nowadays?
Sheesh, the amount of effort that people expend here on things that nobody^[1] is never going to use ...

[1] For certain values of "nobody".

FORTRANS · May 28, 2025, 07:39:39 AM

Hi,

Quote from: guga on May 26, 2025, 06:25:53 AM
Quote from: NoCforMe on May 26, 2025, 04:36:58 AM
Quote from: sinsi on May 25, 2025, 03:19:43 PMThen again, the FPU can have -0 and +0

True, but do we really need to display that?
I mean, when does it matter to us which zero the FPU is giving us?

yeah. I saw that too. But it´s useless IMHO (Unless we are working with Imaginary numbers, i suppose). Allowing the function output only +Infinite or -Infinite is enough.

Any normal usage of imaginary numbers does not have the concept of
a "signed" zero. Plus and minus zeroes are an artifact of the internal
representation of a floating point number used by the FPU. At least those
made by Intel.

Cheers,

Steve N.

NoCforMe · May 28, 2025, 07:51:40 AM

Good. So we're paring down the number of cases that actually need to be presented to the user here.
That should make your life a little bit easier, @guga.

guga · May 28, 2025, 08:35:56 AM

Quote from: NoCforMe on May 28, 2025, 05:01:37 AM
Quote from: guga on May 27, 2025, 06:51:53 PMHi Daydreamer

Tks I didn´t knew that problem in x64 related to bcd opcodes. I thought it could be done easily. When i ported this BCD routine to SSE2 for x86, it was a hell, because i wasn´t being able to reverse the order of all 18 to 20 ascii chars at once (i could use pshufb, but for now, i can´t try using SSE4 opcodes yet).

This is all very interesting, but tell us this:
Who the hell even uses BCD anymore?
I'm pretty sure that format went the way of the dinosaurs with "big iron" IBM mainframes and such.
Do you know of even one instance of someone actually using that nowadays?
Sheesh, the amount of effort that people expend here on things that nobody^[1] is never going to use ...

[1] For certain values of "nobody".

Hi David.

The bcd part of the code is necessary to unpack the values stored in FBSTP opcode.
https://c9x.me/x86/html/file_module_x86_id_83.html
https://masm32.com/masmcode/rayfil/BCDtut.html

About other uses, I don't know if it is used in other ways besides these conversion operations with FPU.

QuoteGood. So we're paring down the number of cases that actually need to be presented to the user here.
That should make your life a little bit easier, @guga.

Yes. I don´t see the purpose of forcing the function to output +0 or -0. It already exports the proper error messages and message codes related to FPU.

I finished today all main representations and corresponding flags. So, i allowed to the user have full control of the rounding mode and amount of digits to output. Ex: transforming numbers as 1.99999965858 to 2.0 etc, truncating the output to a certain amount of digits, padding the ending digits with 0 (If someone needs this for alignment) etc. It works for scientific and not scientific format.

I´ll finish now the cases of negative exponents smaller than 18 (Ex: 1.5e-16 etc), so it can be represented as 0.00000000000000015 etc. And then, i´ll clean up the code and port it to masm.

Later i´ll adapt a function i created to mimic the behavior of ws_printf (and printf) to allow input things like: printf("%f", myFloat) Also allowing the user to have full control on the output.

A general function that converts FPU to string is helpful for a wide range of things people can create, since representing the values on controls (edit controls, static controls etc) or using it in huge tables and so on. I believe a single and general function is easier than forcing people to create their own FPU conversion routines for each needs.

NoCforMe · May 28, 2025, 09:27:10 AM

Quote from: guga on May 28, 2025, 08:35:56 AM
Quote from: NoCforMe on May 28, 2025, 05:01:37 AM
Quote from: guga on May 27, 2025, 06:51:53 PMHi Daydreamer

Tks I didn´t knew that problem in x64 related to bcd opcodes. I thought it could be done easily. When i ported this BCD routine to SSE2 for x86, it was a hell, because i wasn´t being able to reverse the order of all 18 to 20 ascii chars at once (i could use pshufb, but for now, i can´t try using SSE4 opcodes yet).

This is all very interesting, but tell us this:
Who the hell even uses BCD anymore?
I'm pretty sure that format went the way of the dinosaurs with "big iron" IBM mainframes and such.
Do you know of even one instance of someone actually using that nowadays?
Sheesh, the amount of effort that people expend here on things that nobody^[1] is never going to use ...

[1] For certain values of "nobody".
The bcd part of the code is necessary to unpack the values stored in FBSTP opcode.

Yes, but again, who is going to use FBSTP? That would indicate someone who is doing computations in BCD; who does that nowadays?

The only use case I can think of is someone either running a COBOL program or using a COBOL emulator, which might use BCD as a numeric storage format.

NoCforMe · May 28, 2025, 09:36:19 AM

... unless you plan on advertising your FPU functions as "capable of handling all FPU data types" ...

guga · May 28, 2025, 10:06:06 AM

Take a look at Raymond´s FpuFLtoA here. I don't know other faster method to convert these values (which are packed) stored in a TenByte without using his method (which i adapted to work with SSE2 only on this specific part of the code).

Quote... unless you plan on advertising your FPU functions as "capable of handling all FPU data types" ...

I did it already. It uses all FPU types and exports the error messages according (Expect for those +0 or -0, which are useless, IMHO. After all all it represents are a type of +INF and -INF - Already existent on the function). The idea is make a general function simple to use as possible, and yet able to output all types of messages (and values) needed to work with FPU.

The MASM Forum

News:

FputoString format

NoCforMe

guga

NoCforMe

guga

daydreamer

guga

daydreamer

guga

NoCforMe

FORTRANS

NoCforMe

guga

NoCforMe

NoCforMe

guga