SIMD Real4 to ASCII string routine

Siekmanski · September 29, 2018, 12:52:19 PM

Hi Rui,
You are right, this is not what we want. :(
The only logical thing I can think of right now is that the 8 digit calculation is not enough to cover the 32bit floating-point rounding phenomena.
Have to think this all over, suggestions are welcome of course.

RuiLoureiro · September 30, 2018, 04:01:08 AM

Hi Siekmanski,
It seems that you need to study the problem further, or you may try another algo to get the digits. There seems to be a problem when it prints the string in scientific format, or it doesn't do that at all. When we multiply -12345.678 by -123456.78 it gives 000000005, but the result should be 1.5241577E+9 (last digit rounded). So I think you need time and a little bit of work. Try another algo. :t

Siekmanski · September 30, 2018, 04:42:44 AM

Tomorrow I'll try something else. My purpose was to write it in SSE only, with 32-bit float calculations (max 8 digits).
1.5241577E+7 fits inside a real4, but 1.5241577E+9 doesn't.

AFAIK the largest number that fits inside a real4 is 16777215 (24bit), maybe I'm wrong here?
If so I need to do calculations for more than 8 digits.

RuiLoureiro · September 30, 2018, 04:59:58 AM

That is not right: in real4 the decimal exponent goes from -38 to +38. So the converter should show numbers from d.xxxxxxE-38 to d.xxxxxxE+38 (see simplyFPU). So if it shows 1.5241577E+9, that is far from the limit. You don't need to do calculations for more than 8 digits, but you need to decode the exponent part. It seems that you don't do it.
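
To illustrate what "decode the exponent part" means, here is a minimal C sketch (not the SSE routine from this thread). It assumes the real4 is an IEEE 754 single-precision value and splits it into sign, unbiased binary exponent and fraction. The names real4_fields and real4_decode are made up for this example, and the unbiased exponent is only meaningful for normal numbers (stored exponent 1 to 254).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE 754 single-precision value (MASM REAL4). */
typedef struct {
    unsigned sign;      /* 1 bit: 0 = positive, 1 = negative */
    int      exponent;  /* unbiased binary exponent (stored field minus 127) */
    uint32_t fraction;  /* 23 stored fraction bits, without the implicit leading 1 */
} real4_fields;

/* Hypothetical helper: split a float into its three bit fields. */
real4_fields real4_decode(float x)
{
    uint32_t bits;
    real4_fields f;

    assert(sizeof bits == sizeof x);  /* the sketch assumes a 32-bit float */
    memcpy(&bits, &x, sizeof bits);   /* well-defined way to read the bit pattern */

    f.sign     = bits >> 31;
    f.exponent = (int)((bits >> 23) & 0xFFu) - 127;
    f.fraction = bits & 0x007FFFFFu;
    return f;
}
```

For 1.5241577E+9 the binary exponent comes out as 30 (the value lies between 2^30 and 2^31), which is far below the maximum of 127.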

Siekmanski · October 01, 2018, 09:10:04 AM

Thanks Rui,

Now I know what to do. :t

nidud · October 01, 2018, 09:27:33 AM

deleted

Siekmanski · October 01, 2018, 10:01:11 AM

Thanks nidud, :t

My mistake was that I thought there would be no more than 8 digits in the largest value.
I misunderstood the real4 format.

Thanks to Rui I'm a bit wiser now.
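
The difference between "24 bits of precision" and "largest value" can be checked with a small C sketch. It assumes a 32-bit IEEE float; the helper name fits_exactly_in_real4 is made up for this example. It only tests whether an integer survives a round trip through a float.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the 32-bit integer n is exactly
   representable as a float, 0 if the conversion had to round it. */
int fits_exactly_in_real4(int32_t n)
{
    assert(sizeof(float) == 4);   /* the sketch assumes a 32-bit float */
    volatile float f = (float)n;  /* volatile keeps the value at float precision */
    return (int64_t)f == (int64_t)n;
}
```

Every integer up to 16777216 (2^24) passes, 16777217 is the first one that does not. Larger values such as 1524157696 are still representable, only the spacing between neighbouring floats grows (128 at that magnitude), so 1524157700 gets rounded.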

So far I tested this in masm to see what the largest real4 value would be:

masm real4   3.40282356E+38                                    maximum input for a real4 value
sprintf      340282306073709650000000000000000000000.000000    39:6 digits, this is the result from sprintf
sprintf      3.40282e+038                                      scientific notation

The maximum possible number of digits before the decimal point is 39. Only the 7 most significant digits are reliable; the digits after them are garbage but still have to be counted to present the number, and the remainder are just zeros.
sprintf prints the first 7-8 digits, followed by 9 or 10 garbage digits; the rest are zeros.

If the number fits as a whole in 8 digits, I'll print it as such; otherwise I will print it in scientific notation with 8 digits.
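
As a reference point for the digit counts above, here is a small C sketch that uses the C library instead of the SIMD routine. Both helper names (real4_to_sci8, digits_before_point) are made up for this example. "8 digits in scientific notation" is taken to mean one digit before the point and 7 after it, i.e. the "%.7e" format. The exact garbage digits and the number of exponent digits depend on the C runtime, so the sketch does not rely on them.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: 8 significant digits in scientific notation
   (one digit before the point, 7 after). Returns the length written. */
int real4_to_sci8(char *buf, size_t size, float x)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%.7e", (double)x);
}

/* Hypothetical helper: count the digits before the decimal point when
   the value is printed in plain "%f" notation. */
int digits_before_point(float x)
{
    char buf[64];   /* enough for sign + 39 digits + point + 6 decimals */
    int n = 0;
    const char *p = buf;

    snprintf(buf, sizeof buf, "%.6f", (double)x);
    if (*p == '-')
        p++;        /* skip the sign */
    while (p[n] >= '0' && p[n] <= '9')
        n++;        /* count digits up to the decimal point */
    return n;
}
```

digits_before_point(FLT_MAX) gives 39, which matches the "39:6 digits" line above.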

RuiLoureiro · October 02, 2018, 03:32:44 AM

Hi Siekmanski,
>> If the number fits as a whole in 8 digits, i'll print it as such else, I will print it as scientific notation with 8 digits. (which means 7 decimal places)

Very well, that seems to be a good decision (we don't need to see garbage) :t

Siekmanski · October 12, 2018, 07:57:32 AM

In the previous sources I calculated with 8 digits, which caused the occasional rounding errors.
And I didn't have enough knowledge of the internal workings of the floating-point format.

In this new routine 7 digits are used for the calculations, and it does the job without errors (as far as I have tested it, no errors occurred).
And it now covers the whole range -1.175494E-38 to 3.402823E+38

I'm still not happy with the speed of the maximum digits count routine.
It now uses a fast Log10(x)+1 approximation routine, but it needs a few checks to get the exact number of digits from the float.
For now it only prints in scientific notation, but it's a fast one, without memory swaps to insert the decimal point when constructing the string.
The decimal point is now integrated into the ASCII converter constant.

I'll continue and try to write the fastest possible float to ASCII routine.
I still have another idea: to write the maximum digits count routine in a totally different way, and I hope it will be faster than the Log10(x)+1 approach.
Next week I'll start coding it and will see if it is faster or not.
To be continued.
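
For readers who want to see the general idea of a "Log10(x)+1 approximation plus a few checks", here is a plain C sketch. It is not the SSE code from the first post, only one common way to do it: estimate the decimal exponent from the binary exponent (log10(2) is about 0.30103) and correct the estimate with one comparison against a power of ten. All names are made up for this example. It assumes a normal, finite, non-zero IEEE float, and the comparison is done in double precision without checking every boundary case near the large powers of ten.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Integer division rounding toward negative infinity (b > 0). */
static int floor_div(int a, int b)
{
    int q = a / b;
    if (a % b != 0 && a < 0)
        q--;
    return q;
}

/* 10^n as a double, built by repeated multiplication (or its reciprocal). */
static double pow10_int(int n)
{
    double p = 1.0;
    int i;
    for (i = 0; i < (n < 0 ? -n : n); i++)
        p *= 10.0;
    return n < 0 ? 1.0 / p : p;
}

/* Hypothetical helper: decimal exponent floor(log10(|x|)) of a normal,
   finite, non-zero float. Digits before the point = result + 1. */
int real4_decimal_exponent(float x)
{
    uint32_t bits;
    int e2, e10;
    float ax;

    memcpy(&bits, &x, sizeof bits);
    e2 = (int)((bits >> 23) & 0xFFu) - 127;   /* unbiased binary exponent */
    assert(e2 >= -126 && e2 <= 127);          /* zero, denormals, Inf, NaN excluded */

    bits &= 0x7FFFFFFFu;                      /* clear the sign bit: |x| */
    memcpy(&ax, &bits, sizeof ax);

    e10 = floor_div(e2 * 30103, 100000);      /* estimate: e2 * log10(2) */
    if ((double)ax >= pow10_int(e10 + 1))     /* estimate is at most one too low */
        e10++;
    return e10;
}
```

For the benchmark value 1.234567e+14 the binary exponent is 46, the estimate is 13, and the check against 10^14 corrects it to 14.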

I have posted the fully commented source code in the first post. http://masm32.com/board/index.php?topic=7441.msg81351#msg81351


SIMD Real4 to ASCII conversion by Siekmanski 2018.

1000000 calls per Run for the Cycle counter and the Routine timer.

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz

 Routine timers starting now....

Real4_2_ASCII Cycles: 69 RoutineTime: 0.022400193 seconds
sprintf       Cycles: 1955 RoutineTime: 0.600757429 seconds

Result Real4_2_ASCII:  1.234567e+14
Result sprintf      : 1.234567e+014

Press any key to continue...

mabdelouahab · October 12, 2018, 09:48:43 AM

SIMD Real4 to ASCII conversion by Siekmanski 2018.

1000000 calls per Run for the Cycle counter and the Routine timer.

Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz

 Routine timers starting now....

Real4_2_ASCII Cycles: 61 RoutineTime: 0.028886304 seconds
sprintf       Cycles: 1866 RoutineTime: 3.438464466 seconds

Result Real4_2_ASCII:  1.234567e+14
Result sprintf      : 1.234567e+014

Press any key to continue...

HSE · October 12, 2018, 10:17:54 AM

AMD A6-3500 APU with Radeon(tm) HD Graphics

Real4_2_ASCII Cycles: 98 RoutineTime: 0.049566970 seconds
sprintf       Cycles: 2630 RoutineTime: 1.251230629 seconds

:t

Siekmanski · October 12, 2018, 10:54:45 AM

Rui found a typo in the floating-point exceptions list for the +Infinity message.

Change 0FF800000h to 07F800000h.
It should be:

cmp eax,07F800000h
je message_Inf

HSE · October 12, 2018, 11:09:41 AM

These lines are easier to read this way:

    ; check floating-point exceptions
    check macro value, message
        cmp         eax, &value
        je          &message
    endm    

    check 0FFFFFFFFh, message_QnegNaN
    check 0FFC00001h, message_QnegNaN
    check 0FFBFFFFFh, message_SnegNaN
    check 0FF800001h, message_SnegNaN
    check 0FFC00000h, message_Indeterm
    check 0FF800000h, message_NegInf
    check 0FF7FFFFFh, message_NegNorm
    check 07F7FFFFFh, message_Norm
    check 07F800000h, message_Inf
    check 07F800001h, message_SNaN
    check 07FBFFFFFh, message_SNaN
    check 07FC00000h, message_QNaN
    check 07FFFFFFFh, message_QNaN
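
The constants in this list look like the end points of the special-value ranges of the real4 format. As a cross-check, here is a C sketch that classifies a bit pattern by its fields instead of comparing single values. It is not part of the posted source; the enum and function names are made up for this example, and the sign of a NaN is ignored except for the one pattern the list calls "Indeterm" (0FFC00000h).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    R4_FINITE,      /* zero, denormal or normal number */
    R4_POS_INF,     /* 07F800000h */
    R4_NEG_INF,     /* 0FF800000h */
    R4_SNAN,        /* exponent all ones, top fraction bit clear, fraction != 0 */
    R4_QNAN,        /* exponent all ones, top fraction bit set */
    R4_INDEFINITE   /* 0FFC00000h */
} real4_class;

/* Hypothetical helper: classify a raw real4 bit pattern. */
real4_class real4_classify_bits(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t fraction = bits & 0x007FFFFFu;

    if (exponent != 0xFFu)
        return R4_FINITE;
    if (fraction == 0)
        return (bits >> 31) ? R4_NEG_INF : R4_POS_INF;
    if (bits == 0xFFC00000u)
        return R4_INDEFINITE;
    return (fraction & 0x00400000u) ? R4_QNAN : R4_SNAN;
}

/* Same classification, starting from a float value. */
real4_class real4_classify(float x)
{
    uint32_t bits;
    assert(sizeof bits == sizeof x);  /* the sketch assumes a 32-bit float */
    memcpy(&bits, &x, sizeof bits);
    return real4_classify_bits(bits);
}
```

With this, 07F800000h is reported as +Infinity and 0FF800000h as -Infinity, which agrees with the corrected compare above.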

Siekmanski · October 12, 2018, 11:16:10 AM

:t

FORTRANS · October 13, 2018, 01:09:16 AM

Hi.

= = =
Redirect to file.

SIMD Real4 to ASCII conversion by Siekmanski 2018.

1000000 calls per Run for the Cycle counter and the Routine timer.

Intel(R) Pentium(R) M processor 1.70GHz

 Routine timers starting now....

Real4_2_ASCII Cycles: 106 RoutineTime: 0.082408239 seconds
sprintf       Cycles: 4925 RoutineTime: 3.293436177 seconds

Result Real4_2_ASCII:  1.234567e+14
Result sprintf      : 1.234567e+014

Press any key to continue...
= = =
Screen Capture
F:\TEMP\TEST>REAL4_2_.EXE

SIMD Real4 to ASCII conversion by Siekmanski 2018.

1000000 calls per Run for the Cycle counter and the Routine timer.

Intel(R) Pentium(R) M processor 1.70GHz

 Routine timers starting now....

Real4_2_ASCII Cycles: 107 RoutineTime: 0.066058396 seconds
sprintf       Cycles: 4941 RoutineTime: 2.925534111 seconds

Result Real4_2_ASCII:  1.234567e+14
Result sprintf      : 1.234567e+014

Press any key to continue...
= = =

SIMD Real4 to ASCII conversion by Siekmanski 2018.

1000000 calls per Run for the Cycle counter and the Routine timer.

Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz

 Routine timers starting now....

Real4_2_ASCII Cycles: 68 RoutineTime: 0.043186625 seconds
sprintf       Cycles: 2923 RoutineTime: 1.514711201 seconds

Result Real4_2_ASCII:  1.234567e+14
Result sprintf      : 1.234567e+014

Press any key to continue...

There is some timing difference between redirecting the results to a file
and a screen capture of the results.

HTH,

Steve N.
