The MASM Forum
General => The Laboratory => Topic started by: hutch on September 05, 2018, 02:14:34 PM

I am testing the precision of the 64-bit MSVCRT version of sprintf. The default precision is 6 digits after the decimal point, but I have stretched it as far as I can on Win10, to about 11. I need the test done on Win7 64 or later to see whether MSVCRT has remained the same across those OS versions. I would post the code, but there are a number of things that have not been published yet.
This is the result I am getting on Win10 Professional.
//////////////////////
sprintf precision test
\\\\\\\\\\\\\\\\\\\\\\
Input = 123456.1234567890123456
Output = 123456.12345678901
Press any key ....

(https://www.dropbox.com/s/qnb13hsd0vowaf8/xp64bits.png?dl=1)

Win 8.1
//////////////////////
sprintf precision test
\\\\\\\\\\\\\\\\\\\\\\
Input = 123456.1234567890123456
Output = 123456.12345678901
Press any key ....

Same on Win7 64. It's a double: 16 digits.

I tried to separate the integer and the fractional parts, but in 32-bit assembly the cvtpd2dq and cvtdq2pd instructions convert only 32 bits.
In 64-bit assembly 64 bits are converted.
At the moment I'm not able to run it in 64-bit assembly to test whether we gain some resolution; probably not, but.... :idea:
Maybe someone can test it in 64-bit assembly?
.data
align 16
Fabsmask qword 7FFFFFFFFFFFFFFFh,7FFFFFFFFFFFFFFFh
float64bit real8 123456.1234567890123456
integer real8 0.0
fraction real8 0.0
szFloatNum db "%0.0f.%0.016f",0
szFloat64 db 64 dup (0)
.code
movsd xmm0,float64bit
movsd xmm2,xmm0
cvtpd2dq xmm1,xmm0 ; get rid of fractional part
cvtdq2pd xmm1,xmm1 ; integer part
subpd xmm2,xmm1 ; subtract integer part from number to get fractional part
andpd xmm2,Fabsmask ; make fractional part absolute by removing sign bit
movsd integer,xmm1
movsd fraction,xmm2
invoke sprintf,addr szFloat64,addr szFloatNum,integer,fraction
invoke strlen,addr szFloat64
lea ecx,szFloat64
movups xmm0,[ecx+eax-16]
movups [ecx+eax-18],xmm0
mov byte ptr [ecx+eax-2],0
; print the szFloat64 string for the result.
; result with 32 bit assembly: 123456.1234567890060134

Marinus,
Like this ?
123456.1234567890060134
Press any key to continue...
I just plugged your code into a 64-bit test piece, changed the 32-bit registers to 64-bit ones and used a 64-bit version of StrLen. Works fine.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
.data
align 16
Fabsmask qword 7FFFFFFFFFFFFFFFh,7FFFFFFFFFFFFFFFh
float64bit real8 123456.1234567890123456
integer real8 0.0
fraction real8 0.0
szFloatNum db "%0.0f.%0.016f",0
szFloat64 db 64 dup (0)
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
movsd xmm0,float64bit
movsd xmm2,xmm0
cvtpd2dq xmm1,xmm0 ; get rid of fractional part
cvtdq2pd xmm1,xmm1 ; integer part
subpd xmm2,xmm1 ; subtract integer part from number to get fractional part
andpd xmm2,Fabsmask ; make fractional part absolute by removing sign bit
movsd integer,xmm1
movsd fraction,xmm2
invoke vc_sprintf,addr szFloat64,addr szFloatNum,integer,fraction
invoke StrLen,addr szFloat64
lea rcx,szFloat64
movups xmm0,[rcx+rax-16]
movups [rcx+rax-18],xmm0
mov byte ptr [rcx+rax-2],0
conout ptr$(szFloat64),lf
waitkey
.exit
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end

Integer version.
mov rax,123456.1234567890123456
movq xmm0,rax
cvtpd2dq xmm1,xmm0
movq rdx,xmm1
cvtdq2pd xmm1,xmm1
subpd xmm0,xmm1
mov rax,10000000000.0
movq xmm1,rax
mulpd xmm0,xmm1
cvtpd2dq xmm0,xmm0
movq r8,xmm0
printf("%I64d.%I64d\n", rdx, r8)
123456.1234567890
pcmpeqw xmm2,xmm2 ; Fabsmask
psrlq xmm2,1

xx\simd3>asmc c simd3.asm
Asmc Macro Assembler Version 2.28.04
Copyright (C) The Asmc Contributors. All Rights Reserved.
Assembling: simd3.asm
***********
ASCII build
***********
simd3.asm(47) : error A2070: invalid instruction operands
"andpd xmm2,Fabsmask ; make fractional part absolute by removing sign bit"
xx\simd3>ml14 c simd3.asm
Microsoft (R) Macro Assembler Version 14.00.23026.0
Copyright (C) Microsoft Corporation. All rights reserved.
Assembling: simd3.asm
***********
ASCII build
***********
xx\simd3>link simd3.obj /subsystem:console
Microsoft (R) Incremental Linker Version 5.12.8078
Copyright (C) Microsoft Corp 1992-1998. All rights reserved.
xx\simd3>simd3
Hello Siemanski
123456.1234567890060134
Press any key to continue ...

Thanks.
Useless attempt, no gain in resolution. :biggrin:

My previous post doesn't say it's 32-bit; it was just showing the AsmC error.

Hi HSE,
maybe this works,
andpd xmm2,oword ptr Fabsmask
or
andpd xmm2,xmmword ptr Fabsmask

My result executing marinus.exe:
123456.1234567890060134

Hi Siemanski!
This also works:
Fabsmask xmmword 7FFFFFFFFFFFFFFF7FFFFFFFFFFFFFFFh (idem oword)
But where is the MASM compatibility?

But where is MASM compatibility?
What do you mean exactly?

I don't see any way of getting there with SSE instructions.
Playing with FPU instructions we can obtain something like 123456.1234567890123444, which is closer.

What do you mean exactly?
If code that works with ML doesn't work with AsmC,
then AsmC is not compatible with ML.

include \masm32\MasmBasic\MasmBasic.inc ; download
.data
MyR10 REAL10 1234567890.1234567890123
Init
Inkey Str$("This is the highest resolution possible without a bignum library:\n%Jf", MyR10)
EndOfCode
This is the highest resolution possible without a bignum library:
1234567890.123456789

123456.1234567890060134
Press any key to continue...

What do you mean exactly?
If code that works with ML doesn't work with AsmC,
then AsmC is not compatible with ML.
That seems to be correct. New version uploaded.
Test case (https://github.com/nidud/asmc/blob/master/source/asmc/regress/src/bin/andpd.asm):
;
; v2.28.05  HSE
;
.x64
.model flat
.data
m64 label qword
.code
andpd xmm0,m64
andnpd xmm0,m64
xorpd xmm0,m64
orpd xmm0,m64
end

New version uploaded.
Thanks Nidud :t

I think I know what Marinus was trying to do: split the input data between the lead integer and the decimal fraction. While you cannot extract more out of 64-bit maths in terms of precision, split into 2 components you may be able to handle each one separately to get a 64 x 64 precision level.
ALSO: Thanks to those folks who fed back the results on the sprintf test piece; it means sprintf is safe to use from XP64 upwards.

I think I know what Marinus was trying to do: split the input data between the lead integer and the decimal fraction. While you cannot extract more out of 64-bit maths in terms of precision, split into 2 components you may be able to handle each one separately to get a 64 x 64 precision level.
You are correct, but I was wrong in assuming it could work that way.

split into 2 components you may be able to handle each one separately to get a 64 x 64 precision level.
NOT UNLESS you use some bignum library.
As far as I know, MASM can only handle the equivalent of 64 bits with REAL10, AND that would hold only if you modify the control word for precision in your program (finit does that); otherwise the default precision under Windows is 53 bits for floats.
If you declare a float in the data section with more than the maximum usable number of digits for a REAL10, the surplus ones are simply disregarded in the resulting binary equivalent which is stored.
If you try to split a float into its integer and fractional components as two separate entries in your data and later attempt to recombine them as an entity, any surplus 'digit' would again get dropped out, negating any intended gain in accuracy.

Working with REAL10 we have 19 significant digits.
So it is not possible to print 123456.1234567890123456, even using the trick of subtracting the integer part from the whole number and printing in 2 parts.
It is, however, possible to print 123.1234567890123456 (possibly also 1234.1234567890123456, but that is not guaranteed and has to be confirmed).

You are only ever going to get the number of digits that the data size handles at its maximum. My own view is that in most instances REAL4 or REAL8 do the job, and while there are applications for REAL10 or big-number libraries, unless you are plotting points in other galaxies the extra precision is not much use to you. The "sprintf" CRT function I have used has a number of flags for truncation or reduced precision levels; the alternative is to truncate the decimal fraction to an arbitrary count depending on what is required.
With the distinction between the floating-point string display and the actual floating-point value, you can simply truncate the string to whatever is needed in the application but retain the actual fp value in the variable it is stored in. This would not be all that hard to code: allocate the memory for the string plus 8 bytes, write the fp value at the start of the memory and write the string after it.

The Windows runtime does not provide any function to print an 80-bit floating-point value (Delphi and Free Pascal have functions for that). We can do it with ASM, but as you said it is not needed at all. If we really need a huge number of decimals, for example to calculate the value of PI with 1 billion decimals, we will need to resort to a big-number library, which will do it in software.

in most instances, REAL4 or REAL8 do the job
Right. My Str$() loads whatever size on the FPU and spits out variable precision up to REAL10, simply because a single routine can handle all sizes, and the performance penalty REAL10 vs REAL4 is negligible (if any). But of course, if you want to determine the "exact" position of a pixel on the screen, then REAL4 is more than enough. GdiPlus takes integer and REAL4 args but never REAL8.
OTOH "in assembly, we can print 19 digits" can be an argument when discussing advantages over the CRT ;)

Integer version.
mov rax,123456.1234567890123456
movq xmm0,rax
cvtpd2dq xmm1,xmm0
movq rdx,xmm1
cvtdq2pd xmm1,xmm1
subpd xmm0,xmm1
mov rax,10000000000.0
movq xmm1,rax
mulpd xmm0,xmm1
cvtpd2dq xmm0,xmm0
movq r8,xmm0
Using cvtpd2dq and cvtdq2pd will fail here given the result is a 32-bit integer value. The correct version should be something like this:
DoubleToInt proc val:real8
cvtsd2si rdx,xmm0
cvtsi2sd xmm1,rdx
subsd xmm0,xmm1
mov rcx,10000000000000000.0
movq xmm1,rcx
mulsd xmm0,xmm1
cvtsd2si rax,xmm0
ret
DoubleToInt endp
main proc
DoubleToInt(9007199254740991.0)
printf("%I64d.%I64d\n", rdx, rax)
DoubleToInt(1.2345678901234567)
printf("%I64d.%I64d\n", rdx, rax)
DoubleToInt(0.1234567890123456)
printf("%I64d.%I64d\n", rdx, rax)
DoubleToInt(123456.1234567890123456)
printf("%I64d.%I64d\n", rdx, rax)
xor eax,eax
ret
main endp
9007199254740991.0
1.2345678901234567
0.1234567890123456
123456.1234567890060134

DoubleToInt proc val:real8
DoubleToInt(123456.1234567890123456)
> 123456.1234567890060134
This proves the point I tried to make in my previous post. The data supplied as input for a REAL8 was cropped after the initial 16 significant digits in the decimal system. Trying to recover the cropped digits in print only resulted in garbage being displayed (060134) for those dropped digits (123456), certainly not improving any precision and possibly misleading the recipient.
But, as programmers, let's not mix "apples and oranges" and forget about mathematical principles. The precision limit of a real8 is stated as having a maximum of 16 significant digits in the decimal system, which means the total number of the digits in both the integer and fractional portions.
For example, 5000000000000000000.10203 would have 23 significant digits,
while, 0.00000000000000000010203 would have only 5 significant digits.

The problem here was to fit the value into a 64-bit register. The total storage is 53 bits, but fractions may extend way beyond 64 bits, as your last sample shows. However, this may also be converted:
cvtsd2si rdx,xmm0
cvtsi2sd xmm1,rdx
subsd xmm0,xmm1
mov rcx,10000000000000000.0
movq xmm1,rcx
mulsd xmm0,xmm1
cvtsd2si rax,xmm0
mulsd xmm0,xmm1
cvtsd2si r9,xmm0
ret
DoubleToInt(0.00000000000000000010203)
printf("%I64d.%016I64d%016I64d\n", rdx, rax, r9)
0.00000000000000000010203000000000
The former value needs some help though.
.data
q real16 5000000000000000000.10203
...
printf("%llf\n", q)
5000000000000000000.102030

Hi Raymond,
As far as I understand correctly, the maximum number of digits:
real4 7 digits (32/4-1)
real8 15 digits (64/4-1)
real10 19 digits (80/4-1)
Is this correct?

Not really, no, but in order to be converted to a 64-bit integer the value has to fit in 64 bits, else the return value will be cropped to the min/max int.
DoubleToInt(1000000000000000000000000000000000000000000000000000000.0)
printf("%I64d.%I64d\n", rdx, rax)
DoubleToInt(1.0E100)
printf("%I64d.%I64d\n", rdx, rax)
-9223372036854775808.-9223372036854775808
-9223372036854775808.-9223372036854775808

Hi Raymond,
As far as I understand correctly, the maximum number of digits:
real4 7 digits (32/4-1)
real8 15 digits (64/4-1)
real10 19 digits (80/4-1)
Is this correct?
Interesting rule of thumb, but it is not based on the number of available mantissa bits according to the IEEE format.
These are: 64, 52 and 23 bits for REAL10, REAL8 and REAL4 respectively.
My calculation is the following (who knows better, please contradict):
X = (ln 10)/(ln 2) = 3.3219280948873623478703194294894
64/X = 19.26 => 19/20
52/X = 15.65 => 15/16
23/X = 6.92 => 6/7
So, to be on the safe side, the number of guaranteed significant digits is 19, 15 and 6 respectively for REAL10, REAL8 and REAL4.

From MSDN,
The mantissa is stored as a binary fraction greater than or equal to 1 and less than 2. For types float and double, there is an implied leading 1 in the mantissa in the most-significant bit position, so the mantissas are actually 24 and 53 bits long, respectively, even though the most-significant bit is never stored in memory.
https://msdn.microsoft.com/en-us/library/hd7199ke.aspx

so the mantissas are actually 24 and 53 bits long
Test it... note the last digits in the range 8.xx and above (53/3.321928095=15.95 digits):
include \masm32\include\masm32rt.inc
.data
TmpR8 REAL8 ?
.code
start:
xor ebx, ebx ; set two nonvolatile
finit
fld1
fld FP10(1.2345678901234567890)
.Repeat
fld st
fstp TmpR8
printf("%.17f\n", TmpR8)
fadd ST, ST(1)
inc ebx
.Until ebx>=8
.Repeat
fld st
fstp TmpR8
printf("%.17f\n", TmpR8)
fsub ST, ST(1)
dec ebx
.Until Sign?
inkey
exit
end start

Same for real16, 113/112-bit. The bit is always set unless the value is zero.

Hi Raymond,
As far as I understand correctly, the maximum number of digits:
real4 7 digits (32/4-1)
real8 15 digits (64/4-1)
real10 19 digits (80/4-1)
Is this correct?
I generally agree with those limits, except that I consider the range for the real8 to be 16 significant digits for all intents and purposes.
Same for real16, 113/112-bit. The bit is always set unless the value is zero.
I'm not familiar with the real16. However, for real4 and real8, the above statement is not 100% correct. It should read:
"The bit is always set unless the biased exponent is zero." i.e. the value of the real number is either 0 or that of a 'denormalized real number'. For those following this thread but not entirely familiar with this subject, some more info on 'denormalized real numbers' is available at
http://www.ray.masmcode.com/tutorial/fpuchap2.htm#denormal

Hi Ray,
I well understand what can be fitted into a REAL8 variable, but when defining a .data section entry the assembler simply truncates values that are longer than a REAL8 can hold, so I can routinely do things like this.
.data
align 16
pi REAL8 3.14159265358979323846
A large number library is probably the only way to do really big numbers, REAL10 gets some extra precision but probably not enough for some of the scientific folk.

that I consider the range for the real8 to be 16 significant digits for all intents and purposes.
Simple pure Masm32 test, using REAL8 precision with CRT printf (source & exe attached):
include \masm32\include\masm32rt.inc
.data
TmpR8 REAL8 ?
.code
start:
xor ebx, ebx ; set two nonvolatile
finit
fld1
fld FP10(1.0000000000000011111111)
print "1.2345678901234567", 13, 10
print "1.0000000000000011111 as REAL10", 13, 10
.Repeat
fld st
fstp TmpR8
printf("%.16f\n", TmpR8)
fadd ST, ST(1)
inc ebx
.Until ebx>=8
.Repeat
fld st
fstp TmpR8
printf("%.16f\n", TmpR8)
fsub ST, ST(1)
dec ebx
.Until Sign?
inkey "0.0000000000000011111 as REAL10 on exit"
exit
end start
Output:
1.2345678901234567
1.0000000000000011111 as REAL10
1.0000000000000011 +0
2.0000000000000013 +2 at pos 17
3.0000000000000013 +2
4.0000000000000009 -2
5.0000000000000009
6.0000000000000009
7.0000000000000009
8.0000000000000018 +7
9.0000000000000018 +7 <<<<< max deviation
8.0000000000000018 +7
7.0000000000000009
6.0000000000000009
5.0000000000000009
4.0000000000000009 -2
3.0000000000000013 +2
2.0000000000000013
1.0000000000000011 +0
0.0000000000000011111 as REAL10 on exit
It is in the middle of the range where the 15.95 digits show a small distortion.

I had to laugh: working with formulas involves stuff I learnt in primary school over 60 years ago. Back then, before the digital era, it was taught as fractions, and little has changed apart from it now being taught in digital format.

If the accuracy of that 16th digit with the real8 becomes sooooooooo important, the use of a real10 should then be investigated!!!!
Accuracy in a 17th digit with a real8 is definitely not to be expected.

1.2345678901234567
1.0000000000000011111 as REAL10
1.000000000000001
2.000000000000001
3.000000000000001
4.000000000000001
5.000000000000001
6.000000000000001
7.000000000000001
8.000000000000002
9.000000000000002
8.000000000000002
7.000000000000001
6.000000000000001
5.000000000000001
4.000000000000001
3.000000000000001
2.000000000000001
1.000000000000001
0.0000000000000011111 as REAL10 on exit

I am missing something for sure, but when Raymond says "I generally agree with those limits, except that I consider that the range for the real8 to be 16 significant digits for all intents and purposes." and I look at JJ's output, where almost half the values have 15 significant digits, can I conclude that it is really bad luck for a calculated 15.95 value?

The size of the mantissa is in bits, and digits have a different radix. The same logic applies to a byte: three digits may fit if the value is less than 256.

almost half the values have 15 significant digits
When using rounding, only 3 out of 17 are off by one at the 16th digit - see Marinus' output.

There's a parser here (http://masm32.com/board/index.php?topic=6454.msg69196#msg69196) that calculates the number of digits based on the size of the mantissa. The function is not used anymore but should (I think) work for REAL32 as well.
mov radix,10 ; assume decimal
mov eax,bits ; 53-bit -> 17
mov ecx,eax ; 64-bit -> 22
mov edx,eax ; 113-bit -> 39
REAL10 is the only one which stores the high bit; the mantissa is 64 bits, plus a 15-bit exponent and a sign bit. The mantissa is shifted left, so the value 1.0 is stored as 1 << 63:
0x0000000000000001
0x8000000000000000
The exponent bias for REAL10 and REAL16 is 0x3FFF.
0x3FFF8000000000000000 ; 1.0 - REAL10
0xBFFF8000000000000000 ; -1.0 - REAL10
0x40008000000000000000 ; 2.0 - REAL10
112-bit mantissa + one "hidden bit":
0x3FFF0000000000000000000000000000 ; 1.0 - REAL16
0xBFFF0000000000000000000000000000 ; -1.0 - REAL16
0x40000000000000000000000000000000 ; 2.0 - REAL16
A simple conversion from REAL16 to REAL10:
movq rax,xmm0 ; low 64bit
shufps xmm0,xmm0,01001110B
movq rdx,xmm0 ; high 64bit
shld rcx,rdx,16
mov [rdi+8],cx
shld rdx,rax,16 ; add bit 63
and ecx,0x7FFF ;  remove sign
neg cx ;  set carry if not zero
rcr rdx,1
mov [rdi],rdx ; store mantissa
REAL10 to REAL16:
xor eax,eax
mov rdx,[rsi]
shl rdx,1
mov cx,[rsi+8]
shrd rax,rdx,16
shrd rdx,rcx,16
movq xmm1,rdx
movq xmm0,rax
shufpd xmm0,xmm1,0

:biggrin:
If you are worried about the final digit count in a FLOAT or DOUBLE, you need a large-number library. With the testing I have been doing, you do a range of calculations in DOUBLE (REAL8) then reverse them to see how much difference there is (if any), and in most instances you end up with the same number. There are of course applications for very large numbers: physics, astronomy and even location on the planet in terms of latitude and longitude if real accuracy is required. But the vast majority of tasks are overkilled with DOUBLE. A 32-bit float is not all that useful for engineering and similar calculations, but with the speed of floating-point units in modern hardware, graphics are viable in 32-bit floating point.

If you are worried about the final digit count in a FLOAT or DOUBLE, you need a large number library.
I beg to differ from that.
- If you are worried about the final digit count in a FLOAT (real4), you use a DOUBLE, which has significantly better accuracy than the FLOAT.
- If you are worried about the final digit count in a DOUBLE (real8), you use an extended DOUBLE, which has somewhat better accuracy than the DOUBLE.
- Only if you are worried about the final digit count in an extended DOUBLE (real10) would you need to use a large number library (or some other hardware which can process real16s).
As a scientist, I would really need concrete examples where applied sciences could need accuracy requiring bignum libraries. The only field I could imagine would be theoretical mathematics which may have very little application in real life.

Hmmmm,
Astronomy, precision navigation to exact locations, calculations related to travelling up near the speed of light, some branches of physics etc .... anything that REAL10 does not have a high enough level of precision for.

When we talk about significant digits we talk about precision, not accuracy. The IEEE standard defines four different precisions: single, double, single-extended, and double-extended.
For a quick look at the differences between precision and accuracy (https://www.youtube.com/watch?v=hRAFPdDppzs).
We need high-precision floating-point libraries also to escape from accumulated errors when we do zillions of operations on a data set. Although we know that the pyramids were built without any electronic calculator or even a slide rule, and that for playing a DirectX game a Real4 is enough, cutting-edge physics, astronomy and other fields will not do with the precision levels of the IEEE standards.

IEEE 754-1985 was replaced by IEEE 754-2008. This includes, among other things, a 16-bit half precision and a 128-bit quad precision.
The IEEE 854 (the radix-independent floating-point standard) was withdrawn in 2008. The Babylonians and Egyptians used a radix of 60 to calculate angles, so a third of an hour is 20 minutes.
The most obvious usage for larger numbers is to verify and test smaller numbers for errors in a creation process.

The 32-, 64- and 80-bit IEEE standards are built into the FPU, and it is better to conform than to invent our own. Above that, let's see; no need to rush and pay $99.00 to read IEEE 754-2008.
I personally like the MPFR (https://www.mpfr.org/) (*) library  in their own words, it copies the good ideas from the ANSI/IEEE754, namely for correct rounding and exceptions. It has no flying limits  you choose the precision you want.
* I have ported MPFR 3.15 to Windows (not to the Cygwin or Mingw crap) and it is on my website. The MPFR guys do not mention it because it is related to a dispute over, guess what:
The ZIP file containing MPIR and MPFR binaries that you are
distributing does not meet the license requirements for GPL/LGPL
software because it does not contain any license text and it does not
indicate where the source code for MPIR and MPFR and any necessary
build files can be obtained.
It is NOT sufficient to only update your web page.
I would be most grateful if you would either update the ZIP file to
provide these details or, if you are not able to do this, stop
distributing it.
Well, I actually mention it but the guy has not seen it or wanted a spotlight turned to it. These GNU guys are like this.

There is (as nidud has already noted) a relatively new 16-bit floating-point format:
sign bit: 1
exponent: 5 bits
mantissa: 10 bits
This format is used in several computer graphics environments (such as OpenGL and DirectX).
The advantage over the 32-bit single-precision binary format is that it requires half the storage and bandwidth (at the expense of precision and range).
F16C instruction set:
There are variants that convert four floating-point values in an XMM register or eight floating-point values in a YMM register.
The instructions are abbreviations for "vector convert packed half to packed single" and vice versa:
VCVTPH2PS xmmreg,xmmrm64 - convert four half-precision floating-point values in memory or the bottom half of an XMM register to four single-precision floating-point values in an XMM register.
VCVTPH2PS ymmreg,xmmrm128 - convert eight half-precision floating-point values in memory or an XMM register (the bottom half of a YMM register) to eight single-precision floating-point values in a YMM register.
VCVTPS2PH xmmrm64,xmmreg,imm8 - convert four single-precision floating-point values in an XMM register to half-precision floating-point values in memory or the bottom half of an XMM register.
VCVTPS2PH xmmrm128,ymmreg,imm8 - convert eight single-precision floating-point values in a YMM register to half-precision floating-point values in memory or an XMM register.
The 8-bit immediate argument to VCVTPS2PH selects the rounding mode. Values 0-4 select nearest, down, up, truncate, and the mode set in MXCSR.RC respectively.
Support for these instructions is indicated by bit 29 of ECX after CPUID with EAX=1.
X = (ln 10)/(ln 2) = 3.3219280948873623478703194294894
64/X = 19.26 => 19/20
52/X = 15.65 => 15/16
23/X = 6.92 => 6/7
Your X is in fact the reciprocal of Log10(2.0) -> 1.0 / 0.30102999566398119521373889472449
Simplified calculation:
X = 0.30102999566398119521373889472449
64*X = 19.26
52*X = 15.65
23*X = 6.92

Hmmmm,
Astronomy, precision navigation to exact locations, calculations related to travelling up near the speed of light, some branches of physics etc .... anything that REAL10 does not have a high enough level of precision for.
- Precision navigation to exact locations
The circumference of the earth is generally considered to be 25000 miles. Assuming that such a figure is perfectly exact, it would be approximately equal to 40233600000 millimeters. This means that the location or the distance between any two points on the earth's surface can be computed to within 1 mm of precision with only 11 significant digits. What could be the need for higher precision?
One basic math principle is that the result of any computation cannot be any more accurate than the least accurate component used for the computation. For example, if only the first 4 digits of the value used above for the circumference of the earth are accurate (i.e. the circumference would be +/- 10 miles), any computation using it should not be reported with more than 4 significant digits; even if it were obtained with a precision of 7 or more significant digits, only the first 4 may be accurate and any additional ones would only distort the real accuracy of the result.
The speed of light is given in the literature with 9 significant digits (299792458 m/s) with a measurement uncertainty of 4 parts per billion. If, for example, you wanted to convert that constant to ft/s with the same accuracy, you must then use the ratio of feet/meter with a precision of 9 significant digits.
This precision/accuracy debate always reminds me of a detail when I was working. A document from a U.S. association needed to be modified to include information in the metric system along those of the U.S. system. One item pertained to taking a quart sample of water for analysis. Believe it or not, this had been converted into a 0,95 liter sample!!!!

Ray,
I think you are reading this the wrong way: successive levels of precision have always had their place, but you will see over time that increased levels of precision keep being needed. Long ago, what we call integers were simple enough up to 10 fingers; later we had Roman numerals, later still Arabic numerals, and over time fractions. Converting this to decimal is handy as it currently suits digital hardware, but the range limits start to be a problem in at least some tasks.
The example that Jose posted with the landing area on Mars is a clear example of the need for higher precision. Travelling at very high speed (light) leaves room for tiny percentages of a calculation being light years out, and again it depends on the level of precision required. Then there are things done at the atomic level where you don't want accumulated errors creeping in. You may not need the precision to calculate the interest on investments, but someone will.
A well known programmer who was involved in the creation of Microsoft once said that you would never need more than 64k of memory. These types of predictions tend to die very quickly; my current work computer has 64 gig of memory and the next one will probably need much more.
Just as an example, the best angle measuring tool I own divides degrees into minutes (a Brown & Sharpe vernier protractor), but in most instances a school kid's approximate degree protractor does the job; it purely depends on the task.

NASA has a dedicated site: Space Math Problems Sorted by NASA Mission and Program (https://spacemath.gsfc.nasa.gov/mission.html). Maybe somebody can find an example there demonstrating that REAL10 is not enough.

I had a boss who used to claim that black-and-white Hercules graphics cards would become the standard for professionals: color monitors are distracting and only suitable for games. Fear of change and attachment to habits and old things are symptoms of ageing. I can understand that. :t
But on this thread's line lets read David H. Bailey and see what he thinks (https://www.davidhbailey.com/dhbpapers/highprecarith.pdf).

I had a boss who used to claim that black-and-white Hercules graphics cards would become the standard for professionals: color monitors are distracting and only suitable for games. Fear of change and attachment to habits and old things are symptoms of ageing. I can understand that. :t
But on this thread's line lets read David H. Bailey and see what he thinks (https://www.davidhbailey.com/dhbpapers/highprecarith.pdf).
Very interesting article. But most of the quoted examples generally require bignum libraries. Those examples also indicate that requirements for greater precision are primarily for a micro-niche of applications in advanced research; very little of it, if any, is related to common activities.
Most probably fewer than one in a million programmers would ever have a real need for such precision (and they may also prefer using a HLL. :()

There is (as nidud has already noted) a relatively new 16-bit floating-point format:
sign bit: 1
exponent: 5 bits
mantissa: 10 bits
This format is used in several computer graphics environments (such as OpenGL and DirectX).
The advantage over the 32-bit single-precision binary format is that it requires half the storage and bandwidth (at the expense of precision and range).
F16C instruction set,
Given hardware support for half precision is now available this should probably be added to the assembler as REAL2.

REAL2 typedef WORD
Visual Studio uses:
typedef uint16_t HALF

It seems we all agree on the real2 type definition. :t
I'm using real2 in my code, to be consistent with real4, real8 and real10.
In DirectX:
D3DXFLOAT16 typedef word
//===========================================================================
//
// 16 bit floating point numbers
//
//===========================================================================
#define D3DX_16F_DIG 3 // # of decimal digits of precision
#define D3DX_16F_EPSILON 4.8875809e-4f // smallest such that 1.0 + epsilon != 1.0
#define D3DX_16F_MANT_DIG 11 // # of bits in mantissa
#define D3DX_16F_MAX 6.550400e+004 // max value
#define D3DX_16F_MAX_10_EXP 4 // max decimal exponent
#define D3DX_16F_MAX_EXP 15 // max binary exponent
#define D3DX_16F_MIN 6.1035156e-5f // min positive value
#define D3DX_16F_MIN_10_EXP (-4) // min decimal exponent
#define D3DX_16F_MIN_EXP (-14) // min binary exponent
#define D3DX_16F_RADIX 2 // exponent radix
#define D3DX_16F_ROUNDS 1 // addition rounding: near

There is (as nidud has already noted) a relatively new 16-bit floating-point format:
sign bit: 1
exponent: 5 bits
mantissa: 10 bits
This format is used in several computer graphics environments (such as OpenGL and DirectX).
The advantage over the 32-bit single-precision binary format is that it requires half the storage and bandwidth (at the expense of precision and range).
Aren't 16-bit floats old? I already saw them in use many years ago when I had Nvidia's CG; the advice was to use HALFs to double performance in pixel shaders.
So if you don't have a CPU that supports them, should you use the GPU instead?
Or some small PROC that expands them to 32-bit floats?