Author Topic: Small favour needed, test precision with sprintf.  (Read 13605 times)


  • Member
  • *****
  • Posts: 2370
Re: Small favour needed, test precision with sprintf.
« Reply #60 on: September 14, 2018, 10:09:11 PM »
It seems we all agree on the real2 type definition.  :t
I'm using real2 in my code, to be consistent with real4, real8 and real10.

In DirectX:

Code:
D3DXFLOAT16 typedef word

// 16 bit floating point numbers

#define D3DX_16F_DIG          3                // # of decimal digits of precision
#define D3DX_16F_EPSILON      4.8875809e-4f    // smallest such that 1.0 + epsilon != 1.0
#define D3DX_16F_MANT_DIG     11               // # of bits in mantissa
#define D3DX_16F_MAX          6.550400e+004    // max value
#define D3DX_16F_MAX_10_EXP   4                // max decimal exponent
#define D3DX_16F_MAX_EXP      15               // max binary exponent
#define D3DX_16F_MIN          6.1035156e-5f    // min positive value
#define D3DX_16F_MIN_10_EXP   (-4)             // min decimal exponent
#define D3DX_16F_MIN_EXP      (-14)            // min binary exponent
#define D3DX_16F_RADIX        2                // exponent radix
#define D3DX_16F_ROUNDS       1                // addition rounding: near
Creative coders use backward thinking techniques as a strategy.


  • Member
  • *****
  • Posts: 1746
  • building nextdoor
Re: Small favour needed, test precision with sprintf.
« Reply #61 on: September 14, 2018, 11:54:34 PM »
There is (as nidud has already noted) a relatively new 16-bit floating-point format.

sign bit: 1
exponent: 5 bits
mantissa: 10 bits

This format is used in several computer graphics environments (such as OpenGL and DirectX).
The advantage over the 32-bit single-precision binary format is that it requires half the storage and bandwidth (at the expense of precision and range).
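
For reference, a minimal sketch of how the 1/5/10 bit layout quoted above maps to a value. It assumes the format is IEEE 754 binary16 with an exponent bias of 15 (which matches the D3DX_16F_MAX and D3DX_16F_MIN constants posted earlier); half_to_double is a hypothetical helper name, not part of DirectX.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: decode a 16-bit float bit pattern into a double.
   Assumes IEEE 754 binary16: 1 sign bit, 5 exponent bits (bias 15),
   10 stored mantissa bits. */
double half_to_double(uint16_t h)
{
    int sign     = (h >> 15) & 0x1;   /* 1 sign bit */
    int exponent = (h >> 10) & 0x1F;  /* 5 exponent bits */
    int mantissa = h & 0x3FF;         /* 10 mantissa bits */
    double value;

    if (exponent == 0x1F)             /* all ones: infinity or NaN */
        value = mantissa ? NAN : INFINITY;
    else if (exponent == 0)           /* zero or subnormal: no implicit 1 */
        value = ldexp((double)mantissa, -24);
    else                              /* normal: implicit leading 1 */
        value = ldexp((double)(mantissa | 0x400), exponent - 25);

    return sign ? -value : value;
}
```

With this reading, the bit pattern 0x7BFF (largest finite value) decodes to 65504 and 0x0400 (smallest normal value) decodes to 2^-14, about 6.1035156e-5.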
Aren't 16-bit floats old? I already saw them in use many years ago when I had Nvidia's CG+; the advice given was to use HALFs to double performance in pixel shaders.
So if you don't have a CPU that supports them, should you use the GPU instead?
Or some small PROC that expands them to 32-bit floats?
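
A rough sketch of what such an expansion routine could look like, written in C rather than as a MASM PROC so the bit handling is easy to follow. It assumes IEEE 754 binary16 input and that float is IEEE 754 binary32; half_to_float is a hypothetical name, and this is only one possible way to do it using integer operations.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: expand a 16-bit float bit pattern to a 32-bit float.
   Assumes binary16 input and that `float` is IEEE 754 binary32. */
float half_to_float(uint16_t h)
{
    uint32_t sign     = (uint32_t)(h & 0x8000u) << 16; /* move sign to bit 31 */
    uint32_t exponent = (h >> 10) & 0x1Fu;
    uint32_t mantissa = h & 0x3FFu;
    uint32_t bits;
    float result;

    if (exponent == 0x1Fu) {
        /* infinity or NaN: all-ones exponent, keep the mantissa payload */
        bits = sign | 0x7F800000u | (mantissa << 13);
    } else if (exponent != 0) {
        /* normal: rebias exponent from 15 to 127, widen mantissa 10 -> 23 bits */
        bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
    } else if (mantissa != 0) {
        /* subnormal: shift left until the implicit 1 appears at bit 10 */
        exponent = 113u;
        while ((mantissa & 0x400u) == 0) {
            mantissa <<= 1;
            exponent--;
        }
        bits = sign | (exponent << 23) | ((mantissa & 0x3FFu) << 13);
    } else {
        /* signed zero */
        bits = sign;
    }

    memcpy(&result, &bits, sizeof result); /* reinterpret bits as float */
    return result;
}
```

Every 16-bit float is exactly representable as a 32-bit float, so the expansion needs no rounding; only the reverse direction (32-bit to 16-bit) has to round.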
SIMD fan and macro fan
why assembly is fastest is because its switch has no (brakes) breaks
only in 16bit assembly you can get away with "Only words" :P