Checking floating point numbers in the 64-bit kernel-mode

Started by Gunther, December 29, 2014, 02:54:41 PM


Gunther

For a 64-bit embedded system running Windows 8.1, I have to write a special kernel-mode driver to support PSTricks with several packages. The advantage is that PSTricks supports floating point numbers and the corresponding calculations.

My driver has to check whether the floating point value X is finite (X can be REAL4, REAL8 or REAL10). Under usual circumstances one would use the fxam instruction, which is described in Raymond's excellent tutorial. But this is impossible:
Quote: The use of the MMX/x87 registers is strictly prohibited in 64-bit kernel-mode code.
More information can be found here. As a workaround I'm calling isfinite(X) from libc. But that's awkward and slow, because this macro (in C) or function (in C++) is not particularly fast. Does anyone know a more elegant and faster solution? Any help is appreciated.

Gunther
You have to know the facts before you can distort them.

qWord

Because both Infinity and NaN are encoded with all exponent bits set, it is sufficient to test whether all exponent bits are set:
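#include <stdint.h>

/* Each test returns 1 if x is finite, 0 if x is an infinity or a NaN. */

/* binary32 (REAL4): 8 exponent bits */
int isfinite_bin32(const float x)
{
   return ((*((const uint32_t *)&x) & 0x7F800000U) != 0x7F800000U) ? 1 : 0;
}

/* binary64 (REAL8): 11 exponent bits */
int isfinite_bin64(const double x)
{
   return ((*((const uint64_t *)&x) & 0x7FF0000000000000ULL) != 0x7FF0000000000000ULL)  ? 1 : 0;
}

/* x87 extended (REAL10): 15 exponent bits in the fifth 16-bit word.
   Assumes long double is the 80-bit format, which depends on the compiler. */
int isfinite_bin80(const long double x)
{
   return ((((const uint16_t *)&x)[4] & 0x7FFFU) != 0x7FFFU)  ? 1 : 0;
}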
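
The pointer casts above read a float through an integer pointer, which some compilers treat as an aliasing violation when optimizing. A minimal sketch of the same exponent test that copies the bit pattern with memcpy instead is shown here. The function names are hypothetical and not part of the original post; the masks are the ones used above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names, for illustration only. */

/* Exponent test on a raw binary32 bit pattern: finite unless all 8 exponent bits are set. */
int isfinite_bits32(uint32_t bits)
{
    return (bits & 0x7F800000U) != 0x7F800000U;
}

/* Exponent test on a raw binary64 bit pattern: finite unless all 11 exponent bits are set. */
int isfinite_bits64(uint64_t bits)
{
    return (bits & 0x7FF0000000000000ULL) != 0x7FF0000000000000ULL;
}

/* Copy the float's bit pattern into an integer, then test it. */
int isfinite_f32(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return isfinite_bits32(bits);
}

/* Same for double. */
int isfinite_f64(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return isfinite_bits64(bits);
}
```

Compilers usually turn a fixed-size memcpy like this into a plain register move, so the sketch should not cost more than the cast version, but that is worth checking in the generated code.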

MREAL macros - when you need floating point arithmetic while assembling!

Gunther

Hi qWord,

thank you for the fast and qualified answer. Your solution seems elegant and hopefully fast. I'll try it right away.

Gunther

Gunther

The kernel-mode driver is nearly finished (95% to be honest). The main part is written in C, but I've implemented some time-critical parts with the inline assembler. In particular, the finite value test is faster by a factor of 2 compared with the traditional C macros. qWord's answer gave me the right idea. Thank you again.

At the moment I'm fighting with the REAL10 values to complete the remaining 5%. If that's done, I'll prepare a small test bed and post it inside the 64-bit forum. Maybe that will help other forum members, too.

Gunther 

dedndave

real conversion is interesting to me
wish i had more time to play with it
even so, i would be writing 32-bit code, and you'd have to convert it - lol

Gunther

Dave,

Quote from: dedndave on January 02, 2015, 03:49:28 PM
real conversion is interesting to me

it is not a question of number conversion. That's done in the user layer by PSTricks. My driver only has to check the validity of the numbers (not infinity, not a NaN). Of course, the driver has to do a lot of other things, but these are not so complicated.

Quote from: dedndave on January 02, 2015, 03:49:28 PM
even so, i would be writing 32-bit code, and you'd have to convert it - lol

That would be nice. But PSTricks files will be compiled by LaTeX, which is running in 64-bit long mode on that embedded system. I don't know if 32-bit drivers work properly under those circumstances. Therefore, I've written all the stuff as a native 64-bit kernel-mode driver. That works fine and the results are very good. I hope to finish my work this weekend.

Gunther

dedndave

oh - well, that part isn't too difficult
maybe these tables will help...

;REAL4/REAL8
;Exponent   Fraction   Meaning
;
;   0          0       Signed Zero
;   0        <> 0      Signed Denormal
; Other       Any      Signed Normal
;All 1's       0       Signed Infinity
;All 1's     <> 0      NaN
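
As a rough illustration, the REAL4/REAL8 table can be turned into a classifier that works on the raw bit pattern only. This is a sketch, not code from the thread; the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum fp_cls { CLS_ZERO, CLS_DENORMAL, CLS_NORMAL, CLS_INFINITY, CLS_NAN };

/* Classify a REAL4 bit pattern: 1 sign bit, 8 exponent bits, 23 fraction bits. */
enum fp_cls classify_real4(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFU;
    uint32_t fraction = bits & 0x007FFFFFU;

    if (exponent == 0)
        return fraction == 0 ? CLS_ZERO : CLS_DENORMAL;
    if (exponent == 0xFFU)                      /* all 1's */
        return fraction == 0 ? CLS_INFINITY : CLS_NAN;
    return CLS_NORMAL;
}

/* Classify a REAL8 bit pattern: 1 sign bit, 11 exponent bits, 52 fraction bits. */
enum fp_cls classify_real8(uint64_t bits)
{
    uint64_t exponent = (bits >> 52) & 0x7FFU;
    uint64_t fraction = bits & 0x000FFFFFFFFFFFFFULL;

    if (exponent == 0)
        return fraction == 0 ? CLS_ZERO : CLS_DENORMAL;
    if (exponent == 0x7FFU)                     /* all 1's */
        return fraction == 0 ? CLS_INFINITY : CLS_NAN;
    return CLS_NORMAL;
}
```

The sign bit is ignored on purpose, since every row of the table applies to both signs.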


;REAL10
;  Exponent                     Significand                  Meaning
;
;    0000       0000_0000_0000_0000                          Signed Zero
;    0000       0000_0000_0000_0001 to 7FFF_FFFF_FFFF_FFFF   Signed Denormal
;    0000       8000_0000_0000_0000 to FFFF_FFFF_FFFF_FFFF   Signed Pseudo-Denormal
;
;0001 to 7FFE   0000_0000_0000_0000 to 7FFF_FFFF_FFFF_FFFF   Invalid
;0001 to 7FFE   8000_0000_0000_0000 to FFFF_FFFF_FFFF_FFFF   Signed Normal
;
;    7FFF       0000_0000_0000_0000 to 7FFF_FFFF_FFFF_FFFF   Invalid
;    7FFF       8000_0000_0000_0000                          Signed Infinity
;    7FFF       8000_0000_0000_0001 to BFFF_FFFF_FFFF_FFFF   Signaling NaN
;    7FFF       C000_0000_0000_0000                          Indefinite Quiet NaN
;    7FFF       C000_0000_0000_0001 to FFFF_FFFF_FFFF_FFFF   Quiet NaN


notice that some of them are different for 8087 vs modern FPU's   :P
also, the 8087 had a bit in the control word to handle infinity
affine or projective (unsigned infinity) - modern FPU's always use affine infinity

Gunther

Dave,

no, that part isn't difficult. Thank you for the tables. The trick is that it must be done without the traditional FPU, because that's not allowed in 64-bit kernel-mode drivers. I've described that problem in the first post of this thread. With qWord's help I've found a good way to implement it, and I can do it branchless, which makes it fast.
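
The driver code itself is not shown in the thread, so the following is only one possible way to write a branchless finite test, using nothing but mask, subtract and shift on the raw bit pattern. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only.
   The masked exponent field is never larger than the mask itself, so
   (field - mask) wraps around and sets the top bit exactly when the
   field is not all ones. Shifting the top bit down gives 1 or 0. */

/* 1 if the REAL4 bit pattern is finite, 0 for infinity or NaN. */
uint32_t isfinite_branchless32(uint32_t bits)
{
    uint32_t field = bits & 0x7F800000U;
    return (field - 0x7F800000U) >> 31;
}

/* 1 if the REAL8 bit pattern is finite, 0 for infinity or NaN. */
uint64_t isfinite_branchless64(uint64_t bits)
{
    uint64_t field = bits & 0x7FF0000000000000ULL;
    return (field - 0x7FF0000000000000ULL) >> 63;
}
```

Whether this beats the comparison form depends on the compiler, since a comparison that only produces 0 or 1 is often compiled without a jump anyway.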

Gunther

dedndave

exactly - branching is fairly simple

for REAL4/8, all values are valid

for REAL10,
if the exponent bits are all 0, all values are valid (though, some are not generated by 387 or later)
if the exponent bits are non-0, the high bit of the fraction tells you if it's valid or not
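
The two REAL10 rules above can be sketched as follows. The value is passed as two integers, the top 16 bits (sign plus 15-bit exponent) and the low 64 bits (significand), so no x87 register is touched. The function and parameter names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only.
   sign_exp:    top 16 bits of the REAL10 (sign bit + 15 exponent bits)
   significand: low 64 bits of the REAL10 (bit 63 is the explicit leading bit) */

/* 1 if the encoding is valid: exponent all 0, or the high significand bit set. */
int real10_is_valid(uint16_t sign_exp, uint64_t significand)
{
    uint16_t exponent = sign_exp & 0x7FFFU;
    int high_bit = (int)(significand >> 63);
    return exponent == 0 || high_bit;
}

/* 1 if the encoding is valid and the exponent is not all 1's
   (so not an infinity and not a NaN). */
int real10_is_finite(uint16_t sign_exp, uint64_t significand)
{
    return real10_is_valid(sign_exp, significand)
        && (sign_exp & 0x7FFFU) != 0x7FFFU;
}
```

Treating invalid encodings as not finite is a design choice made for this sketch; a driver might prefer to report them separately.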

if you want to see how the 8087/287 are different from 387 and later....

http://en.wikipedia.org/wiki/Extended_precision

there is a table about half-way down the page that explains it pretty well
i know you don't care about the older processors
but, understanding is important because some values are generated and interpreted differently
i.e., even though modern FPU's don't generate certain values, they must be correctly interpreted

Gunther

Dave,

it seems that you've done a lot of research about floating point numbers. Good to know. Have you ever handled the new 128-bit floating point format? The compiler support is lousy and there's not enough hardware around here to support the new data type.

Gunther

dedndave

i wasn't aware of any "new 128-bit floating point format" for windows-based machines
are you referring to quad IEEE floats ?
or are you referring to 128-bit SIMD operations that handle multiple singles/doubles ?

Gunther

Dave,

Quote from: dedndave on January 03, 2015, 05:25:23 AM
i wasn't aware of any "new 128-bit floating point format" for windows-based machines
are you referring to quad IEEE floats ?
or are you referring to 128-bit SIMD operations that handle multiple singles/doubles ?

no, I'm writing about the quadruple-precision format. PowerPC and SPARC machines support this new format.

Gunther

dedndave

well, i've never worked with them
but, i like that format better than the 80-bit extended
nearly half the possible values are invalid - lol (may as well be called a 79-bit format)
still - the best format when working with the x86 FPU, though

makes no sense to me - specify a leading 1, then waste that bit of resolution

qWord

Actually, IEEE Standard 754-2008 explicitly defines:

  • 4 Binary interchange formats: binary16, binary32, binary64 and binary128
  • 3 decimal interchange formats: decimal32, decimal64 and decimal128
Wider interchange formats are also defined for widths that are multiples of 32 bits (k*32 | k≥4):

  • binary160, binary192, ...
  • decimal160, decimal192, ...

... just as side note

Gunther

Hi Dave,

Quote from: dedndave on January 03, 2015, 06:07:44 AM
well, i've never worked with them
but, i like that format better than the 80-bit extended

the REAL10 format is certainly not ideal, but it's the format with the highest precision that is supported by the hardware.

The binary128 format is supported (emulated in software) as __float128 by gcc, while some PowerPC and SPARC machines support the new format in hardware.
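
The exponent test from earlier in the thread carries over to binary128, which has 1 sign bit, 15 exponent bits and 112 fraction bits. A minimal sketch that looks only at the upper 64 bits of the value is shown here. The function name is hypothetical, and how the upper half is obtained from a __float128 depends on the platform's byte order.

```c
#include <stdint.h>

/* Hypothetical name, for illustration only.
   hi: the upper 64 bits of a binary128 value
       (bit 63 = sign, bits 62..48 = exponent, bits 47..0 = top of the fraction). */

/* 1 if the value is finite, 0 if the 15 exponent bits are all set (infinity or NaN). */
int isfinite_bin128_hi(uint64_t hi)
{
    return (hi & 0x7FFF000000000000ULL) != 0x7FFF000000000000ULL;
}
```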

Gunther 