Topic: High-accuracy floating point arithmetic (Win32)

Gunther

Re: High-accuracy floating point arithmetic (Win32)
« Reply #15 on: January 08, 2013, 06:08:33 AM »
Hi Edgar,

Quote
I don't even bother with anything before XP, and even that is mostly because I would otherwise have to change the target OS, and it's easier to just make stuff XP compatible than to get all of the header messages telling me I've used unsupported APIs. But more and more I'm getting "xxx (Not Available)" when I try to assemble a program with XP as the target OS, so I may end up changing the default to Vista soon. Really have to thank Yuri for convincing me to include API filtering in the header files.

Yes, I think that's the way to go.

Gunther
Get your facts first, and then you can distort them.

FORTRANS

« Reply #16 on: January 08, 2013, 06:54:48 AM »
Quote
i don't think win98 "knows" about SSE
back then, they had the FPU and MMX and that was about it
but - i don't think it cares, either - it isn't going to generate exceptions like NT-based OS's

Hi Dave(s),

   Well, it won't run the FloatFail program. On one machine
Win98 reports an illegal instruction; on the other, an exception
of some sort. Both show the same bytes at EIP.

Quote
i think Steve and Michael have win98 machines up and running - probably Sinsi, as well
i have one, but i'd have to knock 6 inches of dust off to run it - lol
mine has a pentium III MMX

   Yeah, right now I have a P-MMX laptop and a P-II MMX. I also
have a P-Pro, but it declined to boot the last time I tried.

Quote
i gave up trying to support anything older than win 2000
there are just too many features that the earlier OS's didn't have
and too few people still using them

you can figure that most users that have a CPU that supports SSE, also have win 2000 or newer

   True, though the P-III results were weird.

                 XMM result      FPU result      Long Accumulator result
                 ==========      ==========      =======================

Array 1          -1.#J           -1.#J           137.00
Array 2          -1.#J           137.00          137.00
Array 3          -1.#J           136.00          137.00
Array 4          -1.#J           139.00          137.00
Array 5          -1.#J           137.00          137.00
Array 6          -1.#J           134.00          137.00

The right result is 137.00 and nothing else!


   Not sure what to make of that, except it isn't too happy.

Cheers,

Steve N.

Gunther

« Reply #17 on: January 08, 2013, 07:22:52 AM »
Hi Steve,

Quote
   True, though the P-III results were weird.
   Not sure what to make of that, except it isn't too happy.

The results are clear: the P-III doesn't support SSE2. But the first FPU result is a bit strange. Please check fail.asm; it's pure, clean FPU code. I don't know the reason for that behaviour.

Gunther

FORTRANS

« Reply #18 on: January 08, 2013, 08:36:47 AM »
Hi Dave,

   Right. But if it does not support SSE2, how can the
program run at all? Undefined-opcode exceptions have existed
since the 80186 and 80286.

   From your BATch file, it appears you use yasm and gcc.
I do not have those; maybe Michael has them. I can run
things, though I am not too familiar with Windows code.

Regards,

Steve N.

dedndave

« Reply #19 on: January 08, 2013, 10:38:06 AM »
there have been improvements to the FPU instruction set with almost every CPU upgrade
it's a very different animal from the original 8087   :P

Gunther

« Reply #20 on: January 08, 2013, 09:32:04 PM »
Dave,

Quote
there have been improvements to the FPU instruction set with almost every CPU upgrade
it's a very different animal from the original 8087   :P

Yes, of course, but a simple fadd instruction should still work.

Steve,

Quote
From your BATch file, it appears you use yasm and gcc.
I do not have those; maybe Michael has them. I can run
things, though I am not too familiar with Windows code.

I'll try to convert the code into MASM (JWASM) syntax and post it here as an attachment. That should work for you. But I can't do it before next weekend.

Gunther

Antariy

« Reply #21 on: January 09, 2013, 09:18:13 AM »
Hi Gunther :t

Code:

                 XMM result      FPU result      Long Accumulator result
                 ==========      ==========      =======================

Array 1          0.00            136.00          137.00
Array 2          17.00           137.00          137.00
Array 3          120.00          136.00          137.00
Array 4          147.00          139.00          137.00
Array 5          137.00          137.00          137.00
Array 6          -10.00          134.00          137.00

The right result is 137.00 and nothing else!


This is XP SP2

Win98 SE AFAIK supports SSE1/SSE2

Gunther

« Reply #22 on: January 10, 2013, 04:44:02 AM »
Hi Alex,

Quote
This is XP SP2

Win98 SE AFAIK supports SSE1/SSE2

Thank you for testing. That's new information for me, but it seems plausible.

Gunther

MichaelW

« Reply #23 on: January 10, 2013, 04:34:25 PM »
Running on my Windows 2000 P3 system:
Code:
                 XMM result      FPU result      Long Accumulator result
                 ==========      ==========      =======================

Array 1          -1.#J           -1.#J           137.00
Array 2          -1.#J           137.00          137.00
Array 3          -1.#J           136.00          137.00
Array 4          -1.#J           139.00          137.00
Array 5          -1.#J           137.00          137.00
Array 6          -1.#J           134.00          137.00

I recall testing other code on this system that used SSE2 instructions; it returned an incorrect result but did not trigger an exception.
Well Microsoft, here’s another nice mess you’ve gotten us into.

MichaelW

« Reply #24 on: January 10, 2013, 05:52:23 PM »
Compiling with the VC++ Toolkit 2003 compiler, I get this output on my P3:
Code:
                 XMM result      FPU result      Long Accumulator result
                 ==========      ==========      =======================

Array 1          -1.#J           0.00            137.00
Array 2          -1.#J           17.00           137.00
Array 3          -1.#J           120.00          137.00
Array 4          -1.#J           147.00          137.00
Array 5          -1.#J           137.00          137.00
Array 6          -1.#J           -10.00          137.00

I get the same FPU results on my P4, but looking at the compiled code I can’t see the problem.
Code:
; Listing generated by Microsoft (R) Optimizing Compiler Version 13.10.3077

TITLE floatfail.c
.386P
include listing.inc
if @Version gt 510
.model FLAT
else
_TEXT SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT ENDS
_DATA SEGMENT DWORD USE32 PUBLIC 'DATA'
_DATA ENDS
CONST SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST ENDS
_BSS SEGMENT DWORD USE32 PUBLIC 'BSS'
_BSS ENDS
$$SYMBOLS SEGMENT BYTE USE32 'DEBSYM'
$$SYMBOLS ENDS
_TLS SEGMENT DWORD USE32 PUBLIC 'TLS'
_TLS ENDS
FLAT GROUP _DATA, CONST, _BSS
ASSUME CS: FLAT, DS: FLAT, SS: FLAT
endif

INCLUDELIB LIBC
INCLUDELIB OLDNAMES

PUBLIC _N
CONST SEGMENT
_N DD 05H
CONST ENDS
_DATA SEGMENT
$SG1186 DB 0aH, 09H, 09H, ' XMM result', 09H, ' FPU result', 09H, ' '
DB 'Long Accumulator result', 0aH, 00H
ORG $+3
$SG1187 DB 09H, 09H, ' ==========', 09H, ' ==========', 09H, ' ====='
DB '==================', 0aH, 0aH, 00H
ORG $+3
$SG1188 DB 'Array 1', 09H, 09H, ' %.2f', 09H, 09H, ' %.2f', 09H, 09H
DB ' %.2f', 0aH, 00H
ORG $+2
$SG1189 DB 'Array 2', 09H, 09H, ' %.2f', 09H, 09H, ' %.2f', 09H, 09H
DB ' %.2f', 0aH, 00H
ORG $+2
$SG1190 DB 'Array 3', 09H, 09H, ' %.2f', 09H, 09H, ' %.2f', 09H, 09H
DB ' %.2f', 0aH, 00H
ORG $+2
$SG1191 DB 'Array 4', 09H, 09H, ' %.2f', 09H, 09H, ' %.2f', 09H, 09H
DB ' %.2f', 0aH, 00H
ORG $+2
$SG1192 DB 'Array 5', 09H, 09H, ' %.2f', 09H, 09H, ' %.2f', 09H, 09H
DB ' %.2f', 0aH, 00H
ORG $+2
$SG1193 DB 'Array 6', 09H, 09H, ' %.2f', 09H, 09H, ' %.2f', 09H, 09H
DB ' %.2f', 0aH, 00H
ORG $+2
$SG1194 DB 0aH, 'The right result is 137.00 and nothing else!', 0aH, 00H
_DATA ENDS
PUBLIC _main
EXTRN _SumUpFPU:NEAR
EXTRN _SumUpXMM:NEAR
EXTRN _SumUpLAC:NEAR
EXTRN __fltused:NEAR
EXTRN _printf:NEAR
; Function compile flags: /Odt
_TEXT SEGMENT
_sum2FPU$ = -192 ; size = 4
_V3$ = -188 ; size = 20
_sum4XMM$ = -168 ; size = 4
_sum5FPU$ = -164 ; size = 4
_V1$ = -160 ; size = 20
_sum1LAC$ = -140 ; size = 4
_sum4FPU$ = -136 ; size = 4
_sum1FPU$ = -132 ; size = 4
_sum3XMM$ = -128 ; size = 4
_V5$ = -124 ; size = 20
_sum5XMM$ = -104 ; size = 4
_sum2LAC$ = -100 ; size = 4
_sum6XMM$ = -96 ; size = 4
_V2$ = -92 ; size = 20
_sum1XMM$ = -72 ; size = 4
_sum3LAC$ = -68 ; size = 4
_sum3FPU$ = -64 ; size = 4
_sum6LAC$ = -60 ; size = 4
_V4$ = -56 ; size = 20
_V6$ = -36 ; size = 20
_sum5LAC$ = -16 ; size = 4
_sum2XMM$ = -12 ; size = 4
_sum4LAC$ = -8 ; size = 4
_sum6FPU$ = -4 ; size = 4
_argc$ = 8 ; size = 4
_argv$ = 12 ; size = 4
_main PROC NEAR
; File d:\downloads\gunther\floatfail.c
; Line 11
push ebp
mov ebp, esp
sub esp, 192 ; 000000c0H
; Line 17
mov DWORD PTR _V1$[ebp], 1621981420 ; 60ad78ecH
mov DWORD PTR _V1$[ebp+4], 1099431936 ; 41880000H
mov DWORD PTR _V1$[ebp+8], -1054867456 ; c1200000H
mov DWORD PTR _V1$[ebp+12], 1124204544 ; 43020000H
mov DWORD PTR _V1$[ebp+16], -525502228 ; e0ad78ecH
; Line 18
mov DWORD PTR _V2$[ebp], 1621981420 ; 60ad78ecH
mov DWORD PTR _V2$[ebp+4], -1054867456 ; c1200000H
mov DWORD PTR _V2$[ebp+8], 1124204544 ; 43020000H
mov DWORD PTR _V2$[ebp+12], -525502228 ; e0ad78ecH
mov DWORD PTR _V2$[ebp+16], 1099431936 ; 41880000H
; Line 19
mov DWORD PTR _V3$[ebp], 1621981420 ; 60ad78ecH
mov DWORD PTR _V3$[ebp+4], 1099431936 ; 41880000H
mov DWORD PTR _V3$[ebp+8], -525502228 ; e0ad78ecH
mov DWORD PTR _V3$[ebp+12], -1054867456 ; c1200000H
mov DWORD PTR _V3$[ebp+16], 1124204544 ; 43020000H
; Line 20
mov DWORD PTR _V4$[ebp], 1621981420 ; 60ad78ecH
mov DWORD PTR _V4$[ebp+4], -1054867456 ; c1200000H
mov DWORD PTR _V4$[ebp+8], -525502228 ; e0ad78ecH
mov DWORD PTR _V4$[ebp+12], 1124204544 ; 43020000H
mov DWORD PTR _V4$[ebp+16], 1099431936 ; 41880000H
; Line 21
mov DWORD PTR _V5$[ebp], 1621981420 ; 60ad78ecH
mov DWORD PTR _V5$[ebp+4], -525502228 ; e0ad78ecH
mov DWORD PTR _V5$[ebp+8], 1099431936 ; 41880000H
mov DWORD PTR _V5$[ebp+12], -1054867456 ; c1200000H
mov DWORD PTR _V5$[ebp+16], 1124204544 ; 43020000H
; Line 22
mov DWORD PTR _V6$[ebp], 1621981420 ; 60ad78ecH
mov DWORD PTR _V6$[ebp+4], 1099431936 ; 41880000H
mov DWORD PTR _V6$[ebp+8], 1124204544 ; 43020000H
mov DWORD PTR _V6$[ebp+12], -525502228 ; e0ad78ecH
mov DWORD PTR _V6$[ebp+16], -1054867456 ; c1200000H
; Line 35
mov eax, DWORD PTR _N
push eax
lea ecx, DWORD PTR _V1$[ebp]
push ecx
call _SumUpFPU
add esp, 8
fstp DWORD PTR _sum1FPU$[ebp]
; Line 36
mov edx, DWORD PTR _N
push edx
lea eax, DWORD PTR _V1$[ebp]
push eax
call _SumUpXMM
add esp, 8
fstp DWORD PTR _sum1XMM$[ebp]
; Line 37
mov ecx, DWORD PTR _N
push ecx
lea edx, DWORD PTR _V1$[ebp]
push edx
call _SumUpLAC
add esp, 8
fstp DWORD PTR _sum1LAC$[ebp]
; Line 38
mov eax, DWORD PTR _N
push eax
lea ecx, DWORD PTR _V2$[ebp]
push ecx
call _SumUpFPU
add esp, 8
fstp DWORD PTR _sum2FPU$[ebp]
; Line 39
mov edx, DWORD PTR _N
push edx
lea eax, DWORD PTR _V2$[ebp]
push eax
call _SumUpXMM
add esp, 8
fstp DWORD PTR _sum2XMM$[ebp]
; Line 40
mov ecx, DWORD PTR _N
push ecx
lea edx, DWORD PTR _V2$[ebp]
push edx
call _SumUpLAC
add esp, 8
fstp DWORD PTR _sum2LAC$[ebp]
; Line 41
mov eax, DWORD PTR _N
push eax
lea ecx, DWORD PTR _V3$[ebp]
push ecx
call _SumUpFPU
add esp, 8
fstp DWORD PTR _sum3FPU$[ebp]
; Line 42
mov edx, DWORD PTR _N
push edx
lea eax, DWORD PTR _V3$[ebp]
push eax
call _SumUpXMM
add esp, 8
fstp DWORD PTR _sum3XMM$[ebp]
; Line 43
mov ecx, DWORD PTR _N
push ecx
lea edx, DWORD PTR _V3$[ebp]
push edx
call _SumUpLAC
add esp, 8
fstp DWORD PTR _sum3LAC$[ebp]
; Line 44
mov eax, DWORD PTR _N
push eax
lea ecx, DWORD PTR _V4$[ebp]
push ecx
call _SumUpFPU
add esp, 8
fstp DWORD PTR _sum4FPU$[ebp]
; Line 45
mov edx, DWORD PTR _N
push edx
lea eax, DWORD PTR _V4$[ebp]
push eax
call _SumUpXMM
add esp, 8
fstp DWORD PTR _sum4XMM$[ebp]
; Line 46
mov ecx, DWORD PTR _N
push ecx
lea edx, DWORD PTR _V4$[ebp]
push edx
call _SumUpLAC
add esp, 8
fstp DWORD PTR _sum4LAC$[ebp]
; Line 47
mov eax, DWORD PTR _N
push eax
lea ecx, DWORD PTR _V5$[ebp]
push ecx
call _SumUpFPU
add esp, 8
fstp DWORD PTR _sum5FPU$[ebp]
; Line 48
mov edx, DWORD PTR _N
push edx
lea eax, DWORD PTR _V5$[ebp]
push eax
call _SumUpXMM
add esp, 8
fstp DWORD PTR _sum5XMM$[ebp]
; Line 49
mov ecx, DWORD PTR _N
push ecx
lea edx, DWORD PTR _V5$[ebp]
push edx
call _SumUpLAC
add esp, 8
fstp DWORD PTR _sum5LAC$[ebp]
; Line 50
mov eax, DWORD PTR _N
push eax
lea ecx, DWORD PTR _V6$[ebp]
push ecx
call _SumUpFPU
add esp, 8
fstp DWORD PTR _sum6FPU$[ebp]
; Line 51
mov edx, DWORD PTR _N
push edx
lea eax, DWORD PTR _V6$[ebp]
push eax
call _SumUpXMM
add esp, 8
fstp DWORD PTR _sum6XMM$[ebp]
; Line 52
mov ecx, DWORD PTR _N
push ecx
lea edx, DWORD PTR _V6$[ebp]
push edx
call _SumUpLAC
add esp, 8
fstp DWORD PTR _sum6LAC$[ebp]
; Line 56
push OFFSET FLAT:$SG1186
call _printf
add esp, 4
; Line 57
push OFFSET FLAT:$SG1187
call _printf
add esp, 4
; Line 58
fld DWORD PTR _sum1LAC$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
fld DWORD PTR _sum1FPU$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
fld DWORD PTR _sum1XMM$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
push OFFSET FLAT:$SG1188
call _printf
add esp, 28 ; 0000001cH
; Line 59
fld DWORD PTR _sum2LAC$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
fld DWORD PTR _sum2FPU$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
fld DWORD PTR _sum2XMM$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
push OFFSET FLAT:$SG1189
call _printf
add esp, 28 ; 0000001cH
; Line 60
fld DWORD PTR _sum3LAC$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
fld DWORD PTR _sum3FPU$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
fld DWORD PTR _sum3XMM$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
push OFFSET FLAT:$SG1190
call _printf
add esp, 28 ; 0000001cH
; Line 61
fld DWORD PTR _sum4LAC$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
fld DWORD PTR _sum4FPU$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
fld DWORD PTR _sum4XMM$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
push OFFSET FLAT:$SG1191
call _printf
add esp, 28 ; 0000001cH
; Line 62
fld DWORD PTR _sum5LAC$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
fld DWORD PTR _sum5FPU$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
fld DWORD PTR _sum5XMM$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
push OFFSET FLAT:$SG1192
call _printf
add esp, 28 ; 0000001cH
; Line 63
fld DWORD PTR _sum6LAC$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
fld DWORD PTR _sum6FPU$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
fld DWORD PTR _sum6XMM$[ebp]
sub esp, 8
fstp QWORD PTR [esp]
push OFFSET FLAT:$SG1193
call _printf
add esp, 28 ; 0000001cH
; Line 64
push OFFSET FLAT:$SG1194
call _printf
add esp, 4
; Line 65
xor eax, eax
; Line 66
mov esp, ebp
pop ebp
ret 0
_main ENDP
_TEXT ENDS
END

Gunther

« Reply #25 on: January 11, 2013, 07:17:45 AM »
Hi Michael,

Quote
Compiling with the VC++ Toolkit 2003 compiler, I get this output on my P3:
Code:
                 XMM result      FPU result      Long Accumulator result
                 ==========      ==========      =======================

Array 1          -1.#J           0.00            137.00
Array 2          -1.#J           17.00           137.00
Array 3          -1.#J           120.00          137.00
Array 4          -1.#J           147.00          137.00
Array 5          -1.#J           137.00          137.00
Array 6          -1.#J           -10.00          137.00

At least the VC++ Toolkit 2003 compiler calculates the sum for Array 1 correctly.

Quote
And the same FPU results on my P4. But looking at the compiled code I can’t see the problem.

I don't think there is a problem with the code. But then what is the problem?

Gunther