FPU funnies - One for Raymond :-)

Started by K_F, April 05, 2016, 05:17:53 PM


K_F

I've come across an interesting 'problem'..

If I pre-load the barrel (the FPU register stack) outside the loop and run the loop with a single load, thinking it'll make it faster, the first calculation gives the correct result, but from the 2nd calculation onwards.. it's garbage


        fild    DWORD PTR [esi].r_Rect.i_Top     ; [top]
        fild    DWORD PTR [esi].r_Rect.i_Height  ; [hgt][top]
        .

Loopcode:
        fld     QWORD PTR [edi]                  ; [data][hgt][top]
        fmul    ST(0), ST(1)                     ; [data*hgt][hgt][top]
        fadd    ST(0), ST(2)                     ; [data*hgt+top][hgt][top]
        fistp   Y2                               ; [hgt][top]
        blah
        blah
        blah
        loop    loopcode


BUT... it works fine with this code, where I load from memory within the loop

Loopcode:
        fld     QWORD PTR [edi]                  ; [data]
        fimul   DWORD PTR [esi].r_Rect.i_Height  ; [data*hgt]
        fiadd   DWORD PTR [esi].r_Rect.i_Top     ; [data*hgt+top]
        fistp   Y2                               ; FPU stack empty again
        blah
        blah
        blah
        loop    loopcode

These things have been sent to test me - I know it??  :biggrin:

I'm not sure if I missed something.. but have narrowed it down to this lot
The FPU is saved just before this loop and restored afterwards..
I have an Intel Core i7 CPU
:icon_eek:
'Sire, Sire!... the peasants are Revolting !!!'
'Yes, they are.. aren't they....'

K_F

Let me plonk the whole procedure down.. might make it easier..



;------------------------------------------------------------------------------
;
;------------------------------------------------------------------------------
GDI_CreatePathData Proc USES ebx ecx edx esi edi \
ptr_GraphStruct:Dword

LOCAL GdiObj:Dword
LOCAL GdiPen:Dword

Local X1:Dword ; Line Co-ordinates
Local Y1:Dword ;
Local X2:Dword ;
Local Y2:Dword ;
Local xMax:Dword ; Array size
Local yMax:Dword ; Bottom of display area rectangle
local xInc:Dword ;
local ptrMax:Dword ;


ASSUME ESI:PTR GraphStruct

mov esi, ptr_GraphStruct ; ESI = ptr_GraphStruct
mov eax, [esi].h_GDI_Obj
mov GdiObj, eax
mov eax, [esi].h_Pen
mov GdiPen, eax

;--- FPU STATE SAVE ------------------------------------
fsave FFT_FPU_Save2 ;

;--- SET GRAPH PIXEL LIMITS ----------------------------
mov eax, [esi].r_Rect.i_Left ; start point
mov X2, eax ; X2 = left limit
mov eax, [esi].r_Rect.i_Bottom ;
mov Y2, eax ; Y2 = Max (bottom)
mov eax, [esi].r_Rect.i_Right ;
mov xMax, eax ;

;--- SET BUFFER LIMITS ---------------------------------
mov ecx, [esi].s_IPData.i_DataCount ; OP buffer size+
shl ecx, 3 ; Buffersize * 8
mov edi, [esi].s_OPData.ptr_Data ; OP buffer ptr
add ecx, edi ; ECX = END OF DATA BUFFER
mov ptrMax, ecx

;--- LOAD GRAPH SCALING MULTIPLIER ---------------------
;--- PROBLEM SECTION -----------------------------------
; fild DWORD PTR [esi].r_Rect.i_Top ; [top]
; fild DWORD PTR [esi].r_Rect.i_Height ; [Hgt][top]
;-------------------------------------------------------

;--- LOAD/CALC START POS AND PIX INCREMENTS ------------
mov edx, [esi].s_OPData.M_ScaleY.i_PixsPerMark ; EDX = x graph increment
mov edx, 1 ;TEST
mov xInc, edx ;TEST

;--- CALC ADJUSTED STARTPOINT --------------------------
mov eax, [esi].s_OPData.M_ScaleY.i_Pos ; EAX = INPUT DATA START POINT
shl eax, 3 ; x8
add edi, eax ; EDI = INPUT DATA POINTER
sub edi, 8 ;
;----------------- MAIN DATA LOOP ----------------------
@@:
;--- SHIFT LAST DATA AND CHECK GRAPH PIXEL LIMIT -------
mov ebx, X2 ; Adjust X2 to next value
add ebx, xInc ; X increment
cmp ebx, xMax ; Check if we're at end of Rectangle
ja @F ; Yep - Exit

mov eax, Y2
mov Y1, eax
mov eax, X2
mov X1, eax
mov X2, ebx ; X2 = new position

;--- CHECK FOR END OF OUTPUT BUFFER --------------------
add edi, 8 ; Check end of buffer
cmp edi, ptrMax ;
jae @F ; Yep.. finished

; DisFP edi
;--- ALL OK... CALCULATE INVERSE POSITION Y ------------
fld QWORD PTR [edi] ; [data]
fimul DWORD PTR [esi].r_Rect.i_Height ; [data*hgt]
fiadd DWORD PTR [esi].r_Rect.i_Top ; [data*hgt+top]
fistp Y2

;--- PROBLEM SECTION -----------------------------------
; fld QWORD PTR [edi] ; [data][hgt][top]
; fmul ST(0), ST(1) ; [data*hgt][hgt][top]
; fadd ST(0), ST(2) ; [data*hgt+top][hgt][top]
; fistp Y2
;-------------------------------------------------------
Invoke GdipDrawLineI, GdiObj, GdiPen, X1, Y1, X2, Y2 ; Drawline
jmp @B ;
@@:
frstor FFT_FPU_Save2 ;
Return TRUE ;

ASSUME ESI: NOTHING
GDI_CreatePathData Endp

qWord

Per calling convention the FPU registers are not saved by called function. Also it is expected that the FPU stack is empty when a function is called.
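
One way to keep the pre-loaded constants and still respect this rule is to split the work into two passes: scale all samples first, with no API calls in between, then empty the FPU stack and draw afterwards. The following is only a minimal sketch under stated assumptions, not the code from this thread: the procedure name ScaleToPixels, its parameters and the separate DWORD output buffer are hypothetical.

```masm
.686
.model flat, stdcall
option casemap:none

.code
; Hypothetical helper: pDst[i] = round(pSrc[i] * iHeight + iTop)
; pSrc points to nCount REAL8 samples, pDst to nCount DWORD slots.
ScaleToPixels proc uses esi edi pSrc:DWORD, pDst:DWORD, nCount:DWORD, iHeight:DWORD, iTop:DWORD
        mov     esi, pSrc
        mov     edi, pDst
        mov     ecx, nCount
        test    ecx, ecx
        jz      done                    ; nothing to do, FPU untouched
        fild    iTop                    ; [top]
        fild    iHeight                 ; [hgt][top]
next:
        fld     QWORD PTR [esi]         ; [data][hgt][top]
        fmul    ST(0), ST(1)            ; [data*hgt][hgt][top]
        fadd    ST(0), ST(2)            ; [data*hgt+top][hgt][top]
        fistp   DWORD PTR [edi]         ; store pixel Y, back to [hgt][top]
        add     esi, 8                  ; next REAL8 sample
        add     edi, 4                  ; next DWORD result
        dec     ecx
        jnz     next                    ; no API call inside this loop
        fstp    ST(0)                   ; [top]
        fstp    ST(0)                   ; FPU stack empty before returning
done:
        ret
ScaleToPixels endp
end
```

The drawing loop can then read the Y values from the buffer and call GdipDrawLineI with nothing live on the FPU stack.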
MREAL macros - when you need floating point arithmetic while assembling!

K_F

Damn.. right in front of my eyes.. thanks :redface:

Didn't even think about the GDI api !!
I'm such a dickhead.. :biggrin:

raymond

Quote: Also it is expected that the FPU stack is empty when a function is called.
I may be wrong but I don't believe that is a necessity.
Quote: Per calling convention the FPU registers are not saved by called function.
That is correct and has been reported on several occasions in the past.
Whenever you assume something, you risk being wrong half the time.
https://masm32.com/masmcode/rayfil/index.html

jj2007

Quote from: qWord on April 05, 2016, 06:41:39 PM
Per calling convention the FPU registers are not saved by called function. Also it is expected that the FPU stack is empty when a function is called.

Actually, MSDN is quite vague about it:
Quote: The MMX and floating-point stack registers (MM0-MM7/ST0-ST7) are preserved across context switches. There is no explicit calling convention for these registers.

On Win7-64, a quick test with console printing and opening a file might lead to the conclusion that "no explicit calling convention" means "the FPU is safe". Nope, it isn't. Using my old Gdi+ template (attached, uses FpuFill), it seems that Gdi+ trashes most FPU registers, and requires at least 4 free ones to work properly. OTOH most Windows messages seem to leave the FPU intact.

raymond

Quote: OTOH most Windows messages seem to leave the FPU intact.

The MessageBox API is well known to trash whatever fpu registers it needs.

jj2007

Quote from: raymond on April 06, 2016, 10:09:37 AM
The MessageBox API is well known to trash whatever fpu registers it needs.

Win7-64:

  deb 4, "before", ST(0), ST(1), ST(2), ST(3), ST(4), ST(5), ST(6)
  invoke MessageBox, 0, str$(eax), chr$("Title"), MB_OK
  deb 4, "after", ST(0), ST(1), ST(2), ST(3), ST(4), ST(5), ST(6)


before
ST(0)           1001.000000000000000
ST(1)           1002.000000000000000
ST(2)           1003.000000000000000
ST(3)           1004.000000000000000
ST(4)           1005.000000000000000
ST(5)           1006.000000000000000
ST(6)           0.0

after
ST(0)           1001.000000000000000
ST(1)           1002.000000000000000
ST(2)           1003.000000000000000
ST(3)           1004.000000000000000
ST(4)           1005.000000000000000
ST(5)           1006.000000000000000
ST(6)           0.0

raymond

Can we believe that MS programmers are slowly improving their code for a better interface with users???   ::)

For example, up to Win 98 inclusive, it used to be that a program would crash if you modified the esi/edi registers before returning to the OS. Starting with WinXP, it didn't matter anymore; no more need to preserve them for the OS and vice versa!!!

Could we eventually expect a similar behavior with the fpu registers?

qWord

According to Agner Fog it is:
Quote: The floating point registers ST(0)-ST(7) need not be saved. The register stack must be emptied before any call or return, except for registers used for return values. The 64-bit Microsoft compiler does not use ST(0)-ST(7).

msdn is not that clear:
Quote ("msdn: Floating Point Coprocessor and Calling Conventions"):
If you are writing assembly routines for the floating point coprocessor,
you must preserve the floating point control word and clean the coprocessor stack unless
you are returning a float or double value (which your function should return in ST(0)).


Saving required values is much simpler for the caller than for the callee, because the caller knows how many values are pushed on the FPU stack. In that context it makes sense that the stack must be empty before calling APIs.
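
To illustrate the caller-side approach, here is a rough sketch of spilling live FPU values to locals before a call and reloading them afterwards. Everything named here is an assumption for illustration: the procedure CallWithSpill, the pfnApi pointer (taken to be a stdcall function without arguments) and the two integer inputs are hypothetical. REAL10 locals are used so that no precision is lost in the round trip.

```masm
.686
.model flat, stdcall
option casemap:none

.code
; Hypothetical example: keep two values alive across an external call
CallWithSpill proc pfnApi:DWORD, iHeight:DWORD, iTop:DWORD
        LOCAL   savedHgt:REAL10
        LOCAL   savedTop:REAL10

        fild    iTop                    ; [top]
        fild    iHeight                 ; [hgt][top]
        ; ... FPU work that needs both values ...

        fstp    savedHgt                ; spill ST(0), leaves [top]
        fstp    savedTop                ; spill ST(0), FPU stack now empty
        call    pfnApi                  ; callee may use all 8 FPU registers
        fld     savedTop                ; [top]
        fld     savedHgt                ; [hgt][top], same order as before

        ; ... more FPU work ...
        fstp    ST(0)                   ; [top]
        fstp    ST(0)                   ; leave the stack empty on return
        ret
CallWithSpill endp
end
```

The caller knows that exactly two registers are in use, so two stores and two loads are enough; a callee could only do the same with a full fsave/frstor.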

Quote from: raymond on April 07, 2016, 02:13:46 AM
For example, up to Win 98 inclusive, it used to be that a program would crash if you modified the esi/edi registers before returning to the OS. Starting with WinXP, it didn't matter anymore; no more need to preserve them for the OS and vice versa!!!
Wrong! You must follow the calling convention. It is just that (for whatever reason) Microsoft decided to preserve the nonvolatile registers for window procedures, and in later versions Windows also wraps calls to the WndProc in a silently-all-eating exception handler. Other callback functions are not affected and must follow the convention: e.g. test the callback of EnumWindows; it will crash if you destroy ESI, EDI and EBX (at least on Win7).
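
As a small sketch of a callback that follows the convention, the USES clause below makes MASM push EBX, ESI and EDI in the prologue and pop them before the return. It assumes the MASM32 SDK is installed in \masm32; the names EnumProc and wndCount are hypothetical, and the callback just counts top-level windows.

```masm
include \masm32\include\masm32rt.inc

.data?
wndCount dd ?                           ; number of windows seen so far

.code

; EnumWindows callback: BOOL CALLBACK EnumProc(HWND hWnd, LPARAM lParam)
EnumProc proc uses ebx esi edi hWnd:DWORD, lParam:DWORD
        mov     esi, lParam             ; lParam = address of the counter
        xor     ebx, ebx                ; clobbering these is harmless here,
        xor     edi, edi                ; USES restores them before the return
        inc     DWORD PTR [esi]         ; count this window
        mov     eax, TRUE               ; TRUE = continue enumeration
        ret
EnumProc endp

start:
        mov     wndCount, 0
        invoke  EnumWindows, ADDR EnumProc, ADDR wndCount
        print   str$(wndCount), 13, 10  ; show how many windows were enumerated
        invoke  ExitProcess, 0
end start
```

Removing "uses ebx esi edi" from the PROC line gives the variant described above, where the callback returns with destroyed nonvolatile registers.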

Quote from: jj2007 on April 06, 2016, 10:01:44 AM
Actually, MSDN is quite vague about it:
Quote: The MMX and floating-point stack registers (MM0-MM7/ST0-ST7) are preserved across context switches. There is no explicit calling convention for these registers.
That quote applies to x64 only.

regards,
qWord

jj2007

Quote from: raymond on April 07, 2016, 02:13:46 AM
Can we believe that MS programmers are slowly improving their code for a better interface with users???   ::)

Yes, I have that suspicion, too. Unfortunately it's not documented, so relying on it would be very unwise.
OTOH, the reason for this "change of strategy" is pretty obvious: There are lots of lousy programmers and compilers out there, and the first one they blame if there's a bug is Microsoft Windows. So it makes sense for the Windows team to eliminate this source of trouble...

K_F

I seem to be having a problem with the FCOMI instruction too.. :icon_eek:

Edit: Not to worry.. Just losing my marbles  :biggrin: - I used the wrong instruction  :redface:

Mann.. I love Masm (smoochies.. muh muh muh)
This thing is so damn fast.. even with my 'bad' un-optimised code.  :badgrin:
Click a button and the chart/graphs whip into action before you can see..
Not perfect yet.. but GDI is now functioning smoothly - needs a lot of cleaning up - beer time!!!  :biggrin:


Siekmanski

Looks good. What are you going to use the filters for?
Creative coders use backward thinking techniques as a strategy.

K_F

The weights for artificial intelligence.. blending to nodes of 3D or higher dimensions.
Could get very busy.. so once working..speed and optimisations will be important...
Hence my GTX970 (good for games too ) ;)