Other floating point macros.

Started by jj2007, August 14, 2018, 07:38:48 PM

jj2007

As you all know, I am a great fan of macros. But I rarely use my handful of fpu macros. It's just not worth the effort, because fpu is really simple and concise once you get the hang of it.

Otoh, there is an advanced set of macros by Qword, SmplMath. Any users around who could talk about their experience?

six_L

where is the error?
the result is ok when _LoopFrequency=1,2,3
error:
_LoopFrequency=4, result=1.0
_LoopFrequency=5, result=6.0
_LoopFrequency=6, result=1.0
_LoopFrequency=7, result=6.0
...
_floating_pointAdd Proc uses rbx _LoopFrequency:QWORD

   fninit                      ;; clear FPU registers and flags
   fldz                        ;; zero st(0)

   mov   rbx,_LoopFrequency

   fld   _One_real8
   fld   _One_real8
   faddp
@@:
   fld   _One_real8
   faddp
   fld   _One_real8
   faddp
   fld   _One_real8
   faddp
   fld   _One_real8
   faddp
   fld   _One_real8
   faddp

   sub   rbx,1
   jnz   @B

   fld   _One_real8
   fsubp

   fstp  result

   invoke   RtlZeroMemory,ADDR szBuffer, sizeof szBuffer
   invoke   FpuFLtoA64, ADDR result,40,ADDR szBuffer,SRC1_REAL Or SRC2_DIMM
   invoke   SetWindowText,hEdithWnd,addr szBuffer

   ret

_floating_pointAdd Endp
Say you, Say me, Say the codes together for ever.

RuiLoureiro

#2
Quote from: six_L on August 14, 2018, 08:28:43 PM
where is the error?
the result is ok when _LoopFrequency=1,2,3
error:
_LoopFrequency=4, result=1.0
_LoopFrequency=5, result=6.0
_LoopFrequency=6, result=1.0
_LoopFrequency=7, result=6.0
...

_floating_pointAdd Proc uses rbx _LoopFrequency:QWORD
   
   fninit                      ;; clear FPU registers and flags
   fldz                        ;; zero st(0)   <<<<<<<<<<<<<<<<<<---- st(0) = 0.0
;**********************************
;               remove fldz above
;**********************************

   mov   rbx,_LoopFrequency

   fld   _One_real8
   fld   _One_real8
   faddp
@@:
   fld   _One_real8
   faddp
   fld   _One_real8
   faddp
   fld   _One_real8
   faddp
   fld   _One_real8
   faddp
   fld   _One_real8
   faddp
   
   sub   rbx,1
   jnz   @B
   
   fld   _One_real8
   fsubp

   fstp   result
;----------------------------------------------------
; HERE st(0) = 0.0 (left over from fldz): the FPU stack is not clean
;----------------------------------------------------

   invoke   RtlZeroMemory,ADDR szBuffer, sizeof szBuffer
   invoke   FpuFLtoA64, ADDR result,40,ADDR szBuffer,SRC1_REAL Or SRC2_DIMM       
   invoke   SetWindowText,hEdithWnd,addr szBuffer
   
   ret
   
_floating_pointAdd Endp
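
To check the expected values independently of any converter, here is a minimal Python sketch that replays the instruction sequence on a list standing in for the x87 register stack. The function name `floating_point_add` and the list model are assumptions made for illustration only; the model ignores precision, exceptions and the 8-register limit, none of which matter for these small integers, and it assumes _LoopFrequency is at least 1.

```python
def floating_point_add(loop_frequency):
    """Replay the FPU instructions of _floating_pointAdd on a list.

    stack[-1] plays the role of st(0). Returns (result, leftover_stack).
    Hypothetical model, assuming loop_frequency >= 1.
    """
    stack = []                       # fninit: all registers empty

    def fld(value):                  # push a value; it becomes st(0)
        stack.append(value)

    def faddp():                     # st(1) = st(1) + st(0), then pop
        top = stack.pop()
        stack[-1] += top

    def fsubp():                     # st(1) = st(1) - st(0), then pop
        top = stack.pop()
        stack[-1] -= top

    fld(0.0)                         # fldz
    fld(1.0); fld(1.0); faddp()      # st(0) = 2.0, st(1) = 0.0
    for _ in range(loop_frequency):  # sub rbx,1 / jnz @B
        for _ in range(5):           # five fld/faddp pairs per pass
            fld(1.0); faddp()
    fld(1.0); fsubp()                # subtract 1.0 from st(1), pop
    result = stack.pop()             # fstp result
    return result, stack


for n in range(1, 8):
    result, leftover = floating_point_add(n)
    print(n, result, leftover)
```

Under this model the result is 5 * _LoopFrequency + 1 for every count, and the single 0.0 pushed by fldz is still on the stack after fstp. The model does not reproduce the alternating 1.0 / 6.0 values reported above, so on its own it does not locate their cause.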

HSE

Quote from: jj2007 on August 14, 2018, 07:38:48 PM
It's just not worth the effort, because fpu is really simple and concise once you get the hang of it.

In 32-bit, using Hutch's latest macros is 0.5% faster. A possible advantage of those macros (in the future?) is checking that FPU instructions do not appear in procedures in which MMX registers are used as regular registers (Hutch's idea), but there are other ways. Of course, in complex programs, preprocessing eats time  :biggrin: (and making things more complex eats brains?  :icon_confused:)

Quote from: jj2007 on August 14, 2018, 07:38:48 PM
Otoh, there is an advanced set of macros by Qword, SmplMath. Any users around who could talk about their experience?

SmplMath is a different kind of macro set: it's a system that lets you write equations efficiently with no penalty (perhaps I pay a 1/100 penalty for making things easier, no matter). For example:

fSlv8 dnam = (0.143 * (iprot ^ 0.73)) /( (0.0461 * nut2) ^ (1.0 / 0.73));

With only one equation it's nice to have, but in one model I have around 1200 lines! Without SmplMath it would be totally crazy to work on that.
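
For readers who don't use SmplMath, the fSlv8 line above can be read as an ordinary formula. The sketch below evaluates the same expression in Python, assuming `^` means exponentiation and that `iprot` and `nut2` are plain floating-point variables. The function name and the sample inputs are hypothetical, not taken from the model.

```python
def dnam(iprot, nut2):
    """dnam = (0.143 * iprot^0.73) / ((0.0461 * nut2)^(1/0.73))"""
    numerator = 0.143 * iprot ** 0.73
    denominator = (0.0461 * nut2) ** (1.0 / 0.73)
    return numerator / denominator


# Hypothetical inputs, chosen only to exercise the function.
print(dnam(100.0, 50.0))
```

Written by hand for the FPU, each `^` with a non-integer exponent takes several instructions, which is the bookkeeping the one-line macro form hides.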

Equations in Assembly: SmplMath

jj2007

Quote from: HSE on August 14, 2018, 10:50:26 PM
in complex programs, preprocessing eats time

Don't confuse build time with run time: Well-written macros are exactly as fast as the hand-written "pure" equivalent.

Quote from: HSE
With only one equation it's nice to have, but in one model I have around 1200 lines! Without SmplMath it would be totally crazy to work on that.

Wow, that's a big project! I've never touched SmplMath, simply because I wanted to achieve the same for MasmBasic but didn't want my mind to be contaminated by QWord's thinking (but so far I haven't found the energy to complete it...). It seems he has done a great job :t

HSE

Quote from: jj2007 on August 15, 2018, 12:49:30 AM
I've never touched SmplMath, simply because I wanted to achieve the same for MasmBasic...

It's a challenge you enjoy, no doubt. In practical terms, it could be more efficient to challenge and improve what is already done.

RuiLoureiro

Quote from: six_L on August 14, 2018, 08:28:43 PM
where is the error?
the result is ok when _LoopFrequency=1,2,3
error:
_LoopFrequency=4, result=1.0   <<<<<<<<<< ???????
_LoopFrequency=5, result=6.0
_LoopFrequency=6, result=1.0   <<<<<<<<<< ???????
_LoopFrequency=7, result=6.0
...

Hi all,
Did you test this program? Is it true that we get

_LoopFrequency=4, result=1.0   <<<<<<<<<< ???????
_LoopFrequency=5, result=6.0
_LoopFrequency=6, result=1.0   <<<<<<<<<< ???????
_LoopFrequency=7, result=6.0

I am not able to run it, but I wrote the same code for the console and the results are nothing like this; at a glance they look correct. So where is the problem? Do you know?

The file is in reply #1.
Thanks  :t

My results (given by my ConverterDF), one FloatingPointAdd run per _LoopFrequency:

_LoopFrequency   result   calculation
      1            6.0    2 +  5 - 1
      2           11.0    2 + 10 - 1
      3           16.0    2 + 15 - 1
      4           21.0    2 + 20 - 1
      5           26.0    2 + 25 - 1
      6           31.0    2 + 30 - 1
      7           36.0    2 + 35 - 1

************** END *****************

six_L

Hi, Rui
I guess the error may be in FpuFLtoA64.
Do you have a 64-bit ConverterDF?

RuiLoureiro

Quote from: six_L on August 15, 2018, 10:09:14 AM
Hi, Rui
I guess the error may be in FpuFLtoA64.
Do you have a 64-bit ConverterDF?

Hi six_L,
It seems so, maybe.
ConverterDF converts real4 to string; it is elsewhere here in the forum, in the topic "Converting real4 to string". There are also "Converting real8 to string" and "Converting real10 to string", as well as converting string to real4/real8/real10. You can search here, or search for RuiLoureiro, to find the topics where I wrote things.
Unfortunately my CPU doesn't work with 64 bits, so I cannot write code for that.
See you  :t

daydreamer

Quote from: jj2007 on August 14, 2018, 07:38:48 PM
As you all know, I am a great fan of macros. But I rarely use my handful of fpu macros. It's just not worth the effort, because fpu is really simple and concise once you get the hang of it.

Otoh, there is an advanced set of macros by Qword, SmplMath. Any users around who could talk about their experience?

I have written SIMD macros before, but no FPU macros; today, though, I want to make an FPU macro influenced by the SSE RCP** instructions.
Maybe a different kind of macro could be useful: for example, before the inner loop, fill a few of st1-st7 with reciprocals, so you can use faster FMULs instead of divisions.
Another idea: let the FPU calculate a few cosines/sines and load some of them into SIMD registers for a smoothing algorithm.
For example, Perlin noise needs some smoothing between the random pixels, as does a simple bitmap resize algorithm.
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding
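
The reciprocal idea can be sketched outside assembly. The Python below is only an analogy for the arithmetic: it computes 1/divisor once before the loop and multiplies inside it, which is what preloading reciprocals into st1-st7 and using FMUL would amount to. The function names and sample values are hypothetical. Multiplying by a rounded reciprocal can differ from a true division in the last bit, so the two results are compared by relative difference, not by equality.

```python
def scale_by_division(values, divisor):
    # One division per element (the FDIV-style inner loop).
    return [v / divisor for v in values]


def scale_by_reciprocal(values, divisor):
    reciprocal = 1.0 / divisor            # computed once, before the loop
    # One multiplication per element (the FMUL-style inner loop).
    return [v * reciprocal for v in values]


samples = [1.0, 2.5, 10.0, 123.456]
divided = scale_by_division(samples, 3.0)
multiplied = scale_by_reciprocal(samples, 3.0)

# Largest relative difference between the two approaches.
worst = max(abs(a - b) / abs(a) for a, b in zip(divided, multiplied))
print(worst)
```

Whether the small rounding difference is acceptable depends on the algorithm; for smoothing or resizing pixels it usually would be, but that is a judgment call and not something this sketch measures.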

jj2007

Quote from: RuiLoureiro on August 16, 2018, 12:51:34 AM
ConverterDF converts real4 to string; it is elsewhere here in the forum, in the topic "Converting real4 to string". There are also "Converting real8 to string" and "Converting real10 to string", as well as converting string to real4/real8/real10.

I recommend Val("123") and Str$(anynumber)  ;)

RuiLoureiro


six_L

Quote from: RuiLoureiro
Hi six_L,
It seems so, maybe.
ConverterDF converts real4 to string; it is elsewhere here in the forum, in the topic "Converting real4 to string". There are also "Converting real8 to string" and "Converting real10 to string", as well as converting string to real4/real8/real10. You can search here, or search for RuiLoureiro, to find the topics where I wrote things.
Unfortunately my CPU doesn't work with 64 bits, so I cannot write code for that.
See you  :t

Hi, RuiLoureiro
Thanks for your response.

RuiLoureiro


:biggrin: Hi six_L
You are right: we get the same wrong results you got, so I guess the problem is in the converter, also because you don't do anything else and the code is very simple.
See you
:t