Simple floating point macros.

hutch-- · August 13, 2018, 04:36:54 PM

I need to add at least some simple floating point support to 64 bit MASM, and since there are at least a few people here who actually understand how the old co-processor works, I wondered if there was a better way to do these simple calculations. The criterion is to perform each calculation and leave the result in st(0) for further calculations. The input data has to be valid, and as FP instructions do not accept immediate operands, the old macros from the 32 bit version of MASM handle that OK with only some minor alignment changes.

Win64 does not specify co-processor performance or FP register usage but I have tried to keep this available for folks who want to do calculations for maths rather than video tasks.

fpadd MACRO arg1,arg2
fld arg1
fld arg2
faddp
ENDM

fpsub MACRO arg1,arg2
fld arg1
fld arg2
fsubp
ENDM

fpdiv MACRO arg1,arg2
fld arg1
fld arg2
fdivp
ENDM

fpmul MACRO arg1,arg2
fld arg1
fld arg2
fmulp
ENDM

fpsqrt MACRO number,target
fld number
fsqrt
fstp target
ENDM
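
For the non-commutative macros the operand order matters. In MASM the no-operand forms of fsubp and fdivp assemble as "fsubp st(1), st(0)" and "fdivp st(1), st(0)", so after the two loads the result is arg1 - arg2 and arg1 / arg2 respectively. A minimal sketch, assuming the macros above; the variables num1, num2, diff and quot are hypothetical names used only for illustration:

```asm
.data
  num1 REAL8 10.0          ; hypothetical test values
  num2 REAL8 4.0
  diff REAL8 0.0
  quot REAL8 0.0

.code
  fpsub num1, num2         ; st(0) = num1 - num2 = 6.0
  fstp diff                ; store the result and pop it, leaving the FPU stack empty

  fpdiv num1, num2         ; st(0) = num1 / num2 = 2.5
  fstp quot                ; store and pop again
```

Popping with fstp after each result keeps the eight-register FPU stack from filling up when the macros are used repeatedly.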

daydreamer · August 13, 2018, 05:17:01 PM

I like to use the old FPU sometimes. Maybe you should follow the old FPU mnemonic conventions by having both FPMUL and FPMULP macros.
If you already have content on the FPU stack, you might also need support macros that take only one argument to make it work.

I also want to suggest an FPRCP reciprocal macro. It speeds up both FPU math and SIMD math and helps your coding, so you don't need to spend much time with a calculator typing in long reciprocal numbers.
FSQRT is great both for students writing a Pythagoras calculation program and for adding the simplest round light effect to a bitmap.

hutch-- · August 13, 2018, 05:25:12 PM

What I have done with these is to pop the result back into st(0) so that it is a consistent interface where one function can follow after the other.

fpadd st(0), addme
fpadd st(0), addme
fpadd st(0), addme

RuiLoureiro · August 13, 2018, 09:17:03 PM

Quote from: hutch-- on August 13, 2018, 04:36:54 PM

fpadd MACRO arg1,arg2
fld arg1
fld arg2
faddp
ENDM

fpadd st(0), addme
fpadd st(0), addme
fpadd st(0), addme

Hutch,
I don't know what you want to do, but
the last code means this:

fld st(0) ; get a copy to st(0) <<<- why load the first st(0) ?
fld addme ; load addme
faddp ; do st(1)+st(0) = st(0)
;
fld st(0) ; get a copy to st(0) <<<- why get a copy ?
fld addme ; load addme
faddp ; do st(1)+st(0) = st(0)
;
fld st(0) ; get a copy to st(0) <<<- why get a copy again ?
fld addme ; load addme
faddp ; do st(1)+st(0) = st(0)
;
; So we have st(1) and st(0)=st(0)+addme+addme+addme inside FPU
; where st(1) is the first st(0). Do we need to preserve the first st(0) ?
;----------------------------------------------------------------------------------
;
; another way
;--------------
fld firstst0 ; first st(0)
fld addme ; load addme
faddp ; do st(1)+st(0) = st(0)= firstst0+addme
;
fld addme ; load addme
faddp ; do st(1)+st(0) = st(0)=firstst0+addme+addme
;
fld addme ; load addme
faddp ; do st(1)+st(0) = st(0)=firstst0+addme+addme+addme
;
; So we have only st(0) inside FPU

HSE · August 13, 2018, 11:03:22 PM

I don't know what you mean by old co-processors, but for the last 20 years you have been able to write:

    fpadd MACRO arg1,arg2
      fld arg1
      fadd arg2
    ENDM

    fpsub MACRO arg1,arg2
      fld arg1
      fsub arg2
    ENDM

    fpdiv MACRO arg1,arg2
      fld arg1
      fdiv arg2
    ENDM

    fpmul MACRO arg1,arg2
      fld arg1
      fmul arg2
    ENDM

hutch-- · August 13, 2018, 11:48:05 PM

He he, my first processor with a co-processor was an i486 that cost me a fortune in about 1990. The co-processor has been around that long, so 28 years says it's old, but for folks who want maths rather than just video processing, it is still a very useful capability.

Rui,

You are right but in the first place I wanted each macro to be complete dumping the result into st(0). You are correct that in continuous code you would not keep loading st(0). What I am after is making the use simple.

HSE · August 14, 2018, 12:17:35 AM

Quote from: hutch-- on August 13, 2018, 11:48:05 PM
He he, my first processor with a co-processor was an i486 that cost me a fortune in about 1990.

Yes, I used a software FPU emulator before the 486DX but, believe me, I don't remember whether "fadd memory" was possible at that time. (That was my GW-BASIC period; only some specific programs required an FPU.)

RuiLoureiro · August 14, 2018, 12:55:19 AM

Quote from: hutch-- on August 13, 2018, 11:48:05 PM

...
Rui,

You are right but in the first place I wanted each macro to be complete dumping the result into st(0). You are correct that in continuous code you would not keep loading st(0). What I am after is making the use simple.

Hutch,
I think that set of macros is useful, but now add another set that takes only 1 argument plus the operation instruction. That way it doesn't copy st(0). Something like this:

fpadd1 macro arg1
fld arg1
fadd
endm

So we start with
finit
fpadd addme0, addme
and then we use
fpadd1 addme
fpadd1 addme ; st(0)=addme0+addme+addme+addme

and nothing is left in st(1) inside the FPU.

When the macro name ends with 1, we know that we want to operate on the st(0) already inside the FPU with the new argument.
fpadd A,B
fpmul1 C ; st(0)= C*(A+B) and no st(1) inside

When we need to preserve the last st(0) we use the macro with 2 arguments.
It is only my opinion.
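
A sketch of how the one-argument family might look if the memory-operand forms HSE mentioned are used instead of a load followed by a pop. Only fpadd1 and fpmul1 are named above; fpsub1 and fpdiv1 are hypothetical names added for symmetry, and scale is a hypothetical variable. It assumes arg1 is a REAL4 or REAL8 memory variable, since fadd, fsub, fmul and fdiv accept only 32-bit and 64-bit floating point memory operands.

```asm
fpadd1 MACRO arg1
  fadd arg1              ;; st(0) = st(0) + arg1
ENDM

fpsub1 MACRO arg1        ;; hypothetical name
  fsub arg1              ;; st(0) = st(0) - arg1
ENDM

fpmul1 MACRO arg1
  fmul arg1              ;; st(0) = st(0) * arg1
ENDM

fpdiv1 MACRO arg1        ;; hypothetical name
  fdiv arg1              ;; st(0) = st(0) / arg1
ENDM

; usage: only one FPU register is ever occupied
  fld    addme0          ; st(0) = addme0
  fpadd1 addme           ; st(0) = addme0 + addme
  fpmul1 scale           ; st(0) = (addme0 + addme) * scale
```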

hutch-- · August 14, 2018, 02:44:12 AM

Rui,

Is this what you mean? It produces the correct result and only loads st(0) at the end as you suggested.

.data
fpbuff REAL8 0.0
addme REAL8 111.111

.code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

LOCAL pbuf :QWORD
LOCAL buff[32]:BYTE

mov pbuf, ptr$(buff)

fld fpbuff

fld addme
faddp
fld addme
faddp
fld addme
faddp
fld st(0)
fstp fpbuff

invoke fptoa,fpbuff,pbuf
conout pbuf,lf

waitkey
.exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

This is the disassembly.

.text:0000000140001016 DD0540100000 fld qword ptr [0x14000205c]
.text:000000014000101c DD0542100000 fld qword ptr [0x140002064]
.text:0000000140001022 DEC1 faddp st(1)
.text:0000000140001024 DD053A100000 fld qword ptr [0x140002064]
.text:000000014000102a DEC1 faddp st(1)
.text:000000014000102c DD0532100000 fld qword ptr [0x140002064]
.text:0000000140001032 DEC1 faddp st(1)
.text:0000000140001034 D9C0 fld st(0)
.text:0000000140001036 DD1D20100000 fstp qword ptr [0x14000205c]

RuiLoureiro · August 14, 2018, 03:15:18 AM

Quote from: hutch-- on August 14, 2018, 02:44:12 AM
Rui,

Is this what you mean ? It produces the correct result and only loads st(0) at the end as you suggested..

;----------------------------------------------------------------------
fpadd MACRO arg1, arg2
fld arg1
fld arg2
faddp
ENDM
;----------------------------------------------------------------------
fpadd1 MACRO arg1 ; let me say that i prefer fpadd and fpadd1
fld arg1
faddp
ENDM
;+++++++++++++++++++++++++++++++++++++++
.data
fpbuff REAL8 0.0 ; we don't need to load this variable. If we want 0.0 we use fldz
; <--- this is only the output variable

addme REAL8 111.111 ; <<<--- this is the input variable

.code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

LOCAL pbuf :QWORD
LOCAL buff[32]:BYTE

mov pbuf, ptr$(buff)
finit

;fld fpbuff
;fld addme
;faddp

; start and add 2 arguments
;------------------------------
fpadd addme, addme ; st(0)= 2*addme

;fld addme
;faddp

; now add another argument
;-------------------------------
fpadd1 addme ; st(0)= 3*addme

;fld addme
;faddp
;fld st(0) <<<<- THIS CREATES a new st(1)
; we may do it only if we need to go on with another operation
; that needs this value

;----remove st(0) to fpbuff and the FPU is cleaned------
fstp fpbuff

invoke fptoa,fpbuff, pbuf ; <<<--- convert to pbuf
conout pbuf,lf

waitkey
.exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

This is the disassembly.

.text:0000000140001016 DD0540100000 fld qword ptr [0x14000205c]
.text:000000014000101c DD0542100000 fld qword ptr [0x140002064]
.text:0000000140001022 DEC1 faddp st(1)
.text:0000000140001024 DD053A100000 fld qword ptr [0x140002064]
.text:000000014000102a DEC1 faddp st(1)
.text:000000014000102c DD0532100000 fld qword ptr [0x140002064]
.text:0000000140001032 DEC1 faddp st(1)
.text:0000000140001034 D9C0 fld st(0)
.text:0000000140001036 DD1D20100000 fstp qword ptr [0x14000205c]

No, it was not my suggestion. Try now.

jj2007 · August 14, 2018, 03:16:21 AM

fpadd MACRO arg1,arg2
  ifdifi <arg1>, <ST(0)>
	if type(arg1) eq REAL4 or type(arg1) eq REAL8 or type(arg1) eq REAL10
		fld arg1
	else
		fild arg1
	endif
  endif
  if type(arg2) eq REAL4 or type(arg2) eq REAL8
	fadd arg2              ; <<<<<<<<<<<<<< see remark by HSE above
  elseif type(arg2) eq REAL10
	.err <REAL10 not allowed for second arg>
  else
	fiadd arg2
  endif
ENDM

Only ST(7) is being used. Test code (MyDD are dwords, MyR4 is 1000.0):

  fpadd MyR8a, MyR8b
  Print Str$("MyR8a+MyR8b=%f\n", ST(0)v)

  fpadd MyR8a, MyDDb
  Print Str$("MyR8a+MyDDb=%f\n", ST(0)v)

  fpadd MyDDa, MyR8b
  Print Str$("MyDDa+MyR8b=%f\n", ST(0)v)

  fpadd MyDDa, MyDDb
  Print Str$("MyDDa+MyDDb=%f\n", ST(0))

  fpadd ST(0), MyR4
  Print Str$("ST(0)+MyR4= %f\n", ST(0)v)

Results:

MyR8a+MyR8b=777.7770
MyR8a+MyDDb=777.4560
MyDDa+MyR8b=777.3210
MyDDa+MyDDb=777.0000
ST(0)+MyR4= 7777.778

Testbed attached - MasmBasic, sorry.

Quote from: hutch-- on August 13, 2018, 05:25:12 PM
fpadd st(0), addme

Implemented but the syntax is longer than fadd addme - a matter of taste maybe ;)

hutch-- · August 14, 2018, 03:46:13 AM

Rui,

I tend to try things incrementally, first try was to remove the unnecessary st(0) loads which was your suggestion. Just remember I have not used this stuff for many years.

I will try out more of your suggestions as I find the instructions.

JJ,

WTF ?

hutch-- · August 14, 2018, 04:24:42 AM

Here is the next try. Removal of the redundant variable, fninit to reset the FPU (I did not see the point of the fwait), and fldz as it is more efficient than loading the memory operand. The dot prefix in the two macros is only so they do not clash with the existing macros.

.ldst0 MACRO var
fld st(0) ;; load st(0)
fstp var ;; store it in variable
ENDM

.fpadd MACRO arg1
fld arg1
faddp
ENDM

.data
addme REAL8 111.111

.code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

LOCAL pbuf :QWORD
LOCAL buff[32]:BYTE
LOCAL fpval :REAL8

mov pbuf, ptr$(buff) ; get buffer pointer

fninit ; clear FPU registers and flags
fldz ; zero st(0)

.fpadd addme ; add value to st(0)
.fpadd addme
.fpadd addme

.ldst0 fpval ; store a copy of st(0) in the variable

invoke fptoa,fpval,pbuf ; convert fpval to string
conout pbuf,lf ; display at console

waitkey
.exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end

comment #

.text:0000000140001016 DBE3 fninit
.text:0000000140001018 D9EE fldz
.text:000000014000101a DD053C100000 fld qword ptr [0x14000205c]
.text:0000000140001020 DEC1 faddp st(1)
.text:0000000140001022 DD0534100000 fld qword ptr [0x14000205c]
.text:0000000140001028 DEC1 faddp st(1)
.text:000000014000102a DD052C100000 fld qword ptr [0x14000205c]
.text:0000000140001030 DEC1 faddp st(1)
.text:0000000140001032 D9C0 fld st(0)
.text:0000000140001034 DD9D70FFFFFF fstp qword ptr [rbp-0x90]

#
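
If the value only needs to be copied out while staying in st(0), the fst instruction stores without popping, so the extra "fld st(0)" in the .ldst0 macro above would not be needed. A minimal sketch, assuming a REAL4 or REAL8 destination (fst has no 80-bit memory form); the name .stst0 is hypothetical:

```asm
.stst0 MACRO var         ;; hypothetical name
  fst var                ;; copy st(0) to var, FPU stack unchanged
ENDM

; usage, with the same fpval local as above
  .stst0 fpval           ; fpval = st(0), st(0) still holds the running total
```

When the running total is no longer needed afterwards, a plain "fstp fpval" stores the value and empties the stack in one instruction.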

RuiLoureiro · August 14, 2018, 04:25:30 AM

Quote from: hutch-- on August 14, 2018, 03:46:13 AM
Rui,

I tend to try things incrementally, first try was to remove the unnecessary st(0) loads which was your suggestion. Just remember I have not used this stuff for many years.

I will try out more of your suggestions as I find the instructions.

JJ,

WTF ?

No problems, I am trying to give suggestions but ... no more.

About fpsqrt: sometimes we need to take the sqrt of st(0).
For example, to solve the equation A*x^2+B*x+C=0 we need to compute sqrt(B^2-4*A*C). So once we have st(0)=B^2-4*A*C, we need to take the sqrt of st(0).
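
The discriminant case can be sketched with plain FPU instructions. This is only an illustration under stated assumptions: coefA, coefB, coefC, four and root are hypothetical REAL8 variables (single letters are avoided because C is a reserved word in MASM), and the example values describe x^2 - 3x + 2 = 0.

```asm
.data
  coefA REAL8 1.0        ; hypothetical coefficients for x^2 - 3x + 2 = 0
  coefB REAL8 -3.0
  coefC REAL8 2.0
  four  REAL8 4.0
  root  REAL8 0.0

.code
  fld   coefB            ; st(0) = B
  fmul  st(0), st(0)     ; st(0) = B^2
  fld   coefA            ; st(0) = A, st(1) = B^2
  fmul  coefC            ; st(0) = A*C
  fmul  four             ; st(0) = 4*A*C
  fsubp st(1), st(0)     ; st(0) = B^2 - 4*A*C, stack depth back to one
  fsqrt                  ; st(0) = sqrt(B^2 - 4*A*C), 1.0 for these values
  fstp  root             ; store the result and pop it
```

With the default masked exceptions set by finit, a negative discriminant makes fsqrt return a NaN rather than fault, so real code would want to test the sign before taking the root.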

HSE · August 14, 2018, 04:26:24 AM

@Hutch:

Why not:
fld fpbuff

fadd addme
fadd addme
fadd addme
fld st(0) ; here you are also pushing the old st(0) down to st(1)
fstp fpbuff
fstp st(0) ; <-- you forgot this (pops the copy that was in st(1))

@JJ:
I agree. It's possible to replace

 elseif type(arg2) eq REAL10
	.err <REAL10 not allowed for second arg>

with

 elseif type(arg2) eq REAL10
	fld arg2
        faddp     ;  or fsubp

But it is not recommended to use REAL10 for ordinary variables. The idea is to use it only to store FPU state.

The MASM Forum
