Simple floating point macros.

RuiLoureiro · August 14, 2018, 04:35:26 AM

Quote from: hutch-- on August 14, 2018, 04:24:42 AM
Here is the next try. Removal of redundant variable, fninit to reset FPU (did not see the point of the fwait), and fldz as it is more efficient than loading the memory operand. The dot prefix in the two macros is only so they do not clash with the existing ones.

.ldst0 MACRO var
fld st(0) ;; load st(0) <<<- get/make a copy of st(0)
;; the previous is now st(1)=st(0)
fstp var ;; store it in variable <<<-- and remove st(0)
;; the previous st(1) is now st(0)
ENDM

.fpadd MACRO arg1
fld arg1
faddp
ENDM

.data
addme REAL8 111.111

.code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

LOCAL pbuf :QWORD
LOCAL buff[32]:BYTE
LOCAL fpval :REAL8

mov pbuf, ptr$(buff) ; get buffer pointer <<<< YES

fninit ; clear FPU registers and flags <<<< YES
fldz ; zero st(0) <<<< YES

.fpadd addme ; add value to st(0) <<<< YES
.fpadd addme
.fpadd addme
;-----------------------------------------------------------
; This is "get a copy of st(0)" and store it into variable
;-----------------------------------------------------------
.ldst0 fpval ; load st(0) into variable

;----------------------------------------------------------
; HERE the FPU has another equal st(0) inside
; to remove it we do: «fstp st»
;----------------------------------------------------------
invoke fptoa,fpval,pbuf ; convert fpval to string <<<< YES
conout pbuf,lf ; display at console <<<< YES

waitkey
.exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end

comment #

.text:0000000140001016 DBE3 fninit
.text:0000000140001018 D9EE fldz
.text:000000014000101a DD053C100000 fld qword ptr [0x14000205c]
.text:0000000140001020 DEC1 faddp st(1)
.text:0000000140001022 DD0534100000 fld qword ptr [0x14000205c]
.text:0000000140001028 DEC1 faddp st(1)
.text:000000014000102a DD052C100000 fld qword ptr [0x14000205c]
.text:0000000140001030 DEC1 faddp st(1)
.text:0000000140001032 D9C0 fld st(0)
.text:0000000140001034 DD9D70FFFFFF fstp qword ptr [rbp-0x90]

#

That's OK, with one little problem.

HSE · August 14, 2018, 04:55:03 AM

Hutch!! Why are you not sleeping?

    .ldst0 MACRO var
      fld st(0)                   ;; load st(0)
      fstp var                    ;; store it in variable
    ENDM

It's just:
fst var ; store st(0) in variable

hutch-- · August 14, 2018, 04:56:03 AM

He he, I will be shortly.

Rui,

Is this what you meant ?

.ldst0 MACRO var
fld st(0) ;; load st(0)
fstp var ;; store it in variable
fstp st(0) ;; pop st(0)
ENDM

RuiLoureiro · August 14, 2018, 05:32:29 AM

Quote from: hutch-- on August 14, 2018, 04:56:03 AM

He he, I will be shortly.

Rui,

Is this what you meant ?

.ldst0 MACRO var
fld st(0) ;; load st(0) <<<<<-- the best way is "get a copy of st(0)"
;; or load the current st(0) to a new st(0). The previous is now
;; st(1) and is equal to st(0)
fstp var ;; store it in variable
fstp st(0) ;; pop st(0) <<<<-- or remove the current st(0).
ENDM

No, we only need fstp var, not .ldst0. We don't need to add fld st(0) and then do fstp st(0) at the end.
So just use fstp var.

fld st(0) is a trap because it reads as if we were loading st(0) from somewhere. But at the time we write fld st(0) there is already an st(0), and fld st(0) pushes a copy of it as the new st(0). So the previous st(0) is now st(1), and it is equal to the new st(0).
Remember that when we need a copy of st(1), st(2) etc., we do fld st(1), fld st(2) etc.
The copy of the previous st(1) or st(2) is then the new st(0), the previous st(0) is now st(1), st(1) is now st(2), st(2) is now st(3), and so on.

About the code, we don't need to start with fldz. The better way is to start with
fpadd addme, addme and then continue with .fpadd addme.
fldz is used when we want to compare st(0) with 0.0 (or with another register).
Also, don't forget that when we remove st(0), the previous st(1) becomes the new st(0). If there is no previous st(1), the FPU is left clean (no values inside).
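
The renumbering described above can be sketched with a toy model. This is an illustrative Python list, not real FPU code: the class name X87Stack and its method names are made up for this example, and the eight-register limit is ignored.

```python
class X87Stack:
    """Toy model of the x87 register stack; regs[0] plays the role of st(0)."""

    def __init__(self):
        self.regs = []                 # empty, as after fninit

    def fld(self, value):
        """fld mem: push a value; the old st(0) becomes st(1)."""
        self.regs.insert(0, value)

    def fld_st(self, i):
        """fld st(i): push a copy of st(i); every old register moves down by one."""
        self.regs.insert(0, self.regs[i])

    def fstp(self):
        """fstp mem: hand back st(0) and pop it; the old st(1) becomes st(0)."""
        return self.regs.pop(0)


fpu = X87Stack()
fpu.fld(1.5)                           # st(0)=1.5
fpu.fld(2.5)                           # st(0)=2.5, st(1)=1.5
fpu.fld_st(0)                          # st(0)=2.5, st(1)=2.5, st(2)=1.5
stored = fpu.fstp()                    # stores 2.5, back to st(0)=2.5, st(1)=1.5
```

In this model fld st(0) followed by fstp var leaves the stack exactly as it found it, which is why a further fstp st(0) is needed only when the value should also be removed.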

HSE · August 14, 2018, 05:35:21 AM

There is no point in making a macro to load or store st(0). It makes a little more sense if you want to retrieve another FPU register:

   .ldst MACRO register, var
      fld st(&register)                   ;; load st(?)
      fstp var                    ;; store it in variable
    ENDM

use: fst var1 (better than .ldst 0, var1)
.ldst 1, var2
.ldst 2, var3

The final fstp is not in the macro.

Normal code:

    fld fpbuff
    fadd addme
    fadd addme
    fadd addme
    fstp fpbuff

Not normal code (but no error):

    fld fpbuff
    fadd addme
    fadd addme
    fadd addme
    fst fpbuff
    fstp

Strange Hutch idea:

    fld fpbuff
    fadd addme
    fadd addme
    fadd addme
    .ldst 0, fpbuff
    fstp ; most assemblers know that this is fstp st(0)
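
The equivalences argued in the posts above can be checked on a toy list model. This is only a sketch: the Python function names are invented for illustration, and the list stands in for the register stack with index 0 as st(0).

```python
def fld_st0(stack):
    """fld st(0): push a copy of the top of the stack."""
    stack.insert(0, stack[0])

def fstp(stack):
    """fstp: hand back st(0) and pop it."""
    return stack.pop(0)

def fst(stack):
    """fst var: hand back st(0) without popping."""
    return stack[0]

def ldst0(stack):
    """.ldst0 var: fld st(0) followed by fstp var."""
    fld_st0(stack)
    return fstp(stack)

def ldst0_then_pop(stack):
    """Three-instruction variant: fld st(0) / fstp var / fstp st(0)."""
    value = ldst0(stack)
    fstp(stack)                        # discard the original st(0)
    return value


a, b = [333.333, 7.0], [333.333, 7.0]
same_as_fst = (ldst0(a) == fst(b)) and (a == b)

c, d = [333.333, 7.0], [333.333, 7.0]
same_as_fstp = (ldst0_then_pop(c) == fstp(d)) and (c == d)
```

One difference the model does not capture: fst has no 80-bit memory form, so storing a copy of st(0) into a REAL10 variable does need the fld st(0) / fstp pair.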

jj2007 · August 14, 2018, 06:42:35 AM

Quote from: hutch-- on August 14, 2018, 03:46:13 AM
WTF ?

Test them. Same syntax as the original version, just a bit more versatile because you can use REAL* and/or dword variables.

hutch-- · August 14, 2018, 10:48:10 AM

I think I have got the swing of most of it: effectively the x87 register stack functions like a circular buffer, and the trick is to ensure that you do not unbalance the FPU stack. When I am a little more awake I will add the version of the test code that looks like it is close to being reasonably efficient for what need to be simple-to-use macros.

To Rui and HSE, thank you both for your assistance in blowing out the cobwebs. I looked at the date of the last FP stuff I did and it's around the year 2000, so it really has been a long time since I wrote any x87 code.

raymond · August 14, 2018, 11:18:14 AM

Hutch,

Sorry for being a bit harsh, but unfortunately you are playing with fire trying to "use" the FPU without knowing what you are doing.

One of the biggest traps is that FPU registers are very different from the ALU registers: they CANNOT BE OVERWRITTEN with new data except in very specific circumstances. Trying to load new data when all 8 registers are full results in GARBAGE. That is why keeping track of register usage is extremely important, and why macros that leave results on the FPU may not be the best tool for FPU newbies.

Emulating what HLLs do with floats would be a better idea, i.e. do calculations on data from memory and return the result immediately to a memory variable, leaving all FPU registers EMPTY at the termination of each macro. The other option is to insert a 'slow' finit at the start of each macro to ensure that the user will never have any problem.

If you intend to continue with this project, you may want to:
i) have at least a quick glance at the tutorial you had asked me to prepare many moons ago,
ii) consider including the use of floats interacting with integers,
iii) design macros which will cater for the multitude of combinations of the size of each variable.
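
The overflow trap can be illustrated with a small model of the eight registers as a circular buffer. This is a sketch under stated assumptions: the class X87 and its fields are invented for this example, only the masked-exception response is modelled (a ninth load sets the invalid flag and writes a NaN), and stack underflow on fstp is not modelled at all.

```python
import math

class X87:
    """Toy model: eight registers, an 'empty' tag per register, and a TOP index."""

    def __init__(self):                    # state after fninit
        self.regs = [0.0] * 8
        self.empty = [True] * 8
        self.top = 0
        self.invalid = False               # stands in for the IE/SF status flags

    def st(self, i):
        """Read st(i) relative to the current top."""
        return self.regs[(self.top + i) % 8]

    def fld(self, value):
        """Push: TOP moves down by one and wraps around."""
        self.top = (self.top - 1) % 8
        if not self.empty[self.top]:       # all eight registers are in use
            self.invalid = True
            value = math.nan               # the loaded value is lost
        self.regs[self.top] = value
        self.empty[self.top] = False

    def fstp(self):
        """Pop: mark the register empty and move TOP up by one."""
        value = self.regs[self.top]
        self.empty[self.top] = True
        self.top = (self.top + 1) % 8
        return value


fpu = X87()
for v in range(1, 9):
    fpu.fld(float(v))                      # eight loads fill the stack
fpu.fld(9.0)                               # ninth load: st(0) is now NaN
```

In this model the first value loaded (1.0) is the one that gets destroyed, and 9.0 never arrives, so both the oldest and the newest data are lost. A macro that ends with a matching fstp keeps the depth unchanged and avoids this.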

hutch-- · August 14, 2018, 11:55:52 AM

Hi Ray,

I have already taken option 1 and have the link set up in my browser. The only 64 bit conversions I could find in the C runtime are set at REAL8, so at the moment I have 2 functions, atofp and fptoa, that successfully convert REAL8 in both directions. As the x87 capacity is not specified in Win64, my choice is to try and use it OR simply ignore it, and as x87 capacity is useful for people who want floating point maths rather than video, it's probably worth a try to get at least some simple macros going.

I am already using the 8 MMX registers for other tasks, as MMX is a redundant technology and the methods of using the shared registers are straightforward enough, but with a choice of "play with fire" or ignore x87 code, I will at least give it a try as no one else is going to do it in 64 bit MASM. For what it's worth, the macros and support code are testing up OK at the moment. Both Rui and HSE have been very helpful in tidying up the test pieces I have posted, and once I have a bit more code up and running, I will give it a serious bashing to make sure it works correctly.

hutch-- · August 14, 2018, 01:23:41 PM

This is the first test piece with the later macros. It cannot be built as there are macros that have not been published yet but it works fine and handles 500 million iterations with no problems so the stack does not go BANG.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

include \masm32\include64\masm64rt.inc

fpinit MACRO
fninit ;; clear FPU registers and flags
fldz ;; zero st(0)
ENDM

; -------------------------------

fpdiv MACRO arg1,arg2
fld arg1
fld arg2
fdivp
ENDM

fpmul MACRO arg1,arg2
fld arg1
fld arg2
fmulp
ENDM

fpadd MACRO arg1
fld arg1
faddp
ENDM

fpsub MACRO arg1
fld arg1
fsubp
ENDM

; -------------------------------

fpsqrt MACRO number,target
fld number
fsqrt
fstp target
ENDM

.code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

LOCAL fpval :REAL8
LOCAL pbuf :QWORD
LOCAL buff[32]:BYTE

mov pbuf, ptr$(buff) ; get buffer pointer

addme = FLT8(1.0) ; statement form

fpinit ; initialise FPU & set st(0) to 0.0

; -----------------------------

mov r11, 100000000 ; 100 million iterations, 500 million macro calls
@@:
fpadd addme ; add value to st(0)
fpadd addme
fpadd addme
fpadd addme
fpadd addme
sub r11, 1
jnz @B

; -----------------------------

fpsub FLT8(1.0) ; function form

fstp fpval ; store st(0) into variable and pop it

invoke fptoa,fpval,pbuf ; convert fpval to string
conout pbuf,lf ; display at console

waitkey
.exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end

comment #

.text:0000000140001016 DBE3 fninit
.text:0000000140001018 D9EE fldz
.text:000000014000101a 49C7C300E1F505 mov r11, 0x5f5e100
.text:0000000140001021
.text:0000000140001021 0x140001021:
.text:0000000140001021 DD0539100000 fld qword ptr [0x140002060]
.text:0000000140001027 DEC1 faddp st(1)
.text:0000000140001029 DD0531100000 fld qword ptr [0x140002060]
.text:000000014000102f DEC1 faddp st(1)
.text:0000000140001031 DD0529100000 fld qword ptr [0x140002060]
.text:0000000140001037 DEC1 faddp st(1)
.text:0000000140001039 DD0521100000 fld qword ptr [0x140002060]
.text:000000014000103f DEC1 faddp st(1)
.text:0000000140001041 DD0519100000 fld qword ptr [0x140002060]
.text:0000000140001047 DEC1 faddp st(1)
.text:0000000140001049 4983EB01 sub r11, 0x1
.text:000000014000104d 75D2 jne 0x140001021
.text:000000014000104d
.text:000000014000104f DD0513100000 fld qword ptr [0x140002068]
.text:0000000140001055 DEE9 fsubp st(1)
.text:0000000140001057 DD5D98 fstp qword ptr [rbp-0x68]

#

raymond · August 14, 2018, 01:54:30 PM

Quote
fpdiv MACRO arg1,arg2
fld arg1
fld arg2
fdivp
ENDM

How will the assembler know the actual size of arg1 and/or arg2 (i.e. REAL4, REAL8 or REAL10) in order to produce the appropriate code?
Or is it your intention to limit the use to only one specific size? And where would that be specified in clear terms for someone who may know absolutely nothing about floats apart from the fact that they contain a decimal point followed by some decimal digits?

hutch-- · August 14, 2018, 03:22:17 PM

I thought that would be obvious: FLD works on REAL4, REAL8 and REAL10, so the macro does not need the size information. The data item size is determined by how the argument is produced. Now, given that x87 instructions do not support immediate values, an immediate value must be written to a data variable in the initialised data section, where you must specify the size.

You can write everything else as LOCAL values, "LOCAL var :REAL10".

The only data size limitation I have at the moment is the C runtime conversions that only handle REAL4 and REAL8 but have no 80 bit support.
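
How the declared size ends up in the generated code can be sketched from the instruction encoding. The opcode bytes below are the documented Intel encodings for FLD with a memory operand (D9 /0 for 32-bit, DD /0 for 64-bit, DB /5 for 80-bit); the Python helper fld_rip_relative is a made-up name for this illustration and covers only the RIP-relative form seen in the listings in this thread.

```python
# size in bytes -> (opcode byte, /digit placed in the ModRM reg field)
FLD_MEM = {
    4: (0xD9, 0),     # REAL4  : fld dword ptr [mem]
    8: (0xDD, 0),     # REAL8  : fld qword ptr [mem]
    10: (0xDB, 5),    # REAL10 : fld tbyte ptr [mem]
}

def fld_rip_relative(size, disp32):
    """Encode 'fld <size> ptr [rip+disp32]' as raw bytes."""
    opcode, digit = FLD_MEM[size]
    modrm = (0b00 << 6) | (digit << 3) | 0b101    # mod=00, r/m=101: RIP-relative
    return bytes([opcode, modrm]) + disp32.to_bytes(4, "little", signed=True)


real8_load = fld_rip_relative(8, 0x1039).hex().upper()   # matches "DD0539100000" in the listing
real4_load = fld_rip_relative(4, 0x1039).hex().upper()
real10_load = fld_rip_relative(10, 0x1039).hex().upper()
```

The macro text stays the same for all three sizes; the assembler picks the opcode from the type of the variable passed in, which is why the REAL8 variables in the test piece all disassemble to DD 05.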

RuiLoureiro · August 14, 2018, 05:51:53 PM

Quote from: hutch-- on August 14, 2018, 01:23:41 PM
This is the first test piece with the later macros. It cannot be built as there are macros that have not been published yet but it works fine and handles 500 million iterations with no problems so the stack does not go BANG.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

include \masm32\include64\masm64rt.inc

fpinit MACRO
fninit ;; clear FPU registers and flags
fldz ;; zero st(0)
ENDM

; -------------------------------

fpdiv MACRO arg1,arg2
fld arg1
fld arg2
fdivp
ENDM

fpmul MACRO arg1,arg2
fld arg1
fld arg2
fmulp
ENDM

fpadd MACRO arg1,arg2
fld arg1
fld arg2
faddp
ENDM

fpsub MACRO arg1,arg2
fld arg1
fld arg2
fsubp
ENDM

.fpdiv MACRO arg1
fld arg1
fdivp
ENDM

.fpmul MACRO arg1
fld arg1
fmulp
ENDM

.fpadd MACRO arg1
fld arg1
faddp
ENDM

.fpsub MACRO arg1
fld arg1
fsubp
ENDM
; -------------------------------

fpsqrt MACRO number,target
fld number
fsqrt
fstp target
ENDM

.code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

LOCAL fpval :REAL8
LOCAL pbuf :QWORD
LOCAL buff[32]:BYTE

mov pbuf, ptr$(buff) ; get buffer pointer

addme = FLT8(1.0) ; statement form

;fpinit ; initialise FPU & set st(0) to 0.0
finit ; initialise FPU
; -----------------------------

mov r11, 100000000 ; 100 million iterations, 500 million macro calls

fpadd addme,addme ; st(0)=addme+addme=2*addme
@@:
.fpadd addme
.fpadd addme
.fpadd addme
; .fpadd addme
sub r11, 1
jnz @B

; -----------------------------

.fpsub FLT8(1.0) ; function form

fstp fpval ; store st(0) into variable and pop it

invoke fptoa,fpval,pbuf ; convert fpval to string
conout pbuf,lf ; display at console

waitkey
.exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end

comment #

.text:0000000140001016 DBE3 fninit
.text:0000000140001018 D9EE fldz
.text:000000014000101a 49C7C300E1F505 mov r11, 0x5f5e100
.text:0000000140001021
.text:0000000140001021 0x140001021:
.text:0000000140001021 DD0539100000 fld qword ptr [0x140002060]
.text:0000000140001027 DEC1 faddp st(1)
.text:0000000140001029 DD0531100000 fld qword ptr [0x140002060]
.text:000000014000102f DEC1 faddp st(1)
.text:0000000140001031 DD0529100000 fld qword ptr [0x140002060]
.text:0000000140001037 DEC1 faddp st(1)
.text:0000000140001039 DD0521100000 fld qword ptr [0x140002060]
.text:000000014000103f DEC1 faddp st(1)
.text:0000000140001041 DD0519100000 fld qword ptr [0x140002060]
.text:0000000140001047 DEC1 faddp st(1)
.text:0000000140001049 4983EB01 sub r11, 0x1
.text:000000014000104d 75D2 jne 0x140001021
.text:000000014000104d
.text:000000014000104f DD0513100000 fld qword ptr [0x140002068]
.text:0000000140001055 DEE9 fsubp st(1)
.text:0000000140001057 DD5D98 fstp qword ptr [rbp-0x68]

#

Hi Hutch,
This code works and it is correct, no doubt. But we never need to start with fldz, so we are adding that instruction for nothing.
If we define fpadd and fpsub the same way as fpmul and fpdiv (two arguments), plus one-argument versions .fpadd, .fpsub, .fpmul and .fpdiv, we start with a two-argument instruction and continue with one-argument instructions. That is what I did above. You may try to run it. It should work correctly, and the FPU is left clean at the end.
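
The two test loops can be compared on a toy list model to see that both leave the stack empty after the final fstp. This is a sketch only: the Python function names are invented, the list stands in for the register stack, and the iteration count is a parameter so the model can be run quickly.

```python
def add_one(stack, value):
    """One-argument form (.fpadd value): fld value, then faddp."""
    stack.insert(0, value)
    top = stack.pop(0)
    stack[0] = stack[0] + top

def add_two(stack, a, b):
    """Two-argument form (fpadd a,b): fld a, fld b, then faddp."""
    stack.insert(0, a)
    stack.insert(0, b)
    top = stack.pop(0)
    stack[0] = stack[0] + top

def sub_one(stack, value):
    """One-argument subtract: fld value, then fsubp (st(1) - st(0))."""
    stack.insert(0, value)
    top = stack.pop(0)
    stack[0] = stack[0] - top

def hutch_loop(iterations, addme=1.0):
    stack = [0.0]                          # fninit + fldz
    for _ in range(iterations):
        for _ in range(5):                 # five adds per pass
            add_one(stack, addme)
    sub_one(stack, 1.0)
    result = stack.pop(0)                  # fstp fpval
    return result, len(stack)

def rui_loop(iterations, addme=1.0):
    stack = []                             # finit only, no fldz
    add_two(stack, addme, addme)           # starts the running total at 2*addme
    for _ in range(iterations):
        for _ in range(3):                 # three adds per pass
            add_one(stack, addme)
    sub_one(stack, 1.0)
    result = stack.pop(0)                  # fstp fpval
    return result, len(stack)
```

Both versions end with a depth of zero, but they do not compute the same number: for n passes the first gives 5n - 1 and the second 3n + 1, so with 100 million passes the expected values are 499999999 and 300000001.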

hutch-- · August 14, 2018, 06:09:43 PM

Thanks Rui, I will give it a blast a bit later. This is the latest one. I think the first one is the better of the two, but the commented-out one works OK as well. I have found another use for "fldz": if the results of the following calculations are turning out wrong, place an fldz before them, and if that corrects the following result, the calculation before it needs to be fixed.

fpercent MACRO num,pcnt ;; Get Percentage of number
fld num ;; load the number
fld FLT8(100.0) ;; load the 100 divider
fdivp ;; divide num by 100
fld pcnt ;; load required percentage
fmulp ;; multiply by percentage
ENDM

; fpercent MACRO num,pcnt ;; Get Percentage of number
; fld num ;; load the number
; fld pcnt ;; load required percentage
; fmulp ;; multiply by percentage
; fld FLT8(100.0) ;; load the 100 divider
; fdivp ;; divide the product by 100
; ENDM
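
The two orderings of fpercent compute the same quantity, num * pcnt / 100, and differ only in when the division happens. A minimal sketch in Python, with function names invented for this illustration and Python doubles standing in for the x87 registers:

```python
import math

def percent_divide_first(num, pcnt):
    """Active macro: (num / 100) * pcnt."""
    return (num / 100.0) * pcnt

def percent_multiply_first(num, pcnt):
    """Commented-out macro: (num * pcnt) / 100."""
    return (num * pcnt) / 100.0


first = percent_divide_first(250.0, 20.0)      # 20 percent of 250
second = percent_multiply_first(250.0, 20.0)
agree = math.isclose(first, second)
```

For other inputs the two orderings may differ in the last bit because the intermediate result is rounded at a different point, so comparing them with a tolerance is safer than testing for exact equality.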

RuiLoureiro · August 14, 2018, 07:04:41 PM

fpercent seems to be OK, it doesn't give any problem.
Let me say that when we want to use an integer constant, we load it this way

fpconst macro cst
push cst ; e.g. 100 (a 64-bit push in 64-bit code)
fild dword ptr [rsp] ; load 100 into st(0)
pop rax ; remove from stack
endm

so we may do this also

fld num
fpconst 100
fdivp
fld pcnt
fmulp ; st(0)= (num/100)*pcnt

Another way for fpercent:

fld pcnt ; st(0)=pcnt
fld num ; st(0)=num, st(1)=pcnt
fld FLT8(100.0) ; st(0)=100.0, st(1)=num, st(2)=pcnt
fdivp ; st(0)=num/100.0, st(1)=pcnt (100.0 removed)
fmulp ; st(0)=(num/100.0)*pcnt (only the result remains)

Note: this last code seems to be better because it first loads the 3 factors and then does the operations. The macros you have written will be efficient code!
