I need to add at least some simple support for floating point in 64 bit MASM and since there are at least a few people who actually understand how the old co-processor works, I wondered if there was a better way to do these simple calculations. The criterion is to perform each calculation and leave the result popped into st(0) for further calculations. The input data has to be valid and as FP does not support immediate values, the old macros from the 32 bit version of MASM handle that OK with only some minor alignment changes.
Win64 does not specify co-processor performance or FP register usage but I have tried to keep this available for folks who want to do calculations for maths rather than video tasks.
fpadd MACRO arg1,arg2
fld arg1
fld arg2
faddp
ENDM
fpsub MACRO arg1,arg2
fld arg1
fld arg2
fsubp
ENDM
fpdiv MACRO arg1,arg2
fld arg1
fld arg2
fdivp
ENDM
fpmul MACRO arg1,arg2
fld arg1
fld arg2
fmulp
ENDM
fpsqrt MACRO number,target
fld number
fsqrt
fstp target
ENDM
I like to use the old FPU sometimes. Maybe you should follow the old FPU mnemonic conventions by having both FPMUL and FPMULP macros,
but if you already have content on the FPU stack you might also need support macros that take only one argument to make it work.
I also want to suggest an FPRCP reciprocal macro. It speeds up both FPU math and SIMD math and helps your coding, so you don't need to spend much time with a calculator typing in long reciprocal numbers.
FSQRT is great both for students writing a Pythagoras calculation program and for adding the simplest round light function on a bitmap.
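On the FPRCP suggestion, here is a minimal sketch of what such a macro might look like in the same style as the macros above. The name fprcp is hypothetical, and it assumes the argument is a REAL4 or REAL8 memory operand, since fdiv has no REAL10 memory form. Like the other macros it leaves its result in st(0):

```masm
fprcp MACRO arg          ;; hypothetical name, not part of the posted macro set
    fld1                 ;; push the constant 1.0, so st(0) = 1.0
    fdiv arg             ;; st(0) = 1.0 / arg  (arg: REAL4 or REAL8 in memory)
ENDM
```

Multiplying by the stored reciprocal can then stand in for repeated divisions by the same number.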
What I have done with these is to pop the result back into st(0) so that it is a consistent interface where one function can follow after the other.
fpadd st(0), addme
fpadd st(0), addme
fpadd st(0), addme
Quote from: hutch-- on August 13, 2018, 04:36:54 PM
fpadd MACRO arg1,arg2
fld arg1
fld arg2
faddp
ENDM
fpadd st(0), addme
fpadd st(0), addme
fpadd st(0), addme
Hutch,
I dont know what you want to do but
the last code means this:
fld st(0) ; get a copy to st(0) <<<- why to load the first st(0) ?
fld addme ; load addme
faddp ; do st(1)+st(0) = st(0)
;
fld st(0) ; get a copy to st(0) <<<- why to get a copy ?
fld addme ; load addme
faddp ; do st(1)+st(0) = st(0)
;
fld st(0) ; get a copy to st(0) <<<- why to get a copy again ?
fld addme ; load addme
faddp ; do st(1)+st(0) = st(0)
;
; So we have st(1) and st(0)=st(0)+addme+addme+addme inside FPU
; where st(1) is the first st(0). Do we need to preserve the first st(0) ?
;----------------------------------------------------------------------------------
;
; another way
;--------------
fld firstst0 ; first st(0)
fld addme ; load addme
faddp ; do st(1)+st(0) = st(0)=firstst0+addme
;
fld addme ; load addme
faddp ; do st(1)+st(0) = st(0)=firstst0+addme+addme
;
fld addme ; load addme
faddp ; do st(1)+st(0) = st(0)=firstst0+addme+addme+addme
;
; So we have only st(0) inside FPU
I don't know what you mean by old co-processors, but for the last 20 years you have been able to write:
fpadd MACRO arg1,arg2
fld arg1
fadd arg2
ENDM
fpsub MACRO arg1,arg2
fld arg1
fsub arg2
ENDM
fpdiv MACRO arg1,arg2
fld arg1
fdiv arg2
ENDM
fpmul MACRO arg1,arg2
fld arg1
fmul arg2
ENDM
:biggrin:
He he, my first processor with a co-processor was an i486 that cost me a fortune in about 1990. The co-processor has been around that long, so 28 years says it's old, but for folks who want maths rather than just video processing, it is still a very useful capacity.
Rui,
You are right but in the first place I wanted each macro to be complete dumping the result into st(0). You are correct that in continuous code you would not keep loading st(0). What I am after is making the use simple.
Quote from: hutch-- on August 13, 2018, 11:48:05 PM
He he, my first processor with a co-processor was an i486 that cost me a fortune in about 1990.
Yes, I used a software FPU emulator before the 486DX but, believe me, I don't remember if it was possible to do "fadd memory" at that time :biggrin:. (That was my GW-BASIC time, only some specific programs required an FPU.)
Quote from: hutch-- on August 13, 2018, 11:48:05 PM
:biggrin:
...
Rui,
You are right but in the first place I wanted each macro to be complete dumping the result into st(0). You are correct that in continuous code you would not keep loading st(0). What I am after is making the use simple.
Hutch,
That set of macros, I think, is useful, but now add another set with only 1 argument and the operation instruction. That way it doesn't copy st(0). Something like this:
fpadd1 macro arg1
fld arg1
fadd
endm
So we start with
finit
fpadd addme0, addme
and then we use
fpadd1 addme
fpadd1 addme ; st(0)=addme0+addme+addme+addme
and we have not any st(1) inside FPU.
When the macro name ends with 1 we know that we want to use st(0) inside FPU operated with that new instruction.
fpadd A,B
fpmul1 C ; st(0)= C*(A+B) and no st(1) inside
When we need to preserve the last st(0) we use the macro with 2 arguments.
It is only my opinion.
Rui,
Is this what you mean ? It produces the correct result and only loads st(0) at the end as you suggested..
.data
fpbuff REAL8 0.0
addme REAL8 111.111
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
LOCAL pbuf :QWORD
LOCAL buff[32]:BYTE
mov pbuf, ptr$(buff)
fld fpbuff
fld addme
faddp
fld addme
faddp
fld addme
faddp
fld st(0)
fstp fpbuff
invoke fptoa,fpbuff,pbuf
conout pbuf,lf
waitkey
.exit
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
This is the disassembly.
.text:0000000140001016 DD0540100000 fld qword ptr [0x14000205c]
.text:000000014000101c DD0542100000 fld qword ptr [0x140002064]
.text:0000000140001022 DEC1 faddp st(1)
.text:0000000140001024 DD053A100000 fld qword ptr [0x140002064]
.text:000000014000102a DEC1 faddp st(1)
.text:000000014000102c DD0532100000 fld qword ptr [0x140002064]
.text:0000000140001032 DEC1 faddp st(1)
.text:0000000140001034 D9C0 fld st(0)
.text:0000000140001036 DD1D20100000 fstp qword ptr [0x14000205c]
Quote from: hutch-- on August 14, 2018, 02:44:12 AM
Rui,
Is this what you mean ? It produces the correct result and only loads st(0) at the end as you suggested..
;----------------------------------------------------------------------
fpadd MACRO arg1, arg2
fld arg1
fld arg2
faddp
ENDM
;----------------------------------------------------------------------
fpadd1 MACRO arg1 ; let me say that i prefer fpadd and fpadd1
fld arg1
faddp
ENDM
;+++++++++++++++++++++++++++++++++++++++
.data
fpbuff REAL8 0.0 ; we dont need to load this variable. If we want 0.0 we use fldz
; <--- this is only the output variable
addme REAL8 111.111 ; <<<--- this is the input variable
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
LOCAL pbuf :QWORD
LOCAL buff[32]:BYTE
mov pbuf, ptr$(buff)
finit
;fld fpbuff
;fld addme
;faddp
; start and add 2 arguments
;------------------------------
fpadd addme, addme ; st(0)= 2*addme
;fld addme
;faddp
; now add another argument
;-------------------------------
fpadd1 addme ; st(0)= 3*addme
;fld addme
;faddp
;fld st(0) <<<<- THIS HERE CREATE a new st(1)
; we may do it only if we need to go on with another operation
; that needs this value
;----remove st(0) to fpbuff and the FPU is cleaned------
fstp fpbuff
invoke fptoa,fpbuff, pbuf ; <<<--- convert to pbuf
conout pbuf,lf
waitkey
.exit
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
No it was not my suggestion. Try now.
fpadd MACRO arg1,arg2
ifdifi <arg1>, <ST(0)>
if type(arg1) eq REAL4 or type(arg1) eq REAL8 or type(arg1) eq REAL10
fld arg1
else
fild arg1
endif
endif
if type(arg2) eq REAL4 or type(arg2) eq REAL8
fadd arg2 ; <<<<<<<<<<<<<< see remark by HSE above
elseif type(arg2) eq REAL10
.err <REAL10 not allowed for second arg>
else
fiadd arg2
endif
ENDM
Only ST(7) is being used. Test code (MyDDa and MyDDb are dwords, MyR4 is 1000.0):
fpadd MyR8a, MyR8b
Print Str$("MyR8a+MyR8b=%f\n", ST(0)v)
fpadd MyR8a, MyDDb
Print Str$("MyR8a+MyDDb=%f\n", ST(0)v)
fpadd MyDDa, MyR8b
Print Str$("MyDDa+MyR8b=%f\n", ST(0)v)
fpadd MyDDa, MyDDb
Print Str$("MyDDa+MyDDb=%f\n", ST(0))
fpadd ST(0), MyR4
Print Str$("ST(0)+MyR4= %f\n", ST(0)v)
Results:
MyR8a+MyR8b=777.7770
MyR8a+MyDDb=777.4560
MyDDa+MyR8b=777.3210
MyDDa+MyDDb=777.0000
ST(0)+MyR4= 7777.778
Testbed attached - MasmBasic, sorry.
Quote from: hutch-- on August 13, 2018, 05:25:12 PM
fpadd st(0), addme
Implemented, but the syntax is longer than fadd addme - a matter of taste maybe ;)
Rui,
I tend to try things incrementally, first try was to remove the unnecessary st(0) loads which was your suggestion. Just remember I have not used this stuff for many years.
I will try out more of your suggestions as I find the instructions.
JJ,
WTF ?
Here is the next try. Removal of redundant variable, fninit to reset FPU (did not see the point of the fwait) and fldz as it is more efficient than loading the memory operand. The dot prefix in the two macros is only so it does not clash with existing.
.ldst0 MACRO var
fld st(0) ;; load st(0)
fstp var ;; store it in variable
ENDM
.fpadd MACRO arg1
fld arg1
faddp
ENDM
.data
addme REAL8 111.111
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
LOCAL pbuf :QWORD
LOCAL buff[32]:BYTE
LOCAL fpval :REAL8
mov pbuf, ptr$(buff) ; get buffer pointer
fninit ; clear FPU registers and flags
fldz ; zero st(0)
.fpadd addme ; add value to st(0)
.fpadd addme
.fpadd addme
.ldst0 fpval ; load st(0) into variable
invoke fptoa,fpval,pbuf ; convert fpval to string
conout pbuf,lf ; display at console
waitkey
.exit
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
comment #
.text:0000000140001016 DBE3 fninit
.text:0000000140001018 D9EE fldz
.text:000000014000101a DD053C100000 fld qword ptr [0x14000205c]
.text:0000000140001020 DEC1 faddp st(1)
.text:0000000140001022 DD0534100000 fld qword ptr [0x14000205c]
.text:0000000140001028 DEC1 faddp st(1)
.text:000000014000102a DD052C100000 fld qword ptr [0x14000205c]
.text:0000000140001030 DEC1 faddp st(1)
.text:0000000140001032 D9C0 fld st(0)
.text:0000000140001034 DD9D70FFFFFF fstp qword ptr [rbp-0x90]
#
Quote from: hutch-- on August 14, 2018, 03:46:13 AM
Rui,
I tend to try things incrementally, first try was to remove the unnecessary st(0) loads which was your suggestion. Just remember I have not used this stuff for many years.
I will try out more of your suggestions as I find the instructions.
JJ,
WTF ?
:biggrin:
No problems, i am trying to give suggestions but ... no more.
About fpsqrt sometimes we need to do sqrt of st(0).
For example, to solve the equation A.x^2+B.x+C=0 we need to do sqrt(B^2-4.A.C). So after we have
st(0)=B^2-4.A.C we need to do sqrt of st(0).
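As a rough sketch of that case, the fragment below builds B^2-4.A.C on the FPU stack and then takes the square root of st(0) in place. The variable names coefA, coefB, coefC and four are made up for the illustration, and it assumes the FPU stack is empty on entry:

```masm
.data
  coefA REAL8 1.0            ; illustrative coefficients only
  coefB REAL8 -3.0
  coefC REAL8 2.0
  four  REAL8 4.0
.code
  fld   coefB                ; st(0) = B
  fmul  st(0), st(0)         ; st(0) = B^2
  fld   coefA                ; st(0) = A,      st(1) = B^2
  fmul  coefC                ; st(0) = A*C
  fmul  four                 ; st(0) = 4*A*C
  fsubp st(1), st(0)         ; st(0) = B^2 - 4*A*C, one entry left on the stack
  fsqrt                      ; st(0) = sqrt(B^2 - 4*A*C)
```

With these sample values the discriminant is 1.0. A negative discriminant would make fsqrt return a NaN, so real code would test the sign before taking the root.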
@Hutch:
Why not:
fld fpbuff
fadd addme
fadd addme
fadd addme
fld st(0) ; here you also are pushing st(0) to st(1)
fstp fpbuff
fstp ; <-- you forgot this (it pops the value that was in st(1))
@JJ:
I agree. It's possible to replace
elseif type(arg2) eq REAL10
.err <REAL10 not allowed for second arg>
with
elseif type(arg2) eq REAL10
fld arg2
faddp ; or fsubp
But it is not recommended to use REAL10 for variables. The idea is to use it only to store FPU state.
Quote from: hutch-- on August 14, 2018, 04:24:42 AM
Here is the next try. Removal of redundant variable, fninit to reset FPU (did not see the point of the fwait) and fldz as it is more efficient than loading the memory operand. The dot prefix in the two macros is only so it does not clash with existing.
.ldst0 MACRO var
fld st(0) ;; load st(0) <<<- get/make a copy of st(0)
;; the previous is now st(1)=st(0)
fstp var ;; store it in variable <<<-- and remove st(0)
;; the previous st(1) is now st(0)
ENDM
.fpadd MACRO arg1
fld arg1
faddp
ENDM
.data
addme REAL8 111.111
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
LOCAL pbuf :QWORD
LOCAL buff[32]:BYTE
LOCAL fpval :REAL8
mov pbuf, ptr$(buff) ; get buffer pointer <<<< YES
fninit ; clear FPU registers and flags <<<< YES
fldz ; zero st(0) <<<< YES
.fpadd addme ; add value to st(0) <<<< YES
.fpadd addme
.fpadd addme
;-----------------------------------------------------------
; This is "get a copy of st(0)" and store it into variable
;-----------------------------------------------------------
.ldst0 fpval ; load st(0) into variable
;----------------------------------------------------------
; HERE the FPU has another equal st(0) inside
; to remove it we do: «fstp st»
;----------------------------------------------------------
invoke fptoa,fpval,pbuf ; convert fpval to string <<<< YES
conout pbuf,lf ; display at console <<<< YES
waitkey
.exit
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
comment #
.text:0000000140001016 DBE3 fninit
.text:0000000140001018 D9EE fldz
.text:000000014000101a DD053C100000 fld qword ptr [0x14000205c]
.text:0000000140001020 DEC1 faddp st(1)
.text:0000000140001022 DD0534100000 fld qword ptr [0x14000205c]
.text:0000000140001028 DEC1 faddp st(1)
.text:000000014000102a DD052C100000 fld qword ptr [0x14000205c]
.text:0000000140001030 DEC1 faddp st(1)
.text:0000000140001032 D9C0 fld st(0)
.text:0000000140001034 DD9D70FFFFFF fstp qword ptr [rbp-0x90]
#
Thats OK with a little problem :t
Hutch!! Why You are not sleeping? :biggrin:
.ldst0 MACRO var
fld st(0) ;; load st(0)
fstp var ;; store it in variable
ENDM
It's just:
fst var ; store st(0) in variable
:biggrin:
He he, I will be shortly.
Rui,
Is this what you meant ?
.ldst0 MACRO var
fld st(0) ;; load st(0)
fstp var ;; store it in variable
fstp st(0) ;; pop st(0)
ENDM
Quote from: hutch-- on August 14, 2018, 04:56:03 AM
:biggrin:
He he, I will be shortly.
Rui,
Is this what you meant ?
.ldst0 MACRO var
fld st(0) ;; load st(0) <<<<<-- the best way is "get a copy of st(0)"
;; or load the current st(0) to a new st(0). The previous is now
;; st(1) and is equal to st(0)
fstp var ;; store it in variable
fstp st(0) ;; pop st(0) <<<<-- or remove the current st(0).
ENDM
No, we need to use only fstp var and not .ldst0. We don't need to add fld st(0) and then do fstp st(0) at the end. So use only fstp var.
fld st(0) is a trap because we think that we load st(0) etc. etc. But at the time we write fld st(0) there is already an st(0), and fld st(0) makes a copy of it in a new st(0). So the previous st(0) is now st(1) and is equal to st(0) after fld st(0).
Remember that when we need a copy of st(1), st(2) etc., we do fld st(1), fld st(2) etc.
The copy of the previous st(1) or st(2) is then the new st(0), the previous st(0) is now st(1), st(1) is now st(2), st(2) is now st(3) etc. etc.
About the code, we don't need to start with fldz. The better way is to start with fpadd addme, addme and next .fpadd addme.
fldz is used when we want to compare st(0) with 0.0 (or another register).
Don't forget also that when we remove one st(0), the previous st(1) is the new st(0). If there was no previous st(1), the FPU is cleaned (no variables inside).
There is no purpose in making a macro to load or to store st(0). It makes a little more sense if you are thinking of retrieving another FPU register:
.ldst MACRO register, var
fld st(&register) ;; load st(?)
fstp var ;; store it in variable
ENDM
Use:
fst var1 ; better than .ldst 0, var1
.ldst 1, var2
.ldst 2, var3
The final fstp is not in the macro.
Normal code:
fld fpbuff
fadd addme
fadd addme
fadd addme
fstp fpbuff
Not normal code (but no error):
fld fpbuff
fadd addme
fadd addme
fadd addme
fst fpbuff
fstp
Strange Hutch idea:
fld fpbuff
fadd addme
fadd addme
fadd addme
.ldst 0, fpbuff
fstp ; most assemblers know that this is fstp st(0)
Quote from: hutch-- on August 14, 2018, 03:46:13 AM
WTF ?
Test them. Same syntax as the original version, just a bit more versatile because you can use REAL* and/or dword variables.
I think I have got the swing of most of it, effectively the x87 register stack functions like a circular buffer and the trick is to ensure that you do not imbalance the FPU stack. When I am a little more awake I will add the version of the test code that looks like it's close to being reasonably efficient for what need to be simple-to-use macros.
To Rui and HSE, thank you both for your assistance in blowing out the cobwebs, I looked at the date of the last FP stuff I did and it's around the year 2000 so it really has been a long time since I wrote any x87 code.
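One way to check the balance while testing is to read the TOP field of the x87 status word, which sits in bits 11 to 13 and moves down by one on every push. The macro below is only a sketch under stated assumptions: the name fpdepth is hypothetical, it clobbers AX, and it assumes the FPU was reset with fninit (TOP = 0) before the code under test:

```masm
fpdepth MACRO            ;; hypothetical test helper, result in AX
    fnstsw ax            ;; AX = x87 status word
    shr    ax, 11        ;; bring TOP (bits 11-13) down to bits 0-2
    neg    ax            ;; negate so that a lower TOP counts as a deeper stack
    and    ax, 7         ;; AX = (8 - TOP) mod 8 = registers currently pushed
ENDM

.data
  addme REAL8 111.111
.code
  fninit                 ; TOP = 0, all registers empty
  fld    addme           ; one value pushed
  fpdepth                ; AX = 1
  fstp   st(0)           ; pop it again
  fpdepth                ; AX = 0, stack balanced
```

Because the count wraps modulo 8, a completely full stack also reads as 0, so this is a debugging aid rather than a complete check.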
Hutch,
Sorry for being a bit harsh, but unfortunately you are playing with fire trying to "use" the FPU without knowing what you are doing.
One of the biggest traps for such action is that FPU registers are very different from the ALU registers: they CANNOT BE OVERWRITTEN with new data except in very specific circumstances. Trying to load new data when all 8 registers are full would result in generating GARBAGE. That is why keeping track of register usage is extremely important and offering macros which would leave results on the FPU may not be the best tool for FPU newbies.
Emulating what HLLs do with floats would be a better idea, i.e. do calculations on data from memory and return the result immediately to a memory variable, leaving all FPU registers EMPTY at the termination of each macro. The other option is to insert a 'slow' finit at the start of each macro to ensure that the user will never have any problem.
If you intend to continue with this project, you may want to:
i) have at least a quick glance at the tutorial you had asked me to prepare many moons ago,
ii) consider including the use of floats interacting with integers,
iii) design macros which will cater for the multitude of combinations of the size of each variable.
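The memory-to-memory style described above could look something like the following sketch. The macro name fpaddm is hypothetical, and it assumes arg2 is a REAL4 or REAL8 variable because fadd has no REAL10 memory form. The point is that the FPU stack is at the same depth on exit as on entry:

```masm
fpaddm MACRO result, arg1, arg2   ;; hypothetical name
    fld  arg1            ;; st(0) = arg1
    fadd arg2            ;; st(0) = arg1 + arg2 (arg2: REAL4 or REAL8)
    fstp result          ;; store to memory and pop, stack depth unchanged
ENDM

.data
  val1  REAL8 111.111
  val2  REAL8 222.222
  total REAL8 0.0
.code
  fpaddm total, val1, val2   ; total = val1 + val2, nothing left on the FPU
```

The trade-off is the one discussed in this thread: every call pays for a store and the next call pays for a reload, in exchange for never leaving stray values in the registers.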
Hi Ray,
I have already taken option 1, have the link set up in my browser. The only 64 bit conversions I could find in the C runtime are set at REAL8 so at the moment I have 2 functions, atofp and fptoa that successfully convert REAL8 in both directions. As the x87 capacity is not specified in Win64, my choice is to try and use it OR simply ignore it and as x87 capacity is useful for people who want floating point maths rather than video, it's probably worth a try to get at least some simple macros going.
I am already using the 8 MMX registers for other tasks as MMX is a redundant technology and the methods of using the shared registers are straightforward enough but with a choice of "play with fire" or ignore x87 code, I will at least give it a try as no one else is going to do it in 64 bit MASM. For what it's worth, the macros and support code are testing up OK at the moment, both Rui and HSE have been very helpful in tidying up the test pieces I have posted and once I have a bit more code up and running, I will give it a serious bashing to make sure it works correctly.
This is the first test piece with the later macros. It cannot be built as there are macros that have not been published yet but it works fine and handles 500 million iterations with no problems so the stack does not go BANG.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
fpinit MACRO
fninit ;; clear FPU registers and flags
fldz ;; zero st(0)
ENDM
; -------------------------------
fpdiv MACRO arg1,arg2
fld arg1
fld arg2
fdivp
ENDM
fpmul MACRO arg1,arg2
fld arg1
fld arg2
fmulp
ENDM
fpadd MACRO arg1
fld arg1
faddp
ENDM
fpsub MACRO arg1
fld arg1
fsubp
ENDM
; -------------------------------
fpsqrt MACRO number,target
fld number
fsqrt
fstp target
ENDM
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
LOCAL fpval :REAL8
LOCAL pbuf :QWORD
LOCAL buff[32]:BYTE
mov pbuf, ptr$(buff) ; get buffer pointer
addme = FLT8(1.0) ; statement form
fpinit ; initialise FPU & set st(0) to 0.0
; -----------------------------
mov r11, 100000000 ; 100 million iterations, 500 million macro calls
@@:
fpadd addme ; add value to st(0)
fpadd addme
fpadd addme
fpadd addme
fpadd addme
sub r11, 1
jnz @B
; -----------------------------
fpsub FLT8(1.0) ; function form
fstp fpval ; store st(0) into variable and pop
invoke fptoa,fpval,pbuf ; convert fpval to string
conout pbuf,lf ; display at console
waitkey
.exit
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
comment #
.text:0000000140001016 DBE3 fninit
.text:0000000140001018 D9EE fldz
.text:000000014000101a 49C7C300E1F505 mov r11, 0x5f5e100
.text:0000000140001021
.text:0000000140001021 0x140001021:
.text:0000000140001021 DD0539100000 fld qword ptr [0x140002060]
.text:0000000140001027 DEC1 faddp st(1)
.text:0000000140001029 DD0531100000 fld qword ptr [0x140002060]
.text:000000014000102f DEC1 faddp st(1)
.text:0000000140001031 DD0529100000 fld qword ptr [0x140002060]
.text:0000000140001037 DEC1 faddp st(1)
.text:0000000140001039 DD0521100000 fld qword ptr [0x140002060]
.text:000000014000103f DEC1 faddp st(1)
.text:0000000140001041 DD0519100000 fld qword ptr [0x140002060]
.text:0000000140001047 DEC1 faddp st(1)
.text:0000000140001049 4983EB01 sub r11, 0x1
.text:000000014000104d 75D2 jne 0x140001021
.text:000000014000104d
.text:000000014000104f DD0513100000 fld qword ptr [0x140002068]
.text:0000000140001055 DEE9 fsubp st(1)
.text:0000000140001057 DD5D98 fstp qword ptr [rbp-0x68]
#
Quote
fpdiv MACRO arg1,arg2
fld arg1
fld arg2
fdivp
ENDM
How will the assembler know the actual size of arg1 and/or arg2 (i.e. REAL4, REAL8 or REAL10) in order to produce the appropriate code?
Or is it your intention to limit the use to only one specific size? And where would that be specified in clear terms for someone who may know absolutely nothing about floats apart from that it contains a decimal point followed by some decimal digits?
I thought that would be obvious, FLD works on REAL4, REAL8 and REAL10, the macro does not need the size information. The data item size is determined by how the argument is produced. Now given that FP does not support immediate values, an immediate value must be written in the initialised data section to a data variable where you must specify the size.
You can write everything else as LOCAL values, "LOCAL var :REAL10".
The only data size limitation I have at the moment is the C runtime conversions that only handle REAL4 and REAL8 but have no 80 bit support.
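A short fragment to illustrate the point about sizes. The variable names are invented for the example. The assembler picks the operand size of each instruction from the declared type of the variable, and the only wrinkle is that fadd (like fsub, fmul and fdiv) has memory forms for REAL4 and REAL8 but not REAL10, so an 80 bit value has to be loaded first:

```masm
.data
  r4  REAL4  1.5             ; illustrative values only
  r8  REAL8  1.5
  r10 REAL10 1.5
.code
  fld   r4                   ; assembled as a 32 bit load, st(0) = 1.5
  fadd  r8                   ; 64 bit memory operand, st(0) = 3.0
  fld   r10                  ; 80 bit load, REAL10 has to go through the stack
  faddp st(1), st(0)         ; st(0) = 4.5, one entry left
  fstp  r8                   ; store as REAL8 and leave the FPU empty
```

A line such as fadd r10 would be rejected by the assembler, which is the case the REAL10 check in the type-testing version of fpadd earlier in the thread guards against.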
Quote from: hutch-- on August 14, 2018, 01:23:41 PM
This is the first test piece with the later macros. It cannot be built as there are macros that have not been published yet but it works fine and handles 500 million iterations with no problems so the stack does not go BANG.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
fpinit MACRO
fninit ;; clear FPU registers and flags
fldz ;; zero st(0)
ENDM
; -------------------------------
fpdiv MACRO arg1,arg2
fld arg1
fld arg2
fdivp
ENDM
fpmul MACRO arg1,arg2
fld arg1
fld arg2
fmulp
ENDM
fpadd MACRO arg1,arg2
fld arg1
fld arg2
faddp
ENDM
fpsub MACRO arg1,arg2
fld arg1
fld arg2
fsubp
ENDM
.fpdiv MACRO arg1
fld arg1
fdivp
ENDM
.fpmul MACRO arg1
fld arg1
fmulp
ENDM
.fpadd MACRO arg1
fld arg1
faddp
ENDM
.fpsub MACRO arg1
fld arg1
fsubp
ENDM
; -------------------------------
fpsqrt MACRO number,target
fld number
fsqrt
fstp target
ENDM
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
LOCAL fpval :REAL8
LOCAL pbuf :QWORD
LOCAL buff[32]:BYTE
mov pbuf, ptr$(buff) ; get buffer pointer
addme = FLT8(1.0) ; statement form
;fpinit ; initialise FPU & set st(0) to 0.0
finit ; initialise FPU
; -----------------------------
mov r11, 100000000 ; 100 million iterations, 500 million macro calls
fpadd addme,addme ; st(0)=addme+addme=2*addme
@@:
.fpadd addme
.fpadd addme
.fpadd addme
; .fpadd addme
sub r11, 1
jnz @B
; -----------------------------
.fpsub FLT8(1.0) ; function form
fstp fpval ; store st(0) into variable and pop
invoke fptoa,fpval,pbuf ; convert fpval to string
conout pbuf,lf ; display at console
waitkey
.exit
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
Hi Hutch,
This code works correctly and it is correct. No doubts. But we never need to start with fldz in any case. So we are adding that instruction for nothing.
If we define fpadd and fpsub the same way as fpmul and fpdiv, plus .fpadd, .fpsub, .fpmul, .fpdiv with one argument, we start with instructions of 2 arguments and go on with instructions of 1 argument. It is what I did above. You may try to run it. It should work correctly and the FPU is cleaned at the end.
Thanks Rui, I will give it a blast a bit later. This is the latest one. I think the first one is the better of the two but the commented out one works OK as well. I have found another use for "fldz": if the results of following calculations are turning out wrong, place a fldz before them, and if it corrects the following result, the calculation before it needs to be fixed.
fpercent MACRO num,pcnt ;; Get Percentage of number
fld num ;; load the number
fld FLT8(100.0) ;; load the 100 divider
fdivp ;; divide num by 100
fld pcnt ;; load required percentage
fmulp ;; multiply by percentage
ENDM
; fpercent MACRO num,pcnt ;; Get Percentage of number
; fld num ;; load the number
; fld pcnt ;; load required percentage
; fmulp ;; multiply by percentage
; fld FLT8(100.0) ;; load the 100 divider
; fdivp ;; divide the product by 100
; ENDM
fpercent seems to be OK, it doesn't give any problem.
Let me say that when we want to use an integer constant we load it this way
fpconst macro cst
push cst ;100
fild dword ptr [esp] ; load 100 into st(0)
pop eax ; remove from stack
endm
so we may do this also
fld num
fpconst 100
fdivp
fld pcnt
fmulp ; st(0)= (num/100)*pcnt
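The push / [esp] / pop eax form above is 32 bit code. In a 64 bit build one possible equivalent, sketched here under assumptions, avoids touching the stack pointer and places the integer in the data section instead. The macro name fpconst64 is hypothetical, and it relies on MASM allowing a switch between .data and .code inside a macro:

```masm
fpconst64 MACRO cst      ;; hypothetical 64 bit variant
    LOCAL ival           ;; unique label for each expansion
    .data
      ival DWORD cst     ;; the integer constant, stored as a 32 bit value
    .code
    fild ival            ;; st(0) = cst converted to floating point
ENDM

.data
  num   REAL8 250.0
  pcnt  REAL8 10.0
.code
  fld   num
  fpconst64 100
  fdivp                  ; st(0) = num/100
  fld   pcnt
  fmulp                  ; st(0) = (num/100)*pcnt
```

Each expansion adds four bytes of data, so a constant used in many places is better declared once as a named variable.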
Another way for fpercent:
fld pcnt ; load pcnt <<<<<< st(2) >> st(1) >>> removed
fld num ; load num <<<<<< st(1) >>removed
fld FLT8(100.0) ; load constant 100 << st(0) >>removed
fdivp ; st(0)= num/100.0 >>>>>>> st(0) >>> removed
fmulp ; st(0)= (num/100.0)*pcnt >>>>>>>>>>>> st(0)
note: this last code seems to be better because first it loads the 3 factors and then we do the operations. The macros you have written will be efficient code ! :biggrin:
This is the set I will probably go with, they are all testing up OK and are faster than I remembered FP code from long ago. I am in debt to Rui and HSE for their assistance and to Ray with his reference work that has been very useful. I have been eyeing off some SSE for the simple add, sub, mul and div operations and they generally look good.
fpinit MACRO ;; initialise the x87 co-processor
fninit
fldz
ENDM
fpdiv MACRO arg1,arg2 ;; divide arg1 by arg2
fld arg1
fld arg2
fdivp
ENDM
fpmul MACRO arg1,arg2 ;; multiply arg1 and arg2 together
fld arg1
fld arg2
fmulp
ENDM
fpadd MACRO arg ;; add a number
fld arg
faddp
ENDM
fpsub MACRO arg ;; subtract a number
fld arg
fsubp
ENDM
fpsqrt MACRO number ;; square root of number
fld number
fsqrt
ENDM
fpsqrd MACRO number ;; number squared
fld number
fld number
fmulp
ENDM
fpercent MACRO num,pcnt ;; Get Percentage of number
fld num
fld FLT8(100.0)
fdivp
fld pcnt
fmulp
ENDM
; ----------------------------
; assign result to FP variable
; ****************************
; fstp variable_name
; ****************************
Hutch, about fpinit or the problems with fpinit, take a look at this here (reply #1):
http://masm32.com/board/index.php?topic=7352.0
Remember that we may start with fpmul or fpdiv and not with fpadd or fpsub.
If we start with fpinit and then with fpmul we have problems... Repeat some block of code 10 times. Is the result always correct ? Yes it is: the FPU has st(0)=0.0.
About fpsqrd macro we dont need to load the number 2 times but only 1 time:
fld number ; load number
fld st(0) ; get/make a copy of number
fmulp ; st(0)= number ^2
There is a second format of fadd/fsub/fmul/fdiv Dst,Src as noted by Raymond
So it should be
fld number
fmul st,st ; st means st(0)
Rui,
I gather with the example that fld st(0) is faster than fld memory ?
I will look at the other later, I am a bit too tired to write another example.
Quote from: hutch-- on August 15, 2018, 03:13:30 AM
Rui,
I gather with the example that fld st(0) is faster than fld memory ?
I will look at the other later, I am a bit too tired to write another example.
First, it was very nice to work with you :t
It seems to be faster because the number is inside FPU it is not loaded again.
But HSE may test it. He did it in another topic. When he read this i guess he will help us
to know what is the best way.
See you
Good catch Rui, here is the changed macro, will take 1 or 2 arguments.
fpadd MACRO arg1, arg2
fld arg1
IFNB <arg2>
fld arg2
ENDIF
faddp
ENDM
This works fine.
fpadd val1,val2
fpadd val2
fpsub val1
fstp fpval
:biggrin: What a nightmare!
fpinit MACRO ;; initialise the x87 co-processor
fninit
fldz
HutchsoniansFP = 1
ENDM
fpadd MACRO arg1, arg2
fld arg1
IFNB <arg2>
fld arg2
faddp ; this is arg1+arg2
ENDIF
if HutchsoniansFP ; if there is zero in st(0) [ or something else]
faddp ; this (arg1[+arg2]) + original st(0) [now in st(1)]
endif
ENDM
fpclose macro
HutchsoniansFP =0
endm
:biggrin:
Why do I get the impression that this last post was not all that serious ? :P
Quote
fld number ; load number
fld st(0) ; get/make a copy of number
fmulp ; st(0)= number ^2
Even more simple, there's not even any need to make a second copy in another FPU register:
fld number ; load number
fmul st,st ; st(0)= number ^2
I still don't like the idea of leaving data in FPU registers with macros. In my opinion, the risk of generating garbage is too high for unaware users. Results should be stored immediately in a memory variable defined by the user in an additional arg.
Quote from: hutch-- on August 15, 2018, 04:18:36 AM
Why do I get the impression that this last post was not all that serious ?
You can change the names :biggrin:
But there is a problem. I will change.
Ray,
The target market for 64 bit MASM is different to the 32 bit version, it is not recommended to beginners at all but folks who already know how to write 32 bit MASM code. The difference with macros is the reference material and its easy enough to specify a "fstp variable" when the data needs to be placed in a variable but the more efficient form without redundant loads and stores is in the direction that many who use legacy code like this want.
I am just about clapped out and ready to sleep but I will have a look at your suggestion when I get up later today.
Quote from: hutch-- on August 15, 2018, 04:18:36 AM
:biggrin:
Why do I get the impression that this last post was not all that serious ? :P
Hutch,
I guess that HSE is kidding about your idea of fpinit. It is not usual to start the FPU with finit and then load 0 into st(0). What happens if we use it, then fpmul, then fstp var? We exit and the FPU is not cleaned: 0.0 is still in st(0). That is the only issue; there is no other problem, and all the macros seem to work correctly.
note: when I have my new i7 I will test all possible cases.
Hi Rui!
Quote from: RuiLoureiro on August 15, 2018, 06:47:23 AM
I guess that HSE is kidding with your idea of fpinit.
Just trying to guess what Hutch is doing.
The fldz in fpinit is a problem if you don't need it.
There are two types of macros:
1) Those that don't need a non-empty st(0) and leave one additional non-empty st()
fpmul MACRO arg1,arg2 ;; multiply arg1 and arg2 together
fld arg1
fld arg2
fmulp
ENDM
2) Those that need a non-empty st(0) and don't change the number of non-empty st()
fpadd MACRO arg ;; add a number
fld arg
faddp
ENDM
Making two sets of macros is a possible solution, mmm
Meanwhile I'm trying to solve some problems calculating the adaptation value of vectors for a Genetic Algorithm, which is really slow with so many debugging messages. :(
Hi HSE !
No problem at all. We may joke about these things. It is fun!
Good luck with your work :t
The reason why I have used "fldz" is due to testing. When I have a simple test piece that I know works correctly, I place another calculation before it and test whether the second calculation still gets the same result. You can simply turn this on and off with "fldz" between them.
calculation
fldz ; affect the following calculation here by commenting in or out
calculation
If the first calculation is sound, the second does not change. If it does change when fldz is commented in or out, then the first calculation has a stack error.
RE: The fpinit macro, I am yet to see what the problem is with using fldz. If I comment out fldz I get incorrect results.
Now as far as the use of fldz goes, I can't find a test that shows any problem unless it is the 2 bytes that it takes up. Here is the code from a 64 bit MASM test piece that tests with the fldz in or out, and I can find no difference apart from the 2 bytes, which I don't lose any sleep over.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
LOCAL fvar1 :REAL8
LOCAL fvar2 :REAL8
LOCAL pbuf :QWORD
LOCAL buff[32]:BYTE
LOCAL rslt :REAL8
mov pbuf, ptr$(buff)
fninit
fldz ; 2 bytes
mrm fvar1, FLT8(100.0)
mrm fvar2, FLT8(100.0)
fld fvar2
fmul fvar1
fstp rslt
invoke fptoa,rslt,pbuf ; convert rslt to string
conout pbuf,lf ; display at console
waitkey
.exit
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Quote from: hutch-- on August 15, 2018, 12:30:14 PM
I am yet to see what the problem is using fldz. If I comment out fldz I get incorrect results.
Which definitely means there is a problem. Can you post an exe that shows these incorrect results?
It meant that the calculation before it did not clean up the stack; with the fldz uncommented, the error in the first did not affect the second calculation.
Hi Hutch!
Quote from: hutch-- on August 15, 2018, 12:30:14 PM
The reason why I have used "fldz" is due to testing
No problem then. But remember that outside of testing you will have one more register occupied. As Raymond says, macros hide how many FPU registers are in use.
:t
another one
.const
fpercent MACRO num,pcnt ;; Get Percentage of number
.const
F8_100 real8 100.0
.code
fld num ;; load the number
fdiv F8_100
fld pcnt ;; load required percentage
fmulp ;; multiply by percentage
EXITM <st(0)>
ENDM
.code
fld fpercent(FLT8(33.333),FLT8(90.0))
Quote from: hutch-- on August 15, 2018, 05:47:44 PM
It meant that the calculation before it did not clean up the stack; with the fldz uncommented, the error in the first did not affect the second calculation.
Hi Hutch,
I understood why you use fldz and there is no problem in using that fpinit. The FPU has 8 registers: st(7),st(6),st(5),st(4),st(3),st(2),st(1),st(0). If we use fpinit we may have 8 registers to work with (if we start with fpadd or fpsub with 1 argument) or only 7: [ st(6),st(5),st(4),st(3),st(2),st(1),st(0) ], but we cannot get any error due to that fldz. I would like to know how we could get an error if we use the macros correctly with normal arguments. This is what I think.
Quote from: Siekmanski on August 16, 2018, 12:29:16 AM
another one
.const
fpercent MACRO num,pcnt ;; Get Percentage of number
.const
F8_100 real8 100.0
.code
fld pcnt ;; st(1)
fld num ;; load the number
fdiv F8_100 ;; st(0)= num/100
;fld pcnt ;; load required percentage
fmulp ;; st(0)= multiply by percentage
EXITM <st(0)>
ENDM
.code
fld fpercent(FLT8(33.333),FLT8(90.0))
Or this one
Quote from: RuiLoureiro on August 16, 2018, 12:39:07 AM
Quote from: hutch-- on August 15, 2018, 05:47:44 PM
It meant that the calculation before it did not clean up the stack; with the fldz uncommented, the error in the first did not affect the second calculation.
Hi Hutch,
I understood why you use fldz and there is no problem in using that fpinit. The FPU has 8 registers: st(7),st(6),st(5),st(4),st(3),st(2),st(1),st(0). If we use fpinit we may have 8 registers to work with (if we start with fpadd or fpsub with 1 argument) or only 7: [ st(6),st(5),st(4),st(3),st(2),st(1),st(0) ], but we cannot get any error due to that fldz. I would like to know how we could get an error if we use the macros correctly with normal arguments. This is what I think.
FDECSTP/FINCSTP works together with MMX on one CPU at least, but maybe it is something you can't trust on all CPUs?
FPRCP macro x ;reciprocal macro with full precision compared to SSE RCP** instructions
fld1
fdiv x
endm
Marinus, why don't you FMUL by 0.01 instead? It is faster. I don't know 64 bit; if it works the same way, is using FIDIV with a WORD 100 smaller?
If you really want the smallest floating point code, isn't the FPU the best choice? First, use st0-7 as much as possible, coding with st0, st1 etc. for specific uses; second, single-step through the debugger to check that the right values get used. In 32 bit mode you get 2 bytes per FPU opcode compared to 3-4 bytes for SSE etc.
Re fldz: It normally does no harm if you don't fully use the fpu. The point here is that ST(0) is not "cleared" by fldz; it is indeed set to zero, and therefore contains a valid number. If subsequently you load 7 more values onto the fpu, then all 8 registers are occupied and valid. Now if you try to fld one more, it chokes!
Normally you don't load 7 more values, so that is a rare scenario. But you can reach this limit very quickly in a loop, if that loop doesn't respect the rule that at the end of an operation the fpu should return to its initial state.
An elegant way to test this is as follows:
finit ; optional, the OS does it anyway
fldpi
... your code ...
int 3
When you hit the breakpoint, and ST(0) does not contain 3.14159, then you have a bug.
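The same sentinel idea can also be checked in code rather than by eye in the debugger. This is only a sketch under assumptions: the labels stack_bug and stack_ok are made up, fcomip needs a P6 or later CPU (and .686 in 32-bit ML), and it relies on the default state where FPU exceptions are masked, so a compare against an empty register reports "unordered" instead of raising an exception.

```asm
    fldpi                   ; sentinel in st(0)
    ; ... code under test, which must leave the stack depth unchanged ...
    fldpi                   ; fresh pi in st(0), the sentinel should now be st(1)
    fcomip st(0), st(1)     ; compare, pop the fresh copy, set ZF/PF/CF
    jp   stack_bug          ; unordered: st(1) was empty or held a NaN
    jne  stack_bug          ; not equal: something else sits where pi was
    fstp st(0)              ; drop the sentinel, stack is back to its start
    jmp  stack_ok
stack_bug:
    int 3                   ; hypothetical reaction: break into the debugger
stack_ok:
```

A leaked register shows up as "not equal", and an extra pop shows up as "unordered".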
Magnus you are right, totally overlooked the reciprocal technique. :t
And in Rui's order.
fpercent MACRO num,pcnt ;; Get Percentage of number
.const
F8_100 real8 0.01
.code
fld pcnt ;; load required percentage
fld num ;; load the number
fmul F8_100
fmulp ;; multiply by percentage
EXITM <st(0)>
ENDM
.code
fld fpercent(FLT8(33.333),FLT8(90.0))
Quote from: Siekmanski on August 16, 2018, 03:34:18 AM
Magnus you are right, totally overlooked the reciprocal technique. :t
And in Rui's order.
fpercent MACRO num,pcnt ;; Get Percentage of number
.const
F8_100 real8 0.01
.code
fld pcnt ;; load required percentage
fld num ;; load the number
fmul F8_100
fmulp ;; multiply by percentage
EXITM <st(0)>
ENDM
.code
fld fpercent(FLT8(33.333),FLT8(90.0))
Siekmanski, we are close to having written it in every possible way.
Don't forget to multiply by 0.01!!!
So we should call them fpercent_v1, fpercent_v2, fpercent_v3...
Which version is faster? Is it hard to test?
Do you want to write a test for it?
See you
If you use fld fpercent(FLT8(33.333),FLT8(90.0)) a second time, you'll get Error A2056: Symbol already defined: F8_100
There is also the Percent (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1196) macro:
Print Str$("pc(srcdd, ebx)=%f\n", Percent(srcdd, ebx)) ; with DWORD source and reg32 percentage
Print Str$("pc(12345, 20)=%f\n", Percent(12345, 20)) ; with immediate integer source and percentage
This solves the repeat problem, each call gets a unique address.
FLT8 MACRO fpimm
LOCAL vname
.data
align 8
vname REAL8 fpimm
.code
EXITM < vname>
ENDM
It requires a register to load the result into.
deleted
nidud,
What is the string processing in your macro for ? The FP argument is simple enough to understand but I don't recognise the notation for the text you are constructing with it.
deleted
OK, that makes sense, I do it differently, single GLOBAL scope for when I need a global named variable and the FLT4/8/10 for when I need a unique variable attached to a local variable with local scope.
fpercent MACRO num,pcnt ;; Get Percentage of number
local F8rcp_100
.const
F8rcp_100 real8 0.01
.code
fld pcnt ;; load required percentage
fld num ;; load the number
fmul F8rcp_100
fmulp ;; multiply by percentage
EXITM <st(0)>
ENDM
Looks good Marinus, I confess being a maths illiterate that I don't know how reciprocals work.
A reciprocal is 1/value
This is useful if you want to divide by a fixed number, say 100.0
Then the reciprocal is 1.0/100.0 = 0.01
Instead of dividing by 100.0 you multiply with 0.01
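As a rough sketch of that idea on the FPU, the reciprocal is computed once with a single fdiv and every later division becomes an fmul. The variable names divisor, rcp and value are invented for this example. Note that 0.01 cannot be represented exactly in binary floating point, so the product can differ from a true division in the last bit.

```asm
.data
    align 8
    divisor REAL8 100.0     ; hypothetical fixed divisor
    rcp     REAL8 ?         ; will hold 1.0/divisor
    value   REAL8 250.0     ; hypothetical input
.code
    fld1                    ; st(0) = 1.0
    fdiv divisor            ; st(0) = 1.0 / 100.0
    fstp rcp                ; save the reciprocal, stack is empty again

    fld  value              ; st(0) = 250.0
    fmul rcp                ; st(0) = 250.0 * 0.01, roughly 2.5
    fstp value              ; store the result and pop
```

The first three instructions belong before a loop; only the fld/fmul/fstp part would be repeated inside it.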
Quote from: hutch-- on August 16, 2018, 08:34:39 AM
Looks good Marinus, I confess being a maths illiterate that I don't know how reciprocals work.
:biggrin:
fld1 ; st(1) = 1.0
fld num ; st(0) = num
fdivp ; st(0) = 1.0/num
The reciprocal of a fraction 4/5 is 5/4 (n/m <-> m/n).
The reciprocal of x^n is x^-n,
and the reciprocal of x^-n is x^n.
What is the reciprocal of 1/(1/2)? It is 1/2.
:biggrin:
It just happens to be that my formal logic is much better than my maths, I can in fact do most of the normal things and get reliable results but I don't really have a feel for maths, I just see it as a cipher for crunching numbers. The reciprocals look interesting in that with FP maths you can calculate the fractional sized numbers with a reasonably high degree of precision. In older integer code a mul is a lot faster than a div but I am not sure of the difference with later hardware where you have very fast FP processing units which I would imagine would be used for the old integer code.
I long ago used FP for some simple integer calculations related to the startup size of a window and they have been in the library since about 1998 but have had very little use of FP since then. Laziness meant if I wanted precision maths I had a compiler that routinely did 80 bit floating point calculations.
Quote from: hutch-- on August 16, 2018, 03:24:19 PM
:biggrin:
It just happens to be that my formal logic is much better than my maths, I can in fact do most of the normal things and get reliable results but I don't really have a feel for maths, I just see it as a cipher for crunching numbers. The reciprocals look interesting in that with FP maths you can calculate the fractional sized numbers with a reasonably high degree of precision. In older integer code a mul is a lot faster than a div but I am not sure of the difference with later hardware where you have very fast FP processing units which I would imagine would be used for the old integer code.
I do this because I have been influenced by RCPPS/MULPS coding.
Don't forget that you also avoid a divide by zero error, and that one FDIV before the inner loop to calculate the reciprocal for millions of FMULs can matter,
or a combination of one FDIV reciprocal and an inner loop that uses lots of SIMD SSE/AVX code.
Plain Masm32:
include \masm32\include\masm32rt.inc
_1byX macro num
fld1
if type(num) eq DWORD
fidiv num
else
fdiv num
endif
endm
.data
x100 dd 100
result REAL8 ?
.code
start:
_1byX FP8(0.25)
fstp result
printf("1/0.25=%f\n", result)
_1byX x100
fstp result
printf("1/100=%f\n", result)
exit
end start
(if your IDE doesn't know that the console must be kept open to see the result, insert an inkey before the exit)
This is the form for sequential additions: 1 arg macro, manual, and 2 arg macro. The versions with fldz are how I would do loop code; for a single addition, the 2 argument macro version has an extra load but 1 less add.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
fpinit MACRO ;; initialise the x87 co-processor
fninit
fldz
ENDM
fpadd MACRO arg1, arg2 ;; add a number
fld arg1
IFNB <arg2>
fld arg2
ENDIF
faddp
ENDM
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
LOCAL buff[32]:BYTE
LOCAL pbuf :QWORD
LOCAL addval:QWORD ; REAL8 ; either will do as a LOCAL
LOCAL rslt :QWORD ; REAL8
mrm addval, FLT8(111.111) ; get a pseudo immediate
mov pbuf, ptr$(buff) ; get buffer pointer
; -----------------------------
; start macro code
; -----------------------------
fpinit
fpadd addval ; sequential additions
fpadd addval
fpadd addval
fpadd addval
fstp rslt ; store result & pop
invoke fptoa,rslt,pbuf ; convert rslt to string
conout pbuf,lf ; display at console
; -----------------------------
; identical manual mnemonic code
; -----------------------------
fldz ; with FLDZ
fld addval
faddp
fld addval
faddp
fld addval
faddp
fld addval
faddp
fstp rslt ; store result & pop
invoke fptoa,rslt,pbuf ; convert rslt to string
conout pbuf,lf ; display at console
; -----------------------------
; alternate macro code - no fldz
; -----------------------------
fpadd addval, addval ; sequential additions
fpadd addval
fpadd addval
fstp rslt ; pop stack
invoke fptoa,rslt,pbuf ; convert rslt to string
conout pbuf,lf ; display at console
waitkey
.exit
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
:biggrin:
Hi Hutch
Seems OK: everything is correct and the FPU is clean at the end and at the end of each step (or case) :eusa_clap:
fpinit MACRO ;; initialise the x87 co-processor
fninit
fldz
ENDM
The fldz does nothing useful other than wasting a register. No need for a macro, a simple finit does the job (fninit btw doesn't exist, ML.exe encodes it exactly as finit).
Quote from: jj2007 on August 17, 2018, 07:03:11 AM
fpinit MACRO ;; initialise the x87 co-processor
fninit
fldz
ENDM
The fldz does nothing useful other than wasting a register. No need for a macro, a simple finit does the job (fninit btw doesn't exist, ML.exe encodes it exactly as finit).
Hi Jochen,
For me and for you it «does nothing» because we don't need to start a sum with 0 in a register and then add the next argument, but for someone who starts with sum=0 and then wants to add the next argument in a loop, it does.
To multiply [a b c d] by [A E I M] element by element and sum, I start with sum=a.A, then I get b and E
and b.E and sum=a.A+b.E etc. inside a loop (sum=st(0)),
but the first product is not inside the loop. This is the problem. If we don't know how to do the sum with a loop we need to start with sum=0.
That seems to be the problem. So we may have 2 solutions, and one of them is to start with 0.
note: See my procedures to multiply matrices of any size with the FPU.
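The "start with sum=0" solution could be sketched like this for 64-bit code. It is not taken from the matrix procedures mentioned above; vecA, vecB, COUNT and result are made-up names, the arrays are assumed to hold REAL8 values, and COUNT is assumed to be a constant greater than zero.

```asm
    lea  rdx, vecA              ; hypothetical REAL8 array [a b c d]
    lea  r8,  vecB              ; hypothetical REAL8 array [A E I M]
    mov  ecx, COUNT             ; number of elements, must be > 0
    fldz                        ; sum = 0.0 in st(0)
dotloop:
    fld  REAL8 PTR [rdx]        ; st(0) = element of vecA, st(1) = sum
    fmul REAL8 PTR [r8]         ; st(0) = product of the two elements
    faddp st(1), st(0)          ; sum += product, pop: one register in use
    add  rdx, 8                 ; next REAL8 in each array
    add  r8, 8
    dec  ecx
    jnz  dotloop
    fstp result                 ; store the sum, stack is empty again
```

Because the running sum starts at zero, the first product needs no special case outside the loop, and the stack depth is the same at the end of every iteration.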
Quote from: RuiLoureiro on August 17, 2018, 07:46:51 AM
for someone that starts with sum=0 and then he wants to add the next argument in a loop , it does.
Yes, that's correct. And for somebody who starts a loop with multiplications, 1 should be in ST(0) at the beginning. But these cases have nothing to do with a macro called "fpinit". If you want to add several numbers in a loop, start with fldz before the loop but don't use a generic "fpinit" macro.
Quote from: jj2007 on August 17, 2018, 08:05:58 AM
Quote from: RuiLoureiro on August 17, 2018, 07:46:51 AM
for someone that starts with sum=0 and then he wants to add the next argument in a loop , it does.
Yes, that's correct. And for somebody who starts a loop with multiplications, 1 should be in ST(0) at the beginning. But these cases have nothing to do with a macro called "fpinit". If you want to add several numbers in a loop, start with fldz before the loop but don't use a generic "fpinit" macro.
Yes, the best way is to write rules to do sums, multiplications, etc. etc. inside loops ...
:biggrin:
> The fldz does nothing useful other than wasting a register. No need for a macro, a simple finit does the job (fninit btw doesn't exist, ML.exe encodes it exactly as finit).
"fldz" does do something useful, it loads 0.0, I imagine that is why Intel provide the instruction.
> (fninit btw doesn't exist, ML.exe encodes it exactly as finit).
He he, you will have to upgrade to ML64, it produces the correct opcode DBE3h.
.text:0000000140001027 DBE3 fninit <<<< HERE !!!!
.text:0000000140001029 D9EE fldz
.text:000000014000102b DD8568FFFFFF fld qword ptr [rbp-0x98]
.text:0000000140001031 DEC1 faddp st(1)
.text:0000000140001033 DD8568FFFFFF fld qword ptr [rbp-0x98]
.text:0000000140001039 DEC1 faddp st(1)
.text:000000014000103b DD8568FFFFFF fld qword ptr [rbp-0x98]
.text:0000000140001041 DEC1 faddp st(1)
.text:0000000140001043 DD8568FFFFFF fld qword ptr [rbp-0x98]
.text:0000000140001049 DEC1 faddp st(1)
.text:000000014000104b DD9D60FFFFFF fstp qword ptr [rbp-0xa0]
Quote from: hutch-- on August 17, 2018, 09:37:21 AM
"fldz" does do something useful, it loads 0.0, I imagine that is why Intel provide the instruction.
Sure, just like all other 40 or so fpu instructions. But with fldz in a fpinit macro, you just block a register, because you don't "initialise" the fpu by setting a register to zero. Any calculation requires that you load ST(0) first, and only in rare cases with zero.
Quote
> (fninit btw doesn't exist, ML.exe encodes it exactly as finit).
He he, you will have to upgrade to ML64, it produces the correct opcode DBE3h.
You are right, I had overlooked the wait (same in ML32, UAsm, AsmC):
9B wait ; finit
DBE3 finit
DBE3 finit ; fninit
Quote
(fninit btw doesn't exist, ML.exe encodes it exactly as finit)
That statement is partly right and partly wrong. Let's clear up everything for others unfamiliar with the FPU.
FNINIT does exist.
And FINIT and FNINIT have exactly the same coding (i.e. DBE3h), except that FINIT is automatically preceded by the FWAIT code (i.e. 9Bh).
Several other FPU instructions exist in the same two forms, with and without a preceding fwait (FNCLEX, FNSAVE, FNSTCW, FNSTENV, FNSTSW); the fwait ensures that the FPU has effectively completed its latest instruction before proceeding with the current one.
This is what Intel say about it.
Opcode Instruction 64-Bit Mode Compat/Leg Mode Description
---------------------------------------------------------------------------
9B DB E3 FINIT Valid Valid Initialize FPU after checking for pending
unmasked floating-point exceptions.
---------------------------------------------------------------------------
DB E3 FNINIT Valid Valid Initialize FPU without checking for pending
unmasked floating-point exceptions.
---------------------------------------------------------------------------
Quote from: raymond on August 17, 2018, 10:15:43 AM
FNINIT does exist.
Indeed - I had corrected my error above. The disassembler splits finit as wait + fninit.
Re use of fldz, here is an example of how an fsum(array, elements) macro could look; plain Masm32:
include \masm32\include\masm32rt.inc
fsum MACRO pSrc, elements
Local step, tmp$
step=type(pSrc)
lea eax, pSrc
lea edx, [eax+step*elements-step]
if Type(pSrc) eq QWORD
fild QWORD ptr [eax]
.Repeat
add eax, QWORD
fild QWORD ptr [eax]
fadd
.Until eax>=edx
elseif Type(pSrc) eq DWORD
fild DWORD ptr [eax]
.Repeat
add eax, DWORD
fiadd DWORD ptr [eax]
.Until eax>=edx
elseif Type(pSrc) eq WORD
fild WORD ptr [eax]
.Repeat
add eax, WORD
fiadd WORD ptr [eax]
.Until eax>=edx
elseif Type(pSrc) eq REAL10
fld REAL10 ptr [eax]
.Repeat
add eax, REAL10
fld REAL10 ptr [eax]
fadd
.Until eax>=edx
elseif Type(pSrc) eq REAL8
fld REAL8 ptr [eax]
.Repeat
add eax, REAL8
fadd REAL8 ptr [eax]
.Until eax>=edx
elseif Type(pSrc) eq REAL4
fld REAL4 ptr [eax]
.Repeat
add eax, REAL4
fadd REAL4 ptr [eax]
.Until eax>=edx
endif
ENDM
.data
MyWords dw 25, 18, 23, 17, 9, 2, 6
MyDwords dd 25, 18, 23, 17, 9, 2, 7
MyQwords dq 25, 18, 23, 17, 9, 2, 8
MyReal4s REAL4 25.0, 18.0, 23.0, 17.0, 9.0, 2.0, 9.0
MyReal8s REAL8 25.0, 18.0, 23.0, 17.0, 9.0, 2.0, 10.0
MyReal10s REAL10 25.0, 18.0, 23.0, 17.0, 9.0, 2.0, 11.0
Result dd ?
.code
start:
fsum MyWords, lengthof MyWords ; source, #elements
fistp Result
print str$(Result), " is the sum of WORD integers", 13, 10
fsum MyDwords, lengthof MyDwords ; source, #elements
fistp Result
print str$(Result), " is the sum of DWORD integers", 13, 10
fsum MyQwords, lengthof MyQwords ; source, #elements
fistp Result
print str$(Result), " is the sum of QWORD integers", 13, 10
fsum MyReal4s, lengthof MyReal4s ; source, #elements
fistp Result
print str$(Result), " is the sum of MyReal4s", 13, 10
fsum MyReal8s, lengthof MyReal8s ; source, #elements
fistp Result
print str$(Result), " is the sum of MyReal8s", 13, 10
fsum MyReal10s, lengthof MyReal10s ; source, #elements
fistp Result
inkey str$(Result), " is the sum of MyReal10s", 13, 10
exit
end start
Output:
100 is the sum of WORD integers
101 is the sum of DWORD integers
102 is the sum of QWORD integers
103 is the sum of MyReal4s
104 is the sum of MyReal8s
105 is the sum of MyReal10s
Quote
Indeed - I had corrected my error above.
I know you did, Jochen. I saw it before posting. But I wanted to also emphasize for the less literate members that a few other similar instructions exist. However, I forgot to mention that those with the "N" in their name are the ones which get coded without the preceding fwait instruction.
:biggrin:
fsum MACRO args:VARARG
fldz ;; start at 0
FOR arg,<args>
fld arg
fadd
ENDM
ENDM
Use like this, the repeat of the same number is for testing, you can put whatever you like there as FP values.
mrm num, FLT8(1000.0)
fsum num,num,num,num,num,num,num,num,num,num
fstp rslt
invoke fptoa,rslt,pbuf ; convert rslt to string
conout pbuf,lf ; display at console
Your fsum does something completely different. To avoid the unnecessary fldz step (which uses one fpu reg more than needed), it can be written as follows:
fsumsimple MACRO arg0, args:VARARG
fld arg0
for arg, <args>
fadd arg
endm
endm
Usage:
fsumsimple num, num, num, num, num, num, num, num, num
fstp result
Inkey Str$("Result: %f", result) ; convert result to string, display in console, wait for keypress
Full project attached, for 32- or 64-bit assembly with ML/ML64 or UAsm or AsmC.
:biggrin:
> (which uses one fpu reg more than needed)
What does it matter with sequential additions ?
> fld arg0
You are loading a variable into st(0)
fldz loads zero into st(0)
If you replace "fld arg0" with fldz in your macro, what is the difference apart from fldz being 2 bytes ?
Here is the revised version of your macro, which is now a better version than the one I posted.
jjsum MACRO args:VARARG
fldz
for arg, <args>
fadd arg
endm
endm
This is the output.
.text:0000000140001026 D9EE fldz
.text:0000000140001028 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:000000014000102e DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001034 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:000000014000103a DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001040 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001046 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:000000014000104c DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001052 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001058 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:000000014000105e DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001064 DD9D50FFFFFF fstp qword ptr [rbp-0xb0]
How is loading a variable more efficient than using FLDZ?
Your version:
000000014000108A | D9 EE | fldz |
000000014000108C | DC 05 B2 02 00 00 | fadd qword ptr ds:[140001344] |
0000000140001092 | DC 05 AC 02 00 00 | fadd qword ptr ds:[140001344] |
0000000140001098 | DC 05 A6 02 00 00 | fadd qword ptr ds:[140001344] |
000000014000109E | DC 05 A0 02 00 00 | fadd qword ptr ds:[140001344] |
00000001400010A4 | DD 1D A2 02 00 00 | fstp qword ptr ds:[14000134C] |
My version:
0000000140001126 | DD 05 18 02 00 00 | fld qword ptr ds:[140001344] |
000000014000112C | DC 05 12 02 00 00 | fadd qword ptr ds:[140001344] |
0000000140001132 | DC 05 0C 02 00 00 | fadd qword ptr ds:[140001344] |
0000000140001138 | DC 05 06 02 00 00 | fadd qword ptr ds:[140001344] |
000000014000113E | DD 1D 08 02 00 00 | fstp qword ptr ds:[14000134C] |
:biggrin:
So your total gain is 2 bytes.
.text:0000000140001026 D9EE fldz
.text:0000000140001028 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:000000014000102e DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001034 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:000000014000103a DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001040 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001046 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:000000014000104c DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001052 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001058 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:000000014000105e DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001064 DD9D58FFFFFF fstp qword ptr [rbp-0xa8]
.text:000000014000108f DD8550FFFFFF fld qword ptr [rbp-0xb0]
.text:0000000140001095 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:000000014000109b DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:00000001400010a1 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:00000001400010a7 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:00000001400010ad DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:00000001400010b3 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:00000001400010b9 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:00000001400010bf DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:00000001400010c5 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:00000001400010cb DD9D58FFFFFF fstp qword ptr [rbp-0xa8]
I declare you the winner by 2 bytes. :P
I managed to get a benchmark around both and yours is faster by about the difference in the number of instructions. I may even steal a variation of it. :P
fsincos0 MACRO
fldz
fld1
ENDM
fsincos90 MACRO
fld1
fldz
ENDM
fsincos180 MACRO
fldz
fld1
fchs
ENDM
fsincos45 MACRO
fld sqrtreciprocalof2
fld st(0) ;; duplicate st(0) so the value sits in both st(0) and st(1); must ask Raymond and other experienced FPU users whether this is the best way
ENDM
fsincos30 MACRO
fld half
fld halfsqrt3
ENDM
fsincos60 MACRO
fld halfsqrt3
fld half
ENDM
; The only difference is that cosine changes sign after 90 degrees
fsincos120 MACRO
fsincos60
fchs
ENDM
fsincos150 MACRO
fsincos30
fchs
ENDM
; you could make more macros: after 180 degrees sine changes sign
tan0 MACRO
fldz
ENDM
tan45 MACRO
fld1
ENDM
tan60 MACRO
fld sqrt3
ENDM
tan30 MACRO
fld reciprocalsqrt3
ENDM
First you can initialise the most-used sqrt constants in trigonometry by calculating sqrt(2), sqrt(3), sqrt(5), sqrt(6) with SSE SQRTPD or the FPU at the beginning of the program.
I know more exact trig functions; if anyone is interested I can make more macros (about 150 of them).
The macros below are useful for producing radian angles from pi (180 degrees), for example 2pi (360 degrees), pi/2 (90 degrees), pi/4 (45 degrees), pi/8 (22.5 degrees),
or to prepare a float constant before looping through 256, 512 or 1024 trig calculations (fptan, fsin, fcos, fsincos), from zero to pi, or zero to 2pi.
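fscalepi MACRO scale ;; st(0) = pi * 2^scale
fld scale
fldpi
fscale ;; st(0) = pi * 2^trunc(st(1)), st(1) still holds scale
fstp st(1) ;; drop the scale factor so only the result stays on the stack
;; use this if you want radians for 360 degrees, or need a constant for x iterations through a trig calculation loop from zero to pi (180 degrees)
ENDM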
fdivpi MACRO x
fldpi
fdiv x
ENDM
Quote from: hutch-- on August 17, 2018, 09:53:29 PM
I may even steal a variation of it
I'll be happy if you do that, Hutch, as it's simply the right way to do it :icon14:
:biggrin:
fpsum MACRO arg1, args:VARARG ;; sum a list of numbers
fld arg1
FOR arg, <args> ;; stolen from JJ :)
fadd arg
ENDM
ENDM
:P
Quote from: hutch-- on August 19, 2018, 12:12:05 AM
:biggrin:
fpsum MACRO arg1, args:VARARG ;; sum a list of numbers
fld arg1
FOR arg, <args> ;; stolen from JJ :)
fadd arg
ENDM
ENDM
:P
That's a nice macro. I wonder if it is possible to extend it to an fpmedium (average) macro? I don't know if it is possible to end it somehow with
fdiv numberofargs ?
Quote from: daydreamer on August 19, 2018, 12:41:26 AM
wonder if its possible to extend it to a fpmedium macro?
Sure:
fpaverage MACRO arg1, args:VARARG
Local ct
ct=1
fld arg1
FOR arg, <args>
ct=ct+1
fadd arg
ENDM
push ct
fidiv dword ptr [esp]
add esp, DWORD
ENDM
This is the version I am adding to the macro file.
fpavrg MACRO arg1, args:VARARG ;; average a list of arguments
LOCAL cnt,var
cnt = argcount(args)
cnt = cnt + 1
fld arg1
FOR arg, <args>
fadd arg
ENDM
.data
var dq cnt
.code
fild var
fdivp
ENDM
An option I would prefer for no specific reason:
fpavrg MACRO args:VARARG ;; average a list of arguments
LOCAL cnt,var
cnt = 0
FOR arg, <args>
if cnt eq 0
fld arg
else
fadd arg
endif
cnt = cnt + 1
ENDM
.data
var dq cnt
.code
fild var
fdivp
ENDM
Is it possible to measure the time required for macro preprocessing?
A minute later: perhaps I just don't like making two loops when it is possible to make only one.
Quote from: HSE on August 19, 2018, 10:00:51 AM
Is possible to measure time required for macro preprocessing?
In theory, yes. In practice, it's pretty irrelevant if your 10,000 lines take 300 or 301 milliseconds to build. A single macro will add nanoseconds only.
Proposal:
fpavrg MACRO arg1, args:VARARG ;; average a list of arguments
LOCAL cnt,var
cnt = 1
fld arg1
FOR arg, <args>
cnt=cnt+1
fadd arg
ENDM
.data
var dw cnt
.code
fild var
fdivp
ENDM
- doesn't depend on the argcount macro
- the trick with var d? cnt is nice, but a WORD is enough; ML64.exe chokes already if you arrive at column 200, i.e. at around 100 arguments if they are all single letters. Snippet for testing:
x REAL4 1.1111
.data?
result REAL8 ?
.code
fpavrg x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x
fstp result
> ML64.exe chokes already if you arrive at column 200
You will find that it is the command line limit that stops more arguments. A simple solution, if the code design is that bad, is to use more than 1 macro. If you are using very large lists you would use a procedure and feed the list to it as an array.
I have never really cared about assembly time; unlike the old compilers, the assemblers are not slow anyway. What I do look at is the generated code, which is what matters.
Quote from: hutch-- on August 19, 2018, 11:47:00 AM
> ML64.exe chokes already if you arrive at column 200
You will find that its the command line limit that stops more arguments.
UAsm stops at roughly column 900, but in any case it's irrelevant, no sane person would squeeze so many arguments into a macro. That's why the WORD is definitely enough.
> That's why the WORD is definitely enough.
Except that you put the .data section out of alignment, and since the next item in the .data section then needs to be aligned, you don't gain anything.
Another point to consider is that if you're using a lot of these macros across modules/procedures, you might want to save/restore the FPU state.
I always do this with a long sequence of FPU operations and/or when crossing modules/procedures, otherwise strict manual monitoring of registers used.
;)
Quote from: hutch-- on August 19, 2018, 03:19:56 PM
Except that you put the .data section out of alignment
Does that mean you systematically use, in all your code
.DATA
align 16
bla db "oops", 0
?
Quote from: K_F on August 19, 2018, 04:36:54 PM
Another point to consider is that if you're using a lot of these macros across modules/procedures, you might want to save/restore the FPU state.
I always do this with a long sequence of FPU operations and/or when crossing modules/procedures, otherwise strict manual monitoring of registers used.
;)
Do any of the macros above not restore the FPU state for ST(0)...ST(6)? Please show, because that would imply they are buggy. What is "strict manual monitoring of registers" btw?
> Does that mean you systematically use, in all your code .....
in 64 bit YES.
align 16
item1 dq 0 ; aligned
item2 dq 0 ; aligned
witm dw 0 ; aligned by at least 2
item3 dq 0 ; mis aligned
Magic rule is if you are going to use short .data(?) components as MASM can do
.data
align 16
dataitem dq 0
.code
Ensure you align it.
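A minimal sketch of the rule above, showing one way to put the data back on a boundary after a WORD-sized item. The names item1, item2, witm and item3 are only illustrative, and the offsets in the comments assume the block starts on the 16 byte boundary set by the first align.

```masm
.data
align 16
item1 dq 0          ; offset 0,  16 byte aligned
item2 dq 0          ; offset 8,  8 byte aligned
witm  dw 0          ; offset 16, 2 bytes long
align 8             ; assembler pads 6 bytes here
item3 dq 0          ; offset 24, 8 byte aligned again
```

Without the second align, item3 would land at offset 18 and a QWORD access to it would straddle an 8 byte boundary.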
Quote from: jj2007 on August 19, 2018, 05:42:44 PM
Do any of the macros above not restore the FPU state for ST(0)...ST(6)? Please show, because that would imply they are buggy. What is "strict manual monitoring of registers" btw?
??
Bad morning for you ? :)
As an extreme example, say you have all 8 registers loaded with valid values.. one more load would corrupt the barrel.
This can happen across procedures and/or modules.
From what I see none of the macros have either FSAVE or FRSTOR type instructions - this is not necessary for short sequences, but it is advisable to have a start and end macro that one uses for a batch of FPU operations.
;)
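The start and end macro idea could be sketched roughly like this. The macro names fpbegin and fpend and the buffer fpu_state are hypothetical, and this has not been tested with ML64. It relies on FSAVE writing a 108 byte image and then reinitialising the FPU, so the batch starts with an empty register stack.

```masm
.data?
align 16
fpu_state db 108 dup (?)    ; hypothetical buffer, FSAVE image is 108 bytes

.code
fpbegin MACRO               ; hypothetical name
    fsave fpu_state         ; store the full x87 state, FPU is reinitialised afterwards
ENDM

fpend MACRO                 ; hypothetical name
    frstor fpu_state        ; reload the state stored by fpbegin
ENDM
```

Because the buffer is a single global, this sketch is not reentrant and cannot be nested. Note also that FRSTOR discards whatever the batch left on the FPU stack, so any result has to be stored to memory before fpend.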
Quote from: hutch-- on August 19, 2018, 08:58:20 PM
> Does that mean you systematically use, in all your code .....
in 64 bit YES.
align 16
item1 dq 0 ; aligned
item2 dq 0 ; MIS-aligned for movaps & friends
witm dw 0 ; aligned by at least 2
item dq 0 ; mis aligned
Data alignment is important, of course. But for a movaps xmm0 to fail, you need an OWORD. Your second item is already misaligned, in SIMD speak, but it wouldn't matter because movlps xmm0, item2 does not need 16-byte alignment.
In any case, in my own code I wouldn't add align 16 before each and every BYTE-sized variable. I do that on a needs basis.
:biggrin:
.avxdata SEGMENT align(64)
avx2a YMMWORD ?
avx2b YMMWORD ?
avx2c YMMWORD ?
avx2d YMMWORD ?
avx2e YMMWORD ?
avx2f YMMWORD ?
avx2g YMMWORD ?
avx2h YMMWORD ?
.avxdata ENDS
Quote from: jj2007 on August 20, 2018, 12:18:13 AM
Quote from: hutch-- on August 19, 2018, 08:58:20 PM
> Does that mean you systematically use, in all your code .....
in 64 bit YES.
align 16
item1 dq 0 ; aligned
item2 dq 0 ; MIS-aligned for movaps & friends
witm dw 0 ; aligned by at least 2
item dq 0 ; mis aligned
Data alignment is important, of course. But for a movaps xmm0 to fail, you need an OWORD. Your second item is already misaligned, in SIMD speak, but it wouldn't matter because movlps xmm0, item2 does not need 16-byte alignment.
In any case, in my own code I wouldn't add align 16 before each and every BYTE-sized variable. I do that on a needs basis.
BYTE-sized variables ? I prefer dword even if the value is only 0 or 1. :P
I never used OWORD it's too long. I prefer dd,dd,dd,dd,... :biggrin:
Funny part is that 64 bit MASM used XMMWORD and YMMWORD. :P
I have a question for Hutch,JJ and other macro experts
cnt = 1
fld arg1
FOR arg, <args>
cnt=cnt+1
fadd arg
Are these variables in macros restricted to 32 bit integers, or can you write macros that do floating point math inside them too?
Or do you have to solve it by doing the calculation in fixed point if you need that? I was thinking of ways of creating a LUT at assembly time.
Magnus,
In macros, a number when added to code is an immediate integer.
LOCAL num
num = 1
mov rax, num
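Since assembly-time symbols are integers, one way to build a small LUT at assembly time is to do the arithmetic in fixed point. This is only a sketch: the table name fixtab, the counter idx and the 16.16 format are assumptions chosen for illustration.

```masm
.data
align 16
fixtab LABEL DWORD                ; hypothetical table name
idx = 0
REPEAT 16
    DWORD (idx * 65536) / 16      ; idx/16 stored as 16.16 fixed point
    idx = idx + 1
ENDM
```

This emits 16 DWORDs holding 0/16, 1/16 ... 15/16. Anything that needs real floating point evaluation while assembling (sines, square roots) is beyond plain integer symbols.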
Quote from: daydreamer on August 21, 2018, 06:01:33 PM
these variables in macros, are they restricted to be 32bit integers,or can you make macros which have floating Point math inside them too?
You can use qWord's "MREAL macros (http://masm32.com/board/index.php?topic=3225.msg33774#msg33774) - when you need floating point arithmetic while assembling!"
Quote from: HSE on August 21, 2018, 08:43:33 PM
Quote from: daydreamer on August 21, 2018, 06:01:33 PM
these variables in macros, are they restricted to be 32bit integers,or can you make macros which have floating Point math inside them too?
You can use qWord's "MREAL macros (http://masm32.com/board/index.php?topic=3225.msg33774#msg33774) - when you need floating point arithmetic while assembling!"
thanks
Works like a charm, but you need a modern assembler, e.g. UAsm or AsmC:
include \masm32\MasmBasic\MasmBasic.inc ; download (http://masm32.com/board/index.php?topic=94.0)
Init
push 123.4567
fld stack
Print Str$("ST(0)=%f", ST(0)v) ; output: ST(0)=123.4567
EndOfCode
:biggrin:
push 123.4567
"push" is an Intel mnemonic that works on integer data, not floating point. Are you suggesting that the Watcom forks are not Intel compatible ?
Yeah, the Watcom folks are innovative :P
Yeah, they have to be with the quality of code they must process. :P
MASM is for folks who already know how to write assembler.
How does a modern assembler implement this
local TheNoReal:real4
push 123.4567
pop TheNoReal
under the cover? (of darkness)
Maybe like this
local TheRealDeal:real4
push 042F6E9D5h ;; 123.4567
pop TheRealDeal
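For 64 bit code the same trick needs a different shape, because a 32 bit push/pop pair is not available in 64 bit mode. A hedged sketch, with r4hex and r4tmp as hypothetical names: the bit pattern 42F6E9D5h is the REAL4 encoding of 123.4567 (sign 0, biased exponent 133, mantissa 76E9D5h), and the trailing r form is the encoded hex real notation that also appears in compiler generated listings.

```masm
.data
align 4
r4hex DD 042F6E9D5r                   ; 123.4567 written as an encoded hex real
r4tmp REAL4 0.0                       ; hypothetical scratch variable

.code
    mov DWORD PTR r4tmp, 042F6E9D5h   ; write the REAL4 bit pattern as an integer
    fld r4tmp                         ; st(0) = 123.4567
```

Loading r4hex with fld would give the same value without the mov.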
I want to write a crossover macro for rotation.
I read a tutorial about rotation long ago, and now I have read a different tutorial that showed it in a simpler way, so I got an idea:
X1 = X0 * cos(theta) + Y0 * sin(theta)
Y1 = -X0 * sin(theta) + Y0 * cos(theta)
So you need 4 values: sin(alpha), cos(alpha), -sin(alpha) and a second copy of cos(alpha), where alpha is the angle theta given in degrees. That prepares for a MULPS/HADDPS combo rotating many pixels or meshes. Or maybe you want to use the FPU afterwards, so I follow Hutch's rule about leaving them in the FPU registers.
I don't know what name it should have.
prerotate MACRO alpha
fld alpha ; angle in degrees
fmul degtorad ; degtorad constant = pi/180
fsincos ; st(0) cos, st(1) sin
fld st(1) ; st(0) sin, st(1) cos, st(2) sin
fchs ; st(0) -sin, st(1) cos, st(2) sin
fld st(1) ; st(0) cos, st(1) -sin, st(2) cos, st(3) sin
ENDM
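A rough sketch of how the four values could feed the MULPS/HADDPS combo. It assumes they sit on the FPU stack in the order given in the macro's last comment (cos, -sin, cos, sin from the top down). The names rotvec and point are hypothetical, HADDPS needs SSE3, and the fragment is untested.

```masm
.data
align 16
rotvec REAL4 4 dup (0.0)            ; hypothetical: will hold cos, sin, -sin, cos
point  REAL4 1.0, 0.0               ; hypothetical input X0, Y0

.code
    fstp REAL4 PTR [rotvec]         ; cos  -> element 0
    fstp REAL4 PTR [rotvec+8]       ; -sin -> element 2
    fstp REAL4 PTR [rotvec+12]      ; cos  -> element 3
    fstp REAL4 PTR [rotvec+4]       ; sin  -> element 1, all four values popped
    movlps  xmm0, QWORD PTR point   ; low half of xmm0 = X0, Y0
    movlhps xmm0, xmm0              ; xmm0 = X0, Y0, X0, Y0
    mulps   xmm0, XMMWORD PTR rotvec ; X0*cos, Y0*sin, -X0*sin, Y0*cos
    haddps  xmm0, xmm0              ; xmm0 = X1, Y1, X1, Y1
```

The store order is shuffled so that the memory layout matches the two formulas term by term. For many points, rotvec would be loaded into a register once and only the last four instructions repeated.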
Quote from: Caché GB on August 22, 2018, 02:21:46 PM
How does a modern assembler implement this
local TheNoReal:real4
push 123.4567
pop TheNoReal
under the cover? (of darkness)
Not so difficult. Here is a simulation:
include \masm32\MasmBasic\MasmBasic.inc ; download (http://masm32.com/board/index.php?topic=94.0)
Init
push eax ; create a dword slot
fld FP10(12345.67890) ; load the FPU with the desired value
fstp REAL4 ptr [esp] ; pop the real10 into the dword slot
pop eax ; and get the corresponding integer
Inkey "Use this hex value to push REAL4: ", Hex$(eax), "h"
EndOfCode
Output: Use this hex value to push REAL4: 4640E6B7h
Btw it's rarely worth the effort. These two snippets do the same, load 123... into ST(0):
push 4640E6B7h
fld REAL4 ptr [esp]
pop edx
fld FP4(12345.67890)
The first one is only one byte shorter, considering that FP4 creates a slot in the .DATA section. Speedwise the latter is faster.
If you are copying data to data, there is no problem using an integer register as the transfer medium: when both operands are in memory, you can move an FP value from one to the other through a 32 bit register.
.data
fval REAL4 1234.5678
......
.code
LOCAL lval :REAL4
......
mov eax, fval
mov lval, eax
The only place I have needed to do this in 64 bit MASM is in getting the return value from a floating point conversion.