Simple floating point macros.

jj2007 · August 17, 2018, 09:59:29 AM

Quote from: hutch-- on August 17, 2018, 09:37:21 AM
> "fldz" does do something useful, it loads 0.0, I imagine that is why Intel provide the instruction.

Sure, just like all the other 40 or so FPU instructions. But with fldz in an fpinit macro, you just block a register, because you don't "initialise" the FPU by setting a register to zero. Any calculation requires that you load ST(0) first, and only in rare cases with zero.

Quote> (fninit btw doesn't exist, ML.exe encodes it exactly as finit).

He he, you will have to upgrade to ML64, it produces the correct opcode DBE3h.

You are right, I had overlooked the wait (same in ML32, UAsm, AsmC):

Code Select

9B                   wait                           ; finit
DBE3                 finit
DBE3                 finit                          ; fninit

raymond · August 17, 2018, 10:15:43 AM

Quote
> (fninit btw doesn't exist, ML.exe encodes it exactly as finit)

That statement is partly right and partly wrong. Let's clear up everything for others unfamiliar with the FPU.

FNINIT does exist.
And, the FINIT and FNINIT have exactly the same coding (i.e. DBE3h) except that the FINIT is automatically preceded by the FWAIT code (i.e. 9Bh).

Several other 'similar' FPU instructions (FNCLEX, FNSAVE, FNSTCW, FNSTENV, FNSTSW) have a counterpart that inserts a preceding fwait, to ensure that the FPU has effectively completed its latest instruction before proceeding with the current one.

hutch-- · August 17, 2018, 10:55:18 AM

This is what Intel say about it.

Opcode     Instruction   64-Bit Mode   Compat/Leg Mode   Description
---------------------------------------------------------------------------
9B DB E3   FINIT         Valid         Valid             Initialize FPU after checking for pending
                                                         unmasked floating-point exceptions.
---------------------------------------------------------------------------
DB E3      FNINIT        Valid         Valid             Initialize FPU without checking for pending
                                                         unmasked floating-point exceptions.
---------------------------------------------------------------------------

jj2007 · August 17, 2018, 11:11:18 AM

Quote from: raymond on August 17, 2018, 10:15:43 AM
> FNINIT does exist.

Indeed - I had corrected my error above. The disassembler splits finit as wait + fninit.

Regarding the use of fldz, here is an example of what an fsum(array, elements) macro could look like, in plain Masm32:

Code Select

include \masm32\include\masm32rt.inc

fsum MACRO pSrc, elements
Local step, tmp$
  step=type(pSrc)
  lea eax, pSrc
  lea edx, [eax+step*elements-step]
  if Type(pSrc) eq QWORD
	fild QWORD ptr [eax]
	.Repeat
		add eax, QWORD
		fild QWORD ptr [eax]
		fadd
	.Until eax>=edx
  elseif Type(pSrc) eq DWORD
	fild DWORD ptr [eax]
	.Repeat
		add eax, DWORD
		fiadd DWORD ptr [eax]
	.Until eax>=edx
  elseif Type(pSrc) eq WORD
	fild WORD ptr [eax]
	.Repeat
		add eax, WORD
		fiadd WORD ptr [eax]
	.Until eax>=edx

  elseif Type(pSrc) eq REAL10
	fld REAL10 ptr [eax]
	.Repeat
		add eax, REAL10
		fld REAL10 ptr [eax]
		fadd
	.Until eax>=edx
  elseif Type(pSrc) eq REAL8
	fld REAL8 ptr [eax]
	.Repeat
		add eax, REAL8
		fadd REAL8 ptr [eax]
	.Until eax>=edx
  elseif Type(pSrc) eq REAL4
	fld REAL4 ptr [eax]
	.Repeat
		add eax, REAL4
		fadd REAL4 ptr [eax]
	.Until eax>=edx
  endif
ENDM

.data
MyWords	dw 25, 18, 23, 17, 9, 2, 6
MyDwords	dd 25, 18, 23, 17, 9, 2, 7
MyQwords	dq 25, 18, 23, 17, 9, 2, 8
MyReal4s	REAL4 25.0, 18.0, 23.0, 17.0, 9.0, 2.0, 9.0
MyReal8s	REAL8 25.0, 18.0, 23.0, 17.0, 9.0, 2.0, 10.0
MyReal10s	REAL10 25.0, 18.0, 23.0, 17.0, 9.0, 2.0, 11.0
Result	dd ?

.code
start:

  fsum MyWords, lengthof MyWords		; source, #elements
  fistp Result
  print str$(Result), " is the sum of WORD integers", 13, 10

  fsum MyDwords, lengthof MyDwords		; source, #elements
  fistp Result
  print str$(Result), " is the sum of DWORD integers", 13, 10

  fsum MyQwords, lengthof MyQwords		; source, #elements
  fistp Result
  print str$(Result), " is the sum of QWORD integers", 13, 10

  fsum MyReal4s, lengthof MyReal4s		; source, #elements
  fistp Result
  print str$(Result), " is the sum of MyReal4s", 13, 10

  fsum MyReal8s, lengthof MyReal8s		; source, #elements
  fistp Result
  print str$(Result), " is the sum of MyReal8s", 13, 10

  fsum MyReal10s, lengthof MyReal10s		; source, #elements
  fistp Result
  inkey str$(Result), " is the sum of MyReal10s", 13, 10

  exit

end start

Output:

Code Select

100 is the sum of WORD integers
101 is the sum of DWORD integers
102 is the sum of QWORD integers
103 is the sum of MyReal4s
104 is the sum of MyReal8s
105 is the sum of MyReal10s

raymond · August 17, 2018, 12:25:55 PM

Quote
> Indeed - I had corrected my error above.

I know you did, Jochen. I saw it before posting. But I also wanted to emphasize, for members less familiar with the FPU, that a few other similar instructions exist. However, I forgot to mention that those with the "N" in their name are the ones which get coded without the preceding fwait instruction.
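As a small illustration of the wait / no-wait pairs mentioned above, here is a minimal Masm32 sketch. The variable name ctrlword is hypothetical, and the opcode bytes in the comments are the encodings documented by Intel rather than output captured from this thread; assembling it and looking at the listing or a disassembly is the way to confirm them for a given assembler.

```masm
include \masm32\include\masm32rt.inc

.data?
ctrlword dw ?           ; hypothetical scratch word for the FPU control word

.code
start:
  finit                 ; 9B DB E3  = fwait + fninit
  fninit                ; DB E3
  fclex                 ; 9B DB E2  = fwait + fnclex
  fnclex                ; DB E2
  fstsw ax              ; 9B DF E0  = fwait + fnstsw ax
  fnstsw ax             ; DF E0
  fstcw ctrlword        ; 9B D9 /7  = fwait + fnstcw m16
  fnstcw ctrlword       ; D9 /7
  exit

end start
```

In each pair the only difference is the leading 9Bh byte, which is why a disassembler may show the first form as two separate instructions.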

hutch-- · August 17, 2018, 12:56:20 PM

fsum MACRO args:VARARG
fldz ;; start at 0
FOR arg,<args>
fld arg
fadd
ENDM
ENDM

Use it like this. The repeat of the same number is for testing; you can put whatever FP values you like there.

mrm num, FLT8(1000.0)
fsum num,num,num,num,num,num,num,num,num,num
fstp rslt
invoke fptoa,rslt,pbuf ; convert fpval to string
conout pbuf,lf ; display at console

jj2007 · August 17, 2018, 07:21:31 PM

Your fsum does something completely different. To avoid the unnecessary fldz step (which uses one fpu reg more than needed), it can be written as follows:

Code Select

fsumsimple MACRO arg0, args:VARARG
  fld arg0
  for arg, <args>
	fadd arg
  endm
endm

Usage:

fsumsimple num, num, num, num, num, num, num, num, num
fstp result
Inkey Str$("Result: %f", result) ; convert result to string, display in console, wait for keypress

Full project attached, for 32- or 64-bit assembly with ML/ML64 or UAsm or AsmC.

hutch-- · August 17, 2018, 08:01:30 PM

> (which uses one fpu reg more than needed)

What does it matter with sequential additions ?

> fld arg0

You are loading a variable into st(0)
fldz loads zero into st(0)

If you replace "fld arg0" with fldz in your macro, what is the difference apart from fldz being 2 bytes ?

Here is the revised version of your macro, which is now a better version than the one I posted.

jjsum MACRO args:VARARG
fldz
for arg, <args>
fadd arg
endm
endm

This is the output.

.text:0000000140001026 D9EE fldz
.text:0000000140001028 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:000000014000102e DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001034 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:000000014000103a DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001040 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001046 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:000000014000104c DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001052 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001058 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:000000014000105e DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001064 DD9D50FFFFFF fstp qword ptr [rbp-0xb0]

How is your loading a variable more efficient than using FLDZ ?

jj2007 · August 17, 2018, 08:58:51 PM

Your version:

Code Select

000000014000108A   | D9 EE                            | fldz                                    |
000000014000108C   | DC 05 B2 02 00 00                | fadd qword ptr ds:[140001344]           |
0000000140001092   | DC 05 AC 02 00 00                | fadd qword ptr ds:[140001344]           |
0000000140001098   | DC 05 A6 02 00 00                | fadd qword ptr ds:[140001344]           |
000000014000109E   | DC 05 A0 02 00 00                | fadd qword ptr ds:[140001344]           |
00000001400010A4   | DD 1D A2 02 00 00                | fstp qword ptr ds:[14000134C]           |

My version:

Code Select

0000000140001126   | DD 05 18 02 00 00                | fld qword ptr ds:[140001344]            |
000000014000112C   | DC 05 12 02 00 00                | fadd qword ptr ds:[140001344]           |
0000000140001132   | DC 05 0C 02 00 00                | fadd qword ptr ds:[140001344]           |
0000000140001138   | DC 05 06 02 00 00                | fadd qword ptr ds:[140001344]           |
000000014000113E   | DD 1D 08 02 00 00                | fstp qword ptr ds:[14000134C]           |

hutch-- · August 17, 2018, 09:53:29 PM

So your total gain is 2 bytes.

.text:0000000140001026 D9EE fldz
.text:0000000140001028 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:000000014000102e DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001034 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:000000014000103a DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001040 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001046 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:000000014000104c DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001052 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001058 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:000000014000105e DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:0000000140001064 DD9D58FFFFFF fstp qword ptr [rbp-0xa8]

.text:000000014000108f DD8550FFFFFF fld qword ptr [rbp-0xb0]
.text:0000000140001095 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:000000014000109b DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:00000001400010a1 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:00000001400010a7 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:00000001400010ad DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:00000001400010b3 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:00000001400010b9 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:00000001400010bf DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:00000001400010c5 DC8550FFFFFF fadd qword ptr [rbp-0xb0]
.text:00000001400010cb DD9D58FFFFFF fstp qword ptr [rbp-0xa8]

I declare you the winner by 2 bytes. :P

I managed to get a benchmark around both and yours is faster by about the difference in the number of instructions. I may even steal a variation of it. :P

daydreamer · August 18, 2018, 08:56:11 PM

fsincos0 MACRO
fldz
fld1
ENDM
fsincos90 MACRO
fld1
fldz
ENDM
fsincos180 MACRO
fldz
fld1
fchs
ENDM
fsincos45 MACRO
fld sqrtreciprocalof2
fld st(0) ;must ask Raymond and other experienced FPU users whether this is the best way to create two copies of st(0), so that the value is in both st(0) and st(1)
ENDM
fsincos30 MACRO
fld half
fld halfsqrt3
ENDM
fsincos60 MACRO
fld halfsqrt3
fld half
ENDM
;The only difference is that the cosine changes sign after 90 degrees
fsincos120 MACRO
fsincos60
fchs
ENDM
fsincos150 MACRO
fsincos30
fchs
ENDM
;you could make more macros: after 180 degrees the sine changes sign
tan0 MACRO
fldz
ENDM
tan45 MACRO
fld1
ENDM
tan60 MACRO
fld sqrt3
ENDM
tan30 MACRO
fld reciprocalsqrt3
ENDM

First, you can initialise the most-used square-root constants for trig, i.e. sqrt(2), sqrt(3), sqrt(5), sqrt(6), at the beginning of the program, either with SSE SQRTPD or with the FPU.
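A minimal sketch of such an initialisation with the FPU, assuming a Masm32 setup. The constant names half, halfsqrt3, sqrtreciprocalof2, sqrt3 and reciprocalsqrt3 are the ones the macros above expect; two and three are hypothetical helper values, and REAL8 is an assumed precision.

```masm
include \masm32\include\masm32rt.inc

.data
half    REAL8 0.5
two     REAL8 2.0               ; hypothetical helper constant
three   REAL8 3.0               ; hypothetical helper constant

.data?
sqrtreciprocalof2 REAL8 ?       ; sqrt(1/2), used by fsincos45
sqrt3             REAL8 ?       ; sqrt(3),   used by tan60
halfsqrt3         REAL8 ?       ; sqrt(3)/2, used by fsincos30 and fsincos60
reciprocalsqrt3   REAL8 ?       ; 1/sqrt(3), used by tan30

.code
start:
  fld1                          ; ST0 = 1.0
  fdiv two                      ; ST0 = 0.5
  fsqrt                         ; ST0 = sqrt(1/2)
  fstp sqrtreciprocalof2

  fld three
  fsqrt                         ; ST0 = sqrt(3)
  fst sqrt3                     ; store it, keep it in ST0
  fmul half                     ; ST0 = sqrt(3)/2
  fstp halfsqrt3

  fld1
  fdiv sqrt3                    ; ST0 = 1/sqrt(3)
  fstp reciprocalsqrt3

  exit

end start
```

Each sequence pops its result, so the FPU stack is empty again before the macros are first used.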

I know more exact trig values; if anyone is interested, I can make more macros (about 150 of them).

The next two macros are useful for producing radian angles from pi (180 degrees), for example 2pi (360 degrees), pi/2 (90 degrees), pi/4 (45 degrees) or pi/8 (22.5 degrees), or for preparing a float constant before looping through 256, 512 or 1024 trig calculations (fptan, fsin, fcos, fsincos) from zero to pi or from zero to 2pi radians.

fscalepi MACRO scale
fld scale
fldpi
fscale ;; st(0) = pi * 2^scale; scale itself stays in st(1)
;; use it if you want radians up to 360 degrees, or need a constant for x iterations through a trig calculation loop from zero to pi (180 degrees)
ENDM
fdivpi MACRO x
fldpi
fdiv x
ENDM
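As an illustration of the loop use case, here is a hedged Masm32 sketch that uses fdivpi to prepare an angle step of pi/256 and then fills a table with 256 sine values from zero up to (but not including) pi. The names steps, delta and sintab, the table size and the REAL4 storage format are assumptions made for the example.

```masm
include \masm32\include\masm32rt.inc

fdivpi MACRO x
  fldpi
  fdiv x
ENDM

.data
steps   REAL8 256.0             ; hypothetical number of steps

.data?
delta   REAL8 ?                 ; angle increment, pi/256
sintab  REAL4 256 dup(?)        ; hypothetical result table

.code
start:
  fdivpi steps                  ; ST0 = pi / 256
  fstp delta

  mov edx, offset sintab
  mov ecx, 256
  fldz                          ; ST0 = current angle, starting at 0
@@:
  fld st(0)                     ; copy the angle
  fsin                          ; ST0 = sin(angle)
  fstp REAL4 ptr [edx]          ; store the sine and pop it
  add edx, REAL4                ; next table slot (4 bytes)
  fadd delta                    ; advance the angle
  dec ecx
  jnz @B
  fstp st(0)                    ; drop the final angle

  exit

end start
```

Only two FPU registers are in use at any time, so the loop leaves plenty of room on the FPU stack.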

jj2007 · August 18, 2018, 10:10:49 PM

Quote from: hutch-- on August 17, 2018, 09:53:29 PM
> I may even steal a variation of it

I'll be happy if you do that, Hutch, as it's simply the right way to do it :icon14:

hutch-- · August 19, 2018, 12:12:05 AM

fpsum MACRO arg1, args:VARARG ;; sum a list of numbers
fld arg1
FOR arg, <args> ;; stolen from JJ :)
fadd arg
ENDM
ENDM

:P

daydreamer · August 19, 2018, 12:41:26 AM

Quote from: hutch-- on August 19, 2018, 12:12:05 AM

fpsum MACRO arg1, args:VARARG ;; sum a list of numbers
fld arg1
FOR arg, <args> ;; stolen from JJ :)
fadd arg
ENDM
ENDM

:P

That's a nice macro. I wonder if it's possible to extend it to an fpmedium macro? I don't know if it's possible to end it somehow with
fdiv numberofargs ?

jj2007 · August 19, 2018, 03:39:42 AM

Quote from: daydreamer on August 19, 2018, 12:41:26 AM
> wonder if it's possible to extend it to an fpmedium macro?

Sure:

Code Select

fpaverage MACRO arg1, args:VARARG
Local ct
  ct=1
  fld arg1
  FOR arg, <args>
    ct=ct+1
    fadd arg
  ENDM
  push ct
  fidiv dword ptr [esp]
  add esp, DWORD
ENDM
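A possible way to try the macro, sketched for 32-bit Masm32 only (the macro uses push and esp, so it is not meant for 64-bit code as written). The variables v1, v2, v3 and Result are hypothetical; the result is converted to an integer with fistp, as in the fsum example earlier in the thread.

```masm
include \masm32\include\masm32rt.inc

fpaverage MACRO arg1, args:VARARG
Local ct
  ct=1                          ;; assembly-time counter of arguments
  fld arg1
  FOR arg, <args>
    ct=ct+1
    fadd arg
  ENDM
  push ct                       ;; argument count as a DWORD on the stack
  fidiv dword ptr [esp]         ;; ST0 = sum / count
  add esp, DWORD                ;; restore the stack pointer
ENDM

.data
v1      REAL8 10.0              ; hypothetical test values
v2      REAL8 20.0
v3      REAL8 30.0
Result  dd ?

.code
start:
  fpaverage v1, v2, v3          ; (10 + 20 + 30) / 3
  fistp Result
  inkey str$(Result), " is the average", 13, 10
  exit

end start
```

Because ct is counted while the macro expands, the division costs one push, one fidiv and one add at run time, regardless of the number of arguments.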

The MASM Forum
