Author Topic: Large integers and floats  (Read 10573 times)

nidud

  • Member
  • *****
  • Posts: 1991
    • https://github.com/nidud/asmc
Re: Large integers and floats
« Reply #15 on: July 04, 2018, 09:12:24 PM »
So wouldn't this be possible to use for faster computation of PI with loads of decimals, compared to having some almost unlimited array of numbers some other way?

Yes, it allows the computation to be done directly, which simplifies declarations. The internal calculation must have higher precision than the largest storage type (REAL10 in MASM), so this is now extended from 96-bit to 128-bit to handle REAL16.

The values based on PI may thus be declared using macros as above or directly:
Code:
PI    equ 3.141592653589793238462643383279502884197169399375105820974945
...
PI_2  equ PI / 2.0
PI_4  equ PI / 4.0

Right, Taylor series, but I have to do research first; maybe arcsine where sine pi is.
Also, a REAL16 sine would be useful, or have you already made that, nidud?

The assembler allows usage of Add(+), Subtract(-), Multiply(*), and Divide(/) directly which uses REAL16 internally, so it should be possible to construct static functions using macros based on this.

I'm currently implementing a DirectXMath library for testing out the vectorcall implementation, which includes passing of arguments, assigning values and return codes using vectors, so this may clarify how REAL16 values may be used.

The only compiler using REAL16 internally (as far as I know) is the Intel Fortran Compiler in addition to a few open source libraries, so this is more about how Asmc handles floats internally than any practical use, at least for now. This basically simplifies the xmm versus pointers/regs handling when using INVOKE, given that all REALx values are passed in xmm registers.
Code:
VECTOR TYPEDEF REAL16

daydreamer

  • Member
  • *****
  • Posts: 1384
  • building nextdoor
Re: Large integers and floats
« Reply #16 on: July 05, 2018, 02:11:04 AM »
I have looked at Taylor series for trigonometric functions.
I tested arcsin on a scientific calculator and I think 2 x arcsin(1.0) would work for calculating pi.
Sine is a series of terms x^3/3!, x^5/5!, x^7/7!, etc., with the sign changing every term.
x! grows very fast to large integers; I think 10! is the number of orders you can put the numbers 1-10 in.
I hope raymond or anyone skilled in math would like to explain the best way to calculate this function ???
I wonder how many terms of the sine Taylor series are needed for REAL16 precision.
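
A rough Python sketch of the arcsine idea above, under stated assumptions: the helper name `arcsin_series` is hypothetical and exact `Fraction` arithmetic stands in for REAL16. It suggests that the series evaluated at x = 1.0 converges very slowly, while 6*arcsin(0.5), which also equals pi, settles quickly with the same number of terms.

```python
from fractions import Fraction
import math

def arcsin_series(x, terms):
    """Partial sum of arcsin(x) = sum C(2n,n)/4^n * x^(2n+1)/(2n+1)."""
    x = Fraction(x)
    total = Fraction(0)
    coeff = Fraction(1)      # C(2n, n) / 4^n, starting at n = 0
    power = x                # x^(2n+1)
    for n in range(terms):
        total += coeff * power / (2 * n + 1)
        coeff *= Fraction(2 * n + 1, 2 * n + 2)
        power *= x * x
    return total

slow = float(2 * arcsin_series(1, 40))               # pi via 2*arcsin(1)
fast = float(6 * arcsin_series(Fraction(1, 2), 40))  # pi via 6*arcsin(1/2)
print(abs(slow - math.pi), abs(fast - math.pi))
```

The first error printed stays large after 40 terms, the second is at the limit of double precision.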

Quote from Flashdance
Nick  :  When you give up your dream, you die
*wears a flameproof asbestos suit*
Gone serverside programming p:  :D
I love assembly,because its legal to write
princess:lea eax,luke
:)

daydreamer

  • Member
  • *****
  • Posts: 1384
  • building nextdoor
Re: Large integers and floats
« Reply #17 on: July 06, 2018, 03:52:57 AM »
I think the sine Taylor series calculation should use the constants 1/(3!), 1/(5!), 1/(7!), 1/(9!)... until it's enough for REAL16 precision, so that many divisions are replaced by faster multiplications. Previous x*x*x results can be cached and reused in the x^5, x^7, x^9... calculations. It may also be possible to write parallel code that multiplies by 4 or more of the above constants simultaneously, to take advantage of CPUs that can execute many mulpd in parallel.
I am actually working on coding a sine function, to get it right first and then compare it to the built-in sine function, so I can compare precision vs speed.
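
A minimal Python sketch of the scheme described here, not the poster's actual code: reciprocal factorials are computed once, and Horner's rule in x*x replaces the repeated powers, so the evaluation loop contains only multiplies and adds. The name `sin_taylor` and the choice of 10 terms are assumptions, and the arithmetic is double precision rather than REAL16.

```python
import math

# Signed reciprocal factorials 1/1!, -1/3!, 1/5!, ... computed once up front.
COEFFS = [(-1) ** n / math.factorial(2 * n + 1) for n in range(10)]

def sin_taylor(x):
    """Evaluate the sine series with Horner's rule in x*x (multiplies only)."""
    x2 = x * x                      # reused for every higher power
    acc = 0.0
    for c in reversed(COEFFS):
        acc = acc * x2 + c
    return acc * x

print(sin_taylor(0.5), math.sin(0.5))
```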

Now I know what a REAL16 is:
REAL10 is a supermodel woman, 10 on a 1-10 scale
REAL16 is the wife of a man who says "my wife is the most beautiful woman in the world" :biggrin:
« Last Edit: July 07, 2018, 01:53:15 AM by daydreamer »

nidud

  • Member
  • *****
  • Posts: 1991
    • https://github.com/nidud/asmc
Re: Large integers and floats
« Reply #18 on: September 14, 2018, 11:37:01 PM »
Added REAL2 binary16 support.

http://masm32.com/board/index.php?topic=7394.msg81065#msg81065

This works for declarations of data as used by REAL8/10/16, but invoke as vector is not implemented yet. Arguments are loaded in 16-bit registers, but this may change. The following logic applies to size:

    mov ax,1.0          ; 3C00
    mov ax,-65504.0     ; FBFF  Min
    mov ax,65504.0      ; 7BFF  Max
    mov ax,0.0          ; 0000  zero
    mov ax,-0.0         ; 8000 -zero
    mov ax,1.0/0.0      ; 7C00  Inf  1/zero
    mov ax,-1.0/0.0     ; FC00 -Inf  -1/zero
    mov ax,0.0/0.0      ; FFFF  NaN  zero/zero
    mov ax,0.3333333333 ; 3555  1/3
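
For cross-checking the encodings above, a small Python sketch using the standard `struct` module's `e` (binary16) format; the helper name `half_bits` is hypothetical. NaN is left out on purpose, since the bit pattern a host produces for NaN need not match the FFFF shown in the listing.

```python
import struct

def half_bits(value):
    """Return the IEEE 754 binary16 encoding of value as a 16-bit integer."""
    return struct.unpack("<H", struct.pack("<e", value))[0]

for v in (1.0, -65504.0, 65504.0, 0.0, -0.0,
          float("inf"), float("-inf"), 0.3333333333):
    print(f"{v!r:>14} -> {half_bits(v):04X}")
```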

raymond

  • Member
  • **
  • Posts: 244
    • Raymond's page
Re: Large integers and floats
« Reply #19 on: September 15, 2018, 04:27:37 AM »
I hope raymond or anyone skilled in math would like to explain the best way to calculate this function ???
I wonder how many terms of the sine Taylor series are needed for REAL16 precision

As for the initial question, the first thing to do is to reduce the input angle to the 1st quadrant and adjust the result later for angles in the other three quadrants. Furthermore, in order to minimize computations, the sine (or cosine) function should only be done for angles in the 0-45 deg range. For angles in the 45-90 deg range, the cosine (or sine) function should be used with the complementary angle.
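
The reduction described above can be sketched in Python as follows. This is an illustration under assumptions, not code from the thread: the names `sin_reduced`, `_sin_small`, `_cos_small` and `_series` are hypothetical, the series use 10 terms, and everything runs in double precision.

```python
import math

def _series(x2, first, terms=10):
    """Sum x2^k * (-1)^k / (first + 2k)! for k < terms, using Horner's rule."""
    acc = 0.0
    for k in reversed(range(terms)):
        acc = acc * x2 + (-1) ** k / math.factorial(first + 2 * k)
    return acc

def _sin_small(x):   # intended for 0 <= x <= pi/4
    return x * _series(x * x, 1)

def _cos_small(x):   # intended for 0 <= x <= pi/4
    return _series(x * x, 0)

def sin_reduced(angle):
    """sin(angle) using only series evaluations on [0, pi/4]."""
    quadrant, r = divmod(angle, math.pi / 2)      # r lies in [0, pi/2)
    quadrant = int(quadrant) % 4
    if r <= math.pi / 4:
        s, c = _sin_small(r), _cos_small(r)
    else:                                         # use the complementary angle
        s, c = _cos_small(math.pi / 2 - r), _sin_small(math.pi / 2 - r)
    # sin(q*pi/2 + r) is sin r, cos r, -sin r, -cos r for q = 0, 1, 2, 3
    return (s, c, -s, -c)[quadrant]

print(sin_reduced(2.0), math.sin(2.0))
```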

As for the latter question, the accuracy of a computation depends on the accuracy of the least accurate of the elements used for the computation. Assuming the 128-bit real16 is made up similarly to a real10 (i.e. 1 bit for the sign, 15 bits for the biased exponent and 112 bits for the mantissa), its equivalent accuracy in base 10 would be 33 digits. The last term expected to be of significance in the series would thus have a factorial with 33 digits (i.e. 31!). With factorials going up by 2 with each term of the series, 15 terms would be required for full accuracy (again assuming the source angle would also have such accuracy).
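
As a rough check of the term count, a Python sketch that counts how many series terms are at least 2^-113 in size relative to x, for the largest reduced angle of 45 degrees. The function name `sine_terms_needed` and the 113-bit threshold (112 stored bits plus an implicit leading bit) are assumptions for illustration.

```python
from fractions import Fraction
import math

def sine_terms_needed(x, mantissa_bits=113):
    """Count terms x^(2n+1)/(2n+1)! whose size relative to x is >= 2^-bits."""
    x = Fraction(x)
    eps = Fraction(1, 2 ** mantissa_bits)
    term = Fraction(1)               # x^(2n) / (2n+1)! for n = 0
    n = 0
    while term >= eps:
        n += 1
        term = term * x * x / ((2 * n) * (2 * n + 1))
    return n                         # index of the first negligible term

print(sine_terms_needed(math.pi / 4))
```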
Whenever you assume something, you risk being wrong half the time.
http://www.ray.masmcode.com/

nidud

  • Member
  • *****
  • Posts: 1991
    • https://github.com/nidud/asmc
Re: Large integers and floats
« Reply #20 on: September 15, 2018, 09:23:22 PM »
Added INVOKE(REAL2) as float for syscall, vectorcall and fastcall.

The definition of HALF and test case:

DirectXPackedVector.inc

 - XMConvertHalfToFloat(HALF)
 - XMConvertFloatToHalf(float)
 * PackedVector.s

nidud

  • Member
  • *****
  • Posts: 1991
    • https://github.com/nidud/asmc
Re: Large integers and floats
« Reply #21 on: September 16, 2018, 12:38:56 AM »
Added the streamed versions of the conversion.
Code:
include DirectXPackedVector.inc

    .code

XMConvertHalfToFloatStream proc uses rsi rdi rbx pOutputStream:ptr float, OutputStride:size_t,
        pInputStream:ptr HALF, InputStride:size_t, HalfCount:size_t

    .assert(pOutputStream)
    .assert(pInputStream)
    .assert(InputStride >= sizeof(HALF))
    .assert(OutputStride >= sizeof(float))

    .for (rsi = r8, rdi = rcx, ebx = 0: rbx < HalfCount: ebx++)

        XMConvertHalfToFloat([rsi])
        mov [rdi],eax
        add rsi,InputStride
        add rdi,OutputStride
    .endf

    mov rax,pOutputStream
    ret

XMConvertHalfToFloatStream endp

    end
So, mostly about storage.

nidud

  • Member
  • *****
  • Posts: 1991
    • https://github.com/nidud/asmc
Re: Large integers and floats
« Reply #22 on: September 19, 2018, 10:17:03 PM »
Added support for invoke(__int128) in syscall. OWORD will span registers and REAL16 (and floats) will use xmm.

    __int128    typedef oword
    __float128  typedef real16

Three 128-bit register arguments may be used:

p1  proc syscall a1:__int128, a2:__int128, a3:__int128

    mov rax,a1 ; RDI: high64 in RSI
    mov rax,a2 ; RDX: high64 in RCX
    mov rax,a3 ; R8:  high64 in R9
    ret
p1  endp


In addition to six 128-bit xmm arguments:
p2  proc syscall a1:__int128, a2:__int128, a3:__int128, a4:__float128, a5:__float128, a6:__float128

    p1(a1, a2, a3) ; no params set
    movaps xmm0,a4 ; xmm0
    movaps xmm0,a5 ; xmm1
    movaps xmm0,a6 ; xmm2
    ret
p2  endp


Spanning of args, const and memory. Negative values extend to HIGH64 if a sign is used and the value is not zero:

    .data
    x __int128 0
...
    p1( 0, 1, 2 )
*    mov r8d, 2
*    xor r9d, r9d
*    mov edx, 1
*    xor ecx, ecx
*    xor edi, edi
*    xor esi, esi
    p1( 0x0000000000000000FFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF0000000000000000,
        0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF )
*    mov r8, low64 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
*    mov r9, high64 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
*    xor edx, edx
*    mov rcx, high64 0xFFFFFFFFFFFFFFFF0000000000000000
*    mov rdi, low64 0x0000000000000000FFFFFFFFFFFFFFFF
*    xor esi, esi
    p1( -0, -1, x )
*    mov r8, qword ptr x
*    mov r9, qword ptr x[8]
*    mov rdx, low64 -1
*    mov rcx, -1
*    xor edi, edi
*    xor esi, esi
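
The low64/high64 split used in the listing above can be modelled in a few lines of Python. This is only a sketch of the arithmetic, with a hypothetical helper name `split128`; it does not describe how Asmc implements it.

```python
MASK64 = (1 << 64) - 1

def split128(value):
    """Return (low64, high64) of value as a 128-bit two's-complement integer."""
    value &= (1 << 128) - 1          # negative values sign-extend into high64
    return value & MASK64, value >> 64

for v in (2, -1, 0xFFFFFFFFFFFFFFFF0000000000000000):
    low, high = split128(v)
    print(f"low64={low:016X} high64={high:016X}")
```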

nidud

  • Member
  • *****
  • Posts: 1991
    • https://github.com/nidud/asmc
Re: Large integers and floats
« Reply #23 on: September 24, 2018, 06:48:47 AM »
Added support for double::register arguments in 64-bit.

syscall:

p1  proto syscall :oword, :oword, :oword

p2  proc syscall a1:oword, a2:oword, a3:oword
    p1(rsi::rdi, rcx::rdx, r9::r8)  ; no params set
    p1(rsi::rax, rcx::rdx, r9::r8)  ; rdi
    p1(rax::rax, rax::rax, rax::rax); all
    ret
p2  endp

   0:   55                      push   rbp
   1:   48 8b ec                mov    rbp,rsp
   4:   e8 21 00 00 00          call   0x2a
   9:   48 8b f8                mov    rdi,rax
   c:   e8 19 00 00 00          call   0x2a
  11:   4c 8b c0                mov    r8,rax
  14:   4c 8b c8                mov    r9,rax
  17:   48 8b d0                mov    rdx,rax
  1a:   48 8b c8                mov    rcx,rax
  1d:   48 8b f8                mov    rdi,rax
  20:   48 8b f0                mov    rsi,rax
  23:   e8 02 00 00 00          call   0x2a
  28:   c9                      leave 
  29:   c3                      ret   

fastcall:

p1  proto a1:dword
p2  proto a1:oword, a2:oword

p3  proc a1:oword, a2:oword
    p2(rdx::rcx, r9::r8)    ; no params set
    p2(p1(0), r9::r8)       ; rcx
    p2(rax::rcx, r9::r8)    ; rdx
    p2(rax::rax, r11::r10)  ; all
    ret
p3  endp

   0:   55                      push   rbp
   1:   48 8b ec                mov    rbp,rsp
   4:   48 83 ec 20             sub    rsp,0x20
   8:   e8 34 00 00 00          call   0x41
   d:   33 c9                   xor    ecx,ecx
   f:   e8 23 00 00 00          call   0x37
  14:   48 8b c8                mov    rcx,rax
  17:   e8 25 00 00 00          call   0x41
  1c:   48 8b d0                mov    rdx,rax
  1f:   e8 1d 00 00 00          call   0x41
  24:   48 8b c8                mov    rcx,rax
  27:   48 8b d0                mov    rdx,rax
  2a:   4d 8b c2                mov    r8,r10
  2d:   4d 8b cb                mov    r9,r11
  30:   e8 0c 00 00 00          call   0x41
  35:   c9                      leave 
  36:   c3                      ret   

nidud

  • Member
  • *****
  • Posts: 1991
    • https://github.com/nidud/asmc
Re: Large integers and floats
« Reply #24 on: February 05, 2020, 04:47:19 AM »
Added support for YWORD and ZWORD parameters to INVOKE. This currently only handles registers for 64-bit FAST, VECTOR and SYS-CALL.

    y4 proto fastcall   :yword, :yword, :yword, :yword
    y6 proto vectorcall :yword, :yword, :yword, :yword, :yword, :yword
    y8 proto syscall    :yword, :yword, :yword, :yword, :yword, :yword, :yword, :yword

    z4 proto fastcall   :zword, :zword, :zword, :zword
    z6 proto vectorcall :zword, :zword, :zword, :zword, :zword, :zword
    z8 proto syscall    :zword, :zword, :zword, :zword, :zword, :zword, :zword, :zword

Note that the MS versions have fixed positions:

    y1 proto vectorcall :ptr, :yword
    s1 proto syscall :ptr, :yword

    y1(rcx, ymm1)
    s1(rdi, ymm0)

nidud

  • Member
  • *****
  • Posts: 1991
    • https://github.com/nidud/asmc
Re: Large integers and floats
« Reply #25 on: March 04, 2020, 01:21:09 AM »
So this needs an extended call stack for YWORD and ZWORD to be able to save the extended registers. It also means the largest sized argument dictates step-size for the stack.

    option win64:save

p4 proc a:ptr, b:ptr, c:ptr, d:ptr

    mov [rsp+0x20], r9
    mov [rsp+0x18], r8
    mov [rsp+0x10], rdx
    mov [rsp+0x08], rcx


    add a,b ; [rbp+0x10] + [rbp+0x18]
    add c,d ; [rbp+0x20] + [rbp+0x28]
    ret
p4 endp
call_p4 proc
    sub rsp, 4*8
    p4(rcx, rdx, r8, r9)
    ret
call_p4 endp

x4 proc a:real16, b:real16, c:real16, d:real16

    movaps [rsp+0x38], xmm3
    movaps [rsp+0x28], xmm2
    movaps [rsp+0x18], xmm1
    movaps [rsp+0x08], xmm0


    addps xmm4,a ; [rbp+0x10]
    addps xmm4,b ; [rbp+0x20]
    addps xmm4,c ; [rbp+0x30]
    addps xmm4,d ; [rbp+0x40]
    ret
x4 endp
call_x4 proc
    sub rsp, 4*16
    x4(xmm0, xmm1, xmm2, xmm3)
    ret
call_x4 endp

The stack will be aligned to 16, so anything above that is unknown.

y4 proc a:yword, b:yword, c:yword, d:yword

    vmovups [rsp+0x68], ymm3
    vmovups [rsp+0x48], ymm2
    vmovups [rsp+0x28], ymm1
    vmovups [rsp+0x08], ymm0

...
z4 proc a:zword, b:zword, c:zword, d:zword

    vmovups [rsp+0xC8], zmm3
    vmovups [rsp+0x88], zmm2
    vmovups [rsp+0x48], zmm1
    vmovups [rsp+0x08], zmm0

...
mix proc a:zword, b:dword, c:byte

    mov [rsp+0x88], r8
    mov [rsp+0x48], rdx
    vmovups [rsp+0x08], zmm0


    vmovups zmm1,a  ; [rbp+0x10]
    mov eax,b       ; [rbp+0x50]
    mov al,c        ; [rbp+0x90]
    ret
mix endp
call_mix proc
    sub rsp, 4*64
    mix(zmm0, edx, r8b)
    ret
call_mix endp
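
The save offsets in the listings above follow a simple pattern, which this Python sketch reproduces. It assumes, as the listings suggest, that the first slot sits at RSP+8 on entry and that the slot size equals the largest argument size (with a minimum of 8 bytes); the helper name `home_offsets` is hypothetical.

```python
def home_offsets(arg_sizes):
    """Offsets from RSP (at procedure entry) where each register argument is saved."""
    step = max([8] + list(arg_sizes))    # largest argument dictates the slot size
    return [8 + i * step for i in range(len(arg_sizes))]

for name, sizes in [("p4", [8, 8, 8, 8]), ("x4", [16] * 4), ("y4", [32] * 4),
                    ("z4", [64] * 4), ("mix", [64, 4, 1])]:
    print(name, [hex(o) for o in home_offsets(sizes)])
```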

Note that OWORD will currently be saved as a vector in VECTORCALL but as a single register in FASTCALL.

fo proc a:oword

    mov [rsp+8], rcx

    addps xmm4,a ; [rbp+0x10]
    ret

fo endp

vo proc vectorcall a:oword

    movaps [rsp+8], xmm0

    addps xmm4,a ; [rbp+0x10]
    ret

vo endp

nidud

  • Member
  • *****
  • Posts: 1991
    • https://github.com/nidud/asmc
Re: Large integers and floats
« Reply #26 on: March 24, 2020, 04:03:01 AM »
Some changes made to immediate float values.

As arguments to INVOKE, real10 and real16 will be converted to data.

foo proto :real4, :real8, :real10, :real16

    foo(1.0, 1.0, 1.0, 1.0)

        movaps  xmm3, xmmword ptr [F0000]
        movaps  xmm2, xmmword ptr [F0001]
        mov     rax, 3FF0000000000000H   
        movq    xmm1, rax               
        mov     eax, 1065353216         
        movd    xmm0, eax               
        call    foo                     

Real number designation defines the size of the actual number and target size is handled later. This means that these are all valid assignments:

    real4 3C00r
    real4 3F800000r 
    real4 3FF0000000000000r 
    real4 3FFF8000000000000000r 
    real4 3FFF0000000000000000000000000000r 
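
To see that the five literals above all denote the same value, here is a Python sketch that picks the format from the number of hex digits and decodes it exactly. The names `decode_real` and `FORMATS` are hypothetical, only normal numbers are handled (no zero, subnormal, infinity or NaN), and the 80-bit layout assumes an explicit integer bit as in x87 REAL10.

```python
from fractions import Fraction

# hex digits -> (exponent bits, fraction field bits, explicit integer bit?)
FORMATS = {4: (5, 10, False), 8: (8, 23, False), 16: (11, 52, False),
           20: (15, 64, True), 32: (15, 112, False)}

def decode_real(literal):
    """Decode a hex real such as '3F800000r' into an exact Fraction."""
    digits = literal.rstrip("rR")
    if len(digits) not in FORMATS and digits.startswith("0"):
        digits = digits[1:]                      # optional leading zero
    exp_bits, frac_bits, explicit = FORMATS[len(digits)]
    value = int(digits, 16)
    frac = value & ((1 << frac_bits) - 1)
    exp = (value >> frac_bits) & ((1 << exp_bits) - 1)
    sign = value >> (frac_bits + exp_bits)
    bias = (1 << (exp_bits - 1)) - 1
    if explicit:                                 # top bit of the field is the integer bit
        mantissa = Fraction(frac, 1 << (frac_bits - 1))
    else:                                        # implicit leading 1
        mantissa = 1 + Fraction(frac, 1 << frac_bits)
    result = mantissa * Fraction(2) ** (exp - bias)
    return -result if sign else result

for lit in ("3C00r", "3F800000r", "3FF0000000000000r",
            "3FFF8000000000000000r", "3FFF0000000000000000000000000000r"):
    print(lit, decode_real(lit))
```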

The .return directive also follows this logic and returns a float size based on the length of the number.

    .return 3C00r
    .return 3F800000r
    .return 3FF0000000000000r
    .return 3FFF8000000000000000r
    .return 3FFF0000000000000000000000000000r

A zero in front will also work: 0FFFF0...
In addition to this a type(...) may be used to size up the returned value:

    .return real2  ( 1.0 )
    .return real4  ( 1.0 )
    .return real8  ( 1.0 )
    .return real10 ( 1.0 )
    .return real16 ( 1.0 )

Erroneous numbers are also allowed here:

    .return real4  ( 1.0 / 0.0 ) ; nan
    .return real8  (-1.0 / 0.0 ) ; -nan

The first number defines the type so this will return a float (real4):

    .return 3F800000r * 2.0 / 3.0

XMM registers may now be used as VARARG parameters. They will be loaded in 64-bit registers. Immediate float values are currently assumed to be real8's.

nidud

  • Member
  • *****
  • Posts: 1991
    • https://github.com/nidud/asmc
Re: Large integers and floats
« Reply #27 on: March 24, 2020, 06:08:55 AM »
Some changes made to inline functions and templates.

An inline function will not generate any stack frame unless needed, that is, when the procedure does not allocate any stack.

stack   proto :byte, :byte, :byte, :byte, :byte { exitm<> }
nostack proto :byte, :byte, :byte, :byte { exitm<> }

Operators added to the class structure follow a strict naming logic, which is difficult for the assembler to figure out given that you need the prototype in order to do something meaningful in an expression.

A simplified template based on types will enable parsing of such expressions. The logic is to assign *this to the logical vector (AL to ZMM0) based on the type used.

    float typedef real4

.template float vectorcall

    .operator = :float {
        movss   this,_1
        retm    <this>
        }
    .operator + :float {
        addss   this,_1
        retm    <this>
        }
    .operator - :float {
        subss   this,_1
        retm    <this>
        }
    .operator / :float {
        divss   this,_1
        retm    <this>
        }
    .operator * :float {
        mulss   this,_1
        retm    <this>
        }
    .operator == :float {
        comiss  this,_1
        retm    <this>
        }
    .operator ++ { exitm<float::add(this, 1.0)> }
    .operator -- { exitm<float::sub(this, 1.0)> }
    .ends

There is no decoration here so the name is add.

    float :: add ( xmm0, 1.0 )

The operator call skips *this.

    float :: + ( 1.0 )

And the parsing is recursive.

    float :: = ( xmm1 ) + ( a ) / ( b ) * ( c ) == ( 3.0 )

        movss   xmm0, xmm1         
        movd    xmm1, dword ptr [a]
        addss   xmm0, xmm1         
        movd    xmm1, dword ptr [_b]
        divss   xmm0, xmm1         
        movd    xmm1, dword ptr [c]
        mulss   xmm0, xmm1     
        mov     eax, 1077936128
        movd    xmm1, eax
        comiss  xmm0, xmm1

Using absolute values (:ABS) instead of float:

        movss   xmm0, xmm1         
        addss   xmm0, dword ptr [a]
        divss   xmm0, dword ptr [_b]
        mulss   xmm0, dword ptr [c]
        mov     eax, 1077936128
        movd    xmm1, eax
        comiss  xmm0, xmm1

nidud

  • Member
  • *****
  • Posts: 1991
    • https://github.com/nidud/asmc
Re: Large integers and floats
« Reply #28 on: May 09, 2020, 08:28:25 PM »
The CreateFloat() function is now used for immediate float assignment for REAL4 and REAL8. This is used by INVOKE mainly for REAL10 and REAL16 but now extends to direct use. The use of AX is then removed from MOVD/Q and MOVSS/D. This needed RAX for 64-bit values. These are currently the instructions added based on library usage but more may be added later.

    movq    xmm0,3FF0000000000000r
    movsd   xmm0,1.0
    addsd   xmm0,1.0
    subsd   xmm0,1.0
    mulsd   xmm0,1.0
    divsd   xmm0,1.0
    comisd  xmm0,1.0
    ucomisd xmm0,4.0 / 2.0 - 1.0

    movd    xmm0,3F800000r
    movss   xmm0,1.0
    addss   xmm0,1.0
    subss   xmm0,1.0
    mulss   xmm0,1.0
    divss   xmm0,1.0
    comiss  xmm0,1.0
    ucomiss xmm0,4.0 / 2.0 - 1.0

Code generated:

        movq    xmm0, qword ptr [F0000]
        movsd   xmm0, qword ptr [F0000]
        addsd   xmm0, qword ptr [F0000]
        subsd   xmm0, qword ptr [F0000]
        mulsd   xmm0, qword ptr [F0000]
        divsd   xmm0, qword ptr [F0000]
        comisd  xmm0, qword ptr [F0000]
        ucomisd xmm0, qword ptr [F0000]
        movd    xmm0, dword ptr [F0001]
        movss   xmm0, dword ptr [F0001]
        addss   xmm0, dword ptr [F0001]
        subss   xmm0, dword ptr [F0001]
        mulss   xmm0, dword ptr [F0001]
        divss   xmm0, dword ptr [F0001]
        comiss  xmm0, dword ptr [F0001]
        ucomiss xmm0, dword ptr [F0001]

        .data

F0000   label qword
        dq 3FF0000000000000H ; 0000 _ 1.0

F0001   dd 3F800000H         ; 0008 _ 1.0

nidud

  • Member
  • *****
  • Posts: 1991
    • https://github.com/nidud/asmc
Re: Large integers and floats
« Reply #29 on: May 26, 2020, 06:28:56 AM »
Added a vector(16) array to this. Applies to XMM registers so anything from bytes to oword, real2 to real16 is acceptable as input.

Example.

    ; assign vector(16)

    movaps xmm0,{ 1.0 }
    movapd xmm0,{ 1.0, 2.0 }
    movaps xmm0,{ 1.0, 2.0, 3.0, 4.0 }
    movaps xmm0,{ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 }

    movdqa xmm0,{ 1 }
    movapd xmm0,{ 1, 2 }
    movups xmm0,{ 1, 2, 3, 4 }
    movaps xmm0,{ 1, 2, 3, 4, 5, 6, 7, 8 }
    movupd xmm0,{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 }

    ; use vector(16)

    divpd xmm0,{ 1.0 }
    addpd xmm0,{ 1.0, 2.0 }
    xorpd xmm0,{ 1.0, 2.0, 3.0, 4.0 }
    mulpd xmm0,{ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 }

    divps xmm0,{ 1 }
    addps xmm0,{ 1, 2 }
    xorps xmm0,{ 1, 2, 3, 4 }
    mulps xmm0,{ 1, 2, 3, 4, 5, 6, 7, 8 }
    subps xmm0,{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 }

    ; return vector(16)

    .return { 1.0 }
    .return { 1.0, 2.0 }
    .return { 1.0, 2.0, 3.0, 4.0 }
    .return { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 }

    .return { 1 }
    .return { 1, 2 }
    .return { 1, 2, 3, 4 }
    .return { 1, 2, 3, 4, 5, 6, 7, 8 }
    .return { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 }

        movaps  xmm0, xmmword ptr [F0002]
        movapd  xmm0, xmmword ptr [F0003]
        movaps  xmm0, xmmword ptr [F0004]
        movaps  xmm0, xmmword ptr [F0005]
        movdqa  xmm0, xmmword ptr [F0006]
        movapd  xmm0, xmmword ptr [F0007]
        movups  xmm0, xmmword ptr [F0008]
        movaps  xmm0, xmmword ptr [F0009]
        movupd  xmm0, xmmword ptr [F000A]
        divpd   xmm0, xmmword ptr [F0002]
        addpd   xmm0, xmmword ptr [F0003]
        xorpd   xmm0, xmmword ptr [F0004]
        mulpd   xmm0, xmmword ptr [F0005]
        divps   xmm0, xmmword ptr [F0006]
        addps   xmm0, xmmword ptr [F0007]
        xorps   xmm0, xmmword ptr [F0008]
        mulps   xmm0, xmmword ptr [F0009]
        subps   xmm0, xmmword ptr [F000A]
        movaps  xmm0, xmmword ptr [F0002]
        jmp     ?_001
        movaps  xmm0, xmmword ptr [F0003]
        jmp     ?_001                   
        movaps  xmm0, xmmword ptr [F0004]
        jmp     ?_001                   
        movaps  xmm0, xmmword ptr [F0005] 
        jmp     ?_001                     
        movaps  xmm0, xmmword ptr [F0006] 
        jmp     ?_001                     
        movaps  xmm0, xmmword ptr [F0007] 
        jmp     ?_001                     
        movaps  xmm0, xmmword ptr [F0008] 
        jmp     ?_001                     
        movaps  xmm0, xmmword ptr [F0009] 
        jmp     ?_001                     
        movaps  xmm0, xmmword ptr [F000A] 
        jmp     ?_001     
...
F0000   label qword
        dq 4000000000000000H 
F0001   dd 3F800000H, 00000000H
ALIGN   16
F0002   label xmmword
        dq 0000000000000000H
        dq 3FFF000000000000H 
F0003   label xmmword
        dq 3FF0000000000000H 
        dq 4000000000000000H 
F0004   label xmmword
        dq 400000003F800000H 
        dq 4080000040400000H 
F0005   label xmmword
        dq 4400420040003C00H   
        dq 4800470046004500H   
F0006   label xmmword
        dq 0000000000000001H   
        dq 0000000000000000H   
F0007   label xmmword
        dq 0000000000000001H   
        dq 0000000000000002H   
F0008   label xmmword
        dq 0000000200000001H   
        dq 0000000400000003H   
F0009   label xmmword
        dd 00020001H, 00040003H
        dd 00060005H, 00080007H
F000A   label xmmword
        dq 0807060504030201H
        dq 100F0E0D0C0B0A09H
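
The data generated above suggests that the element width is 16 bytes divided by the number of elements. A Python sketch of that packing rule, with hypothetical names `pack_vector16` and `FLOAT_FORMATS`: it assumes the element count divides 16, and it does not cover the single REAL16 element or 16 float elements, which `struct` cannot express.

```python
import struct

FLOAT_FORMATS = {8: "d", 4: "f", 2: "e"}   # element width in bytes -> struct code

def pack_vector16(values):
    """Pack 1, 2, 4, 8 or 16 equally sized elements into (low64, high64)."""
    width = 16 // len(values)
    if all(isinstance(v, int) for v in values):
        data = b"".join(v.to_bytes(width, "little") for v in values)
    else:
        data = struct.pack(f"<{len(values)}{FLOAT_FORMATS[width]}", *values)
    return struct.unpack("<2Q", data)

low, high = pack_vector16([1.0, 2.0, 3.0, 4.0])
print(f"dq {low:016X}H\ndq {high:016X}H")
```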